Professor Christoph Quix
Senior Researcher
Fraunhofer Institute for Applied Information Technology, Germany

The idea of data lakes has been introduced to address the problem of integrating heterogeneous information in big data applications. Data lakes collect data from heterogeneous sources in their original format and perform only a shallow integration on the syntactical level. The semantic integration of the data is left to the user, who can integrate data by using a unified query interface. Data quality is a challenge in data lakes because data is copied "as-is" from the sources; thus, data might be incorrect, inconsistent, or difficult to interpret when the corresponding metadata is missing. At RWTH Aachen University and the Fraunhofer Institute for Applied Information Technology (FIT), we are currently developing a data lake system in which metadata and data quality management govern the data ingestion process and thereby prevent the data lake from turning into a data swamp. The quality of incoming data is continuously monitored, and if a new data source is, for example, insufficiently described by metadata, countermeasures such as a more detailed metadata extraction or metadata matching can be enabled. The hands-on workshop will give an overview of big data and current trends, a hands-on introduction to Apache Spark, a discussion of important issues in big data applications, and a hands-on session on data integration.
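To make the idea of metadata-governed ingestion more concrete, the following is a minimal sketch, not the system developed at RWTH Aachen and Fraunhofer FIT. It assumes a hypothetical list of required metadata fields, a hypothetical completeness threshold, and made-up action names; it only illustrates how an insufficiently described source could trigger countermeasures before it is ingested.

```python
from dataclasses import dataclass, field

# Hypothetical set of descriptive fields; the real system may use a
# different and much richer metadata model.
REQUIRED_FIELDS = ("name", "format", "schema", "owner")


@dataclass
class SourceDescription:
    """Metadata that accompanies a data source offered for ingestion."""
    metadata: dict = field(default_factory=dict)


def metadata_completeness(source: SourceDescription) -> float:
    """Return the fraction of required fields that have a non-empty value."""
    present = sum(1 for name in REQUIRED_FIELDS if source.metadata.get(name))
    return present / len(REQUIRED_FIELDS)


def plan_ingestion(source: SourceDescription, threshold: float = 0.75) -> list:
    """Decide which steps to run for a source (threshold is an assumption)."""
    if metadata_completeness(source) >= threshold:
        return ["ingest"]
    # Insufficiently described source: enable countermeasures first.
    return ["extract_more_metadata", "match_metadata", "ingest"]


well_described = SourceDescription(
    {"name": "sensors", "format": "csv", "schema": "id,temp", "owner": "lab"}
)
poorly_described = SourceDescription({"name": "dump"})

print(plan_ingestion(well_described))
print(plan_ingestion(poorly_described))
```

In this sketch the quality check is a simple completeness ratio; a monitoring component in a real data lake would likely combine several quality dimensions.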

 

Short CV: Christoph Quix is a senior researcher in the Life Science Informatics group at the Fraunhofer Institute for Applied Information Technology (FIT) in St. Augustin, Germany, where he leads the department for High Content Analysis. Earlier, he was an assistant professor in the Information Systems Group (Informatik 5) of RWTH Aachen University, Germany, where he completed his habilitation in early 2013 and received his Ph.D. degree in computer science. His research focuses on data integration, big data, management of heterogeneous data, metadata management, and semantic web technologies. He has about 80 publications in scientific journals and international conferences. He has been involved in several national and international research projects conducted in cooperation with research and industry partners. He was a PC chair of CAiSE 2014, a member of the PC for several major conferences on databases and data modeling (e.g., ER, ICDE, and ODBASE), and the organizing chair of several international workshops.


Tentative Program
Hands-on Workshop
Big Data Management: Theory & Practice

25th August 2016 – Thursday
Venue:
Faculty of Computing, UTM Johor Bahru, Malaysia
8.00 – 8.30 am Registration
8.30 – 10.00 am Session 1: Introduction

  • Explaining Big Data
  • Current Trends
  • Research Challenges
  • Big Data Systems: Hadoop, Apache Spark & Co
10.00 – 10.30 am Morning Break
10.30 am – 12.30 pm Session 2: Hands-On Part 1: Apache Spark

  • Setting up a simple data processing workflow in Spark
12.30 – 2.00 pm Lunch
2.00 – 3.30 pm Session 3: Important Issues in Big Data Applications

  • Not just Volume: Variety
  • Data Integration & Metadata Management
3.30 – 5.00 pm Session 4: Hands-On Part 2: Data Integration

  • Defining Data Integration Workflows
  • Combining data from heterogeneous data sources
5.00 – 5.30 pm Closing

 

 

 
