{"id":5486,"date":"2016-08-25T08:00:43","date_gmt":"2016-08-25T08:00:43","guid":{"rendered":"https:\/\/bigdata.utm.my\/?p=5486"},"modified":"2016-09-07T06:36:11","modified_gmt":"2016-09-07T06:36:11","slug":"25-aug-2016-big-data-management-theory-practice-hands-on-workshop","status":"publish","type":"post","link":"https:\/\/research.utm.my\/bdc\/2016\/08\/25\/25-aug-2016-big-data-management-theory-practice-hands-on-workshop\/","title":{"rendered":"25 AUG 2016: Big Data Management: Theory &amp; Practice \u2013 Hands-on Workshop"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5481 size-full\" src=\"https:\/\/research.utm.my\/wp-content\/uploads\/sites\/32\/2016\/08\/Big-Data-Workshop-2016-ProfChristophQuix.jpg\" width=\"960\" height=\"720\" \/><\/p>\n<p><strong>Professor Christoph Quix<\/strong><br \/>\n<em>Senior Researcher<\/em><br \/>\nFraunhofer Institute for Applied Information Technology, Germany<\/p>\n<p>Data lakes were introduced to address the problem of integrating heterogeneous information in big data applications. Data lakes collect data from heterogeneous sources in its original format and perform only a shallow integration at the syntactic level. The semantic integration of the data is left to the user, who can integrate data through a unified query interface. Data quality is a challenge in data lakes because data is copied \u2018as-is\u2019 from the sources; thus, data might be incorrect, inconsistent, or difficult to interpret when the corresponding metadata is missing. At RWTH Aachen University and the Fraunhofer Institute for Applied Information Technology (FIT), we are currently developing a data lake system in which metadata and data quality management govern the data ingestion process and thereby prevent the data lake from turning into a data swamp. 
The quality of incoming data is continuously monitored; if, for example, a new data source is insufficiently described by metadata, countermeasures such as more detailed metadata extraction or metadata matching can be enabled. The hands-on workshop will give an overview of big data and current trends, followed by hands-on work with Apache Spark, a discussion of issues in big data applications, and hands-on data integration.<\/p>\n<p>&nbsp;<\/p>\n<p><u>Short CV<\/u>: Christoph Quix is a senior researcher in the Life Science Informatics group at the Fraunhofer Institute for Applied Information Technology (FIT) in St. Augustin, Germany, where he leads the department for High Content Analysis. Earlier, he was an assistant professor in the Information Systems Group (Informatik 5) of RWTH Aachen University, Germany, where he completed his habilitation in early 2013 and received his Ph.D. degree in computer science. His research focuses on data integration, big data, management of heterogeneous data, metadata management, and semantic web technologies. He has about 80 publications in scientific journals and international conferences. He has been involved in several national and international research projects conducted in cooperation with research and industry partners. 
He was a PC chair of CAiSE 2014, a PC member for several major conferences on databases and data modeling (e.g., ER, ICDE, and ODBASE), and the organizing chair of several international workshops.<\/p>\n<p style=\"text-align: center\"><strong><br \/>\nTentative Program<br \/>\nHands-on Workshop<br \/>\nBig Data Management: Theory &amp; Practice<\/strong><\/p>\n<table style=\"height: 730px\" width=\"696\">\n<tbody>\n<tr>\n<td colspan=\"2\" width=\"621\"><strong>25<sup>th<\/sup> August 2016 \u2013 Thursday<br \/>\nVenue: Faculty of Computing, UTM Johor Bahru, Malaysia<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>8.00 \u2013 8.30 am<\/strong><\/td>\n<td width=\"447\">Registration<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>8.30 \u2013 10.00 am<\/strong><\/td>\n<td width=\"447\"><strong>Session 1: Introduction<\/strong><\/p>\n<ul>\n<li>Explaining Big Data<\/li>\n<li>Current Trends<\/li>\n<li>Research Challenges<\/li>\n<li>Big Data Systems: Hadoop, Apache Spark &amp; Co<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>10.00 \u2013 10.30 am<\/strong><\/td>\n<td width=\"447\">Morning Break<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>10.30 am \u2013 12.30 pm<\/strong><\/td>\n<td width=\"447\"><strong>Session 2: Hands-On Part 1: Apache Spark<\/strong><\/p>\n<ul>\n<li>Setting up a simple data processing workflow in Spark<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>12.30 \u2013 2.00 pm<\/strong><\/td>\n<td width=\"447\">Lunch<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>2.00 \u2013 3.30 pm<\/strong><\/td>\n<td width=\"447\"><strong>Session 3: Important Issues in Big Data Applications<\/strong><\/p>\n<ul>\n<li>Not just Volume: Variety<\/li>\n<li>Data Integration &amp; Metadata Management<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>3.30 \u2013 5.00 pm<\/strong><\/td>\n<td width=\"447\"><strong>Session 4: Hands-On Part 2: Data Integration 
<\/strong><\/p>\n<ul>\n<li>Defining Data Integration Workflows<\/li>\n<li>Combining data from heterogeneous data sources<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td width=\"175\"><strong>5.00 \u2013 5.30 pm<\/strong><\/td>\n<td width=\"447\">Closing<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Professor Christoph Quix, Senior Researcher, Fraunhofer Institute for Applied Information Technology, Germany. Data lakes were introduced to address the problem of integrating heterogeneous information in big data applications. Data lakes collect data from heterogeneous sources in its original format and perform only a shallow integration at the syntactic level. [&hellip;]<\/p>\n","protected":false},"author":10656,"featured_media":5481,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[2,3],"tags":[],"class_list":["post-5486","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-events","category-highlights"],"_links":{"self":[{"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/posts\/5486","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/users\/10656"}],"replies":[{"embeddable":true,"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/comments?post=5486"}],"version-history":[{"count":1,"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/posts\/5486\/revisions"}],"predecessor-version":[{"id":5487,"href":"https:\/\/research.utm.my\/bdc\/w
p-json\/wp\/v2\/posts\/5486\/revisions\/5487"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/media\/5481"}],"wp:attachment":[{"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/media?parent=5486"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/categories?post=5486"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/research.utm.my\/bdc\/wp-json\/wp\/v2\/tags?post=5486"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}