Data Science And Big Data Analytics
Introduction and Course Agenda
EMC PROVEN PROFESSIONAL
Copyright © 2012 EMC Corporation. All Rights Reserved.

Agenda
• Research Group
• Administrative Issues
• Content and Aims

Research Group: Software Engineering for Distributed Systems
• The logo represents a workbench for systematic software development, continuous quality assessment, and quality improvement

Current Research Topics
• Social Network Analysis
  Spread of information in social communities
• Quality Assurance (QA)
  Managed Software Evolution
  Usage-based testing
  Usability Engineering
  Test Languages: TTCN-3, UML Testing Profile
  QA for test specifications
• Interoperability of Grid and Cloud systems
• Reliability of Cloud systems
Interested in these topics? Contact us for student projects and B.Sc., M.Sc., or Ph.D. theses.

Research Group
• Scientific staff:
  Prof. Dr. Jens Grabowski
  Dr. Steffen Herbold
  M.Sc. Xiao-Wei Wang
  Dipl. Math. Verena Herbold
  Dipl.-Inf. Daniel Honsel
  M.Sc. Ella Albrecht
  Dr. Patrick Harms
  M.Sc. Fabian Glaser
  M.Sc. Michael Göttsche
  M.Sc. Fabian Trautsch
• Student projects, Bachelor and Master theses (usually 4-8 students)
• Supported by
  Annette Kadziora
  Dipl.-Ing. (FH) Gunnar Krull
• Web: http://www.swe.informatik.uni-goettingen.de

Duties of the Research Group
• Prof. Jens Grabowski
  Dean of Students
  Vice Director of the Institute of Computer Science
  Speaker of the Ph.D. program for Computer Science
• Organization of the student library
  Fabian Glaser, Hanna Holderied

Teaching offered in winter semester (WiSe) 2015/2016
• Lecture: “Data Science and Big Data Analytics”
• Lecture: “Software Testing”
• Lecture: “Software Technik I” (B.Sc. only)
• Practical course: “Software Testing” (block course)
• Seminar: “Advanced Topics in Software Engineering”
• Seminar: “Technologies and Design of Graphical User Interfaces”

(Planned) Teaching in summer semester (SoSe) 2016
• Lecture: “Software Evolution” (Prof. Dr. Jens Grabowski)
• Lecture: “Requirements Engineering” (Dr. Steffen Herbold)
• Seminar: “Advanced Topics in Software Engineering” (whole group)

Administration
• Time and Place
  Lecture: Tuesday, 14:15-15:45 (s.t.), Room: Ifi 0.101
  Exercise: Thursday, 13:15-14:45 (s.t.), Room: Ifi -1.101
• Kind of Course, ECTS
  Lecture in the M.Sc. for Applied Computer Science
  5 ECTS
  M.Inf.1151.Mp: Data Science und Big Data Analytics
• Examination
  Written exam at the end of the semester
  Precondition for participation in the exam: passing the exercise

Administration
• Announcements and course materials are distributed via Stud.IP
• Course material
  Material is provided by Dell EMC through the Dell EMC Academic Alliance
  Electronic versions (PDF) of the slides
  Slides are protected by copyright and cannot be distributed freely
  We recommend also studying the theoretical side of data science
• Lectures available as Web stream
  https://webconf.vc.dfn.de/datascience-ugoe/

Exercise
• Practical application of concepts from the lecture
• NOT weekly! First exercise to be announced
• Divided into two parts
  Five programming exercises
    Solutions must be presented to a lecturer during the exercise sessions
    50% of points on each exercise sheet required for passing
  Final project
    Small data analysis project as group work
    Presentation of the results required for passing

Overall Course Goal
• The goal of the Data Science And Big Data Analytics Course is for you to be able to immediately participate as a Data Science team member on big data and other analytics projects
  Data Scientist point of view
  Open
  Practical

Expected Background
• Strong mathematical and quantitative capability
• Experience with statistical methods and basic proficiency with a statistical software package, such as R or RStudio, Minitab, Matlab, SAS, or SPSS
• Experience with the conditioning and management of business data, including databases
• Basic programming skills, preferably including SQL

Course Objectives
Upon completion of this course, you should be able to:
• Immediately participate and contribute as a data science team member on big data and other analytics projects by:
  Deploying a structured lifecycle approach to data science and big data analytics projects
  Reframing a business challenge as an analytics challenge
  Applying analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
  Selecting optimal visualization techniques to clearly communicate analytic insights to business sponsors and others
  Using tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions
• Explain how advanced analytics can be leveraged to create competitive advantage and how the data scientist role and skills differ from those of a traditional business intelligence analyst

Course Modules and Navigation Icons
Data Science and Big Data Analytics
1. Introduction to Big Data Analytics
2. Data Analytics Lifecycle + Lab
3. Review of Basic Data Analytics Methods Using R + Labs
4. Advanced Analytics - Theory & Methods + Labs
5. Advanced Analytics - Technology & Tools + Labs
6. The Endgame, or Putting it All Together + Final Lab

Topics: Data Science and Big Data Analytics Course

Introduction to Big Data Analytics + Data Analytics Lifecycle
• Big Data Overview
• State of the Practice in Analytics
• The Data Scientist
• Big Data Analytics in Industry Verticals
• Data Analytics Lifecycle

Review of Basic Data Analytic Methods Using R
• Using R to Look at Data - Introduction to R
• Analyzing and Exploring the Data
• Statistics for Model Building and Evaluation

Advanced Analytics - Theory and Methods
• K-means Clustering
• Association Rules
• Linear Regression
• Logistic Regression
• Naive Bayesian Classifier
• Decision Trees
• Time Series Analysis
• Text Analysis

Advanced Analytics - Technology and Tools
• Analytics for Unstructured Data (MapReduce and Hadoop)
• The Hadoop Ecosystem
• In-database Analytics - SQL Essentials
• Advanced SQL and MADlib for In-database Analytics

The Endgame, or Putting it All Together + Final Lab on Big Data Analytics
• Operationalizing an Analytics Project
• Creating the Final Deliverables
• Data Visualization Techniques
• Final Lab - Application of the Data Analytics Lifecycle to a Big Data Analytics Challenge