The document discusses the big data problem and outlines the typical big data analysis pipeline including data acquisition, cleaning, integration, modelling, and interpretation. It notes common challenges with big data like heterogeneity, scale, timeliness, and privacy. It also covers the evolution of data and analytical architectures from mainframes to relational databases to today's era of unstructured data from the internet of things. Finally, it discusses the emerging big data ecosystem and three key roles: data engineers, data scientists, and data analysts.
Original Description:
This is a standard textbook used by a national-level institute to teach big data analytics.
Data Acquisition and Recording
Information Extraction and Cleaning
Data Integration and Aggregation
Query Processing, Data Modelling and Analysis
Interpretation
Common Challenges
Heterogeneity and Incompleteness
Scale
Timeliness
Privacy
Human Collaboration
System Architecture
Evolution of Big Data
• 1970s and earlier: the era of mainframes. Data was essentially primitive and structured.
• 1980s and 1990s: the era of relational databases and data-intensive applications.
• World Wide Web and Internet of Things (IoT): the era of unstructured, structured and semi-structured data.
BI vs Data Science
Current Analytical Architecture
Drivers of Big Data
Emerging Big Data Ecosystem and New Approach in Analytics
Three Key Roles of the New Data Ecosystem
Data Scientist