This document discusses key concepts in data science: the characteristics of big data, including the 5 V's; the differences between data science, information science, and business intelligence; the data science life cycle; and the data wrangling process. It also covers data preprocessing techniques, including discretization, normalization, scaling, handling missing and noisy data, and data reduction methods, as well as measures of dispersion and statistical topics such as mean, median, mode, Bayes' theorem, Pearson correlation, and hypothesis testing (why it is needed, t-tests, and chi-square tests).