Introduction To Data Science
Machine learning- Usage and development of algorithms that allow computers to learn and make
predictions
Operational Research- Optimizing processes, resources, decision making, etc. within a business
Data engineering- Building and maintaining robust infrastructure for data pipelines (feeding and
analyzing large volumes of data)
Types of ML
Regression (supervised)
Classification (supervised)
Clustering (unsupervised)
Regression:
Classification:
Clustering:
Basic statistics to describe data:
Descriptive statistics:
- Descriptive statistics cover three main areas:
- Distribution- frequency of each value occurring within the data
- Central tendency- the averages of the data
- Variability- how spread out the values are from central tendency
Distribution:
- A dataset is a distribution of values, and we can summarize how often each possible value
occurs as a count or a percentage. This is usually done with a frequency table.
- The simple frequency table lists each category together with its count. We can easily
identify the most popular category using this
- The grouped frequency table puts numerical values into ranges, here based on the number of
visits each person made to the library. It tells us more about how the values are distributed,
e.g. here most people visit the library between 9 and 12 times.
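As a rough illustration of the two kinds of frequency table, the sketch below builds both from a short list of visit counts. The data, the function names and the bin width of 4 are all assumptions made up for this example; they are not the figures from the tables in the notes.

```python
from collections import Counter

# Hypothetical data, invented for illustration: library visits per person.
visits = [2, 5, 9, 10, 11, 12, 9, 4, 15, 10, 7, 12]


def simple_frequency(values):
    """Count how often each distinct value occurs, sorted by value."""
    return dict(sorted(Counter(values).items()))


def grouped_frequency(values, width):
    """Count integer values per bin of `width`; bins start at multiples of `width`."""
    counts = Counter((v // width) * width for v in values)
    return {(start, start + width - 1): n for start, n in sorted(counts.items())}


simple = simple_frequency(visits)
grouped = grouped_frequency(visits, 4)

# Print each group as a count and as a percentage of all observations.
for (low, high), n in grouped.items():
    share = 100 * n / len(visits)
    print(f"{low}-{high}: {n} ({share:.0f}%)")
```

With this made-up data the 8-11 group has the highest count, which is the kind of pattern a grouped frequency table makes easy to spot.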
Central tendency:
- Central tendency represents the center, or average, of the dataset
- Mean, median and mode are the most common measures of central tendency
- Mean: add up all values and divide by the number of values
- Median: the middle value once the values are sorted (with an even number of values, the
mean of the two middle values)
- Mode: the most frequently occurring value
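The three measures can be sketched with Python's standard `statistics` module. The list of values is hypothetical and chosen only so the three results differ; the manual mean is included to show the "add up and divide" step directly.

```python
import statistics

# Hypothetical values, invented for illustration.
values = [3, 7, 7, 2, 9, 7, 4, 2]

# Mean: sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)
mean = statistics.mean(values)

# Median: middle value of the sorted data; with an even count,
# statistics.median averages the two middle values.
median = statistics.median(values)

# Mode: the most frequently occurring value.
mode = statistics.mode(values)

print(mean, median, mode)
```

Here the mean (5.125), median (5.5) and mode (7) all differ, which shows that "the average" depends on which measure is meant.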
Variability:
- Variability tells us how spread out the values within the dataset are
- Range, standard deviation and variance are the most common metrics of variability
- Range- largest value minus smallest value
- Standard deviation- roughly the typical distance of a value from the mean. High SD means
high variability, low SD means low variability
- Variance- the standard deviation squared (the average squared distance from the mean)
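A minimal sketch of the three variability metrics follows. The values are hypothetical, and the code assumes the data is the whole population (so it uses `statistics.pvariance` and `statistics.pstdev`); for a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` would be the usual choice instead.

```python
import statistics

# Hypothetical values, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(values)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(values)

print(value_range, variance, std_dev)
```

For these values the range is 7, the variance is 4 and the standard deviation is 2, so squaring the standard deviation gives back the variance.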
Standard deviation: