Professional Documents
Culture Documents
STATISTICS
Session 1
Moneyball
What is data?
2
What is statistics?
3
Why do we need statistics?
6
Big Data and Data Analytics
7
1. Data Collection and Sampling
¨ Data collection & storage: The very first exercises in data collection
required interactions with subjects, paper documents and physical
storage. As technology has evolved, data collection and storage has
been digitized, and the risks have shifted.
¨ Population versus Sample: With large populations, you often have
to sample that population, either cross sectionally, across time, or
both.
¨ The Sampling Sins: While sampling, by itself, is part of statistics,
there are two fundamental sins that you should try to avoid:
¤ Bias: The sample has to be representative of the population. If the
sampling method creates bias, the results from the sample cannot be
extrapolated to the population.
¤ Noise: Even if the sample is representative, the results that you obtain will
have statistical error or noise that can muddy your conclusions.
8
2. Data Descriptive(Statistic)s
9
3. Data Distributions
10
4. Data Relationships
¨ In statistics, we can also examine how two or more variables are related
to each other.
¤ In some cases, we restrict ourselves to chronicling whether that co-movement is in
the same direction, opposite directions and that there is no co-movement at all.
¤ In others, we try to find whether there is causation, where one variable’s
movement is the cause of the other variable’s movement.
¤ Finally, if there is a link, we can use statistics to try to predict a variable, based upon
observed values of another variable.
¨ As an example, consider the linkage between stock prices and interest
rates.
¤ We can measure whether interest rates and stock prices move together, in opposite
directions and are unrelated.
¤ We can also examine which direction the causation runs (do higher stock prices
cause higher interest rates or vice versa)
¤ And if there is causation or a link, we can see if we use changes in interest rates can
be use to predict changes in stock prices or vice versa.
11
5. Probabilities
12
6. And Probabilistic Tools
13