
Notes from pre-MBA stats

1. Numerical or quantitative data is best interpreted with the mean, median, standard deviation, and percentiles. The mean gives a quick idea of the data, but it is sensitive to outliers and errors. The median is robust to them!
2. Another metric that gives an idea of the spread of the data is the standard deviation.
3. Percentiles are very useful for understanding how the data is distributed and over what range.
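A minimal sketch of points 1-3 (the sales figures are hypothetical, invented here for illustration), showing how an outlier drags the mean and standard deviation while the median stays put:

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical daily sales figures with one data-entry error (9999)
sales = [10, 12, 11, 13, 12, 11, 9999]

m = mean(sales)            # dragged far upward by the single outlier
md = median(sales)         # unaffected: still reflects a typical day
sd = stdev(sales)          # also inflated by the outlier
q = quantiles(sales, n=4)  # quartiles: the 25th/50th/75th percentiles
```

The quartiles similarly describe where the bulk of the data lies regardless of the single extreme value.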
4. Categorical data is the type of data in which we categorise individual data values; we lose the opportunity to compute averages, etc. Categorical info can be grouped, and a value can be of a unique or multi type. Eg: the state/province in which a retail outlet is located (unique); the types of movies this user has preferred in the past (multi).
5. Qualitative or categorical data in which the order of categories does not matter is called cardinal data.
6. If the order of categories is important, it is called ordinal data. Eg: kids and teens, young adults, middle-agers, seniors.
7. Nominal data: information that uniquely identifies records. Eg: phone numbers, card numbers, customer names. Used for storage and as unique identifiers, not for analysis.
8. Correlation is used to understand the connection between two different variables within a given dataset.
9. Datasets with the same mean, standard deviation, and correlation can still be different overall, which can be seen by plotting them. Eg: Anscombe's Quartet: similar summary statistics but essentially different data.
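The point about summary statistics can be seen without Anscombe's actual values; a small constructed pair of datasets (invented here, not from the course) has near-identical mean and standard deviation but clearly different shapes:

```python
from statistics import mean, stdev

a = [1, 2, 3, 4, 5]                     # evenly spread around 3
b = [3 - 5**0.5, 3, 3, 3, 3 + 5**0.5]   # clustered at 3 with two extremes

# Near-identical summary statistics...
print(mean(a), mean(b))
print(stdev(a), stdev(b))
# ...yet plotting a and b would immediately reveal different shapes.
```

This is exactly why plotting the data matters: summaries alone can hide the structure.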
10. Be careful to identify whether the datatype is cardinal or ordinal.
11. The most information can be drawn from numerical data, then categorical, then nominal data.
12. The descriptive statistics you can do with cardinal data include frequencies, proportions, percentages, and central points. To visualize cardinal data, you can use a pie chart or a bar chart.
13. The descriptive statistics you can do with ordinal data include frequencies, proportions, percentages, central points, percentiles, median, mode, and the interquartile range. The visualization methods are the same as for cardinal data.
14. Two types of quantitative data are discrete data (e.g. number of students) and continuous data (e.g. height).
15. Continuous data can be visualized with a histogram or box plot, while bar graphs or stem plots can be used for discrete data.
16. Interval data: represents ordered data measured along a numerical scale with equal distances between adjacent units. These equal distances are also referred to as intervals. You can compare interval data and add/subtract values, but you cannot multiply or divide them, as interval data has no meaningful zero. The descriptive statistics you can apply to interval data include central point, range, and spread.
17. Like interval data, ratio data is ordered with the same difference between individual units. However, it also has a meaningful zero, so it cannot take negative values. With a real zero point, we can also multiply and divide the values, and sort them as well. The descriptive statistics you can do with ratio data are the same as for interval data: central point, range, and spread.
18. Probability: the formal name for chance.
19. Random experiment: an experiment without a predetermined outcome. Eg: roll a die, toss a coin.
20. Sample space: the set of all possible outcomes.
21. A subset of the sample space is called an event, i.e. a set of possible outcomes.
22. Mutually exclusive events: the occurrence of one event implies the non-occurrence of the other; also called disjoint events.
23. Collectively exhaustive events: at least one of the events is guaranteed to occur.
24. Partitions are sets of events that are MECE (mutually exclusive and collectively exhaustive).
25. A commonly used partition is the trivial partition, i.e. the event happened or did not happen.
26. Axioms of probability:
P(sample space) = 1, P(empty set) = 0
For any event E, 0 <= P(E) <= 1
For mutually exclusive events E1, E2, ...: P(E1 U E2 U ...) = P(E1) + P(E2) + ...
27. P(E1 U E2) = P(E1) + P(E2) - P(E1 ∩ E2)
28. For MECE events, P(A1) + P(A2) + ... + P(An) = 1
29. P(A') = 1 - P(A)
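Points 26-29 can be verified concretely on a fair die (a sketch, assuming equally likely outcomes; the events chosen here are arbitrary examples):

```python
from fractions import Fraction

omega = set(range(1, 7))   # sample space: one roll of a fair die
A = {2, 4, 6}              # event: the roll is even
B = {1, 2, 3}              # event: the roll is at most 3

def P(event):
    # uniform probability: favourable outcomes / total outcomes
    return Fraction(len(event), len(omega))

union = P(A | B)                   # P(A U B)
inc_exc = P(A) + P(B) - P(A & B)   # inclusion-exclusion formula
complement = P(omega - A)          # P(A'), the complement rule
```

Exact fractions avoid floating-point noise, so the axioms hold with equality.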
30. Conditional probability: given that one event has occurred, what is the probability that another event occurs? P(A|B) is the probability of A given B.
31. P(A ∩ B) = P(B) * P(A|B)
32. Beware: P(A|B) != P(B|A)
33. Independence: event B's occurrence (or non-occurrence) provides no additional clue about event A.
If P(A|B) = P(A), then A and B are independent events.
So P(A ∩ B)/P(B) = P(A), or equivalently P(A ∩ B) = P(A) * P(B).
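The independence definitions above can be checked on two coin tosses (a sketch, assuming a fair coin and equally likely outcomes):

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))    # two independent coin tosses
A = {w for w in omega if w[0] == "H"}   # event: first toss is heads
B = {w for w in omega if w[1] == "H"}   # event: second toss is heads

def P(event):
    return Fraction(len(event), len(omega))

cond = P(A & B) / P(B)   # P(A|B) = P(A ∩ B) / P(B)
```

Here knowing B happened tells us nothing about A, so the conditional probability collapses to the unconditional one.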
34. A probability distribution is uniform if every outcome is equally likely.
35. Random variables: neither random nor a variable; a random variable is a function, a mapping from the sample space to the real numbers. It gives an ordering to the outcomes and allows numerical operations on a representative of the outcome.
36. Cumulative distribution function: FX(x) = P(X <= x). Eg: P(X <= 3) <= P(X <= 4)
37. The CDF is non-decreasing; as x -> -infinity, FX(x) -> 0, and as x -> infinity, FX(x) -> 1.
38. Discrete random variables: the random variable X takes a discrete set of values.
39. For these, the probability mass function is fX(xi) := P(X = xi)
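A sketch of the PMF and CDF for a discrete random variable, using the fair-die example (the die is an illustration chosen here, not specified in the notes):

```python
from fractions import Fraction

# Fair die: the random variable X maps each roll to its face value
pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # P(X = x) for each x

def cdf(x):
    # FX(x) = P(X <= x): sum the PMF over all values up to x
    return sum(p for xi, p in pmf.items() if xi <= x)
```

The asserts below confirm the CDF properties from point 37: non-decreasing, 0 below the support, 1 above it.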
40. Continuous random variable: takes a continuous set of values.
41. For these, the probability density function is the derivative of the CDF: fX(x) = dFX(x)/dx. It is not a probability and can take values greater than 1.
42. Uniform distribution: all values are equally likely.
43. Normal distribution, or bell curve: usually defined by two metrics, the mean and the standard deviation. Plotting its probability density function on a graph gives the bell shape.
44. Sampling: the need for sampling arises from the cost of collecting data from the entire population, and from limited access to the population.
45. Characteristics of a good sample: representative of the population, with significantly reduced cost compared to collecting data from the entire population.
46. Sampling types:
1. Finite population: with replacement, without replacement
2. Infinite population
47. Simple random sampling: no bias, as each member of the population has a uniform probability of being sampled.
48. Stratified sampling: divide the population into strata and sample from each.
49. Systematic random sampling: put every member of the population into some order, choose a random starting point, and select every nth member for the sample.
50. Cluster random sampling: split the population into clusters, randomly select some of the clusters, and include all members of those clusters in the sample.
51. Convenience sampling: choose a small, convenient sample, keeping in mind to avoid systematic bias.
52. Voluntary sampling: more convenient and cost-effective, but may carry more bias.
53. Snowball sampling: handles users' unwillingness to express opinions by using a who-knows-who network to gather user data.
54. Purposive sampling: researchers recruit the individuals they think will be most useful for the purpose of their study.
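Of the methods above, systematic random sampling (point 49) translates most directly into code. A sketch, assuming a hypothetical population of 100 numbered members:

```python
import random

population = list(range(1, 101))   # 100 members, already in some order

def systematic_sample(pop, n):
    # choose a random starting point, then take every k-th member
    k = len(pop) // n
    start = random.randrange(k)
    return pop[start::k][:n]

random.seed(42)   # fixed seed so the sketch is reproducible
sample = systematic_sample(population, 10)
```

Every member still had an equal chance of selection, but the sample is evenly spread through the ordered population.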
55. For a continuous random variable, P(X = a) = 0.
56. P(X <= b) = P(X < a) + P(a <= X <= b), for a <= b
57. The expectation of a random variable is the weighted average of the different values the random variable takes, weighted by their probabilities. The expected value is analogous to an average: it is a measure of central tendency, not the value you expect X to take.
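The "central tendency, not a value you expect" point shows up cleanly with a fair die (an example chosen here for illustration):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

# E[X] = sum of x * P(X = x): a probability-weighted average.
# 3.5 is not even a face of the die: the expectation describes
# the centre of the distribution, not any single outcome.
EX = sum(x * p for x, p in pmf.items())
```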
58. Law of Large Numbers: the average value of a large dataset converges to the central tendency. As we take larger and larger independent and identically distributed (i.i.d.) samples, the sample characteristics represent the population characteristics.
59. Central Limit Theorem: under mild conditions, the sum of sufficiently many i.i.d. random variables is approximately normally distributed.
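Both theorems can be watched in a quick simulation (a sketch with a fixed seed; the die and the sample sizes are arbitrary choices made here):

```python
import random
from statistics import mean

random.seed(0)   # fixed seed so the sketch is reproducible

# Law of Large Numbers: the average of many i.i.d. die rolls
# converges to the expected value E[X] = 3.5
rolls = [random.randint(1, 6) for _ in range(100_000)]
avg = mean(rolls)

# Central Limit Theorem: sums of i.i.d. variables look normal;
# the 30-roll sums below cluster around 30 * 3.5 = 105
sums = [sum(random.randint(1, 6) for _ in range(30)) for _ in range(2000)]
avg_sum = mean(sums)
```

A histogram of `sums` would show the bell shape, even though a single die roll is uniform, not normal.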
60. Variance: V(X) = E(X^2) - [E(X)]^2
61. E(X^2) = Σ x^2 * p(x)
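Points 60-61 worked through for the fair die (again an illustration chosen here), using exact fractions:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

EX = sum(x * p for x, p in pmf.items())       # E(X)
EX2 = sum(x**2 * p for x, p in pmf.items())   # E(X^2) = Σ x^2 * p(x)
var = EX2 - EX**2                             # V(X) = E(X^2) - [E(X)]^2
```

For the die this gives V(X) = 91/6 - (7/2)^2 = 35/12.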
62. Population: the collection of all elements of interest.
63. Parameter: a characteristic of the population (for example, the population mean, variance, or proportion).
64. Sample: a collection of a small number of elements from the population.
65. Random sample: a collection of a small number of elements from the population in which each element has an equal chance of being selected.
66. Estimator: a sample statistic (for example, the sample mean, variance, or proportion); a method to estimate a population parameter based on a random sample.
67. Estimate: the value of the sample statistic for a given sample.
68. Parameter values are constant, while estimates vary from sample to sample, and they are almost never equal to the parameter values.
69. A sampling distribution is the probability distribution of a sample statistic that arises from choosing random samples from a population.
70. The mean of the sampling distribution is called the expected value of the sample statistic.
71. The standard deviation of the sampling distribution is called the standard error of the statistic.
72. Sampling distributions are not always smooth and symmetric.
73. The expected value of the sample statistic is not always the population parameter that the statistic estimates.
74. In practice, we usually have only one sample, so we never actually see the sampling distribution of any statistic, but knowledge of this distribution is of immense practical importance.
75. Unbiasedness: a point estimator is unbiased if it neither systematically overestimates nor underestimates the population parameter it estimates.
76. Efficiency: a point estimator has higher relative efficiency than another if it has a smaller standard error.
77. A point estimator is consistent if its values tend to be closer to the population parameter as the sample size increases.
78. A good estimator should be:
Unbiased: its expected value should equal the parameter it estimates
Relatively efficient: its standard error should be the minimum among all candidate estimators
Consistent: as the sample size increases, the estimate should get closer to the parameter it estimates
79. Observations about sampling distributions of sample means:
The sampling distribution of sample means is a normal distribution when the number of observations in the sample is large enough.
By the central limit theorem, this normal distribution is centered around the population mean, and its standard deviation is sigma/sqrt(n), where sigma is the population standard deviation and n is the number of observations in the sample.
80. The bias in the standard-deviation estimate can be corrected by multiplying it by the square root of n/(n-1), where n is the number of observations in the sample; equivalently, the (divide-by-n) variance estimate is multiplied by n/(n-1), known as Bessel's correction.
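The variance form of this correction can be checked against the standard library (the sample values are hypothetical, chosen here for illustration):

```python
from statistics import pvariance, variance

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(sample)

biased = pvariance(sample)          # divides by n: biased for a sample
corrected = biased * n / (n - 1)    # Bessel's correction: n/(n-1)
# the corresponding standard deviation gets the sqrt(n/(n-1)) factor
```

`statistics.variance` already divides by n - 1, so it matches the corrected value.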
81. Observations about sampling distributions of sample proportions:
The sampling distribution of sample proportions can be approximated by a normal distribution when the number of observations in the sample is large enough and the population proportion is neither too large nor too small.
This normal distribution is centered around the population proportion, and its standard deviation is sqrt(p(1-p)/n), where p is the population proportion and n is the number of observations in the sample.
82. Keeping the sample size constant, widening the interval increases the chance that the population parameter lies inside it; keeping the width constant, increasing the sample size does the same. The question is: what is the minimum width of the interval such that the population mean lies inside it with probability alpha, regardless of the value of the sample mean? This interval is called a 100*alpha percent confidence interval. For instance, for a 90 percent confidence interval, we want the minimum-width interval such that the population mean lies inside it with probability 0.9, regardless of the value of the sample mean.
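A sketch of computing such an interval for a sample mean, assuming the normal approximation from point 79 holds; the data values are hypothetical and 1.645 is the z-value for a 90% confidence level:

```python
from statistics import mean, stdev

# Hypothetical sample of 10 measurements (invented for illustration)
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
n = len(sample)

xbar = mean(sample)
se = stdev(sample) / n**0.5   # estimated standard error of the mean
# 90% confidence interval under the normal approximation
low, high = xbar - 1.645 * se, xbar + 1.645 * se
```

For small samples the t-distribution would give a slightly wider interval; the normal z-value is used here only to keep the sketch simple.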