
STATISTICS AND PROBABILITY

FOR DATA SCIENCE NOTES:


1.What is expected from students?
To characterize myself
Are you here to learn?
Focus, interest, attitude towards learning

2.Go beyond the textbook and beyond the classroom; self-learning is very
much essential
3.Be a learner and a self-learner.
4.Take notes, have your own interests
5.Have a unique focus, be a unique person
6.Discover yourself
1.What is data? Where do we get data? How do we get data? Why do we get
data?
2.Statistics is meant for "handling data"
3.Every kind of observation is data
4.There are different types of data
5.We observe and get data
6.We need to make decisions, so we need data
7.Must know about data
8.Think about data, do a small research
9.Google ClassRoom Code:-
https://classroom.google.com/c/MTcwNTM4NTY0Mzgx?cjc=szcuwph
1.https://chat.whatsapp.com/Dh268faw4xSCpSov4eSOWv (WhatsApp group link)
2.Introduction to statistics:-
1. Lies, damned lies, and statistics
2. Why study statistics?
1.Data is everywhere
2.We use stats to make decisions that affect our lives
3.Understanding statistical methods will help us to make decisions
effectively
3. Applications of stats in the business world
1.Finance
2.Marketing
3.Personnel
4.Operations management
4.Statistics: collecting, organizing, presenting, analyzing, and interpreting data to
assist in making more effective decisions
5. Statistical analysis: used to manipulate, summarize, and investigate data
to get useful decision-making information
1.Need to understand interpretation (important)
Types of statistics:-
2.Descriptive statistics- methods of organizing, summarizing, and presenting
data in an informative way
3.Inferential statistics- methods used to determine something about a population
on the basis of a sample
4.Sampling:-
1.Random:- equal chance of being selected
1.Simple random sample- each item has an equal chance of being
selected
2.Stratified sample- divide the population into groups called strata and then
take a random sample from each stratum
3.Cluster sample- divide the population into clusters, randomly select some
clusters, and take the items of the selected clusters as the sample
4.Systematic sample- randomly select a starting point and take
every n-th piece of data from a list of the population
2.Non Random:-
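A rough sketch of the four random sampling methods listed above, using only Python's standard library. The population (items 1 to 20), the two groups "A" and "B", the sample sizes, and the function names are all made up for illustration; they are not from the lecture.

```python
import random

def simple_random_sample(population, k, rng):
    """Every item has the same chance of being picked."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Take a simple random sample from every stratum (group)."""
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, k_per_stratum))
    return picked

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, step, rng):
    """Random starting point, then every step-th item of the list."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
population = list(range(1, 21))  # hypothetical population: items 1..20
groups = {"A": population[:10], "B": population[10:]}  # hypothetical groups

srs = simple_random_sample(population, 5, rng)
strat = stratified_sample(groups, 2, rng)
clus = cluster_sample(groups, 1, rng)
syst = systematic_sample(population, 4, rng)
```

Here the same two groups are used once as strata (sample inside every group) and once as clusters (keep one whole group), which is the difference between the two methods.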

1. Statistical data are usually obtained by counting or measuring items.


1.Qualitative- data are measurements that each fall into
one of several categories
2.Quantitative- data are observations that are measured on a numerical
scale
2. Talked about continuous and discrete data.
3. Types of variables
4.Numerical scale of measurement:-
1.Nominal
2.Ordinal
3.Interval
4.Ratio
1.Gave an example of organization of data
2.Bring 5 questions
1.Presentation of data:-
1.Tabular
2.Visual:-
1.Graphical
2.Diagram
2.Principles behind presentation
3.cumulative frequency
1.Talked about real-life collection of data, ratings, etc.
1.Module 2
2.Mean, median, mode (measures of central tendency) + range (a measure of dispersion)
3.What is mean, median, mode?
4.Mean-- numerical average of the data set (the data can be in any order: add the values
and divide by how many there are)
5.Median-- the middle of a set of data (arrange from low to high or vice versa)
6.Mode-- piece of data that occurs most frequently in the data set (there can be one, more
than one, or no mode)
7.Range-- difference between the highest and lowest value
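A small sketch of these four measures with Python's built-in statistics module. The list of marks is invented for illustration.

```python
import statistics

marks = [4, 7, 7, 9, 3, 7, 5]  # hypothetical data set

mean = statistics.mean(marks)          # sum of the values / number of values
median = statistics.median(marks)      # middle value after sorting
modes = statistics.multimode(marks)    # list of the most frequent value(s)
value_range = max(marks) - min(marks)  # highest value minus lowest value
```

multimode returns a list because a data set can have more than one mode; when no value repeats it returns every value, which matches the "no mode" case in the notes.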
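1.MEAN FORMULA (grouped data)
$\bar{X} = \frac{\sum f X_m}{n}$
$\bar{X}$ = mean
f = frequency
$X_m$ = class mark
n = total frequency
$X_m = \frac{LL + UL}{2}$ (LL = lower limit, UL = upper limit of the class)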
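The grouped mean formula can be tried out as below. This is only a sketch: the frequency table is hypothetical (just the 0-10 and 10-20 rows echo the marks table in these notes), and grouped_mean is a name chosen here.

```python
# Each class is (lower limit, upper limit, frequency); the table is hypothetical.
classes = [(0, 10, 1), (10, 20, 2), (20, 30, 4), (30, 40, 2), (40, 50, 1)]

def grouped_mean(classes):
    """Mean = sum(f * class mark) / n, where class mark = (LL + UL) / 2."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (ll + ul) / 2 for ll, ul, f in classes)
    return total / n

mean = grouped_mean(classes)  # (5 + 30 + 100 + 70 + 45) / 10
```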
1.Median Formula:-
1.Ungrouped data: the median is the ((n+1)/2)-th item of the sorted data
(this finds the middle item)
2.Grouped data: $\text{Median} = L + \frac{N/2 - \text{P.c.f}}{f} \times i$
L-- lower limit of the median class
N-- total number of observations (the total of the frequencies)
P.c.f-- previous cumulative frequency (the c.f of the class before the median class)
i-- size of the median class
f-- frequency of the median class
Marks (x)   No. of students (f)   Cumulative frequency (c.f)
0-10        1                     1
10-20       2                     1+2=3
N/2 (used to locate the median class: the first class whose c.f reaches N/2)
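A sketch of the grouped median steps: build the cumulative frequency, find the first class whose c.f reaches N/2, then apply the formula. The table is hypothetical apart from its first two rows, and grouped_median is an illustrative name.

```python
# Each class is (lower limit, upper limit, frequency); the table is hypothetical.
classes = [(0, 10, 1), (10, 20, 2), (20, 30, 4), (30, 40, 2), (40, 50, 1)]

def grouped_median(classes):
    """Median = L + ((N/2 - P.c.f) / f) * i for the median class."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for ll, ul, f in classes:
        if f > 0 and cum + f >= half:
            # this is the median class: L = ll, i = ul - ll, P.c.f = cum
            return ll + (half - cum) / f * (ul - ll)
        cum += f
    raise ValueError("frequency table is empty")

median = grouped_median(classes)  # 20 + ((5 - 3) / 4) * 10
```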
1.Discrete series:-
1.Find the c.f
2.Compute (N+1)/2
3.The value whose c.f first reaches (N+1)/2 is the median
2.Characteristics of the median:-
1.The median can be computed for an open-ended distribution, because it is
located within a class interval
2.
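3.Mode:-
$M_o = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$

$M_o$-- mode
L-- lower limit of the modal class
$\Delta_1$-- difference between the frequency of the modal class and that of the previous class
$\Delta_2$-- difference between the frequency of the modal class and that of the next class
i-- size of the modal class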
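A sketch of the grouped mode formula, taking the modal class as the class with the highest frequency and treating a missing neighbour class as frequency 0 (that last choice is an assumption made here, not something stated in the notes). The table and the name grouped_mode are hypothetical.

```python
# Each class is (lower limit, upper limit, frequency); the table is hypothetical.
classes = [(0, 10, 1), (10, 20, 2), (20, 30, 4), (30, 40, 2), (40, 50, 1)]

def grouped_mode(classes):
    """Mode = L + (d1 / (d1 + d2)) * i for the modal class."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # position of the modal class
    prev_f = freqs[k - 1] if k > 0 else 0
    next_f = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - prev_f  # modal frequency minus previous frequency
    d2 = freqs[k] - next_f  # modal frequency minus next frequency
    ll, ul, _ = classes[k]
    return ll + d1 / (d1 + d2) * (ul - ll)

mode = grouped_mode(classes)  # 20 + (2 / (2 + 2)) * 10
```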

Mode = 3 Median - 2 Mean (empirical relationship)
Merits of mode:
Demerits of mode:

Relationship of the mean, median and mode

Mean = Median = Mode (for a symmetrical distribution)

1.Standard deviation:-
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
1.The mean should be high, but why should the deviation be low?
2.Standard deviation
$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{n}}$ (grouped data)
$SD = \sqrt{\frac{\sum d^2}{n}}$, where $d = x - \bar{x}$
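Both standard deviation formulas can be checked with a short sketch. The data values and the frequency table are hypothetical, the function names are chosen here, and for grouped data x is taken to be the class mark (an assumption, in line with the grouped mean formula).

```python
import math

def population_sd(values):
    """sigma = sqrt(sum((x - mean)^2) / n) for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def grouped_sd(classes):
    """sigma = sqrt(sum(f * (x - mean)^2) / n), with x taken as the class mark."""
    n = sum(f for _, _, f in classes)
    marks = [((ll + ul) / 2, f) for ll, ul, f in classes]
    mean = sum(f * x for x, f in marks) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in marks) / n)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
classes = [(0, 10, 1), (10, 20, 2), (20, 30, 4), (30, 40, 2), (40, 50, 1)]

sd_values = population_sd(values)  # sqrt(32 / 8)
sd_classes = grouped_sd(classes)   # sqrt(1200 / 10)
```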

1.If the deviation is high, the result is less reliable (the values are widely spread around the mean)


1. There is a group activity:-
Groups of 2 and groups of 3
2.Problem: background, significance
3.Population, sample - size of both
4.Questionnaire (10 questions)
5.Collect data
6.Present data (table, graph)
7.Analysis (measures of central tendency, measures of
dispersion), done manually and with a software tool (Excel)
8.Inference - findings (observations)
9.Conclusion
10.From today to next Friday!

1.Quartile Deviation:-
Quantiles - the values which divide a set of data into equal parts.
Median - divides the distribution into two equal parts
Quartile - four equal parts
Decile - ten equal parts
Percentile - one hundred equal parts
2.First quartile, Q1 (25th percentile): the middle number between the smallest value and the median
of the data set. Second quartile, Q2 (50th percentile): the median of the data, which
separates the lower and upper halves. Third quartile, Q3 (75th percentile): the middle
value between the median and the highest value of the data
3.Interquartile range: $IQR = Q_3 - Q_1$. Semi-interquartile range or
quartile deviation: $QD = \frac{Q_3 - Q_1}{2}$
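A sketch of Q1, Q2, Q3, IQR and QD for ungrouped data. The scores are hypothetical. Textbooks use slightly different quartile conventions; this uses the default rule of statistics.quantiles (positions based on n + 1), which may not be the exact rule used in class.

```python
import statistics

scores = [22, 15, 30, 12, 25, 18, 20]  # hypothetical test scores, unsorted

# quantiles() sorts the data itself and returns the three cut points for n=4
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1       # interquartile range
qd = (q3 - q1) / 2  # quartile deviation (semi-interquartile range)
```

With 7 sorted scores (12, 15, 18, 20, 22, 25, 30) the quartiles fall on the 2nd, 4th and 6th items, so Q2 is also the median.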
4.QD from ungrouped data
Arrange the test scores from highest to lowest

1.Quartile Deviation(I wasn’t there I suppose)

1.Mean Deviation: -
1.It gives us an idea of how spread out from the center the set of values is
2.for ungrouped data:
MD = E |……(incomplete 😊)
1. Demo: a pie chart and a column chart for mean and mean deviation
2. Mean deviation for grouped data
$MD = \frac{\sum f |x - \mu|}{\sum f}$
$\mu$ = mean, x = each value, f = frequency
3.$\mu = \frac{\sum f x}{\sum f}$
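The grouped mean deviation formula, sketched in Python. The (value, frequency) pairs are invented for illustration and the function name is chosen here.

```python
# Each pair is (value x, frequency f); the table is hypothetical.
table = [(5, 1), (15, 2), (25, 4), (35, 2), (45, 1)]

def mean_deviation(table):
    """MD = sum(f * |x - mean|) / sum(f), with mean = sum(f * x) / sum(f)."""
    total_f = sum(f for _, f in table)
    mean = sum(f * x for x, f in table) / total_f
    return sum(f * abs(x - mean) for x, f in table) / total_f

md = mean_deviation(table)  # (20 + 20 + 0 + 20 + 20) / 10
```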

1. Discussed some problem


2.Modules 1 and 2 are for the test

1.Module 1 and 2 for internals


2.Total Recall

1. Linear relationship
2. Correlation: the linear relationship between two random variables
3. Measures the relationship and the future growth of the relationship

Correlation and Regression


$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
Ex:1. The time x in years that an employee spent at a company and the
employee’s hourly pay, y, for 5 employees are listed in the table below.
Calculate and interpret the correlation co-efficient r. Include a plot of the data in
your discussion.
x       y       x^2       y^2        xy
5       25      25        625        125
3       20      9         400        60
4       21      16        441        84
10      35      100       1225       350
15      38      225       1444       570
Σx=37   Σy=139  Σx^2=375  Σy^2=4135  Σxy=1189
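The table can be checked with a short script that applies the direct formula for r to the five (x, y) pairs. The function name pearson_r is chosen here; the plot asked for in the exercise is not included in this sketch.

```python
import math

x = [5, 3, 4, 10, 15]      # years at the company (from the table)
y = [25, 20, 21, 35, 38]   # hourly pay (from the table)

def pearson_r(x, y):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

r = pearson_r(x, y)  # 802 / sqrt(506 * 1354), about 0.97
```

An r of about 0.97 is close to +1, so in this small sample hourly pay rises almost linearly with years at the company.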

$M = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ (slope of the regression line of y on x)
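Here M looks like the slope of the least-squares line of y on x, so it can be computed from the same five data pairs. The intercept formula (mean of y minus slope times mean of x) is not in the notes; it is the standard least-squares result and is added only to make the line usable. The function name is chosen here.

```python
x = [5, 3, 4, 10, 15]      # years at the company (from the table)
y = [25, 20, 21, 35, 38]   # hourly pay (from the table)

def least_squares_line(x, y):
    """Return (slope, intercept) of the line y = slope * x + intercept."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = sy / n - slope * sx / n  # assumption: standard intercept formula
    return slope, intercept

slope, intercept = least_squares_line(x, y)  # slope = 802 / 506
```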
The relationship between the variables under consideration is
measured through correlation analysis
Correlation denotes interdependency
Correlation and causation:
Causation means a cause-and-effect relationship
Range for correlation: $-1 \le r \le +1$
Spurious correlation or zero correlation
Types of Correlation:
1. Positive correlation
2. Negative correlation

Positive correlation:
The correlation is said to be positive if the values of the two variables
change in the same direction
Examples: Pub. Exp. and sales, height and weight

Negative correlation:
The correlation is said to be negative when the values of the variables change
in opposite directions
Examples: price and quantity demanded

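Zero correlation: no linear relationship between the variables (r = 0). Spurious correlation: an apparent correlation between variables that have no causal connection.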


Partial correlation:
Methods for studying correlation
• Scatter diagram method: we plot the points and look at the pattern. If the points
rise along a straight line the correlation is positive; if they fall along a line it is negative
Advantages
Simple, non-mathematical method,
not influenced by the size of extreme items,
first step in investigating the relationship between two variables
• Karl Pearson's Coefficient of Correlation: when deviations are taken
from the actual mean
Formula: $r(x,y) = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
$x = X - \bar{X}$
$y = Y - \bar{Y}$
When deviations are taken from an assumed mean:
$r = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$

Direct method

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

Spearman’s Rank Correlation Method

$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
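Rank Correlation with Repeated Ranks
$\rho = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$
where m is the number of times a rank is repeated (one correction term for each repeated rank)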
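A sketch of Spearman's rank correlation for the simple case with no repeated ranks. The two score lists are hypothetical, rank 1 is given to the largest value (a convention chosen here), and the correction for repeated ranks is not implemented.

```python
def ranks(values):
    """Rank 1 = largest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference of ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

judge_a = [86, 70, 92, 65, 78]  # hypothetical scores from one judge
judge_b = [80, 75, 95, 60, 70]  # hypothetical scores from another judge

rho = spearman_rho(judge_a, judge_b)  # ranks differ in two places, sum(d^2) = 2
```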

Multiple correlation: when there are more than two variables


How do we analyze the correlation of variables?
Scatter diagram and …
Regression analysis: the study of the relationship between two variables, used to
predict future values. It is one of the most commonly used tools for business analysis.
Simple Regression:
Multiple Regression:

For Ex: y= F(x)


y is the dependent variable and x is the independent variable.

Cross-sectional: data gathered from the same time period


Time Series: Involves data observed over equally spaced points in time

Simple Linear Regression


How do we solve a problem that involves two regression lines?

$b_{yx}$ = regression coefficient of y on x


r = -0.43
x=-0.38y + 35 , y=-0.48x + 19.68
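The value r = -0.43 can be reproduced from the two lines if they are read as the regression of x on y (slope -0.38) and of y on x (slope -0.48); that reading is an assumption. The standard result used is that r is the square root of the product of the two regression coefficients, with the same sign as the coefficients. The function name is chosen here.

```python
import math

b_xy = -0.38  # coefficient of y in x = -0.38y + 35
b_yx = -0.48  # coefficient of x in y = -0.48x + 19.68

def r_from_regression_coefficients(b_xy, b_yx):
    """r = sign * sqrt(b_xy * b_yx); both coefficients must share a sign."""
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    sign = -1 if b_yx < 0 else 1
    return sign * math.sqrt(product)

r = r_from_regression_coefficients(b_xy, b_yx)  # -sqrt(0.1824)
```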

Probability:
Revision:
If P(A) = 0.4 and P(B) = 0.3, where A and B are mutually exclusive events,
then P(A U B) = ?
Ans. P(A U B) = P(A) + P(B)
= 0.4 + 0.3
P(A U B) = 0.7
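The same revision problem as a tiny sketch. Fractions are used so that the arithmetic is exact; the function name is chosen here, and the rule only holds when A and B cannot happen together.

```python
from fractions import Fraction

def union_mutually_exclusive(p_a, p_b):
    """P(A U B) = P(A) + P(B), valid only for mutually exclusive events."""
    total = p_a + p_b
    if not 0 <= total <= 1:
        raise ValueError("probabilities of mutually exclusive events cannot sum above 1")
    return total

p_a = Fraction(4, 10)  # P(A) = 0.4
p_b = Fraction(3, 10)  # P(B) = 0.3
p_union = union_mutually_exclusive(p_a, p_b)  # 7/10
```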
Idea of random variable(function) was taught on 9 & 10

2 D random variables:
Binomial Distribution
finite

p + q = 1; e.g. if q = 1/3 then p + 1/3 = 1, so p = 2/3, ..
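A sketch of binomial probabilities using p + q = 1. The probability formula itself is not written in the notes; the standard one, P(X = k) = C(n, k) p^k q^(n-k), is assumed. The number of trials n = 3 is made up, and p = 2/3 comes from the p + 1/3 = 1 example.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * q^(n - k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)

p = Fraction(2, 3)  # from p + 1/3 = 1
n = 3               # hypothetical number of trials (finite)

probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
```

The probabilities for k = 0, 1, ..., n always add up to 1, which is a quick check on any binomial table worked out by hand.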
