UNIVERSITY INSTITUTE OF
ENGINEERING
COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering
Statistical Method Using R-(20 SMT-460)
What is Correlation?
• Correlation (connection) refers to a process for establishing the relationship between two variables.
Correlation
1. Positive correlation:-> An increase in one variable is accompanied by an increase in the other. For example, there is a positive correlation between smoking and alcohol use: as alcohol use increases, so does smoking.
2. Negative correlation:-> As one variable increases, the other decreases, and vice versa. Example: as price rises, demand falls.
3. Linear correlation:-> The ratio of change between the two variables remains constant. Example: marks w.r.t. the topper of the class.
4. Curvilinear correlation:-> The ratio of change between the two variables does not remain constant. Example: student strength in a class.
5. Simple correlation:-> The relationship between two variables only. Example: sunlight and temperature.
6. Partial correlation:-> The relationship between two variables when three or more are under study and the effect of the others is held constant. Example: temperature and yield, with rainfall held constant.
7. Multiple correlation:-> The relationship of one variable with two or more other variables taken together (three or four variables in all).
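The sign of the correlation can be checked numerically. The sketch below computes the Pearson correlation coefficient in plain Python; the function name `pearson_r` and the price, demand and supply figures are hypothetical and serve only to illustrate a negative and a positive correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
price = [10, 12, 14, 16, 18]
demand = [50, 44, 41, 35, 30]   # falls as price rises
supply = [20, 25, 29, 36, 40]   # rises as price rises

r_neg = pearson_r(price, demand)   # negative correlation
r_pos = pearson_r(price, supply)   # positive correlation
print(round(r_neg, 3), round(r_pos, 3))
```

A value close to −1 or +1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.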
What is Regression?
Regression analysis measures the nature and extent of the relationship between two or more variables, which enables us to make predictions. It is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.
Correlation tells us how strongly the variables are related.
Regression tells us how the variables are related.
What is Regression?
The term "regression" literally means "stepping back towards the average". It was first used by the British biometrician Sir Francis Galton (1822-1911) in connection with the inheritance of stature.
• In regression analysis there are two types of variables.
Dependent variable (regressed or explained variable):-> The variable whose value is influenced or is to be predicted. The dependent variable is denoted by "y".
Independent variable (regressor, predictor or explanatory variable):-> The variable which influences the values or is used for prediction. Independent variables are denoted by "x".
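Properties of Regression
• The correlation coefficient is the geometric mean of the two regression coefficients.
• If one of the regression coefficients is greater than unity, the other must be less than unity.
• Regression coefficients are independent of change of origin but not of scale.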
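The standard relationship between the two regression coefficients and the correlation coefficient, r² = byx × bxy, can be verified with exact fractions. This is a minimal sketch: the five data points are hypothetical, and the names `b_yx`, `b_xy` and `r_squared` are chosen only for this illustration.

```python
from fractions import Fraction

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
n = len(xs)

mean_x = Fraction(sum(xs), n)
mean_y = Fraction(sum(ys), n)

# Sums of products and squares of deviations from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b_yx = sxy / sxx                     # regression coefficient of Y on X
b_xy = sxy / syy                     # regression coefficient of X on Y
r_squared = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient

print(b_yx, b_xy, r_squared)
```

Because r² cannot exceed 1, the product of the two coefficients cannot exceed 1 either, which is why both coefficients cannot be greater than unity at the same time.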
The value of the dependent variable is based on the value of the independent variable.
Example: The pollution level at a specific temperature.
Example: Suppose you want to determine the likelihood that a visitor accepts an offer on your website (dependent variable). For the analysis you can look at visitor characteristics such as the sites they came from, the number of visits to your site, and their activity on your site (independent variables). This helps identify which visitors are more likely to accept the offer, so you can make a better decision on whether to promote the offer on your site.
Lines of Regression
If the variables in a bivariate distribution are related, the points in the scatter diagram will cluster round some curve, called the "curve of regression".
Lines of Regression
If the curve is a straight line, it is called the line of regression and there is said to be "linear regression" between the variables; otherwise the regression is said to be curvilinear.
Lines of Regression
The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the line of "best fit" and is obtained by the principle of least squares.
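The idea of "best fit" can be made concrete: the least-squares line is the one whose sum of squared vertical errors is smaller than that of any other line. The sketch below fits Y = a + bX to hypothetical data and compares its error with two slightly shifted lines; `fit_line` and `sse` are names chosen for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x   # the fitted line passes through the means
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared errors of the line Y = a + bX on the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)
print(a, b, best)
# Any other line gives a larger sum of squared errors
print(sse(a + 0.1, b, xs, ys), sse(a, b - 0.05, xs, ys))
```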
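Solution:
Regression of X on Y => X = a + bY
Normal equations:
ΣX = Na + bΣY ………………..1
ΣXY = aΣY + bΣY² ………………..2
Regression of Y on X => Y = a + bX
Normal equations:
ΣY = Na + bΣX ………………..1
ΣXY = aΣX + bΣX² ………………..2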
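byx = (NΣXY − ΣX ΣY) / (NΣX² − (ΣX)²) = (5(88) − 15×25) / 50 = 65/50 = 1.3
bxy = (NΣXY − ΣX ΣY) / (NΣY² − (ΣY)²) = (5(88) − 15×25) / (5(151) − 625) = 65/130 = 0.5
(Y − Ȳ) = byx (X − X̄), with X̄ = 15/5 = 3 and Ȳ = 25/5 = 5
Y = 1.3X + 1.1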
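The arithmetic of this example can be checked from the column sums alone. The sketch below assumes N = 5, ΣX = 15, ΣY = 25, ΣXY = 88 and ΣY² = 151, as read from the working above. ΣX² = 55 is an assumption, inferred from the denominator 50 in the slope 65/50. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

n = 5
sum_x, sum_y = 15, 25
sum_xy = 88
sum_x2 = 55    # assumption: inferred from the denominator 50 = 5*55 - 15**2
sum_y2 = 151

numerator = n * sum_xy - sum_x * sum_y                # 65
b_yx = Fraction(numerator, n * sum_x2 - sum_x ** 2)   # slope of Y on X
b_xy = Fraction(numerator, n * sum_y2 - sum_y ** 2)   # slope of X on Y

mean_x = Fraction(sum_x, n)
mean_y = Fraction(sum_y, n)

# Line of Y on X: (Y - mean_y) = b_yx * (X - mean_x)  =>  Y = b_yx * X + intercept
intercept = mean_y - b_yx * mean_x

print(float(b_yx), float(intercept), float(b_xy))
```

Under these assumptions the slope 1.3 and intercept 1.1 belong to the regression of Y on X, while the denominator 5(151) − 625 = 130 belongs to the regression of X on Y.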
X̄ = ΣX / N = 42/6 = 7
Ȳ = ΣY / N = 30/6 = 5
Probability Distribution
P(E) = 3/6 = ½
P(O) = 3/6 = ½
• X = number of heads
• Sample space = {HH, TH, HT, TT}
• Random variable: number of heads
• P(0 heads) = ¼
• P(1 head) = ½
• P(2 heads) = ¼
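The distribution above can be reproduced by enumerating the sample space. This is a small sketch that assumes two tosses of a fair coin, so that all four outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two coin tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Random variable X = number of heads in each outcome
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X
distribution = {x: Fraction(count, len(outcomes))
                for x, count in sorted(head_counts.items())}

for x, prob in distribution.items():
    print(f"P(X = {x}) = {prob}")
```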
DISCRETE DISTRIBUTIONS:
Discrete distributions have a finite or countable number of different possible outcomes.
• We can add up the probabilities of individual values to find the probability of an interval.
• Discrete distributions can be expressed with a graph, a piece-wise function or a table.
• The graph of a discrete distribution consists of bars lined up one after the other.
Binomial Distribution
The binomial distribution gives the probability of 'x' successes in 'n' trials of an experiment, given a success probability 'p' for each trial.
Step 1:-> Probability of success = p and probability of failure = q
p = 1 − q, q = 1 − p
P(r) = nCr p^r q^(n−r), r = 0, 1, 2, 3, 4, …, n
Step 4:-> Mean, μ = np
Variance, σ² = npq
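The formulas μ = np and σ² = npq can be checked against the binomial probabilities P(r) = nCr p^r q^(n−r). The sketch below uses hypothetical values n = 4 and p = ½; `binomial_pmf` is a helper name chosen for this illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

# Hypothetical parameters, for illustration only
n, p = 4, Fraction(1, 2)
q = 1 - p

pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(r * prob for r, prob in enumerate(pmf))
variance = sum((r - mean) ** 2 * prob for r, prob in enumerate(pmf))

print(pmf)
print(mean, n * p)            # both equal np
print(variance, n * p * q)    # both equal npq
```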
Example: The mean of a binomial distribution is 9 and its standard deviation is 3/2.
Solution:
• Mean, μ = np = 9 …………………………………….. 1
• Standard deviation, σ = √(npq) = 3/2, so npq = 9/4 …………………………………….. 2
• Divide equation 2 by equation 1:
=> (npq = 9/4) / (np = 9) => q = ¼
• Hence p = 1 − q = ¾ and n = 9/p = 12
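The same steps can be carried out with exact fractions. This is a minimal sketch of the calculation for a binomial distribution with mean 9 and standard deviation 3/2; the variable names are illustrative.

```python
from fractions import Fraction

mean = Fraction(9)      # np
sd = Fraction(3, 2)     # sqrt(npq)

variance = sd ** 2      # npq = 9/4
q = variance / mean     # npq / np = q
p = 1 - q
n = mean / p            # np / p = n

print(q, p, n)
```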
The probability of a man hitting a target is ¼. He fires 7 times. What is the probability of hitting the target at least twice?
• Solution:-> p = ¼, q = 1 − ¼ = ¾, n = 7
• P(r; n, p) = nCr p^r q^(n−r)
• P(X ≥ 2) = P(X=2) + P(X=3) + P(X=4) + P(X=5) + P(X=6) + P(X=7)
• = 1 − [P(X=0) + P(X=1)]
• P(X=0) = 7C0 p^0 q^7 = (¾)^7 = 2187/16384
• P(X=1) = 7C1 p^1 q^6 = 7 × ¼ × (¾)^6 = 5103/16384
• P(X ≥ 2) = 1 − 7290/16384 = 4547/8192
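The result 4547/8192 can be confirmed with exact fractions, both through the complement 1 − [P(X=0) + P(X=1)] and by summing P(X=2) to P(X=7) directly. The helper name `binom_prob` is chosen for this sketch.

```python
from fractions import Fraction
from math import comb

n = 7
p = Fraction(1, 4)   # probability of hitting the target
q = 1 - p            # probability of missing

def binom_prob(r):
    """Probability of exactly r hits in n shots."""
    return comb(n, r) * p ** r * q ** (n - r)

p0 = binom_prob(0)   # 2187/16384
p1 = binom_prob(1)   # 5103/16384

# Complement rule: at least two hits = 1 - (no hits or exactly one hit)
at_least_two = 1 - (p0 + p1)

# Direct sum over r = 2..7 as a cross-check
direct = sum(binom_prob(r) for r in range(2, n + 1))

print(at_least_two, float(at_least_two))
```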
Bernoulli Distribution
A Bernoulli variable has only two values: success and failure. If the probability of success is p, then the probability of failure is 1 − p. A pass-or-fail exam can be modeled by a Bernoulli distribution.
A binomial distribution with n = 1 is a Bernoulli distribution.
The 3 conditions for a Bernoulli trial are:
1. Each trial has only two possible outcomes: True/False, Yes/No,
Success/Failure, etc.
2. The trials are independent. They do not influence each other.
3. The probabilities of success and failure do not change. They remain the
same for all trials.
The expected value of a Bernoulli distribution is the probability of success, p: E[X] = p.
Poisson Distribution
P(X = x) = e^(−λ) λ^x / x!
Where
λ = np
x = 0, 1, 2, 3...
Five coins are tossed 3200 times. What is the probability of getting 5 heads two times?
Solution:
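One way to work this out is with the Poisson probability e^(−λ) λ^x / x! and λ = np. The sketch below rests on two assumptions: the coins are fair, so the probability of 5 heads in one toss of five coins is (½)^5 = 1/32, and "two times" means exactly two of the 3200 tosses. The function name `poisson_pmf` is chosen for this illustration.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of exactly x occurrences when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n_trials = 3200            # number of times the five coins are tossed
p_five_heads = 0.5 ** 5    # assumption: fair coins, so p = 1/32

lam = n_trials * p_five_heads   # lambda = np

# Assumption: "two times" is read as exactly two tosses showing 5 heads
prob = poisson_pmf(2, lam)

print(lam, prob)
```

With a mean of 100 such tosses expected, seeing only two is extremely unlikely, so the resulting probability is vanishingly small.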
Normal Distribution
• Normal Distribution, also called the Gaussian Distribution, is the most significant continuous probability distribution.
• The Normal Distribution is defined by the probability density function f(x) of a continuous random variable X. Integrating f(x) over the interval (x, x + dx) gives the probability that X takes a value between x and x + dx.
• f(x) ≥ 0 ∀ x ∈ (−∞, +∞)
• f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
• Where,
• x is the variable
• μ is the mean
• σ is the standard deviation
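The standard normal density, f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)), can be evaluated directly. The sketch below also checks numerically that the total area under the curve is 1; the helper names and the hypothetical values μ = 50 and σ = 10 are chosen for this illustration, and the trapezoidal rule is only one of several ways to approximate the integral.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def trapezoid_area(f, a, b, steps=4000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Hypothetical parameters, for illustration only
mu, sigma = 50.0, 10.0

# Area within 8 standard deviations of the mean is 1 for practical purposes
area = trapezoid_area(lambda x: normal_pdf(x, mu, sigma),
                      mu - 8 * sigma, mu + 8 * sigma)

print(normal_pdf(mu, mu, sigma), area)
```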
•In a normal distribution, the mean, median and mode are equal.(i.e., Mean =
Median= Mode).
•The normal distribution is completely defined by its mean and standard deviation.
•The normal distribution curve has only one peak (i.e., it is unimodal).
•The curve approaches the x-axis as it extends farther away from the mean, but it never touches it.
A normal distribution can have any positive standard deviation. The mean determines the line of symmetry of the graph, whereas the standard deviation shows how far the data are spread out.
If the standard deviation is smaller, the data are closer to each other and the graph becomes narrower.
If the standard deviation is larger, the data are more dispersed and the graph becomes wider.
The standard deviations are used to subdivide the area under the normal curve. Each
subdivided section defines the percentage of data, which falls into the specific region of a graph.
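The percentages for each region can be computed from the error function: for a normal distribution, the share of data within k standard deviations of the mean is erf(k / √2). The sketch below evaluates this standard result for k = 1, 2, 3; `share_within` is a name chosen for this illustration.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```

These are the familiar figures of roughly 68%, 95% and 99.7%.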
Exponential Distribution
The exponential distribution is commonly used to model time: the time between
arrivals, the time until a component fails, the time until a patient dies. We have