1. The sample data below come from a research survey, conducted in various cities, on the amount of
time that children aged 13 to 15 spend with mobile phones:
City          Time with mobiles (hours per week)
Hyderabad 46
Mumbai 50
Pune 46
Bangalore 54
Bhubneshwar 42
Indore 30
Bhopal 42
New Delhi 50
Chandigarh 46
For the above sample, determine the following measures:
a. The mean
b. The standard deviation
c. The mode
d. The 75th percentile
Based on your calculations, comment on the time spent on mobiles.
Answer: a) The arithmetic mean is the sum of all values divided by the number of values and is
denoted by x̄. If a variable can take only a finite (countable) set of distinct values, the data are
said to be discrete. The number of occurrences of each value in the data set is called the frequency
of that value, and a systematic presentation of the values taken by a variable together with their
frequencies is called the frequency distribution of the variable.
$$\bar{x} = \frac{\sum x}{N} = \frac{46 + 50 + 46 + 54 + 42 + 30 + 42 + 50 + 46}{9} = \frac{406}{9} = 45.11$$
Comment: On the basis of the above calculation, children spend on average about 45 hours per week
with their mobiles. Bangalore fares worst in this respect, with children spending the most time
(54 hours per week) on their mobiles, whereas Indore fares best, with children spending only
30 hours per week on their phones.
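To double-check the arithmetic, here is a minimal Python sketch (the choice of Python is an assumption; the assignment itself uses no code). It sums the nine city values, divides by their count, and picks out the cities with the highest and lowest hours; the dictionary simply re-keys the table above.

```python
# Hours per week spent with mobiles, one value per city (from the table above)
hours = {
    "Hyderabad": 46, "Mumbai": 50, "Pune": 46, "Bangalore": 54,
    "Bhubneshwar": 42, "Indore": 30, "Bhopal": 42,
    "New Delhi": 50, "Chandigarh": 46,
}

total = sum(hours.values())          # sum of all values
mean = total / len(hours)            # arithmetic mean = sum / number of values

highest = max(hours, key=hours.get)  # city with the most hours
lowest = min(hours, key=hours.get)   # city with the fewest hours

print(f"Mean = {total}/{len(hours)} = {mean:.2f} hours per week")
print(f"Highest: {highest} ({hours[highest]}), lowest: {lowest} ({hours[lowest]})")
```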
b) The standard deviation of a set of values is the positive square root of the mean of the squared
deviations of the values from their arithmetic mean. It is denoted by σ (sigma).
x (hours)   x²
46          2116
50          2500
46          2116
54          2916
42          1764
30          900
42          1764
50          2500
46          2116
∑x = 406    ∑x² = 18692
$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{18692}{9} - \left(\frac{406}{9}\right)^2} = \sqrt{2076.89 - 2035.01} = \sqrt{41.88} = 6.47$$
Comment: A small standard deviation means that the values in a data set lie, on average, close to
the mean; a large standard deviation means that they lie farther from it. Here the standard
deviation (6.47 hours) is small relative to the mean (45.11 hours), a coefficient of variation
σ/x̄ × 100 of about 14%, so the mean is a fairly representative figure for these cities.
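A short Python sketch of the shortcut formula, cross-checked against the standard library's `statistics` module. `statistics.pstdev` divides by N, as the calculation above does, while `statistics.stdev` divides by N − 1 and gives about 6.86 for the same data; which one to report for sample data is a convention choice. The coefficient of variation line is an assumption about how to quantify "small relative to the mean".

```python
import math
import statistics

hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]
n = len(hours)

sum_x = sum(hours)                   # ∑x
sum_x2 = sum(x * x for x in hours)   # ∑x²

# Shortcut formula: σ = √(∑x²/N − (∑x/N)²)
sigma = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

# Cross-checks with the standard library
pop_sd = statistics.pstdev(hours)    # divides by N, same as the formula above
sample_sd = statistics.stdev(hours)  # divides by N - 1

cv = sigma / (sum_x / n) * 100       # coefficient of variation, in percent

print(f"sigma = {sigma:.2f}, pstdev = {pop_sd:.2f}, stdev = {sample_sd:.2f}, CV = {cv:.1f}%")
```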
c) The mode is the value with the highest frequency and is denoted by Z. The modal value is
especially useful to business people: for example, shoe and readymade garment manufacturers want
to know the modal size of their customers to plan their operations. For discrete data, with or
without a frequency table, the mode is the value corresponding to the highest frequency.
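Time (hours per week)   Frequency
30                      1
42                      2
46                      3
50                      2
54                      1
Mode (Z) = 46 hours per week, the only value that occurs three times (Hyderabad, Pune and
Chandigarh).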
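The frequency count and the mode can be sketched in Python with `collections.Counter` and `statistics.multimode`. `multimode` is used rather than `statistics.mode` so that a tie for the highest frequency would show up as several values instead of being hidden.

```python
import statistics
from collections import Counter

hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]

freq = Counter(hours)                 # frequency of each value
for value, count in sorted(freq.items()):
    print(value, count)

modes = statistics.multimode(hours)   # every value tied for the highest frequency
print("Mode:", modes)
```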
d) 75th percentile
Arranging the data in ascending order:
Rank   Time (hours per week)
1      30
2      42
3      42
4      46
5      46
6      46
7      50
8      50
9      54
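Position of the 75th percentile = 75(N + 1)/100 = 0.75 × 10 = 7.5, i.e. halfway between the 7th
and 8th items.
$$P_{75} = \text{size of 7th item} + 0.5\,(\text{size of 8th item} - \text{size of 7th item}) = 50 + 0.5\,(50 - 50) = 50$$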
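Comment: The 75th percentile is 50 hours per week, the figure reported by both Mumbai and New
Delhi. At least three quarters of the sampled cities report 50 hours per week or less; only
Bangalore (54 hours) lies above this level.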
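A sketch of the (N + 1) positioning rule, which places the 75th percentile of nine sorted items at position 7.5 and interpolates between the 7th and 8th items. The helper name `percentile_n_plus_1` is hypothetical; the cross-check uses `statistics.quantiles` with `method="exclusive"`, which follows the same (N + 1) positions. Other percentile conventions exist and can give slightly different values on small samples.

```python
import statistics

hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]

def percentile_n_plus_1(data, p):
    """p-th percentile at position p(N + 1)/100 of the sorted data,
    interpolating linearly between neighbouring items (hypothetical helper)."""
    s = sorted(data)
    pos = p * (len(s) + 1) / 100   # 75th percentile of 9 items -> 7.5
    k = int(pos)                   # whole part: rank of the lower item
    frac = pos - k                 # fractional part: share of the gap
    if k < 1:
        return s[0]
    if k >= len(s):
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])

p75 = percentile_n_plus_1(hours, 75)

# With method="exclusive", statistics.quantiles uses the same (N + 1) positions;
# the third of the three quartile cut points is the 75th percentile.
q3 = statistics.quantiles(hours, n=4, method="exclusive")[2]

print(p75, q3)
```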
2. ‘Mumbai Ice Cream’, an ice cream store, wants to study the relationship between the number of
ice cream cones sold and the temperature. The store has taken a sample of a week’s data; the
results of the sample are given below:
Day Cones Sold Temperature
1 350 110
2 200 100
3 210 90
4 100 80
5 80 70
6 70 60
7 50 50
a. Which variable is the dependent variable?
b. Compute the least squares estimated line.
c. Is there a significant relationship between cone sales and temperature?
d. Predict sales on a 95-degree day.
Answer: a) According to the question, ‘Mumbai Ice Cream’, an ice cream store in Mumbai, believes
that the sales of ice cream at the shop depend on the temperature. Therefore, cones sold is the
dependent variable, as it depends on the temperature of that day.
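b) Let x = cones sold (the dependent variable) and y = temperature. From the table, N = 7,
∑x = 1060, ∑y = 560, ∑xy = 97700 and ∑y² = 47600, so x̄ = 1060/7 = 151.43 and ȳ = 560/7 = 80.
The regression coefficient of x on y is

$$b_{xy} = \frac{N\sum xy - \sum x \sum y}{N\sum y^2 - \left(\sum y\right)^2} = \frac{683900 - 593600}{333200 - 313600} = \frac{90300}{19600} = 4.61$$

and the least squares line of x on y is

$$x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; x - 151.43 = 4.61\,(y - 80) \;\Rightarrow\; x = 4.61y - 217.37$$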
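The least squares coefficients can be recomputed from the raw table with a short Python sketch, keeping the notation above (x = cones, y = temperature). Without rounding, the slope is about 4.607 and the intercept about −217.14; the −217.37 in the text comes from rounding the slope to 4.61 before computing the intercept, which is why the two differ slightly.

```python
cones = [350, 200, 210, 100, 80, 70, 50]   # x: cones sold (dependent)
temp = [110, 100, 90, 80, 70, 60, 50]      # y: temperature (independent)
n = len(cones)

sum_x, sum_y = sum(cones), sum(temp)
sum_xy = sum(x * y for x, y in zip(cones, temp))
sum_y2 = sum(y * y for y in temp)

# Regression coefficient of x on y
b_xy = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)

# The least squares line x = a + b_xy * y passes through (y-bar, x-bar)
x_bar, y_bar = sum_x / n, sum_y / n
a = x_bar - b_xy * y_bar

print(f"x = {b_xy:.2f}y {a:+.2f}")
```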
c) Is there a significant relationship between the sales of ice cream cones and temperature?
The correlation coefficient measures the strength and direction of the linear relationship between
two variables on a scale from −1 to +1. A negative correlation is one in which, as one variable
increases, the other decreases; it is expressed by values below zero, and a perfect negative
correlation (−1) means that an increase in one variable always goes with a decrease in the other.
A perfect positive correlation (+1) means that an increase or decrease in one variable always goes
with a change in the same direction in the other. Lower degrees of correlation are expressed by
non-zero coefficients between −1 and +1, and zero indicates no linear correlation: the variables
show no tendency to move together, either positively or negatively.
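Here, temperature and sales are the two variables, so we can find the correlation coefficient and
check the relationship between them. With ∑x² = 230400 (x = cones sold), the regression
coefficient of y on x is

$$b_{yx} = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2} = \frac{683900 - 593600}{1612800 - 1123600} = \frac{90300}{489200} = 0.1846$$

The correlation coefficient is the geometric mean of the two regression coefficients; both are
positive, so r is positive:

$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{4.61 \times 0.1846} = \sqrt{0.851} = 0.92$$

Since r ≈ 0.92, there is a strong positive relationship: cone sales rise as the temperature rises.
To test significance, t = r√(N − 2)/√(1 − r²) = 0.92 × √5/√(1 − 0.85) ≈ 5.3 with N − 2 = 5
degrees of freedom; this exceeds the two-tailed 5% critical value of 2.571, so the relationship is
significant.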
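A Python sketch computing both regression coefficients, r, and a t-statistic for the null hypothesis of zero correlation. Keeping b_yx unrounded (0.1846 rather than 0.18) gives r ≈ 0.92. The critical value 2.571 is hardcoded from a standard t table (two-tailed, 5% level, 5 degrees of freedom), an assumption made to avoid depending on a statistics package.

```python
import math

cones = [350, 200, 210, 100, 80, 70, 50]   # x: cones sold
temp = [110, 100, 90, 80, 70, 60, 50]      # y: temperature
n = len(cones)

sum_x, sum_y = sum(cones), sum(temp)
sum_xy = sum(x * y for x, y in zip(cones, temp))
sum_x2 = sum(x * x for x in cones)
sum_y2 = sum(y * y for y in temp)

num = n * sum_xy - sum_x * sum_y
b_xy = num / (n * sum_y2 - sum_y ** 2)   # regression of x on y
b_yx = num / (n * sum_x2 - sum_x ** 2)   # regression of y on x

# Both coefficients are positive, so r takes the positive square root
r = math.sqrt(b_xy * b_yx)

# t-test for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
T_CRIT = 2.571   # assumed: two-tailed 5% critical value for 5 df, from a t table

print(f"r = {r:.2f}, t = {t:.2f}, significant: {t > T_CRIT}")
```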
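d) Substituting y = 95 into the regression line x = 4.61y − 217.37:
x = 4.61 × 95 − 217.37 = 437.95 − 217.37 = 220.58
So about 221 cones would be expected to sell on a 95-degree day.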
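The prediction can be sketched as a tiny Python function; `predict_cones` is a hypothetical name. It evaluates both the rounded line used in the text and the unrounded least squares line (slope 903/196, intercept −6080/28), which differ by only a few hundredths of a cone. Since 95 degrees lies inside the observed range of 50 to 110, the prediction is an interpolation rather than an extrapolation.

```python
# Coefficients of x = a + b * y (x: cones sold, y: temperature)
B_ROUNDED, A_ROUNDED = 4.61, -217.37        # rounded values used in the text
B_EXACT, A_EXACT = 903 / 196, -6080 / 28    # unrounded least squares values

def predict_cones(temperature, b, a):
    """Predicted cones sold on a day with the given temperature (hypothetical helper)."""
    return a + b * temperature

print(round(predict_cones(95, B_ROUNDED, A_ROUNDED), 2))  # rounded line
print(round(predict_cones(95, B_EXACT, A_EXACT), 2))      # unrounded line
```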
The data set contains information about salaries, covering a one-year period. The five variables
in the data table are described below:
Amount: Amount of the salary paid in dollars
Age: Age of the students in years
Marital Status: Marital status of the students
Institute: Type of institute chosen by students
Gender: Gender of the students
The variables are coded with a Continuous, Ordinal or Nominal modeling type. A first step in
any analysis is to ensure that your variables have the correct Modeling Type:
• Continuous variables, like Amount, have numeric values.
• Ordinal variables, such as Age, have either numeric or character values that represent
ordered categories.
• Nominal variables, like Gender, can also have either numeric or character values, and
represent unordered categories or labels.
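$$\bar{X} = \frac{\sum X}{N} = \frac{52.5}{9} = 5.83$$
Median: arranging the nine values in ascending order gives 2, 3, 3.5, 4, 5, 5, 7, 8, 15; the median
is the 5th value, 5.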
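A quick Python check of the mean and median of the nine values listed above; `statistics.median` sorts the data itself and, for an odd count, returns the middle item.

```python
import statistics

x = [2, 3, 3.5, 4, 5, 5, 7, 8, 15]   # the nine values listed above

mean = sum(x) / len(x)               # ∑X / N
median = statistics.median(x)        # middle (5th) of the 9 sorted items

print(f"mean = {mean:.2f}, median = {median}")
```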
Combining two groups that differ in their mean values on one of the variables is unlikely to
represent the true correlation faithfully. Adding an extreme case can also change the amount of
correlation, and dropping such a case can reduce the correlation or, conversely, increase it.
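$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$

$$r = \frac{9 \times 1474.5 - 52.5 \times 248}{\sqrt{9 \times 429.25 - 52.5^2}\,\sqrt{9 \times 6980 - 248^2}} = \frac{13270.50 - 13020}{\sqrt{3863.25 - 2756.25}\,\sqrt{62820 - 61504}}$$

$$r = \frac{250.50}{\sqrt{1107}\,\sqrt{1316}} = \frac{250.50}{33.27 \times 36.28} = \frac{250.50}{1207.0} = 0.21$$

Since r ≈ 0.21, X and Y show only a weak positive correlation.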
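The computational formula for r can be sketched in Python directly from the summary sums used above; the function name `pearson_r` is hypothetical. Working from sums avoids needing the raw X and Y pairs, which are not listed here.

```python
import math

# Summary sums from the calculation above
N = 9
SUM_X, SUM_Y = 52.5, 248
SUM_XY = 1474.5
SUM_X2, SUM_Y2 = 429.25, 6980

def pearson_r(n, sx, sy, sxy, sx2, sy2):
    """Pearson correlation from raw sums (computational formula)."""
    num = n * sxy - sx * sy
    den = math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
    return num / den

r = pearson_r(N, SUM_X, SUM_Y, SUM_XY, SUM_X2, SUM_Y2)
print(f"r = {r:.2f}")
```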