
1.

The sample data from a research survey conducted in various cities on the amount of
time 13-15 year-old children spent with mobiles are as follows:

City           Time with mobiles (hours per week)
Hyderabad      46
Mumbai         50
Pune           46
Bangalore      54
Bhubneshwar    42
Indore         30
Bhopal         42
New Delhi      50
Chandigarh     46
For the above sample, determine the following measures:
a. The mean
b. The standard deviation
c. The mode
d. The 75th percentile
Based on your calculations, comment on the time spent on mobiles.

Answer: a) The arithmetic mean is defined as the sum of all values divided by the number of
values and is represented by x̅. If the number of values is finite, then the data is said to be
discrete data. The number of occurrences of each value of the data set is called the frequency of
that value. A systematic presentation of the values taken by a variable together with the
corresponding frequencies is called a frequency distribution of the variable.

x̅ = (46 + 50 + 46 + 54 + 42 + 30 + 42 + 50 + 46) / 9

= 406/9

= 45.11

Therefore, the mean is 45.11 hours per week.
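
The arithmetic above can be checked with a short Python sketch. The list name `hours` is only an illustrative choice; the values are the nine city figures from the question, in the order listed.

```python
from statistics import mean

# Hours per week from the survey table, in the order the cities are listed.
hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]

total = sum(hours)      # numerator of the mean
x_bar = mean(hours)     # sum of values divided by number of values

print(f"sum = {total}, n = {len(hours)}, mean = {x_bar:.2f}")
```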

Comment: On the basis of the above calculation, children spend on average about 45 hours per
week with their mobiles. Bangalore is the worst in this category, as its children spend the most
time, i.e. 54 hours per week, with their mobiles, whereas Indore is the best, as its children
spend only 30 hours per week with their phones.

b) The standard deviation of a set of values is the positive square root of the mean of the
squared deviations of the values from their arithmetic mean. It is denoted by “σ” (sigma).

x      x^2
46     2116
50     2500
46     2116
54     2916
42     1764
30     900
42     1764
50     2500
46     2116
Total: ∑x = 406, ∑x^2 = 18692

σ = √[(∑x^2/N) - (∑x/N)^2]

= √[(18692/9) - (406/9)^2]

= √(2076.89 - 45.11^2)

= √(2076.89 - 2035.01)

= √41.88

= 6.47
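
As a check on the shortcut formula, the sketch below recomputes the standard deviation in Python and compares it with `statistics.pstdev`, which also divides by N (the population form used here). The variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]

n = len(hours)
sum_x = sum(hours)                     # 406 in the table
sum_x2 = sum(x * x for x in hours)     # 18692 in the table

# Shortcut formula: mean of the squares minus the square of the mean.
variance = sum_x2 / n - (sum_x / n) ** 2
sigma = sqrt(variance)

print(f"variance = {variance:.2f}, sigma = {sigma:.2f}")
print(f"pstdev   = {pstdev(hours):.2f}")
```

If the nine cities were treated as a sample from a larger population, dividing by N - 1 instead (`statistics.stdev`) would give a slightly larger value, about 6.86.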

Comment: A small standard deviation means that the values in a data set are, on average, close
to the mean, and a large standard deviation means that the values are, on average, farther away
from the mean. Here, the standard deviation (6.47) is small compared to the mean (45.11), about
14% of it, which means the city values are fairly consistent and the mean represents the data
reasonably well.

c) The mode is the value which has the highest frequency and is denoted by Z. The modal value
is most useful for business people. For example, shoe and readymade garment manufacturers will
like to know the modal size of the people to plan their operations. For discrete data with or
without frequency, it is the value corresponding to the highest frequency.

Value   Frequency
30      1
46      3
54      1
50      2
42      2

Modal value is 46, which is corresponding to the highest frequency 3.
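
The frequency table and the modal value can be reproduced with `collections.Counter`; this is only a sketch, and the variable names are illustrative.

```python
from collections import Counter

hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]

# Count how many cities report each value.
freq = Counter(hours)
for value in sorted(freq):
    print(value, freq[value])

# most_common(1) returns the (value, frequency) pair with the highest frequency.
modal_value, modal_count = freq.most_common(1)[0]
print("mode =", modal_value, "frequency =", modal_count)
```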

Comment: In our case, 46 is the value with the highest frequency, 3. It means that in three
cities (Hyderabad, Pune and Chandigarh) children spend the same amount of time, 46 hours per
week, with their mobile phones.

d) 75th percentile

Values arranged in ascending order:

Rank   Value
1      30
2      42
3      42
4      46
5      46
6      46
7      50
8      50
9      54

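Rank = 75/100 × (9+1)

= 0.75 × 10
= 7.50

The rank falls halfway between the 7th and 8th items, so the fractional part 0.5 is used for
interpolation:

75th percentile = Size of 7th item + 0.5 × (Size of 8th item - Size of 7th item)

= 50 + 0.5 × (50 - 50)

= 50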
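
The rank-and-interpolate procedure can be sketched as a small Python function. The function name `percentile_n_plus_1` is hypothetical; it implements the p/100 × (n + 1) rank rule used in this answer, which is one of several percentile conventions in use.

```python
def percentile_n_plus_1(values, p):
    """Percentile by the p/100 * (n + 1) rank rule with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p / 100 * (n + 1)      # 1-based position, may be fractional
    lower = int(rank)             # whole part of the rank
    frac = rank - lower           # fractional part used for interpolation
    if lower < 1:                 # rank falls before the first item
        return ordered[0]
    if lower >= n:                # rank falls at or beyond the last item
        return ordered[-1]
    below = ordered[lower - 1]    # item at the whole-number rank
    above = ordered[lower]        # next item up
    return below + frac * (above - below)


hours = [46, 50, 46, 54, 42, 30, 42, 50, 46]
p75 = percentile_n_plus_1(hours, 75)
print("75th percentile =", p75)
```

Because the 7th and 8th ordered values are both 50, the interpolation weight does not change the result for this data set.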

Comment: On the basis of the percentile calculation, we can say that about three quarters of
the sampled cities report 50 hours per week or less. Mumbai and New Delhi, at 50 hours each,
sit exactly at the 75th percentile.

2. ‘Mumbai Ice Cream’, an ice cream store, gives the relationship between ice cream sold
and temperature. The store has taken a sample of a week’s data. Below you are given
the results of the sample.

Day   Cones Sold   Temperature
1     350          110
2     200          100
3     210          90
4     100          80
5     80           70
6     70           60
7     50           50
a. Which variable is the dependent variable?
b. Compute the least squares estimated line.
c. Is there a significant relationship between the sales of cone and temperature?
d. Predict sales of a 95 degree day.

Answer: a) According to the question, ‘Mumbai Ice Cream’, an ice cream store in Mumbai,
believes that the sales of ice cream at the shop depend upon the temperature. Therefore, cones
sold is the dependent variable, as it depends on the temperature of that day.

b) Least squares estimated line

S.no   Cones Sold (X)   Temperature (Y)   XY      X^2      Y^2
1      350              110               38500   122500   12100
2      200              100               20000   40000    10000
3      210              90                18900   44100    8100
4      100              80                8000    10000    6400
5      80               70                5600    6400     4900
6      70               60                4200    4900     3600
7      50               50                2500    2500     2500
N=7    1060             560               97700   230400   47600

Putting the values in the required normal equations, we have:

x̅ = 1060/7 = 151.43

ȳ = 560/7 = 80

bxy = (n∑xy - ∑x∑y) / (n∑y^2 - (∑y)^2)

bxy = [7(97700) - 1060 × 560] / [7(47600) - (560)^2]

= (683900 - 593600) / (333200 - 313600)

= 90300/19600

= 4.61

The regression line of x (cones sold) on y (temperature) is:

x - x̅ = bxy (y - ȳ)

x - 151.43 = 4.61 (y - 80)

x - 151.43 = 4.61y - 368.80

x = 4.61y - 217.37
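
The slope and intercept can be recomputed from the column totals with a few lines of Python. This is a sketch with illustrative variable names; it keeps full precision, so the intercept comes out near -217.14 rather than the -217.37 obtained when the slope is first rounded to 4.61.

```python
# x = cones sold (dependent), y = temperature (independent), as labelled above.
cones = [350, 200, 210, 100, 80, 70, 50]
temps = [110, 100, 90, 80, 70, 60, 50]

n = len(cones)
sum_x = sum(cones)
sum_y = sum(temps)
sum_xy = sum(x * y for x, y in zip(cones, temps))
sum_y2 = sum(y * y for y in temps)

# Regression coefficient of x on y, then the intercept from the two means.
b_xy = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)
intercept = sum_x / n - b_xy * (sum_y / n)

print(f"x = {b_xy:.2f}y + ({intercept:.2f})")
```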

c) Is there a significant relationship between the sales of ice cream cones and temperature?
A negative correlation is a relationship between two variables such that as the value of one
variable increases, the other decreases. Correlation is expressed on a range from +1 to -1, known
as the correlation coefficient. Values below zero express negative correlation. A perfect negative
correlation has a coefficient of -1, indicating that an increase in one variable reliably predicts a
decrease in the other one. A perfect positive correlation, which has a coefficient of +1, indicates
that an increase or decrease in one variable always predicts the same directional change for the
second variable. Lower degrees of correlation are expressed by non-zero coefficients between +1
and -1. Zero indicates a lack of correlation: There is no tendency for the variables to fluctuate in
tandem either positively or negatively.

Here, temperature and sales are two variables so we can find the correlation coefficient and
check the relationship between these variables.

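byx = (n∑xy - ∑x∑y) / (n∑x^2 - (∑x)^2)

byx = [7(97700) - 1060 × 560] / [7(230400) - (1060)^2]

= (683900 - 593600) / (1612800 - 1123600)

= 90300/489200

= 0.1846

Correlation: r = √(bxy × byx)

= √(4.61 × 0.1846)

= √0.8510

r = 0.92

Since r = 0.92 is close to +1, there is a strong positive relationship between temperature and
cones sold: sales rise as the temperature rises.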
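
The sketch below recomputes r directly from the column totals without intermediate rounding, which gives a value near 0.92. It also adds a t-test for the correlation coefficient, which is not part of the original working: the 5% two-tailed significance level is an assumption, and the critical value 2.571 for 5 degrees of freedom is taken from a standard t-table.

```python
from math import sqrt

cones = [350, 200, 210, 100, 80, 70, 50]
temps = [110, 100, 90, 80, 70, 60, 50]

n = len(cones)
sum_x, sum_y = sum(cones), sum(temps)
sum_xy = sum(x * y for x, y in zip(cones, temps))
sum_x2 = sum(x * x for x in cones)
sum_y2 = sum(y * y for y in temps)

# Pearson correlation coefficient from raw sums (no intermediate rounding).
numerator = n * sum_xy - sum_x * sum_y
r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Assumed test: t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)
T_CRITICAL_5PCT_DF5 = 2.571   # two-tailed, 5% level, 5 d.f. (standard table value)

print(f"r = {r:.2f}, t = {t_stat:.2f}")
print("significant at 5%:", abs(t_stat) > T_CRITICAL_5PCT_DF5)
```

Under these assumptions the test statistic exceeds the critical value, so the relationship would be judged significant, although seven observations is a very small sample.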

d) Predict sales of a 95 degree day

We will consider y = 95 to calculate the value of x

x = 4.61y - 217.37

x = 4.61 (95) - 217.37

x = 437.95 - 217.37

x = 220.58

Therefore, predicted sales on a 95 degree day are 220.58, i.e. about 221 cones.
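
The prediction step can be wrapped in a small function. The names `predict_cones` and `is_interpolation` are hypothetical helpers; the coefficients are the rounded ones from the fitted line in part b. The second helper only checks whether a temperature lies inside the observed range (50 to 110), since predictions outside that range are less trustworthy.

```python
# Rounded coefficients of the fitted line x = 4.61y - 217.37 from part b.
SLOPE = 4.61
INTERCEPT = -217.37
OBSERVED_TEMPS = [110, 100, 90, 80, 70, 60, 50]


def predict_cones(temperature):
    """Predicted cones sold for a given temperature, using the fitted line."""
    return SLOPE * temperature + INTERCEPT


def is_interpolation(temperature):
    """True when the temperature lies within the range seen in the sample."""
    return min(OBSERVED_TEMPS) <= temperature <= max(OBSERVED_TEMPS)


predicted = predict_cones(95)
print(f"predicted cones at 95 degrees: {predicted:.2f}")
print("within observed range:", is_interpolation(95))
```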

3. According to a recent study conducted by an academic researcher on international
placement of students from leading institutes in India, there is a high variation in the
salary offered by institutes. The following details have been gathered from the placement
institute of the colleges. The researcher wants to understand the trends with regard to
international placement based on the data he has gathered.

Amount (USD, in lakhs per annum)   Age   Marital Status   Type of institute   Gender
2                                  35    Single           University          Male
5                                  24    Married          PGDM                Male
3.5                                29    Married          University          Female
5                                  26    Single           University          Male
4                                  26    Married          PGDM                Female
8                                  25    Single           PGDM                Female
15                                 34    Married          PGDM                Male
3                                  26    Single           PGDM                Male
7                                  23    Single           PGDM                Male

a. Using descriptive statistics explore salary, and identify factors that appear to influence
the amount of the salary received.
b. Do a correlation analysis between ‘Amount’ and ‘Age’ and interpret the coefficient of
correlation.

Answer: a) Descriptive statistics is used to present the general description of data which is
summarised quantitatively. This is mostly useful in clinical research, when communicating the
results of experiments.

The data set contains information about the salary, covering a year’s period. The five variables
in the data table are described below:
• Amount: Amount of the salary, in USD lakhs per annum
• Age: Age of the students in years
• Marital Status: Marital status of the students
• Institute: Type of institute chosen by students
• Gender: Students’ gender

The variables are coded with a Continuous, Ordinal or Nominal modeling type. A first step in
any analysis is to ensure that your variables have the correct Modeling Type:
• Continuous variables, like Amount, have numeric values.
• Ordinal variables, such as age, have either numeric or character values which represent
ordered categories.
• Nominal variables, like Gender, can also have either numeric or character values, and
represent unordered categories or labels.

Amount of salary = 2, 5, 3.5, 5, 4, 8, 15, 3, 7

Mean = ∑X/N = (2+5+3.5+5+4+8+15+3+7)/9

= 52.5/9

= 5.83

Values in ascending order = 2, 3, 3.5, 4, 5, 5, 7, 8, 15

Median = value of the (9+1)/2 = 10/2 = 5th term = 5
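
To look at factors that might influence salary, one simple approach is to compare the mean amount within each category. The sketch below does this for the nine records in the question; the helper name `group_means` and the tuple layout are illustrative assumptions, not part of the original working.

```python
from collections import defaultdict
from statistics import mean, median

# (amount, age, marital status, type of institute, gender) from the question.
records = [
    (2.0, 35, "Single", "University", "Male"),
    (5.0, 24, "Married", "PGDM", "Male"),
    (3.5, 29, "Married", "University", "Female"),
    (5.0, 26, "Single", "University", "Male"),
    (4.0, 26, "Married", "PGDM", "Female"),
    (8.0, 25, "Single", "PGDM", "Female"),
    (15.0, 34, "Married", "PGDM", "Male"),
    (3.0, 26, "Single", "PGDM", "Male"),
    (7.0, 23, "Single", "PGDM", "Male"),
]

amounts = [rec[0] for rec in records]
overall_mean = mean(amounts)
overall_median = median(amounts)


def group_means(column_index):
    """Mean amount for each distinct value found in the given column."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[column_index]].append(rec[0])
    return {label: mean(values) for label, values in groups.items()}


by_marital = group_means(2)
by_institute = group_means(3)
by_gender = group_means(4)

print(f"mean = {overall_mean:.2f}, median = {overall_median}")
for name, table in [("marital", by_marital), ("institute", by_institute), ("gender", by_gender)]:
    print(name, {label: round(value, 2) for label, value in table.items()})
```

In this sample the gap is largest for type of institute (PGDM averages 7.0 against 3.5 for University). With only nine records and one very high value of 15, such differences are suggestive rather than conclusive.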

b) The following are the properties of the correlation coefficient:

• Its value always lies between -1 and 1
• It is not affected by change of origin or change of scale
• It is a relative measure. It does not have any unit attached to it

Factors influencing the size of the correlation coefficient

The size of ‘r’ is very much dependent upon the variability of measured values in the correlation
sample. The greater the variability, the higher will be the correlation, everything else being
equal. The size of ‘r’ is altered when researchers select extreme groups of subjects in order to
compare these groups with respect to certain behaviors. Selecting extreme groups on one variable
increases the size of ‘r’ over what would be obtained with more random sampling.

Combining two groups which differ in their mean values on one of the variables is not likely to
faithfully represent the true situation as far as the correlation is concerned. Addition of an
extreme case can lead to changes in the amount of correlation. Dropping of such a case leads to
reduction in the correlation while the converse is also true.

Amount (in USD) - X   Age - Y   X^2      Y^2    XY
2                     35        4        1225   70
5                     24        25       576    120
3.5                   29        12.25    841    101.5
5                     26        25       676    130
4                     26        16       676    104
8                     25        64       625    200
15                    34        225      1156   510
3                     26        9        676    78
7                     23        49       529    161
Total: 52.5           248       429.25   6980   1474.5

r = (N∑XY - ∑X∑Y) / (√[N∑X^2 - (∑X)^2] × √[N∑Y^2 - (∑Y)^2])

r = (9 × 1474.5 - 52.5 × 248) / (√[9 × 429.25 - (52.5)^2] × √[9 × 6980 - (248)^2])

r = (13270.50 - 13020) / (√(3863.25 - 2756.25) × √(62820 - 61504))

r = 250.50 / (√1107 × √1316)

r = 250.50 / (33.27 × 36.28)

r = 250.50/1206.99

r = 0.21

Correlation coefficient is 0.21
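
The same coefficient can be recomputed from the raw Amount and Age columns; the following is a minimal sketch with illustrative variable names, using the raw-sums form of the formula shown above.

```python
from math import sqrt

amount = [2, 5, 3.5, 5, 4, 8, 15, 3, 7]        # X, USD lakhs per annum
age = [35, 24, 29, 26, 26, 25, 34, 26, 23]     # Y, years

n = len(amount)
sum_x, sum_y = sum(amount), sum(age)
sum_xy = sum(x * y for x, y in zip(amount, age))
sum_x2 = sum(x * x for x in amount)
sum_y2 = sum(y * y for y in age)

# Pearson's r from raw sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(f"numerator = {numerator}, denominator = {denominator:.2f}, r = {r:.2f}")
```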

Interpretation: The correlation coefficient is 0.21, so the correlation is positive. A positive
correlation means that if one variable gets bigger, the other variable tends to get bigger.
As the coefficient is quite low, we can say that age does not affect the salary offered much
in this sample.
