
Module 4.

Data Management

INTRODUCTION

"Whatever exists at all exists in some amount… and whatever exists in some amount can be measured."
- Edward L. Thorndike (1874-1949)
                
Welcome to this adventure!

LEARNING COMPASS

At the end of the unit, the students will be able to:

1. use a variety of statistical tools to process and manage numerical data;
2. use the methods of linear regression and correlation to predict the value of a variable given certain conditions; and
3. advocate the use of statistical data in making important decisions.

Let’s Begin

Watch the video clip at https://www.youtube.com/watch?v=Y8YRCXU_Kps and then write at least ten highlights based on the video presented.

1. Statistics is the study of the collection, analysis, interpretation, presentation, and organization of
data. 
2. Data is a set of values of quantitative or qualitative variables.
3. One way of organizing discrete data into a more useful format is a frequency table.
4. A frequency table organizes the data by counting the number of occurrences (the frequency) of
each possible outcome.
5. Often, the ‘probability’ column will be labeled relative frequency.
6. If we prefer, we can turn a frequency table or a relative frequency table into a bar graph.
7. If the data from a frequency or relative frequency table is continuous rather than discrete, we
construct the ‘bars’ with no space between them and call the resulting graph a histogram instead of a
bar graph.
8. If you are counting things, the counts are discrete. But if you are measuring a person's height, the values blend into one another, so the data are continuous.
9. A stem-and-leaf plot is another visual way to display data. In constructing a stem-and-leaf display,
we view each number as having two parts. The left digit is considered the stem and the right digit is
the leaf.
10. We can compare these data by placing the two displays side by side. Some people call this a back-to-back stem-and-leaf plot.

We’re on Our Way


Examining and analyzing the highlights written.
Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. Data is a set of values of quantitative or qualitative variables. A value is discrete if it is counted and continuous if it is measured rather than counted (e.g., a person's height). Five methods of organizing and displaying data are discussed:
- Frequency table: organizes data by counting the number of times each possible outcome occurs.
- Relative frequency table: shows the proportion of observations that fall in each outcome or category.
- Bar graph: represents the values by the height or length of rectangles of equal width.
- Histogram: used when the data are continuous; the bars are drawn with no space between them.
- Stem-and-leaf plot (display): each number is viewed as having two parts; the left digit is the stem and the right digit is the leaf.
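As a minimal sketch of two of these displays, the Python below builds a frequency table (with relative frequencies) and a stem-and-leaf display. The data are borrowed from problems that appear later in this module (the room counts from the Meralco problem and the sample from the quartile problem); the function names `frequency_table` and `stem_and_leaf` are illustrative only and do not come from the module.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (outcome, frequency, relative frequency), sorted by outcome."""
    counts = Counter(values)
    n = len(values)
    return [(outcome, freq, freq / n) for outcome, freq in sorted(counts.items())]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot

# Discrete data: number of rooms in ten dwellings (from the regression problem).
rooms = [6, 10, 8, 7, 11, 5, 4, 3, 3, 6]
# Two-digit sample used for the stem-and-leaf display (from the quartile problem).
sample = [13, 19, 18, 20, 16, 9, 10, 7, 8]

print("rooms  freq  rel.freq")
for outcome, freq, rel in frequency_table(rooms):
    print(f"{outcome:>5} {freq:>5} {rel:>9.2f}")

print("stem | leaves")
for stem, leaves in stem_and_leaf(sample).items():
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")
```

The relative frequencies always sum to 1, which is why that column is sometimes labeled "probability".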

Let’s Dig Deeper

Interactive lecture discussion on "Data Management" using these links:

https://www.regent.edu/app/uploads/2019/01/ML-Math-201-The-Normal-Probability-Distribution.pdf
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/bs704_multivariable5.html
https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/11-correlation-and-regression

How far have we gone?

After the thorough discussion, please solve the following questions: 

1. The following data give the weight (in pounds) lost by 9 new members of a health club at the end of their first 3 months of membership. Complete the table and compute the variance and standard deviation.

   x      x²
  13     169
  14     196
  17     289
  20     400
  25     625
  18     324
  16     256
  15     225
   5      25
∑x = 143   ∑x² = 2509

Mean = ∑x / n = 143 / 9 = 15.89

Variance = s² = [∑x² - (∑x)² / n] / (n - 1) = (2509 - 2272.11) / 8 = 29.61

Standard Deviation = s = √29.61 = 5.44
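The arithmetic for problem 1 can be checked with a short Python sketch. It uses the sample (n - 1) form of the variance, as the worked solution does; the variable names are illustrative.

```python
import statistics

# Weight lost (in pounds) by the 9 new members.
weights = [13, 14, 17, 20, 25, 18, 16, 15, 5]

n = len(weights)
sum_x = sum(weights)                    # sum of x
sum_x2 = sum(x * x for x in weights)    # sum of x squared

mean = sum_x / n
# Computational formula for the sample variance.
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
std_dev = variance ** 0.5

# Cross-check against the standard library's sample variance.
assert abs(variance - statistics.variance(weights)) < 1e-9

print(f"sum x = {sum_x}, sum x^2 = {sum_x2}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```

Note that the mean divides by n = 9, while the sample variance divides by n - 1 = 8.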

2. Consider the sample 13, 19, 18, 20, 16, 9, 10, 7, and 8. Find the first quartile, second quartile, and third quartile.
Arranged in order: 7, 8, 9, 10, 13, 16, 18, 19, 20
First Quartile Q1 = (8 + 9) / 2 = 8.5
Second Quartile Q2 = 13
Third Quartile Q3 = (18 + 19) / 2 = 18.5
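One way to check the quartiles in problem 2 is Python's `statistics.quantiles`. Its default "exclusive" method agrees with the answers above for this sample; other quartile conventions can give slightly different values, so this is a sketch under that assumption rather than the only valid method.

```python
import statistics

sample = [13, 19, 18, 20, 16, 9, 10, 7, 8]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(sample, n=4)

print(sorted(sample))
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```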

3. The life expectancy of a particular car battery is 24 months with a standard deviation of 2 months. What is the z-score if a particular car battery lasted for only 20 months?
- z = (20 - 24) / 2 = -4 / 2 = -2
- P(z < -2) = 1 - P(z < 2) = 1 - 0.9772 = 0.0228 or 2.28%
4. Suppose a score on an aptitude test is normally distributed about a mean of 60 with a standard deviation of 18. What is the z-score of a test result of 48?
- z = (48 - 60) / 18 = -12 / 18 = -0.67
- P(z < -0.67) = 1 - P(z < 0.67) = 1 - 0.7486 = 0.2514 or 25.14%
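The z-scores in problems 3 and 4 follow z = (x - mean) / standard deviation. A minimal Python sketch, using the standard library's `NormalDist` in place of a printed z-table (the helper name `z_score` is illustrative):

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

std_normal = NormalDist()  # mean 0, standard deviation 1

# Problem 3: battery life, mean 24 months, sd 2 months, observed 20 months.
z_battery = z_score(20, 24, 2)
p_battery = std_normal.cdf(z_battery)

# Problem 4: aptitude test, mean 60, sd 18, observed 48.
# The z-score is rounded to two decimals, as it would be for a table lookup.
z_test = round(z_score(48, 60, 18), 2)
p_test = std_normal.cdf(z_test)

print(f"battery: z = {z_battery:.2f}, P(Z < z) = {p_battery:.4f}")
print(f"test:    z = {z_test:.2f}, P(Z < z) = {p_test:.4f}")
```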
5. Find the area under the standard normal distribution curve.

1. Between z = 0 and z = 1.33: P(0 < z < 1.33) = 0.9082 - 0.5 = 0.4082 or 40.82%
2. To the right of z = 1.37: P(z > 1.37) = 1 - 0.9147 = 0.0853 or 8.53%
3. Between z = 0.46 and z = 2.56: P(0.46 < z < 2.56) = 0.9948 - 0.6772 = 0.3176 or 31.76%
4. Between z = 0 and z = -2.03: P(-2.03 < z < 0) = 0.5 - 0.0212 = 0.4788 or 47.88%
5. To the left of z = 2.25: P(z < 2.25) = 0.9878 or 98.78%
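The five areas in problem 5 can be sketched in Python with the standard normal cumulative distribution function. Because a printed z-table rounds each entry to four decimals, a computed area can differ from the table-based answer in the fourth decimal place.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # P(Z < z) for the standard normal distribution

areas = {
    "between z = 0 and z = 1.33": phi(1.33) - phi(0),
    "right of z = 1.37": 1 - phi(1.37),
    "between z = 0.46 and z = 2.56": phi(2.56) - phi(0.46),
    "between z = -2.03 and z = 0": phi(0) - phi(-2.03),
    "left of z = 2.25": phi(2.25),
}

for label, area in areas.items():
    print(f"{label}: {area:.4f}")
```

An area to the right of z is 1 minus the area to the left, and an area between two z-values is the difference of the two left-hand areas.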

Walk the Extra Mile

Assessing my understanding.
1. SJS Company has been selling to retail customers in the Metro Manila area. They advertise extensively on radio, in print ads, and on the internet. The owner would like to review the relationship between the amount spent on advertising (in Php0s) and sales revenue (in Php0s). Below is the information on advertising expense and sales revenue for the last 9 months.

Month                 Jan  Feb  Mar  Apr  May  June  July  Aug  Sept
Advertising Expense    10    8   12   11   13    15    14   13    16
Sales Revenue         190  215  190  210  235   208   170  175   250
Step 1: State the hypotheses.
H0: ρ = 0
H1: ρ ≠ 0
Step 2: The level of significance and critical region. α = 0.05; with df = n - 2 = 9 - 2 = 7, t critical = ±2.365

Step 3: Complete the table below, then compute the values of r and t.

Month    x     y     x²      y²      xy
Jan     10   190    100   36100    1900
Feb      8   215     64   46225    1720
Mar     12   190    144   36100    2280
Apr     11   210    121   44100    2310
May     13   235    169   55225    3055
Jun     15   208    225   43264    3120
Jul     14   170    196   28900    2380
Aug     13   175    169   30625    2275
Sept    16   250    256   62500    4000

Result Details & Calculation:

X Values
∑X = 112
Mean (Mx) = 12.444
∑(X - Mx)² = 50.222

Y Values
∑Y = 1843
Mean (My) = 204.778
∑(Y - My)² = 5633.556

X and Y Combined
N = 9
∑(X - Mx)(Y - My) = 104.889

R Calculation
r = ∑[(X - Mx)(Y - My)] / √[∑(X - Mx)² × ∑(Y - My)²]
r = 104.889 / √[(50.222)(5633.556)] = 0.1972

T Calculation
t = (r - ρ) / √[(1 - r²) / (n - 2)]
t = (0.1972 - 0) / √[(1 - 0.1972²) / (9 - 2)] = 0.1972 / 0.3705 = 0.53

Step 5: Decision rule: Do not reject the null hypothesis, since the test value does not fall in the critical region (the absolute value of the t-value is less than the critical value).

Step 6: Conclusion: There is no significant relationship between the amount spent on advertising expense and
sales revenue.
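The correlation coefficient and its test statistic for the advertising data can be sketched in Python as follows. The deviation sums mirror the "Result Details" above, and the test statistic uses t = r√[(n - 2) / (1 - r²)] with ρ = 0 under the null hypothesis; the function name `pearson_r` is illustrative.

```python
import math

advertising = [10, 8, 12, 11, 13, 15, 14, 13, 16]
sales = [190, 215, 190, 210, 235, 208, 170, 175, 250]

def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sp_xy / math.sqrt(ss_x * ss_y)

n = len(advertising)
r = pearson_r(advertising, sales)

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, t = {t:.2f}, df = {n - 2}")
```

The denominator uses 1 - r² (r squared), not 1 - r; the resulting |t| is then compared with the critical value from Step 2.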
2. A rate analyst for Meralco was asked to determine if there is a linear relationship between electrical consumption and the number of rooms in a single-family dwelling. Since electricity consumption varies from month to month, he decided to study usage during the month of March. He collected the following data.

No. of Rooms (x)       6   10    8    7   11    5    4    3    3    6
Kilowatt-hours (y)   3.5   14    7    4   12    3    2    1  1.5    6

Determine the regression equation. Solve for the standard error of estimate and the coefficient of determination.
Step 1: Complete the table.

  x     y     x²      y²      xy   (y - ȳ)²      ŷ   (y - ŷ)²
  6   3.5     36   12.25      21      3.61    4.94      2.07
 10    14    100     196     140     73.96   11.02      8.88
  8     7     64      49      56      2.56    7.98      0.96
  7     4     49      16      28      1.96    6.46      6.05
 11    12    121     144     132     43.56   12.54      0.29
  5     3     25       9      15      5.76    3.42      0.18
  4     2     16       4       8     11.56    1.90      0.01
  3     1      9       1       3     19.36    0.38      0.38
  3   1.5      9    2.25     4.5     15.21    0.38      1.25
  6     6     36      36      36      0.36    4.94      1.12

Step 2: Solve the following.

Σx = __63__   Σy = __54__   x̄ = __6.3__

Σx² = __465__   Σy² = __469.5__   ȳ = __5.4__

Σ(y - ȳ)² = __177.9__   Σ(y - ŷ)² = __21.19__

Slope (b1): __1.52__   Intercept (b0): __-4.18__

Simple Linear Regression Equation: ŷ = -4.18 + 1.52x

You may copy the formulas below in writing your regression equation.

- Simple linear regression line: ŷ = b0 + b1x
- Regression coefficient (slope): b1 = Σ[(xi - x̄)(yi - ȳ)] / Σ(xi - x̄)²
- Regression intercept: b0 = ȳ - b1x̄
- Regression coefficient (alternative form): b1 = r(sy / sx)
- Standard error of regression slope: sb1 = sqrt[Σ(yi - ŷi)² / (n - 2)] / sqrt[Σ(xi - x̄)²]
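The problem also asks for the standard error of estimate and the coefficient of determination. A minimal Python sketch of the whole calculation is given below. It assumes the usual definitions, standard error of estimate = √[Σ(y - ŷ)² / (n - 2)] and coefficient of determination = 1 - Σ(y - ŷ)² / Σ(y - ȳ)², which are not spelled out in the module. Because the code keeps the slope unrounded, its intercept differs slightly from the -4.18 obtained with the slope rounded to 1.52.

```python
import math

rooms = [6, 10, 8, 7, 11, 5, 4, 3, 3, 6]      # x: number of rooms
kwh = [3.5, 14, 7, 4, 12, 3, 2, 1, 1.5, 6]    # y: kilowatt-hours

n = len(rooms)
mean_x = sum(rooms) / n
mean_y = sum(kwh) / n

ss_x = sum((x - mean_x) ** 2 for x in rooms)
ss_y = sum((y - mean_y) ** 2 for y in kwh)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rooms, kwh))

b1 = sp_xy / ss_x             # slope
b0 = mean_y - b1 * mean_x     # intercept

# Sum of squared residuals about the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(rooms, kwh))

std_error_estimate = math.sqrt(sse / (n - 2))
r_squared = 1 - sse / ss_y

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print(f"standard error of estimate = {std_error_estimate:.2f}")
print(f"coefficient of determination = {r_squared:.4f}")
```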
EXTENSION OF LEARNING  

Extension of Learning 1: Research at least one problem on Correlation.

- Researchers interested in determining whether there is a relationship between death anxiety and religiosity conducted the following study. Subjects completed a death anxiety scale (high score = high anxiety) and a checklist designed to measure an individual's degree of religiosity (belief in a particular religion, regular attendance at religious services, number of times per week they regularly pray, etc.; high score = greater religiosity). A data sample is provided below:

Death Anxiety   Religiosity
     38              4
     42              3
     29             11
     31              5
     28              9
     15              6
     24             14
     17              9
     19             10
     11             15
      8             19
     19             17
      3             10
     14             14
      6             18

What is your computed correlation coefficient?
What does this statistic mean concerning the relationship between death anxiety and religiosity?
What percent of the variability is accounted for by the relation of these two variables?
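One way to approach the computation is sketched below in Python, using the raw-score form r = [nΣxy - ΣxΣy] / √{[nΣx² - (Σx)²][nΣy² - (Σy)²]}. The percent of variability accounted for is taken here as r² × 100, a standard interpretation that the exercise does not state explicitly.

```python
import math

anxiety = [38, 42, 29, 31, 28, 15, 24, 17, 19, 11, 8, 19, 3, 14, 6]
religiosity = [4, 3, 11, 5, 9, 6, 14, 9, 10, 15, 19, 17, 10, 14, 18]

n = len(anxiety)
sum_x, sum_y = sum(anxiety), sum(religiosity)
sum_xy = sum(x * y for x, y in zip(anxiety, religiosity))
sum_x2 = sum(x * x for x in anxiety)
sum_y2 = sum(y * y for y in religiosity)

# Raw-score (computational) formula for the Pearson correlation.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"variability accounted for = {r_squared * 100:.1f}%")
```

A negative r indicates that higher scores on one scale tend to go with lower scores on the other.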

Extension of Learning 2: Research at least one problem on Regression.    

- It is hypothesized that fluctuations in norepinephrine (NE) levels accompany fluctuations in affect in bipolar affective disorder (manic-depressive illness): during depressive states, NE levels drop; during manic states, NE levels increase. To test this relationship, researchers measured the level of NE by measuring the metabolite 3-methoxy-4-hydroxyphenylglycol (MHPG, in micrograms per 24 hours) in the urine of patients experiencing varying levels of mania/depression. Increased levels of MHPG are correlated with increased metabolism (thus higher levels) of central nervous system NE. Levels of mania/depression were also recorded on a scale, with a low score indicating increased mania and a high score indicating increased depression. The data are provided below.

MHPG   Affect
 980     22
1209     26
1403      8
1950     10
1814      5
1280     19
1073     26
1066     12
 880     23
 776     28

Compute the correlation coefficient.
What does this statistic mean concerning the relationship between MHPG levels and affect?
What percent of the variability is accounted for by the relationship between the two variables?
What would be the slope and y-intercept for a regression line based on this data?
What would be the predicted affect score if the individual had an MHPG level of 1100? Of 950? Of 700?
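The questions above can be worked through with a short Python sketch that fits a least-squares line with MHPG as the predictor (x) and affect score as the response (y). The helper name `fit_line` is illustrative. Note that 700 lies below the smallest observed MHPG level (776), so that prediction is an extrapolation and should be read with caution.

```python
import math

mhpg = [980, 1209, 1403, 1950, 1814, 1280, 1073, 1066, 880, 776]
affect = [22, 26, 8, 10, 5, 19, 26, 12, 23, 28]

def fit_line(x, y):
    """Return (slope, intercept, r) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    r = sp_xy / math.sqrt(ss_x * ss_y)
    return slope, intercept, r

slope, intercept, r = fit_line(mhpg, affect)

# Predicted affect scores for the MHPG levels named in the exercise.
predictions = {level: intercept + slope * level for level in (1100, 950, 700)}

print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
print(f"slope = {slope:.5f}, intercept = {intercept:.2f}")
for level, predicted in predictions.items():
    print(f"MHPG {level}: predicted affect = {predicted:.1f}")
```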

REFERENCES

Offline Source: Sirug, Winston S. Mathematics in the Modern World: A CHED General Education Curriculum
Compliant, Mindshapers Company, Incorporated; 2018 

Online Sources:
(n.d.). Retrieved October 03, 2020, from https://www.youtube.com/watch?v=Y8YRCXU_Kps

(n.d.). Retrieved October 03, 2020, from https://www.regent.edu/app/uploads/2019/01/ML-Math-201-The-Normal-Probability-Distribution.pdf

(n.d.). Retrieved October 03, 2020, from https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_multivariable/bs704_multivariable5.html

(n.d.). Retrieved October 03, 2020, from https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/11-correlation-and-regression
