
DATA MANAGEMENT

SCIENCE AND MATHEMATICS DEPARTMENT


COLLEGE OF EDUCATION, ARTS AND SCIENCES
LEARNING OUTCOMES:
- Use a variety of statistical tools to process and manage numerical
data.
- Use methods of linear regression and correlation to predict the
value of a variable given certain conditions.
- Advocate the use of statistical data in making important decisions.
• Data Management is the application of methods for organizing and
analyzing large amounts of information, solving problems
involving probability and statistics, and carrying out a culminating
investigation that integrates statistical concepts and skills.

• Statistics is a branch of Mathematics that examines and
investigates ways to process and analyze the data
gathered. It provides procedures for data collection,
presentation, organization, and interpretation so that the data
yield meaningful information that is useful to decision-makers.
FREQUENCY DISTRIBUTION

A frequency distribution is a grouping of data into categories that shows
the number of observations in each of the non-overlapping classes.
• Class interval - the set of numbers from the lower limit to the
upper limit of a class; its width is the class size.

• Class boundary - consists of the lower class boundary and the
upper class boundary. A class boundary is the number halfway
between the upper limit of one class and the lower limit of the
next class interval.

• Class mark - the middle value of each class interval, that is, the
average of its lower and upper limits.
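As a worked illustration of these definitions for whole-number data, take the class 67-73 with class size 7 (the first class of the test-score example in this section). Consecutive whole-number classes are one unit apart, so the lower boundary of the first class is taken half a unit below its lower limit:

```latex
\begin{aligned}
\text{upper class boundary of } 67\text{-}73 &= \frac{73 + 74}{2} = 73.5\\
\text{lower class boundary of } 67\text{-}73 &= 67 - 0.5 = 66.5\\
\text{class mark of } 67\text{-}73 &= \frac{67 + 73}{2} = 70\\
\text{class size} &= 73.5 - 66.5 = 7
\end{aligned}
```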


Steps in creating a frequency distribution in Excel:
1. Put the data in a column.
2. Create a table with the columns: class limits, class boundaries,
lower limit, upper limit, frequency, and class mark.
3. Group the data according to the given class size.
4. To calculate the frequency, highlight the frequency column, type
"=FREQUENCY(data_array, bins_array)", and press CTRL+SHIFT+ENTER.
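Excel's FREQUENCY treats each value in bins_array as the upper limit of a class: a value is counted in the first class whose upper limit it does not exceed, and values above the last upper limit go into one extra overflow count, so the result has one more element than bins_array. The Python sketch below imitates that counting rule for illustration only; the function name `frequency` and the sample numbers are hypothetical, and it assumes the upper limits are sorted in ascending order.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per class, imitating Excel's FREQUENCY(data_array, bins_array).

    Assumes `bins` holds the upper limit of each class in ascending order.
    Returns len(bins) + 1 counts; the last count holds values above bins[-1].
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left returns the index of the first upper limit >= value,
        # so a value equal to an upper limit stays in that class.
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical data and upper limits, for illustration only.
print(frequency([3, 5, 5, 8, 12], [5, 10]))  # [3, 1, 1]
```

When only as many cells as there are bins are highlighted before pressing CTRL+SHIFT+ENTER, Excel displays just those counts and the overflow count is not shown, which is harmless when the last upper limit is at least the largest data value.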
EXAMPLE
The latest scores of the students of National University who took a
Mathematics test are listed below:
92     78     73     89     98
89     83     75     83     100
69     71     96     67     81
73     88     86     82     94

Construct a frequency distribution with a class size of 7.

1. Put the data in a column.
2. Create a table with the columns: class limits, class boundaries,
lower limit, upper limit, frequency, and class mark.
3. Highlight the frequency column.
4. Type "=FREQUENCY(" (the syntax is =FREQUENCY(data_array, bins_array)).
5. Select the range A2:A21, then type ",".
6. Select the range D2:D6, then press CTRL+SHIFT+ENTER. If it is entered
correctly, you will see the formula wrapped in curly braces { }.
7. For the class mark, type "=AVERAGE(C2,D2)" in the class mark column and
press ENTER.
8. You may drag the fill handle (the + cursor) down to find the other class marks.
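The worksheet result for this example can be cross-checked outside Excel. The Python sketch below assumes five classes (matching the five upper limits in D2:D6), a first lower limit equal to the lowest score (67), and, because the scores are whole numbers, class boundaries half a unit beyond each limit. These assumptions reproduce the class marks and class boundaries shown on the graphs of this example.

```python
# Scores from the example (20 students).
scores = [92, 78, 73, 89, 98,
          89, 83, 75, 83, 100,
          69, 71, 96, 67, 81,
          73, 88, 86, 82, 94]

CLASS_SIZE = 7
NUM_CLASSES = 5          # assumption: five classes, as in the worksheet (D2:D6)
start = min(scores)      # assumption: the first lower limit is the lowest score, 67

rows = []
for i in range(NUM_CLASSES):
    lower = start + i * CLASS_SIZE
    upper = lower + CLASS_SIZE - 1          # e.g. 67-73 holds 7 whole numbers
    freq = sum(1 for s in scores if lower <= s <= upper)
    class_mark = (lower + upper) / 2        # same as =AVERAGE(C2,D2)
    # Boundaries sit half a unit outside the limits for whole-number data.
    rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, class_mark))

print("Limits   Boundaries    f  Mark")
for lower, upper, lb, ub, freq, mark in rows:
    print(f"{lower}-{upper:<4} {lb}-{ub:<6} {freq:>2}  {mark:g}")
```

It gives frequencies 5, 2, 5, 5, and 3 (a total of 20) for the classes 67-73 through 95-101, with class marks 70, 77, 84, 91, and 98.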
GRAPHICAL REPRESENTATION OF
FREQUENCY DISTRIBUTION
A FREQUENCY DISTRIBUTION CAN BE PRESENTED
GRAPHICALLY USING ANY OF THE FOLLOWING:

• A frequency polygon is a line graph where the frequency of each class
interval is plotted against the corresponding class mark. The horizontal
axis contains the class marks, while the vertical axis represents the frequency.
• A histogram is a graph where rectangular bars are used to present the
frequency distribution. The horizontal axis contains the class boundaries,
while the vertical axis contains the frequencies.
Steps in creating a frequency polygon in Excel:
1. Highlight the column that contains the class mark for
each class and the column that contains the frequencies.
2. In the menu bar, choose Insert, then Chart, and select Line.
3. Right-click on the graph and choose Select Data.
4. In the Series part, remove the class mark series.
5. Under Horizontal (Category) Axis Labels, click the icon, select the
column that contains the class marks, and click OK.
[Figure: Frequency Polygon, with class marks 70, 77, 84, 91, 98 on the horizontal axis and frequency (0 to 6) on the vertical axis]
Steps in creating a histogram in Excel:
1. Highlight the column that contains the class
boundaries for each class and the column that
contains the frequencies.
2. In the menu bar, choose Insert, then Chart, and choose Histogram.
3. Select Change Chart Type and choose Clustered Column.
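[Figure: Histogram, with class boundaries 66.5-73.5, 73.5-80.5, 80.5-87.5, 87.5-94.5, 94.5-101.5 on the horizontal axis and frequency (0 to 6) on the vertical axis]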
CORRELATION AND REGRESSION
• Correlation refers to the relationship between two variables.
• Scatter plot/scatter diagram - the graphical representation of bivariate data.

[Figure: example scatter plots showing positive correlation, negative correlation, and no correlation]
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT

Pearson’s product-moment correlation coefficient measures the strength and
direction of the linear relationship between two variables that are normally
distributed. It is denoted by r, and its value ranges from -1 to 1.
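For reference, one common computational form of r for n pairs of values (x, y) is the standard textbook formula below; it is offered as a sketch and may differ in layout from the form used elsewhere in the course.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}},
\qquad -1 \le r \le 1
```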

To interpret the value of the correlation coefficient, we can use the table below.
EXAMPLE
The head of the production department wanted to determine whether there is a
relationship between the number of workers who produce canned goods and
the number of canned goods produced per day.
Steps in computing correlation in Excel:
1. Put and label the data in each column.
2. In a selected empty cell, type "=CORREL(".
3. Select the range A2:A9, then type ",".
4. Select the range B2:B9, then press ENTER.
5. The correlation coefficient (r) will appear in the cell you selected.

The value of r indicates a very high positive correlation between the number
of workers and the number of canned goods produced.
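Excel's CORREL returns Pearson's r. The Python sketch below computes the same coefficient with the computational formula (sums of x, y, xy, x², and y²). The worker and output figures from the example are not reproduced in this text, so the sample pairs are hypothetical, and the function name `correl` is an assumption chosen to mirror the Excel function.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson's product-moment correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero (ZeroDivisionError) if all x or all y values are equal.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical pairs (NOT the data from the example above).
workers = [10, 12, 15, 18, 20]
produced = [130, 170, 220, 280, 310]
print(round(correl(workers, produced), 4))
```

A result close to +1 indicates a very high positive correlation, close to -1 a very high negative correlation, and close to 0 little or no linear correlation.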
Steps in creating a scatterplot of correlation data in Excel:

1. Highlight your data.
2. From the Insert tab, select the scatter chart icon and choose Scatter.
[Figure: Scatterplot of the example data, with the horizontal axis from 8 to 22 and the vertical axis from 0 to 350]

A scatterplot should appear on your spreadsheet. 


REGRESSION
• Regression analysis is a simple statistical tool used to model the dependence of
a variable on one (or more) explanatory variables.
• The least-squares regression equation is an equation used to predict the value
of the dependent variable based on the value of the independent variable.
• The least-squares regression line is the graphical representation of the
least-squares regression equation. Drawn on the scatter plot, it can be used to
estimate the value of the dependent variable for a given value of the
independent variable.
• The coefficient of determination (R²) is used to determine how well the
least-squares regression line fits the sample data. It is the proportion of the
variation in the dependent variable that is explained by the line, from 0 to 1.
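Excel's linear trendline is the least-squares line. The Python sketch below uses the standard formulas for the slope b and intercept a of y = a + bx, and computes R² = 1 - SS_res / SS_tot; the function name `least_squares` and the sample data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b, r_squared) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept: the line passes through (mean_x, mean_y)
    # R^2 compares the squared errors of the line with the spread of y around its mean.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared

# Hypothetical data lying exactly on y = 1 + 2x.
a, b, r2 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r2)  # 1.0 2.0 1.0
```

With a single explanatory variable, R² equals the square of Pearson's r, so a very high r goes together with an R² close to 1.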
STEPS IN CREATING A REGRESSION EQUATION IN EXCEL:


1. Insert a scatterplot graph into a blank sheet.
2. Select the x-axis and y-axis data.
3. Right-click on any of the dots and select "Add Trendline" from the menu.
4. Check the "Display Equation on chart" and "Display R-squared value on chart" boxes.
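• Thus, the least-squares regression equation is y = 18.872x - 59.952. Since
R² is approximately 0.9579, about 95.79% of the variation in the number of canned
goods produced is explained by the number of workers, so the least-squares
regression equation does a good job of predicting the number of canned goods produced.

[Figure: Scatterplot with trendline f(x) = 18.87x - 59.95 and R² = 0.96; horizontal axis from 8 to 22, vertical axis from 0 to 350]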
To determine the expected number of canned goods produced when there are 75 and
89 workers, solve for y in the least-squares regression equation (with rounded
coefficients) when x = 75 and x = 89.

a. x = 75
y = -59.95 + 18.87(75)
y = 1355.3 ≈ 1355
b. x = 89
y = -59.95 + 18.87(89)
y = 1619.48 ≈ 1619
Thus, with 75 workers the predicted number of canned goods produced is about 1355,
and with 89 workers it is about 1619.
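The substitution can be scripted. The minimal Python sketch below assumes the rounded coefficients used in the worked solution (slope 18.87, intercept -59.95); the function name `predict_canned_goods` is hypothetical.

```python
def predict_canned_goods(workers, slope=18.87, intercept=-59.95):
    """Predicted canned goods per day from the trendline y = slope*x + intercept.

    Defaults are the rounded coefficients from the worked solution;
    the trendline's full-precision values are slope 18.872 and intercept -59.952.
    """
    return intercept + slope * workers

for workers in (75, 89):
    print(workers, round(predict_canned_goods(workers)))
# 75 1355
# 89 1619
```

With the unrounded coefficients (18.872 and -59.952), the predictions are about 1355.4 and 1619.7, so rounding the coefficients changes the results by less than one can.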
