Basic Statistical Concepts

Definition of Terms

Statistics refers to the science that deals with the collection, tabulation or presentation, analysis, and interpretation of numerical or quantitative data. Collection of data refers to the process of obtaining numerical measurements. Tabulation or presentation of data refers to the organization of data into tables, graphs, or charts, so that logical and statistical conclusions can be derived from the collected measurements. Analysis of data pertains to the process of extracting from the given data relevant information from which numerical descriptions can be formulated. Interpretation of data refers to the task of drawing conclusions from the analyzed data. It also normally involves the formulation of forecasts or predictions about larger groups based on the data collected from small groups.

Population and Sample

Population or the universe refers to the collection of all traits under study or under consideration. A small part of this big group is called a sample. Example of population and sample: graduate students of NEUST are an example of a population, while students in the MPA program are a sample. Using the language of mathematics, the universal set is the population while a subset refers to the sample. Hence, if the universal set is the set of counting numbers, the set of even numbers is a subset, and so is the set of odd numbers. A population can be finite or infinite. The population of a certain school in a particular term is finite, while the population consisting of all possible outcomes (heads, tails) in successive tosses of a coin is infinite.

Parameter and Statistics

A parameter refers to a numerical characteristic of the population, like the population mean, population standard deviation, population variance, and many more. It is usually unknown and estimated only by a corresponding statistic computed from the sample data. Thus, the population mean is estimated by the sample mean, the population standard deviation by the sample standard deviation, the population variance by the sample variance, etc. The mean weight of a sample of 100 sophomore students selected from the entire population of sophomore students in a certain high school is a statistic. The mean weight of all students comprising the population is a parameter, which is estimated by the sample mean weight of the sophomore students. Generally, the characteristics of a population are called parameters, while the characteristics of a sample are called statistics. Below are the symbols used for parameters and statistics in most statistical writing:

Characteristic                     Parameter      Statistic
Mean                               μ (mu)         x̄
Standard Deviation                 σ (sigma)      s
Variance                           σ²             s²
Proportion                         P              p
Pearson Correlation Coefficient    ρ (rho)        r
Number of Cases                    N              n
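The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 500 weights is synthetic, and the names `population`, `sample`, `mu`, `sigma`, `x_bar`, and `s` are chosen here to mirror the symbols in the table, not taken from the original notes.

```python
import random
import statistics

# Hypothetical population: weights (kg) of 500 sophomore students.
# The values are synthetic and generated only for illustration.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [round(rng.gauss(50, 6), 1) for _ in range(500)]

# Parameters describe the whole population (N = 500).
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics describe a sample (n = 100) and estimate the parameters.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)       # sample mean
s = statistics.stdev(sample)           # sample standard deviation (n - 1 divisor)

print(f"N = {len(population)}, mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"n = {len(sample)}, x_bar = {x_bar:.2f}, s = {s:.2f}")
```

In practice the population values are usually not available, which is why the sample statistics are used as estimates.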

Variables

Variable is one of the basic concepts in statistics. It refers to observable characteristics or phenomena of a person or object whereby the members of the group or set vary or differ from one another. A variable is a symbol such as X, Y, Z, a, b, c, etc. which can assume any value from a prescribed set of values, called the domain of the variable. If the variable can assume only one value, it is called a constant.

Discrete and Continuous Variables

A variable which can theoretically assume any value between two given values is called a continuous variable; otherwise it is called a discrete variable. Example: the number of houses in a community is a discrete variable; it can take any of the values 0, 1, 2, 3, etc. but cannot be 1.5, 3.34, 4.624, etc.

The weight of an individual, which can be 45.3 kg, 50.50 kg, 70.345 kg, etc., depending on the accuracy of measurement, is a continuous variable. In general, measurement gives rise to continuous data while enumeration or counting gives rise to discrete data.

Dependent and Independent Variables

Variables can be grouped into dependent and independent variables with respect to their use. An independent variable is used as a predictor if the objective is to predict the value of one variable on the basis of the other. In contrast, a dependent variable is the variable whose value is predicted. To illustrate, if we want to predict or foresee the students' academic achievement in mathematics, we may analyze different factors such as gender, study habits, intelligence quotient, interest, attitudes, socio-economic status, and many more. Hence, the independent variables are gender, study habits, intelligence quotient, interest, attitudes, and socio-economic status. On the other hand, the dependent variable is the students' academic achievement in mathematics.

Uses of Statistics

According to Ary and Jacobs (1976), statistics is a body of scientific methods for analyzing quantitative data. Statistical methods serve two functions: (1) they aid the scientist in organizing, summarizing, interpreting, and communicating quantitative information obtained from observations, and (2) they allow scientists to extrapolate from the data to reach tentative conclusions about the larger group from which the smaller group was derived. The statistical procedures dealing with the first function are generally called descriptive statistics (gathering, classification, and presentation of data and collection of summarizing values), while the procedures dealing with the second function are called inferential statistics (critical judgement and mathematical methods).

Types of Data

Statistical tools rely on the types of data that are collected. The different types are as follows:

Primary and Secondary Data

Primary data refer to information gathered directly from the original source or based on direct or first-hand experience (e.g., autobiographies, diaries, etc.). Secondary data refer to information taken from published or unpublished data previously gathered by other individuals or agencies (e.g., books, magazines, newspapers, etc.).

Qualitative and Quantitative Data

Qualitative data are categorized data, which take the form of categories or attributes (e.g., sex, year level, religion, etc.). On the other hand, quantitative data or numerical data are obtained from measurements (e.g., height, weight, ages, scores, etc.).

Measurement Scales

Qualitative data can be converted to quantitative data through the process called measurement. Through measurement, numbers are used to code objects so that they can be treated statistically. There are four types of measurement. They are as follows:

Nominal Measurements. Nominal measurements are used only for identification or classification purposes. Example: student numbers, names of books, vehicle numbers, etc.

Ordinal Measurements. Ordinal measurements do not only classify items. They also give the order of classes, items, or objects. Example: first runner-up, second runner-up, third runner-up, etc.

Interval Measurements. In interval measurements, numbers are assigned to the items or objects. They measure the degree of difference between any two classes. Example: weight, height, temperature, IQ, test scores, etc.

Ratio Measurements. For ratio measurements, the ratio of the numbers assigned in the measurements shows the ratio in the amount of the property being measured. Multiplication and division have meaning in ratio measurements. Example: if Boris is 40 years old and Morgana is 20 years old, then their ages may be expressed in the ratio 2:1 (two is to one).

Sampling Techniques

It is not necessary for the researcher to examine every member of the population to get data or information about the population. Cost and time constraints often prohibit one from undertaking a study of the entire population. Sampling techniques are utilized to test the validity of conclusions or inferences drawn from a sample of the population.

Random Sampling. What is random sampling? Random sampling is a method of selecting a sample of a given size from a population or universe such that each member of the population has an equal chance of being selected in the sample and all possible combinations of that size have an equal chance of being selected as the sample.
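A minimal sketch of simple random sampling, assuming a hypothetical population of 50 ID numbers. Python's standard `random.sample` draws without replacement, so every member, and every combination of the requested size, is equally likely to be chosen.

```python
import random

# Hypothetical population: ID numbers 1 to 50 (illustrative only).
population = list(range(1, 51))

rng = random.Random(7)  # fixed seed so the draw can be repeated
sample = rng.sample(population, 10)  # sample of size 10, without replacement

print(sorted(sample))
```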

Stratified Random Sampling. In this method the population is first divided into groups, based on homogeneity, in order to avoid the possibility of drawing samples whose members come from only one stratum.
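The idea can be sketched as follows. The population, the year-level strata, and the 10% sampling fraction are all hypothetical, and drawing the same fraction from every stratum (proportional allocation) is one common choice assumed here, not something the notes prescribe.

```python
import random
from collections import defaultdict

rng = random.Random(11)  # fixed seed for a repeatable draw

# Hypothetical population of (student_id, year_level) pairs.
levels = ["first"] * 40 + ["second"] * 30 + ["third"] * 20 + ["fourth"] * 10
population = list(enumerate(levels))

# Step 1: divide the population into homogeneous groups (strata).
strata = defaultdict(list)
for member in population:
    strata[member[1]].append(member)

# Step 2: draw a simple random sample from every stratum.
sample_fraction = 0.10  # assumed fraction, applied to each stratum
sample = []
for level, members in strata.items():
    k = max(1, round(len(members) * sample_fraction))
    sample.extend(rng.sample(members, k))

for level in strata:
    drawn = sum(1 for _, lvl in sample if lvl == level)
    print(f"{level}: {drawn} of {len(strata[level])} selected")
```

Because every stratum contributes at least one member, the sample cannot come from a single stratum only.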

Cluster Sampling. It is an advantageous procedure when the population is spread out over a wide geographical area. It is also a practical sampling technique when the complete list of the members of the population is not available. A cluster refers to an intact group which has a common characteristic.
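A minimal sketch of cluster sampling, assuming four hypothetical villages as clusters. Whole clusters are selected at random, and every member of a selected cluster enters the sample; only a list of clusters is needed, not a list of all members.

```python
import random

rng = random.Random(3)  # fixed seed for a repeatable draw

# Hypothetical clusters: each village is an intact group of household IDs.
clusters = {
    "village_A": ["A1", "A2", "A3"],
    "village_B": ["B1", "B2"],
    "village_C": ["C1", "C2", "C3", "C4"],
    "village_D": ["D1", "D2", "D3"],
}

# Randomly choose 2 whole clusters, then take every member of each.
chosen = rng.sample(sorted(clusters), 2)
sample = [household for name in chosen for household in clusters[name]]

print("Chosen clusters:", chosen)
print("Sample:", sample)
```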

Methods Used in the Collection of Data

1. Direct or Interview Method

This is a method of person-to-person exchange between the interviewer and the interviewee. The following are the advantages of the direct or interview method:

It can give complete information needed in the study.

The method also has disadvantages:

2. It can yield inaccurate information since the interviewer can influence the respondent's answers through his facial expressions, tone of voice, or wording of the questions.
3. The interviewer may cheat by turning in dishonest responses if the expected or desired responses are not obtained.

2. Indirect or Questionnaire Method

The questionnaire method is one of the easiest methods of data gathering. In this method, written responses are given to prepared questions. A questionnaire is a list of questions intended to elicit answers to the problems of a study. It should be attractive and include illustrations, pictures, and sketches. Its contents, especially the directions, must be precise, clear, and self-explanatory.

3. Registration Method

This method of gathering information is enforced by law. Examples are the registration of births, deaths, motor vehicles, marriages, and licenses. The advantage of this method is that information is kept systematized and made available to all because of the requirement of the law.

4. Observation Method

The observation method is utilized to gather data regarding attitudes, behavior, values, and cultural patterns of the sample under study. It is usually used when the subjects cannot talk or write.

5. Experiment Method

An experiment is applied to collect data if the investigator wants to control the factors affecting the variable being studied.

Methods of Presenting Data

Collected data are useless and invalid if they are not presented effectively for analysis and interpretation. Data are presented in four general methods: (1) textual method, (2) tabular method, (3) semi-tabular method, and (4) graphical method or presentation.

Frequency Distribution

When the researcher has gathered all the needed data, the next task is to organize and present them with the use of appropriate tables and graphs. Frequency distribution is one system used to facilitate the description of important features of the data.

Class Interval or Class Limits - refers to the grouping defined by a lower limit and an upper limit.

Class Boundaries - if heights are recorded to the nearest inch, the class interval 60-62 theoretically includes all measurements from 59.5000 to 62.5000 in. These numbers, indicated briefly by the exact numbers 59.5 and 62.5, are class boundaries, or the true class limits; the smaller number (59.5) is the lower class boundary, and the larger number (62.5) is the upper class boundary.

Class Mark - is the midpoint or middle of a class interval. It is obtained by finding the average of the lower class limit and the upper class limit. Example: the class mark of the class interval 5-9 is (5 + 9)/2 or 7.

Class Size - refers to the difference between the upper class boundary and the lower class boundary of a class interval.

Class Frequency - means the number of observations belonging to a class interval.
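The definitions above can be sketched in Python. The height values and the list of class limits are hypothetical; the half-unit adjustment for the boundaries assumes measurements recorded to the nearest whole inch, as in the example.

```python
# Hypothetical heights recorded to the nearest inch (illustrative only).
heights = [61, 64, 67, 70, 73, 66, 68, 65, 62, 67, 69, 66, 71, 64, 67]

# Assumed class limits (lower limit, upper limit).
class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

rows = []
for lower, upper in class_limits:
    lower_boundary = lower - 0.5                 # true lower class limit
    upper_boundary = upper + 0.5                 # true upper class limit
    class_mark = (lower + upper) / 2             # midpoint of the class interval
    class_size = upper_boundary - lower_boundary
    frequency = sum(lower <= h <= upper for h in heights)
    rows.append((lower, upper, lower_boundary, upper_boundary,
                 class_mark, class_size, frequency))

print("Limits   Boundaries    Mark  Size  Frequency")
for lo, up, lb, ub, mark, size, freq in rows:
    print(f"{lo}-{up}    {lb}-{ub}     {mark}  {size}   {freq}")
```

Note that the class size is computed from the boundaries (62.5 - 59.5 = 3), not from the limits (62 - 60 = 2).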

Graphical Presentation of Data

Histogram - is made up of vertical bars that are joined together, making it an appropriate graph for continuous data. The base of each bar or rectangle spans the class boundaries, and its height corresponds to the class frequency.

Frequency Polygon - is commonly called a linear graph. It is a very useful device to show changes in values over successive periods of time. An advantage of the frequency polygon is that it can be used to compare two or more distributions graphically on one pair of axes.

Bar Graph - is used to represent discrete data, where the bars are separated. The width of the bars is arbitrary; however, all bars must be of the same width. Thus, the bar graph is almost like the histogram; the only difference is that the bars of the histogram are joined.

Pie Diagram or Pie Chart - is used to show percentage distribution. It is made up of a circle subdivided into sectors proportional in size to the quantities or percentages they represent.
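The proportionality rule for a pie chart can be sketched as a small calculation: each sector's central angle is the category's share of the total multiplied by 360 degrees. The category names and counts below are hypothetical.

```python
# Hypothetical counts per category (illustrative values only).
counts = {"Program A": 50, "Program B": 30, "Program C": 20}

total = sum(counts.values())
sectors = {}
for name, count in counts.items():
    percentage = 100 * count / total  # share of the whole, in percent
    angle = 360 * count / total       # central angle of the sector, in degrees
    sectors[name] = (percentage, angle)
    print(f"{name}: {percentage:.1f}% -> {angle:.1f} degrees")
```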

Types of Frequency Curves

1. The symmetrical or bell-shaped frequency curves are characterized by the fact that observations equidistant from the central maximum have the same frequency. An important example is the normal curve.
2. In J-shaped and reversed J-shaped frequency curves, a maximum occurs at one end.
3. In the moderately asymmetrical or skewed frequency curves, the tail of the curve to one side of the central maximum is longer than that to the other. If the longer tail occurs to the right, the curve is said to be skewed to the right or to have positive skewness, while if the reverse is true, the curve is said to be skewed to the left or to have negative skewness.
4. A U-shaped frequency curve has maxima at both ends.
5. A bimodal frequency curve has two maxima.
6. A multimodal frequency curve has more than two maxima.
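The direction of skewness can be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5); this particular formula is an assumption introduced here for illustration, since the notes describe skewness only in terms of the longer tail. The data sets are hypothetical.

```python
import statistics

def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (assumed measure)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets (illustrative only).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # longer tail to the right
left_skewed = [-x for x in right_skewed]    # mirror image, longer tail to the left

for label, data in [("symmetric", symmetric),
                    ("right-skewed", right_skewed),
                    ("left-skewed", left_skewed)]:
    print(f"{label}: skewness = {moment_skewness(data):.3f}")
```

A positive value corresponds to a curve skewed to the right, a negative value to a curve skewed to the left, and a value of zero to a symmetrical distribution.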