
Definition of Statistics

Introduction

Statistics is a set of tools used to organize and analyze data. Data must either be numeric in origin or be transformed by researchers into numbers. For instance, statistics could be used to analyze the percentage scores English students receive on a grammar test: the percentage scores, ranging from 0 to 100, are already in numeric form. Statistics could also be used to analyze grades on an essay by assigning numeric values to the letter grades, e.g., A=4, B=3, C=2, D=1 and F=0. Employing statistics serves two purposes: (1) description and (2) prediction. Statistics are used to describe the characteristics of groups. These characteristics are referred to as variables.
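As a minimal sketch of the letter-grade transformation described above, the following Python snippet maps letter grades to the numeric values A=4 through F=0 and averages them. The function name `to_points` and the sample grades are hypothetical and serve only as an illustration.

```python
# Numeric values for letter grades, as given in the text: A=4, B=3, C=2, D=1, F=0.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def to_points(grades):
    """Convert a list of letter grades into their numeric values."""
    return [GRADE_POINTS[g] for g in grades]

# Hypothetical essay grades, for illustration only.
essay_grades = ["A", "C", "B", "F"]
points = to_points(essay_grades)
mean_points = sum(points) / len(points)  # description of the group by its mean
print(points, mean_points)
```

Once the grades are numbers, the usual descriptive tools (mean, range, frequency counts) can be applied to them.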

STATISTICS
Is the science of the collection, organization, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments.
Is a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters.
Is the science of making effective use of numerical data relating to groups of individuals or experiments.
Is the practice of collecting, organizing, describing, and analyzing data in order to draw conclusions from the data and apply them to a cause.

Importance of Statistics
In Computer Engineering: Statistics is used to build systems (programs) without flaws, to support ISO certification and process verification, and to study repetitive operations in order to set standards. It is also used for the data interpretation that is frequently needed in computer engineering. Example: Engineers write a data-interpretation program that is easy for people to use and at the same time has no flaws.

In Astronomy: Astronomy is one of the oldest branches of statistical study. It deals with the measurement of the distances, sizes, masses and densities of heavenly bodies by means of observation. Errors are unavoidable in these measurements, so the most probable values are found by using statistical methods. Example: An astronomer finds 10 heavenly bodies that look alike, studies all 10, and finds that 8 of them share the same characteristics.

In Mathematics: Statistics is a branch of applied mathematics. A large number of statistical methods, such as probability, averages, dispersion and estimation, are used in mathematics, and techniques of pure mathematics such as integration, differentiation and algebra are used in statistics. Example: Some problems in mathematics need analysis, interpretation and explanation in order to be solved.

In Economics: Statistics plays an important role in economics. In economic research, statistical methods are used for collecting and analyzing data and for testing hypotheses. The relationship between supply and demand is studied by statistical methods; imports and exports, the inflation rate and per capita income are problems that require a good knowledge of statistics. Example: A sole proprietor wants to know which product is most in demand today and what that product costs.

In Health and Medicine: Medical statistics deals with the application of biostatistics to medicine and the health sciences, including epidemiology, public health, forensic medicine, and clinical research. It covers data on indicators of the nation's health, such as health inequalities, mobility rates, smoking, drinking and drug use, and abortion statistics. Example: In pharmacological research, statistics is used to summarize experimental data (descriptive statistics) in terms of central tendency (mean or median) and variability (standard deviation, standard error of the mean, confidence interval or range), but more importantly it enables us to conduct hypothesis testing.

Basic Statistical Terms


VARIABLE - is a quantity that may assume any of a set of values. Examples are monthly income, average grade, volume, price and so forth.
CONSTANT - is a quantity that does not change its value. Example: The mathematical symbol π (the Greek letter pi) is a constant because its value does not change; it is always equal to 3.1416... Likewise, the equivalence of 2.54 centimeters to an inch is a constant.

TYPES OF DATA

UNGROUPED (or RAW) DATA - are data which are not organized in any specific way. They are simply the collection of data as they are gathered. Some computations may be made on this kind of data, but analysis and interpretation are limited until the data are organized.

GROUPED DATA - are raw data organized into groups or categories with corresponding frequencies. Organized in this manner, the data are referred to as a Frequency Distribution. The table below shows absolute (Abs.) and relative (Rel.) frequencies per interval, both non-cumulative and cumulative.

Intervals            Non-cumulative      Cumulative
Lower    Upper       Abs.     Rel.       Abs.     Rel.
1.00     1.50        24       .169       24       .169
1.50     2.00        24       .169       48       .338
2.00     2.50        21       .148       69       .486

Example: The information regarding the number of children per family is given in the following table.

No. of children    No. of families
0                  3
1                  20
2                  15
3                  8
4                  3
5                  1
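The step from raw data to grouped data can be sketched in Python as follows. The raw list is an assumption: it is constructed here so that its counts match the table above, since the original observations are not given.

```python
from collections import Counter

# Raw (ungrouped) data: number of children in each family.
# Assumption: built to reproduce the table in the text (3, 20, 15, 8, 3, 1 families).
raw = [0] * 3 + [1] * 20 + [2] * 15 + [3] * 8 + [4] * 3 + [5] * 1

# Grouping: count how many families fall into each category.
freq = Counter(raw)

for children in sorted(freq):
    print(children, freq[children])
```

The `Counter` plays the role of the frequency distribution: each key is a category and each value is its frequency.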

PRIMARY DATA - are measured and gathered by the researcher who published them.
SECONDARY DATA - are republished by another researcher or agency.
POPULATION - is the entire collection of all possible observations of a particular characteristic of interest. Example: the population of the grades of all students who took an entrance examination; the monthly incomes of all employees in the government.
SAMPLE - is a representative set of observations that reflects the characteristic of the whole, that is, the population from which it is taken.

PARAMETER - is any statistical characteristic of a population. Example: the Mean and the Standard Deviation. Thus, we say the population Mean is a parameter of the population.
STATISTIC - is any statistical characteristic of a sample, such as the Mean and the Standard Deviation. The sample Mean is a sample statistic.
FREQUENCY DISTRIBUTION - is a tabulation of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way the table summarizes the distribution of values in the sample.

For example, the heights of the students in a class could be organized into the following frequency table.

Height range       Number of students    Cumulative number
4.5 - 5.0 feet     25                    25
5.0 - 5.5 feet     35                    60
5.5 - 6.0 feet     20                    80
6.0 - 6.5 feet     20                    100
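The cumulative column of the height table is a running total of the class frequencies. A short Python sketch, using the counts from the table (the variable names are illustrative only):

```python
from itertools import accumulate

# Height classes and class frequencies from the table in the text.
classes = ["4.5-5.0", "5.0-5.5", "5.5-6.0", "6.0-6.5"]
counts = [25, 35, 20, 20]

cumulative = list(accumulate(counts))    # running totals of the frequencies
total = cumulative[-1]                   # the last running total is the class size
relative = [c / total for c in counts]   # share of students in each class

for row in zip(classes, counts, cumulative, relative):
    print(row)
```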

LEVELS / SCALES OF MEASUREMENT
Statistical operations on numerical values depend upon the nature of such values. Numerical values may be categorized by level of measurement, namely nominal, ordinal, interval and ratio (Siegel & Castellan, 1988).

NOMINAL LEVEL - is the crudest form of measurement. Numbers or symbols are used for the purpose of categorizing items into groups. The categories are mutually exclusive, that is, being in one category automatically excludes another.
Example: Sex: M - Male, F - Female

Faculty Tenure: 1 - Tenured

ORDINAL LEVEL - is an improvement on the nominal level. Data are ranked from "bottom to top" or "low to high", so statements of the kind "greater than" or "less than" may be made.
Example: Class Standing: (Excellent, Good, Poor)
Teacher's Evaluation: 1 - Poor, 2 - Fair, 3 - Good, 4 - Very Good

INTERVAL LEVEL - possesses the properties of the nominal and ordinal levels. The distances between any two numbers on the scale are known, but the scale does not have a true zero point. Example: Consider the I.Q. scores of four students: 90, 150, 85 and 145. Here we can say that the difference between 90 and 150 is the same as the difference between 85 and 145 (60 points in both cases), but we cannot claim that one student is so many times as intelligent as another.

RATIO LEVEL - possesses all the properties of the nominal, ordinal and interval levels and, in addition, has an absolute zero point. Data can be classified and placed in a proper order, and we can compare the magnitudes of these data. Example: Age, income, exam scores, performance, ratings, grades of students and tuition fees are examples of ratio variables.

Sampling Techniques
Sampling Technique or Sampling Plan
Is the procedure of gathering sampling units from the population.
Is the method of selecting a sample of size (n) from a universe (N) such that each member of the population has an equal chance of being included in the sample and all possible combinations of size (n) have an equal chance of being selected as the sample.
Is the technique of drawing a sample from the population.
Sampling is applied when not all elements of the population are available or when the population is too large.

A. Probability Sampling Technique
Is a sampling technique wherein each population unit has an equal chance of being drawn or selected as a member of the sample.
Is a sampling technique in which the probability of getting any particular sample may be calculated.

1. Random Sampling
Is a basic type of probability sampling. Using this technique, each individual in the population has an equal chance of being drawn into the sample.
Is the method of selecting a sample of size (n) from a universe (N) such that each member of the population has an equal chance of being included in the sample and all possible combinations of size (n) have an equal chance of being selected as the sample.

1.1. Lottery Sampling or Raffle Sampling
The lottery sampling method is usually carried out by assigning a number to each member of the population. The numbered items are placed in a container and thoroughly mixed, and elements are drawn as needed.
1.2. Table of Random Numbers
The selection of each member of the population is left entirely to chance, and every member of the population has an equal chance of being chosen.
1.2.1. Direct Selection Method
Is used when there are only a few sample units to be selected.

1.2.2. Remainder Method
Is used whenever the direct selection method cannot be applied. There are two ways of conducting the remainder method:
1. When the number taken from the Table of Random Numbers is subtracted from the upper limit within which this number falls, the remainder is the sample unit.
2. When the upper limit of the set is subtracted from the number taken from the Table of Random Numbers and yields a number equal to or less than N, the remainder is the sample unit.
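The lottery method from item 1.1 can be imitated in Python, with the standard library's random number generator standing in for the container of numbered slips. This is only a sketch: the function name `lottery_sample`, the population size of 50 and the seed are assumptions made for illustration.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct numbers from 1..population_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every member has the same
    # chance of selection and every subset of size n is equally likely.
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical population of N = 50 numbered members, sample of n = 5.
sample = lottery_sample(population_size=50, sample_size=5, seed=1)
print(sorted(sample))
```

Fixing the seed makes the draw repeatable, which is convenient for checking work; leaving it as `None` gives a fresh draw each time.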

2. Systematic Sampling
Is used to select the members of the sample from a large population. This method picks every kth element of the population as a member of the sample. The most common form of systematic sampling is an equal-probability method, in which every kth element in the frame is selected, where k, the sampling interval (sometimes known as the skip), is calculated as:

$$k = \frac{N}{n}$$

where n is the sample size, and N is the population size.
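A minimal Python sketch of systematic sampling follows. It assumes the interval is rounded down to a whole number (k = N // n) and that the first unit is chosen at random within the first interval; neither detail is stated in the text, and the function name and frame are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th element after a random start, with k = N // n."""
    N = len(population)
    k = N // n                                # sampling interval (skip); assumes n <= N
    start = random.Random(seed).randrange(k)  # random start among the first k units
    return [population[start + i * k] for i in range(n)]

frame = list(range(1, 101))                   # hypothetical frame of N = 100 units
sample = systematic_sample(frame, n=10, seed=0)
print(sample)
```

With N = 100 and n = 10 the interval is k = 10, so consecutive sample units are always 10 positions apart.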

3. Stratified Sampling
In this technique, the set of interest is divided into groups or aggregates from which the actual sampling is done. The population is subdivided into at least two different subpopulations (or strata) whose members share the same characteristics, and the elements of the sample are then drawn from each stratum proportionately.

Determining sample size
SLOVIN'S FORMULA

$$n = \frac{N}{1 + N e^2}$$

where:
n = sample size
e = margin of error
N = population size
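Slovin's formula and proportional allocation across strata can be sketched in Python as below. The rounding up of n, the strata names and the sizes are assumptions made for illustration; they do not come from the text.

```python
import math

def slovin(N, e):
    """Slovin's formula n = N / (1 + N * e**2), rounded up to a whole unit."""
    return math.ceil(N / (1 + N * e ** 2))

def proportional_allocation(strata_sizes, n):
    """Share n across strata in proportion to each stratum's size."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

# Hypothetical population of 1000 split into three strata, 5% margin of error.
strata = {"first_year": 500, "second_year": 300, "third_year": 200}
n = slovin(N=sum(strata.values()), e=0.05)
allocation = proportional_allocation(strata, n)
print(n, allocation)
```

Because each stratum's share is rounded separately, the allocated counts can differ from n by a unit or so in other examples; here they add up exactly.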

4. Cluster or Area Sampling
Is a sampling technique wherein groups or clusters, instead of individuals, are randomly chosen. This involves dividing the population into nonoverlapping clusters.
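A small Python sketch of cluster sampling: whole clusters are drawn at random and every member of a chosen cluster enters the sample. The cluster names and members are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # draw cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical nonoverlapping clusters (for example, villages and their households).
clusters = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
picked = cluster_sample(clusters, num_clusters=2, seed=3)
print(picked)
```

The random draw happens at the level of clusters, not individuals, which is what distinguishes this technique from simple random sampling.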


B. Non-Probability Sampling Technique
Is a sampling technique wherein the sample units do not have equal chances of being drawn.
Is a sampling technique wherein members of the sample are drawn from the population based on the judgment of the researchers.
Examples of non-probability sampling include:
a. Convenience or Haphazard Sampling - Members of the population are chosen based on their relative ease of access. Sampling friends, co-workers, or shoppers at a single mall are all examples of convenience sampling.
b. Snowball Sampling - The first respondent refers a friend. The friend also refers a friend, and so on.
c. Judgmental or Purposive Sampling - The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people who have expertise in the area being researched.
d. Deviant Case - Cases are selected that differ substantially from the dominant pattern (a special type of purposive sample).
e. Case Study - The research is limited to one group, often with a similar characteristic or of small size.
f. Ad Hoc Quotas - A quota is established (say 65% women) and researchers are free to choose any respondent they wish as long as the quota is met.

Collection of Data
Any statistical investigation must necessarily be based on accurate data. In order to ensure the accuracy of data, one must know the right sources and methods of collecting them.

1. Primary data
Refer to information which is gathered directly from an original source, or which is based on direct or first-hand experience. Examples: First-person accounts, autobiographies, and diaries.

2. Secondary data
Refer to information which is taken from published or unpublished data previously gathered by other individuals or agencies. Examples: Published books, newspapers, magazines, biographies, business reports, and the like.

Methods used in the collection of data

The direct or interview method
This is a method of person-to-person exchange between the interviewer and the interviewee. The interview method provides consistent and more precise information since clarification may be given by the interviewee.

The indirect or questionnaire method
Written responses are given to prepared questions. A questionnaire is a list of questions which are intended to elicit answers to the problems of a study. This method is inexpensive and can cover a wide area in a shorter span of time.

The registration method
This method of gathering information is enforced by certain laws. Examples are the registration of births, deaths, motor vehicles, marriages, and licenses. The advantage of this method is that information is kept systematized.

The observation method
In this method, the investigator observes the behavior of persons or organizations and their outcomes. It is usually used when the subjects cannot talk or write.

The experiment method
This method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. Scientific researchers usually use the experiment method.

Types of questions

Structured question
This is a type of question that leaves only one way or a few alternative ways of answering it. Examples include yes-or-no and multiple-choice questions.

Unstructured or open-ended questions
As the name suggests, these are questions which can be answered in many ways. Probing questions, or questions that seek to elicit reasons, are normally of this type.

Presentation of Data
Textual form
The data are presented in paragraph form. Example: There are 100 students enrolled in the College of Education. 54 students are English majors, 25 students are Mathematics majors, and 25 students are Filipino majors.

Tabular form
The data are placed in a table. Example:

Age Interval (yrs)    Frequency
15-19                 13
20-24                 15
25-29                 20
30-34                 10

Graphical presentation
The data are presented in a diagrammatic form. The graphical representation of data makes the reading more interesting, less time-consuming and easily understandable.

Kinds of graphical presentation:

Pie Chart
A pie chart is a way of summarizing a set of categorical data. It is a circle which is divided into segments. Each segment represents a particular category.

Bar Chart
A bar chart is a way of summarizing a set of categorical data. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It displays the data using a number of rectangles of the same width, each of which represents a particular category. The length (and hence area) of each rectangle is proportional to the number of cases in the category it represents, for example, age group or religious affiliation.

Histogram
A histogram is a way of summarizing data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It divides up the range of possible values in a data set into classes or groups.

Frequency distribution
A frequency distribution is a tabulation of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way the table summarizes the distribution of values in the sample.

Raw data is a term for data collected at the source which have not been subjected to processing or any other manipulation. It is also known as primary data.

An array is a systematic arrangement of objects, usually in rows and columns.

The range of a sample (or a data set) is a measure of the spread or dispersion of the observations. It is the difference between the largest and the smallest observed value of some quantitative characteristic and is very easy to calculate.

The class mark is the average of the class limits for a given class. A class mark is also called a mid-value or central value and is usually denoted by x.

When data are continuous, class boundaries are used. We get the class boundaries by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit.
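The three quantities above are simple enough to sketch as Python helpers. The 0.5 adjustment assumes class limits stated in whole units, as in the text; the function names and the sample values are hypothetical.

```python
def data_range(values):
    """Range: largest observed value minus smallest observed value."""
    return max(values) - min(values)

def class_mark(lower, upper):
    """Class mark (mid-value): average of the two class limits."""
    return (lower + upper) / 2

def class_boundaries(lower, upper):
    """Class boundaries: lower limit minus 0.5, upper limit plus 0.5."""
    return lower - 0.5, upper + 0.5

# Illustration with a class whose limits are 15 and 19.
print(data_range([15, 22, 34, 19]))   # hypothetical observations
print(class_mark(15, 19))
print(class_boundaries(15, 19))
```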

Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same purpose as histograms, but are especially helpful in comparing sets of data. Frequency polygons are also a good choice for displaying cumulative frequency distributions. To create a frequency polygon, start just as for histograms, by choosing a class interval. Then draw an X-axis representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it with the middle value represented by the class. Draw the Y-axis to indicate the frequency of each class. Place a point in the middle of each class interval at the height corresponding to its frequency. Finally, connect the points. You should include one class interval below the lowest value in your data and one above the highest value.
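The construction steps above can be turned into a list of points to plot. The sketch below assumes classes of equal width and borrows illustrative class limits and frequencies (four age classes from 15-19 to 30-34); it computes only the coordinates and leaves the drawing to any plotting tool.

```python
# Illustrative class limits and frequencies (assumed for this sketch).
lower_limits = [15, 20, 25, 30]
upper_limits = [19, 24, 29, 34]
frequencies = [13, 15, 20, 10]

# Middle of each class interval (the tick marks on the X-axis).
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# One extra class below the lowest and one above the highest, each with frequency 0.
width = lower_limits[1] - lower_limits[0]
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

points = list(zip(xs, ys))   # connect these points in order to draw the polygon
print(points)
```

The two zero-frequency end points are what bring the polygon back down to the X-axis on both sides.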

Cumulative Frequency
The cumulative frequency corresponding to a particular value is the sum of all the frequencies up to and including that value.

Ogive (a cumulative line graph)
Is best used when you want to display the total at any given point. The relative slopes from point to point indicate greater or lesser increases.
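The points of a "less than" ogive pair each class's upper boundary with the cumulative frequency up to that class. A minimal Python sketch, using illustrative upper limits and frequencies (assumed here for four age classes) and the 0.5 boundary adjustment for whole-unit limits:

```python
from itertools import accumulate

# Illustrative upper class limits and class frequencies (assumed for this sketch).
upper_limits = [19, 24, 29, 34]
frequencies = [13, 15, 20, 10]

upper_boundaries = [u + 0.5 for u in upper_limits]   # upper class boundaries
cumulative = list(accumulate(frequencies))           # running totals

ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

Plotting these points and joining them gives the ogive; the steepest segment marks the class with the largest frequency.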

Summation Notation
Summation is the operation of combining a sequence of numbers using addition. The numbers to be summed may be integers, rational numbers, real numbers, or complex numbers. Besides numbers, other types of values can be added as well: vectors, matrices, polynomials, and in general elements of any additive group.

EXAMPLES:

$$\sum_{i=1}^{4} (2 + i^2) = (2 + 1^2) + (2 + 2^2) + (2 + 3^2) + (2 + 4^2) = 3 + 6 + 11 + 18 = 38$$

$$\sum_{i=1}^{6} 2 = 2 + 2 + 2 + 2 + 2 + 2 = 12$$

$$\sum_{i=1}^{5} (5 - i) = (5 - 1) + (5 - 2) + (5 - 3) + (5 - 4) + (5 - 5) = 4 + 3 + 2 + 1 + 0 = 10$$
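The three sums can be checked with Python's built-in `sum`, where `range(1, m + 1)` plays the role of the index i running from 1 to m. The variable names are illustrative only.

```python
# Sum of (2 + i^2) for i = 1..4
total_a = sum(2 + i ** 2 for i in range(1, 5))

# Sum of the constant 2 for i = 1..6 (the constant is added six times)
total_b = sum(2 for i in range(1, 7))

# Sum of (5 - i) for i = 1..5
total_c = sum(5 - i for i in range(1, 6))

print(total_a, total_b, total_c)  # prints: 38 12 10
```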
