Statistics is the scientific method of collecting, organizing, summarizing, presenting, analyzing, and interpreting data, so that valid conclusions can be drawn and reasonable decisions made on the basis of such analysis. Statistics (in the plural form) refers to any set of quantitative data or classified numerical records.

Historical Overview of Statistics
The term statistics came from the Latin word statista, which means state. In the early days, statistics was widely used for the purposes of governing the state, such as figures on the geographical areas conquered and the number of soldiers killed on the battlefield, and for purposes of taxation. Statistics developed as a science due to man's propensity for gambling. Gamblers consulted mathematicians to explain the laws of chance governing the occurrence of events in games of chance, which led to the early development of probability, on which statistics rests.

Individuals who shaped statistics today:
- Achenwall: The first to introduce the word statistik, in a preface to a statistical work.
- Zimmerman and Sinclair: Introduced and popularized the name statistics in their books.
- Gerolamo Cardano: An Italian mathematician who wrote Liber de Ludo Aleae, where the first study of the principles of probability appeared.
- Blaise Pascal: Worked on the Game of Points, which marked the beginning of the mathematics of probability.
- De Moivre: Discovered the equation for the normal distribution.
- Adolf Quetelet: A Belgian astronomer who applied the theory of probability to anthropology, psychology, and education.
- Francis Galton: Developed the use of percentiles and worked with Charles Darwin in the application of statistics to heredity and correlation theory.
- Karl Pearson: Worked with Galton to develop regression and correlation theory and sampling theory.
- Ronald Fisher: Introduced the F test (Fisher's test) used in the analysis of variance.
Two Phases of Statistics:
1. Descriptive Statistics seeks only to describe and analyze a given group without drawing any conclusion or inference about a larger group.
2. Inferential Statistics seeks to draw conclusions or inferences about a larger group based on a sample, which is a subset of that larger group.
Definition of Terms:
It is important to know some terms that will be used in the study of statistics.

1. Data and Information
Data is a set of observations, values, or elements under investigation.
Information is data that has been collected and processed into meaningful form.
1.1 Qualitative Data and Quantitative Data
Qualitative Data refers to categorical information or attributes that can be classified by some criterion.
Quantitative Data refers to numerical information.
1.1.1 Discrete Data and Continuous Data
Discrete Data are obtained by observing values of a discrete variable.
Continuous Data are obtained by observing values of a continuous variable.

2. Constant and Variable
Constant is an attribute that remains the same or does not vary.
Variable is a characteristic that varies from one person or thing to another.
2.1 Qualitative Variable and Quantitative Variable
Qualitative Variable is a non-numerically valued variable.
Quantitative Variable is a numerically valued variable.
2.1.1 Discrete Variable and Continuous Variable
Discrete Variable is a quantitative variable whose possible values form a finite (or countably infinite) set of numbers.
Continuous Variable is a quantitative variable whose possible values form some interval of numbers.

3. Population and Sample
Population is the collection of all individuals, objects, items, places, events, or data under consideration in a statistical study.
Sample is the portion or representative part of the population chosen for study.

4. Measurement
It is the process of assigning values or scores to persons or objects.
4.1 Nominal scale assigns numbers or other symbols to persons or objects, mainly for identification and classification purposes. e.g. gender, religion, socioeconomic status.
4.2 Ordinal scale places measurements into categories, each category indicating a different level of the attribute being measured. Categories can be ordered, but the distance between categories is undetermined. e.g. academic ranks, school ranks.
4.3 Interval scale is a scale in which the distance between any two different numbers is of known size, but which does not have a meaningful zero point. A zero point is a point that indicates the absence of what we are measuring. e.g. temperature in degrees Celsius.
4.4 Ratio scale has the properties of an interval scale and also a meaningful zero point. e.g. Kelvin temperature, results of counting and measurements.
SAMPLING AND SAMPLING TECHNIQUES
Sampling refers to the method of selecting a portion of the population under study.
Types of Sampling Techniques
1. Probability sampling allows every unit of the population a chance of being included in the study.
1.1 Simple random sampling is the process of selecting a sample in which each sampling unit has an equal chance of being included. This is the most commonly used method and is basic to all sampling designs. It is the most suitable method for homogeneous groups.
1.2 Systematic sampling with a random start is a method of selecting a sample by taking every kth unit from the ordered population, the first unit being selected at random. k is called the sampling interval.
Procedure:
i. Number the units of the population consecutively from 1 to N.
ii. Determine the sampling interval (k) by the formula k = N/n, where N = population size and n = sample size.
iii. Use a table of random numbers to choose r, a random start between 1 and k. r is the first unit of the sample.
The formula for obtaining the sample size (n) is Slovin's Formula: n = N/(1 + Ne^2), where n = sample size, N = population size, and e = margin of error.
1.3 Stratified sampling is used if the population is made up of groups or items that are heterogeneous with respect to the characteristics under study. The population should be classified or stratified into a number of more or less homogeneous subpopulations, or strata, before sampling is done. Stratified random sampling consists of selecting a simple random sample from each of the subpopulations into which the population has been classified.
1.4 Cluster sampling is a method of selecting a sample of distinct groups, or clusters, of smaller units called elements. The sample clusters may be chosen by simple random sampling or by systematic sampling with a random start.
1.5 Multistage sampling is done in stages. The selection of the sample is accomplished in two or more stages. The population is first divided into a number of first-stage or primary units, from which a sample is drawn. Within the sampled first-stage units, a sample of second-stage or secondary units is drawn.

2. Nonprobability sampling selects the sample in such a way that not all units of the population are given a chance of being selected; some have no chance at all.
2.1 Purposive sampling selects the sample based on preselected characteristics.
2.2 Quota sampling chooses the sample based on a required number or percentage of the population, the selection of which is not based on randomization.
2.3 Convenience sampling selects the sample that can be easily picked and made part of the group, since the population is infinite.
*** Nonrandom samples can be described but cannot be used for making conclusions or inferences.
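The sample-size formula and two of the probability sampling procedures above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the population of 1000 numbered units, the 5% margin of error, the rounding up of n, and the use of the whole-number part of N/n as k are choices made for this sketch, not part of the notes. A pseudorandom generator stands in for the table of random numbers.

```python
import math
import random

def slovin_sample_size(N, e):
    """Slovin's formula n = N / (1 + N * e**2); rounding up is an assumption."""
    return math.ceil(N / (1 + N * e ** 2))

def systematic_sample(units, n, rng):
    """Systematic sampling with a random start, using k = N // n as the interval."""
    N = len(units)
    k = N // n                  # sampling interval (whole-number part of N/n)
    r = rng.randrange(k)        # random start: 0-based position among the first k units
    return [units[r + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Simple random sample of the same fraction from every stratum."""
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

rng = random.Random(2013)                  # fixed seed only so that reruns match
population = list(range(1, 1001))          # hypothetical units numbered 1 to N = 1000
n = slovin_sample_size(len(population), 0.05)
sample = systematic_sample(population, n, rng)
print(n, len(sample))

# Hypothetical strata: two groups of different sizes, sampled at 10% each.
strata = {"group A": list(range(100)), "group B": list(range(100, 150))}
picked = stratified_sample(strata, 0.10, rng)
print({name: len(members) for name, members in picked.items()})
```

With N = 1000 and e = 0.05, Slovin's formula gives 1000/3.5, which rounds up to 286 units, and the interval k works out to 3.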
COLLECTION OF DATA
Facts and data gathered from published or unpublished sources should be accurate, timely, complete, and relevant to the problem.

Sources of Data
1. Primary data are obtained from published or unpublished materials by the researchers themselves. These are gathered from an original source or are based on first-hand experience. e.g. diaries, autobiographies, and first-person accounts
2. Secondary data are obtained from existing documents or from published or unpublished reports by people or organizations other than the original collectors. e.g. newspapers, magazines, biographies, published books
Methods of Data Collection
1. Survey Method
Data is obtained by asking people either directly (interview) or indirectly (questionnaire) through the use of a scheduled set of questions.
1.1 Interview is a person-to-person exchange of data between the one supplying data (interviewee) and the one soliciting the data (interviewer). It is most appropriate for revealing data on complex, emotionally laden topics or the sentiments underlying an expressed opinion. e.g. focused interview, clinical interview, nondirective interview
1.1.1 Facilitates the clarification of some questions and answers.
1.1.2 Allows the observation of the interviewee's reactions and facial expressions in response to some of the questions.
1.1.3 Interviewer may deliberately or unintentionally influence the interviewee's response.
1.2 Questionnaire elicits responses by way of a set of questions that are usually mailed (snail mail or electronic mail).
1.2.1 Confidential data are usually collected by questionnaire.
1.2.2 Respondent can accomplish the questionnaire at his most convenient time.
1.2.3 Covers a wide geographical area.

Types of Questions:
1. Fixed-alternative questions limit the response to a stated alternative. They are very easy to tabulate. e.g. Do you want to study abroad? O Yes O No
2. Open-ended questions permit a free response by merely raising the issue without providing any instruction for the respondent's reply. e.g. How do you describe your school?

Characteristics of a Good Questionnaire:
1. Questions must be simple and clear in order to obtain accurate information. Good questions result in a greater degree of precision. A question like "How much do you drink?" is not clear to respondents because it may have several meanings.
2. Questions must be objective. A question like "Why do you like to study in UST?" must be rephrased so that it does not put the answer into the subject's response.
3. Questions must always state the precise units in order to facilitate the presentation of data.
4. Questions must, as much as possible, be fixed alternative.
5. Questions must be organized in a logical manner.
6. Questions must include the essential information only.
2. Observation Method
Data pertaining to the behavior of an individual or a group of individuals during the occurrence of a particular event or situation are best obtained through observation. This method is limited to the time of occurrence of the event.
Types of Observations
2.1 Participant observation: the observer joins the group as a participating member, actively or passively.
2.2 Nonparticipant observation: the observer stays outside the group, whether his presence is known or unknown.

3. Experimental Method
A method designed for collecting data under controlled conditions, usually to establish a causal relationship.

4. Use of Records Method
The data is obtained through registration, such as of births, deaths, and cars, as required by some laws, ordinances, or policies.
ORGANIZATION OF DATA: FREQUENCY DISTRIBUTION
Frequency distribution is the method of organizing and summarizing statistical data in tabular form.
Classes are the categories for grouping data.
Class frequency is the number of observations falling under a class.
Class limits are the end numbers of the classes. e.g. for the class 110-115, 110 is the lower class limit (lcl) and 115 is the upper class limit (ucl).
Class boundaries are the true or real class limits. e.g. the class 110-115 has boundaries 109.5-115.5.
Note: for a discrete variable, subtract 0.5 from the lower class limit and add 0.5 to the upper class limit. For a continuous variable, the adjustment depends on the number of decimal places.
Class size is the size of the class interval. It is obtained by getting the difference between two successive lower class limits, two successive upper class limits, or the corresponding class boundaries.
Class mark is the midpoint of the class interval. CM = (ucl + lcl) / 2

Steps in Constructing a Frequency Distribution
1. Determine the range (R), which is the difference between the highest and lowest values.
2. Determine an adequate number of class intervals.
a. The number of classes should not be smaller than 6 and not greater than 16 (6 ≤ K ≤ 16): not so many that many classes are empty, and not so few that observations are lumped together and too much information is lost.
b. Each observation should fall into one and only one class interval.
Sturges' approximation is only a guideline, not an inflexible rule: K = 1 + 3.322 log n, where K = number of classes (approximate), n = number of observations, and log is the base-10 logarithm.
3. Determine the size of the class interval (sci): sci = R/K. Round off sci to the nearest odd number, depending on the number of observations.
4. Determine a number less than or equal to the lowest score that is divisible by the size of the class interval. This number is the lower limit of the first class.
5. List all class limits and class boundaries.
6. Tally the frequency for each class.
7. Get the sum of the frequency column and check it against the total number of observations.
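The steps for constructing a frequency distribution can be followed in a short Python sketch. It assumes whole-number scores, uses the usual Sturges constant 3.322 with a base-10 logarithm, and rounds the class size up and then to the next odd number; that rounding rule, the function name, and the 30 scores are assumptions made for illustration only.

```python
import math

def frequency_distribution(scores):
    """Build class limits, boundaries, class marks and frequencies for whole-number scores."""
    n = len(scores)
    data_range = max(scores) - min(scores)        # step 1: range
    k = round(1 + 3.322 * math.log10(n))          # step 2: Sturges' approximation
    size = max(1, math.ceil(data_range / k))      # step 3: size of the class interval
    if size % 2 == 0:
        size += 1                                 # assumption: move up to the next odd size
    lower = (min(scores) // size) * size          # step 4: multiple of size <= lowest score
    table = []
    while lower <= max(scores):                   # step 5: list the classes
        upper = lower + size - 1
        freq = sum(1 for x in scores if lower <= x <= upper)   # step 6: tally
        table.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,
            "freq": freq,
        })
        lower += size
    # step 7: the frequencies must add up to the number of observations
    assert sum(row["freq"] for row in table) == n
    return table

# Hypothetical exam scores (30 observations), for illustration only.
scores = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
          71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
          82, 83, 85, 86, 88, 89, 91, 93, 95, 98]
table = frequency_distribution(scores)
for row in table:
    print(row["limits"], row["boundaries"], row["mark"], row["freq"])
```

Because the class size is rounded, the number of classes actually produced can differ slightly from the Sturges value, which is consistent with treating it as a guideline.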
PRESENTATION OF DATA
Data must be presented in the most understandable form that shows its significant characteristics.

Three Ways of Data Presentation
1. Textual Form is summarizing the data in paragraph form. It is the simplest and most appropriate approach when there are only a few numbers to be presented. When a large amount of quantitative data is included in the text or paragraph, the presentation becomes almost incomprehensible.
2. Tabular Form is arranging and presenting data in rows and columns so that the reader may easily compare and analyze. This method facilitates the comparison of various figures under the different categories.
3. Graphical Form is presenting values or relationships in pictorial form. Charts or graphs are extremely useful and effective in quickly presenting a large amount of information. This form is often more effective and attractive than the other methods of data presentation.
3.1 Bar Graph. This consists of bars or heavy lines of equal width, either all vertical or all horizontal. The length of each bar represents the magnitude of the quantity being compared.
3.2 Line Graph. This graph shows the relationship between two or more sets of quantities and is usually used to highlight the effect of time in a given set of data.
3.3 Pie Chart. This is used to represent quantities that make up a whole. The diagram is a circle cut into sections, with the size of each section proportional to the component it represents.
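Since each section of a pie chart is proportional to its component, the share and the central angle of every section follow directly from the counts. A minimal Python sketch, assuming made-up enrollment counts and a hypothetical helper name:

```python
def pie_sectors(counts):
    """Return (percentage, central angle in degrees) for each category."""
    total = sum(counts.values())
    return {label: (100 * c / total, 360 * c / total)
            for label, c in counts.items()}

# Hypothetical counts, for illustration only.
enrollment = {"Science": 20, "Arts": 10, "Commerce": 10}
sectors = pie_sectors(enrollment)
for label, (percent, angle) in sectors.items():
    print(label, percent, angle)
```

The percentages add up to 100 and the angles to 360 degrees, which is a quick check that the sections make up the whole.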
Definition of Terms:
- Experiment is the process by which an observation or measurement is obtained.
- Event is the outcome of an experiment. It is a collection of one or more simple events.
- Sample space is the set of all possible outcomes of an experiment.
- Probability is the tool that allows the statistician to use sample information to make inferences about, or describe, the population from which the sample was drawn. The probability of an event is the numerical measure of the likelihood or degree of predictability that the event will occur.
The Empirical Probability of an Event A is defined as
P(A) = n_A / n
where n_A = number of times A occurred and n = number of trials run.
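The empirical definition can be tried out by simulation. The sketch below assumes a fair six-sided die simulated with Python's pseudorandom generator, and the function name is hypothetical. It counts how often the event "the face is even" occurs and divides by the number of trials.

```python
import random

def empirical_probability(outcomes, event):
    """P(A) = n_A / n, where n_A counts the trials in which the event occurred."""
    n = len(outcomes)
    n_A = sum(1 for outcome in outcomes if event(outcome))
    return n_A / n

rng = random.Random(1)                               # fixed seed only for repeatability
rolls = [rng.randint(1, 6) for _ in range(6000)]     # 6000 simulated die rolls
p_even = empirical_probability(rolls, lambda face: face % 2 == 0)
print(p_even)    # expected to be near 0.5, but not exactly 0.5
```

The result varies from run to run when the seed is changed; it tends to settle closer to 0.5 as the number of trials grows.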
AXIOMS OF CLASSICAL APPROACH TO PROBABILITY
1. For every event A, 0 ≤ P(A) ≤ 1; that is, the probability of any event is a real number between 0 and 1, inclusive.
2. P(S) = 1 and P(Ø) = 0
3. If A1, A2, A3, ..., An are mutually exclusive events (mutually disjoint sets), then:
P(A1 U A2 U ... U An) = P(A1) + P(A2) + P(A3) + ... + P(An)
that is, the probability of the union is the sum of P(Ai) for i = 1 to n.
This method is easy to employ when the outcomes in the sample space S are equally likely, or equiprobable.
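For equally likely outcomes, the probability of an event is the number of outcomes in the event divided by the number of outcomes in S, and the axioms can be checked directly. A minimal sketch, assuming a fair die as the sample space and events chosen only for illustration:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space of a fair die (assumed example)

def P(event):
    """Classical probability: outcomes in the event over outcomes in S."""
    return Fraction(len(event & S), len(S))

# Three mutually exclusive events (no outcome in common).
A1, A2, A3 = {1}, {2, 3}, {6}
union = A1 | A2 | A3

print(P(S), P(set()))                      # axiom 2
print(P(union), P(A1) + P(A2) + P(A3))     # axiom 3: both sides agree
```

Exact fractions are used instead of decimals so that the two sides of the addition rule compare equal without rounding error.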
Example: A researcher studied the relationship between the salary of a working woman with school-aged children and the number of children she had.

                         High Salary    Medium Salary    Low Salary
2 or fewer children          .13             .20             .30
More than 2 children         .02             .10             .25
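The six values in the table form a joint probability distribution, so the probability of any event is the sum of the cells that belong to it. The Python sketch below stores the table and evaluates two events that were picked only for illustration (a high salary, and more than 2 children); the helper names are hypothetical.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (children, salary).
joint = {
    ("2 or fewer", "high"): Fraction("0.13"),
    ("2 or fewer", "medium"): Fraction("0.20"),
    ("2 or fewer", "low"): Fraction("0.30"),
    ("more than 2", "high"): Fraction("0.02"),
    ("more than 2", "medium"): Fraction("0.10"),
    ("more than 2", "low"): Fraction("0.25"),
}

def prob(event):
    """Sum the cells for which event(children, salary) is true."""
    return sum((p for (children, salary), p in joint.items()
                if event(children, salary)), Fraction(0))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda c, s: event(c, s) and given(c, s)) / prob(given)

def high(children, salary):
    return salary == "high"

def many(children, salary):
    return children == "more than 2"

print(float(sum(joint.values())))      # the cells add up to 1
print(float(prob(high)))               # P(high salary)
print(float(prob(many)))               # P(more than 2 children)
print(float(cond_prob(high, many)))    # P(high salary | more than 2 children)
```

Any other event on this table can be evaluated the same way, by writing it as a condition on the number of children and the salary level.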
Let A denote the event that a working woman has 2 or fewer children. Let B denote the event that a working woman has a low salary.
1. What is P(A)? = ______________________________
2. What is P(A U B)? = ____________________________________
3. What is P(A ∩ B)? = _________________________________
4. Find P(B | A) = _________________________________

Definition: A permutation is an arrangement of objects in a definite order.
nPr = n! / (n - r)!
A combination is a selection of objects without regard to order.
nCr = n! / (r! (n - r)!)
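Both counts can be computed from factorials, using nPr = n! / (n - r)! and nCr = n! / (r! (n - r)!). The sketch below uses hypothetical function names and the small case n = 5, r = 2; Python's built-in math.perm and math.comb (available from Python 3.8) serve as a cross-check.

```python
import math

def n_p_r(n, r):
    """Permutations: ordered arrangements of r objects chosen from n."""
    return math.factorial(n) // math.factorial(n - r)

def n_c_r(n, r):
    """Combinations: selections of r objects from n without regard to order."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_p_r(5, 2))    # 20 ordered arrangements
print(n_c_r(5, 2))    # 10 unordered selections

# Cross-check against the standard library.
assert n_p_r(5, 2) == math.perm(5, 2)
assert n_c_r(5, 2) == math.comb(5, 2)
```

Each selection of 2 objects can be arranged in 2! = 2 orders, which is why the number of permutations here is twice the number of combinations.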
____________________
Prepared By: Doxa Dave Rotap, B.S. Microbiology 2013