Preface
• Statistics is the science of collecting, organizing and interpreting
numerical and nonnumerical facts, which we call data.
• The collection and study of data are important in the work of
many professions, so training in the science of statistics is
valuable preparation for a variety of careers, for example
economists and financial advisors, businessmen, engineers, and
farmers.
• Knowledge of probability and statistical methods is also useful
for informatics specialists in fields such as data mining,
knowledge discovery, neural networks, fuzzy systems and so on.
• Whatever else it may be, statistics is, first and foremost, a
collection of tools used for converting raw data into information to
help decision makers in their work.
The science of data - statistics - is the subject of this course.
Audience and objective
• Audience
This tutorial is an introductory course in statistics intended mainly for users
such as engineers, economists, and managers who need to use statistical
methods in their work, and for students. However, many aspects of it will also be useful
for computer trainers.
• Objectives
• Understanding statistical reasoning
• Mastering basic statistical methods for analyzing data such as descriptive
and inferential methods
• Ability to use statistical methods in practice with the help of
statistical software
• Entry requirements
High school algebra course (plus elements of calculus)
Basic computer skills
Contents
Preface
Chapter 1 Introduction
Chapter 2 Data presentation
Chapter 3 Data characteristics: descriptive summary statistics
Chapter 4 Probability: Basic concepts
Chapter 5 Basic Probability distributions
Chapter 6 Sampling Distributions
Chapter 7 Estimation
Chapter 8 General Concepts of Hypothesis Testing
Chapter 9 Applications of Hypothesis Testing
Chapter 10 Categorical Data Analysis and Analysis of variance
Chapter 11 Simple Linear regression and correlation
Chapter 12 Multiple regression
Chapter 13 Nonparametric statistics
References
Appendix A
Appendix B
Appendix C
Appendix D
Index
Chapter 1 Introduction
Chapter 2 Data presentation
2.1 Introduction
• The objective of data description is to summarize the characteristics of a data set. Ultimately, we want
to make the data set more comprehensible and meaningful. In this chapter we will show how to
construct charts and graphs that convey the nature of a data set. The procedure that we will use to
accomplish this objective depends on the type of data.
2.2 Types of data
• Quantitative data are observations measured on a numerical scale.
• Nonnumerical data that can only be classified into categories are said to be qualitative data.
2.3 Qualitative data presentation
• Category frequency = the number of observations that fall in that category.
• Relative frequency = the proportion of the total number of observations that fall in that category
• Percentage for a category = Relative frequency for the category × 100%
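The three definitions above can be sketched in a few lines of Python. The data set (ten blood types) is invented purely for illustration and is not part of the course material.

```python
from collections import Counter

# Hypothetical qualitative data set (made up for illustration).
observations = ["A", "O", "B", "O", "A", "O", "AB", "A", "O", "B"]

# Category frequency: number of observations that fall in each category.
category_frequency = Counter(observations)
n = len(observations)

# Relative frequency: proportion of the total that falls in each category.
relative_frequency = {c: f / n for c, f in category_frequency.items()}

# Percentage: relative frequency times 100.
percentage = {c: 100 * rf for c, rf in relative_frequency.items()}

for c in sorted(category_frequency):
    print(f"{c:>2}: frequency={category_frequency[c]}, "
          f"relative={relative_frequency[c]:.2f}, percent={percentage[c]:.0f}%")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped or counted twice.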
2.4 Graphical description of qualitative data
Bar graphs and pie charts
Chapter 2 (continued 1)
Chapter 3. Data characteristics: descriptive summary statistics
3.1 Introduction
3.2 Types of numerical descriptive measures
3.3 Measures of central tendency
3.4 Measures of data variation
3.5 Measures of relative standing
3.6 Shape
3.7 Methods for detecting outliers
3.8 Calculating some statistics from grouped data
3.9 Computing descriptive summary statistics using statistical
software
3.10 Exercises
Chapter 3 (continued 1)
Chapter 3 (continued 2)
Chapter 4. Probability: Basic concepts
Chapter 4 (continued 1)
4.1 Experiment, Events and Probability of an Event
The process of making an observation or recording a measurement under
a given set of conditions is a trial or experiment.
Outcomes of an experiment are called events.
We denote events by capital letters A, B, C,…
The probability of an event A, denoted by P(A), is, in general, the chance
that A will happen.
4.2 Approaches to probability
• Definitions of probability as a quantitative measure of the “degree of
certainty” of the observer of the experiment.
• Definitions that reduce the concept of probability to the more primitive
notion of “equal likelihood” (the so-called “classical” definition).
• Definitions that take as their point of departure the “relative frequency” of
occurrence of the event in a large number of trials (the “statistical” definition).
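As a rough illustration of how the classical and statistical definitions relate, the sketch below compares the two for the event "an even number is rolled" on a fair six-sided die. The die, the event, the seed and the number of trials are all assumptions chosen for the example.

```python
import random

# Classical definition: 6 equally likely outcomes, 3 of them favourable.
classical = 3 / 6

# Statistical definition: relative frequency of the event in many trials.
rng = random.Random(0)      # fixed seed so the run is repeatable
n_trials = 10_000
hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
relative_frequency = hits / n_trials

print(f"classical = {classical}, relative frequency = {relative_frequency}")
```

With more trials the relative frequency tends to settle closer to the classical value, which is the intuition behind the statistical definition.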
Chapter 4 (continued 2)
4.3 The field of events
• Definitions and relations between events: A implies B, A and B are
equivalent (A=B), product or intersection of the events A and B (AB),
sum or union of A and B (A+B), difference of A and B (A-B or A\B), certain
(or sure) event, impossible event, complement of A, mutually exclusive
events, simple (or elementary) events, sample space.
• Venn diagrams
• Field of events
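One way to experiment with these relations is to model events as Python sets of outcomes. The sketch below assumes a single roll of a die as the experiment; the events A and B are hypothetical examples.

```python
# Sample space for one roll of a die; each outcome is a simple (elementary) event.
sample_space = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}   # "an even number is rolled"
B = {4, 5, 6}   # "a number greater than 3 is rolled"

product = A & B                    # AB: both A and B occur
union = A | B                      # A+B: at least one of A, B occurs
difference = A - B                 # A\B: A occurs but B does not
complement_A = sample_space - A    # A does not occur
impossible_event = set()           # contains no outcomes

# A and its complement can never occur together: they are mutually exclusive.
mutually_exclusive = A.isdisjoint(complement_A)

# "A implies B" corresponds to A being a subset of B; here it is False.
a_implies_b = A <= B

print(product, union, difference, complement_A, mutually_exclusive, a_implies_b)
```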
Chapter 5 (continued 1)
5.1 Random variables
• A random variable is a variable that assumes numerical values
associated with events of an experiment.
• Classification of random variables: discrete random variables
and continuous random variables
5.2 The probability distribution for a discrete random
variable
• The probability distribution for a discrete random variable x is a
table, graph, or formula that gives the probability of observing
each value of x.
• Properties of the probability distribution
16
•[Back]
Chapter 5 (continued 2)
5.3 Numerical characteristics of a discrete random
variable
5.3.1 Mean or expected value: $\mu = E(X) = \sum x\,p(x)$
5.3.2 Variance and standard deviation: $\sigma^2 = E[(X-\mu)^2]$
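The mean and variance of a discrete random variable can be computed directly from its probability distribution. The sketch below uses a hypothetical distribution (number of heads in two tosses of a fair coin) and first checks the two standard properties of a discrete distribution: every probability lies in [0, 1] and the probabilities sum to 1.

```python
import math

# Hypothetical distribution: x = number of heads in two tosses of a fair coin.
p = {0: 0.25, 1: 0.50, 2: 0.25}

# Properties of a discrete probability distribution.
assert all(0 <= px <= 1 for px in p.values())
assert math.isclose(sum(p.values()), 1.0)

# Mean: sum of x * p(x) over all values x.
mean = sum(x * px for x, px in p.items())

# Variance: expected squared deviation from the mean.
variance = sum((x - mean) ** 2 * px for x, px in p.items())
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```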
5.4 The binomial probability distribution
• Model (or characteristics) of a binomial random variable
• The probability distribution
• Mean and variance for a binomial random variable
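The slide lists the topics without formulas, so the sketch below relies on the standard textbook results: for n independent trials with success probability p, P(X = k) = C(n, k) p^k (1 - p)^(n - k), with mean np and variance np(1 - p). The values n = 10 and p = 0.3 are arbitrary.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3   # hypothetical parameters

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the distribution itself ...
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# ... compared with the closed-form expressions np and np(1 - p).
print(mean, n * p)
print(variance, n * p * (1 - p))
```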
5.5 The Poisson distribution
• Model (or characteristics) of a Poisson random variable
• The probability distribution
• Mean and variance for a Poisson random variable
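For the Poisson case, the standard result is P(X = k) = λ^k e^(-λ) / k!, with both mean and variance equal to λ. The sketch below checks this numerically for an arbitrary λ = 2.5; the sum is truncated at k = 59 on the assumption that the remaining tail is negligible for this λ.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean count is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2.5   # hypothetical mean number of events per interval

# Truncated support: probabilities beyond k = 59 are vanishingly small here.
pmf = [poisson_pmf(k, lam) for k in range(60)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, variance)   # both should be very close to lam
```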
Chapter 5 (continued 3)
5.6 Continuous random variables: distribution function
and density function
• Cumulative distribution function F(x)=P(X<x)
• Probability density function f(x) = F'(x)
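The relation between the density and the distribution function can be checked numerically. The sketch below assumes an exponential distribution with rate 2 as the example (the slide does not name a particular distribution) and compares a central-difference derivative of F with the known density f.

```python
import math

RATE = 2.0   # hypothetical rate parameter of an exponential distribution

def F(x):
    """Cumulative distribution function of the exponential distribution."""
    return 1 - math.exp(-RATE * x) if x > 0 else 0.0

def f(x):
    """Probability density function of the exponential distribution."""
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 0.7
approx = numerical_derivative(F, x0)
print(approx, f(x0))   # the two values should agree closely
```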
Chapter 6 (continued 1)
6.1 Why the method of sampling is important
• Two samples from the same population can provide contradictory
information about the population
• Random sampling eliminates the possibility of bias in selecting a
sample and, in addition, provides a probabilistic basis for evaluating
the reliability of an inference
6.2 Obtaining a Random Sample
• A random sample of n experimental units is one selected in such a
way that every different sample of size n has an equal probability of
selection
• Procedures for generating a random sample
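One possible procedure is to label the experimental units and let a pseudo-random number generator pick n of them without replacement. The sketch below uses Python's `random.sample` for this; the population of 100 labelled units and the seed are assumptions for the example.

```python
import random

# Hypothetical population: 100 experimental units labelled 1..100.
population = list(range(1, 101))
n = 10

rng = random.Random(42)              # fixed seed so the run is repeatable
sample = rng.sample(population, n)   # draws n distinct units without replacement

print(sorted(sample))
```

Because every subset of size n is equally likely to be drawn, this matches the definition of a random sample given above, up to the quality of the pseudo-random generator.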
Chapter 6 (continued 2)
Chapter 7. Estimation
7.1 Introduction
7.2 Estimation of a population mean: Large-sample case
Chapter 8. General Concepts of Hypothesis Testing
8.1 Introduction
The procedures to be discussed are useful in situations where we are interested in
making a decision about a parameter value rather than obtaining an estimate of its
value.
8.2 Formulation of Hypotheses
• A null hypothesis H0 is the hypothesis against which we hope to gather evidence. The
hypothesis for which we wish to gather supporting evidence is called the alternative
hypothesis Ha.
• One-tailed (directional) test and two-tailed test
8.3 Conclusions and Consequences for a Hypothesis Test
• The goal of any hypothesis test is to make a decision based on sample information:
whether to reject H0 in favor of Ha. In making this decision, we can commit one of two types of error.
• A Type I error occurs if we reject H0 when it is true. The probability of committing a
Type I error is denoted by $\alpha$ (also called the significance level).
• A Type II error occurs if we do not reject H0 when it is false. The probability of
committing a Type II error is denoted by $\beta$.
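The meaning of the significance level can be illustrated by simulation: if H0 is true and the test is run many times at level 0.05, roughly 5% of the runs should end in a (wrong) rejection. The sketch below assumes a two-tailed z-test for a mean with known standard deviation; the test type and all numeric values are assumptions for the example, not taken from the course.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
MU0, SIGMA, N = 50.0, 10.0, 25    # hypothetical H0 mean, known sigma, sample size

# Critical value of a two-tailed z-test at level ALPHA.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

rng = random.Random(1)            # fixed seed so the run is repeatable
n_experiments = 2000
rejections = 0
for _ in range(n_experiments):
    # Data are generated with H0 true, so every rejection is a Type I error.
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:
        rejections += 1

type_i_rate = rejections / n_experiments
print(type_i_rate)   # expected to be close to ALPHA
```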
Chapter 8 (continued 1)
8.4 Test statistics and rejection regions
• The test statistic is a sample statistic upon which the decision
concerning the null and alternative hypotheses is based.
• The rejection region is the set of possible values of the test statistic for
which the null hypothesis will be rejected.
• Steps for testing a hypothesis
• Critical value = boundary value of the rejection region
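As a minimal sketch of how a test statistic, critical value and rejection region fit together, the function below carries out a two-tailed z-test for a population mean with known standard deviation. The function name, its parameters and the sample figures are hypothetical; the course may present the steps with a different test.

```python
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (test statistic, critical value, reject H0?) for H0: mu = mu0."""
    # Test statistic: standardized distance of the sample mean from mu0.
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Critical value: boundary of the rejection region |z| > critical.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical

# Hypothetical data: sample mean 53 from n = 100 observations, sigma = 10.
z, critical, reject = two_tailed_z_test(sample_mean=53.0, mu0=50.0, sigma=10.0, n=100)
print(f"z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

For a one-tailed test the rejection region would lie on one side only, and the critical value would use `1 - alpha` instead of `1 - alpha / 2`.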
8.5 Summary
8.6 Exercises
Chapter 9. Applications of Hypothesis Testing
Chapter 9 (continued 1)
Chapter 10. Categorical Data Analysis and Analysis of
Variance
10.1 Introduction
10.2 Tests of goodness of fit
10.3 The analysis of contingency tables
10.4 Contingency tables in statistical software packages
10.5 Introduction to analysis of variance
10.6 Design of experiments
10.7 Completely randomized designs
10.8 Randomized block designs
10.9 Multiple comparisons of means and confidence regions
10.10 Summary
10.11 Exercises
Chapter 10 (continued 1)
10.1 Introduction
10.2 Tests of goodness-of-fit
– Purpose: to test hypotheses about a qualitative variable that
allows more than two categories for a response. Namely, it tests whether
there is a significant difference between an observed frequency
distribution and a theoretical frequency distribution.
– Procedure for a Chi-square goodness-of-fit test
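The goodness-of-fit statistic is the sum of (observed - expected)^2 / expected over all categories. The sketch below applies it to invented counts from 120 rolls of a die, testing the theoretical distribution "all six faces equally likely". The critical value 11.070 is the tabulated 5% point of the chi-square distribution with 5 degrees of freedom.

```python
# Hypothetical observed counts for faces 1..6 in 120 rolls of a die.
observed = [18, 22, 16, 25, 19, 20]
n = sum(observed)

# Theoretical distribution under H0: each face has probability 1/6.
expected = [n / 6] * 6

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_5PCT_DF5 = 11.070   # table value for alpha = 0.05, df = 5
reject_h0 = chi_square > CRITICAL_5PCT_DF5

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```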
10.3 The analysis of contingency tables
– Purpose: to determine whether a dependence exists between two
qualitative variables
– Procedure for a Chi-square test for independence of two
directions of classification
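For a contingency table, the expected count in each cell under independence is (row total × column total) / n, and the same chi-square statistic is used with (rows - 1)(columns - 1) degrees of freedom. The 2×2 table below is invented for illustration; 3.841 is the tabulated 5% critical value for 1 degree of freedom.

```python
# Hypothetical 2x2 contingency table of observed counts.
table = [[30, 20],
         [20, 30]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts if the two classifications are independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)

CRITICAL_5PCT_DF1 = 3.841   # table value for alpha = 0.05, df = 1
reject_independence = chi_square > CRITICAL_5PCT_DF1

print(f"chi-square = {chi_square:.2f}, df = {df}, reject: {reject_independence}")
```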
10.4 Contingency tables in statistical software packages
Chapter 10 (continued 2)
10.5 Introduction to analysis of variance
Purpose: Comparison of more than two means
10.6 Design of experiments
• Concepts of experiment, design of the experiment, response variable, factor,
treatment
• Concepts of Between-sample variation, Within-sample variation
10.7 Completely randomized designs
• This design involves a comparison of the means of k treatments, based on
independent random samples of n1, n2,…, nk observations drawn from
the k populations.
• Assumptions: all k populations are normal and have equal variances
• F-test for comparing k population means
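The F statistic for a completely randomized design is the ratio of between-sample variation to within-sample variation, each divided by its degrees of freedom. The sketch below computes it by hand for three small invented samples; 5.14 is the tabulated 5% critical value of F with 2 and 6 degrees of freedom.

```python
# Hypothetical responses for k = 3 treatments (3 observations each).
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

k = len(samples)
n_total = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n_total
sample_means = [sum(s) / len(s) for s in samples]

# Between-sample variation: spread of the treatment means around the grand mean.
ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, sample_means))

# Within-sample variation: spread of observations around their own treatment mean.
ss_within = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_statistic = ms_between / ms_within

CRITICAL_5PCT_F_2_6 = 5.14   # table value for alpha = 0.05, df = (2, 6)
reject_equal_means = f_statistic > CRITICAL_5PCT_F_2_6

print(f"F = {f_statistic:.2f}, reject equal means: {reject_equal_means}")
```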
10.8 Randomized block designs
• Concept of randomized block design
• Tests to compare k Treatment and b Block Means
10.9 Multiple comparisons of means and confidence regions
Chapter 11. Simple Linear regression and correlation
11.9 Exercises
Chapter 11 (continued 1)
11.1 Introduction: Bivariate relationships
• The objective is to determine the relationship between two variables.
• Types of relationships: direct and inverse
• Scattergram
11.2 Simple Linear regression: Assumptions
• A simple linear regression model: y = A + Bx + e
• Assumptions required for a linear regression model: E(e) = 0, e is
normally distributed, and the variance $\sigma^2$ of e is constant for all values of x.
11.3 Estimating A and B: the method of least squares
• The least squares estimators a and b; formulas for a and b
11.4 Estimating $\sigma^2$
• Formula for $s^2$, an estimator of $\sigma^2$
• Interpretation of s, the estimated standard deviation of e
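The slide names the estimators without giving formulas, so the sketch below uses the standard ones: b = Sxy / Sxx, a = ȳ - b·x̄, and s² = SSE / (n - 2), where SSE is the sum of squared residuals. The five data points are invented for illustration.

```python
# Hypothetical paired observations (x, y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimators of the slope B and the intercept A.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residuals and the estimator s^2 of the error variance (n - 2 degrees of freedom).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)
s_squared = sse / (n - 2)
s = s_squared ** 0.5

print(f"a = {a:.2f}, b = {b:.2f}, s^2 = {s_squared:.2f}, s = {s:.3f}")
```

Under the normality assumption, s gives a sense of scale for the errors: most observed y values are expected to lie within about 2s of the fitted line.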
Chapter 11 (continued 2)
Chapter 12 (continued 1)
12.1 Introduction: the general linear model
y = B0 + B1x1 + ... + Bkxk + e, where y is the dependent variable, x1, x2, ..., xk are the
independent variables, and e is the random error.
12.2 Model assumptions
– For any given set of values x1, x2, ..., xk, the random error e has a normal
probability distribution with mean equal to 0 and variance equal to $\sigma^2$.
– The random errors are independent.
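To make the model and its assumptions concrete, the sketch below generates responses from a general linear model with independent normal errors. The helper name `simulate_response`, the coefficient values and the error standard deviation are all hypothetical.

```python
import random

def simulate_response(x, betas, sigma, rng):
    """Draw y = B0 + B1*x1 + ... + Bk*xk + e with e ~ Normal(0, sigma^2)."""
    b0, *slopes = betas
    deterministic = b0 + sum(b * xi for b, xi in zip(slopes, x))
    e = rng.gauss(0.0, sigma)   # independent normal error with mean 0
    return deterministic + e

rng = random.Random(7)            # fixed seed so the run is repeatable
betas = [1.0, 2.0, 3.0]           # hypothetical B0, B1, B2
x = [1.0, 2.0]                    # one setting of x1, x2

# With sigma = 0 the response equals the deterministic part of the model.
exact = simulate_response(x, betas, 0.0, rng)

# With sigma > 0 repeated responses scatter around that value.
draws = [simulate_response(x, betas, 1.5, rng) for _ in range(5)]
print(exact, draws)
```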
12.4 Estimating $\sigma^2$
12.5 Estimating and testing hypotheses about the B parameters
• Sampling distributions of b0, b1, ..., bk
• A $(1-\alpha)100\%$ confidence interval for Bi (i = 0, 1, ..., k)
• Test of an individual parameter coefficient Bi
Chapter 13 (continued 1)