INTRODUCTION TO
STATISTICS
CHAPTER 1
COURSE INSTRUCTOR: MUBASHER AKRAM
Father of Statistics:
Sir Ronald Fisher was a British statistician.
Classical Definition:
Statistics is the process of identifying, organizing, analyzing,
interpreting, and presenting data.
Basic Statistical Concepts:
Data
Population
Sample
Parameter
Statistic
Data
Collections of observations
(such as measurements,
genders, survey responses)
Statistics
It is the study of the
• collection,
• organization,
• analysis,
• interpretation and
• presentation of data
Population, Sample and Census
Population
The collection of all individuals or items
under consideration in a statistical study.
Sample
That part of the population from which information is
obtained.
Census
Collection of data from every member of a population.
Figure 1.1
Relationship between population and sample
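Parameter
A numerical measurement describing some characteristic of a
population.
Statistic
A numerical measurement describing some characteristic of a
sample.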
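The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 student heights is simulated, and the names population, sample, parameter and statistic are chosen here for the example.

```python
import random
from statistics import mean

# Hypothetical population: simulated heights (cm) of all 1,000 students.
random.seed(42)  # fixed seed so the sketch is reproducible
population = [random.gauss(165, 10) for _ in range(1000)]

# A census uses every member of the population, so the mean it
# gives is the population parameter itself.
parameter = mean(population)

# A sample is a part of the population; its mean is a statistic
# that estimates the parameter.
sample = random.sample(population, 50)
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The two printed values will usually be close but not equal, which is why a statistic is treated as an estimate of the parameter.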
Branches of Statistics
Importance of Statistics
Branches of Statistics
There are two types of applied statistics:
Descriptive Statistics
Inferential Statistics
1. Descriptive statistics deals with the concepts and methods of
summarizing and describing data.
Example: A cricket player wants to find his average score for the
last 20 games.
2. Inferential statistics is used to make inferences
(suggestions, conclusions) about characteristics of a population
based on data from a sample.
Example: A cricket player wants to estimate his chance of scoring
based on his current season average.
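The two cricket examples can be put side by side in a short Python sketch. The 20 scores are made up for illustration, and the 95% interval uses a simple normal approximation (z = 1.96), which is an assumption of this sketch rather than a method stated in the course.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical runs scored in the last 20 games (not real data).
scores = [34, 12, 56, 0, 78, 45, 23, 67, 8, 41,
          52, 19, 30, 64, 5, 47, 38, 71, 26, 15]

# Descriptive statistics: summarize the data already in hand.
average = mean(scores)

# Inferential statistics: use the sample to say something about the
# player's long-run average. Assumes the scores behave like a random
# sample and uses a normal approximation (z = 1.96).
std_error = stdev(scores) / sqrt(len(scores))
low = average - 1.96 * std_error
high = average + 1.96 * std_error

print(f"Average over {len(scores)} games: {average:.2f}")
print(f"Rough 95% interval for the long-run average: {low:.1f} to {high:.1f}")
```

The first print is a description of what happened; the second is an inference about what is not directly observed.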
Uses of Statistical Information:
To inform the general public.
To explain things that have happened.
To justify a claim.
To provide general comparison.
To predict future outcomes.
To estimate the unknown quantities.
To establish association/relationship between factors.
Importance of Statistics:
Large numbers are usually estimated, not counted.
The aim of statistics is not the mere collection of data but
comparison.
Research is impossible without statistical tools.
Without statistical support, no one can dare to take risks.
Applications of Statistics
• Statistics plays a vital role in every field of human activity. Statistics helps in
determining the existing position of per capita income, unemployment, population
growth rates, housing, schooling, medical facilities, etc., in a country.
• Statistics now holds a central position in almost every field, including industry,
commerce, trade, physics, chemistry, economics, mathematics, biology, botany,
psychology, astronomy, etc., so the application of statistics is very wide. Some
important fields in which statistics is commonly applied are discussed below.
(1) Business:
Statistics plays an important role in business. A successful
businessman must be very quick and accurate in decision making.
He knows what his customers want; he should therefore know what
to produce and sell and in what quantities.
Statistics helps businessmen to plan production according to the
taste of the customers, and the quality of the products can also be
checked more efficiently by using statistical methods. Thus, it can
be seen that all business activities are based on statistical
information. Businessmen can make correct decisions about the
location of business, marketing of the products, financial resources,
etc.
(2) Economics:
Economics largely depends upon statistics. National income
accounts are multipurpose indicators for economists and
administrators, and statistical methods are used to prepare these
accounts. In economic research, statistical methods are used to
collect and analyze data and to test hypotheses. The relationship
between supply and demand is studied by statistical methods;
imports and exports, inflation rates, and per capita income are
problems which require a good knowledge of statistics.
(3) Mathematics:
Statistics plays a central role in almost all natural and social
sciences. The methods used in the natural sciences are the most
reliable, but conclusions drawn from them are only probable because
they are based on incomplete evidence.
Statistics helps in describing these measurements more precisely.
Statistics is a branch of applied mathematics. A large number of
statistical methods, such as probability, averages, dispersion, and
estimation, are used in mathematics, and different techniques of pure
mathematics, such as integration, differentiation, and algebra, are
used in statistics.
(4) Banking:
Statistics plays an important role in banking. Banks make use of
statistics for a number of purposes. They work on the principle that
not everyone who deposits money with a bank withdraws it at the
same time. The bank earns profits out of these deposits by lending
them to others at interest. Bankers use statistical approaches based
on probability to estimate the number of deposits and the claims
against them on a given day.
(5) State Management (Administration):
Statistics is essential to a country. Different governmental policies are
based on statistics. Statistical data are now widely used in making all
administrative decisions. Suppose the government wants to revise the
pay scales of employees in view of an increase in the cost of living;
statistical methods will be used to determine the rise in the cost of
living. The preparation of federal and provincial government budgets
mainly depends upon statistics because it helps in estimating the
expected expenditures and revenue from different sources. So statistics
are the eyes of the administration of the state.
(6) Accounting and Auditing:
Accounting is impossible without exactness. For decision-making
purposes, however, so much precision is not essential; the decision
may be made on the basis of approximation, known as statistics. The
correction of the values of current assets is made on the basis of the
purchasing power of money or its current value.
In auditing, sampling techniques are commonly used. An auditor
determines the sample size to be audited on the basis of the
acceptable error.
(7) Natural and Social Sciences:
Statistics plays a vital role in almost all the natural and social
sciences. Statistical methods are commonly used for analyzing
experimental results and testing their significance in biology,
physics, chemistry, mathematics, meteorology, research, chambers
of commerce, sociology, business, public administration,
communications and information technology, etc.
(8) Astronomy:
Astronomy is one of the oldest branches of statistical study; it deals
with the measurement of the distances, sizes, masses, and densities of
heavenly bodies by means of observations. During these
measurements errors are unavoidable, so the most probable
measurements are found by using statistical methods.
Applied Statistics
Methods of Data Collection
DATA
Data collection is a term used to describe the process of preparing and
collecting data.
It is the systematic gathering of data for a particular purpose from
various sources; the data are systematically observed, recorded, and
organized.
Data are the basic inputs to any decision-making process in business.
PURPOSE OF DATA COLLECTION
The purpose of data collection is
to obtain information
to keep on record
to make decisions about important issues
to pass information on to others
Basic Data Types
Data sets can consist of two types of data:
qualitative data and quantitative data.
Quantitative (or numerical or measurement) data:
Data which can be measured or identified on a
numerical scale.
Categorical (or qualitative or attribute) data:
Data which cannot be measured on a numerical scale.
Quantitative Data
Categorical Data
Working with Quantitative Data
• Quantitative data can further be described by
distinguishing between discrete and continuous
types.
Discrete Data
Discrete data result when the number of possible
values is either a finite number or a ‘countable’
number (i.e., the number of possible values is
0, 1, 2, 3, ...).
Example: The number of eggs that a hen lays, test
score, shoe size, age, world ranking, number of
brothers, etc.
The number of eggs that a hen lays is a discrete
quantitative measure because it is numeric but can
only be a whole number.
Continuous Data
Continuous (numerical) data
• result from infinitely many possible values that
correspond to some continuous scale that covers a
range of values without gaps, interruptions, or jumps.
Example: Height, weight, length, amount of milk from cows, etc.
Height is a continuous quantitative measure because it can take
any numerical value in a particular range.
The amount of milk that a cow produces, e.g. 2.343115 gallons
per day, is also continuous.
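A small Python sketch can make the discrete/continuous distinction concrete. The observations below are hypothetical, and the helper looks_discrete is a name invented for this example; it is only a rough check, since continuous measurements that happen to be recorded as whole numbers would also pass it.

```python
# Hypothetical observations, made up for illustration.
eggs_per_week = [5, 6, 4, 7, 5]              # counts: discrete
milk_gallons = [2.343115, 2.5, 1.98, 2.71]   # measurements: continuous

def looks_discrete(values):
    """Return True if every value is a whole number (rough heuristic)."""
    return all(float(v).is_integer() for v in values)

print("eggs_per_week looks discrete:", looks_discrete(eggs_per_week))
print("milk_gallons looks discrete: ", looks_discrete(milk_gallons))
```

In practice the type of a variable is decided by what is being measured (a count versus a measurement on a scale), not only by how the recorded numbers look.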
CLASSIFICATION OF DATA TYPES
There are many ways of classifying data.
A common classification is based upon who collected the data
PRIMARY DATA
SECONDARY DATA
Primary and Secondary Data
• The primary data are those which are collected afresh and for the first
time, and thus happen to be original in character
• The secondary data, on the other hand, are those which have already
been collected by someone else and which have already been passed
through the statistical process
PRIMARY DATA
Data which are collected from the field under the
control and supervision of an investigator.
Primary data means original data that has been
collected specifically for the purpose in mind.
Data of this type are generally fresh and collected
for the first time.
They are useful for current studies as well as for future
studies.
For example: your own questionnaire.
Primary Research Methods & Techniques
• Primary Research: Quantitative Data
  Surveys
    Personal interview (intercepts)
    Mail
    In-house, self-administered
    Telephone, fax, e-mail, Web
  Experiments
• Primary Research: Qualitative Data
  Focus groups
  Individual depth interviews
  Human observation
  Case studies
Quantitative and Qualitative Information
Quantitative: based on numbers, e.g. "56% of 18 year olds drink alcohol
at least four times a week". It doesn’t tell you why, when, or how.
Qualitative: more detail. It tells you why, when, and how.
METHODS
OBSERVATION METHOD
Through personal observation
PERSONAL INTERVIEW
Through Questionnaire
TELEPHONE INTERVIEW
Through Call outcomes, Call timings
MAIL SURVEY
Through Questionnaire
Through schedules
OTHER METHODS
Other methods include:
warranty cards
distributor audits
pantry audits
consumer panels
using mechanical devices
through projective techniques
depth interviews
content analysis
Observation Method
The most commonly used method, especially in studies relating to the behavioral sciences
Information is sought by way of the investigator’s own direct observation, without
asking the respondent
Subjective bias is eliminated if observation is done accurately
Information obtained under this method relates to what is currently happening
Independent of respondents’ willingness to respond
Provides very limited information
Structured observation & unstructured observation
Participant observation & non-participant observation
Controlled & uncontrolled observation
Personal interview
Structured interviews & unstructured interviews
Focused interview
Clinical interview
Non-directive interview
Merits & demerits of interview methods
Pre-requisites and basic tenets of interviewing
Telephone interviews
• This method of collecting information consists of contacting
respondents by telephone.
• It is not a very widely used method, but it plays an important part in
industrial surveys, particularly in developed regions.
THROUGH QUESTIONNAIRES
A questionnaire is sent (usually by post) to the persons concerned with
a request to answer the questions and return the questionnaire.
The respondents have to answer the questions on their own.
Pilot survey
Main aspects of a questionnaire: general form, question sequence,
and question formulation and wording
COLLECTION OF DATA THROUGH SCHEDULES
Schedules (proformas containing a set of questions) are filled in
by enumerators who are specially appointed for the purpose.
Enumerators explain the aims and objectives of the investigation and
also remove the difficulties which any respondent may feel in
understanding the implications of a particular question or the
definition or concept of difficult terms.
Enumerators should be trained.
SECONDARY DATA
• Data gathered and recorded by someone else prior to and for a
purpose other than the current project.
• Secondary data is data that has been collected for another purpose.
• It involves less cost, time, and effort.
• Secondary data is data that is being reused, usually in a different
context.
• For example: data from a book.
SOURCES
INTERNAL SOURCES
Internal sources of secondary data are usually for marketing
applications
Various publications of the central, state, or local governments
Sales Records
Marketing Activity
Cost Information
Distributor reports and feedback
Customer feedback
SOURCES
EXTERNAL SOURCES
External sources of secondary data are usually for financial applications
Various publications of foreign governments or of international bodies and
their subsidiary organizations
Journals
Books
Magazines
Newspaper
Libraries
The Internet
Classification of Data using levels of measurement
1. Nominal level of measurement
2. Ordinal level of measurement
3. Interval level of measurement
4. Ratio level of measurement
Nominal Level
Nominal level of measurement is characterized by data
that consist of names, labels, or categories only, and
the data cannot be arranged in an ordering scheme
(such as low to high)
Examples:
Survey responses: yes, no, undecided
Political party: the political party affiliation of survey
respondents (Democrat, Republican, Independent,
other)
Ordinal Level
Ordinal level of measurement
involves data that can be arranged in some order, but
differences (obtained by subtraction) between data
values either cannot be determined or are meaningless
Example:
Course grades: A, B, C, D, or F
University rankings in the USA (1st, 2nd, 3rd, 4th, ...)
Interval Level
Interval level of measurement is like the ordinal level, with the
additional property that the difference between any two data values
is meaningful. However, data at this level do not have a natural zero
starting point (where none of the quantity is present).
Example:
Body temperatures of 96.2°F and 98.6°F (There is no natural
starting point. The value of 0°F might seem like a starting point, but
it is arbitrary and does not represent the total absence of heat.)
Years: 1000, 2000, 1776, and 1492. (Time did not begin in the year
0, so the year 0 is arbitrary instead of being a natural zero starting
point representing “no time.”)
Ratio Level
Ratio level of measurement is the interval level with the additional
property that there is also a natural zero starting point (where zero
indicates that none of the quantity is present); for values at this level,
differences and ratios are meaningful.
Example:
Prices: Prices of college textbooks ($0 represents no cost; a $100 book
costs twice as much as a $50 book.)
Distances: Distances (in miles) travelled by cars (0 miles represents no
distance travelled, and 60 miles is twice as far as 30 miles.)
Summary - Levels of Measurement
Nominal - categories only
Ordinal - categories with some order
Interval - differences but no natural
starting point
Ratio - differences and a natural starting
point
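The four levels can be illustrated with a short Python sketch that shows which operations make sense at each level. The small data lists and the grade_order mapping are assumptions made for this example; the temperatures and distances are the ones used in the examples above.

```python
from collections import Counter

# Nominal: categories only, so counting (and the mode) is meaningful.
responses = ["yes", "no", "undecided", "yes", "yes", "no"]
most_common = Counter(responses).most_common(1)[0][0]

# Ordinal: categories can be ordered, but differences are not meaningful.
grade_order = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}  # ranks, not amounts
grades = ["B", "A", "F", "C", "B"]
sorted_grades = sorted(grades, key=grade_order.get)
median_grade = sorted_grades[len(sorted_grades) // 2]

# Interval: differences are meaningful, but ratios are not, because the
# zero is arbitrary. The ratio changes when the unit changes.
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

ratio_f = 98.6 / 96.2
ratio_c = f_to_c(98.6) / f_to_c(96.2)

# Ratio: a natural zero exists, so ratios survive a change of unit.
KM_PER_MILE = 1.609344
ratio_miles = 60 / 30
ratio_km = (60 * KM_PER_MILE) / (30 * KM_PER_MILE)

print("Most common response:", most_common)
print("Grades in order:", sorted_grades, "median:", median_grade)
print(f"Temperature ratio in F: {ratio_f:.4f}, in C: {ratio_c:.4f}")
print(f"Distance ratio in miles: {ratio_miles:.4f}, in km: {ratio_km:.4f}")
```

The temperature ratio differs between Fahrenheit and Celsius, while the distance ratio is the same in miles and kilometres; this is the practical meaning of "natural zero starting point".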