Introduction to Statistics

What do we need to learn?

Why is statistics important as a subject?

What are the applications of statistics in business?
What are the processes for collecting data? What are the basic concepts of data collection techniques?

What is statistics?

Data=Information??

Information=Data + Statistics

What does statistics give you?

Statistical techniques enable the decision maker to:
• Summarize and describe data more precisely and derive the relevant information from it.
• Capture a population’s characteristics by making inferences from a sample’s characteristics.
• Understand the nature of the relationship between a pair of variables in a process to improve its functioning.
• Make reliable forecasts of certain events of interest.

The word statistics refers to a collection of procedures and principles useful for gathering and analyzing numerical information, called statistical data or simply data, for drawing conclusions and making decisions. Statistics is defined in Webster as:

"The classified facts representing the condition of the people in a state… especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement."

Statistical thinking Process
Specify the aim of the Study

Understand how the Process Works

Assess the current Performance

Identify strategies for Improvements

Test the effectiveness of the Proposed Strategy

Successful?

Implement the Strategy

Marketing
• Market Research
• Consumer Behavior

Production
• Quality Control
• Supply Chain Management

Finance
• Correlation Analysis of key ratios
• Analyze data on Assets and Liabilities

Personnel
• Studies of wage rates, incentive plans etc
• Grievance addressal
Exercise…

Data and Data Sources

Sources of Data

The choice of a data collection method from a particular source depends on:
• The research facilities available
• The extent of accuracy required in the analysis
• The time span of the study
• The amount of money required

Broadly, data is classified into two categories:
• Primary Data
• Secondary Data

Data Collection Methods
Methods

Primary Sources
• Observer: Participant, NonParticipant
• Interviews: Structured, Semistructured, Unstructured
• Questionnaires: Mailed, Collected, Electronically distributed
• Focus Groups/Delphi Method

Secondary Sources
• Documents: Pre Existing; Initiated, elicited

Primary Data Sources

The following are the methods of collecting data through primary sources:
• Direct personal observations
• Direct or indirect oral interviews
• Focus groups
• Delphi method

Examples:
• Diagnosis of a particular disease
• Telephonic interviews to collect the data
• Detailed questionnaire collecting details on behavior, demographics, level of knowledge and opinions
• Group/panel discussion
• Face-to-face meetings

Secondary Data Sources

External Secondary Data Sources

• Government publications
• Non-government publications
• Various syndicate services
• International organizations

Internal Secondary Data Sources
Data generated within an organization. Example: financial data, production, quality control and sales records.

Data Analysis Process
Data Collection
• Data Preparation
• Editing the Data
• Coding the Data
• Categorizing the Data
• Creating the Data File
• Selection of Software

Data Analysis
• Type of Study
• Correlation
• Regression
• Group differences, Ranks etc
• Stability of Data
• Reliability
• Validity
• Hypothesis Testing

Results

Interpretation of Results

Conclusions

Exercise

Data Classification

Classification of Data

Arranging data in groups/classes on the basis of certain properties is referred to as data classification. It helps in:
• Condensing the raw data into a compact and orderly form suitable for statistical analysis.
• Revealing the pattern and characteristics of the variables in the data.
• Comparing and drawing inferences from the data.
• Statistical analysis to reveal characteristics of elements in the data set.

Requisites of Ideal Classification

Classification should be unambiguous
Each element must belong to only one class.

Classification should be stable
The division of the data set into classes must remain unchanged, so that the results can be compared.

Basis of Classification

Geographical Classification
Data is classified by geographical area or location, such as cities, villages etc.

Chronological Classification
Data is classified by time period; such a classification is also called a time series.

Qualitative Classification
• Simple Classification: binary outcome, such as Male-Female or Educated-Not Educated.
• Manifold Classification: diversified classification, such as a population that is further divided into two sub-populations, Male-Female.

Quantitative Classification
Data is classified by measurable characteristics, such as height, weight, income, expenditure etc.

Frequency Distribution

Divide the observations in the data set into convenient ordered classes. The number of observations in each class is referred to as the frequency.

A tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping class intervals.

Constructing a Frequency Distribution

Number of Class Intervals
Minimum 5 but less than 15.

Width of Class Intervals
H = (Largest Numerical Value - Smallest Numerical Value) / Number of classes desired

Class Limits (Boundaries)
The limits of each class should be clearly defined so that each observation of the data set belongs to one and only one class.

Mid point of Class Intervals
The value in each class that is halfway between the lower and upper class limits.
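
The construction steps above can be sketched in Python. This is a minimal illustration, not a definitive procedure: the function name, the sample observations and the rule that a value on a shared limit goes to the higher class are all assumptions made for the example.

```python
def build_frequency_distribution(values, num_classes):
    """Group values into equal-width classes and count the observations in each."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        raise ValueError("all values are equal; class width would be zero")
    # H = (largest value - smallest value) / number of classes desired
    width = (largest - smallest) / num_classes
    classes = []
    for i in range(num_classes):
        lower = smallest + i * width
        upper = lower + width
        classes.append({"lower": lower, "upper": upper,
                        "midpoint": (lower + upper) / 2, "frequency": 0})
    for value in values:
        # Assumption: a value on a shared limit goes to the higher class;
        # the largest value is kept in the last class.
        index = min(int((value - smallest) / width), num_classes - 1)
        classes[index]["frequency"] += 1
    return width, classes


# Hypothetical data: 15 observations grouped into 5 classes.
observations = [10, 12, 18, 21, 25, 29, 30, 33, 35, 38, 41, 44, 52, 57, 60]
width, classes = build_frequency_distribution(observations, 5)
for c in classes:
    print(f'{c["lower"]:g}-{c["upper"]:g}  midpoint {c["midpoint"]:g}  frequency {c["frequency"]}')
```

Every observation lands in exactly one class, so the frequencies add up to the number of observations.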

Methods of Data Classification

Exclusive Method

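The data are classified in such a way that the upper limit of a class interval is the lower limit of the succeeding class interval. An observation equal to a shared limit is counted in the higher class, so the upper limit itself is excluded. Example: 0-10, 10-20, 20-30 etc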

Inclusive Method

The data are classified in such a way that both the lower and upper limits of a class interval are included in the interval itself. Example: 0-4, 5-8, 9-12 etc
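
The difference between the two methods shows up only for observations that fall exactly on a class limit. The sketch below uses hypothetical helper names and reuses the example intervals from the text; it assumes the usual convention that the exclusive method leaves the upper limit out of the class.

```python
def classify_exclusive(value, limits):
    """limits such as [0, 10, 20, 30]: a class covers lower <= value < upper."""
    for lower, upper in zip(limits, limits[1:]):
        if lower <= value < upper:
            return f"{lower}-{upper}"
    return None  # value lies outside all classes


def classify_inclusive(value, classes):
    """classes such as [(0, 4), (5, 8), (9, 12)]: both limits belong to the class."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return f"{lower}-{upper}"
    return None  # value lies outside all classes


exclusive_result = classify_exclusive(10, [0, 10, 20, 30])
inclusive_result = classify_inclusive(8, [(0, 4), (5, 8), (9, 12)])
print(exclusive_result)  # the value 10 is placed in the class 10-20
print(inclusive_result)  # the value 8 is placed in the class 5-8
```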

Other types of Frequency Distribution

Bivariate Frequency Distribution

When the data involve two variables, such as income and percentage expenditure on food items, or supply and demand of a commodity, the frequency distribution obtained by cross-classifying them is called a Bivariate Frequency Distribution.
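
A cross classification can be sketched by counting pairs of classes. The household figures, the class cut-offs and the function names below are hypothetical, chosen only to illustrate the idea with the income and food-expenditure example.

```python
from collections import Counter


def income_class(income):
    # Hypothetical cut-off for illustration only.
    return "low income" if income < 30000 else "high income"


def food_share_class(percent):
    # Hypothetical cut-off for illustration only.
    return "under 30%" if percent < 30 else "30% or more"


def bivariate_frequency(pairs):
    """Count how many observations fall in each (income class, food share class) cell."""
    return Counter((income_class(x), food_share_class(y)) for x, y in pairs)


# Each pair is (income, percentage of expenditure on food); made-up values.
households = [(20000, 45), (25000, 40), (28000, 25),
              (40000, 28), (55000, 20), (60000, 35)]
table = bivariate_frequency(households)
for cell, frequency in sorted(table.items()):
    print(cell, frequency)
```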

Types of Frequency Distribution

Cumulative Frequency Distribution
The cumulative number of observations less than or equal to the upper class limit of each class.

Relative Frequency Distribution
The number of observations in each class interval divided by the total number of observations.

Percentage Frequency Distribution
The number of observations in each class interval converted into a percentage by dividing it by the total number of observations and multiplying by 100.
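
All three distributions can be derived from the same list of class frequencies. The sketch below assumes the frequencies are already ordered by class; the function name and the sample frequencies are illustrative.

```python
from itertools import accumulate


def derived_distributions(frequencies):
    """Return cumulative, relative and percentage frequencies for ordered classes."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("no observations")
    cumulative = list(accumulate(frequencies))           # running total up to each class
    relative = [f / total for f in frequencies]          # share of the total
    percentage = [100 * f / total for f in frequencies]  # share of the total, times 100
    return cumulative, relative, percentage


# Hypothetical class frequencies (20 observations in 5 classes).
frequencies = [2, 3, 5, 6, 4]
cumulative, relative, percentage = derived_distributions(frequencies)
print(cumulative)
print(relative)
print(percentage)
```

The last cumulative frequency always equals the total number of observations, and the percentages add up to 100.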

Exercise

Graphical Representation

Types of Diagrams > Histograms (Bar Diagrams)

Used for Grouped and Ungrouped Data

Types of Diagrams > Simple Bar Diagrams (Charts )

Displays the value of categorical variables

Types of Diagrams > Multiple Bar Diagrams

To show the Direct Comparison between two or more sets of Data

Types of Diagrams > Deviation Bar Diagrams

To show net quantities, whether in excess or in decline

Types of Diagrams > Sub-divided Bar Diagrams

To express information in terms of ratios or percentages

Types of Diagrams > Percentage Bar Diagrams

To show the share of multiple variables with respect to the data set

Types of Diagrams > Frequency Polygons

Data points are plotted at the midpoints of the class intervals and are connected with straight lines.
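
The points of a frequency polygon can be computed before any drawing is done. This is a small sketch with a hypothetical function name and made-up classes; it only prepares the (midpoint, frequency) pairs and leaves the plotting to whichever charting tool is used.

```python
def frequency_polygon_points(class_limits, frequencies):
    """Pair the midpoint of each class with its frequency.

    class_limits is a list of (lower, upper) pairs, one per class.
    """
    return [((lower + upper) / 2, frequency)
            for (lower, upper), frequency in zip(class_limits, frequencies)]


# Hypothetical classes and frequencies.
points = frequency_polygon_points([(0, 10), (10, 20), (20, 30)], [4, 7, 3])
print(points)
```

Joining consecutive points with straight line segments gives the polygon.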

Types of Diagrams > Frequency Curve

To represent the frequency of each class interval as a smooth, continuous curve

Types of Diagrams > Cumulative Frequency Curve

To represent the cumulative frequency for each class interval

Types of Diagrams > Pie Diagram

To show the number of observations of each type as a percentage of the total, drawn as sectors of a circle
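
Each sector of a pie diagram takes the same share of the 360 degrees of the circle as its category takes of the total. A minimal sketch, with a hypothetical function name and made-up sales counts:

```python
def pie_sectors(counts):
    """Convert category counts into percentages and sector angles in degrees."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return {label: {"percent": 100 * n / total, "angle": 360 * n / total}
            for label, n in counts.items()}


# Hypothetical counts by region.
sales = {"North": 50, "South": 30, "East": 20}
sectors = pie_sectors(sales)
for label, sector in sectors.items():
    print(f'{label}: {sector["percent"]:g}% of the total, {sector["angle"]:g} degrees')
```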