Unit - 1 PDF

Applied Statistics
Course Code:- 1150MA201

Unit-1 – Role of Statistics in Engineering,
Data Description and Representation
Contents
 Engineering method and statistical thinking
 Collecting Engineering Data
• Basic Principles
• Retrospective Study
• Observational Study
• Designed Experiments
• Observing Processes over Time
 Mechanistic and Empirical Models
 Collection of data
 Classification and Tabulation of data
 Stem and Leaf diagram
 Frequency distribution and Histogram
 Box Plots
 Time Sequence Plots
 Probability Plots
Course outcome
Identify the role that statistics can play in the
engineering problem-solving process, discuss the
different methods that engineers use to collect data,
and construct and interpret visual data displays.
The Engineering Method and Statistical
Thinking
 An engineer is someone who solves problems of interest
to society through the efficient application of scientific
principles by:
• Refining existing products
• Designing new products or processes
[Figure: The creative process in the engineering method]
Statistics Supports The Creative Process
 The field of statistics deals with the collection,
presentation, analysis, and use of data to:
• Make decisions
• Solve problems
• Design products and processes
 It is the science of learning information from data.
Statistics Supports The Creative Process
– Cont.
• Statistical techniques are useful for describing and
understanding variability.
• By variability, we mean that successive observations of a
system or phenomenon do not produce exactly the
same result.
• Statistics gives us a framework for describing this
variability and for learning about potential sources of
variability.
Collecting Engineering Data
Three basic methods for collecting data:

 A retrospective study using historical data
- Data collected in the past for other purposes.
 An observational study
- Data collected in the present by a passive observer.
 A designed experiment
- Data collected in response to process input
changes.
Observing Processes over Time
 Data are collected over time.

 It is usually very helpful to plot the data versus time in a time series
plot.
 Phenomena that might affect the system or process often become
more visible in a time-oriented plot, and the stability of the
process can be better judged.
Mechanistic and Empirical Models
• A mechanistic model is built from our underlying knowledge
of the basic physical mechanism that relates several
variables.
Example: Ohm’s Law
Current = voltage/resistance
I = E/R
Because observed currents do not conform exactly to this law,
an error term ε is added:
I = E/R + ε
• The form of the function is known.
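As a rough illustration of a mechanistic model with an error term, the sketch below simulates repeated current measurements. The voltage, resistance, noise level, and the Gaussian form of the error are assumptions chosen for the example, not values from these notes.

```python
import random
import statistics

def simulate_current(voltage, resistance, noise_sd, n, seed=0):
    """Return n simulated readings of I = E/R + eps, with eps ~ Normal(0, noise_sd)."""
    rng = random.Random(seed)
    ideal = voltage / resistance              # Ohm's law, no error
    return [ideal + rng.gauss(0.0, noise_sd) for _ in range(n)]

# Hypothetical circuit: 12 V across 4 ohms, so the ideal current is 3 A.
readings = simulate_current(voltage=12.0, resistance=4.0, noise_sd=0.05, n=200)

print("ideal current   :", 12.0 / 4.0)
print("mean of readings:", round(statistics.mean(readings), 3))
print("std dev readings:", round(statistics.stdev(readings), 3))
```

The readings scatter around the ideal value rather than equalling it, which is the variability that the error term stands for.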
• An empirical model is built from our engineering and
scientific knowledge of the phenomenon, but is not
directly developed from our theoretical or first-principles
understanding of the underlying mechanism.
• The form of the function is not known a priori.
Example:-
• We are interested in the number average molecular weight (Mn) of a
polymer. Now we know that Mn is related to the viscosity of the material (V),
and it also depends on the amount of catalyst (C) and the temperature (T ) in
the polymerization reactor when the material is manufactured. The
relationship between Mn and these variables is
Mn = f(V,C,T)
where the form of the function f is unknown.
• We estimate the model from experimental data to be of the following form
where the b’s are unknown parameters.
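A form often used for such a model, assumed here purely for illustration, is the first-order (linear) one in the three variables, with the b's as the unknown parameters and ε as an assumed error term:

```latex
M_n = b_0 + b_1 V + b_2 C + b_3 T + \varepsilon
```

Under this assumption, the b's would be estimated from experimental data, for example by least squares.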
Collection of Data
 Collection of data is the first important step of a
statistical survey.
 Data – Information which can be expressed in numbers.
 Two sources of data – Primary & Secondary.
 Primary data – data collected by the investigator directly.
 Secondary data – data collected by someone else and
used by the investigator.
Difference between Primary and Secondary
Data
 Primary data is original data collected by the
investigator, while secondary data already exists
and is not original.
 Primary data is always collected for a specific purpose,
while secondary data has already been collected for
some other purpose.
 Primary data is more expensive to collect, whereas
secondary data is less expensive.
Methods / Sources of Collection of Primary
Data
 Direct Personal Interview

– Data is personally collected by the interviewer.
 Indirect Oral Investigation
– Data is collected from third parties who have
information about the subject of enquiry.
 Information from correspondents
– Data is collected from agents appointed
in the area of investigation.
Methods / Sources of Collection of Primary
Data -Cont
 Mailed questionnaire
– Data is collected through questionnaire
[list of questions] mailed to the informant.
 Questionnaire filled by enumerators
– Data is collected by trained enumerators
who fill questionnaires.
 Telephonic interviews
– Data is collected through an interview over
the telephone with the informant.
Difference Between Census & Sampling
Method
 Census Method
• Every unit of the population is studied
• Reliable and accurate results
• Expensive method
• Suitable when the population is heterogeneous
 Sampling Method
• Only a few units of the population are studied
• Less reliable and less accurate results
• Less expensive method
• Suitable when the population is homogeneous
Advantages and Disadvantages
Mailed Questionnaire Method:
Advantages
 Less expensive
 Only method to reach remote areas
 Informants cannot be influenced
Disadvantages
 Long response time
 Cannot be used by illiterates
 Doubts regarding questions cannot be cleared
Personal Interview Method:
Advantages
 Highest response rate
 Allows all types of questions
 Allows clearing doubts regarding questions
Disadvantages
 Most expensive
 Informants can be influenced
 Takes more time
Telephonic Interview Method:
Advantages
 Relatively low cost
 Relatively high response rate
 Less influence on informants
Disadvantages
 Limited use
 Reactions cannot be watched
 Respondents can be influenced
Methods / Sources of Collection of Secondary
Data
Published Source
- Government publications (e.g. Census of India), semi-government
publications, etc.
Unpublished Source
- Records collected by organizations for their own use.
Classification of Data
 Classification is the process of arranging data into sequences
and groups according to their common characteristics, or
separating them into different but related parts.
(or)
 The process of grouping a large number of individual facts and
observations on the basis of similarity among the items is
called classification.
Objectives – Classification of Data
 It condenses the mass of data in an easily assimilable form.

 It eliminates unnecessary details.
 It facilitates comparison and highlights the significant aspects
of the data.
 It enables one to get a mental picture of the information
and helps in drawing inferences.
 It helps in the statistical treatment of the information
collected.
Types of Classification
 Chronological classification:
- In chronological classification, the collected data are
arranged in order of time, expressed in
years, months, weeks, etc.
Eg:- The estimates of birth rates in India during 1970 – 76 are
Year        1970  1971  1972  1973  1974  1975  1976
Birth Rate  36.7  35.9  45.8  32.6  45.6  34.8  36.7
Types of Classification –Cont.
 Geographical classification:
- The data are classified according to geographical
region or place.
Eg:- The production of paddy in different states in Iraq,
production of wheat in different countries etc.
Country                      America  China  Denmark  France  Iraq
Yield of Wheat (in kg/acre)  1924     893    225      439     862
 Qualitative classification:
- Data are classified on the basis of some attribute or
quality such as sex, literacy, religion, employment, etc.
Such attributes cannot be measured on a
scale.
Eg:- If the population is to be classified with respect to one attribute,
say sex, then we can classify it into two groups, namely
males and females. Similarly, it can also be classified into
‘married’ or ‘single’ on the basis of another attribute, ‘marital
status’.
 Quantitative classification:
- It refers to the classification of data according to
some characteristic that can be measured, such as
height, weight, etc.
Eg:- A group of children may be classified according to weight
Weight (in kg)   No. of children
5 – 10           50
10 – 15          200
15 – 20          260
Tabulation of Data
 Tabulation is the process of summarizing classified or grouped
data in the form of a table so that it is easily understood and an
investigator is quickly able to locate the desired information.
 A table is a systematic arrangement of classified data in columns
and rows.
 A statistical table makes it possible for the investigator to present
a huge mass of data in a detailed and orderly form.
 It facilitates comparison and often reveals certain patterns in
the data.
Main Parts of a Table
• Title of the table – It is a brief explanation of contents of the table.

• Table Number – It is used to refer to the table.
• Captions – A word or phrase that explains the contents of a column
of the table.
• Stubs – They explain the contents of the rows of the table.
• Body of the table – Most important part of the table, as it contains the data.
• Head Note – It is inserted to complete the information given in the
title.
• Source Note – It refers to the source from which the information has been taken.
• Foot Note – It is used for pointing out exceptions to the data.
Format of Table
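Table Number __________
Title ___________
[Head Note]
+--------------+---------------------+---------------------+--------------+
| Stub caption |      Sub head       |      Sub head       | Total [Rows] |
|              +----------+----------+----------+----------+              |
|              | Column   | Column   | Column   | Column   |              |
|              | head     | head     | head     | head     |              |
+--------------+----------+----------+----------+----------+--------------+
| Stub entries |             Body of the table             |              |
+--------------+-------------------------------------------+--------------+
| Total        |                                           |              |
| [Columns]    |                                           |              |
+--------------+-------------------------------------------+--------------+
Source Note:
Foot Note: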
Basic Principles of Tabulation
 Tables should be clear, concise & adequately titled.

 Every table should be distinctly numbered for easy reference.
 Column headings & row headings of the table should be clear &
brief.
 Units of measurement should be specified at appropriate places.
 Explanatory footnotes concerning the table should be placed at
appropriate places.
 Source of information of data should be clearly indicated.
Basic Principles of Tabulation – cont.
 The columns & rows should be clearly separated with dark lines.
 Demarcation should also be made between data of one class and
that of another.
 Comparable data should be put side by side.
 Percentage figures should be rounded before
tabulation.
 Figures, symbols, etc. should be properly aligned
and adequately spaced to enhance readability.
 Abbreviations should be avoided.
Representation of Data
 Stem and leaf diagram

 Frequency Distribution
 Bar Diagram
 Histogram
 Box plot
 Time sequence plot
 Probability plot
Representation of Data
 Stem and leaf diagram:-

 A stem-and-leaf plot is a graphical summary used to describe a set of
observations (as symmetric, skewed, etc.).
 Each observation is displayed on the graph and should have at least
two digits.
 Split each observation (at the same point) into a stem (one or more of
the leading digit(s)) and a leaf (remaining digits).
 Select the split point so that there are 5–20 total stems. List the stems in a
column to the left, and write each leaf in the corresponding stem row.
Problem-1
1. Use the data in the table to make a stem-and-leaf plot.
Step 1: Group the data by tens digits.
Step 2: Order the data from least to greatest.
Data after Steps 1 and 2:
61 64 67
72 74 76 79
83 84 88
Step 3: List the tens digits of the data in order
from least to greatest. Write these in
the “stems” column.
Step 4: For each tens digit, record the ones digits
of each data value in order from least to
greatest. Write these in the “leaves” column.
Problem-1 - cont.
Step 5: Title the graph and add a key.
Ans:-
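Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration rather than a reference solution; the function name `stem_and_leaf` is introduced here, and the stem is assumed to be the tens digit and the leaf the ones digit, as in this problem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its ordered list of leaves (ones digits)."""
    plot = defaultdict(list)
    for value in sorted(values):          # Step 2: order the data
        stem, leaf = divmod(value, 10)    # Step 1: split at the tens digit
        plot[stem].append(leaf)
    return dict(plot)

data = [61, 64, 67, 72, 74, 76, 79, 83, 84, 88]
plot = stem_and_leaf(data)

# Steps 3-5: stems in order, leaves beside them, then a title row and a key.
print("Stem | Leaves")
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>4} | {leaves}")
print("Key: 6 | 1 means 61")
```

For this data the stems are 6, 7 and 8, with leaves 1 4 7, 2 4 6 9 and 3 4 8 respectively.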
Problem-2
 The following are the numbers of text messages sent last week by
the cellular phone users on one floor of a college dormitory.
Display the data in a stem-and-leaf plot. What can you conclude?
155 159 144 129 105 145 126 116 130 114 122 112 112 142 126 118 118
108 122 121 109 140 126 119 113 117 118 109 109 119 139 139 122 78
133 126 123 145 121 134 124 119 132 133 124 129 112 126 148 147
Problem-2 – Cont.
Ans:-
Interpretation:- From the display, you can conclude that more than 50% of the
cellular phone users sent between 110 and 130 text messages.
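The interpretation can be checked by counting how many values fall on each stem. In the sketch below, the stem is taken to be the hundreds and tens digits together (an assumption about how the display was drawn), and asterisks stand in for the leaves.

```python
from collections import Counter

messages = [
    155, 159, 144, 129, 105, 145, 126, 116, 130, 114, 122, 112, 112, 142, 126, 118, 118,
    108, 122, 121, 109, 140, 126, 119, 113, 117, 118, 109, 109, 119, 139, 139, 122, 78,
    133, 126, 123, 145, 121, 134, 124, 119, 132, 133, 124, 129, 112, 126, 148, 147,
]

# Stem = hundreds and tens digits together, so 126 falls on stem 12.
stem_counts = Counter(value // 10 for value in messages)
for stem in range(min(stem_counts), max(stem_counts) + 1):
    print(f"{stem:>3} | {'*' * stem_counts.get(stem, 0)}")

in_range = stem_counts[11] + stem_counts[12]      # values from 110 to 129
share = in_range / len(messages)
print(f"{in_range} of {len(messages)} users ({share:.0%}) sent 110-129 messages")
```

Stems 11 and 12 together hold 28 of the 50 values (56%), which supports the conclusion. The single value 78 sits well apart from the rest.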
Frequency distribution
 A frequency distribution is a tabular method for summarizing continuous or
discrete numerical data or categorical data.
 Partition the measurement axis into 5–20 (usually equal) reasonable
subintervals called classes, or class intervals, chosen so that each
observation falls into exactly one class.
 Record, or tally, the number of observations in each class, called the
frequency of each class.
 Compute the proportion of observations in each class, called the relative
frequency.
 Compute the proportion of observations in each class and all preceding
classes, called the cumulative relative frequency.
Problem-1
 Construct a frequency distribution for the following ticket data.
Ticket data: Forty random speeding tickets were selected from
the court house records in Columbia County. The
speed indicated on each ticket is given in the table
below.
58 72 64 65 67 92 55 51 69 73 64 59 65 55 75 56
89 60 84 68 74 67 55 68 74 43 67 71 72 66 62 63
83 64 51 63 49 78 65 75
Problem-1-cont.
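One way to build the table is sketched below. The class limits (six classes of width 10 starting at 40) are an assumption made for this example, and `frequency_table` is a name introduced here.

```python
speeds = [
    58, 72, 64, 65, 67, 92, 55, 51, 69, 73, 64, 59, 65, 55, 75, 56,
    89, 60, 84, 68, 74, 67, 55, 68, 74, 43, 67, 71, 72, 66, 62, 63,
    83, 64, 51, 63, 49, 78, 65, 75,
]

def frequency_table(values, start, width, n_classes):
    """Return rows of (lower, upper, frequency, relative freq, cumulative relative freq)."""
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1                    # inclusive upper class limit
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((lower, upper, freq, freq / len(values), cumulative / len(values)))
    return rows

# Assumed classes: 40-49, 50-59, ..., 90-99.
table = frequency_table(speeds, start=40, width=10, n_classes=6)

print("Class   Freq  Rel.freq  Cum.rel.freq")
for lower, upper, freq, rel, cum in table:
    print(f"{lower}-{upper}  {freq:>4}  {rel:>8.3f}  {cum:>12.3f}")
```

With these classes the frequencies are 2, 8, 17, 9, 3 and 1, which sum to the 40 tickets.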
Histogram
 A histogram is a graphical representation of a frequency
distribution.
 A (relative) frequency histogram is a plot of (relative) frequency
versus class interval.
 Rectangles are constructed over each class with height
proportional (usually equal) to the class (relative) frequency.
 A frequency histogram and a relative frequency histogram have the same
shape, but different scales on the vertical axis.
Histogram-Model
Step 1: Choose an appropriate scale and interval.
Step 2: Draw a bar for the number of students in each interval. The bars should
touch but not overlap.
Step 3: Title the graph and label the axes.
[Figure: histogram titled “Number of Pages Read per Student Last Weekend”;
horizontal axis: Number of Pages (1-10, 11-20, 21-30, 31-40); vertical axis: Students]
Problem-1
 Construct a frequency histogram for the following ticket data.
58 72 64 65 67 92 55 51 69 73 64 59 65 55 75 56
89 60 84 68 74 67 55 68 74 43 67 71 72 66 62 63
83 64 51 63 49 78 65 75
Problem-1- Cont.
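A text-only version of the histogram can be sketched as below, with one `#` per ticket. The class width of 10 and the starting point of 40 are assumptions made for the example.

```python
speeds = [
    58, 72, 64, 65, 67, 92, 55, 51, 69, 73, 64, 59, 65, 55, 75, 56,
    89, 60, 84, 68, 74, 67, 55, 68, 74, 43, 67, 71, 72, 66, 62, 63,
    83, 64, 51, 63, 49, 78, 65, 75,
]

start, width = 40, 10          # assumed classes: 40-49, 50-59, ...

# Tally each speed under the lower limit of its class.
counts = {}
for v in speeds:
    lower = start + ((v - start) // width) * width
    counts[lower] = counts.get(lower, 0) + 1

print("Speed  | Number of tickets")
for lower in range(start, max(counts) + width, width):
    label = f"{lower}-{lower + width - 1}"
    freq = counts.get(lower, 0)
    print(f"{label:>6} | {'#' * freq} ({freq})")
```

The bars are drawn sideways; the tallest one is the 60-69 class, and the bars shorten on both sides of it.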
Frequency polygons
 A frequency polygon is a line plot of points whose x
coordinate is the class midpoint and whose y coordinate
is the class frequency.
 Often the graph is extended to an additional empty class
at both ends.
 The relative frequency may be used in place of the
frequency.
Problem-1
 Construct a frequency polygon for the following ticket data.
58 72 64 65 67 92 55 51 69 73 64 59 65 55 75 56
89 60 84 68 74 67 55 68 74 43 67 71 72 66 62 63
83 64 51 63 49 78 65 75
Problem-1- Cont.
Ans:-
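The points of the polygon can be computed as in the sketch below, which assumes classes of width 10 starting at 40 and adds one empty class at each end. Joining the printed points in order with straight lines gives the polygon.

```python
speeds = [
    58, 72, 64, 65, 67, 92, 55, 51, 69, 73, 64, 59, 65, 55, 75, 56,
    89, 60, 84, 68, 74, 67, 55, 68, 74, 43, 67, 71, 72, 66, 62, 63,
    83, 64, 51, 63, 49, 78, 65, 75,
]

start, width, n_classes = 40, 10, 6    # assumed classes: 40-49, ..., 90-99

# k = -1 and k = n_classes are the extra empty classes at the two ends.
points = []
for k in range(-1, n_classes + 1):
    lower = start + k * width
    upper = lower + width - 1
    midpoint = (lower + upper) / 2
    freq = sum(lower <= v <= upper for v in speeds)
    points.append((midpoint, freq))

for midpoint, freq in points:
    print(f"midpoint {midpoint:>5}: frequency {freq}")
```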
Ogive (or) Cumulative frequency polygon
 An ogive, or cumulative frequency polygon, is a plot of
cumulative frequency versus the upper class limit.
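As an illustration, the ogive points for the ticket data used in the earlier problems can be computed as follows. The class limits are an assumption (width 10, starting at 40).

```python
from itertools import accumulate

speeds = [
    58, 72, 64, 65, 67, 92, 55, 51, 69, 73, 64, 59, 65, 55, 75, 56,
    89, 60, 84, 68, 74, 67, 55, 68, 74, 43, 67, 71, 72, 66, 62, 63,
    83, 64, 51, 63, 49, 78, 65, 75,
]

start, width, n_classes = 40, 10, 6    # assumed classes: 40-49, ..., 90-99

upper_limits = [start + (k + 1) * width - 1 for k in range(n_classes)]
freqs = [sum(u - width < v <= u for v in speeds) for u in upper_limits]
cum_freqs = list(accumulate(freqs))    # running total of the class frequencies

ogive_points = list(zip(upper_limits, cum_freqs))
for upper, cum in ogive_points:
    print(f"speed <= {upper}: {cum} tickets")
```

The cumulative frequency never decreases and ends at the total number of observations, here 40.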
Box and Whisker Plots
 A box-and-whisker plot uses a number line to show the distribution
of a set of data.
 Box plots are useful for comparing two or more sets of data like
heights of boys and girls in a class.
 A boxplot is a graphical display of the five-number summary.
 To make a box-and-whisker plot, first divide the data into four
equal parts using quartiles. The median, or middle quartile,
divides the data into a lower half and an upper half. The median
of the lower half is the lower quartile, and the median of the
upper half is the upper quartile.
Quartiles
 Quartiles split the data into four parts. For ungrouped data, arrange
the observations in order from smallest to largest.
 The second quartile is the median: Q2 = x̃.
 If n is even: The first quartile, Q1, is the median of the smallest n/2
observations; and the third quartile, Q3, is the median of the largest
n/2 observations.
 If n is odd: The first quartile, Q1, is the median of the smallest (n +1)/2
observations; and the third quartile, Q3, is the median of the largest
(n+1)/2 observations.
Quartiles
 For grouped data:-

 L1 = the lower boundary of the class containing Q1.
 L3 = the lower boundary of the class containing Q3.
 f1 = the frequency of the class containing the first quartile.
 f3 = the frequency of the class containing the third quartile.
 CF1 = cumulative frequency for classes below the one containing Q1.
 CF3 = cumulative frequency for classes below the one containing Q3.
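These symbols belong to the usual interpolation formulas for grouped data, sketched below. The symbols n (total number of observations) and c (width of the quartile class) are introduced here as assumptions, since the list above does not name them.

```latex
Q_1 = L_1 + \frac{\tfrac{n}{4} - CF_1}{f_1}\, c,
\qquad
Q_3 = L_3 + \frac{\tfrac{3n}{4} - CF_3}{f_3}\, c
```

Each formula starts at the lower boundary of the quartile class and moves into it in proportion to how many of that class's observations are needed to reach the quartile position.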
Problem-1
 Use the data to make a box-and-whisker plot.
73 67 75 81 67 75 85 69
Problem-1-Cont.
 Step 1: Order the data from least to greatest. Then find the least
and greatest values, the median, and the lower and upper
quartiles.
 Step 2: Draw a number line. Above the number line, plot points
for each value in Step 1.
 Step 3: Draw a box from the lower to the upper quartile. Inside
the box, draw a vertical line through the median. Then draw the
“whiskers” from the box to the least and greatest values.
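Step 1 can be sketched in Python using the median-of-halves rule given under Quartiles. The function name `five_number_summary` is introduced here for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (least, Q1, median, Q3, greatest) by the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # n/2 when n is even, (n + 1)/2 when n is odd
    q1 = statistics.median(data[:half])     # median of the smallest half
    q3 = statistics.median(data[-half:])    # median of the largest half
    return data[0], q1, statistics.median(data), q3, data[-1]

values = [73, 67, 75, 81, 67, 75, 85, 69]
least, q1, median, q3, greatest = five_number_summary(values)

print("ordered data  :", sorted(values))
print("least value   :", least)
print("lower quartile:", q1)
print("median        :", median)
print("upper quartile:", q3)
print("greatest value:", greatest)
```

For this data the box runs from 68 to 78 with the median line at 74, and the whiskers extend to 67 and 85.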
Problem-2:
Comparing Box-and-Whisker Plots
Time Sequence Plots
 A time series “measures the same phenomenon at equal intervals of
time”.
Components of time series
 Trend : underlying long-term movement.
 Cycle : medium-term cyclical movements about the
trend.
 Seasonal (S) : factors that occur one or more times per year
and are stable in size and direction from year to year.
 Irregular (I) : residual after the other components have been
removed; it should exhibit no pattern.
 We combine the trend and the cycle to form the trend cycle (C),
but refer to this as the “trend”.
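These notes do not say how the components are separated. One simple approach, assumed here for illustration, is to estimate the trend with a centered moving average whose window covers one seasonal period; subtracting it leaves the seasonal and irregular parts. The series below is made up for the example.

```python
def moving_average(series, window):
    """Centered moving average with an odd window; the ends are left as None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend

# Hypothetical series: a steady rise plus a pattern repeating every 3 periods.
series = [10, 14, 9, 13, 17, 12, 16, 20, 15]
trend = moving_average(series, window=3)

# What is left after removing the trend: seasonal plus irregular variation.
detrended = [None if t is None else y - t for y, t in zip(series, trend)]

for period, (y, t, d) in enumerate(zip(series, trend, detrended), start=1):
    print(period, y, t, d)
```

In this made-up series the estimated trend rises by one unit per period, and the detrended values repeat every three periods, which is the seasonal pattern.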
[Figures: Time Series – Trend, Time Series – Seasonal, and
Time Series – Irregular graphs]
Probability Plot
 It is a graphical technique for assessing whether or not a data
set follows a given distribution such as the normal or Weibull.
 The data are plotted against a theoretical distribution in
such a way that the points should form approximately a
straight line if the data follow that distribution.
 It is a graphical method for determining whether sample data
conform to a hypothesized distribution based on a subjective
visual examination of the data.
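For the normal distribution, the plotting coordinates can be sketched as below. The plotting position (j - 0.5)/n for the j-th ordered observation is one common convention, assumed here; other conventions exist. The sample reuses the data from the box-and-whisker problem.

```python
from statistics import NormalDist

def normal_probability_points(sample):
    """Pair each ordered observation with the standard normal score z_j
    satisfying P(Z <= z_j) = (j - 0.5) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()          # mean 0, standard deviation 1
    return [
        (x, std_normal.inv_cdf((j - 0.5) / n))
        for j, x in enumerate(ordered, start=1)
    ]

sample = [73, 67, 75, 81, 67, 75, 85, 69]
points = normal_probability_points(sample)

for x, z in points:
    print(f"x = {x:>3}   z = {z:+.3f}")
```

Plotting x against z and judging by eye whether the points lie close to a straight line is the subjective visual check described above.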
