
ME-5101

Engineering Analysis & Statistics
Lect. # 2
Introduction to Statistics & Economics
Dr. Nazeer Ahmad Anjum
Mechanical Engineering Program
Engineering University Taxila
Data and Statistics
• Applications in Engineering Science,
Business and Economics
• Data
• Data Sources
• Descriptive Statistics
• Statistical Inference

2/27/2020
Applications in Business and Economics
• Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
• Finance
Financial analysts use a variety of statistical information,
including price-earnings ratios and dividend yields, to
guide their investment recommendations.
• Marketing
Electronic point-of-sale scanners at retail checkout
counters are being used to collect data for a variety of
marketing research applications.
Applications in Engineering and Economics
• Production
A variety of statistical quality control
charts are used to monitor the output of
a production process.
• Economics
Economists use statistical information
in making forecasts about the future of
the economy or some aspect of it.
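A minimal sketch of how a control chart flags unusual output: a centre line and limits at three standard deviations are computed from a baseline sample, and later measurements outside those limits are flagged. The shaft-diameter data and the function names are hypothetical, and estimating sigma with the plain sample standard deviation is a simplification (Shewhart individuals charts usually estimate it from the average moving range).

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (LCL, CL, UCL): baseline mean +/- k sample standard deviations."""
    centre = mean(baseline)
    sigma = stdev(baseline)  # simplification; not the moving-range estimate
    return centre - k * sigma, centre, centre + k * sigma


def out_of_control(values, limits):
    """Return the indices of values outside the lower/upper control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]


# Hypothetical shaft diameters (mm) recorded while the process was stable
baseline = [25.02, 24.98, 25.01, 24.99, 25.00, 25.03, 24.97, 25.00]
limits = control_limits(baseline)

# New measurements from the production line
new_parts = [25.01, 24.99, 25.12, 25.00]
print("LCL, CL, UCL:", [round(x, 3) for x in limits])
print("Out-of-control samples:", out_of_control(new_parts, limits))
```

Only the third new part (25.12 mm) lies above the upper limit, so it is the one that would prompt an investigation of the process.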

Applications in Engineering
• Engineering statistics combines
engineering and statistics, using scientific
methods to analyze data.
• Engineering statistics involves data
concerning manufacturing processes, such
as component dimensions, tolerances,
type of material, and fabrication process
control.
• Many methods are used in engineering
analysis, and their results are often
displayed as histograms to give a visual picture of
the data rather than just numbers.
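To illustrate, the sketch below groups hypothetical component-dimension data (deviations of a machined diameter from its nominal size, in micrometres) into equal-width bins and prints a text histogram. Integer data is used so the bin arithmetic is exact; the data and the function names are assumptions for illustration only.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count values in equal-width bins [start + i*width, start + (i+1)*width).

    Assumes every value is >= start.
    """
    counts = Counter((v - start) // width for v in values)
    return [counts[i] for i in range(max(counts) + 1)]


def print_histogram(values, start, width):
    """Print one row of '#' marks per bin."""
    for i, n in enumerate(bin_counts(values, start, width)):
        low = start + i * width
        print(f"[{low:>4}, {low + width:>4})  {'#' * n}")


# Hypothetical deviations of a machined diameter from nominal, in micrometres
deviations = [-12, -5, -3, 0, 2, 4, 4, 7, 9, 11, 15, 3, -1, 6]
print_histogram(deviations, start=-15, width=5)
```

The printed bars make it easy to see at a glance that most parts cluster a few micrometres above nominal, which is harder to notice in the raw list of numbers.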
Applications in Engineering
Methods:
Design of Experiments (DOE) is a
methodology for formulating scientific and
engineering problems using statistical models.
Quality Control and Process Control use
statistics as a tool to manage conformance to
specifications of manufacturing processes
and their products.
Time and Methods Engineering uses statistics
to study repetitive operations in manufacturing
in order to set standards and find optimum
manufacturing procedures.
Applications in Engineering
Reliability Engineering measures the
ability of a system to perform its intended
function (for its intended time) and provides tools for
improving performance.
Probabilistic Design involves the use of
probability in product and system design.
System Identification uses statistical methods
to build mathematical models of dynamical
systems from measured data. System
identification also includes the optimal design
of experiments for efficiently generating
informative data for fitting such models.
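As a small illustration of system identification, the sketch below fits a hypothetical first-order discrete-time model, y[k] = a*y[k-1] + b*u[k-1], to input/output data by ordinary least squares, solving the 2x2 normal equations directly. The model form, the parameter values used to simulate the "measured" data, and the absence of noise are all assumptions chosen so the fit can be checked; with real, noisy measurements the estimates would only approximate the true values.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k] = a*y[k-1] + b*u[k-1]."""
    # Accumulate the sums that form the normal equations
    # [syy syu] [a]   [sy_y]
    # [syu suu] [b] = [su_y]
    syy = suu = syu = sy_y = su_y = 0.0
    for k in range(1, len(y)):
        y_prev, u_prev, y_now = y[k - 1], u[k - 1], y[k]
        syy += y_prev * y_prev
        suu += u_prev * u_prev
        syu += y_prev * u_prev
        sy_y += y_prev * y_now
        su_y += u_prev * y_now
    det = syy * suu - syu * syu  # must be non-zero (input must excite the system)
    a = (sy_y * suu - su_y * syu) / det  # Cramer's rule
    b = (su_y * syy - sy_y * syu) / det
    return a, b


# Simulate noise-free "measurements" from a system with a = 0.8, b = 0.5
true_a, true_b = 0.8, 0.5
u = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]  # test input sequence
y = [0.0]
for k in range(1, len(u)):
    y.append(true_a * y[-1] + true_b * u[k - 1])

a_hat, b_hat = fit_first_order(u, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The varied on/off input sequence matters: a constant input would make the normal equations singular, which is the kind of issue the optimal design of experiments addresses.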
Key concepts:

• Basic Characteristics of Data
• Types of Data
• Scaling Measurement
• Elements, Variables, and Observations

Data

• Data refers to a numerical description of
quantitative aspects of things, experiments,
processes, analyses, etc.
• Data, in the plural sense, indicates a set of
numerical figures usually obtained by
measurement or counting.

Basic Characteristics of Data:
• Data is an aggregate of facts:
Data are facts collected in aggregate. For example, a single,
unconnected figure or value that has no relevance to an
engineering, business, or any other purpose cannot be
called data.
• Data is affected to a large extent by a
variety of factors:
For example, in an engineering or business
environment, the observations recorded are
affected by a number of factors. These factors
are either controllable or uncontrollable.
Basic Characteristics of Data:
• Data is collected in a systematic
manner for a predetermined objective:
Facts collected in a haphazard manner, without
full awareness of the objective, will be confusing
and cannot be the basis of valid conclusions.
• For example, collected price data serves no
purpose unless one knows whether wholesale
or retail prices are wanted and which
commodities are under consideration.

Basic Characteristics of Data:
• Data must be related to one another:
The data collected should be comparable; otherwise
they cannot be placed in relation to each other. For example,
data on crop yield and on the fertilizers used are
related, but crop yield has no relation
to data on the health of people.
• Data must be numerically expressed:
Whatever facts are collected must be expressed
numerically or quantitatively. Qualitative
characteristics such as beauty, intelligence, and sex
are called attributes and must be scaled before they
can be expressed in numeric terms.

Data and Data Sets
• Data are the facts and figures that are collected,
summarized, analyzed, and interpreted.
• The data collected in a particular study are
referred to as the data set.

Data regarding apples produced in Pakistan

City        Size  Price  Quality
Murree      10    130    Good
Quetta      12    110    Best
Swat        14    80     Fair
Abbottabad  13    90     Poor
Elements, Variables, and Observations

• The elements are the entities on which data are collected.

City        Size  Price  Quality
Murree      10    130    Good
Quetta      12    110    Best
Swat        14    80     Fair
Abbottabad  13    90     Poor
Each city is an element; the data set contains 4 elements.

Elements, Variables, and Observations

• A variable is a characteristic of interest for the elements.

City        Size  Price  Quality
Murree      10    130    Good
Quetta      12    110    Best
Swat        14    80     Fair
Abbottabad  13    90     Poor

Size, Price, and Quality are the variables.

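Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange  Annual Sales ($M)  Earnings per Share ($)
Dataram        AMEX            73.10              0.86
Energy South   OTC             74.00              1.67
Keystone       NYSE            365.70             0.86
Land Care      NYSE            111.40             0.33
Psychemedics   AMEX            17.60              0.13

Labels on the slide: the columns Stock Exchange, Annual Sales, and Earnings per Share are the Variables; each company (row) is an Element; the whole table is the Data Set; each single entry is a Datum.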

Elements, Variables, and Observations

• The set of measurements collected for a particular element is called an observation.
• The total number of data values in a data
set is the number of elements multiplied by
the number of variables.
• Number of data values = number of elements × number of variables
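Applying the formula to the apple data set from the tables above, where each city is an element and Size, Price, and Quality are the variables, gives 4 × 3 = 12 data values. A minimal sketch (the dictionary layout is simply one convenient representation, not part of the original slides):

```python
# Apple data set: each key (city) is an element; each inner dict is one observation
apples = {
    "Murree": {"Size": 10, "Price": 130, "Quality": "Good"},
    "Quetta": {"Size": 12, "Price": 110, "Quality": "Best"},
    "Swat": {"Size": 14, "Price": 80, "Quality": "Fair"},
    "Abbottabad": {"Size": 13, "Price": 90, "Quality": "Poor"},
}

n_elements = len(apples)                     # number of cities
variables = list(apples["Murree"].keys())    # the variables measured for each city
n_data_values = n_elements * len(variables)  # elements x variables

print("Elements:", n_elements)
print("Variables:", variables)
print("Data values:", n_data_values)
print("Observation for Swat:", apples["Swat"])
```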

Scale Data
• All types of data obtained by measurement or
counting.
For example:
– Weights of students in a class
– Salaries of employees in an organization
– Percentage attendance
– Ages of patients in a hospital
– Number of top-class students in each semester
– Hardness of materials
– Toughness of materials
– Yield strength of materials
Scaling Measurement

[Figure: Types of Measurement Scaling: Nominal Scale, Ordinal Scale, Interval Scale, Ratio Scale]

Scales of Measurement
Nominal, Ordinal, Interval, Ratio

City        Size  Price  Quality
Murree      10    130    Good
Quetta      12    110    Best
Swat        14    80     Fair
Abbottabad  13    90     Poor

Labels on the slide: Ratio provides comparison; Interval has fixed units; Nominal with ranking (i.e., Ordinal).

Scales of Measurement

Nominal:
• A Nominal measurement scale is used for
variables in which each participant or
observation in the study must be placed into
one mutually exclusive and exhaustive
category.

Nominal Data
• Nominal (Data Describes Categories)
Nominal scales only name differences and are used most often
for qualitative variables in which observations are classified
into discrete groups. The key attribute of a nominal scale is
that there is no inherent quantitative difference among the
categories. Sex, religion, and race are three classic
nominal scales used in the behavioral sciences.
– Example:
Students of a university are classified by the school in
which they are enrolled, using a nonnumeric label such
as EE, ME, or CE in engineering, or Business, Humanities,
or Education in the social sciences, and so on.
Alternatively, a numeric code could be used for the school
variable (e.g. 1 denotes Business, 2 denotes
Humanities, 3 denotes Education, and so on).
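Numeric codes on a nominal scale are only labels. In the sketch below, the code-to-school mapping follows the example above, while the list of recorded student codes is hypothetical: the mode is a meaningful summary, but the arithmetic mean of the codes is not.

```python
from statistics import mode

# Coding scheme from the example: 1 = Business, 2 = Humanities, 3 = Education
SCHOOL = {1: "Business", 2: "Humanities", 3: "Education"}

# Hypothetical school codes recorded for six students
enrolled = [1, 3, 3, 2, 3, 1]

# The mode (most frequent category) is meaningful for nominal data
print("Most common school:", SCHOOL[mode(enrolled)])

# The mean of the codes (about 2.17) has no meaning: there is no school "2.17",
# and the codes carry no order or distance.
```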
Scales of Measurement
Ordinal:
• The ordinal scale is the next level of
measurement after the nominal scale.
• The simplest ordinal scale is a ranking.
• An ordinal scale conveys only the order of values,
not the relative distances between positions.
• Measurements on ordinal scales are
ordered in the sense that higher numbers
represent higher values.

Ordinal Data
• Ordinal (where preference is specified)
It is the order of the values that is important and
significant; the differences between them are not
really known.
Consider the example below. We know that a #4 is
better than a #3 or a #2, but we do not know (and
cannot quantify) how much better it is.
For example, is the difference between “OK” and “Unhappy”
the same as the difference between “Very Happy” and
“Happy”? We can’t say. Ordinal scales are typically
measures of non-numeric concepts such as satisfaction,
happiness, and discomfort.
Example: Satisfaction with a product produced by any industrial
sector can be scaled as:
1. Very Unsatisfied
2. Somewhat Unsatisfied
3. Neutral
4. Somewhat Satisfied
5. Very Satisfied
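For ordinal responses such as the satisfaction scale above, the median is an appropriate measure of central tendency because it depends only on order. The sketch below uses Python's `statistics.median_low`, which always returns one of the observed codes instead of averaging two of them (averaging would assume equal spacing between levels). The survey responses are hypothetical.

```python
from statistics import median_low

LEVELS = [
    "Very Unsatisfied",      # 1
    "Somewhat Unsatisfied",  # 2
    "Neutral",               # 3
    "Somewhat Satisfied",    # 4
    "Very Satisfied",        # 5
]

# Hypothetical responses from eight customers (codes 1-5)
responses = [5, 4, 4, 2, 3, 4, 1, 5]

middle = median_low(responses)  # lower of the two middle values for even n
print("Median response:", LEVELS[middle - 1])
```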
Scales of Measurement
Interval Scale:
• The interval scale provides researchers with more
quantitative information than the ordinal scale.
• When a variable is measured on an interval scale,
the distance between numbers or units on the
scale is equal over all levels of the scale.
• On an interval scale, central tendency can be measured
by the mode, median, or mean; the standard deviation
can also be calculated.
• The greatest drawback of the interval scale is that
it has no absolute zero point.
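Temperature in degrees Celsius is a standard example of an interval scale (it is not mentioned on the slide and is used here as an illustrative assumption): differences are meaningful, but because 0 °C is not an absolute zero, ratios are not. Converting to kelvin, which has an absolute zero, makes the ratio meaningful.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale, true zero at 0 K)."""
    return c + 273.15


t1, t2 = 10.0, 20.0  # two readings in degrees Celsius

difference = t2 - t1   # 10 degrees: meaningful on an interval scale
naive_ratio = t2 / t1  # 2.0, but 20 C is NOT "twice as hot" as 10 C
kelvin_ratio = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)  # about 1.035

print(difference, naive_ratio, round(kelvin_ratio, 3))
```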

Scales of Measurement
Ratio Scale:
• A ratio scale is the top level of measurement, although it is
not often available in social and behavioural research studies.
• Ratio scales give us the ultimate order, interval values,
plus the ability to calculate ratios since a “true zero” can
be defined.
• Ratio scales provide a wealth of possibilities when it
comes to statistical analysis. These variables can be
meaningfully added, subtracted, multiplied, divided
(ratios).
• Central tendency can be measured by mode, median,
or mean; measures of dispersion, such as standard
deviation and coefficient of variation can also be
calculated from ratio scales.
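A short sketch of the statistics listed above, applied to hypothetical yield-strength measurements in MPa (a ratio-scale variable, since 0 MPa is a true zero). The coefficient of variation, the standard deviation divided by the mean, is meaningful here precisely because the scale has a true zero.

```python
from statistics import mean, median, mode, stdev

# Hypothetical yield strengths of five test specimens, in MPa
strengths = [250, 260, 255, 245, 250]

avg = mean(strengths)
s = stdev(strengths)  # sample standard deviation
cv = s / avg          # coefficient of variation (dimensionless)

print("Mean:", avg)
print("Median:", median(strengths))
print("Mode:", mode(strengths))
print(f"Std dev: {s:.2f} MPa, CV: {cv:.2%}")
```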
Factors of High Quality Data:

• Completeness.
• Consistency.
• Accuracy.
• Validity.
• Timeliness.

DFM (Design for Manufacturing)
• How many parts?
• How many different parts?
• How many manufacturing steps?
• How many assembly processes?

Component Elimination Example: Rollbar Redesign
• 24 parts, 8 different parts; multiple manufacturing and assembly processes necessary
• 8 parts, 4 different parts; multiple manufacturing and assembly processes necessary
• 4 parts, 3 different parts; 3 manufacturing and assembly processes necessary
• 2 parts; 2 manufacturing processes; one assembly step
Dr. Nazeer A. Anjum, MED, UET, Taxila
Factors of High Quality Data:
Data Completeness
• Data completeness refers to whether there are
any gaps between the data that was expected
to be collected and the data that was actually
collected.
• Example: An inspection is done on a vehicle,
and the inspector accidentally fails to
record the current hour meter reading on
the vehicle, which is a required field for that
inspection. This renders the inspection
incomplete and less valuable, because
important information is left out.
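One way to catch this kind of gap at data entry is to check every record against a list of required fields. The sketch below is hypothetical: the field names, the vehicle ID, and the record itself are assumptions chosen to mirror the vehicle-inspection example.

```python
# Hypothetical required fields for a vehicle inspection record
REQUIRED_FIELDS = ("vehicle_id", "vehicle_type", "hour_meter", "mileage")


def missing_fields(record):
    """Return the required fields that are absent or left blank."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def completeness(record):
    """Fraction of required fields that were actually filled in."""
    filled = len(REQUIRED_FIELDS) - len(missing_fields(record))
    return filled / len(REQUIRED_FIELDS)


# The inspector forgot to record the hour meter reading
inspection = {"vehicle_id": "V-102", "vehicle_type": "Forklift", "mileage": 60000}

print("Missing:", missing_fields(inspection))
print(f"Completeness: {completeness(inspection):.0%}")
```

Rejecting or flagging the record at the moment it is entered is usually cheaper than discovering the gap later, when the hour meter can no longer be read.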
Factors of High Quality Data:
Data Consistency
• Data consistency refers to whether the data coming in
align with the expected versions of that data. This may
seem similar to data completeness, and while the two can
be similar, they are also quite different.
• Example: Referring to the same vehicle inspection
example, if the vehicle operator misspells the type of
vehicle, the data is not consistent with what is
expected to be written and can create problems further
down the road. For example, it limits search
functionality once the data is entered into a computer:
a search that pulls up a vehicle’s history will now
exclude the latest inspection because of the spelling
mistake.
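A minimal sketch of the problem and one possible safeguard: an exact-match search skips the misspelled record, while validating entries against an agreed list of vehicle types (and suggesting the closest valid spelling with Python's standard `difflib.get_close_matches`) flags it. The vehicle types, records, and the 0.8 similarity cutoff are hypothetical choices.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary for the "vehicle type" field
VEHICLE_TYPES = ["Dump Truck", "Excavator", "Forklift"]

history = [
    {"id": 1, "vehicle_type": "Forklift"},
    {"id": 2, "vehicle_type": "Forklift"},
    {"id": 3, "vehicle_type": "Forklfit"},  # latest inspection, misspelled
]


def search(records, vehicle_type):
    """Exact-match search, as a simple database query would do."""
    return [r["id"] for r in records if r["vehicle_type"] == vehicle_type]


def inconsistent(records):
    """IDs of records whose vehicle type is not in the agreed list."""
    return [r["id"] for r in records if r["vehicle_type"] not in VEHICLE_TYPES]


def suggest(value):
    """Closest valid spelling, or None if nothing is similar enough."""
    matches = get_close_matches(value, VEHICLE_TYPES, n=1, cutoff=0.8)
    return matches[0] if matches else None


print("Search finds:", search(history, "Forklift"))  # record 3 is missed
print("Inconsistent records:", inconsistent(history))
print("Suggested fix:", suggest("Forklfit"))
```

A drop-down list on an electronic form prevents the problem at the source; the validation step is a fallback for free-text entries.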
Factors of High Quality Data:
Data Accuracy
• Data accuracy refers to whether the collected data
is correct, and accurately represents what it
should.
• Example: Sticking with the vehicle inspection theme,
if the operator performs the inspection, enters
a value for every field, spells the type of
vehicle correctly, and records the correct units of
measurement, he or she has complete and
consistent data. However, if on the same inspection
the operator records the mileage as 40,000 miles
when it is actually 60,000 miles, the data is inaccurate,
resulting in misinformation and related issues.
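Accuracy errors like this are harder to catch automatically, because the entry is complete and well-formed. One partial safeguard, sketched below under assumed rules, is a plausibility check against the previous reading: an odometer should never go backwards and should not jump by more than a plausible daily distance. The previous reading of 59,200 miles and the 500-miles-per-day limit are hypothetical. Such a check flags the 40,000-mile entry only because it is lower than an earlier reading; it cannot catch every wrong value.

```python
def mileage_is_suspect(previous, current, days_elapsed, max_miles_per_day=500):
    """Flag a reading that decreases or grows implausibly fast."""
    if current < previous:
        return True  # odometers do not run backwards
    return current - previous > max_miles_per_day * days_elapsed


# Hypothetical history: last inspection 30 days ago recorded 59,200 miles
print(mileage_is_suspect(59_200, 40_000, days_elapsed=30))  # entered value -> True
print(mileage_is_suspect(59_200, 60_000, days_elapsed=30))  # actual value -> False
```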
Factors of High Quality Data:
Data Validity
• Data validity can be a bit trickier than the previous
factors: fixing invalid data often means fixing an issue
with a process rather than with a result.
Validity of data is determined by whether the data
measure what they are intended to measure.
• Example: One challenge with paper forms is their
rigidity, because they are a hassle to change.
New information often needs to be included for
the data to remain valid, that is, to accurately
measure what it is supposed to. When new
information is needed but the forms are not changed, the
data is no longer valid because it no longer properly
measures what it is supposed to.
Factors of High Quality Data:
Data Timeliness
Data timeliness refers to the expectation of when data
should be received in order for the information to be
used effectively. Expectation and reality often do
not align, leading to ineffective use of the data and a
lack of data-driven decisions.
Example: When data is collected on paper in the field,
there is a significant lag between when the data is
collected and when it is used to drive informed
decisions. If a vehicle is inspected and found
to need maintenance or repairs, and this
information is recorded on paper, it can often be days
before the information is submitted to the right person,
entered into a computer, and finally received by the
workers in the shop.
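Timeliness can be monitored by recording when each inspection was collected and when it reached the shop, then comparing the lag with a target. In the sketch below, the dates and the one-day target are hypothetical assumptions.

```python
from datetime import date

TARGET_LAG_DAYS = 1  # hypothetical service-level target


def lag_days(collected, received):
    """Days between data collection and its arrival with the decision-maker."""
    return (received - collected).days


# Hypothetical paper inspection: filled in on a Monday, entered on the Friday
collected = date(2020, 2, 3)
received = date(2020, 2, 7)

lag = lag_days(collected, received)
print(f"Lag: {lag} days, late: {lag > TARGET_LAG_DAYS}")
```

Tracking this lag over many inspections turns timeliness from an impression into a measurable quantity that can be monitored like any other process variable.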
Factors of High Quality Data:

ASSIGNMENT
Write down corrective actions to solve problems
related to each of these five factors, following an
engineering and statistics approach, with
examples from an industrial context.

Data Types & Sources
• Secondary & Primary Data
• Categorical (Nominal and Ordinal)
• Quantitative (Interval and Ratio)
• Cross-sectional (collected at the same time)
• Time series (collected over several time periods)
• Data Sources
– Existing sources
– Statistical studies (experimental or observational)
– Variable of interest identified? (e.g., blood pressure drugs)

Secondary Data
• Secondary data is also known as published
data.
• Data that are not originally collected by the user,
but rather obtained from published sources where
they have already been statistically processed,
are known as secondary data.
• For example: data published by the State Bank of
Pakistan, the Ministry of Economic Affairs, the Ministry
of Agriculture, etc., along with international
bodies such as the World Bank, Asian
Development Bank, International Labour
Organization, UNICEF, etc.
Secondary Data
Merits of secondary data:
• Collection cost is lower, as the data is
already available.
• It is faster to collect and process than
primary data.
• It provides valuable insights and background
familiarity with the subject matter.
• It provides a base on which further
information can be collected to update the
data and finally use the data for the purpose of
research.
Secondary Data
Demerits of secondary data:
• Locating an appropriate source and getting
access to the data can be time-consuming.
• The available data might be too vast, and a lot
of time may be spent going through it.
• It was originally collected for specific purposes
that are not known to the present researcher.

Secondary Data
Sources of secondary data:
• Dictionaries
– To identify the terminology of an industry (used for online searches)
– To identify bellwether (trend) events in an industry
– To identify knowledgeable people to interview
– To identify organizations of influence
• Encyclopedias
– To identify historical or background information
– To find critical dates within an industry
– To find events of significance to the industry or company
• Handbooks
– To find facts relevant to the topic
– To identify influential individuals through source citations
Primary Data
• These data are collected for the first time, as
original data, and are called raw data.
• For example, data obtained in a population
census by the Registrar General and
Census Commissioner, Ministry of Home
Affairs, is primary data.
• Data collected through experimentation is also primary data.

Primary Data
Merits of primary data:
• Primary data is more accurate and gives detailed
information according to the requirement.
• The explanation of terms, definitions, and concepts
is incorporated in primary data.
• Methods of collection, their limitations, and other allied
aspects are highlighted.
• It is more reliable and less prone to errors.
Demerits of primary data:
• It is expensive to collect.
• It is a time-consuming method of data collection.
• It requires experts or trained personnel to collect the
data; otherwise it may lead to wrong
observations and unreliable data collection.