
MATH 403: ENGINEERING DATA ANALYSIS
OBTAINING DATA
Introduction
Statistics
is defined as the science that deals with the collection, organization,
presentation, analysis, and interpretation of data in order to be able to draw
judgments or conclusions that help in the decision-making process. It has two
branches: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics
deals with the procedures that organize, summarize, and describe quantitative
data. It seeks merely to describe data.

Inferential Statistics
deals with making a judgment or a conclusion about a population based on the
findings from a sample that is taken from the population.
https://www.youtube.com/watch?v=0VDafmUys04
Intended Learning Outcomes
At the end of this module, it is expected that the students will be able to:

1. Demonstrate an understanding of the different methods of obtaining data.
2. Explain the procedures in planning and conducting surveys and
experiments.
Statistical Terms
(Before proceeding to the discussion of the different methods of obtaining data, let us first define some
statistical terms.)

Population or Universe
refers to the totality of objects, persons, places, or
things used in a particular study: all members of a
particular group of objects (items) or people
(individuals) that are the subjects or respondents
of a study.

Sample
is any subset of a population, or a few members of a population.
Data
are facts, figures and information collected on some characteristics of a population or
sample. These can be classified as qualitative or quantitative data.

Ungrouped (or raw) data
are data which are not organized in any specific way. They are simply the
collection of data as they are gathered.

Grouped Data
are raw data organized into groups or categories with corresponding
frequencies. Organized in this manner, the data are referred to as a
frequency distribution.
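
The step from ungrouped to grouped data can be sketched in a few lines of Python. This is only an illustration: the quiz scores and the helper name `frequency_distribution` are hypothetical and do not come from the module.

```python
from collections import Counter

# Hypothetical raw (ungrouped) data: quiz scores listed as they were gathered.
raw_scores = [7, 9, 8, 7, 10, 8, 7, 9, 8, 7]

def frequency_distribution(values):
    """Group raw values into (value, frequency) pairs, sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())

grouped = frequency_distribution(raw_scores)

# Print the grouped data as a small frequency table.
print("Score | Frequency")
for score, freq in grouped:
    print(f"{score:>5} | {freq}")
```

The frequencies always add up to the number of raw observations, which is a quick check that no value was lost in grouping.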
Parameter
is a descriptive measure of a characteristic of a
population (the whole).
Statistic
is a measure of a characteristic of a sample.
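
To make the distinction concrete, the sketch below computes a mean once over a whole population (a parameter) and once over a sample drawn from it (a statistic). The population of heights is generated artificially for illustration; the seed, sizes, and variable names are assumptions, not part of the module.

```python
import random
from statistics import mean

rng = random.Random(403)  # fixed seed so the illustration is repeatable

# Hypothetical population: heights in cm of 1000 people (made-up values).
population = [rng.randint(150, 190) for _ in range(1000)]

# A sample is a subset of the population.
sample = rng.sample(population, 50)

parameter = mean(population)  # describes the whole population
statistic = mean(sample)      # describes only the sample

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The statistic usually differs a little from the parameter; inferential statistics is concerned with how much can be concluded about the parameter from the statistic.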

Constant
is a characteristic or property of a population or sample which is common to all
members of the group.

Variable
A variable is any characteristic, number, or quantity that
can be measured or counted. A variable may also be
called a data item. Age, sex, business income and
expenses, country of birth, capital expenditure, class
grades, eye colour and vehicle type are examples
of variables.
Methods of Obtaining Data

I. Methods of Data Collection

II. Planning and Conducting Surveys

III. Planning and Conducting Experiments


I. Methods of Data Collection

• Collection of data is the first step in conducting a statistical inquiry. It refers to
data gathering: a systematic method of collecting and measuring data from different sources
of information in order to provide answers to relevant questions.
• This involves acquiring information from published literature, surveys through questionnaires or
interviews, experimentation, documents and records, tests or examinations, and other forms
of data-gathering instruments.

Investigator
the person who conducts the inquiry

Enumerator
the one who helps in collecting information

Respondent
the one from whom information is collected
DATA

Primary
According to Wessel, “Data collected in the process of investigation are known as
primary data.” These are collected for the investigator’s use from the primary source.

Secondary
Secondary data, on the other hand, are collected by some other organization for its own
use, but the investigator also gets them for his use. According to M.M. Blair, “Secondary
data are those already in existence for some other purpose than answering the question
in hand.”

Tell whether each of the following is Primary or Secondary Data:
• Raw data
• biographies
• dictionary
• diary
• surveys
• photographs
• Tax records
• books
• dissertations
• Reports
• experiments
• letters
• questionnaire
• interview
• Internet articles
• Political commentary
• Journals
In engineering, there are three basic methods of collecting data:

Retrospective study
would use the population or a sample of the historical data which had been archived over
some period of time.

Observational study
the researchers only observe the subjects and do not interfere or try to influence the
outcomes.

Designed experiments
In engineering, there are problem areas with no scientific or engineering theory that is
directly or completely applicable, so experimentation and observation of the resulting
data is the only way to solve them.
An example of an experiment is when scientists give rats a new medicine and see how they
react in order to learn about the medicine. A more informal example is trying a new coffee
shop when you are not sure how the coffee will taste.
II. Planning and Conducting Surveys

Survey
is a method of asking respondents some well-constructed questions. It is an efficient
way of collecting information and is easy to administer, and a wide variety of
information can be collected.

Face to face
• Advantages of face-to-face interviews include fewer misunderstood questions, fewer
incomplete responses, higher response rates, and greater control over the environment
in which the survey is administered; also, the researcher can collect additional
information if any of the respondents’ answers need clarifying.
• The disadvantages of face-to-face interviews are that they can be expensive and
time-consuming and may require a large staff of trained interviewers. In addition,
the response can be biased by the appearance or attitude of the interviewer.

Self-administered
• Less expensive than interviews.
• It can be administered in large numbers and does not require many interviewers, and
there is less pressure on respondents.
• The respondents are more likely to stop participating mid-way through the survey, and
respondents cannot ask for clarification. There are lower response rates than in
personal interviews.
When designing a survey, the following steps are useful:
1. Determine the objectives of your survey:
What questions do you want to answer?

2. Identify the target population and sample:
Whom will you interview? Who will be the respondents? What
sampling method will you use?

3. Choose an interviewing method:
face-to-face interview, phone interview, self-administered
paper survey, or internet survey.

4. Decide what questions you will ask, in what order, and how to phrase them.

5. Conduct the interview and collect the information.

6. Analyze the results by making graphs and drawing conclusions.
In choosing the respondents, sampling techniques are necessary.

Sampling
is the process of selecting units (e.g., people, organizations) from a
population of interest.

A sample must be representative of the target population. The target population is the entire
group a researcher is interested in; the group about which the researcher wishes to
draw conclusions.
Two ways of selecting a sample:

Probability sampling
Probability sampling is defined as a sampling technique in which the researcher chooses
samples from a larger population using a method based on the theory of probability. For
a participant to be considered part of a probability sample, he/she must be selected
using random selection.
The most critical requirement of probability sampling is that everyone in your
population has a known and equal chance of getting selected.
For example, if you have a population of 100 people, every person would have odds of
1 in 100 of getting selected. Probability sampling gives you the best chance to create a
sample that is truly representative of the population.

Non-probability sampling
It is also called judgment or subjective sampling. This method is convenient and
economical, but the inferences made based on the findings are not so reliable.
It is a sampling method in which not all members of the population have an equal chance
of participating in the study, unlike probability sampling, in which each member of the
population has a known chance of being selected. Non-probability sampling is most useful
for exploratory studies like a pilot survey (deploying a survey to a smaller sample
compared to the pre-determined sample size). Researchers use this method in studies
where it is impossible to draw a random probability sample due to time or cost
considerations.
Types of non-probability sampling: Convenience Sampling, Purposive Sampling, Quota Sampling.

Convenience Sampling
The researcher obtains information from the respondents by whatever means is most
convenient for the researcher, which can introduce bias in the choice of respondents.
It means collecting a sample of whichever participants are easiest to reach.
Purposive Sampling
The selection of respondents is predetermined according to the characteristic of
interest chosen by the researcher. Randomization is absent in this type of sampling.
The participants are selected based on the purpose of the sample, hence the name.
Participants are selected according to the needs of the study (hence the alternate
name, deliberate sampling); applicants who do not meet the profile are rejected.
Quota Sampling

Proportional
In proportional quota sampling, the major characteristics of the population are
represented by sampling a proportional amount of each.
• For example, imagine you want to create a council of 20 employees that will meet and
recommend possible changes to the employee handbook. Let's say 40% of your employees
are in Sales and Marketing, 30% in Customer Service, 20% in IT, and 10% in Finance.
You will select 8 people from Sales and Marketing, 6 from Customer Service, 4 from IT,
and 2 from Finance. As you can see, each number you pick is proportionate to the
overall percentage of people in each category (e.g., 40% = 8 people).

Council of 20 employees:
Sales and Marketing   40%   8
Customer Service      30%   6
IT                    20%   4
Finance               10%   2

• Similarly, a population of 7000 males (70%) and 3000 females (30%) would be sampled
in a 70:30 proportion.

Non-proportional
Non-proportional quota sampling is a bit less restrictive. In this method, a minimum
number of sampled units in each category is specified, and the researcher is not
concerned with having numbers that match the proportions in the population.
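
The arithmetic behind the proportional quotas in the council example can be sketched as follows. The function name `proportional_quotas` is hypothetical, and the sketch assumes the percentages divide the council size evenly, as they do here; other cases would need a rounding rule that the module does not specify.

```python
def proportional_quotas(percent_shares, total):
    """Return the number of seats per group for a quota sample of size `total`.

    `percent_shares` maps each group to its whole-number percentage of the
    population. Assumes total * percent is divisible by 100 for every group.
    """
    return {group: total * pct // 100 for group, pct in percent_shares.items()}

# Department shares from the employee-council example.
shares = {
    "Sales and Marketing": 40,
    "Customer Service": 30,
    "IT": 20,
    "Finance": 10,
}

quotas = proportional_quotas(shares, total=20)
for department, seats in quotas.items():
    print(f"{department}: {seats}")
```

The quotas only fix how many people to take from each category; how the individuals within a category are chosen is a separate decision.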
Types of probability sampling: Simple Random Sampling, Stratified Sampling, Cluster Sampling.

Simple Random Sampling
Simple random sampling is the basic sampling technique where a group of subjects (a
sample) is selected for study from a larger group (a population). Each individual is
chosen entirely by chance, and each member of the population has an equal chance of
being included in the sample. Every possible sample of a given size has the same chance
of selection; i.e., each member of the population is equally likely to be chosen at any
stage in the sampling process.
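
A minimal sketch of simple random sampling, using Python's standard `random` module. The population of 100 numbered people, the sample size of 10, and the seed are assumptions made for illustration.

```python
import random

# Hypothetical population: 100 people identified by the numbers 1 to 100.
population = list(range(1, 101))

# A seeded generator keeps the illustration repeatable; in a real study
# the draw would not use a fixed, published seed.
rng = random.Random(42)

# random.sample draws without replacement, so every subset of size 10
# is equally likely and no person can be picked twice.
sample = rng.sample(population, 10)

print("Selected IDs:", sorted(sample))
```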
Stratified Sampling
A stratified sample is obtained by taking samples from each stratum or sub-group of a
population. When a sample is to be taken from a population with several strata, the
proportion of each stratum in the sample should be the same as in the population.

For example, you have three sub-groups with population sizes of 150, 200, and 250
subjects respectively. To make the sample proportionate, the researcher applies one
specific fraction or percentage to every subgroup of the population. With a fraction of
0.5, the sample sizes would be 150*0.5 = 75, 200*0.5 = 100, and 250*0.5 = 125. Here the
constant factor is the proportion ratio for each population subset.
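
The proportionate allocation in this example can be sketched in Python as below. The member labels, the function name `stratified_sample`, and the seed are hypothetical; only the stratum sizes (150, 200, 250) and the fraction 0.5 come from the example.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction from every stratum by simple random sampling."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # proportionate sample size
        sample[name] = rng.sample(members, k)
    return sample

# Three hypothetical sub-groups with 150, 200, and 250 subjects.
strata = {
    "group 1": [f"g1-{i}" for i in range(150)],
    "group 2": [f"g2-{i}" for i in range(200)],
    "group 3": [f"g3-{i}" for i in range(250)],
}

sample = stratified_sample(strata, fraction=0.5, rng=random.Random(0))
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)
```

Because the same fraction is applied to each stratum, the strata keep the same relative sizes in the sample as in the population.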
Cluster Sampling
Cluster sampling is a sampling technique where the entire population is divided into
groups, or clusters, and a random sample of these clusters is selected. All observations
in the selected clusters are included in the sample.
In cluster sampling, researchers divide a population into smaller groups known as
clusters. They then randomly select among these clusters to form a sample.
Cluster sampling is often used to study large populations, particularly those that are
widely geographically dispersed. Researchers usually use pre-existing units such as
schools or cities as their clusters.
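
A minimal sketch of cluster sampling, assuming four hypothetical schools as the pre-existing clusters. The school names, student labels, function name `cluster_sample`, and seed are all made up for illustration.

```python
import random

# Hypothetical clusters: each school is a pre-existing group of students.
clusters = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2", "D3"],
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then include every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

chosen, members = cluster_sample(clusters, n_clusters=2, rng=random.Random(7))
print("Selected clusters:", chosen)
print("Sampled students:", members)
```

Note the contrast with stratified sampling: here the random choice is made among the groups, and the selected groups are taken whole.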
III. Planning and Conducting Experiments
Experiment
is a series of tests conducted in a systematic manner
to increase the understanding of an existing process
or to explore a new product or process.

Design of Experiments, or DOE
is a tool to develop an experimentation strategy that maximizes
learning using minimum resources. Design of Experiments is widely
and extensively used by engineers and scientists in improving
existing processes through maximizing the yield and decreasing the
variability, or in developing new products and processes. It is a
technique needed to identify the "vital few" factors in the most
efficient manner and then direct the process to its best setting to
meet the ever-increasing demand for improved quality and increased
productivity.
Methodology of DOE
ensures that all factors and their interactions are systematically investigated,
resulting in reliable and complete information.

Five stages of the methodology of DOE:
1. Planning
2. Screening
3. Optimization
4. Robustness testing
5. Verification
1. Planning

It is important to carefully plan the course of experimentation before embarking
upon the process of testing and data collection. At this stage, the objectives of
conducting the experiment or investigation are identified, and the time and
resources available to achieve the objectives are assessed. Individuals from different
disciplines related to the product or process should compose the team who will conduct
the investigation. Well-planned experiments are easy to execute and analyze using the
available statistical software.
2. Screening

Screening experiments are used to identify the important factors that affect the process
under investigation out of the large pool of potential factors. The screening process eliminates
unimportant factors so that attention is focused on the key factors. Screening experiments are
usually efficient designs which require few runs and focus on the vital factors and not
on interactions.
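
The idea of ranking factors by their effect can be sketched with a small two-level design. This is an illustration under loud assumptions: the three factor names and the response function are entirely made up, and a full factorial design is used for simplicity, whereas real screening studies often use more economical designs such as fractional factorials.

```python
from itertools import product

# Hypothetical factors, each run at a low (-1) and a high (+1) level.
factors = ["temperature", "pressure", "catalyst"]

def hypothetical_yield(temperature, pressure, catalyst):
    """Made-up response standing in for a measured process yield."""
    return 50 + 6 * temperature + 0.5 * pressure + 4 * catalyst

# Full two-level factorial: every combination of levels (2**3 = 8 runs).
runs = list(product([-1, +1], repeat=len(factors)))
responses = [hypothetical_yield(*run) for run in runs]

def main_effect(index):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[index] == +1]
    low = [y for run, y in zip(runs, responses) if run[index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

effects = {name: main_effect(i) for i, name in enumerate(factors)}

# Rank factors by the size of their main effect; small ones are candidates to drop.
for name, effect in sorted(effects.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {effect:+.1f}")
```

In this made-up case, pressure has a much smaller main effect than the other two factors, so it would be the natural candidate to set aside before the optimization stage.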
3. Optimization
After narrowing down the important factors affecting the process, the next step is to determine
the best setting of these factors to achieve the objectives of the investigation. The objectives
may be to increase yield, to decrease variability, or to find settings that achieve both at the
same time, depending on the product or process under investigation.
In general, optimization is an act, process, or methodology of making something (such as a
design, system, or decision) as fully perfect, functional, or effective as possible;
specifically, it refers to the mathematical procedures (such as finding the maximum of a
function) involved in this.
4. Robustness Testing

A robust statistic is resistant to errors in the results.


Once the optimal settings of the factors have been determined, it is important to make the
product or process insensitive to variations resulting from changes in factors that affect the
process but are beyond the control of the analyst. Such factors are referred to as noise or
uncontrollable factors that are likely to be experienced in the application environment. It is
important to identify such sources of variation and take measures to ensure that the
product or process is made robust or insensitive to these factors.
5. Verification
In general, verification is a process in which different types of data are checked for accuracy
and inconsistencies, for example after data migration is done.
In DOE, this final stage involves validation of the optimum settings by conducting a few follow-up
experimental runs. This is to confirm that the process functions as expected and all
objectives are achieved.
ACTIVITY
As one of the students of the EDA class, you are tasked to conduct a survey to show which
extracurricular activities the students from the College of Engineering, Architecture and Fine
Arts would like to engage in during the first semester. Follow the presented steps in conducting a
survey. (Steps are in slide #12)
