
Fundamentals of Data Analysis (FDA)
Introduction to the Course, Importance, and Fundamental Aspects
Why is the course important to me?
• In today's world, decisions based on the results of data analysis carry more value. I want to be one of those decision makers.
• I wish to understand situations completely and also uncover latent information.
• I am part of a team that handles analytics projects for clients.
• I want to forecast demand, price, sales, etc.
• I want to evaluate the performance of my employees.
• I have been given the responsibility of comparing two ERP software packages and submitting a detailed report.
• I wish to know whether the training program is effective.
• Are the variables under consideration significantly correlated?
• The results of the market research have to be presented in a precise manner.
• Are the statements made by the finance team on the current budget true?
Session Learning Objectives (SLO)

• What is statistics as a branch of study?
• What is statistics as a set of measurement tools?
• What is statistics as numbers?
• What are a sample and a population?
• The difference between a sample survey and a census survey.
• Descriptive statistics.
• Inferential statistics.
• Types of study: cross-sectional and time series.
• Types of data and variables: qualitative and quantitative.
• Measuring the variables: scaling (nominal, ordinal, interval, and ratio).
Statistics
The term statistics can refer to numerical facts such as
averages, medians, percents, and index numbers that help
us understand a variety of business and economic
situations.
Statistics can also refer to the art and science of
collecting, analyzing, presenting, and interpreting data.
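To make the first meaning concrete, here is a minimal Python sketch that computes an average, a median, a percent, and a simple index number from a short list of monthly sales figures. The figures are hypothetical, and using the first month as the index base (= 100) is an assumption chosen for illustration.

```python
from statistics import mean, median

# Hypothetical monthly sales (in units) for one product; not real data
sales = [120, 135, 150, 110, 160, 145]

average_sales = mean(sales)    # arithmetic mean
median_sales = median(sales)   # middle value of the sorted list (average of the two middle values here)

# Percent of months in which sales exceeded 140 units
pct_above_140 = 100 * sum(1 for s in sales if s > 140) / len(sales)

# Simple index numbers: each month relative to the first month (base = 100)
base = sales[0]
index_numbers = [round(100 * s / base, 1) for s in sales]

print(round(average_sales, 2))  # 136.67
print(median_sales)             # 140.0
print(pct_above_140)            # 50.0
print(index_numbers)            # [100.0, 112.5, 125.0, 91.7, 133.3, 120.8]
```

Each printed number is a "statistic" in the first sense of the word. The rest of the course is about the second sense: how such numbers are collected, analyzed, and interpreted.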
Types of Study
Cross-Sectional data: Data collected at a given point of time or during a time
interval.
• A survey conducted to measure the satisfaction level of the customers.
• Employee satisfaction in an organization.
• A study on comparing the performance of different ERP software.
• Finding a brand ambassador for a product.
• Finding the right arrangement in a department store.
• Finding the right marketing mix for an organization.
• Identifying the most frequently visited tourist places and the reasons.
• Closing stock price of 25 stocks collected on 09.07.2019.
Types of Study
Time series data: Data collected at different time points on a characteristic under study. The data change as time changes.
• Daily, monthly, or yearly sales of a product.
• Gross revenue of an organization over the last 10 years.
• Budget allotment for a sector.
• Prices of a stock over time.
• Petrol prices of a nation over time.
• Monthly production of units.
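The two study types can be contrasted with a small Python sketch. It assumes hypothetical stock names, prices, and sales figures made up for illustration. Cross-sectional data hold many elements at one time point, while time series data hold one characteristic at many time points.

```python
# Hypothetical figures, made up for illustration only.

# Cross-sectional data: several elements (stocks) observed at ONE point in time
closing_prices = {"StockA": 101.5, "StockB": 47.2, "StockC": 250.0}  # all on 09.07.2019

# Time series data: ONE characteristic (sales of one product) at SEVERAL time points
monthly_sales = [("2019-01", 410), ("2019-02", 395), ("2019-03", 430)]

# Cross-sectional analysis compares the elements with each other
highest_priced = max(closing_prices, key=closing_prices.get)

# Time series analysis compares the characteristic with itself over time
month_to_month_change = [
    later[1] - earlier[1] for earlier, later in zip(monthly_sales, monthly_sales[1:])
]

print(highest_priced)         # StockC
print(month_to_month_change)  # [-15, 35]
```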
Data and Data Sets

Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

All the data collected in a particular study are referred to as the data set for the study.
Elements, Variables, and Observations

Elements are the entities on which data are collected.

A variable is a characteristic of interest for the elements.

The set of measurements obtained for a particular element is called an observation.

A data set with n elements contains n observations.

The total number of data values in a complete data set is the number of elements multiplied by the number of variables.
Data, Data Sets, Elements, Variables, and Observations

Element names: Company. Variables: Stock Exchange, Annual Sales ($M), and Earnings per Share ($). Each row is one observation, and the whole table is the data set.

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                           73.10             0.86
EnergySouth    N                            74.00             1.67
Keystone       N                           365.70             0.86
LandCare       NQ                          111.40             0.33
Psychemedics   N                            17.60             0.13
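The five-company table above can be stored with one record per element. This sketch then checks the counting rule: the number of observations equals the number of elements, and the total number of data values equals elements times variables. The field names such as `annual_sales_musd` are hypothetical labels chosen for this sketch.

```python
# One record per element (company); the three fields are the variables.
# Field names are hypothetical labels for the table's columns.
data_set = {
    "Dataram":      {"exchange": "NQ", "annual_sales_musd": 73.10,  "earnings_per_share_usd": 0.86},
    "EnergySouth":  {"exchange": "N",  "annual_sales_musd": 74.00,  "earnings_per_share_usd": 1.67},
    "Keystone":     {"exchange": "N",  "annual_sales_musd": 365.70, "earnings_per_share_usd": 0.86},
    "LandCare":     {"exchange": "NQ", "annual_sales_musd": 111.40, "earnings_per_share_usd": 0.33},
    "Psychemedics": {"exchange": "N",  "annual_sales_musd": 17.60,  "earnings_per_share_usd": 0.13},
}

n_elements = len(data_set)                       # one observation per element
variables = list(next(iter(data_set.values())))  # variable names taken from the first record
n_variables = len(variables)

# Count every individual data value in the data set
total_values = sum(len(observation) for observation in data_set.values())
assert total_values == n_elements * n_variables  # 5 elements x 3 variables = 15 values

print(n_elements, n_variables, total_values)     # 5 3 15
```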
Types of Data: Categorical and Quantitative Data

Data can be further classified as being categorical or quantitative.

The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.

In general, there are more alternatives for statistical analysis when the data are quantitative.
Categorical Data
Labels or names used to identify an attribute of each element

Often referred to as qualitative data

Use either the nominal or ordinal scale of measurement

Can be either numeric or nonnumeric

Appropriate statistical analyses are rather limited


Quantitative Data

Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for quantitative data.
Scales of Measurement

• Nominal Scale - groups or classes
Gender, color, professional classification, etc.
• Ordinal Scale - order matters
Ranks (top ten videos, products, etc.)
• Interval Scale - difference or distance matters; has an arbitrary zero value.
Temperatures (°F, °C), marks, time of day, etc.
• Ratio Scale - ratio matters; has a natural zero value.
Salaries, sales, costs, market share, number of purchasers, distance travelled, etc.
Ordinal Scale

In the ordinal scale of measurement, data elements may be ordered according to their relative size or quality.

Four products ranked by a consumer may be ranked as 1, 2, 3, and 4, where 4 is the best and 1 is the worst. In this scale of measurement we do not know how much better one product is than another, only that it is better.
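The four-product ranking example can be sketched in Python. The products and their ranks are hypothetical. Ordering and "better than" comparisons are meaningful; ratios and differences of ranks are not.

```python
# Hypothetical ranking of four products by one consumer (4 = best, 1 = worst)
ranks = {"Product A": 2, "Product B": 4, "Product C": 1, "Product D": 3}

# Meaningful: put the products in order from best to worst
best_to_worst = sorted(ranks, key=ranks.get, reverse=True)

# Meaningful: say which of two products is preferred
b_preferred_to_a = ranks["Product B"] > ranks["Product A"]

# NOT meaningful: ranks["Product B"] / ranks["Product A"] == 2 does not make
# Product B "twice as good", and the gap between ranks 4 and 3 need not equal
# the gap between ranks 2 and 1.
print(best_to_worst)     # ['Product B', 'Product D', 'Product A', 'Product C']
print(b_preferred_to_a)  # True
```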
Interval Scale
In the interval scale of measurement the value of zero is assigned arbitrarily, and therefore we cannot take ratios of two measurements. But we can take ratios of intervals.

A good example is how we measure time of day, which is on an interval scale. We cannot say 10:00 A.M. is twice as long as 5:00 A.M. But we can say that the interval between 0:00 A.M. (midnight) and 10:00 A.M., which is a duration of 10 hours, is twice as long as the interval between 0:00 A.M. and 5:00 A.M., which is a duration of 5 hours. This is because 0:00 A.M. does not mean the absence of any time. Another example is temperature. When we say 0°F, we do not mean zero heat. A temperature of 100°F is not twice as hot as 50°F.
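The temperature example can be checked numerically with the standard conversion C = (F - 32) × 5/9. A ratio of two readings changes when the arbitrary zero moves from the Fahrenheit scale to the Celsius scale. A ratio of two intervals stays the same. The readings 100°F, 75°F, and 50°F are chosen only for illustration.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Ratio of two measurements: depends on where the (arbitrary) zero is
ratio_f = 100 / 50                  # 2.0 on the Fahrenheit scale
ratio_c = f_to_c(100) / f_to_c(50)  # about 3.78 on the Celsius scale

# Ratio of two intervals: the same on both scales
interval_ratio_f = (100 - 50) / (75 - 50)
interval_ratio_c = (f_to_c(100) - f_to_c(50)) / (f_to_c(75) - f_to_c(50))

print(ratio_f, round(ratio_c, 2))                      # 2.0 3.78
print(interval_ratio_f, round(interval_ratio_c, 6))    # 2.0 2.0
```

Because the "twice as hot" ratio does not survive a change of units, it cannot describe the heat itself. The interval ratio does survive, which is why ratios of intervals are allowed on this scale.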
Ratio Scale

If two measurements are in ratio scale, then we can take ratios of those
measurements. The zero in this scale is an absolute zero. Money, for example, is
measured in a ratio scale. A sum of $100 is twice as large as $50. A sum of $0
means absence of any money and is thus an absolute zero. We have already seen
that measurement of duration (but not time of day) is in a ratio scale. In general, the
interval between two interval scale measurements will be in ratio scale. Other
examples of the ratio scale are measurements of weight, volume, area, or length.
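The claim that the interval between two interval-scale measurements is on a ratio scale can be illustrated with Python's `datetime`. Subtracting two times of day gives a `timedelta` (a duration). A zero duration genuinely means no elapsed time, and dividing one `timedelta` by another gives a meaningful ratio. The calendar date used is arbitrary.

```python
from datetime import datetime, timedelta

# Times of day (interval scale): midnight is a conventional zero point.
# The calendar date is arbitrary.
midnight = datetime(2019, 7, 9, 0, 0)
five_am = datetime(2019, 7, 9, 5, 0)
ten_am = datetime(2019, 7, 9, 10, 0)

# Differences between times of day are durations (ratio scale)
first_interval = ten_am - midnight    # 10 hours
second_interval = five_am - midnight  # 5 hours

# A zero duration really means "no time elapsed": an absolute zero
assert five_am - five_am == timedelta(0)

# Dividing one timedelta by another returns a float ratio
print(first_interval / second_interval)  # 2.0

# Money is also on a ratio scale: $100 is twice as large as $50
print(100 / 50)                          # 2.0
```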
Summary

Data
    Categorical
        Numeric: Nominal, Ordinal
        Non-numeric: Nominal, Ordinal
    Quantitative
        Numeric: Interval, Ratio

Data Sources
• Statistical Studies - Experimental
In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.
Data Sources
• Statistical Studies - Observational

In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.
Data Acquisition Considerations

Time Requirement
• Searching for information can be time-consuming.
• Information may no longer be useful by the time it is available.

Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors

• Using any data that happen to be available or were acquired with little care can lead to misleading information.
Example 1

The HR manager of an organization wishes to identify employees' satisfaction level with respect to the working conditions of the organization. The following questionnaire was formulated to get the details from the employees. The questionnaire was divided into two sections. The first section covers the demographic characteristics of the respondents. The other section focuses on their satisfaction levels with respect to different parameters.
Questionnaire for the study on employee satisfaction:
Kindly fill in the questionnaire by ticking the appropriate box for each question.
1. Age: [ ] < 20  [ ] 20-30  [ ] 30-40  [ ] 40-50  [ ] > 50
2. Experience: [ ] < 1 yr  [ ] 1-2  [ ] 2-4  [ ] 4-8  [ ] 8-15  [ ] > 15
3. Gender: [ ] male  [ ] female
4. Marital status: [ ] married  [ ] unmarried
5. Department: ____________
Working conditions and hygiene issues

1. Satisfaction with the surrounding environment and general layout of the office
[ ] Extremely satisfied  [ ] Very satisfied  [ ] Moderately satisfied  [ ] Slightly satisfied  [ ] Not at all satisfied

2. Satisfaction with the geographical location of the workplace
[ ] Extremely satisfied  [ ] Very satisfied  [ ] Moderately satisfied  [ ] Slightly satisfied  [ ] Not at all satisfied
Example-2
A survey by an electric company contains questions on the following:
1. Age of household head.
2. Gender of household head.
3. Number of people in household.
4. Use of electric heating (yes or no).
5. Number of large appliances used daily.
6. Average number of hours heating is on.
7. Average number of heating days.
8. Household income.
9. Average monthly electric bill.
10. Ranking of this electric company as compared with two previous electricity suppliers.
Example-3

An individual federal tax return form asks, among other things, for the
following information: income (in dollars and cents), number of dependents,
whether filing singly or jointly with a spouse, whether or not deductions are
itemized, amount paid in local taxes. Describe the scale of measurement of
each variable, and state whether the variable is qualitative or quantitative.
Example-4

Describe each of the following variables as qualitative (categorical) or quantitative.
Data Level    Meaningful Operations
Nominal       Classifying and counting
Ordinal       All of the above plus ranking
Interval      All of the above plus addition, subtraction, multiplication, and division (including means, standard deviations, etc.)
Ratio         All of the above plus taking ratios of measurements
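The cumulative structure of the table above, where each level keeps every operation of the levels below it, can be encoded as a small lookup. This is a sketch under assumptions. The operation names are hypothetical labels. Means and standard deviations stand in for the interval level's arithmetic. Ratios of measurements are reserved for the ratio scale, following the Interval Scale and Ratio Scale sections.

```python
# Levels in increasing order of strength; each level inherits the ones before it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations first made meaningful at each level (hypothetical labels)
ADDED_AT_LEVEL = {
    "nominal": {"classify", "count"},
    "ordinal": {"rank"},
    "interval": {"add", "subtract", "mean", "std_dev"},
    "ratio": {"ratio_of_measurements"},
}

def meaningful_operations(level):
    """Return every operation meaningful at `level`, including those of lower levels."""
    ops = set()
    for lvl in LEVELS[: LEVELS.index(level) + 1]:
        ops |= ADDED_AT_LEVEL[lvl]
    return ops

def is_meaningful(operation, level):
    """True if `operation` is meaningful for data measured at `level`."""
    return operation in meaningful_operations(level)

print(is_meaningful("rank", "nominal"))                    # False
print(is_meaningful("mean", "interval"))                   # True
print(is_meaningful("ratio_of_measurements", "interval"))  # False
```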
Unlearn Before you Learn

Thank you
