
DATA ANALYSIS

Dr. Urooj A Siddiqui


- Data: raw facts, especially numerical facts, collected together for reference or information.
- Data is collected on one or more particular variables.
- Data analysis is the processing of data to derive useful information.
  - Information: knowledge communicated concerning some particular fact.
- The knowledge created helps in APPLICATION / DECISION MAKING.
DATA & ITS MEASUREMENT
- Categorical: Qualitative
- Continuous: Quantitative

Data
- Categorical: Nominal, Ordinal
- Continuous: Interval, Ratio
VARIABLE
- Any phenomenon that takes at least two different values/observations.
- Data: the set of values/observations collected on a variable.
- Scales of measurement of a variable:
  - Nominal
  - Ordinal
  - Interval
  - Ratio
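To make the four scales concrete, here is a minimal Python sketch using values from the example respondent table later in these notes (caste, pain-level codes, temperature of locality, and age with the 305 entry taken as corrected to 35, as in the coded table). It assumes the usual textbook rule of thumb about which summaries each scale supports: mode for nominal, median for ordinal, differences for interval and ratios for ratio data. The variable names are illustrative.

```python
from statistics import median, mode

# Values from the example table in these notes (illustrative variable names)
caste = ["Hindu", "Muslim", "Hindu", "Hindu", "Sikh", "Hindu", "Hindu"]  # nominal
pain_code = [0, 2, 0, 0, 3, 2, 2]      # ordinal: Mild=0, Mod=2, Severe=3
temp_c = [-4, 20, 15, 0, 0, -1, 30]    # interval: 0 °C is not "no temperature"
age_yr = [60, 65, 35, 90, 38, 48, 45]  # ratio: 0 years is a true zero

# Nominal: categories can only be counted, so the mode is the natural summary
most_common_caste = mode(caste)

# Ordinal: categories are ordered, so the median is meaningful
median_pain = median(pain_code)

# Interval: differences are meaningful (30 °C is 34 degrees warmer than -4 °C),
# but ratios are not: 30 °C is not "twice as hot" as 15 °C
temp_difference = temp_c[6] - temp_c[0]

# Ratio: a true zero makes ratios meaningful (90 years vs 35 years)
age_ratio = age_yr[3] / age_yr[2]

print(most_common_caste, median_pain, temp_difference, round(age_ratio, 2))
```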
1. Data Preparation / Initial Operations
   - Editing / Cleaning
   - Coding
   - Classification
   - Tabulation
   - Graphical Representation
2. Summarizing Data / Analysis Operations
   - Table / Crosstab
   - Graph / Figure
   - Statistical Methods
     - Frequency, %age, Ratio
     - Mean, Median, Standard Deviation (Variance)
   - Advanced Statistical Methods / Analysis
     - Comparison (t/z-test)
     - Association (chi-square)
     - Correlation (r)
     - Regression (y = ax + b)
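The basic statistical methods listed above can be sketched with Python's standard library, using gender and the cleaned ages (Sita's 305 taken as corrected to 35) of the seven respondents in the example table later in these notes. The notes do not say whether the sample (divide by n - 1) or population (divide by n) formula is intended for the standard deviation, so both are shown; the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median, pstdev, stdev, variance

# Gender and (cleaned) age of the seven respondents in the example table
genders = ["M", "M", "F", "F", "F", "M", "M"]
ages = [60, 65, 35, 90, 38, 48, 45]

# Frequency, percentage and ratio for a categorical variable
freq = Counter(genders)  # counts per category
percent = {g: 100 * n / len(genders) for g, n in freq.items()}
male_to_female = freq["M"] / freq["F"]

# Central tendency and dispersion for a quantitative variable
age_mean = mean(ages)
age_median = median(ages)
age_var = variance(ages)   # sample variance (divides by n - 1)
age_sd = stdev(ages)       # sample standard deviation
age_pop_sd = pstdev(ages)  # population standard deviation (divides by n)

print(dict(freq), {g: round(p, 1) for g, p in percent.items()}, round(male_to_female, 2))
print(round(age_mean, 2), age_median, round(age_var, 2), round(age_sd, 2), round(age_pop_sd, 2))
```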
1. DATA PREPARATION / INITIAL OPERATIONS
- Editing / Data Cleaning
  - examining the collected raw data to detect errors, and correcting or omitting them where possible
- Coding
  - assigning numerals to answers so that responses can be put into a limited number of categories
- Classification
  - grouping of data on some basis (a large volume of raw data is reduced into homogeneous groups)
    I. Attribute: on the basis of an attribute such as a demographic characteristic, e.g. gender, rural/urban, day scholar/hosteller
    II. Class interval: on the basis of a numeric range, e.g. 0-10, 10-20, etc.
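The two bases of classification can be sketched in Python using three columns of the example table later in these notes (Sita's age taken as corrected to 35). Grouping by attribute collects the records that share a category; grouping by class interval assumes equal-width classes under the exclusive method (lower limit included, upper limit excluded). That boundary rule is an assumption, since the example 0-10, 10-20 does not say where a value such as 10 belongs. The function names are hypothetical.

```python
from collections import defaultdict

# (name, gender, age) from the example table; Sita's 305 taken as corrected to 35
records = [
    ("Ram", "M", 60), ("Akbar", "M", 65), ("Sita", "F", 35),
    ("Shalini", "F", 90), ("Mehnaj", "F", 38), ("Ravi", "M", 48),
    ("Hari", "M", 45),
]

def classify_by_attribute(rows, index):
    """Group names by a qualitative attribute stored at position `index`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row[0])
    return dict(groups)

def classify_by_interval(rows, index, width):
    """Group names into class intervals of equal `width`.

    Assumes the exclusive method: each class includes its lower limit
    and excludes its upper limit, so 40 falls in 40-50, not 30-40.
    """
    groups = defaultdict(list)
    for row in rows:
        lower = (row[index] // width) * width
        groups[(lower, lower + width)].append(row[0])
    return dict(sorted(groups.items()))

print(classify_by_attribute(records, 1))    # attribute: gender
print(classify_by_interval(records, 2, 10))  # class interval: age, width 10
```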
2. DATA SUMMARIZATION / DATA ANALYSIS OPERATIONS

I. Tabulation
- is the process of displaying raw data in tabular form and summarising it for further analysis
- orderly arrangement of data in columns and rows

Tabulation is essential because:
- it conserves space and reduces lengthy descriptive statements
- it facilitates summation of items, comparison, and detection of errors and omissions
- it is the basis for various statistical computations
TABULATION OF DATA

Name     Gender  Caste   Age  Mob. No.    Edu   Yrs in school  IQ  Pain level  Temp of locality (°C)
Ram      M       Hindu   60   9450366367  NIL   0              16  Mild-0      -4
Akbar    M       Muslim  65   8004896712  HS    16             14  Mod-2       20
Sita     F       Hindu   305  9934876545  Int.  19             0   Mild-0      15
Shalini  F       Hindu   90   2542543598  HS    8              16  Mild-0      0
Mehnaj   F       Sikh    38   9458098734  UG    21             13  Severe-3    0
Ravi     M       Hindu   48   9412890112  PG    23             20  Mod-2       -1
Hari     M       Hindu   45   8796654398  Prim  12             10  Mod-2       30


EDITING & CODING OF DATA

Name  Gender  Caste  Age  Mob. No.    Edu level  Yrs in sch.  IQ  Pain level  Temp of locality (°C)
7     1       1      60   9450366367  -1         0            16  0           -4
2     1       2      65   8004896712  1          16           14  2           20
5     2       1      35   9934876545  2          19           0   0           15
4     2       1      90   2542543598  1          8            16  0           0
3     2       3      38   9458098734  3          21           13  3           0
6     1       1      48   9412890112  4          23           20  2           -1
1     1       1      45   8796654398  0          12           10  2           30

Nominal and ordinal data are called qualitative; interval and ratio data are called quantitative.
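Editing and coding of a single row can be sketched in Python. The code books below are read off the raw and coded tables above (M = 1, F = 2; Hindu = 1, Muslim = 2, Sikh = 3; NIL = -1, Prim = 0, HS = 1, Int. = 2, UG = 3, PG = 4; Mild = 0, Mod = 2, Severe = 3). The 0 to 120 age range used to flag an entry such as 305 is an assumption, the constant and function names are hypothetical, and the name and mobile-number columns are left out.

```python
# Code books inferred from the raw and coded tables (hypothetical names)
GENDER = {"M": 1, "F": 2}
CASTE = {"Hindu": 1, "Muslim": 2, "Sikh": 3}
EDU = {"NIL": -1, "Prim": 0, "HS": 1, "Int.": 2, "UG": 3, "PG": 4}
PAIN = {"Mild": 0, "Mod": 2, "Severe": 3}

# Hypothetical plausibility limits used during editing
AGE_RANGE = (0, 120)

def code_row(gender, caste, age, edu, pain):
    """Edit (check) and then code one respondent's answers."""
    low, high = AGE_RANGE
    if not low <= age <= high:
        # Editing only flags the value; the analyst decides on the correction
        raise ValueError(f"implausible age {age}: check the source record")
    return (GENDER[gender], CASTE[caste], age, EDU[edu], PAIN[pain])

try:
    code_row("F", "Hindu", 305, "Int.", "Mild")  # Sita's raw entry
except ValueError as err:
    print("Editing needed:", err)

print(code_row("F", "Hindu", 35, "Int.", "Mild"))  # after correction
```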
- Single / Multi Variable Table: one or more variables (no interaction)

Raw data (excerpt):
Roll No.  Age (yr)
1         22
2         24
3         23
4         26
5         19
6         22
.         .
.         .
60        22

Single Variable Freq. Table
Age Group (years)  Freq.
Below 20           2
20-22              28
22-24              16
24-26              10
Above 26           4
Total              60

**Multiple Variable Table: as presented in the tabulation of data above
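A frequency table is often extended with percentages (relative frequency) and cumulative frequency. A minimal Python sketch starting from the five frequencies in the table above, since the full list of 60 ages is not reproduced in the notes; the variable names are illustrative.

```python
from itertools import accumulate

# Single-variable frequency table from the slide
age_groups = ["Below 20", "20-22", "22-24", "24-26", "Above 26"]
freq = [2, 28, 16, 10, 4]

total = sum(freq)
percent = [100 * f / total for f in freq]  # relative frequency (%)
cumulative = list(accumulate(freq))        # "less than" cumulative frequency

print(f"{'Age Group':<10}{'Freq.':>6}{'%':>8}{'Cum.':>6}")
for group, f, p, c in zip(age_groups, freq, percent, cumulative):
    print(f"{group:<10}{f:>6}{p:>8.2f}{c:>6}")
print(f"{'Total':<10}{total:>6}{sum(percent):>8.2f}")
```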
- Crosstabs: interaction of two or more variables

Two Variable Interaction: Crosstab (Age Group by Gender)

Age Group  Male  Female  Total
Below 20   1     1       2
20-22      18    10      28
22-24      9     7       16
24-26      7     3       10
Above 26   3     1       4
Total      38    22      60
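The overview lists association (chi-square) among the advanced methods, and a crosstab like this one is its usual input. The Python sketch below computes Pearson's chi-square statistic, the sum of (O - E)² / E with E = row total × column total / grand total, and its degrees of freedom (r - 1)(c - 1), for the counts above. It stops at the statistic, which would still be compared with a critical value from a chi-square table. Several expected counts here fall below 5 (for example 2 × 22 / 60 ≈ 0.73 for females below 20), which a common rule of thumb treats as too small for the approximation, so merging sparse age groups would be a reasonable first step. The function name is illustrative.

```python
# Observed counts from the crosstab: rows are age groups, columns are (Male, Female)
observed = [
    [1, 1],    # Below 20
    [18, 10],  # 20-22
    [9, 7],    # 22-24
    [7, 3],    # 24-26
    [3, 1],    # Above 26
]

def chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected

stat, dof, expected = chi_square(observed)
small = sum(e < 5 for row in expected for e in row)  # cells with expected count < 5
print(f"chi-square = {stat:.3f}, df = {dof}, cells with expected count < 5: {small}")
```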
Graphical Representation of Data
- Pie Chart
- Bar Graph
- Line Graph
- Scatter Plot
Pie Charts
- Used to represent percentages: the distribution of one variable across its various levels

[Pie chart: Sales (in mn), legend 1st Qtr to 4th Qtr; slices labelled 8.2 (59%), 3.2 (23%), 1.4 (10%) and 1.2 (8%)]
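Each slice's share is its value divided by the total: multiplied by 100 it gives the percentage label, and multiplied by 360 it gives the slice angle in degrees. A short Python sketch for the four labelled values (which sum to 14.0); the function name is hypothetical. The chart shows whole-number percentages, so its labels are rounded.

```python
def pie_slices(values):
    """Return (percentage, angle in degrees) for each slice of a pie chart."""
    total = sum(values)
    return [(100 * v / total, 360 * v / total) for v in values]

# Slice values as labelled on the chart (Sales, in mn)
sales = [8.2, 3.2, 1.4, 1.2]

for value, (pct, angle) in zip(sales, pie_slices(sales)):
    print(f"{value:>4}: {pct:5.1f}%  {angle:6.1f} degrees")
```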
Bar Chart
- To represent one variable at its various levels
- Levels can be years, groups, etc.

[Bar chart: Sales for the years 2018 to 2021]
Bar Chart

[Clustered bar chart: a cluster of four bars (series 1st, 2nd, 3rd and 4th) for each of 2018, 2019 and 2020]
Histogram
- To show the distribution of a quantitative variable

[Histogram: Frequency (y-axis) against Class Interval/Variable Unit (x-axis, 10 to 50)]


Line Diagram
- To show change in a variable over a particular time period / reference range or points

[Line diagram: Stock Price (₹5.60 to ₹7.40) over the Last 10 Days]
Line Diagram
- May also be used to compare two or more variables along the range

[Line diagram: three series, Adani, Tata and Reliance, over points 1 to 8 (y-axis 0 to 14)]
Scatter Plot
- To express relationships between two variables

[Scatter plot: Sales (in crore, y-axis) against Adv. Budget (in 10 lakh, x-axis)]
Scatter Plot
- Trend lines: correlation
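The trend line drawn through a scatter plot is usually the least-squares line, and the strength of the linear relationship is summarised by Pearson's r; both appear in the overview as correlation (r) and regression (y = ax + b). A minimal Python sketch, assuming ordinary least squares; the advertising and sales figures are hypothetical, since the values behind the scatter plot are not given, and the function names are illustrative.

```python
from math import sqrt

def _centered_sums(x, y):
    """Return Sxy, Sxx, Syy (sums about the means) and the two means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy, sxx, syy, mx, my

def pearson_r(x, y):
    """Pearson correlation coefficient r (between -1 and +1)."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)

def trend_line(x, y):
    """Least-squares trend line y = a*x + b; returns (a, b)."""
    sxy, sxx, _, mx, my = _centered_sums(x, y)
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical data: advertising budget (in 10 lakh) and sales (in crore).
# These are NOT the values behind the slide's scatter plot.
adv = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
sales = [1.2, 1.8, 2.1, 2.9, 3.2, 3.9, 4.4]

a, b = trend_line(adv, sales)
print(f"r = {pearson_r(adv, sales):.3f}; trend line: y = {a:.2f}x + {b:.2f}")
```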
FREQUENCY DISTRIBUTION

Income      No. of families
0-500       20
500-1000    30
1000-1500   50
1500-2000   70
2000-2500   30
2500-3000   10
.           .
.           .

[Chart: No. of families (y-axis, 0 to 80) against Income (x-axis, 0 to 4000)]
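From a grouped frequency distribution like this one, the cumulative frequencies, the mean and the median can be estimated using class midpoints and the standard interpolation formula for the median, L + (N/2 - CF) / f × h. A minimal Python sketch, assuming the six income classes visible in the table are the whole distribution (the table continues with rows that are not given) and that values are spread evenly within each class; the variable names are illustrative.

```python
from itertools import accumulate

# Income classes and number of families (only the six classes shown on the slide)
classes = [(0, 500), (500, 1000), (1000, 1500), (1500, 2000), (2000, 2500), (2500, 3000)]
families = [20, 30, 50, 70, 30, 10]

n = sum(families)
cumulative = list(accumulate(families))
midpoints = [(low + high) / 2 for low, high in classes]

# Grouped mean: sum(f * m) / N, treating each family as earning the class midpoint
grouped_mean = sum(f * m for f, m in zip(families, midpoints)) / n

# Grouped median: L + (N/2 - CF) / f * h for the first class whose cumulative
# frequency reaches N/2 (CF = cumulative frequency of the classes before it)
half = n / 2
k = next(i for i, c in enumerate(cumulative) if c >= half)
cf_before = cumulative[k - 1] if k > 0 else 0
low, high = classes[k]
grouped_median = low + (half - cf_before) / families[k] * (high - low)

print(f"N = {n}, cumulative = {cumulative}")
print(f"grouped mean = {grouped_mean:.1f}, grouped median = {grouped_median:.1f}")
```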
