
EMPIRICAL ANALYSIS FOR PUBLIC POLICY

Data Presentation and Correlation (mid-term exam)

Sadia Malik
NDU-GPP/M.P-19/F-460
MPhil 2nd Semester
INSTRUCTOR
Dr. Shehzad Hussain

Department of Government and Public Policy

National Defence University, Islamabad


What is the best way to describe/present data in research? Also explain its importance.

Data Presentation:

There are several types of professional economists who test and use theoretical models of various aspects of the economy. An economist working as a civil servant has to weigh the merits and demerits of policies. Economists working in banks give advice on issues relating to monetary policy, while economists working in private companies deal with the sales and profits of the company. All of these economists deal with data, so the ability to work with data is an important skill. Moreover, they have to take decisions, make predictions about policies and forecast what may happen in the future, and all of these decisions are based on data. Economists therefore use a large amount of facts and information in the form of “data” to analyze policies and to solve many economic issues.

In the simplest terms, data are a set of facts that provide a partial picture of reality. There are many types of data, such as time series data (observations of a variable at successive points in time, e.g. GDP growth rate, interest rate), cross-sectional data (observations on different units, e.g. people, companies and countries, at one point in time) and panel data (which combine the time series and cross-sectional dimensions).
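
The three data types differ mainly in how the observations are indexed. The following is a minimal sketch of that idea in Python; every name and number in it is a made-up placeholder used only to show the shape of each data type, not a real statistic.

```python
# Illustrative only: all values below are made-up placeholders.

# Time series: one unit observed over several periods.
time_series = {2018: 5.0, 2019: 5.5, 2020: 6.0}  # e.g. an interest rate by year

# Cross-section: several units observed in a single period.
cross_section = {"Country A": 5.0, "Country B": 3.2, "Country C": 7.1}

# Panel: several units, each observed over several periods,
# so every observation is indexed by (unit, period).
panel = {
    ("Country A", 2019): 5.5,
    ("Country A", 2020): 6.0,
    ("Country B", 2019): 3.0,
    ("Country B", 2020): 3.2,
}

units = sorted({unit for unit, _ in panel})
periods = sorted({period for _, period in panel})
print("units:", units)
print("periods:", periods)
```

A time series is indexed by time only, a cross-section by unit only, and a panel by both.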

“Data presentation refers to the arrangement and organization of data using different statistical methods such as tables, graphs, or charts, so that logical and statistical conclusions can be derived from the collected measurements”.


Data presentation requires professional skill and an understanding of the data. Collected data are raw data, and they must be processed before they can be used. Using various techniques, economists can present raw data that were gathered through different data collection methods. Presenting data may include pictorial representations using graphs, charts, maps and other methods. These methods provide a visual view of the data, which makes the data easier and more convenient to understand. Thus, once the data have been collected, the next step is to summarize them in an informative way, because the original raw data set is difficult to understand.

GRAPHS OF TIME SERIES:

Presenting data in graphical form makes the data simple and easy to understand. Graphs tell a story visually instead of with words and numbers. A time series graph shows one or more variables from one period of time to another. Using such a graph, we can analyze how the variables change over time.

Example:

The following graph indicates the number of COVID-19 cases in Pakistan from January to June. As we can see, the number of COVID-19 cases increases over the period. I have taken this information to describe how data can be presented in a convenient form that is easy to understand. This time series graph shows a pattern that clearly indicates the increase in COVID-19 cases over time.
Source: ZME Science

HISTOGRAMS:

Another convenient and reliable way to present data is the histogram. A histogram gives a visual interpretation of numerical data. It divides the data into intervals, each called a “bin”, and shows the frequency of observations in each bin. If we use proportions instead of frequency counts, the proportions sum to 1, and a normal curve can be overlaid on the histogram to judge whether the data seem to follow a normal distribution. It is necessary to make clear that a histogram is not a bar plot. A bar plot is a common plot that usually shows a summary value (such as the mean score) for each category, while a histogram shows frequency or proportion counts over intervals of the collected data. A histogram therefore gives information about the shape of the distribution, for example whether it is symmetric.

Example:

As mentioned earlier, a histogram is a convenient way to present data and to provide information about a phenomenon. To illustrate the histogram, I have taken the example of the current COVID-19 outbreak in the world. I take data on the COVID-19 outbreak for 35 countries. The data consist of the number of COVID-19 cases reported on one day (6 June) in these countries. When I collected the data from the internet, they were unordered and difficult to understand. The numbers of cases in many of these countries are similar to each other, so a frequency distribution is needed. The intervals in a histogram make the data easier to understand. In this example, the one-day data on COVID-19 cases for the 35 countries are presented in a table as well as in a histogram.

Table 1.1

No. of Cases (Class Intervals)    Frequency
0-500                             18
500-1000                          7
1000-1500                         1
1500-2000                         2
2000-2500                         2
2500-3000                         1
3000-3500                         2
3500-4000                         2

Histogram: Graph
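
The step from frequency counts to proportions can be checked with a few lines of Python. This is only a sketch: the class intervals and frequencies are copied from Table 1.1, while the variable names and the text-based bars are my own illustrative choices rather than the method used to draw the original graph.

```python
# Class intervals and frequencies copied from Table 1.1.
bins = [(0, 500), (500, 1000), (1000, 1500), (1500, 2000),
        (2000, 2500), (2500, 3000), (3000, 3500), (3500, 4000)]
frequencies = [18, 7, 1, 2, 2, 1, 2, 2]

total = sum(frequencies)                      # number of countries
proportions = [f / total for f in frequencies]

for (low, high), f, p in zip(bins, frequencies, proportions):
    bar = "#" * f                             # crude text histogram: one '#' per country
    print(f"{low:>4}-{high:<4}  {f:>2}  {p:.3f}  {bar}")

print("total countries:", total)
print("sum of proportions:", round(sum(proportions), 10))
```

The frequencies add up to the 35 countries in the example, and the proportions add up to 1, which is why a histogram of proportions can be compared with a probability curve.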

XY-PLOTS:

XY-plots are another good graphical technique for presenting data. They show the relationship between two or more variables in research studies, for example the relationship between deforestation and population, higher education and wages, pollution and population, or poverty and crime. Economists are particularly interested in exploring relationships between variables. An XY-plot gives a clear understanding of data that may be difficult to grasp from the raw data or from a mere list of numbers. XY-plots play an important role in data analysis, and economists use them to describe the relationship between variables. A graphical representation such as an XY-plot provides insight into the data. It also gives a visual presentation of pairs of variables, not only of one variable at a time.

Example:

Construct an x-y plot for the data on ice-cream sales versus the noon temperature. This will show how the temperature of the day affects the ice-cream sales of a local shop. The data for the last 12 days are as follows:

Temperature (°C)    Ice-cream sales ($)
14.2                215
16.4                325
11.8                475
18.5                260
22.4                117
15.3                330
19.7                412
20.6                445
8.4                 395
15.9                280
13.1                360
21.6                275

XY-Plot: Ice-cream sales vs. Temp. (figure: temperature from 0 to 25 °C on the horizontal axis, ice-cream sales from 0 to 500 $ on the vertical axis)

Importance of Data Presentation:

• Presentation of data helps decision makers to understand the information easily and to make policies.

• It helps decision makers to understand how business data are being interpreted to inform business decisions.

• Presentation of data shows the relationship between variables; therefore it is useful to present data in graphical form.

• Presentation of data gives the researcher and the policy maker a focal point instead of the original raw data.

• By presenting data in graphs and charts, one can easily compare results over different periods of time.

• Presenting data in pictorial form or in some type of graph may provide a summary of otherwise unseen realities and facts.

• Handling large amounts of data in a convenient way and using graphs and tables reveals the insights in the data and tells the story behind them, so that decision makers can set their goals according to the facts.

• Data presentation helps other researchers to use the data as secondary data in their further research.

Explain correlation. How do you interpret it? What are the reasons for correlation arising between/among variables?

CORRELATION:

Correlation analysis describes the strength and direction of the linear relationship between two variables. It is a statistical technique that shows whether the relationship between variables is strong or weak. For example, people wear heavy coats more in winter than in summer, which shows a relationship between coat wearing and the weather, that is, a correlation between two variables.

The properties of correlation tell us that a correlation involves two variables and that these variables do not have levels (categories) within them. In correlation, both variables are continuous. It is not necessary to designate the variables as dependent and independent in correlation.

Economists are interested in showing the nature of variables and the relationship between them, such as the relationship between poverty and education, wages and skills, or the interest rate and inflation. When a change in one variable is accompanied by a change in another variable, the two variables are said to be correlated. Correlation is a convenient tool for quantifying the relationship between two variables. The following definitions may help to further understand the concept of correlation.

Croxton and Cowden:

“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation”

A.M. Tuttle:

“Correlation is an analysis of the co-variation between two or more variables”

Simpson and Kafka:

“Correlation analysis deals with the association between two or more variables”

As we know, correlation exists between two variables when the values of one variable are associated with the values of the other. Correlation shows a pattern in the data and a relationship between the variables, though the pattern may be linear, exponential, logarithmic or periodic. Such patterns can be seen by drawing a scatter plot of the data.

Correlation Graphs:

Presenting data or information in pictorial form, using graphs, charts, maps and other methods, gives a clear understanding of the relationship between variables. Graphs provide a visual view of the relationship, which makes it easier to understand. Here are some correlation graphs that show different kinds of relationship between variables.

As we can see, if the pattern of points moves upward the correlation is “positive”, and if it moves downward the correlation is “negative”. The strength of the relationship between the variables may be described as “weak”, “moderate” or “strong”.

Types of Correlation:

Generally, there are three types of correlation, which are described below:

i. Positive Correlation:

In a positive correlation, an increase in one variable is accompanied by an increase in the other variable. Two variables are positively correlated if they move in the same direction, i.e. an increase in one variable goes with an increase in the other variable, and a decrease in one variable goes with a decrease in the other.

Examples:

➢ An increase in the price of a good increases the supply of the commodity.

➢ Higher education or skill may lead to higher wages in a company.

➢ A higher population density in a city may increase the number of COVID-19 cases in that city.

Positive Correlation

ii. Negative Correlation:

A negative correlation is a relationship in which one variable increases as the other variable decreases, and vice versa. In other words, the relationship between two variables is termed a negative correlation if the variables move in opposite directions, i.e. a decrease in one variable goes with an increase in the other. A perfect negative correlation is represented by a coefficient of -1.

Examples:

➢ The relationship between education and crime shows a negative correlation, as more education may go with less crime in a country.

➢ An increase in the price of a good may cause a decrease in the demand for that good, which is a negative relationship.

➢ An increase in altitude decreases the oxygen level.

Negative Correlation

iii. Zero Correlation (No Correlation):

If one variable increases but the other shows no systematic change, the correlation is zero. Zero correlation means that the two variables are not linearly associated with one another. In other words, there is no tendency for one variable to rise or fall as the other rises.

Example:

➢ There is no relationship between the price of a commodity and school fees.

Zero Correlation

The Interpretation of Correlation: Causality

As mentioned above, correlation indicates the relationship between two variables. This relationship can be studied in terms of causality or influence. In some cases causality and correlation are closely related. However, correlation does not imply causation, because two variables can be related without one causing the other. In simple words, a relationship between two variables does not mean that one variable causes the other. A correlation can be interpreted in different ways. Take the example of teacher quality and teacher preparation. We can interpret the relationship between these two variables in three different ways. Firstly, the relationship may be interpreted as greater preparation leading to greater teaching quality; that is, teachers who prepare more are able to teach and instruct better. Secondly, it may be interpreted in the reverse direction: high-quality teachers spend more time on preparation. Thirdly, a third variable may influence both variables.
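
The third interpretation can be illustrated with a small simulation. The sketch below is only an illustration under my own assumptions: z is a hypothetical third variable, x and y are each built from z plus independent noise, and neither x nor y has any effect on the other. The Pearson coefficient r is computed from deviations about the means.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2000

# Hypothetical third variable z that drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]   # x depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]   # y depends on z only

r_xy = pearson_r(x, y)
print("correlation between x and y:", round(r_xy, 2))
```

Here x and y come out strongly correlated although neither causes the other, which is the situation the third interpretation describes.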

Correlation Coefficient

The correlation coefficient expresses the strength and direction of the relationship between two variables. It is represented by the symbol r and takes values ranging from -1 to +1.

• Stronger Correlation

There is no hard and fast rule for deciding which correlations are strong, but generally a correlation is considered strong when its coefficient is close to -1 or +1.

• Weaker Correlation

If the correlation coefficient is close to zero, the relationship between the variables is considered weak. As a rule of thumb, the following guidelines on strength of relationship are often useful.


Correlation Formulas:

➢ Pearson Correlation Coefficient

The most common formula is the Pearson correlation coefficient, used for linear dependence. The value of the coefficient lies between -1 and +1, and the formula is:

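$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
$$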

Where,

n = number of paired observations

Σx = sum of the values of the first variable

Σy = sum of the values of the second variable

Σx² = sum of the squares of the values of the first variable

Σy² = sum of the squares of the values of the second variable

Σxy = sum of the products of the paired first and second values
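
To see how the sums fit together, the formula can be written as a short Python function. This is a sketch rather than a definitive implementation: the function name pearson_r and the toy data are my own, and the data are hypothetical values chosen so that the sums are easy to check by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the raw sums n, Σx, Σy, Σx², Σy² and Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical toy data, not taken from any real source.
x = [-2, -1, 0, 1, 2]
r_pos = pearson_r(x, [2 * v for v in x])     # y rises with x
r_neg = pearson_r(x, [-3 * v for v in x])    # y falls as x rises
r_zero = pearson_r(x, [v * v for v in x])    # y = x², no linear trend

print(r_pos, r_neg, r_zero)   # prints 1.0 -1.0 0.0
```

The third case gives r = 0 even though y is completely determined by x, which is a reminder that r measures only the linear part of a relationship.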

➢ Linear Correlation Coefficient Formula

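The formula for the linear correlation coefficient is:

$$
r_{xy} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^{2} - \left(\sum_{i=1}^{n} x_i\right)^{2}}\;\sqrt{n\sum_{i=1}^{n} y_i^{2} - \left(\sum_{i=1}^{n} y_i\right)^{2}}}
$$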

➢ Sample Correlation Coefficient Formula

The formula for the sample correlation coefficient is:

$$
r_{xy} = \frac{S_{xy}}{S_x S_y}
$$

where S_x and S_y are the sample standard deviations and S_xy is the sample covariance.

➢ Population Correlation Coefficient Formula

The formula for the population correlation coefficient is:

$$
r_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}
$$

where σ_x and σ_y are the population standard deviations and σ_xy is the population covariance.
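
The sample and population formulas differ only in whether the covariance and the standard deviations divide by n - 1 or by n, and that factor cancels in the ratio. The sketch below checks this on the ice-cream data listed in the XY-plot example, using stdev and pstdev from Python's statistics module; the variable names are my own.

```python
import statistics

# Temperature (°C) and ice-cream sales ($) as listed in the XY-plot example.
temp = [14.2, 16.4, 11.8, 18.5, 22.4, 15.3, 19.7, 20.6, 8.4, 15.9, 13.1, 21.6]
sales = [215, 325, 475, 260, 117, 330, 412, 445, 395, 280, 360, 275]

n = len(temp)
mean_t, mean_s = statistics.mean(temp), statistics.mean(sales)

# Sum of cross-products of deviations, shared by both versions.
cross = sum((t - mean_t) * (s - mean_s) for t, s in zip(temp, sales))

# Sample version: covariance and standard deviations divide by n - 1.
r_sample = (cross / (n - 1)) / (statistics.stdev(temp) * statistics.stdev(sales))

# Population version: covariance and standard deviations divide by n.
r_population = (cross / n) / (statistics.pstdev(temp) * statistics.pstdev(sales))

print(round(r_sample, 4), round(r_population, 4))
```

Both lines of arithmetic give the same coefficient for the same data, so the choice between the two formulas matters for the covariance and the standard deviations but not for r itself.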

Understanding Correlation through Scatter-Plot:

Presenting a correlation in graphical form makes the relationship simple and easy to understand. Graphs tell a story visually instead of with words and numbers. A correlation can be expressed visually by drawing a scattergram (also known as a scatterplot, scatter graph, scatter chart, or scatter diagram). This method provides a visual view of the relationship between the variables, which makes it easier to understand. Thus, once you have paired data on two variables, the next step is to present them in an informative way, because the relationship is difficult to see in the raw numbers. A scatterplot is used to explore the relationship between two variables. The value of one variable appears on the horizontal axis and the value of the other variable appears on the vertical axis.


Example:

The relationship between students’ hard work and GPA

In this example, we analyze the relationship between students’ hard work and their GPA. The table lists the students together with their hard-work scores and their GPAs. GPA ranges from 0 to 4, and the hard-work scores in this example range from 0 to 100. Looking at the table, we can see that students with higher hard-work scores generally also have higher GPAs.

Students    Hard Work Level (score 0-100)    GPA of Students
Sadia       50                               2.0
Sobia       48                               2.0
Arshad      100                              3.8
Sofia       12                               1.5
Marry       34                               1.9
Shekel      30                               2.0
Ahmad       78                               3.5
Junaid      87                               3.6
Salma       84                               3.1
Ali         75                               3.0

Hard Working Level vs GPA of Student (scatterplot: GPA from 0 to 4 on the horizontal axis, hard-work level from 0 to 120 on the vertical axis)

The image presents the visual aspect of the relationship between students’ hard work and their GPA. Each dot on the scatterplot represents one individual from the data set. The location of each point on the graph depends on both the GPA and the level of hard work. Individuals with a higher GPA are located further to the right, and individuals with a higher level of hard work are located higher up on the graph. The purpose of a scatterplot is to provide a general illustration of the relationship between the two variables. In this example, in general, as the level of hard work increases the individual’s GPA also increases, which indicates a positive relationship between these two variables.
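
The positive relationship seen in the scatterplot can also be expressed as a number. The following sketch applies the Pearson formula to the ten students in the table; the variable names are my own, and the calculation is offered as a check on the visual impression rather than as part of the original analysis.

```python
import math

# Hard-work scores (0-100) and GPAs copied from the table in this example.
hard_work = [50, 48, 100, 12, 34, 30, 78, 87, 84, 75]
gpa = [2.0, 2.0, 3.8, 1.5, 1.9, 2.0, 3.5, 3.6, 3.1, 3.0]

n = len(hard_work)
sum_x, sum_y = sum(hard_work), sum(gpa)
sum_x2 = sum(x * x for x in hard_work)
sum_y2 = sum(y * y for y in gpa)
sum_xy = sum(x * y for x, y in zip(hard_work, gpa))

# Pearson r from the raw sums.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print("r =", round(r, 2))
```

By this calculation the coefficient comes out at roughly 0.96, close to +1, which agrees with the strong upward pattern of the points.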


Reasons for Correlation Arising between Variables:

➢ Correlation is useful for variables that are difficult to study, and it allows the researcher to investigate variables that would be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

➢ Correlation is also used to describe the relationship between variables, which can be presented in graphical form.

➢ The use of correlation helps the researcher in building and testing hypotheses.

➢ One reason correlation arises is that there is often a pattern in the data; correlation helps to understand and show that pattern between the variables.

➢ Another reason may be that when one variable changes, the other variable may change as well; correlation is a convenient tool for the researcher to understand this change and the strength of the relationship.
