
ASSIGNMENT No. 1
Subject: Educational Statistics (8614)
(Units 1–4)
Name: Asia Noor    Roll #: BY627591    B.Ed 1.5 years, Spring 2020

Q.1: Describe the levels of measurement. Give five examples of each level and explain the role of level of
measurement in decision making.

ANS: Level of measurement


The level of measurement refers to the relationship among the values that are assigned to the attributes of a
variable. What does that mean? Begin with the idea of a variable, in this example "party affiliation."
That variable has a number of attributes. Let's assume that in this particular election context the only relevant
attributes are "republican", "democrat", and "independent". For purposes of analyzing the results of this
variable, we arbitrarily assign the values 1, 2 and 3 to the three attributes. The level of
measurement describes the relationship among these three values. In this case, we are simply using the
numbers as shorter placeholders for the lengthier text terms. We don't assume that higher values mean
"more" of something and lower numbers signify "less". We don't assume that the value of 2 means that
democrats are twice something that republicans are. We don't assume that republicans are in first place or
have the highest priority just because they have the value of 1. In this case, we only use the values as a
shorter name for the attribute. Here, we would describe the level of measurement as "nominal".

There are typically four levels of measurement that are defined:

• Nominal
• Ordinal
• Interval
• Ratio

In Nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is
implied. For example, jersey numbers in basketball are measured at the nominal level. A player with
number 30 is not more of anything than a player with number 15, and is certainly not twice whatever
number 15 is.

In Ordinal measurement the attributes can be rank-ordered, but the distances between attributes do not have
any meaning. For example, on a survey you might code Educational Attainment as 0=less than high school;
1=some high school; 2=high school degree; 3=some college; 4=college degree; 5=post college. In this
measure, higher numbers mean more education. But is the distance from 0 to 1 the same as the distance from
3 to 4? Of course not. The interval between values is not interpretable in an ordinal measure.

In Interval measurement the distance between attributes does have meaning. For example, when we measure
temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval
between values is interpretable. Because of this, it makes sense to compute an average of an interval variable,
whereas it doesn't make sense to do so for ordinal scales. But note that in interval measurement ratios don't
make any sense: 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).

Finally, in Ratio measurement there is always an absolute zero that is meaningful. This means that you can
construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social
research most "count" variables are ratio, for example, the number of clients in the past six months. Why?
Because you can have zero clients and because it is meaningful to say that "…we had twice as many clients
in the past six months as we did in the previous six months."

It's important to recognize that there is a hierarchy implied in the idea of level of measurement. At lower levels of
measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each
level up the hierarchy, the current level includes all of the qualities of the one below it and adds something
new. In general, it is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a
lower one (nominal or ordinal), because higher levels permit a wider range of statistical techniques and
therefore support better-informed decisions.
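
As an illustrative sketch (the variable names and sample values below are invented for illustration, not taken from the text), the following Python snippet shows which summary operations are meaningful at each level:

```python
import statistics

# Nominal: numbers are only labels; counting/mode is the only meaningful summary.
party = [1, 2, 2, 3, 1, 2]          # 1=republican, 2=democrat, 3=independent
print(statistics.mode(party))        # OK: most frequent category is 2

# Ordinal: order is meaningful, distances are not; the median is meaningful.
education = [0, 1, 2, 2, 3, 4, 5]    # 0=less than high school ... 5=post college
print(statistics.median(education))  # OK: middle rank

# Interval: distances are meaningful, ratios are not; the mean is meaningful.
temps_f = [30, 40, 70, 80]
print(statistics.mean(temps_f))      # OK: average temperature
# but 80 / 40 == 2 does NOT mean "twice as hot"

# Ratio: a true zero exists, so ratios are meaningful.
clients = [10, 20]
print(clients[1] / clients[0])       # OK: "twice as many clients"
```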

Q.2: Differentiate between Primary and Secondary Data. Give meaningful examples with explanation.

ANS: Primary Data: Primary data is the kind of data that is collected directly from the data source without going
through any existing sources. It is mostly collected specifically for a research project and may be shared publicly to
be used for other research.

Primary data is often reliable, authentic, and objective inasmuch as it was collected with the purpose of
addressing a particular research problem. It is noteworthy that primary data is not commonly collected because of
the high cost of implementation.

A common example of primary data is the data collected by organizations during market research, product
research, and competitive analysis. This data is collected directly from its original source which in most cases are
the existing and potential customers.

Most of the people who collect primary data are government-authorized agencies, investigators, research-based
private institutions, etc.
Examples of Primary Data

• Market Research

This is an important aspect of business strategy that involves gathering information about the target market and
customers. The data gathered during market research is primary as it is tailored specifically to meet the business
needs. An organization doing market research about a new product (say, a phone) it is about to release will need
to collect data like purchasing power, feature preferences, daily phone usage, etc. from the target market. Data
from past surveys is not used because the product differs.

• Student Thesis

When conducting academic research or a thesis experiment, students collect data from the primary source. The
kind of data collected during this process may vary according to the kind of research being performed: lab
experiments, statistical data gathering, etc.
For example, a student carrying out a research project with the aim of finding out the effect of a daily intake of fruit
juice on an individual's weight will need to take a sample population of two or more people, give them fruit juice
daily, and record the changes in their weight. The data gathered throughout this process is primary.

Secondary Data: Secondary data is data that was collected in the past by someone else but is made available
for others to use. It is usually once primary data but becomes secondary when used by a third party.
Secondary data is usually easily accessible to researchers and individuals because it is mostly shared publicly.
This, however, means that the data is usually general and not tailored specifically to the researcher's needs.
Secondary data analysis is the process of analyzing data collected by another researcher who primarily
collected it for another purpose. Researchers leverage secondary data to save the time and resources that
would have been spent on primary data collection.

The secondary data analysis process can be carried out quantitatively or qualitatively depending on the kind of
data the researcher is dealing with. The quantitative method of secondary data analysis is used on numerical data
and is analyzed mathematically, while the qualitative method uses words to provide in-depth information about
the data.
Examples of Secondary Data

• Books

Books are one of the most traditional ways of collecting data. Today, there are books available on every topic you
can think of. When carrying out research, all you have to do is look for a book on the topic being researched,
then select from the available repository of books in that area. Books, when carefully chosen, are an authentic
source of data and can be useful in preparing a literature review.

• Journal

Journals are gradually becoming more important than books these days where data collection is concerned. This is
because journals are updated regularly with new publications on a periodic basis, therefore giving up-to-date
information.

Also, journals are usually more specific when it comes to research. For example, we can have a journal on,
"Secondary data collection for quantitative data" while a book will simply be titled, "Secondary data collection".

• Newspapers

In most cases, the information passed through a newspaper is very reliable, making it one of the most authentic
sources for collecting secondary data.

The kind of data commonly shared in newspapers is usually more political, economic, and educational than
scientific. Therefore, newspapers may not be the best source for scientific data collection.

Q.3: Explain advantages and disadvantages of bar charts and scatter plots.
ANS: A bar chart is a pictorial rendition of statistical data in which the independent variable can attain only certain
discrete values. The dependent variable may be discrete or continuous. The most common form of bar chart is the
vertical bar graph, also called a column chart.
In a vertical bar chart, values of the independent variable are plotted along a horizontal axis from left to right.
Function values are shown as shaded or colored vertical bars of equal thickness extending upward from the
horizontal axis to various heights.
In a horizontal bar chart, the independent variable is plotted along a vertical axis from the bottom up. Values of the
function are shown as shaded or colored horizontal bars of equal thickness extending toward the right, with their
left ends vertically aligned.
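
As a brief sketch of both orientations (the category labels and values below are invented for illustration), matplotlib can draw vertical and horizontal bar charts as follows:

```python
import matplotlib.pyplot as plt

# Hypothetical discrete categories (independent variable) and their values.
categories = ["A", "B", "C", "D"]
values = [23, 17, 35, 29]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Vertical bar chart (column chart): categories along the horizontal axis.
ax1.bar(categories, values)
ax1.set_title("Vertical bar chart")

# Horizontal bar chart: categories along the vertical axis, bars extend right.
ax2.barh(categories, values)
ax2.set_title("Horizontal bar chart")

plt.tight_layout()
plt.show()
```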

Advantages of Bar Chart:


• Show each data category in a frequency distribution
• Display relative numbers or proportions of multiple categories
• Summarize a large amount of data in a visual, easily interpretable form
• Make trends and differences between groups easier to highlight than tables do
• Allow estimates and calculations to be made quickly and accurately
• Permit visual guidance on the accuracy and reasonableness of calculations
• Are accessible to a wide audience
• Are made of simple rectangular bars, which are easy to draw
• Have scales and figures that are easy to read
• Are most effective with discrete data, such as rainfall over a month

Disadvantages of Bar Chart:


• Often require additional explanation or information for readers to fully understand them
• Fail to expose key assumptions, causes, impacts, and patterns
• Can be easily manipulated to give false impressions
• Make it hard to identify the difference between two bars with similar heights
• Display each data category properly only when showing a frequency distribution
• Are so common that they can lose impact on readers
• Cannot use log scales, as the bars must start at zero
• Can let tall bars overshadow minute details

Q.4: Explain normal distribution. How does the normality of data affect the analysis of data?

ANS: Normal Distribution:


Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about
the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph
form, normal distribution will appear as a bell curve.

Understanding Normal Distribution


The normal distribution is the most common type of distribution assumed in technical stock market analysis and in
other types of statistical analyses. A normal distribution has two parameters: the mean and
the standard deviation. For a normal distribution, 68% of the observations are within +/- one standard deviation of
the mean, 95% are within +/- two standard deviations, and 99.7% are within +/- three standard deviations.
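
As a quick sketch, these percentages can be verified numerically; using scipy.stats.norm here is just one convenient way to do it:

```python
from scipy.stats import norm

# Probability mass within k standard deviations of the mean
# for any normal distribution (location and scale do not matter).
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within +/- {k} SD: {p:.1%}")
# within +/- 1 SD: 68.3%
# within +/- 2 SD: 95.4%
# within +/- 3 SD: 99.7%
```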

The normal distribution model is motivated by the Central Limit Theorem. This theorem states that averages
calculated from independent, identically distributed random variables have approximately normal distributions,
regardless of the type of distribution from which the variables are sampled (provided it has finite variance). The
normal distribution is sometimes confused with a symmetrical distribution. A symmetrical distribution is one where
a dividing line produces two mirror images, but the actual data could be two humps or a series of hills in addition
to the bell curve that indicates a normal distribution.
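
A minimal simulation sketch of the Central Limit Theorem (the sample size, number of trials, and choice of a uniform source distribution are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# The source distribution is uniform (decidedly non-normal), yet the
# distribution of the sample means is approximately normal.
n, trials = 50, 10_000
sample_means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)

print(sample_means.mean())  # close to 0.5, the mean of the uniform distribution
print(sample_means.std())   # close to sqrt(1/12) / sqrt(n), about 0.0408
```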

How the normality of data affects the analysis of data:

The normal distribution is the most important probability distribution in statistics because it fits many natural
phenomena. For example, heights, blood pressure, measurement error, and IQ scores follow the normal
distribution. It is also known as the Gaussian distribution and the bell curve.

The normal distribution is a probability function that describes how the values of a variable are distributed. It is a
symmetric distribution where most of the observations cluster around the central peak and the probabilities for
values further away from the mean taper off equally in both directions. Extreme values in both tails of the
distribution are similarly unlikely.

Example of Normally Distributed Data: Heights

Height data are normally distributed. The distribution in this example fits real data collected from 14-year-old
girls during a study. The distribution of heights follows the typical pattern for all normal distributions.
Most girls are close to the average (1.512 meters). Small differences between an individual's height and the mean
occur more frequently than substantial deviations from the mean. The standard deviation is 0.0741 m, which
indicates the typical distance that individual girls tend to fall from the mean height.

The distribution is symmetric. The number of girls shorter than average equals the number of girls taller than
average. In both tails of the distribution, extremely short girls occur as infrequently as extremely tall girls.
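
As a hedged sketch, these properties can be reproduced by simulating heights with the reported mean and standard deviation (the simulation is illustrative and is not the original study data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate heights using the reported parameters: mean 1.512 m, SD 0.0741 m.
heights = rng.normal(loc=1.512, scale=0.0741, size=100_000)

within_1sd = np.mean(np.abs(heights - 1.512) <= 0.0741)
print(f"within 1 SD of the mean: {within_1sd:.1%}")   # ~68%

# Symmetry: about as many girls below the mean as above it.
print(f"below the mean: {np.mean(heights < 1.512):.1%}")  # ~50%
```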

In addition to all of the above, there are several other reasons why the normal distribution is crucial in statistics.

• Some statistical hypothesis tests assume that the data follow a normal distribution; a quick normality check is
sketched after this list. However, there is more to choosing between parametric and nonparametric tests than
only whether the data are normally distributed.
• Linear and nonlinear regression both assume that the residuals follow a normal distribution, which can be
assessed with residual plots.
• The central limit theorem states that as the sample size increases, the sampling distribution of the mean
follows a normal distribution even when the underlying distribution of the original variable is non-normal.
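
A minimal sketch of such a normality check, using the Shapiro-Wilk test from scipy (the sample data are invented):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(2)
sample = rng.normal(loc=0, scale=1, size=200)  # hypothetical sample

stat, p_value = shapiro(sample)
# A small p-value (e.g. < 0.05) would suggest the data are not normal.
print(f"W = {stat:.3f}, p = {p_value:.3f}")
```
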
Q.5: How is the mean different from the median? Explain the role of level of measurement in measures of central
tendency.

ANS: The mean (or average) and the median are statistical terms that play a somewhat similar role in
understanding the central tendency of a set of statistical scores. While the average has traditionally been a
popular measure of the midpoint of a sample, it has the disadvantage of being affected by any single value that is
too high or too low compared to the rest of the sample. This is why the median is sometimes taken as a better
measure of the midpoint.

Mean
The mean is the arithmetic average of a set of numbers, or distribution. It is the most commonly used measure of
central tendency of a set of numbers.

Applicability: The mean is used for normal distributions.

Relevance to the data set: The mean is not a robust tool since it is largely influenced by outliers.

How to calculate: A mean is computed by adding up all the values and dividing that score by the number of
values.

Median
The median is described as the numeric value separating the higher half of a sample, a population, or a probability
distribution, from the lower half.

Applicability: The median is generally used for skewed distributions.

Relevance to the data set: The median is better suited for skewed distributions to derive the central tendency
since it is much more robust and sensible.

How to calculate:
The Median is the number found at the exact middle of the set of values. A median can be computed by listing all
numbers in ascending order and then locating the number in the centre of that distribution.

In mathematics and statistics, the mean or the arithmetic mean of a list of numbers is the sum of the entire list
divided by the number of items in the list. When looking at symmetric distributions, the mean is probably the best
measure to arrive at central tendency. In probability theory and statistics, a median is that number separating the
higher half of a sample, a population, or a probability distribution, from the lower half.
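
As a small sketch of this difference (the values are invented), one extreme outlier pulls the mean substantially but barely moves the median:

```python
import statistics

scores = [10, 12, 13, 14, 15]
print(statistics.mean(scores))    # 12.8
print(statistics.median(scores))  # 13

# Add one extreme outlier: the mean jumps, the median barely moves.
scores_with_outlier = scores + [100]
print(statistics.mean(scores_with_outlier))    # 27.333...
print(statistics.median(scores_with_outlier))  # 13.5
```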

The Role of Level of Measurement in Measures of Central Tendency:


Measures of central tendency help you find the middle, or the average, of a data set. The 3 most common
measures of central tendency are the mode, median, and mean.

• Mode: the most frequent value.


• Median: the middle number in an ordered data set.
• Mean: the sum of all values divided by the total number of values.
In addition to central tendency, the variability and distribution of your data set are important to understand when
performing descriptive statistics.
Measures of central tendency provide a summary measure that attempts to describe a whole set of data with a
single value that represents the middle or centre of its distribution.
When data are normally distributed, the mean, median and mode should be identical, and all are effective in
showing the most typical value of a data set.

It's important to look at the dispersion of a data set when interpreting the measures of central tendency.

Mean

The mean of a data set is also known as the average value. It is calculated by dividing the sum of all values in a
data set by the number of values.

So in a data set of 1, 2, 3, 4, 5, we would calculate the mean by adding the values (1+2+3+4+5) and dividing by
the total number of values (5). Our mean is then 15/5, which equals 3.

Disadvantages of the mean as a measure of central tendency are that it is highly susceptible to outliers
(observations which are markedly distant from the bulk of observations in a data set), and that it is not appropriate
to use when the data are skewed rather than normally distributed.

Median

The median of a data set is the value that is at the middle of a data set arranged from smallest to largest.

In the data set 1, 2, 3, 4, 5, the median is 3.

In a data set with an even number of observations, the median is calculated by dividing the sum of the two middle
values by two. So in: 1, 2, 3, 4, 5, 6, the median is (3+4)/2, which equals 3.5.

The median is appropriate to use with ordinal variables, and with interval variables with a skewed distribution.

Mode

The mode is the most common observation of a data set, or the value in the data set that occurs most frequently.

The mode has several disadvantages. It is possible for two modes to appear in one data set (e.g. in 1, 2, 2, 3,
4, 5, 5, both 2 and 5 are modes). It is also possible for a data set to have no mode at all, when every value
occurs only once.
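
These three measures, including both modes of the bimodal example above, can be computed with Python's statistics module, as a quick sketch:

```python
import statistics

data = [1, 2, 2, 3, 4, 5, 5]

print(statistics.mean(data))       # 3.142857...
print(statistics.median(data))     # 3
print(statistics.multimode(data))  # [2, 5] -- both modes of the data set
```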
