
ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD

Educational Statistics (8614)


Munir Ahmed 0333-4801885
Registration No. 14BND00107 B. Ed 1.5 Year

Assignment No. 1 Units 1-4

Q.NO 1 Scientific method is a systematic way to identify and solve problems. Discuss.
Answer: Scientific method

The scientific method is the process of objectively establishing facts through testing and experimentation. The basic
process involves making an observation, forming a hypothesis, making a prediction, conducting an experiment and
finally analyzing the results. The principles of the scientific method can be applied in many areas, including scientific
research, business and technology.

Steps of the scientific method

The scientific method uses a series of steps to establish facts or create knowledge. The overall process is well
established, but the specifics of each step may change depending on what is being examined and who is performing it.
The scientific method can only answer questions that can be proven or disproven through testing.

Make an observation or ask a question.


The first step is to observe something that you would like to learn about or ask a question that you would like answered.
These can be specific or general. Some examples would be "I observe that our total available network bandwidth drops
at noon every weekday" or "How can we increase our website registration numbers?" Taking the time to establish a
well-defined question will help you in later steps.

Gather background information. This involves doing research into what is already known about the topic. It can
also involve finding out whether anyone has already asked the same question.

Create a hypothesis. A hypothesis is an explanation for the observation or question. If proven later, it can become a
fact. Some examples would be "Our employees watching online videos during lunch is using our internet bandwidth" or
"Our website visitors don't see our registration form."

Create a prediction and perform a test.


Create a testable prediction based on the hypothesis. The test should establish a noticeable change that can be measured
or observed using empirical analysis. It is also important to control for other variables during the test. Some examples
would be "If we block video-sharing sites, our available bandwidth will not go down significantly during lunch" or "If
we make our registration box bigger, a greater percentage of visitors will register for our website than before the
change."

Analyze the results and draw a conclusion. Use the metrics established before the test to see if the results match the
prediction. For example, "After blocking video-sharing sites, our bandwidth utilization only went down by 10% from
before; this is not enough of a change to be the primary cause of the network congestion" or "After increasing the size of
the registration box, the percent of sign-ups went from 2% of total page views to 5%, showing that making the box
larger results in more registrations."

Share the conclusion or decide what question to ask next: Document the results of your experiment. By sharing the
results with others, you also increase the total body of knowledge available. Your experiment may have also led to other
questions, or if your hypothesis is disproven you may need to create a new one and test that. For example, "Because user
activity is not the cause of excessive bandwidth use, we now suspect that an automated process is running at noon every
day."

Using the scientific method in technology and computers

The scientific method is incredibly valuable in technology and related fields. It is obviously used in research and
development, but it is also useful in day-to-day operations. Because almost everything can be quantified, testing
hypotheses can be easy.

Most modern computer systems are complicated and difficult to troubleshoot. Using the scientific method of hypothesis
and testing can greatly simplify the process of tracking down errors and it can help find areas of improvement. It can
also help when you evaluate new technologies before implementation.

Using the scientific method in business

Many business processes benefit from using the scientific method. Shifting business landscapes and complex business
relationships can make behavior hard to predict or cause it to run counter to past experience. Instead of relying on gut
feelings or previous experience, a scientific approach can help businesses grow. Big data initiatives can make business
information more available and easier to test with.

The scientific method can be applied in many areas. Customer satisfaction and retention numbers can be analyzed and
tested. Profitability and finance numbers can be analyzed to form new conclusions. Making predictions about changes
to business practices and checking the results will help to identify and measure the success or failure of those
initiatives.

Common pitfalls in using the scientific method


The scientific method is a powerful tool. Like any tool, though, if it is misused it can cause more damage than good.

The scientific method can only be used for testable phenomena; this requirement is known as falsifiability. While much
in nature can be tested and measured, some areas of human experience are beyond objective observation.

Both proving and disproving the hypothesis are equally valid outcomes of testing. It is possible to ignore the outcome or
inject bias to skew the results of a test in a way that will fit the hypothesis. Data in opposition to the hypothesis should
not be discounted.

It is important to control for other variables and influences during testing so that they do not skew the results. Though
difficult, failing to account for these can produce invalid data. For example, testing bandwidth during a holiday or
measuring registrations during a sale event may introduce other factors that influence the outcome.

Another common pitfall is confusing correlation with causation. While two data points may seem to be connected, it is
not necessarily true that one is directly influenced by the other. For example, an ice cream stand in town sees drops in
business on the hottest days. While the data may make it look as though the hotter the weather, the fewer people want
ice cream, the reality is that more people are going to the beach on those days and fewer are in town.

History of the scientific method:

The discovery of the scientific method is not credited to any single person, but there are a few notable figures who
contributed to its development.

The Greek philosopher Aristotle is considered to be one of the earliest proponents of logic and cycles of observation and
deduction in recorded history. Ibn al-Haytham, a mathematician, established stringent testing methodologies in pursuit
of facts and truth, and he recorded his findings.

During the Renaissance, many thinkers and scientists continued developing rational methods of establishing facts. Sir
Francis Bacon emphasized the importance of inductive reasoning. Sir Isaac Newton relied on both inductive
and deductive reasoning to explain the results of his experiments, and Galileo Galilei emphasized the idea that results
should be repeatable.

Other well-known contributors to the scientific method include Karl Popper, who introduced the concept of falsifiability,
and Charles Darwin, who is known for using multiple communication channels to share his conclusions.
Q.NO 2 Discuss importance and scope of Statistics with reference to a teacher and researcher.
Answer: Definition of educational statistics

Statistics is a science which deals with the methods of collecting, classifying, identifying and interpreting numerical data
which throw some light on any sphere of enquiry and investigation.
The term statistics has been defined widely; some important definitions are provided here.

Statistics is now used in many fields, because our knowledge and decisions with regard to everything depend on the facts
and data available. For psychologists and educationists, statistics is used in test construction, experiments and research;
statistics has now become fundamental in education and psychology.
The word statistics generally means accumulated numerical statements, as also the theory of statistics. The science of
statistics is now quite broad and has been defined in a number of ways. Bowley defined statistics as "numerical statements
of facts in any department of enquiry placed in relation to each other." Webster defined it as "classified facts respecting the
condition of the people in a state."
Statistics are aggregates of facts, numerically expressed; they are reasonably accurate, they are collected in a systematic
manner and they are influenced by a number of factors.

Nature of educational statistics


The foundation of statistical methods is provided by mathematics. The mathematical theory of statistics has in fact
achieved recognition as an area of specialization within the general field of higher mathematics. It is no longer possible
to qualify as a statistical expert while remaining relatively ignorant of mathematics. It is possible, nevertheless, to acquire
some very useful information regarding the application and interpretation of certain important statistical techniques
without studying their mathematical bases. Statistics is now an independent field of study.
Educational statistics deals with the data associated with education.
Studies involving statistical treatment always rely on empirical or observed evidence or data, but not all studies
involving empirical data are statistical. Factual information about an individual is not statistical information. A statistical
problem always relates to a sample or group of individuals rather than to a single individual. The group or sample is the
frame of reference in statistical analysis and interpretation; the numerical data and statistics are interpreted with reference
to a group.

Statistics Example
An example of statistical analysis is when we have to determine the number of people in a town who watch TV
out of the total population in the town. The small group of people is called the sample here, which is taken from the
population.

Types of Statistics
The two main branches of statistics are:

 Descriptive Statistics
 Inferential Statistics
Descriptive Statistics – Through graphs or tables, or numerical calculations, descriptive statistics uses the data to
provide descriptions of the population.

Inferential Statistics – Based on the data sample taken from the population, inferential statistics makes the predictions
and inferences.
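As a minimal sketch of this distinction in Python (with hypothetical test scores, and a normal approximation for the 95% multiplier for simplicity):

```python
from statistics import mean, stdev, NormalDist

scores = [72, 85, 90, 65, 78, 88, 95, 70]  # hypothetical sample of test scores

# Descriptive statistics: summarize the sample we actually have
m, s, n = mean(scores), stdev(scores), len(scores)
print(f"sample mean = {m:.1f}, sample sd = {s:.2f}")

# Inferential statistics: estimate the population mean from the sample
z = NormalDist().inv_cdf(0.975)       # 95% confidence multiplier (about 1.96)
half_width = z * s / n ** 0.5
print(f"95% CI for the population mean: {m - half_width:.1f} to {m + half_width:.1f}")
```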

Characteristics of Statistics
The important characteristics of Statistics are as follows:

 Statistics are numerically expressed
 Statistics is an aggregate of facts
 Data are collected in a systematic order
 Data should be comparable to each other
 Data are collected for a planned purpose
Importance of Statistics
The important functions of statistics are:
 Statistics helps in gathering information about the appropriate quantitative data
 It depicts the complex data in graphical form, tabular form and in diagrammatic representation to understand it
easily
 It provides the exact description and a better understanding
 It helps in designing the effective and proper planning of the statistical inquiry in any field
 It gives valid inferences with the reliability measures about the population parameters from the sample data
 It helps to understand the variability pattern through the quantitative observations

Statistics: Basic concepts, definitions and history, Scope in Education


The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and
the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced
by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of
state" (then called political arithmetic in English). It acquired the meaning of the collection and classification of data
generally in the early 19th century. It was introduced into English in 1791 by Sir John Sinclair when he published the
first of 21 volumes titled Statistical Account of Scotland.

Statistics

1. Data are everywhere
2. Statistical techniques are used to make many decisions that affect our lives
3. No matter what your career, you will make professional decisions that involve data. An understanding of statistical
methods will help you make these decisions effectively

Applications of statistical concepts

—Finance – correlation and regression, index numbers, time series analysis
—Marketing – hypothesis testing, chi-square tests, nonparametric statistics
—Personnel – hypothesis testing, chi-square tests, nonparametric tests
—Operations management – hypothesis testing, estimation, analysis of variance, time series analysis.

Statistics
—The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective
decisions
—Statistical analysis – used to manipulate, summarize, and investigate data, so that useful decision-making information
results.
—The study of the principles and methods used in collecting, presenting, analyzing and interpreting numerical data is
called Statistics.

Functions of Statistics:
1.To Present Facts in Definite Form:
2.Precision to the Facts:
3.Comparisons:
4.Formulation and Testing of Hypothesis:
5.Forecasting:
6.Policy Making:
7.It enlarges Knowledge:
8.To Measure Uncertainty:

Scope of Statistics in Education

1. It helps the teacher to provide the most exact type of description:


2. It makes the teacher definite and exact in procedures and thinking
3. It enables the teacher to summarize the results in a meaningful and convenient form
4. It enables the teacher to draw general conclusions
5. It helps the teacher to predict the future performance of the pupils
6. Statistics enables the teacher to analyse some of the causal factors underlying complex and otherwise bewildering
events
Types of statistics
—Descriptive statistics – Methods of organizing, summarizing, and presenting data in an informative way
—Inferential statistics – The methods used to determine something about a population on the basis of a sample
◦Population –The entire set of individuals or objects of interest or the measurements obtained from all individuals or
objects of interest

Need, Importance and Uses of Statistics:


1. Group Comparison:
The achievements of a class are not uniform in every subject. It is found that one class is progressing faster in one
subject, while another is progressing faster in a different one. Even the various sections of a particular class do not
progress uniformly.

2. Individual Comparison:
Statistics helps in the individual comparison of students differing in respect of their ages, abilities and intelligence
levels. It is statistics which tells us why students who are similar in every other respect do not show similar
achievement in one particular subject.

3. Educational and Vocational Guidance:


Every individual student differs from others in intellectual ability, interests, attitudes and mental abilities. Students are
given educational and vocational guidance so that they make the best use of these abilities, and the process of guidance is
based upon statistics.

4. Educational Experiments and Research:


With a change in place, time and circumstances, the aims, curricula and methods of education keep on changing. The
work of research and experimentation cannot become reliable and valid without the use of statistics.

5. Essential for Professional Efficiency:


The teacher’s responsibility does not end when he teaches a particular subject in the classroom. His responsibility also
includes teaching the students, maintaining the desired level of knowledge himself, and assessing the modification in
behaviour that has been achieved.

6. Basis of Scientific Approach to Problems:


Statistics forms the basis of scientific approach to problems of Educational Psychology.

Meaning of Graphical Representation of Data:


A graphic representation is the geometrical image of a set of data. It is a mathematical picture. It enables us to think
about a statistical problem in visual terms. A picture is said to be more effective than words for describing a particular
thing or phenomenon.

Consequently, the graphic representation of data proves quite an effective and economical device for the presentation,
understanding and interpretation of the collected statistical data. The statistical data can be represented by diagrams,
charts, etc., so that the significance attached to these data may be immediately grasped; of course, the diagrams should be
neatly and accurately drawn.

Advantages of Graphical Representation of Data:


1. The data can be presented in a more attractive and appealing form.
2. It provides a more lasting effect on the brain. It is possible to get an immediate and meaningful grasp of large
amounts of data through such presentation.
3. Comparative analysis and interpretation may be effectively and easily made.
4. Various valuable statistics like the median, mode and quartiles may be easily computed. Through such representation,
we also get an indication of correlation between two variables.
5. Such representation may help in the proper estimation, evaluation and interpretation of the characteristics of items and
individuals.
6. The real value of graphical representation lies in its economy and effectiveness. It carries a lot of communication
power.
7. Graphical representation helps in forecasting, as it indicates the trend of the data in the past.

Modes of Graphical Representation of Data:


We know that data in the form of raw scores is known as ungrouped data, and when it is organised into a frequency
distribution, it is referred to as grouped data. Separate methods are used to represent these two types of data, ungrouped
and grouped. Let us discuss them under separate heads.
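As a brief illustration of the grouped case, hypothetical raw scores can be organised into class intervals and the resulting frequency distribution graphed, for example with Python's matplotlib:

```python
import matplotlib.pyplot as plt

# Hypothetical raw scores (ungrouped data)
scores = [52, 55, 61, 63, 64, 67, 70, 71, 72, 74, 75, 77, 80, 82, 85, 88, 91, 95]

# Grouping into class intervals of width 10 gives a frequency distribution,
# which the histogram then represents graphically
plt.hist(scores, bins=range(50, 101, 10), edgecolor="black")
plt.xlabel("Score interval")
plt.ylabel("Frequency")
plt.title("Histogram of a grouped frequency distribution")
plt.show()
```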

Q.NO 3 Elaborate probability sampling techniques.


Answer: When you conduct research about a group of people, it’s rarely possible to collect data from every person in that group.
Instead, you select a sample. The sample is the group of individuals who will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide how you will select a sample that is
representative of the group as a whole. This is called a sampling method. There are two primary types of sampling
methods that you can use in your research:

 Probability sampling involves random selection, allowing you to make strong statistical inferences about the
whole group.
 Non-probability sampling involves non-random selection based on convenience or other criteria, allowing
you to easily collect data.

You should clearly explain how you selected your sample in the methodology section of your paper or thesis, as well as
how you approached minimizing research bias in your work.

Population vs. sample


First, you need to understand the difference between a population and a sample, and identify the target population of
your research.

 The population is the entire group that you want to draw conclusions about.
 The sample is the specific group of individuals that you will collect data from.

The population can be defined in terms of geographical location, age, income, or many other characteristics.
It can be very broad or quite narrow: maybe you want to make inferences about the whole adult population of your
country; maybe your research focuses on customers of a certain company, patients with a specific health condition, or
students in a single school.
It is important to carefully define your target population according to the purpose and practicalities of your project.
If the population is very large, demographically mixed, and geographically dispersed, it might be difficult to gain access
to a representative sample. A lack of a representative sample affects the validity of your results, and can lead to
several research biases, particularly sampling bias.

Sampling frame
The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the
entire target population (and nobody who is not part of that population).
Example: Sampling frame. You are doing research on working conditions at a social media marketing company.
Your population is all 1000 employees of the company. Your sampling frame is the company’s HR database,
which lists the names and contact details of every employee.

Sample size

The number of individuals you should include in your sample depends on various factors, including the size
and variability of the population and your research design. There are different sample size calculators and
formulas depending on what you want to achieve with statistical analysis.
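One widely used formula for choosing a sample size when estimating a proportion is Cochran's formula with a finite-population correction; the sketch below is illustrative, and the inputs (confidence level, margin of error, population size) are assumptions.

```python
import math

def cochran_sample_size(z, p, e, N=None):
    """Cochran's sample-size formula for estimating a proportion.

    z: z-score for the desired confidence level (1.96 for 95%)
    p: estimated population proportion (0.5 is the most conservative choice)
    e: desired margin of error (e.g. 0.05)
    N: population size, enabling the finite-population correction (optional)
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)  # finite-population correction
    return math.ceil(n0)

# 95% confidence, 5% margin of error, population of 1000 employees -> 278
print(cochran_sample_size(1.96, 0.5, 0.05, N=1000))
```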
Probability sampling methods

Probability sampling means that every member of the population has a chance of being selected. It is mainly used
in quantitative research. If you want to produce results that are representative of the whole population, probability
sampling techniques are the most valid choice.
There are four main types of probability sample.
1. Simple random sampling
In a simple random sample, every member of the population has an equal chance of being selected. Your sampling
frame should include the whole population.
To conduct this type of sampling, you can use tools like random number generators or other techniques that are based
entirely on chance.
Example: Simple random sampling. You want to select a simple random sample of 100 employees of a social media
marketing company. You assign a number to every employee in the company database from 1 to 1000, and use a
random number generator to select 100 numbers.
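A minimal sketch of this draw in Python, assuming the hypothetical 1000-employee frame above:

```python
import random

# Hypothetical sampling frame: employee IDs 1 to 1000 from the HR database
sampling_frame = list(range(1, 1001))

random.seed(42)                              # seeded only to make the draw reproducible
sample = random.sample(sampling_frame, 100)  # 100 IDs, no repeats, equal chance for all

print(sorted(sample)[:10])                   # a few of the selected IDs
```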

2. Systematic sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of
the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular
intervals.
If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the
sample. For example, if the HR database groups employees by team, and team members are listed in order of seniority,
there is a risk that your interval might skip over people in junior roles, resulting in a sample that is skewed towards
senior employees.
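A short sketch of the procedure, again assuming a hypothetical frame of 1000 employees and a sample of 100, so the interval is k = 1000 / 100 = 10:

```python
import random

N, n = 1000, 100                  # population size, desired sample size
k = N // n                        # sampling interval: every 10th person

frame = list(range(1, N + 1))     # the ordered list of employee IDs
start = random.randint(0, k - 1)  # random starting point in the first interval
sample = frame[start::k]          # then every k-th member of the list

print(len(sample), sample[:5])
```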

3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you
to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.
To use this sampling method, you divide the population into subgroups (called strata) based on the relevant
characteristic (e.g., gender identity, age range, income bracket, job role).
Based on the overall proportions of the population, you calculate how many people should be sampled from each
subgroup. Then you use random or systematic sampling to select a sample from each subgroup.
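A minimal sketch of proportional allocation, using hypothetical job-role strata in a 1000-person company and an overall sample of 100:

```python
import random

# Hypothetical strata: job roles and their head counts
strata = {"engineering": 500, "marketing": 300, "support": 200}
total = sum(strata.values())
n = 100                                           # overall sample size

sample = []
for role, size in strata.items():
    quota = round(n * size / total)               # proportional allocation per stratum
    members = [f"{role}-{i}" for i in range(1, size + 1)]
    sample.extend(random.sample(members, quota))  # simple random sample within the stratum

print(len(sample))  # 100 in total: 50 engineering, 30 marketing, 20 support
```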

4. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar
characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire
subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are
large, you can also sample individuals from within each cluster using one of the techniques above. This is
called multistage sampling.
This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as
there could be substantial differences between clusters. It’s difficult to guarantee that the sampled clusters are really
representative of the whole population.
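A sketch of single-stage cluster sampling, assuming 20 hypothetical branch offices of 50 employees each; multistage sampling would add a further random draw within each chosen office.

```python
import random

# Hypothetical clusters: 20 branch offices, each listing its 50 employees
offices = {f"office-{i}": [f"office-{i}-emp-{j}" for j in range(1, 51)]
           for i in range(1, 21)}

chosen = random.sample(list(offices), 4)   # randomly select 4 entire clusters
sample = [person for office in chosen      # include every individual from each
          for person in offices[office]]   # sampled cluster (single-stage)

print(chosen, len(sample))                 # 4 offices, 200 people
```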

Non-probability sampling methods


In a non-probability sample, individuals are selected based on non-random criteria, and not every individual has a
chance of being included.
This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias. That means the inferences
you can make about the population are weaker than with probability samples, and your conclusions may be more
limited. If you use a non-probability sample, you should still aim to make it as representative of the population as
possible.
Non-probability sampling techniques are often used in exploratory and qualitative research. In these types of research,
the aim is not to test a hypothesis about a broad population, but to develop an initial understanding of a small or under-
researched population.

1. Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to the researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the
population, so it can’t produce generalizable results. Convenience samples are at risk for both sampling
bias and selection bias.

2. Voluntary response sampling


Similar to a convenience sample, a voluntary response sample is mainly based on ease of access. Instead of the
researcher choosing participants and directly contacting them, people volunteer themselves (e.g. by responding to a
public online survey).
Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely to
volunteer than others, leading to self-selection bias.

3. Purposive sampling
This type of sampling, also known as judgement sampling, involves the researcher using their expertise to select a
sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific
phenomenon rather than make statistical inferences, or where the population is very small and specific. An effective
purposive sample must have clear criteria and rationale for inclusion. Always make sure to describe your inclusion and
exclusion criteria and beware of observer bias affecting your arguments.

4. Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via other participants. The
number of people you have access to “snowballs” as you get in contact with more people. The downside here is also
representativeness, as you have no way of knowing how representative your sample is due to the reliance on participants
recruiting others. This can lead to sampling bias.

5. Quota sampling
Quota sampling relies on the non-random selection of a predetermined number or proportion of units. This is called a
quota.
You first divide the population into mutually exclusive subgroups (called strata) and then recruit sample units until you
reach your quota. These units share specific characteristics, determined by you prior to forming your strata. The aim of
quota sampling is to control what or who makes up your sample.

Probability sampling

Probability sampling is a sampling technique that involves randomly selecting a small group of people (a sample) from a
larger population, and then predicting the likelihood that all their responses put together will match those of the overall
population.

There are two important requirements when it comes to probability sampling:

1. Everyone in your population must have an equal, non-zero chance of being selected. (In other words, everyone
has an equal chance of receiving a survey.)
2. You must know, specifically, what that chance of being selected is for each person. (For example, you might
determine that in a population of 100 people, each person’s odds of receiving a survey is 1 in 100. Being able to
represent each person’s chance of selection as a probability is at the core of probability sampling.)

Following these two rules will help you choose appropriately (i.e. randomly) from your sampling frame, which is the list
of everyone in your entire population who can be sampled. Random selection is key—probability sampling is all about
making sure everyone has an equal probability of being included. From picking names out of a hat or pulling the short
straw, to more complex random selection processes, this ensures that the sample you end up creating is representative of
the population as a whole.

With the right sample, you can achieve results that are just as valuable as those you might get from a far bigger survey
effort. From there, you can draw valid conclusions based on the sample’s wants, needs, or opinions and take action that
makes sense for the entire population.
Q.NO 4 Explain ‘scatter plot’ and its use in interpreting data.
Answer
Scatter plot:

A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables.
The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are
used to observe relationships between variables.

Scatter plots are the graphs that present the relationship between two variables in a data-set. It represents
data points on a two-dimensional plane or on a Cartesian system. The independent variable or attribute is plotted
on the X-axis, while the dependent variable is plotted on the Y-axis. These plots are often called scatter
graphs or scatter diagrams.

Scatter plot Graph


A scatter plot is also called a scatter chart, scattergram, or XY graph. The scatter diagram graphs
numerical data pairs, with one variable on each axis, to show their relationship. Now the question arises:
when should we use a scatter plot?

Scatter plots are used in either of the following situations.

 When we have paired numerical data


 When there are multiple values of the dependent variable for a unique value of an independent variable
 In determining the relationship between variables in some scenarios, such as identifying potential root causes of
problems, or checking whether two effects that appear to be related both occur with the same cause, and so on

Scatter Plot Uses and Examples


Scatter plots instantly report a large volume of data. It is beneficial in the following situations –

 For a large set of data points given


 Each set comprises a pair of values
 The given data is in numeric form

The line drawn in a scatter plot that lies as close as possible to all the points in the plot is known as the “line of best fit”
or “trend line”.

Scatter plot Correlation


We know that correlation is a statistical measure of the relationship between the relative movements of two variables.
If the variables are correlated, the points will fall along a line or curve; the better the correlation, the closer the points
will lie to the line. This cause-analysis tool is considered one of the seven essential quality tools.
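As a minimal numeric check of this idea, hypothetical paired data and NumPy's correlation function:

```python
import numpy as np

# Hypothetical paired observations
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 15, 22, 28, 33, 41, 45, 52]

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
print(round(r, 3))           # close to +1: a strong positive correlation
```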

Types of correlation
The scatter plot explains the correlation between two attributes or variables. It represents how closely the two variables
are connected. There can be three such situations to see the relation between the two variables –

1. Positive Correlation
2. Negative Correlation
3. No Correlation

Positive Correlation
When the points in the graph are rising, moving from left to right, then the scatter plot shows a positive correlation. It
means the values of one variable are increasing with respect to another. Now positive correlation can further be
classified into three categories:

 Perfect Positive – Which represents a perfectly straight line


 High Positive – All points are nearby
 Low Positive – When all the points are scattered

Negative Correlation
When the points in the scatter graph fall while moving left to right, then it is called a negative correlation. It means the
values of one variable are decreasing with respect to another. These are also of three types:

 Perfect Negative – Which forms almost a straight line


 High Negative – When points are near to one another
 Low Negative – When points are in scattered form

No Correlation
When the points are scattered all over the graph and it is difficult to conclude whether the values are increasing or
decreasing, then there is no correlation between the variables.

Scatter plot Example


Let us understand how to construct a scatter plot with the help of the below example.

Question:

Draw a scatter plot for the given data that shows the number of games played and scores obtained in each instance.

Solution:

X-axis or horizontal axis: Number of games

Y-axis or vertical axis: Scores


Plotting each (number of games, score) pair as a dot on these axes gives the scatter graph.
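A minimal sketch of such a plot, using hypothetical (games, scores) pairs:

```python
import matplotlib.pyplot as plt

# Hypothetical data: games played and the score obtained in each instance
games = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [10, 15, 22, 28, 33, 41, 45, 52]

plt.scatter(games, scores)
plt.xlabel("Number of games")  # independent variable on the X-axis
plt.ylabel("Scores")           # dependent variable on the Y-axis
plt.title("Scores obtained vs. number of games played")
plt.show()
```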

Note: We can also combine several scatter plots on one sheet to read and understand higher-level structure in data sets
containing more than two variables.

Scatter plot Matrix


For data variables such as x1, x2, x3, and xn, the scatter plot matrix presents all the pairwise scatter plots of the variables
on a single illustration with various scatterplots in a matrix format. For the n number of variables, the scatterplot matrix
will contain n rows and n columns. A plot of variables xi vs xj will be located at the ith row and jth column intersection.
We can say that each row and column is one dimension, whereas each cell plots a scatter plot of two dimensions.
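A brief sketch with pandas, using three hypothetical variables; scatter_matrix draws the full n-by-n grid of pairwise plots:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200),  # three hypothetical variables
                   "x2": rng.normal(size=200),
                   "x3": rng.normal(size=200)})

scatter_matrix(df, figsize=(6, 6))  # plot of xi vs xj at row i, column j
plt.show()
```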

When you should use a scatter plot


Scatter plots’ primary uses are to observe and show relationships between two numeric variables. The dots in a
scatter plot not only report the values of individual data points, but also patterns when the data are taken as a whole.

Identification of correlational relationships is common with scatter plots. In these cases, we want to know, if we were
given a particular horizontal value, what a good prediction would be for the vertical value. You will often see the
variable on the horizontal axis denoted an independent variable, and the variable on the vertical axis the dependent
variable. Relationships between variables can be described in many ways: positive or negative, strong or weak, linear or
nonlinear.

A scatter plot can also be useful for identifying other patterns in data. We can divide data points into groups based on
how closely sets of points cluster together. Scatter plots can also show if there are any unexpected gaps in the data and if
there are any outlier points. This can be useful if we want to segment the data into different parts, like in the
development of user personas.

Common issues when using scatter plots

When we have lots of data points to plot, this can run into the issue of overplotting. Overplotting is the case
where data points overlap to a degree where we have difficulty seeing relationships between points and variables. It can
be difficult to tell how densely-packed data points are when many of them are in a small area.

There are a few common ways to alleviate this issue. One alternative is to sample only a subset of data points: a random
selection of points should still give the general idea of the patterns in the full data. We can also change the form of the
dots, adding transparency to allow for overlaps to be visible, or reducing point size so that fewer overlaps occur. As a
third option, we might even choose a different chart type like the heatmap, where color indicates the number of points in
each bin. Heatmaps in this use case are also known as 2-d histograms.
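A small sketch of these remedies with matplotlib, on hypothetical dense data: the same cloud drawn plainly, with transparency, and as a 2-d histogram.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                 # a large, dense cloud of points
y = x + rng.normal(scale=0.5, size=10_000)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].scatter(x, y, s=2)                  # plain scatter: heavily overplotted
axes[1].scatter(x, y, s=2, alpha=0.1)       # transparency makes density visible
axes[2].hist2d(x, y, bins=50)               # heatmap / 2-d histogram of counts
for ax, title in zip(axes, ["raw", "with alpha", "2-d histogram"]):
    ax.set_title(title)
plt.show()
```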
Q.NO 5 Discuss ‘normal curve’ with special emphasis on its application in education.
Answer: Normal Distribution
Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric
about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.

In graphical form, the normal distribution appears as a "bell curve".

 The normal distribution is the proper term for a probability bell curve.
 In the standard normal distribution, the mean is zero and the standard deviation is 1. A normal distribution has zero skew and a kurtosis of 3.
 Normal distributions are symmetrical, but not all symmetrical distributions are normal.
 Many naturally-occurring phenomena tend to approximate the normal distribution.
 In finance, most pricing distributions are not, however, perfectly normal.

Understanding Normal Distribution


The normal distribution is the most common type of distribution assumed in technical stock market analysis and in
other types of statistical analyses. The normal distribution has two parameters: the mean and the standard
deviation.

The normal distribution model is important in statistics and is key to the Central Limit Theorem (CLT). This theorem
states that averages calculated from independent, identically distributed random variables have approximately normal
distributions, regardless of the type of distribution from which the variables are sampled (provided it has finite
variance).

The normal distribution is one type of symmetrical distribution. Symmetrical distributions occur when a
dividing line produces two mirror images. Not all symmetrical distributions are normal, since some data could appear
as two humps or a series of hills in addition to the bell curve that indicates a normal distribution.

Properties of the Normal Distribution


The normal distribution has several key features and properties that define it.

First, its mean (average), median (midpoint), and mode (most frequent observation) are all equal to one another.
Moreover, these values all represent the peak, or highest point, of the distribution. The distribution then falls
symmetrically around the mean, the width of which is defined by the standard deviation.

Significance of Normal Curve:


Normal Curve has great significance in mental measurement and educational evaluation. It gives important information
about the trait being measured.

If the frequency polygon of observations or measurements of a certain trait is a normal curve, it indicates that:

1. The measured trait is normally distributed in the Universe.

2. Most of the cases are average in the measured trait and their percentage in the total population is about 68.26%

3. Approximately 15.87% (50% - 34.13%) of cases are high in the trait measured.

4. Similarly 15.87% cases approximately are low in the trait measured.

5. The test which is used to measure the trait is good.

6. The test has good discrimination power as it differentiates between poor, average and high ability group individuals,
and

7. The items of the test used are fairly distributed in terms of difficulty level.

Applications/Uses of Normal Curve/Normal Distribution:


There are a number of applications of normal curve in the field of measurement and evaluation in psychology and
education.

These are:
(i) To determine the percentage of cases (in a normal distribution) within given limits or scores.

(ii) To determine the percentage of cases that are above or below a given score or reference point.

(iii) To determine the limits of scores which include a given percentage of cases.

(iv) To determine the percentile rank of a student in his group.

(v) To find out the percentile value of a student’s percentile rank.

(vi) To compare the two distributions in terms of overlapping.

(vii) To determine the relative difficulty of test items, and

(viii) Dividing a group into sub-groups according to certain ability and assigning the grades.

Table of Areas under the Normal Curve:


How do we use all the above applications of the normal curve in psychological and educational measurement and
evaluation? It is essential first to know about the table of areas under the normal curve. Table A gives the fractional
parts of the total area under the normal curve found between the mean and ordinates erected at various σ (sigma)
distances from the mean.

The normal probability curve table is generally limited to the area under the unit normal curve with N = 1, σ = 1. In
cases where the values of N and σ are different from these, the measurements or scores should be converted into sigma
scores (also referred to as standard scores or Z scores).

The process is as follows:


Z = (X - M)/σ or Z = x/σ, where x = X - M

In which Z = Standard Score

X = Raw Score

M = Mean of X Scores

σ = Standard Deviation of X Scores.

The table of areas of the normal probability curve is then consulted to find the proportion of area between the mean and
the Z value. Though the total area under the N.P.C. is 1, for convenience the total area under the curve is taken to be
10,000, because of the greater ease with which fractional parts of the total area may then be calculated.

The first column of the table, x/σ, gives distances in tenths of σ measured off on the base line of the normal curve from
the mean as origin. Across the top row, the x/σ distances are given to the second decimal place.

To find the number of cases in the normal distribution between the mean and the ordinate erected at a distance of 1σ
from the mean, we go down the x/σ column until 1.0 is reached, and in the next column, under .00, we take the entry
opposite 1.0, namely 3413.

This figure means that 3413 cases in 10,000, or 34.13 percent of the entire area of the curve, lie between the mean and
1σ. Similarly, if we have to find the percentage of the distribution between the mean and 1.56σ, say, we go down the
x/σ column to 1.5, then across horizontally to the column headed by .06, and note the entry 4406. Thus 44.06 percent
of the total area lies between the mean and 1.56σ.
We have so far considered only σ distances measured in the positive direction from the mean, taking into account only
the right half of the normal curve. Since the curve is symmetrical about the mean, the entries in Table A apply to
distances measured in the negative direction (to the left) as well as to those measured in the positive direction.

If we have to find the percentage of the distribution between the mean and -1.28σ, for instance, we take the entry 3997
in the column .08, opposite 1.2 in the x/σ column. This entry means that 39.97 percent of the cases in the normal
distribution fall between the mean and -1.28σ.
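These table lookups can be reproduced, as a hedged sketch, with Python's standard library, which evaluates the unit normal curve directly:

```python
from statistics import NormalDist

z = NormalDist()  # unit normal curve: mean 0, standard deviation 1

def area_from_mean(sigma_distance):
    """Proportion of the total area between the mean and the given sigma distance."""
    return z.cdf(abs(sigma_distance)) - 0.5

print(round(area_from_mean(1.00) * 100, 2))   # 34.13 (mean to 1 sigma)
print(round(area_from_mean(1.56) * 100, 2))   # 44.06 (mean to 1.56 sigma)
print(round(area_from_mean(-1.28) * 100, 2))  # 39.97 (by symmetry, same as +1.28)
```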

Since the normal curve does not actually meet the base line, for practical purposes we take the curve to end at points
-3σ and +3σ distant from the mean. The table of areas under the normal probability curve shows that 4986.5 cases lie
between the mean and the ordinate at +3σ.

Thus, 99.73 percent of the entire distribution would lie within the limits -3σ and +3σ. The remaining 0.27 percent of
the distribution beyond ±3σ is considered too small to matter, or negligible, except where N is very large.

Points to be kept in mind while consulting Table of Area under Normal Probability Curve:
The following points are to be kept in mind to avoid errors, while consulting the N.P.C. Table:
1. Every given score or observation must be converted into standard measure i.e. Z score, by using the following
formula:
Z = (X - M)/σ
2. The mean of the curve is always the reference point, and all the values of areas are given in terms of distances from
mean which is zero.
3. The area in terms of proportion can be converted into percentage and,
4. While consulting the table, absolute values of Z should be taken. However, a negative value of Z shows the scores and
the area lie below the mean and this fact should be kept in mind while doing further calculation on the area. A positive
value of Z shows that the score lies above the mean i.e. right side.
Practical Problems Related to Application of the Normal Probability Curve:
(a) To determine the percentage of cases in a Normal Distribution within given limits or scores.

Example 1:
Given a normal distribution of 500 scores with M = 40 and σ = 8, what percentage of cases lie between 36 and 48?
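A sketch of the solution, following the procedure above: Z for 36 is (36 - 40)/8 = -0.5, and Z for 48 is (48 - 40)/8 = +1.0. The table gives 19.15% of cases between the mean and -0.5σ, and 34.13% between the mean and +1σ, so approximately 19.15 + 34.13 = 53.28% of cases lie between 36 and 48. The same figure can be checked in Python:

```python
from statistics import NormalDist

scores = NormalDist(mu=40, sigma=8)            # the given distribution

pct = (scores.cdf(48) - scores.cdf(36)) * 100  # area between the two scores
print(round(pct, 2))                           # 53.28 percent of cases
```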

The Normal Distribution Curve and Its Applications

The normal distribution, or bell curve, is most familiar and useful to teachers in describing the frequency of standardized
test scores, that is, how many students earned particular scores. This is not just any distribution, but a theoretical one
with several unique characteristics:

 It is always symmetrical, with equal areas on both sides of the curve.


 The highest point on the curve corresponds to the mean score, which equals the median and the mode in this
distribution.
 The area between given standard deviation units (represented by perpendicular lines on the curve)
includes a determined percent of the area. Because of the curve's symmetry, the percent area is the same as the
percent frequency of test scores.

The mean (the perpendicular line down the center of the curve) of the normal distribution divides the curve in half, so that
50% of the area under the curve is to the right of the mean and 50% is to the left. Therefore, 50% of test scores are greater
than the mean, and 50% of test scores are less than the mean. The table of areas shows that 34.13% of the area is between
the mean and +1 or -1 SD units (a distance expressed as a z score). Therefore, a total of 68.26% (34.13% x 2) of the test
scores fall between +1 and -1 SD. (Try working out other percentages of area under the curve between two standard
deviation lines, or the total percentage to the left or right of a standard deviation line.)

Example application: All the second-graders in a school took an IQ test with a mean of 100 and an SD of 15. An
administrator wants to determine what percent of the examinees should score between 1 SD above (100 + 15 = 115
IQ) and 1 SD below (100 - 15 = 85 IQ) the mean. Since the percent area under the curve equals the percent frequency of
scores, 68.26% (34.13% x 2) of the students should score between 85 and 115 on the IQ test. In addition, 15.87%
(50% - 34.13% = 15.87%) will score above 115, and 15.87% below 85.

On the same IQ test, one second-grader received a score of 145. The teacher knew this was an exceptional score
but wanted to compare his score to those of other students. The score of 145 is +3 SD units above the mean (100 + 15
+ 15 + 15 = 145). The area under the normal distribution curve to the left of this score is 99.87% (50% + 34.13% +
13.59% + 2.15% = 99.87%). Therefore, this student scored better than 99.87% of the other test-takers. This statistic is
also referred to as a percentile.
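A short check of both calculations with Python's standard library:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)           # the IQ test's score distribution

between = (iq.cdf(115) - iq.cdf(85)) * 100  # percent within 1 SD of the mean
percentile_145 = iq.cdf(145) * 100          # area to the left of a score of 145

print(round(between, 2))                    # ~68.27 (68.26 with rounded table values)
print(round(percentile_145, 2))             # ~99.87: the student's percentile
```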

Of course, not all test score distributions are normally distributed. They can be skewed, i.e. have a disproportionate
number of people who do very well or very poorly. This would be the case if a test was too easy or too hard for the testing
population. However, standardized tests are designed so that the outcome follows a normal distribution curve.
