
1. Explain Briefly The Stages In Data Processing?

Introduction
Data processing is simply the conversion of raw data to meaningful
information through a process. Data is manipulated to produce results
that lead to a resolution of a problem or improvement of an existing
situation. Similar to a production process, it follows a cycle where
inputs (raw data) are fed to a process (computer systems, software,
etc.) to produce output (information and insights).
Generally, organizations employ computer systems to carry out a
series of operations on the data in order to present, interpret, or obtain
information. The process includes activities like data entry, summary,
calculation, storage, etc. Useful and informative output is presented in
various appropriate forms such as diagrams, reports, graphics, etc.
Stages of Data Processing
There are five main stages in data processing. They are as follows:

Collection of data
Preparation of data
Input of data
Processing of data
Output of data

Collection of data
Here data are obtained or gathered from the various sources available. The two
main sources are as follows:
Primary Data
Primary data are data collected for the first time. They are first-hand data.
Secondary Data
Secondary data are data extracted from primary data. They are second-hand
data.

Preparation of Data
In this stage the data are made ready for further use by performing
various operations, such as:
Classifying the data,
Rearranging the data,
Editing the raw data, etc.
Here the data are filtered for further use. Preparation of data is the
stage where researchers make sure that they have sufficient raw
data to address their research subject. The researcher turns the raw
data into a usable form so that it can be used in later
investigations.
Input of Data

Input of data is the stage where the prepared data are put into the data
processing system to obtain information. Here the data are passed to
the person or department responsible for processing them. For
instance, if a computer is used, the data are recorded into the
computer.
Processing of Data
In this stage the data are manipulated by sorting, studying, analysing,
calculating, updating, etc., so that the researchers can answer
their questions. Usually a set of working procedures or
instructions is followed.
Output of Information
This is the final stage, where the information is made available for
future use. The end result is produced in a more usable format.
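The five stages can be sketched as a small Python pipeline. This is only an illustration under assumed conditions: the sample records, the function names and the choice of a simple summary as the processing step are all hypothetical and do not describe any particular system.

```python
from statistics import mean

def collect():
    # Stage 1: gather raw data (hypothetical survey answers stored as text).
    return ["42", " 37", "n/a", "55 ", "", "48"]

def prepare(raw):
    # Stage 2: edit and filter the raw data so only usable items remain.
    cleaned = [item.strip() for item in raw]
    return [item for item in cleaned if item.isdigit()]

def enter(prepared):
    # Stage 3: input the prepared data into the system as numbers.
    return [int(item) for item in prepared]

def process(values):
    # Stage 4: manipulate the data by sorting and calculating.
    ordered = sorted(values)
    return {"count": len(ordered), "lowest": ordered[0],
            "highest": ordered[-1], "average": mean(ordered)}

def output(info):
    # Stage 5: present the resulting information in a readable format.
    return ", ".join(f"{key}: {value}" for key, value in info.items())

report = output(process(enter(prepare(collect()))))
print(report)  # count: 4, lowest: 37, highest: 55, average: 45.5
```

Each function stands for one stage, so the output of one stage is the input of the next, in the same way as the cycle described above.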
2. Explain In Brief The Measures Of Central Tendency?
Introduction
A measure of central tendency is a single value that describes the way
in which a group of data cluster around a central value. To put in other
words, it is a way to describe the centre of a data set. There are three
measures of central tendency: the mean, the median, and the mode.
Importance of Central Tendency
Central tendency is very useful in psychology. It lets us know what
is normal or 'average' for a set of data. It also condenses the data set
down to one representative value, which is useful when you are
working with large amounts of data. Could you imagine how difficult
it would be to describe the central location of a 1,000 item data set if
you had to consider every number individually?
Central tendency also allows you to compare one data set to another.
For example, let's say you have a sample of girls and a sample of
boys, and you are interested in comparing their heights. By
calculating the average height for each sample, you could easily draw
comparisons between the girls and boys.
Central tendency is also useful when you want to compare one piece
of data to the entire data set. Let's say you received a 60% on your last
psychology quiz, which is usually in the D range. You go around and
talk to your classmates and find out that the average score on the quiz
was 43%. In this instance, your score was well above the class
average. If your teacher grades on a curve, your
60% could become an A. Had you not known about the measures of central
tendency, you probably would have been really upset by your grade
and assumed that you had bombed the test.

Three Measures of Central Tendency

There are three types of measures of central tendency. Each of these
measures describes a different indication of the typical or central
value in the distribution. They are as follows:
Mean
Median
Mode
Mean
The mean is the arithmetic average. It is calculated in two steps:
1. Add the data together to find the sum
2. Take the sum of the data and divide it by the total number of
data points
Now let's see how this is done using the height example from earlier.
Let's say we have a sample of 10 girls and 9 boys.
The girls' heights in inches are 60, 72, 61, 66, 63, 66, 59, 64, 71, 68.
Here are the steps to calculate the mean height for the girls:
First, you add the data together: 60 + 72 + 61 + 66 + 63 + 66 + 59 +
64 + 71 + 68 = 650. Then, we take the sum of the data (650) and
divide it by the total number of data (10 girls): 650 / 10 = 65. The
average height for the girls in the sample is 65 inches. If you look at
the data, you can see that 65 is a good representation of the data set
because 65 lands right around the middle of the data set.
The mean is often the preferred measure of central tendency because it
considers all of the values in the data set. However, the mean is not
without limitations. In order to calculate the mean, data must be
numerical. You cannot use the mean when you are working with
nominal data, which is data on characteristics like gender, appearance,
and race. For example, there is no way that you can calculate the
mean of the girls' eye colours. The mean is also very sensitive to
outliers, which are numbers that are much higher or much lower than
the rest of the data set. It should therefore be used with caution when
outliers are present.
To illustrate this point, let's look at what happens to the mean when
we change 68 to 680. Again, we add the data together: 60 + 72 + 61 +
66 + 63 + 66 + 59 + 64 + 71 + 680 = 1262. Then we take the sum of
the data (1262) and divide it by the total number of data (10 girls):
1262 / 10 = 126.2. The mean height (in inches) for the sample of girls
is now 126.2. This number is not a good estimate of the central height
for the girls. This number is almost twice as high as the height of most
of the girls.
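The two calculations above can be checked with a few lines of Python. This is a minimal sketch that uses the girls' heights from the example; the variable names are chosen only for illustration.

```python
# Heights of the 10 girls, in inches, as given in the example.
girls = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]

# Step 1: add the data together. Step 2: divide by the number of data points.
mean_height = sum(girls) / len(girls)

# Replace the last value (68) with the outlier 680 and recompute.
with_outlier = girls[:-1] + [680]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(sum(girls), mean_height)               # 650 65.0
print(sum(with_outlier), mean_with_outlier)  # 1262 126.2
```

A single extreme value moves the mean from 65 to 126.2, although nine of the ten heights are unchanged.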
Median
The median is determined by sorting the data set from lowest to
highest values and taking the data point in the middle of the sequence.
There are an equal number of points above and below the median. For
example, in the data set {1, 2, 3, 4, 5} the median is 3; there are two
data points greater than this value and two data points less than this
value. In this case, the median is equal to the mean. But consider the
data set {1, 2, 3, 4, 10}. In this data set, the median is still 3, but
the mean is equal to 4. If there is an even number of data points in the
set, then there is no single point at the middle and the median is
calculated by taking the mean of the two middle points.
The median can be determined for ordinal data as well as interval and
ratio data. Unlike the mean, the median is not influenced by outliers at
the extremes of the data set. For this reason, the median often is used
when there are a few extreme values that could greatly influence the
mean and distort what might be considered typical. This is often the
case with home prices and with income data for a group of people,
which are often highly skewed. For such data, the median is often
reported instead of the mean. For example, in a group of people, if the
salary of one person is 10 times the mean, the mean salary of the
group will be higher because of the unusually large salary. In this
case, the median may better represent the typical salary level of the
group.
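The procedure for finding the median can be sketched in Python as follows. The helper name median_by_hand is hypothetical, and the height data are reused from the mean example, with 68 replaced by the outlier 680 in the second list.

```python
def median_by_hand(values):
    # Sort from lowest to highest, then take the middle of the sequence.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of points: a single middle value exists.
        return ordered[mid]
    # Even number of points: take the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

heights = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]
heights_with_outlier = heights[:-1] + [680]

print(median_by_hand([1, 2, 3, 4, 5]))       # 3
print(median_by_hand([1, 2, 3, 4, 10]))      # 3
print(median_by_hand(heights))               # 65.0
print(median_by_hand(heights_with_outlier))  # 65.0
```

The outlier that pushed the mean height to 126.2 leaves the median at 65, which is why the median is often reported for skewed data.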
Mode
The mode is the most frequently occurring value in the data set. For
example, in the data set {1, 2, 3, 4, 4}, the mode is equal to 4. A data
set can have more than a single mode, in which case it is multimodal.
In the data set {1, 1, 2, 3, 3} there are two modes: 1 and 3.
The mode can be very useful for dealing with categorical data. For
example, if a sandwich shop sells 10 different types of sandwiches,
the mode would represent the most popular sandwich. The mode also
can be used with ordinal, interval, and ratio data. However, in interval
and ratio scales, the data may be spread thinly with no data points
having the same value. In such cases, the mode may not exist or may
not be very meaningful.
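The mode can be illustrated with the multimode function from Python's standard statistics module (available from Python 3.8). The sandwich orders and the measurement values below are made-up data used only for illustration.

```python
from statistics import multimode

# One mode.
single = multimode([1, 2, 3, 4, 4])

# Two modes: the data set is multimodal.
double = multimode([1, 1, 2, 3, 3])

# Categorical data: hypothetical sandwich orders.
orders = ["club", "veggie", "club", "tuna", "club", "veggie"]
most_popular = multimode(orders)

# Thinly spread measurements: every value occurs once, so every value
# is returned and the mode is not meaningful.
spread = multimode([1.2, 3.4, 5.6])

print(single, double, most_popular, spread)
```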
3. What Is a Hypothesis? Explain the Steps In Hypothesis Testing?
Introduction
A hypothesis is a specific, testable prediction. It describes in concrete
terms what you expect will happen in a certain circumstance.
A hypothesis is not a conclusion. It is a presumption that is
made before any research is done.
Definition
William Goode and Paul Hatt define hypothesis as a proposition,
which can be put to a test to determine its validity.
G.A. Lundberg defines hypothesis as a tentative generalization, the
validity of which remains to be tested.
A hypothesis can also be defined as an unproved theory, proposition,
supposition, etc., tentatively accepted to explain certain facts or to
provide a basis for further investigation, argument, etc.
PURPOSE OF HYPOTHESIS
A hypothesis is used in an experiment to define the relationship
between two variables. The purpose of a hypothesis is to find the
answer to a question. A formalized hypothesis will force us to think
about what results we should look for in an experiment. The first
variable is called the independent variable. This is the part of the
experiment that can be changed and tested. The independent variable
happens first and can be considered the cause of any changes in the
outcome. The outcome is called the dependent variable. For example,
suppose the hypothesis is that not studying for a test leads to a lower
test score. The independent variable is not studying for the
test. The dependent variable that you are using to measure outcome is
your test score.
This hypothesis is testable because you will receive a score on your test
performance. It is measurable because you can compare test scores
received when you did study with test scores received when
you did not study.
A hypothesis should always:
Explain what you expect to happen
Be clear and understandable
Be testable
Be measurable
And contain an independent and dependent variable


HYPOTHESIS TESTING
A hypothesis test is a statistical test that is used to determine whether
there is enough evidence in a sample of data to infer that a certain
condition is true for the entire population. A hypothesis test examines
two opposing hypotheses about a population: the null hypothesis and
the alternative hypothesis. The null hypothesis is the statement being
tested. Usually the null hypothesis is a statement of "no effect" or "no
difference". The alternative hypothesis is the statement you want to be
able to conclude is true.
Based on the sample data, the test determines whether to reject the
null hypothesis. You use a p-value to make the determination. If the
p-value is less than or equal to the level of significance, which is a
cut-off point that you define, then you can reject the null
hypothesis.
A common misconception is that statistical hypothesis tests are
designed to select the more likely of two hypotheses. Instead, a test
retains the null hypothesis until there is enough evidence
(data) to support the alternative hypothesis.
Example of performing a basic hypothesis test
We can follow six basic steps to correctly set up and perform a
hypothesis test. For example, the manager of a pipe manufacturing
facility must ensure that the diameters of its pipes equal 5cm. The
manager follows the basic steps for doing a hypothesis test.
NOTE
We should determine the criteria for the test and the required sample
size before we collect the data.
1. Specify the hypotheses.
First, the manager formulates the hypotheses. The null hypothesis is:
The population mean of all the pipes is equal to 5 cm. Formally, this is
written as: H0: μ = 5
Then, the manager chooses from the following alternative hypotheses:
Condition to test                                  Alternative hypothesis
The population mean is less than the target.       one-sided: μ < 5
The population mean is greater than the target.    one-sided: μ > 5
The population mean differs from the target.       two-sided: μ ≠ 5
Because they need to ensure that the pipes are not larger or smaller
than 5 cm, the manager chooses the two-sided alternative hypothesis,
which states that the population mean of all the pipes is not equal to 5
cm. Formally, this is written as H1: μ ≠ 5
2. Determine the power and sample size for the test.
The manager uses a power and sample size calculation to determine
how many pipes they need to measure to have a good chance of
detecting a difference of 0.1 cm or more from the target diameter.
3. Choose a significance level (also called alpha or α).
The manager selects a significance level of 0.05, which is the most
commonly used significance level.
4. Collect the data.
They collect a sample of pipes and measure their diameters.
5. Compare the p-value from the test to the significance level.
After they perform the hypothesis test, the manager obtains a p-value
of 0.004. The p-value is less than the significance level of 0.05.
6. Decide whether to reject or fail to reject the null hypothesis.
The manager rejects the null hypothesis and concludes that the mean
pipe diameter of all pipes is not equal to 5cm.
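The six steps can be sketched in Python for the pipe example. This is only an illustration under stated assumptions: the ten diameters are invented (the original measurements are not given, so the p-value here differs from 0.004), the population standard deviation SIGMA is assumed to be known, and a two-sided z-test is used so that only the standard library is needed. When the standard deviation must be estimated from the sample, a t-test is normally used instead.

```python
from math import sqrt
from statistics import NormalDist, mean

TARGET = 5.0  # H0: mu = 5 cm; H1: mu != 5 cm (two-sided)
ALPHA = 0.05  # significance level chosen before collecting data
SIGMA = 0.1   # ASSUMED known population standard deviation, in cm

# Hypothetical sample of measured pipe diameters, in cm.
diameters = [5.12, 5.08, 4.97, 5.15, 5.10, 5.03, 5.11, 5.09, 5.14, 5.06]

def two_sided_z_test(sample, target, sigma):
    # Test statistic: distance of the sample mean from the target,
    # measured in standard errors.
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - target) / standard_error
    # Two-sided p-value: probability of a result at least this extreme
    # in either direction if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_sided_z_test(diameters, TARGET, SIGMA)
decision = "reject H0" if p_value <= ALPHA else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

With these made-up numbers the p-value is below 0.05, so the null hypothesis is rejected, which mirrors the manager's conclusion.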
Data that can be analyzed with a hypothesis test
Hypothesis tests can be used to evaluate many different parameters of
a population. Each test is designed to evaluate a parameter associated
with a certain type of data. Knowing the difference between the types
of data, and which parameters are associated with each data type, can
help you choose the most appropriate test.
Continuous Data
You will have continuous data when you evaluate the mean, median,
standard deviation, or variance.
When you measure a characteristic of a part or process, such as
length, weight, or temperature, you usually obtain continuous data.
Continuous data often includes fractional (or decimal) values.
For example, a quality engineer wants to determine whether the mean
weight differs from the value stated on the package label (500 g). The
engineer samples cereal boxes and records their weights.
Binomial Data
You will have binomial data when you evaluate a proportion or a
percentage.
When you classify an item, event, or person into one of two
categories you obtain binomial data. The two categories should be
mutually exclusive, such as yes/no, pass/fail, or
defective/non-defective.
For example, engineers examine a sample of bolts for severe cracks
that make the bolts unusable. They record the number of bolts that are
inspected and the number of bolts that are rejected. The engineers
want to determine whether the percentage of defective bolts is less
than 0.2%.
Poisson Data
You will have Poisson data when you evaluate a rate of occurrence.
When you count the presence of a characteristic, result, or activity
over a certain amount of time, area, or other length of observation,
you obtain Poisson data. Poisson data are evaluated in counts per unit,
with the units the same size.
For example, inspectors at a bus company count the number of bus
breakdowns each day for 30 days. The company wants to determine
the daily rate of bus breakdowns.
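The bus example can be sketched as a rate estimate. The 30 daily counts below are invented for illustration, and the interval uses a normal approximation to the Poisson distribution, which is an assumption that gives only a rough 95% range.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical number of bus breakdowns on each of 30 days.
counts = [2, 0, 1, 3, 1, 0, 2, 1, 1, 4,
          0, 2, 1, 1, 3, 2, 0, 1, 2, 1,
          0, 3, 1, 2, 1, 0, 2, 1, 1, 2]

days = len(counts)
total = sum(counts)

# Poisson data are evaluated in counts per unit; here the unit is one day.
rate = total / days

# Approximate 95% confidence interval for the daily rate
# (normal approximation; the variance of a Poisson count equals its mean).
z = NormalDist().inv_cdf(0.975)
margin = z * sqrt(rate / days)
low, high = rate - margin, rate + margin

print(f"{total} breakdowns in {days} days: {rate:.2f} per day "
      f"(approx. 95% CI {low:.2f} to {high:.2f})")
```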
About the Null and Alternative hypotheses
A hypothesis test examines two opposing hypotheses about a
population: the null hypothesis and the alternative hypothesis. How
you set up these hypotheses depends on what you are trying to show.
Null hypothesis (H0)
The null hypothesis states that a population parameter is equal to a
value. The null hypothesis is often an initial claim that researchers
specify using previous research or knowledge.
Alternative Hypothesis (H1)
The alternative hypothesis states that the population parameter is
different than the value of the population parameter in the null
hypothesis. The alternative hypothesis is what you might believe to be
true or hope to prove true.
When you do a hypothesis test, two types of errors are possible:
Type I
Type II
The risks of these two errors are inversely related and determined by
the level of significance and the power for the test. Therefore, you
should determine which error has more severe consequences for your
situation before you define their risks.
No hypothesis test is 100% certain. Because the test is based on
probabilities, there is always a chance of drawing an incorrect
conclusion.

Type I error
When the null hypothesis is true and you reject it, you make a type I
error. The probability of making a type I error is α, which is the level
of significance you set for your hypothesis test. An α of 0.05 indicates
that you are willing to accept a 5% chance that you are wrong when
you reject the null hypothesis. To lower this risk, you must use a
lower value for α.
Type II error
When the null hypothesis is false and you fail to reject it, you make a
type II error. The probability of making a type II error is β, which
depends on the power of the test. You can decrease your risk of
committing a type II error by ensuring your test has enough power.
You can do this by ensuring your sample size is large enough to detect
a practical difference when one truly exists.
The probability of rejecting the null hypothesis when it is false is
equal to 1 − β. This value is the power of the test.
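The link between sample size, power and the type II error can be sketched for a two-sided z-test. The numbers are assumptions made only for illustration: a true difference of 0.1 cm from the target (as in the pipe example), a known standard deviation of 0.2 cm, and a significance level of 0.05. The function name power_two_sided_z is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(n, delta, sigma, alpha=0.05):
    # Power = probability of rejecting H0 when the true mean
    # differs from the hypothesised mean by delta.
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)

for n in (10, 20, 40, 80):
    power = power_two_sided_z(n, delta=0.1, sigma=0.2)
    # The type II error probability is 1 minus the power.
    print(f"n = {n:3d}  power = {power:.3f}  type II risk = {1 - power:.3f}")
```

As the sample size grows, the power rises and the risk of a type II error falls. When delta is 0 the function returns the significance level itself, because rejecting a true null hypothesis is a type I error.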
Decision          Null hypothesis is true          Null hypothesis is false
Fail to reject    Correct decision                 Type II error: failing to reject
                  (probability = 1 − α)            the null when it is false
                                                   (probability = β)
Reject            Type I error: rejecting          Correct decision
                  the null when it is true         (probability = 1 − β)
                  (probability = α)

Example of type I and type II error


To understand the interrelationship between type I and type II error,
and to determine which error has more severe consequences for your
situation, consider the following example.
A medical researcher wants to compare the effectiveness of two
medications. The null and alternative hypotheses are:
Null hypothesis (H0): μ1 = μ2
The two medications are equally effective.
Alternative hypothesis (H1): μ1 ≠ μ2
The two medications are not equally effective.
A type I error occurs if the researcher rejects the null hypothesis and
concludes that the two medications are different when, in fact, they
are not. If the medications have the same effectiveness, the researcher
may not consider this error too severe because the patients still benefit
from the same level of effectiveness regardless of which medicine
they take. However, if a type II error occurs, the researcher fails to
reject the null hypothesis when it should be rejected. That is, the
researcher concludes that the medications are the same when, in fact,
they are different. This error is potentially life-threatening if the
less-effective medication is sold to the public instead of the more effective
one.
As you conduct your hypothesis tests, consider the risks of making
type I and type II errors. If the consequences of making one type of
error are more severe or costly than making the other type of error,
then choose a level of significance and a power for the test that will
reflect the relative severity of those consequences.

4. Explain the meaning and significance of interpretation of data?
Data interpretation refers to the process of critiquing and determining
the significance of important information, such as survey results,
experimental findings, observations or narrative reports. Interpreting
data is an important critical thinking skill that helps you comprehend
textbooks, graphs and tables. Researchers use a similar but more
meticulous process to gather, analyze and interpret data. Experimental
scientists base their interpretations largely on objective data and
statistical calculations. Social scientists interpret the results of written
reports that are rich in descriptive detail but may be devoid of
mathematical calculations.
Data interpretation is part of daily life for most people. Interpretation is
the process of making sense of numerical data that has been collected,
analyzed, and presented. There are two types of data interpretation:
Quantitative data interpretation
Qualitative data interpretation
Quantitative Interpretation
Scientists interpret the results of rigorous experiments that are
performed under specific conditions. Quantifiable data are entered
into spreadsheets and statistical software programs, and then
interpreted by researchers seeking to determine if the results they
achieved are statistically significant or more likely due to chance or
error. The results help support or refute hypotheses generated from
an existing theory. By using scientific methods, researchers can
generalize about how their results might apply to a larger population.
For example, if data show that a small group of cancer patients in a
voluntary drug study went into remission after taking a new drug,
other cancer patients might also benefit from it.

Qualitative Interpretation
Certain academic disciplines, such as sociology, anthropology and
women's studies, rely heavily on the collection and interpretation of
qualitative data. Researchers seek new knowledge and insight into
phenomena such as the stages of grief following a loss, for example.
Instead of controlled experiments, data is collected through
techniques such as field observations or personal interviews of
research subjects that are recorded and transcribed. Social scientists
study field notes or look for themes in transcriptions to make
meaning out of the data.

The interpretation of data is based on the workings of the human
mind. Since the human mind is not 100 percent objective, the
interpretation of data may not be 100 percent accurate. Several
factors affect how data are interpreted. They are described below:
Correct Interpretation
In order to understand misinterpretation, the correct way to interpret
data must be understood. Data interpretation must be approached
without personal bias or preconceived opinions. A researcher forms an
initial opinion, called the hypothesis. He runs an experiment based on
the hypothesis. The data collected support or contradict his original
hypothesis. As a purely hypothetical illustration, suppose a researcher
states that the sky is blue because of nitrogen. He runs an experiment, and
the data collected reveal a high concentration of ozone. In his conclusion,
he states that the original hypothesis was wrong and that the facts collected
point to ozone instead. By interpreting data objectively, the correct
conclusion is reached. Unfortunately, having a 100 percent bias-free
and objective frame of mind is difficult.
Subjectivity
Suppose you are writing a technical manual and in a step you state:
"Move the part up a little bit, and sideways a little bit." The words "a
little bit" are extremely subjective. To one person, this may mean 1
inch. To another, this may mean 1 foot. Furthermore, "sideways" does
not specify to the left or right. Two different people will interpret the
data you presented completely differently. Stating "move part number
30 to the left one inch" eliminates the error in interpreting the data.
For data to be effectively interpreted, it has to be objective and
accurate.
Background and Experience
According to Drs. Anne E. Egger and Anthony Carpi at Vision
Learning, people base the interpretation of data upon their
background and prior experience. Since backgrounds vary widely, the
interpretation varies widely as well. Drs. Egger and Carpi stated that
even scientists (who are supposed to be objective) can interpret the
same set of data and reach differing opinions depending on their
backgrounds.
Abnormal Mental States
People with an abnormal mental state may interpret data in abnormal
ways. Researchers M.R. Broom et al., writing in the British Journal of
Psychiatry, reported their findings in 2007. The findings were that
people with delusional attributes jumped to conclusions quickly after
interpreting only a little bit of data. Furthermore, they did not tolerate
ambiguity. For example, a person with severe paranoia may read that
law enforcement conducts wiretaps. He may stop reading there, never
reading that this is done only with a search warrant and court approval.
He jumps to the conclusion, based on incomplete data, that he is being
wiretapped.
Cultural Background
In 1968, researchers Marshall Segall et al presented a series of optical
illusions to people of different cultural groups. The conclusion
reached was that different groups perceived the illusions in various
ways. This experiment illustrated that a person's cultural background
influences how data is interpreted.
Significance of interpretation of data
Interpretation is essential for the simple reason that the usefulness and
utility of research findings lie in proper interpretation. It is
considered a basic component of the research process for the
following reasons:
1. It is through interpretation that the researcher can well
understand the abstract principle that works beneath his
findings. Through this he can link up his findings with those of
other studies, having the same abstract principle, and thereby
can predict about the concrete world of events. Fresh inquiries
can test these predictions later on. This way the continuity in
research can be maintained.
2. Interpretation leads to the establishment of explanatory concepts
that can serve as a guide for future research studies; it opens
new avenues of intellectual adventure and stimulates the quest
for more knowledge.
3. Only through interpretation can the researcher better appreciate why
his findings are what they are and help others understand
the real significance of his research findings.
4. The interpretation of the findings of an exploratory research study
often results in hypotheses for experimental research and, as
such, interpretation is involved in the transition from exploratory
to experimental research. Since an exploratory study does not
have a hypothesis to start with, the findings of such a study have
to be interpreted on a post-factum basis in which case the
interpretation is technically described as post factum
interpretation.
5. What are the precautions essential for interpretation of data?
One should always remember that even if the data are properly
collected and analysed, wrong interpretation would lead to inaccurate
conclusions. It is, therefore, absolutely essential that the task of
interpretation be accomplished with patience in an impartial manner
and also in correct perspective. The researcher must pay attention to the
following points for correct interpretation:
1. At the outset, the researcher must invariably satisfy himself that the
data are appropriate, trustworthy and adequate for drawing
inferences; that the data reflect good homogeneity; and that proper
analysis has been done through statistical methods.
2. The researcher must remain cautious about the errors that can
possibly arise in the process of interpreting results. Errors can
arise due to false generalization and/or due to wrong
interpretation of statistical measures, such as the application of
findings beyond the range of observations, identification of
correlation with causation and the like. Another major pitfall is
the tendency to affirm that definite relationships exist on the
basis of confirmation of particular hypotheses. In fact, the
positive test results accepting the hypothesis must be interpreted
as being in accord with the hypothesis, rather than as
confirming the validity of the hypothesis. The researcher must
remain vigilant about all such things so that false generalization
may not take place. He should be well equipped with and must
know the correct use of statistical measures for drawing
inferences concerning his study.
3. He must always keep in view that the task of interpretation is
very much intertwined with analysis and cannot be distinctly
separated. As such he must take the task of interpretation as a
special aspect of analysis and accordingly must take all those
precautions that one usually observes while going through the
process of analysis viz., precautions concerning the reliability of
data, computational checks, validation and comparison of
results.
4. He must never lose sight of the fact that his task is not only to
make sensitive observations of relevant occurrences, but also to
identify and disengage the factors that are initially hidden to the
eye. This will enable him to do his job of interpretation on
proper lines. Broad generalisation should be avoided as most
research is not amenable to it because the coverage may be
restricted to a particular time, a particular area and particular
conditions. Such restrictions, if any, must invariably be specified
and the results must be framed within their limits.
5. The researcher must remember that ideally in the course of a
research study, there should be constant interaction between
initial hypothesis, empirical observation and theoretical
conceptions. It is exactly in this area of interaction between
theoretical orientation and empirical observation that
opportunities for originality and creativity lie. He must pay
special attention to this aspect while engaged in the task of
interpretation.
6. Discuss the essentials of the good research report?
