Semester – III
MIHMCT
UNIT 1
TERMS – RESEARCH
Search back
Search for knowledge
Careful / diligent search, studious enquiry
Critical exhaustive investigation / experimentation
Aimed at discovery / interpretation of facts
Revision of accepted theories/laws in light of new facts
Or practical application of such new theories or laws
Research
"Research is a systematic inquiry to describe, explain, predict, and control the observed phenomenon. Research involves inductive and deductive methods."
Inductive research methods are used to analyze an observed event. Deductive methods are
used to verify the observed event. Inductive approaches are associated with qualitative
research and deductive methods are more commonly associated with quantitative research.
1. A systematic approach must be followed to obtain accurate data. Rules and procedures are an integral part of the process and set the objective. Researchers need to practice ethics and a code of conduct while making observations or drawing conclusions.
2. Research is based on logical reasoning and involves both inductive and deductive
methods.
3. The data or knowledge that is derived is in real time from actual observations in
natural settings.
4. There is an in-depth analysis of all data collected so that there are no anomalies
associated with it.
5. Research creates a path for generating new questions. Existing data helps create more
opportunities for research.
6. Research is analytical in nature. It makes use of all the available data so that there is
no ambiguity in inference.
7. Accuracy is one of the most important aspects of research. The information that is
obtained should be accurate and true to its nature. For example, laboratories provide a
controlled environment to collect data. Accuracy is measured in the instruments used,
the calibrations of instruments or tools, and the final result of the experiment.
Purpose of Research
Types of Research
Descriptive Research
Analytical Research
Applied Research
Fundamental Research
Qualitative Research
Quantitative Research
Conceptual Research
Empirical Research
Qualitative Methods
Qualitative research is a method that collects data using conversational methods. Participants are asked open-ended questions. The responses collected are essentially non-numerical. This method helps a researcher understand not only what participants think but also why they think in a particular way.
Focus Groups: Focus groups are small groups comprising around 6-10 participants who are usually experts in the subject matter. A moderator is assigned to a focus group to facilitate the discussion amongst the group members. The moderator's experience in conducting the focus group plays an important role: an experienced moderator can probe the participants by asking the right questions, which helps collect a sizable amount of information related to the research.
Case Study: Case study research is used to study an organization or an entity. This method is one of the most valuable options for modern research. This type of research is used in fields like the education sector, philosophical studies, and psychological studies. This method involves a deep dive into ongoing research and collecting data.
Quantitative Methods
Quantitative research deals with numbers and measurable forms. It uses a systematic way of investigating events or data. It is used to answer questions in terms of justifying relationships with measurable variables to explain, predict, or control a phenomenon.
There are three methods that are often used by researchers:
Survey Research — The ultimate goal of survey research is to learn about a large
population by deploying a survey. Today, online surveys are popular as they are
convenient and can be sent in an email or made available on the internet. In this
method, a researcher designs a survey with the most relevant survey questions and
distributes the survey. Once the researcher receives responses, they summarize them
to tabulate meaningful findings and data.
Descriptive Research — Descriptive research is a method which identifies the
characteristics of an observed phenomenon and collects more information. This
method is designed to depict the participants in a very systematic and accurate
manner. In simple words, descriptive research is all about describing the phenomenon,
observing it, and drawing conclusions from it.
Correlational Research — Correlational research examines the relationship between two or more variables. Consider a researcher studying the correlation between cancer and married women: suppose married women have a negative correlation with cancer. In this example, there are two variables: cancer and married women. When we say negative correlation, it means women who are married are less likely to develop cancer. However, it doesn't mean that marriage directly prevents cancer.
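As an illustrative sketch of measuring such a relationship numerically, the following computes a Pearson correlation coefficient from first principles. The data here are entirely hypothetical, invented only to show the calculation:

```python
# Illustrative sketch: computing a Pearson correlation coefficient by hand.
# The data below are hypothetical, invented only to show the calculation.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours_exercised = [1, 2, 3, 4, 5]
resting_pulse = [80, 76, 74, 71, 69]   # falls as exercise rises
print(pearson_r(hours_exercised, resting_pulse))  # close to -1 (negative correlation)
```

A value near -1 indicates a strong negative correlation, near +1 a strong positive one, and near 0 no linear relationship; as the text stresses, this does not by itself establish causation.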
Generally, an investigator has a hypothesis. The hypothesis may be that the sample mean is less than the population mean, or the mean of one group is greater than that of the other group, or the means of more than two groups are not all the same. These hypotheses may be shown symbolically as follows:
𝐻 ∶ 𝑋̅ < 𝜇
𝐻 ∶ 𝜇1 < 𝜇2
𝐻 ∶ 𝜇1 ≠ 𝜇2 ≠ 𝜇3
Whatever hypothesis an investigator puts forward, its statistical significance is obtained by subjecting the null form of the hypothesis to an appropriate test of significance. The null form of the hypothesis (𝐻0) states that there is no significant difference between the means of two or more groups. The null forms of the above hypotheses are as follows:
𝐻0 ∶ 𝑋̅ = 𝜇
𝐻0 ∶ 𝜇1 = 𝜇2
𝐻0 ∶ 𝜇1 = 𝜇2 = 𝜇3
Verbally, 𝐻0 states that there is no significant difference between the sample mean and the population mean, or between the means of two populations, or among the means of more than two populations.
Any admissible hypothesis that differs from a null hypothesis is called an alternative
hypothesis and is denoted by 𝐻1.
Data Collection:
Quantitative data collection methods rely on random sampling and structured data collection instruments that fit diverse experiences into predetermined response categories. They produce results that are easy to summarize, compare, and generalize.
INTERVIEWS
Sampling Methods
Sampling methods fall into two broad classes: probability sampling and non-probability sampling.
Random sampling
Under this method, every unit of the population at any stage has an equal chance of selection, (or) each unit is drawn with a known probability. This makes it possible to estimate the reliability of the sample results. That is not possible in non-probability sampling, which is used advantageously when there is no sampling frame or when the respondents are expected to be non-cooperative.
Under probability sampling there are two procedures
1. Sampling with replacement(SWR)
2. Sampling without replacement(SWOR)
When the successive draws are made by placing back the units selected in the preceding draws, it is known as sampling with replacement. When such replacement is not made, it is known as sampling without replacement. Sampling with replacement makes a finite population behave effectively like an infinite one; in practice, SWOR is adopted for a finite population.
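The two procedures can be sketched with Python's standard library; the population values here are hypothetical labels used only for illustration:

```python
# Sketch of the two probability-sampling procedures described above,
# using Python's standard library (population values are hypothetical).
import random

population = ["A", "B", "C", "D", "E", "F", "G", "H"]
random.seed(42)  # fixed seed for a reproducible illustration

# Sampling with replacement (SWR): a unit may be drawn more than once.
swr = random.choices(population, k=5)

# Sampling without replacement (SWOR): every drawn unit is distinct.
swor = random.sample(population, k=5)

print(swr)    # duplicates are possible here
print(swor)   # all five units are distinct
```

`random.choices` models SWR (each draw is independent of earlier draws), while `random.sample` models SWOR (each selected unit is removed from further consideration).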
There are many kinds of random sampling; the main ones are:
1. Simple Random Sampling
2. Systematic Random Sampling
3. Stratified Random Sampling
4. Cluster Sampling
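Two of the designs listed above can be sketched briefly on a hypothetical numbered population of 100 units; the stratum labels and sample sizes are invented for illustration:

```python
# A brief sketch of systematic and stratified random sampling on a
# hypothetical numbered population of 100 units.
import random

random.seed(1)
population = list(range(1, 101))

# Systematic random sampling: pick a random start, then every k-th unit.
k = 10                                  # sampling interval (N/n)
start = random.randint(1, k)
systematic = population[start - 1::k]   # 10 units, equally spaced

# Stratified random sampling: split into strata, sample from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: random.sample(units, 5) for name, units in strata.items()}

print(systematic)
print(stratified)
```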
Lab Experiment
A laboratory experiment is an experiment conducted under highly controlled conditions (not
necessarily a laboratory), where accurate measurements are possible. The researcher decides
where the experiment will take place, at what time, with which participants, in what
circumstances and using a standardized procedure. Participants are randomly allocated to
each independent variable group.
Field Experiments
Field experiments are done in the everyday (i.e. real life) environment of the participants. The experimenter still manipulates the independent variable, but in a real-life setting (so cannot really control extraneous variables).
Strength:
Behaviour in a field experiment is more likely to reflect real life because of its natural
setting, i.e. higher ecological validity than a lab experiment. There is less likelihood
of demand characteristics affecting the results, as participants may not know they are
being studied. This occurs when the study is covert.
Limitation:
There is less control over extraneous variables that might bias the results. This makes
it difficult for another researcher to replicate the study in exactly the same way.
Natural Experiments
Natural experiments are conducted in the everyday (i.e. real life) environment of the
participants, but here the experimenter has no control over the independent variable as it
occurs naturally in real life.
Strength:
Behaviour in a natural experiment is more likely to reflect real life because of its
natural setting, i.e. very high ecological validity. There is less likelihood of demand
characteristics affecting the results, as participants may not know they are being
studied.
Limitation:
They may be more expensive and time consuming than lab experiments. There is no
control over extraneous variables that might bias the results. This makes it difficult for
another researcher to replicate the study in exactly the same way.
OBSERVATION METHODS
The observation method is described as a method to observe and describe the behaviour of a subject. As the name suggests, it is a way of collecting relevant information and data by observing. It is also referred to as a participatory study because the researcher has to establish a link with the respondents and, for this, has to immerse himself in the same setting as theirs. Only then can he use the observation method to record and take notes.
Participant observation
Participant observation was first introduced by Prof. Eduard Lindeman. It refers to the activities of a group in which the observer himself participates and notes the situation. He willingly mixes with the group and performs his activities as an observer, not merely as a participator who criticizes the situation. In other words, he takes part in and shares the activities of his group. For example, when we study the rural and urban conditions of Asian people, we have to go there and watch what is going on. The basic philosophy of participant observation is that we watch the phenomena rather than ask about them. The actual behaviour of the group can be observed only by participant observation, not by any other method.
Merits
1. The observer is personally involved in group activities and shares their feelings and prejudices.
2. He participates himself and gets insight into the behaviour of the group.
3. It motivates and stimulates a mutual relationship between the observer and the observed.
4. He can get more information with accuracy and precision.
5. The information is recorded in the presence of the group members.
Demerits
1. The observer may develop an emotional attachment to the group, which will reduce the objectivity of the study.
Non-Participant Observation
In non-participant observation, the observer does not participate in the group's activities. He either watches the phenomena from a distance or joins the group but never takes part in its activities. He only sits in the group but does not involve himself in the process.
The difference between participant and non-participant observation is that in the former the observer himself takes part in the group, becomes a member of that group, and participates fully in its activities, while the latter involves little or no participation by the observer in the group, its membership, or its activities. He watches from a distance and does not have a close view of what is going on in the field of research.
Merits
1. Although the observer never attaches himself to the group, objectivity is maintained.
2. Less emotional involvement of the observer leads to accuracy and greater objectivity.
3. Having only a secondary relationship with the group, the observer can collect information more completely.
4. Through non-participant observation, the research remains very smooth.
Demerits
1. Do not have full knowledge about the group activities.
2. Cannot understand the whole phenomena.
3. Cannot get real and deep insight into the phenomena.
Controlled Observation
Here both the observer and the observed (the subject) are controlled. For systematic data collection, control is imposed on both for accuracy and precision. When observation is pre-planned and definite, it is termed controlled observation. In controlled observation, mechanical devices are used for precision and standardization. Thus, control increases accuracy, reduces bias, and ensures reliability and standardization. Some of the devices used are as under:
1. Observational plan.
2. Observational schedule.
3. Mechanical appliances like cameras, maps, films, video, tape recorders, etc.
4. Team of observers.
5. Sociometric scale.
Scaling Techniques
A scaling technique is a method of placing respondents along a continuum of gradual change in pre-assigned values, symbols or numbers, based on the features of a particular object, as per the defined rules. All the scaling techniques are based on four pillars, i.e., order, description, distance and origin.
Marketing research is highly dependent upon scaling techniques, without which no market analysis can be performed.
The major four scales used in statistics for market research consist of the following:
1. Dichotomous: A nominal scale that has only two labels is called 'dichotomous'; for example, Yes/No.
2. Nominal without Order: A nominal scale which has no sequence is called 'nominal without order'; for example, Black, White.
Ordinal Scale
The ordinal scale functions on the concept of the relative position of objects or labels based on the individual's choice or preference.
For example, at Amazon.in, every product has a customer review section where the buyers
rate the listed product according to their buying experience, product features, quality, usage,
etc.
5 Star – Excellent
4 Star – Good
3 Star – Average
2 Star – Poor
1 Star – Worst
Interval Scale
An interval scale is also called a cardinal scale which is the numerical labelling with the same
difference among the consecutive measurement units. With the help of this scaling technique,
researchers can obtain a better comparison between the objects.
In an interval scale such as a 1-to-5 rating, every unit has the same difference, i.e., 1, whether it is between 2 and 3 or between 4 and 5.
Ratio Scale
One of the most superior measurement techniques is the ratio scale. Similar to an interval scale, a ratio scale is an abstract number system. It allows measurement with proper intervals, order, categorization and distance, with the added property of originating from a fixed zero point. Here, the comparison can be made in terms of the acquired ratio.
For example, a health product manufacturing company conducted a survey to identify the level of obesity in a particular locality. It released the following survey questionnaire:
Select the category to which your weight belongs:
40-59 Kilograms
60-79 Kilograms
80-99 Kilograms
100-119 Kilograms
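Grouping ratio-scale measurements into the survey categories above can be sketched as follows; the respondent weights are hypothetical:

```python
# Sketch of grouping ratio-scale measurements (weights in kilograms)
# into the survey categories above; the weights themselves are invented.
def weight_category(kg):
    if 40 <= kg <= 59:
        return "40-59 Kilograms"
    if 60 <= kg <= 79:
        return "60-79 Kilograms"
    if 80 <= kg <= 99:
        return "80-99 Kilograms"
    if 100 <= kg <= 119:
        return "100-119 Kilograms"
    return "Out of surveyed range"

responses = [55, 72, 101, 68, 84]
for kg in responses:
    print(kg, "->", weight_category(kg))
```

Note that because weight is a ratio-scale variable (it has a true zero), statements like "80 kg is twice 40 kg" are meaningful, even after the answers are grouped into categories.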
Scaling of objects can be used for a comparative study between two or more objects (products, services, brands, events, etc.), or it can be carried out individually to understand the consumer's behaviour and response towards a particular object.
Following are the two categories under which other scaling techniques are placed based on
their comparability:
Comparative Scales
For comparing two or more objects, a comparative scale is used by the respondents. Following are the different types of comparative scaling techniques:
Paired Comparison
A paired comparison symbolizes two variables from which the respondent needs to select one. This technique is mainly used at the time of product testing, to facilitate the consumers with a comparative analysis of the two major products in the market.
To compare more than two objects say comparing P, Q and R, one can first compare P with Q
and then the superior one (i.e., one with a higher percentage) with R.
For example, a market survey was conducted to find out consumers' preference between the network service provider brands A and B. The outcome of the survey was as follows:
Brand 'A' = 57%
Brand 'B' = 43%
Thus, it is visible that the consumers prefer brand 'A' over brand 'B'.
Rank Order Scaling
In rank order scaling, the respondent needs to rank or arrange the given objects according to his or her preference.
For example, a soap manufacturing company conducted a rank order scaling to find out the orderly preference of the consumers. It asked the respondents to rank the following brands in the sequence of their choice.
The above scaling shows that soap 'Y' is the most preferred brand, followed by soap 'X', then soap 'Z', and the least preferred one is soap 'V'.
Constant Sum
It is a scaling technique where a constant sum of units, like dollars, points, chits, or chips, is allocated by the respondents to the features or attributes of a particular product or service according to their importance.
For example, the respondents belonging to 3 different segments were asked to allocate 50 points to the following attributes of a cosmetic product 'P':
Segment 1 considers product 'P' mainly for its competitive price.
But segments 2 and 3 prefer the product because it is skin-friendly.
Q-Sort Scaling
Q-sort scaling is a technique used for sorting the most appropriate objects out of a large number of given variables. It emphasizes ranking the given objects in descending order to form similar piles based on specific attributes.
It is suitable where the number of objects is not less than 60 and not more than 140, the most appropriate range being 60 to 90.
For example, The marketing manager of a garment manufacturing company sorts the most
efficient marketing executives based on their past performance, sales revenue generation,
dedication and growth.
The Q-sort scaling was performed on 60 executives, and the marketing head creates three
piles based on their efficiency as follows:
Non-Comparative Scales
A continuous (graphic) rating scale is a non-comparative scale where the respondents are free to place the object at a position of their choice. It is done by selecting and marking a point along a vertical or horizontal line which ranges between two extreme criteria.
Such a scale can show a non-comparative analysis of one particular product, e.g. comfy bedding, making it very clear whether the customers are satisfied with the product and its features.
The itemized scale is another essential technique under the non-comparative scales. It emphasizes choosing a particular category among the various given categories by the respondents. Each category is briefly defined by the researchers to facilitate such selection.
The three most commonly used itemized rating scales are as follows:
Likert Scale: In the Likert scale, the researcher provides some statements and asks the respondents to mark their level of agreement or disagreement with these statements by selecting one of the five given alternatives.
For example, a shoe manufacturing company adopted the Likert scale technique for its new sports shoe range named Z sports shoes. The purpose is to know the agreement or disagreement of the respondents.
For this, the researcher asked the respondents to circle a number representing the most suitable answer, in the following representation:
1 – Strongly Disagree
2 – Disagree
3 – Neither Agree nor Disagree
4 – Agree
5 – Strongly Agree
From such responses, we can analyze, for instance, that the customers find the product of superior quality but that the brand needs to focus more on styling.
Stapel Scale: A Stapel scale is an itemized rating scale which measures the response, perception or attitude of the respondents for a particular object through a unipolar rating. The range of a Stapel scale is from -5 to +5, excluding 0, thus confining it to 10 units.
For example, a tours and travel company asked the respondents to rank their holiday package in terms of value for money and user-friendly interface as follows:
Introduction
Statistics has originated as a science of statehood and found applications slowly and steadily
in Agriculture, Economics, Commerce, Biology, Medicine, Industry, planning, education and
so on.
Sir Ronald A. Fisher, (1890 – 1962), who is called “the Father of Statistics”, drew many
solid conclusions from statistical data.
STATISTICS – Definition
Statistics is the science which deals with the
1. Collection of data,
2. Organization of data (or) Classification of data,
3. Presentation of data,
4. Analysis of data,
5. Interpretation of data, which are known as the statistical methods.
Limitations of Statistics
1. Statistics does not deal with individual items.
2. Statistics deals with quantitative data only.
3. Statistical laws are true only on averages.
4. Statistical results are only approximately correct.
5. Statistics is liable to be misused.
Functions of Statistics
1. Simplifies complexity
2. Helps to compare.
3. Formulates and tests hypothesis.
4. Studies relationships.
5. Helps the government.
6. Helps in forecasting.
7. Formulation of suitable policies.
PROBABILITY:
The concept of probability is difficult to define in precise terms. In ordinary language, the word probable means likely (or) chance. Generally, the word probability is used to denote the likelihood of occurrence of a certain event, based on past experience. By looking at a clear sky, one will say that there will not be any rain today. On the other hand, by looking at a cloudy or overcast sky, one will say that there will be rain today. In the first sentence we expect no rain, and in the latter we expect rain. A mathematician says that the probability of rain is 0 in the first case and 1 in the second case. In between 0 and 1, there are fractions denoting the chance of the event occurring.
Exhaustive Events:
The total number of possible outcomes in any trial is known as exhaustive events (or)
exhaustive cases.
Example:
1. In tossing of a coin there are two exhaustive cases, namely head and tail.
2. In throwing of a die, there are six exhaustive cases, since anyone of the 6 faces
1, 2, 3, 4, 5, 6 may come uppermost.
Favourable Events:
The number of cases favourable to an event in a trial is the number of outcomes which
entail the happening of the event.
Example:
1. In throwing of two dice, the number of cases favourable to getting the sum 5 is
(1, 4), (2, 3), (3,2), (4, 1) = 4.
2. In drawing a card from a pack of cards the number of cases favourable to drawing
of an ace is 4, for drawing a spade is 13 and for drawing a red card is 26.
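Example 1 above (two dice, sum equal to 5) can be checked by direct enumeration of the exhaustive cases:

```python
# Enumerating the favourable cases in Example 1 above: throwing two dice
# and counting outcomes whose sum is 5.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 exhaustive cases
favourable = [o for o in outcomes if sum(o) == 5]

print(favourable)        # (1, 4), (2, 3), (3, 2), (4, 1)
print(len(favourable))   # 4 favourable cases out of 36
```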
Mutually Exclusive Events:
Events are said to be mutually exclusive (or) incompatible if the happening of any one of the events excludes (or) precludes the happening of all the others, (i.e.) if no two or more of the events can happen simultaneously in the same trial; the joint occurrence is not possible.
Example:
1. In tossing a coin the events head and tail are mutually exclusive.
2. In throwing a die, all the 6 faces numbered 1 to 6 are mutually exclusive since if
any one of these faces comes, the possibility of others, in the same trial, is ruled
out.
Equally Likely Events:
Outcomes of a trial are said to be equally likely if, taking into consideration all the relevant evidence, there is no reason to expect one in preference to the others, (i.e.) two or more events are equally likely if each has the same chance of occurring.
Note:
1. If m = 0, then P(A) = 0 and 'A' is called an impossible event; (i.e.) P(∅) = 0.
2. If m = n, then P(A) = 1 and 'A' is called a sure (or) certain event.
3. The probability is a non-negative real number and cannot exceed unity, (i.e.) it lies between 0 and 1.
4. The probability of non-happening of the event 'A' is P(Ā), denoted by 'q', and P(A) + P(Ā) = 1.
5. If A and B are mutually exclusive (or) disjoint events, then the probability of occurrence of either A (or) B, denoted by P(A∪B), is given by
P(A∪B) = P(A) + P(B)
More generally, P(E1∪E2∪…∪En) = P(E1) + P(E2) + …… + P(En) if E1, E2, …, En are mutually exclusive events.
Conditional Probability:
Two events A and B are said to be dependent when B can occur only when A is known to have occurred (or vice versa). The probability attached to such an event is called the conditional probability and is denoted by P(A/B) (read as: A given B), in other words, the probability of A given that B has occurred.
Writing P(A) = n(A)/n and P(B) = n(B)/n, the addition theorem for two events that are not mutually exclusive is
P(A∪B) = P(A) + P(B) − P(A∩B)
Note:
(i) In the case of 3 events (not mutually exclusive),
P(A or B or C) = P(A∪B∪C)
= P(A) + P(B) + P(C) − P(A∩B) − P(B∩C) − P(A∩C) + P(A∩B∩C)
(ii) In the case of 3 events (mutually exclusive),
P(A or B or C) = P(A∪B∪C) = P(A) + P(B) + P(C)
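The three-event addition rule can be verified by direct enumeration on a small sample space; the events chosen here (on one throw of a die) are hypothetical examples:

```python
# Checking the three-event addition rule by direct enumeration.
# The events below (on one throw of a die) are hypothetical examples.
sample_space = set(range(1, 7))
A = {1, 2, 3}   # "at most 3"
B = {2, 4, 6}   # "even"
C = {5, 6}      # "at least 5"

def P(event):
    return len(event) / len(sample_space)

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(A & C)
       + P(A & B & C))
print(lhs, rhs)   # both sides agree
```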
Proof:
Let n be the total number of outcomes, and n(A) the number of outcomes in A. Then
P(A∩B) = n(A∩B)/n = [n(A)/n] · [n(A∩B)/n(A)]
P(A∩B) = P(A) · P(B/A) ……(I)
Similarly,
P(A∩B) = n(A∩B)/n = [n(B)/n] · [n(A∩B)/n(B)]
P(A∩B) = P(B) · P(A/B) ……(II)
Note:
(i) In the case of 3 (dependent) events,
P(A∩B∩C) = P(A) · P(B/A) · P(C/A∩B)
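As a quick worked example of the multiplication rule P(A∩B) = P(A)·P(B/A), consider drawing two aces in succession from a standard 52-card pack without replacement:

```python
# Applying the multiplication rule P(A∩B) = P(A) · P(B/A):
# drawing two aces in succession from a 52-card pack without replacement.
from fractions import Fraction

p_first_ace = Fraction(4, 52)             # P(A)
p_second_given_first = Fraction(3, 51)    # P(B/A): one ace already removed
p_both = p_first_ace * p_second_given_first

print(p_both)   # 1/221
```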
A random variable is a variable that assumes numerical values associated with events
of an experiment.
Example
1. Toss a coin 10 times and count the heads; the number of heads obtained is a random variable.
2. Observe 100 babies to be born in a clinic. The number of boys born is a random variable.
3. Select one student from a university and measure the height, recording the height as 'x'. Then 'x' is a random variable; its value may be anywhere from 100 cm to 250 cm.
A discrete random variable is one that can assume only a countable number of values.
A continuous random variable can assume any value in one or more intervals.
In the examples above, the 1st and 2nd are discrete random variables; the height of the students is a continuous random variable.
Probability Distribution
Suppose 'x' is a random variable taking a countable (possibly infinite) number of values x1, x2, …. With each possible outcome 'xi' we associate a number pi = P(X = xi) = p(xi), called the probability of xi. The set {xi, p(xi)} is called the probability distribution (pd) of the random variable 'x'.
The probability distribution can be classified in to two categories
1. Discrete Probability Distribution (or) Probability Mass Function (or) (pmf)
2. Continuous Probability Distribution (or) Probability Density Function (or) (pdf)
The standard theoretical distributions are:
1. Binomial distribution (discrete)
2. Poisson distribution (discrete)
3. Normal distribution (continuous)
Bernoulli distribution
A random variable x that takes two values 0 and 1, with probabilities p and q, i.e., p(x=1) = p and p(x=0) = q, where q = 1 − p, is called a Bernoulli variate and is said to follow a Bernoulli distribution, where p and q are the probabilities of success and failure. It was given by the Swiss mathematician James Bernoulli (1654-1705).
Binomial distribution
Consider the probability of x successes and consequently n−x failures in n independent trials. The x successes in n trials can occur in nCx ways, and the probability for each of these ways is p^x q^(n−x):
P(ss…sff…f) = p(s)p(s)…p(s) · p(f)p(f)…p(f)
= (p·p…p)(q·q…q) = p^x q^(n−x)
Hence P(x) = nCx p^x q^(n−x), x = 0, 1, 2, …, n.
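The binomial probability nCx · p^x · q^(n−x) can be evaluated directly with the standard library; n = 5 trials with p = 0.5 is a hypothetical case chosen for illustration:

```python
# Sketch of the binomial probability P(x) = nCx * p**x * q**(n-x),
# evaluated for a hypothetical case of n = 5 trials with p = 0.5.
from math import comb

def binomial_pmf(x, n, p):
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.5
for x in range(n + 1):
    print(x, binomial_pmf(x, n, p))
```

Summed over all x from 0 to n, these probabilities total 1, as a probability mass function must.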
Poisson distribution
P(x) = (e^(−λ) λ^x) / x!, x = 0, 1, 2, …, is called the probability mass function of the Poisson distribution, where λ is the average number of occurrences per unit of time; λ = np.
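The Poisson mass function e^(−λ) λ^x / x! can likewise be sketched directly; the rate λ = 2 occurrences per unit time is a hypothetical value:

```python
# Sketch of the Poisson mass function P(x) = exp(-lam) * lam**x / x!,
# with a hypothetical average rate of lam = 2 occurrences per unit time.
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```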
Normal distribution
The normal distribution is a continuous probability distribution. It is also known as the error law, the Normal law, the Laplacian law, or the Gaussian distribution. Many of the sampling distributions, like Student's t, the F distribution and the χ2 distribution, are derived from it.
A continuous random variable x is said to follow a normal distribution with parameters µ and σ2 if the density function is given by the probability law
f(x) = (1/(σ√(2π))) e^(−(x−µ)²/2σ²), −∞ < x < ∞
ESTIMATION
The process of generalizing from the sample to the population is known as statistical
inference.
In statistical inference we have
1. Estimation of population parameters and
2. Testing of hypothesis.
The process of obtaining an estimate of the unknown value of a parameter by a statistic is
termed as estimation. There are two types of estimation.
1. Point Estimation.
2. Interval Estimation.
Point Estimation
When a single statistic is used to estimate an unknown parameter θ, it is termed point estimation. The value of the statistic θ̂ is computed from the random sample taken from the population.
The statistic θ̂ used for estimating a parameter θ is called an estimator of θ.
Example
1. The sample mean 𝑋̅ is an estimator of the population mean µ.
2. The sample SD s is an estimator of the population SD σ.
Interval Estimation
Interval estimation involves the determination of an interval within which the population value must lie with a specified degree of confidence, i.e., a 100(1−α)% confidence interval computed from a sample of size n.
If 𝜒² < 𝜒²(1−α/2) or 𝜒² > 𝜒²(α/2), i.e., when the computed value of 𝜒² lies in the rejection region, we reject the null hypothesis; otherwise we fail to reject the null hypothesis.
't' test:
Let 𝑋1, 𝑋2, …, 𝑋𝑛 be a random sample of size 'n' drawn from a normal population with mean 𝜇 and variance 𝜎². Then Student's 't' statistic is given by
𝑡 = (𝑋̅ − 𝜇) / (𝑆/√𝑛)
This follows a Student's t distribution with (n−1) d.f., where
𝑋̅ = ∑𝑋𝑖 / 𝑛 and 𝑆² = (1/(𝑛−1)) ∑(𝑋𝑖 − 𝑋̅)² (unbiased variance)
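The one-sample t statistic above can be computed on a small sample as a sketch; the observations and the hypothesised mean here are invented for illustration:

```python
# Computing the one-sample t statistic t = (xbar - mu0) / (s / sqrt(n))
# on a small hypothetical sample, testing H0: mu = 50.
from math import sqrt

sample = [52, 48, 55, 51, 49, 53]
mu0 = 50                                 # hypothesised population mean
n = len(sample)
xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)   # unbiased variance
t = (xbar - mu0) / (sqrt(s2) / sqrt(n))

print(round(t, 3))   # compare with the t table at n-1 = 5 d.f.
```

The computed t is then compared with the tabulated critical value at (n−1) degrees of freedom to decide whether to reject H0.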
The t-distribution is used to test:
1. Test for single mean for single sample case.
2. Test for equality of two means for the two-sample case (independent samples and dependent samples: the paired t-test).
3. Test for significance of observed correlation coefficient.
4. Test for significance of observed partial and multiple correlation coefficient.
5. Test for significance of observed regression coefficient.
Properties of t-distribution
(1) The t-distribution ranges from −∞ to ∞ just as does a normal distribution.
(2) The t-distribution like the standard normal distribution is bell shaped and symmetrical
around zero
(3) The shape of the distribution changes as the number of degrees of freedom changes. Therefore, there is a family of t-distributions, one for each number of degrees of freedom; hence the degrees of freedom is a parameter of the t-distribution.
(4) The variance of the t-distribution is always greater than one and is defined only when 𝜈 ≥ 3; it is given by var(t) = 𝜈/(𝜈 − 2).
(5) The t-distribution is more platykurtic (less peaked at the centre and higher in the tails) than the normal distribution.
The t-distribution has a greater dispersion than the standard normal distribution. As n gets larger, the t-distribution approaches the normal distribution. When n is as large as 30, the difference is very small and the t-distribution approaches the normal distribution in shape.
(4) The distribution of √(2𝜒²) has mean √(2𝜈 − 1) and standard deviation 1.
The sum of independent 𝜒² variates is also a 𝜒² variate. Therefore, if 𝜒₁² is a 𝜒² variate with 𝜈1 d.f. and 𝜒₂² is another 𝜒² variate with 𝜈2 d.f., independent of 𝜒₁², then their sum 𝜒₁² + 𝜒₂² is also a 𝜒² variate with 𝜈1 + 𝜈2 d.f. This property is known as the additive property of 𝜒².
Conditions for the Application of 𝝌² Test
The following five basic conditions must be met in order for chi-square analysis to be applied:
(i) The experimental data (sample observations) must be independent of each other.
(ii) The sample data must be drawn at random from the target population.
(iii) The data should be expressed in original units for convenience of comparison, and not in percentage or ratio form.
(iv) The sample should contain at least 50 observations.
(v) There should not be less than five observations in any cell (each data entry is known as a cell). For less than 5 observations, the value of 𝜒² will be overestimated, resulting in too many rejections of the null hypothesis.
Application of 𝝌² Test
The 𝜒² distribution has a large number of applications in statistics, some of which are enumerated below:
1. To test whether the hypothetical value of the population variance is 𝜎² = 𝜎o².
2. To test the goodness of fit.
3. To test the independence of attributes.
4. To test the homogeneity of independent estimates of the population variance.
Remarks
1. The greater the differences between O and E, the greater the value of χ², and vice
versa; if there is no difference between O and E, χ² will be zero.
2. Only frequencies can be used.
3. Percentages, proportions, etc., cannot be used.
4. All observations must be independent and mutually exclusive.
5. The number of observations must be large.
6. A minimum expected frequency of 5 is necessary in each O–E combination.
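As a minimal sketch of the goodness-of-fit application, the statistic Σ(O − E)²/E can be computed directly. The die-roll frequencies and the tabulated critical value below are illustrative assumptions, not data from this text:

```python
# Chi-square goodness of fit: is a die fair? (hypothetical frequencies)
observed = [22, 17, 20, 26, 22, 13]   # 120 rolls in total
expected = [20] * 6                   # a fair die: 120 / 6 rolls per face

# chi-square statistic: sum over cells of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1      # degrees of freedom = number of cells - 1
critical_5pct = 11.070      # tabulated chi-square value for df = 5, alpha = 0.05

print(round(chi_sq, 2))        # 5.1
print(chi_sq > critical_5pct)  # False -> do not reject the null hypothesis
```

Note that this toy sample also satisfies the conditions above: it has more than 50 observations and every cell has at least five.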
Correlation:
Correlation is the study of the relationship between two or more variables. Whenever we conduct
an experiment we gather information on two or more related variables. When there are two related
variables their joint distribution is known as the bivariate normal distribution, and if there are
more than two variables their joint distribution is known as the multivariate normal
distribution.
In the case of a bivariate (or) multivariate normal distribution, we are interested in discovering
and measuring the magnitude and direction of the relationship between two (or) more variables;
for this we use the tool known as correlation.
Types of Correlation:
There are four important ways of classifying correlation, viz.:
When the variables move in the same direction, the variables are said to be positively
correlated (or) directly correlated, and if they move in opposite directions they are said to
be negatively correlated (or) indirectly (or) inversely correlated.
If the amount of change in one variable tends to bear a constant ratio to the amount of change
in the other variable, then the correlation is said to be linear; otherwise it is non-linear
(curvilinear).
Example: if rainfall is doubled, the production of rice would not necessarily be doubled, so
the relationship between rainfall and rice production is non-linear.
When both the variables are not normal, the linear correlation coefficient procedure is not
applicable and we have to use rank correlation. The two methods of computing rank
correlation are the one proposed by Spearman and the other by Kendall. Spearman's rank
correlation procedure starts with ranking the measurements of the values of X and Y
separately.
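A minimal sketch of Spearman's procedure with hypothetical scores; there are no tied values, so the shortcut formula ρ = 1 − 6Σd²/(n(n² − 1)) applies directly:

```python
# Spearman's rank correlation: rank X and Y separately, then compare ranks.
x = [86, 68, 72, 91, 55]   # hypothetical scores from judge 1
y = [84, 71, 60, 89, 58]   # hypothetical scores from judge 2

def ranks(values):
    # rank 1 = smallest value; this simple helper assumes no ties
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

rx, ry = ranks(x), ranks(y)
n = len(x)
d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences

rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))
print(rho)   # 0.9 -> strong positive rank correlation
```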
Scatter Diagram:
To investigate whether there is any relation between the variables X and Y we use a scatter
diagram. Let (x1, y1), (x2, y2), …, (xn, yn) be 'n' pairs of observations. If the variables X
and Y are plotted along the X axis and Y axis respectively in the X-Y plane of a graph sheet,
the resultant diagram of dots is known as a scatter diagram. From the scatter diagram we can
say whether there is any correlation between X and Y, whether it is positive (or) negative,
and whether the correlation is linear (or) curvilinear.
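The idea can be sketched even as a text grid, with each (x, y) pair becoming one dot in the X-Y plane; the data pairs below are hypothetical, and in practice a plotting tool would be used:

```python
# Text-mode scatter diagram: each (x, y) pair becomes a '*' in the X-Y plane.
pairs = [(1, 2), (2, 3), (3, 3), (4, 5), (5, 6)]   # hypothetical observations

max_x = max(x for x, _ in pairs)
max_y = max(y for _, y in pairs)

rows = []
for y in range(max_y, 0, -1):   # highest y value printed first
    rows.append("".join("*" if (x, y) in pairs else "." for x in range(1, max_x + 1)))

for row in rows:
    print(row)
# the dots rise to the right: a positive, roughly linear correlation
```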
Karl Pearson's correlation coefficient assumes that:
1. The variables under study are continuous random variables and they are normally
distributed.
The index of the degree of relationship between two continuous variables is
known as the correlation coefficient. The correlation coefficient is symbolized as 'r' in the
case of a sample and as 'ρ' in the case of a population. The correlation coefficient 'r' is known
as Karl Pearson's correlation coefficient. It is often referred to as the product moment correlation.
Properties:
The correlation coefficient ranges between –1 and +1, i.e. −1 ≤ r ≤ +1.
The correlation coefficient is not affected by a change of origin (or) scale (or) both.
If r > 0 it denotes positive correlation, and if r < 0 it denotes negative correlation
between the two variables X and Y. If r = 0, the two variables are not linearly
correlated. If X and Y are independent, then r = 0, i.e. no correlation. If r = +1 the
correlation is perfect positive, and if r = –1 the correlation is perfect negative.
The correlation coefficient between X and Y is the same as that between Y and X, i.e.
rxy = ryx = r (r is symmetric).
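A minimal computation of r from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²), on hypothetical data:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]      # hypothetical paired observations
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of co-deviations
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)   # Karl Pearson's product-moment correlation
print(round(r, 4))          # 0.7746 -> fairly strong positive correlation
```

Swapping x and y leaves r unchanged, which is the symmetry property rxy = ryx noted above.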
Multiple Correlation:
The terms multiple correlation and partial correlation refer to the theory of correlation
involving more than two variables. Multiple correlation is used to find the degree of
relationship among three or more variables. Let 'x1' be the dependent variable and x2, x3 be
the independent variables.
Regression Analysis:
In the correlation section we saw how to measure the relationship between two related
variables. Correlation analysis serves as a technique to estimate the degree of association
between two random variables. But in many situations one may be interested in predicting
the value, or the expected value, of one variable when the value of the other variable is
known. In such cases we use the principle of regression.
In simple regression analysis only two variables are considered, where one may
represent cause and the other may represent effect. The variable representing cause is known
as the independent variable and is denoted by 'X'. The variable 'X' is also known as the
predictor variable (or) regressor. The variable representing effect is known as the dependent
variable and is denoted by 'Y'. 'Y' is also known as the predicted variable or response.
Example:
If we know the past history of the age and height of plants, we may be interested in
predicting the height of a plant of a given age. In this example, height is the dependent
variable and age is the independent variable.
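The plant example can be sketched as a least-squares fit of Y on X; the numbers are invented for illustration, and the slope and intercept come from the usual formulas b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and a = ȳ − b·x̄:

```python
age    = [1, 2, 3, 4, 5]    # X: independent / predictor variable (hypothetical weeks)
height = [3, 5, 6, 8, 9]    # Y: dependent / response variable (hypothetical cm)

n = len(age)
mx, my = sum(age) / n, sum(height) / n

# least-squares slope and intercept of the line Y = a + b * X
b = sum((x - mx) * (y - my) for x, y in zip(age, height)) / \
    sum((x - mx) ** 2 for x in age)
a = my - b * mx

def predict(x):
    return a + b * x

print(round(b, 2), round(a, 2))   # 1.5 1.7
print(round(predict(6), 2))       # 10.7 -> predicted height at age 6
```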
Uses of Regression:
Regression analysis is useful in predicting the value of one variable from a given
value of another variable. Such predictions are useful when it is very difficult or expensive to
measure the dependent variable, Y. The other use of regression analysis is to find out the
causal relationship between variables. Suppose we manipulate the variable X and obtain a
significant regression of the variable Y on the variable X. Thus we can say that there is a
causal relationship between X and Y.
Variance and the F-Distribution:
Variance is the square of the standard deviation. For us humans, standard deviations are
easier to understand than variances because they're in the same units as the data rather than
squared units. However, many analyses actually use variances in the calculations.
F-statistics are based on the ratio of mean squares. The term "mean squares" may sound
confusing, but it is simply an estimate of population variance that accounts for the degrees of
freedom (DF) used to calculate that estimate.
A more important use of the F-distribution is in analyzing variance to see if three or more
samples come from populations with equal means. This is an important statistical test, not so
much because it is frequently used, but because it is a bridge between univariate statistics and
multivariate statistics and because the strategy it uses is one that is used in many multivariate
tests and procedures.
This is also the beginning of multivariate statistics. Notice that in the one-way ANOVA, each
observation is for two variables: the x variable and the group of which the observation is a
part. In later chapters, observations will have two, three, or more variables.
The F-test for equality of variances is sometimes used before using the t-test for equality of
means because the t-test, at least in the form presented in this text, requires that the samples
come from populations with equal variances.
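The one-way ANOVA described above can be sketched on three hypothetical samples, building F as the ratio of the between-group mean square to the within-group mean square:

```python
groups = [
    [4, 5, 6],      # three hypothetical samples; H0: all population means are equal
    [6, 7, 8],
    [9, 10, 11],
]

k = len(groups)                        # number of groups
n = sum(len(g) for g in groups)        # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# between-group and within-group sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)      # mean square = SS / its degrees of freedom
ms_within = ss_within / (n - k)

F = ms_between / ms_within
print(round(F, 2))    # 19.0 -> compare against the F table with (2, 6) d.f.
```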
MANOVA is used under the same circumstances as ANOVA but when there are multiple
dependent variables as well as independent variables within the model which the researcher
wishes to test. MANOVA is also considered a valid alternative to the repeated measures
ANOVA when sphericity is violated.
There need to be more participants than dependent variables. If there were only one
participant in any one combination of conditions, it would be impossible to determine
the amount of variance within that combination (since only one data point would be
available). Furthermore, the statistical power of any test is limited by a small sample size: a
greater amount of variance will be attributed to error in smaller samples, reducing the
power of the test.
Cross Tabulation
Contingency Table: When individuals in the same problem have two characters and a frequency
distribution is to be made by classifying them on the basis of both characters so as to
show the relation between the characters, the resulting table is called a contingency table;
for example, the height and weight of plants.
Cross tabulation is usually performed on categorical data — data that can be divided into
mutually exclusive groups.
An example of categorical data is the region of sales for a product. Typically, region can be
divided into categories such as geographic area (North, South, Northeast, West, etc.) or state
(Andhra Pradesh, Rajasthan, Bihar, etc.). The important thing to remember about categorical
data is that a categorical data point cannot belong to more than one category.
Cross tabulations are used to examine relationships within data that may not be readily
apparent. Cross tabulation is especially useful for studying market research or survey
responses. Cross tabulation of categorical data can be done through tools such as SPSS,
SAS, and Microsoft Excel.
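A minimal cross tabulation can also be built with the standard library alone; the survey responses below are hypothetical, and tools such as SPSS or SAS automate the same counting:

```python
from collections import Counter

# hypothetical survey responses: (region of sale, preferred product)
responses = [
    ("North", "A"), ("North", "B"), ("South", "A"),
    ("South", "A"), ("North", "A"), ("South", "B"),
]

table = Counter(responses)                   # cell counts of the contingency table
regions = sorted({r for r, _ in responses})
products = sorted({p for _, p in responses})

# print the table: one row per region, one column per product
print("       " + "  ".join(products))
for region in regions:
    counts = [table[(region, p)] for p in products]
    print(f"{region:6s} " + "  ".join(str(c) for c in counts))
```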
Discriminant analysis
Discriminant analysis is a technique that is used by the researcher to analyze the research data
when the criterion or the dependent variable is categorical and the predictor or the
independent variable is interval in nature. The term categorical variable means that the
dependent variable is divided into a number of categories. For example, three brands of
computers, Computer A, Computer B and Computer C can be the categorical dependent
variable. The objective of discriminant analysis is to develop discriminant functions that are
nothing but the linear combination of independent variables that will discriminate between
the categories of the dependent variable in a perfect manner. It enables the researcher to
examine whether significant differences exist among the groups, in terms of the predictor
variables. It also evaluates the accuracy of the classification. Discriminant analysis is
characterized by the number of categories possessed by the dependent variable. It can be
understood as a statistical method that analyses whether the classification of the data is
adequate with respect to the research data; researchers implement it when the dependent
variable is categorical and the predictor variables are interval in nature.
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense) to each other than to
those in other groups (clusters). In statistics, cluster analysis is a set of tools and algorithms
used to classify different objects into groups in such a way that the similarity between two
objects is maximal if they belong to the same group and minimal otherwise.
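As a sketch of the idea, the classic k-means procedure alternates two steps: assign each object to its nearest cluster centre, then move each centre to the mean of its cluster. The one-dimensional data and starting centres below are invented for illustration:

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]   # hypothetical measurements
centres = [1.0, 9.0]                       # hand-picked initial cluster centres

for _ in range(10):                        # a few iterations converge here
    # step 1: assign each point to the nearest centre
    clusters = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
        clusters[nearest].append(p)
    # step 2: move each centre to the mean of its cluster
    # (this sketch assumes no cluster ever becomes empty)
    centres = [sum(c) / len(c) for c in clusters]

print(centres)    # [1.5, 8.5] -> two well-separated clusters
```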
Factor analysis
Factor analysis is a technique that is used to reduce a large number of variables into a smaller
number of factors. This technique extracts the maximum common variance from all variables
and puts it into a common score. As an index of all the variables, we can use this score for
further analysis. Factor analysis is part of the general linear model (GLM) and rests on
several assumptions: there is a linear relationship, there is no multicollinearity, the analysis
includes the relevant variables, and there is a true correlation between the variables and the
factors. Several methods are available, but principal component analysis is used most
commonly.
Exploratory factor analysis: Assumes that any indicator or variable may be associated with
any factor. This is the most common form of factor analysis used by researchers, and it is not
based on any prior theory.
Confirmatory factor analysis (CFA): Used to determine the factors and factor loadings of
measured variables, and to confirm what is expected on the basis of pre-established theory.
CFA assumes that each factor is associated with a specified subset of measured variables. It
commonly uses two approaches:
Conjoint analysis
Conjoint analysis is a popular method of product and pricing research that uncovers
consumers' preferences and uses that information to help select product features, assess
sensitivity to price, forecast market shares, and predict adoption of new products or services.
Conjoint analysis is frequently used across different industries for all types of products, such
as consumer goods, electrical goods, life insurance plans, retirement housing, luxury goods,
and air travel. It is applicable in various instances that centre around discovering what type of
product consumers are likely to buy and what consumers value the most (and least) about a
product. As such, it is commonplace in marketing, advertising, and product management.
Businesses of all sizes can benefit from conjoint analysis, including even local grocery stores
and restaurants, and its scope is not limited to profit motives; for example, charities
can use conjoint analysis techniques to find out donor preferences.
Conjoint analysis works by breaking a product or service down into its components (referred
to as attributes and levels) and then testing different combinations of these components
to identify consumer preferences. For example, consider a conjoint study on smartphones.
The smartphone is sorted into four attributes, which are further broken down into different
variations to create levels.
Conjoint analysis can take various forms. Some of the most common include:
Choice-Based Conjoint (CBC) Analysis: This is one of the most common forms of
conjoint analysis and is used to identify how a respondent values combinations of
features.
Full-Profile Conjoint Analysis: This form of analysis presents the respondent with a
series of full product descriptions and asks them to select the one they'd be most
inclined to buy.
MaxDiff Conjoint Analysis: This form of analysis presents multiple options to the
respondent, which they're asked to organize on a scale of "best" to "worst" (or "most
likely to buy" to "least likely to buy").