
KENYATTA UNIVERSITY

SCHOOL OF AGRICULTURE AND ENTERPRISE DEVELOPMENT

DEPARTMENT OF AGRICULTURAL ECONOMICS,

STATISTICS FOR AGRICULTURE

WRITTEN BY:

VETTED BY:

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

INTRODUCTION

Welcome to this module. Statistics is not a new term, and I believe that in your day-to-day life you
interact with it in one way or another because it is applicable in almost all human endeavors.
Statistics is a familiar and accepted part of the modern world that is concerned with obtaining
insight into the real world through the analysis of numerical relationships.

Statistics for Agriculture is a compulsory first-semester course for all students in the School of
Agriculture. It entails the analysis of numerical relationships, with emphasis on the meaning of
statistics, the collection of quantitative information, methods of handling such data and drawing
inferences on the basis of observations. We will also discuss measures of central tendency and
dispersion; probability (permutations, combinations, frequency distributions, independent events
and conditional probability distributions); random variables; samples and sampling distributions;
sampling theory and the estimation of population means and variances; confidence intervals;
analysis of variance and covariance; experimental designs; correlation and linear regression;
and forecasting and the presentation of qualitative and quantitative data.

WHAT YOU WILL LEARN IN THIS COURSE

This is an interactive instructional module that uses both action and collaborative learning styles
to provide you with diverse online learning experiences and effective learning processes. The
course is divided into lessons. The course guide tells you briefly what the course is about. It is
a general overview of the course materials you will be using and how to use those materials. It
also helps you to allocate the appropriate time to each lesson so that you can successfully
complete the course within the stipulated time. This module will enable you to learn more about
data collection, management and analysis; this knowledge will be helpful when you collect and
analyze data for your project. It is indeed a fascinating field of agriculture.

STATISTICS FOR AGRICULTURE FLOW CHART
LESSONS TOPIC

LESSON 1: Definition and concepts of statistics

LESSON 2: Measures of central tendency

LESSON 3: Measures of dispersion

LESSON 4: Random variable and probability

LESSON 5: Statistical inferences and hypothesis testing

LESSON 6: Sample and sampling distribution

LESSON 7: Introduction to experimental design

LESSON 8: Analysis of variance and covariance

LESSON 9: Correlation and linear regression

LESSON 10: Data presentation

OVERVIEW OF THE COURSE

LESSON 1: Definition and concepts of statistics

The aim of this lesson is to enable students to understand the meaning, definition,
nature, importance and limitations of statistics.

LESSON 2: Measures of central tendency


The measures of central tendency covered in this lesson are the mean, the mode and the median.
The mean is equal to the sum of all the values in the data set divided by the number of values
in the data set, while the mode is the most frequent score in a data set. The median is the
middle score for a set of data that has been arranged in order of magnitude.

LESSON 3: Measures of dispersion


In most cases, measures of central tendency alone do not adequately describe a data set,
so it is useful to also compute measures of dispersion, which include the standard deviation,
variance, range and interquartile range. The variance measures how far the values in a
population or sample are spread out around their mean; it is based on the squared deviations
of the values from the mean. The standard deviation is the square root of the variance and
expresses the same spread in the original units of measurement. The range is the difference
between the largest and the smallest observation in the data. The interquartile range is the
difference between the 75th and 25th percentiles (also called the third and first quartiles).
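To see how these four measures are computed in practice, here is a minimal Python sketch using the standard-library `statistics` module (Python 3.8 or later). It reuses the eleven scores from the median example in Lesson 2. Note that `statistics.variance` and `statistics.stdev` divide by n - 1 (the sample versions), while `statistics.pvariance` divides by N, and that textbooks and software use slightly different rules for quartiles, so a hand-calculated IQR may differ a little from this one.

```python
import statistics

# Scores from the median example in Lesson 2 (11 observations).
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

data_range = max(scores) - min(scores)       # largest minus smallest value
sample_var = statistics.variance(scores)     # sample variance, divides by n - 1
pop_var = statistics.pvariance(scores)       # population variance, divides by N
sample_sd = statistics.stdev(scores)         # square root of the sample variance

# Quartiles: statistics.quantiles uses its default "exclusive" method;
# other quartile conventions can give slightly different values.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(f"range = {data_range}")
print(f"sample variance = {sample_var:.2f}, population variance = {pop_var:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For these scores the range is 78, the sample variance is 565.6 and, with the default quartile method, Q1 = 45 and Q3 = 87, giving an interquartile range of 42.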

LESSON 4: Random variable and probability


In this lesson we will cover the different probability distributions for both discrete and
continuous random variables. The skills acquired here will be useful to you when analyzing data,
especially when you want to test hypotheses about a population parameter or a sample statistic.
The t distribution is used, for example, when testing hypotheses about the mean of an
approximately normal (bell-shaped) population whose standard deviation has to be estimated from
the sample. The chi-square distribution, on the other hand, is the probability distribution of
the sum of squares of several independent standard normal variables. It tends to be used to
(1) test hypotheses about categorical data, and (2) test the fit of models to the observed data.
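As an illustration of the second use, the sketch below computes Pearson's chi-square goodness-of-fit statistic, the sum of (observed - expected)^2 / expected over all categories. The counts and the 3:1 ratio are hypothetical, and the critical value 3.841 is the tabulated 5% point for 1 degree of freedom only; a different number of categories needs a different table value.

```python
# Hypothetical counts from a cross expected to segregate in a 3:1 ratio.
observed = [290, 110]
ratio = [3, 1]

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]   # expected counts under H0

# Pearson's statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_5PCT_DF1 = 3.841  # table value for alpha = 0.05 and df = 1 only
print(f"chi-square = {chi_square:.3f} on {df} degree(s) of freedom")
if chi_square > CRITICAL_5PCT_DF1:
    print("Reject the hypothesised 3:1 ratio at the 5% level")
else:
    print("No evidence against the 3:1 ratio at the 5% level")
```

Here the statistic is about 1.33, below 3.841, so these made-up counts give no evidence against the 3:1 ratio.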

LESSON 5: Statistical inferences and hypothesis testing

This topic addresses statistical inference, an important concept in inferential statistics.
Inferential statistics is concerned with making predictions or inferences about a population
from observations and analyses of a sample. When making such an inference, it is important to
set up and test hypotheses.
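A common example of setting up and testing a hypothesis is the one-sample t test. The minimal Python sketch below, assuming hypothetical maize yields from eight plots and a hypothesised population mean of 5.0 t/ha, computes t = (x̄ - μ0) / (s / √n) and compares it with the two-tailed 5% table value for 7 degrees of freedom (2.365).

```python
import math
import statistics

# Hypothetical maize yields (t/ha) from 8 plots; H0: population mean = 5.0
yields = [5.2, 4.8, 5.6, 5.9, 5.1, 5.4, 5.7, 5.3]
mu_0 = 5.0

n = len(yields)
x_bar = statistics.mean(yields)
s = statistics.stdev(yields)             # sample standard deviation (n - 1)
standard_error = s / math.sqrt(n)        # estimated standard error of the mean
t_stat = (x_bar - mu_0) / standard_error
df = n - 1

T_CRITICAL = 2.365  # two-tailed 5% value from a t table for df = 7
print(f"mean = {x_bar:.3f}, t = {t_stat:.3f}, df = {df}")
print("Reject H0" if abs(t_stat) > T_CRITICAL else "Fail to reject H0")
```

With these made-up yields the sample mean is 5.375 and t = 3.0, which exceeds 2.365, so H0 would be rejected at the 5% level.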

LESSON 6: Sample and sampling distribution

This topic addresses sampling distributions, another important concept for inferential
statistics. As introduced in Lesson 5, inferential statistics is concerned with making predictions
or inferences about a population from observations and analyses of a sample.
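The idea of a sampling distribution can be made concrete by simulation. The sketch below, assuming a hypothetical population of 10,000 cattle weights generated with Python's `random` module, draws many samples of size 25 and records each sample mean. The mean of those sample means should be close to the population mean μ, and their standard deviation close to σ/√n, the standard deviation of the sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: body weights (kg) of 10,000 cattle.
population = [rng.gauss(350, 40) for _ in range(10_000)]
mu = statistics.fmean(population)        # population mean
sigma = statistics.pstdev(population)    # population standard deviation

n = 25            # size of each sample
repeats = 2_000   # number of samples drawn

# Sampling distribution of the mean: one mean per random sample.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

print(f"population mean          = {mu:.2f}")
print(f"mean of the sample means = {statistics.fmean(sample_means):.2f}")
print(f"sigma / sqrt(n)          = {sigma / n ** 0.5:.2f}")
print(f"SD of the sample means   = {statistics.pstdev(sample_means):.2f}")
```

Changing `n` shows that larger samples give a narrower sampling distribution. The exact figures depend on the seed, so none are quoted here.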

LESSON 7: Introduction to experimental design

In this lesson you will learn how to design experiments. The techniques will help you to
successfully apply the concept of experimental design in diverse fields of agriculture. In the
lesson, you will learn that when designing and implementing experiments, it is important to
keep these principles in mind: randomization, replication, blocking, orthogonality and
factorial experimentation.
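To show how randomization, replication and blocking fit together, here is a sketch of one common layout, the randomized complete block design. The treatment names, the number of blocks and the seed are hypothetical; each block receives every treatment once, in an independently shuffled order.

```python
import random

# Hypothetical trial: four fertilizer treatments laid out in three blocks.
treatments = ["Control", "NPK", "Manure", "CAN"]
n_blocks = 3
rng = random.Random(2024)  # seeded only so that the layout can be reproduced

# Randomized complete block design: every treatment appears once in each
# block (replication and blocking), and the order of plots within each
# block is shuffled independently (randomization).
layout = {}
for block in range(1, n_blocks + 1):
    order = list(treatments)   # copy, so the master list stays unchanged
    rng.shuffle(order)
    layout[block] = order

for block, order in layout.items():
    print(f"Block {block}: " + " | ".join(order))
```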

LESSON 8: Analysis of variance and covariance

Analysis of Variance (ANOVA) is a statistical method used to test differences between two
or more means. Analysis of Covariance (ANCOVA) evaluates whether the population means of a
dependent variable (DV) are equal across levels of a categorical independent variable (IV),
while statistically controlling for the effects of other continuous variables that are not of
primary interest, known as covariates (CV). ANOVA and ANCOVA are important tools that will help
you to check whether two or more groups differ in terms of a particular variable of interest.
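The core of a one-way ANOVA is the comparison of between-group and within-group variation. This minimal Python sketch, using hypothetical yields for three varieties with four plots each, computes the sums of squares, mean squares and F ratio by hand. ANCOVA, which also adjusts for covariates, is not shown.

```python
# Hypothetical yields (t/ha) of three maize varieties, 4 plots each.
groups = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [6.0, 7.0, 8.0, 7.0],
    "C": [8.0, 9.0, 10.0, 9.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

# Between-group (treatment) and within-group (error) sums of squares.
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())
ss_within = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)

df_between = len(groups) - 1                 # k - 1
df_within = len(all_values) - len(groups)    # N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SSB = {ss_between}, SSW = {ss_within}, F({df_between}, {df_within}) = {f_stat:.2f}")
```

For these made-up data SSB = 32, SSW = 6 and F(2, 9) = 24, far above the 5% table value of about 4.26, so the variety means would be judged different.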

LESSON 9: Correlation and linear regression

This lesson introduces you to correlation and simple linear regression. These methods can be used
to examine whether a linear relationship exists between two variables, provided certain
assumptions about the data are met. The results need to be interpreted with care,
particularly when looking for a causal relationship or when using the regression equation for
prediction.
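A minimal sketch of both ideas in Python, assuming hypothetical data on nitrogen rate and maize yield: Pearson's r is S_xy / √(S_xx · S_yy), and the least-squares line has slope b = S_xy / S_xx and intercept a = ȳ - b·x̄.

```python
import math

# Hypothetical data: nitrogen rate (kg/ha) and maize yield (t/ha).
x = [0, 20, 40, 60, 80]
y = [2.1, 2.4, 3.2, 3.4, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson correlation coefficient
b = s_xy / s_xx                     # least-squares slope
a = y_bar - b * x_bar               # least-squares intercept

print(f"r = {r:.3f}; fitted line: yield = {a:.2f} + {b:.4f} * N rate")
```

Here b = 0.025 and a = 2.04, with r of about 0.986. In Python 3.10 and later, `statistics.correlation` and `statistics.linear_regression` compute the same quantities. A high r on its own does not show that nitrogen causes the yield increase.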

LESSON 10: Data presentation

In this lesson we will learn how to present results. The techniques will help you to
successfully present data in diverse fields of agriculture. Recall that when designing and
implementing experiments, it is important to keep these principles in mind: randomization,
replication, blocking, orthogonality and factorial experimentation.

MODULE LEARNING OUTCOMES

By the end of this module, you will be able to:

1 Calculate measures of central tendency and dispersion (e.g., mean and standard
deviation) and use these measures to understand important features of a dataset.

2 Compute and interpret probabilities for both discrete and continuous random variables.

3 Apply the concept of a sampling distribution and calculate the mean and standard
deviation of the sampling distribution of the mean.

4 Test hypotheses about means and proportions for different populations.

5 Construct confidence intervals for population means and proportions.

6 Estimate a linear relationship between two variables and use it to predict the trend.

7 Quantitatively measure the relationship between two variables and test a hypothesis
about that relationship.

COURSE DESCRIPTION
As a student, or in your professional career, you will encounter situations that will
demand that you conduct some research, which is concerned with answering an
interesting question. The process begins with an observation that you want to
understand, and this observation could be anecdotal or could be based on some data.
From your initial observation you generate explanations, or theories, of those
observations, from which you can make predictions (hypotheses). Here’s where the
data comes into the process because to test your predictions you need data. First you
collect some relevant data (and to do that you need to identify aspects that can be
measured, called variables) and then you analyze those data. The analysis of the data may
support your theory or give you cause to modify the theory. As such, the processes of
data collection and analysis and generating theories are intrinsically linked: theories
lead to data collection/analysis and data collection/analysis informs theories.

COURSE REQUIREMENTS

This is a blended learning course that uses the flex model. This means that learning
materials and instructions will be given online and the lessons will be self-guided, with the
lecturer available for brief face-to-face sessions and support, and available online
most of the time. Your lecturer will meet you face to face to introduce a lesson and put
it into perspective, and you will actively participate in your search for knowledge by
undertaking several online activities. This means that some of the 39 instructional hours of
the course will be delivered face to face while other lessons will be taught online through
various learner and lecturer activities. It is important for you to note that one instructional
hour is equivalent to two online hours. Three instructional hours will be needed per week.
Of these, one will be used for face-to-face contact with your lecturer (also referred to as the
e-moderator in the online activities), while the other two instructional hours (translating to
four online hours) will be used for online activities, referred to as e-tivities in the
lessons. This adds up to the 5-hour requirement per lesson. There are 27 online activities,
each taking at least two hours, totaling 54 online hours. You are advised to follow the topic
flow chart given so that you cover at least one lesson every week.

You will be required to participate and interact online with your peers and the
e-moderator, who in this case is your lecturer. Guidelines for the online activities (which we
shall keep referring to as e-tivities) will be provided whenever there is an e-tivity. Please
note that since the online e-tivities are part of the learning process, they may be graded at
the discretion of your e-moderator. Such grading will, however, be communicated in the
e-tivity guidelines, and feedback will be given as soon as possible after the e-tivity. The
e-tivities will include, but will not be limited to, online assessment quizzes, assignments and
discussions. There are also assessment questions that you can attempt at the end of every
lesson to test your understanding of the lesson. The answers to all the assessment
questions are at the end of the module after Lesson 10. All the books used as resources in
this module are available under the resources section after the answers to the questions.

ASSESSMENT

It is important to note that the module has embedded certain learner formative assessment
feedback tools that will enable you to gauge your own learning progress. The tools include
online collaborative discussion forums that focus on team learning and personal mastery and
will therefore provide you with peer feedback, lecturer assessment and self-reflection. You
will also be required to do one major assignment/project that is meant to assess the
application of the skills and knowledge gained during the course. The project score, in
combination with the scores for e-tivities (where graded), will account for 30% of your final
score, with the remaining 70% coming from a face-to-face, sit-in final written examination
that will be guided by your university examination policy and procedures. The final and
mid-term examinations will be closed book (no notes allowed), while assignments will be open
book. The unit lecturer will grade the final exam and determine your grade at the end of the
course. Collaboration is allowed in completing the assignments, and you are encouraged to
learn from each other. Late assignments will not be accepted unless extreme circumstances can
be demonstrated. Grading will follow the University criteria listed below:
i. 70-100 = A
ii. 60-69 = B
iii. 50-59 = C
iv. 40-49 = D
v. <40 = F

We wish you the very best experience in this course.

TABLE OF CONTENTS

OVERVIEW OF THE COURSE
MODULE LEARNING OUTCOMES
COURSE DESCRIPTION
COURSE REQUIREMENTS
ASSESSMENT
LESSON 1: DEFINITIONS AND CONCEPTS IN STATISTICS
1.1 Introduction
1.2 Lesson Learning Outcomes
1.2.1 Definition of Population and Sample as a basic concept of statistics
1.2.2 Sample and sampling
1.2.3 Types of variables
1.3 Assessment Questions
1.4 References
LESSON 2: MEASURES OF CENTRAL TENDENCY
2.1 Introduction
2.2 Lesson Learning Outcomes
2.2.1 THE MEAN
2.2.2 THE MODE
2.2.3 THE MEDIAN
2.3 Assessment Questions
2.4 References
LESSON 3: MEASURES OF DISPERSION
3.1 Introduction
3.2 Lesson learning outcomes
3.3 Assessment Questions
3.4 References
LESSON 4: RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
4.1 Introduction
4.2 Learning Outcomes
4.2.3 Probability Distribution of a Continuous Random Variable
4.2.4 Normal Probability Distribution
4.3 Assessment Questions
4.4 References
LESSON 5: STATISTICAL INFERENCE AND HYPOTHESIS TESTING
5.1 Introduction
5.2 Learning Outcomes
5.2.1 Hypothesis testing
5.2.2 One-tail versus two-tail test
5.2.3 Chi-square testing
5.2.4 Goodness-of-fit
5.3 Assessment Questions
5.4 References
LESSON 6: SAMPLES AND SAMPLING DISTRIBUTION
6.1 Introduction
6.2 Learning outcomes
6.2.1 Inferential statistics
6.2.2 Sampling distribution
6.2.3 Construct confidence interval
6.2.3 E-tivity Constructing confidence intervals
6.3 Assessment Questions
6.4 References
LESSON 7: EXPERIMENTAL DESIGN
7.1 Introduction
7.2 Learning Outcomes
7.2.1 Principles of experimental design
7.2.2 Implementing experimental design
7.3 Assessment Questions
7.4 References
LESSON 8: ANALYSIS OF VARIANCE AND COVARIANCE
8.1 Introduction
8.2 Lesson Learning Outcomes
8.2.1 Analysis of Variance (ANOVA)
8.2.2 Analysis of Variance ANOVA
8.2.3 Analysis of Covariance ANCOVA
8.3 Assessment Questions
8.4 References
LESSON 9: CORRELATION AND REGRESSION ANALYSIS
9.1 Introduction
9.2 Learning Outcomes
9.2.1 Correlation
9.2.2 Regression
9.3 Assessment Questions
9.4 References
LESSON 10: DATA PRESENTATION
10.1 Introduction
10.2 Learning Outcomes
10.2.1 Presenting Data in Tables
10.2.2 Presenting data in charts and graphs
10.3 Assessment Questions
10.4 References

LESSON 1

DEFINITIONS AND CONCEPTS IN STATISTICS

1.1 Introduction

In this first lesson, we lay the foundation for the entire course by defining the concept of
statistics and other terms used in statistics. Throughout our teaching experience, we have
found that an understanding of the basic principles behind the subject and their applications
increases students' motivation for the subject. Many students view statistics as no different
from mathematics. In this lesson you will be introduced to the basic concepts of statistics.
Two basic and commonly used concepts of statistics are population and sample. You will
appreciate the difference between a population and a sample, and you will also learn the terms
associated with each.

1.2 Lesson Learning Outcomes


By the end of this lesson, you will be able to:
1.2.1 Define population and sample as basic concepts of statistics
1.2.2 Define a sample and sampling
1.2.3 Discuss discrete and continuous variables
1.2.4 Discuss the different sampling methods and understand the purpose and importance of
sampling and the advantages it makes possible

1.2.1 Definition of Population and Sample as a basic concept of statistics


1.2.1.1 Population

Definition: A population is the collection of all individuals or items under consideration in a
statistical study (Weiss, 1999). It is the whole set of values or individuals you are interested
in, and may also be defined as the set of entities under study. An example is the weight of
cattle on a Githunguri farm. The cattle population includes all bulls and cows currently alive,
those that lived and are now dead, and those that will live in the future. You will not be able
to measure the weights of the entire cattle population because many cattle are not yet born
while many are already dead and unreachable. Even when it is possible to reach all of them, it
is often too costly in terms of the money and time involved. In this example you are interested
in the population of cattle, the variable of interest is body weight, and a parameter of
interest could be the mean body weight of the population.

1.2.1.2 Sample

A sample is the part of the population from which information is collected (Weiss, 1999).
Since you cannot reach all the members of the population to take measurements, you take a
subset of the population. This subset is called a sample. You then use this subset to draw
inferences about the population under study, given some conditions. In the cattle example, you
take a sample from the cattle population, measure the animals' weights and calculate the
average, or mean. A mean calculated from a sample is called a statistic. It is this statistic
that you use to draw an inference about the corresponding parameter of the population of
interest. Because of the uncertainty and inaccuracy involved in drawing conclusions about a
population based on a sample, you can only draw an inference about the population, not a
certain conclusion.

Note that your sample will always contain fewer members than the population, so you are bound
to lose some information about the population.

E-tivity 1.2.1 - Concept of Statistics

Numbering, pacing and sequencing: 1.2.1
Title: Definition of Population and Sample as a basic concept of statistics
Purpose: The purpose of this e-tivity is to enable you to understand the difference between
population and sample.
Brief summary of overall task: Read the documents at these links:
https://laulima.hawaii.edu/access/content/user/hallston/341website/1pops_samples.pdf
http://www.math.niu.edu/~richard/Math101/sp07/stats3_ho.pdf
http://www.ddegjust.ac.in/studymaterial/mcom/mc-106.pdf
Spark
Individual task:
(a) Using bullet points, outline the difference between sample and population.
(b) Discuss the different types of data.
Interaction begins:
a) Post your answers.
b) Provide positive and constructive feedback on your team learners' views and ideas. Do this
on discussion forum 1.2.1.
E-moderator interventions:
1 Ensure that learners are focused on the contents and context of discussion.
2 Stimulate further learning and generation of new ideas.
3 Provide feedback on the learning progress.
4 Round up the e-tivity.
Schedule and time: This task should take two hours.
Next: Define sample and sampling

1.2.2 Sample and sampling
Sampling is the technique of selecting individual members, or a subset, from a population in
order to make statistical inferences from them and, eventually, to estimate characteristics of
the whole population. You must ensure that your samples are randomly selected to avoid bias
when the statistic is used to estimate the parameter.
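A simple random sample can be drawn with Python's standard `random` module. The sketch below, assuming a hypothetical herd of 200 animals with made-up weights, draws 20 animals without replacement and compares the sample mean (a statistic) with the herd mean (the parameter, which in practice is usually unknown).

```python
import random
import statistics

rng = random.Random(7)  # seeded only so that the example is reproducible

# Hypothetical herd of 200 animals: animal ID -> body weight (kg).
herd = {f"COW-{i:03d}": round(rng.uniform(250, 450), 1) for i in range(1, 201)}

# Simple random sample of 20 animals drawn without replacement:
# every animal has the same chance of being selected.
sample_ids = rng.sample(sorted(herd), k=20)
sample_weights = [herd[animal] for animal in sample_ids]

parameter = statistics.mean(list(herd.values()))  # population mean (usually unknown)
statistic = statistics.mean(sample_weights)       # sample mean, used to estimate it

print(f"population mean = {parameter:.1f} kg")
print(f"sample mean     = {statistic:.1f} kg")
```

Re-running with a different seed gives a different sample and a slightly different sample mean; that variation is what sampling distributions (Lesson 6) describe.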
E-tivity 1.2.2 - Sample and sampling

Numbering, pacing and sequencing: 1.2.2
Title: Sample and sampling
Purpose: The purpose of this e-tivity is to enable you to understand what sampling is and how
it is done.
Brief summary of overall task: Watch this video:
https://www.youtube.com/watch?v=1owSExITFdM&pbjreload=101
Spark
Individual task:
a) Define sample and sampling.
b) Distinguish between probabilistic and non-probabilistic sampling.
c) Discuss the different sampling methods.
d) Save this work in your portfolio.
Interaction begins:
1. Post your work.
2. Provide positive and constructive feedback on your team learners' views and ideas. Do this
on discussion forum 1.2.2.
E-moderator interventions:
1. Ensure that learners are focused on the contents and context of discussion.
2. Stimulate further learning and generation of new ideas.
3. Provide feedback on the learning progress.
4. Round up the e-tivity.
Schedule and time: This task should take two hours.
Next: Types of variables

1.2.3 Types of variables

A variable is any characteristic or trait that varies, or changes, from one individual to
another or from one object to another in a collection. If you conduct a study or an
experiment, several variables are involved. You may be interested in the sex of an animal, the
parity of an animal, the number of piglets in a sow's litter or the diameter of a sugarcane
stem. Such variables may be categories (such as sex), counts (such as the number of piglets)
or measurements (such as stem diameter).

E-tivity 1.2.3 - Types of variables

Numbering, pacing and sequencing: 1.2.3
Title: Types of Variables
Purpose: The purpose of this e-tivity is to enable you to distinguish between the different
types of variables.
Brief summary of overall task: Read this document:
https://laulima.hawaii.edu/access/content/user/hallston/341website/2typesofvariables.pdf
Spark
Individual task: Distinguish between the different types of variables and post your answer in
discussion forum 1.2.3.
Interaction begins:
a) Post your analysis in discussion forum 1.2.3.
b) Provide positive and constructive feedback on your team learners' views and ideas. Do this
on discussion forum 1.2.3.
E-moderator interventions:
1. Ensure that learners are focused on the contents and context of discussion.
2. Stimulate further learning and generation of new ideas.
3. Provide feedback on the learning progress.
4. Round up the e-tivity.
Schedule and time: This task should take two hours.
Next: Measures of central tendency

1.3 Assessment Questions

1. Define population in your own words.

2. Define sample, also in your own words.

3. What is the relationship between population and sample?

4. Define variable. Can you differentiate between a discrete and a continuous variable?

5. Classify the following variables:

a) Weight of broiler chickens at 8 weeks

b) Sex of day-old pullet chicks

c) Polledness in cattle (whether a cow or bull has or does not have horns)

d) Number of eggs produced by chickens

e) Body length of goats

f) Parity of a cow, goat or sheep

g) Litter size

h) Litter weight

i) Staff strength on the poultry farm

j) Milk yield

1.4 References
Agarwal, B. L. (2009). Basic Mathematics (5th ed.). Delhi: New Age International (P)
Limited Publishers.
Gupta, S., & Kapoor, V. K. (1980). Fundamentals of Mathematical Statistics (7th ed.).
Delhi: Sultan Chand & Sons.
Beierlein, J., Schneeberger, K., & Osburn, D. (2008). Principles of Agribusiness
Management (3rd ed.). Waveland Press Inc.
Anderson, T., & Sclove, S. (1978). An Introduction to the Statistical Analysis of Data.
Boston: Houghton Mifflin.

LESSON 2: MEASURES OF CENTRAL TENDENCY
2.1 Introduction
In this lesson you will learn about measures of central tendency and dispersion, which are very
important in statistics. A measure of central tendency can be defined as a single value that
describes the central position in a given set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics.
The mean (often called the average) is the most commonly used measure of central tendency,
but there are others, such as the median and the mode. The mean, median and mode are all
valid measures of central tendency.
However, measures of central tendency alone are not adequate to describe data. For example,
two sets of data can have the same mean and yet differ considerably in how their values are
spread. Thus knowing the extent of variability is important when describing data. Measures of
dispersion include the range, interquartile range and standard deviation.

2.2 Lesson Learning Outcomes


By the end of this lesson, you will be able to:

2.2.1 Compute the mean
2.2.2 Compute the mode
2.2.3 Compute the median

2.2.1 THE MEAN
The mean (or average) is the most popular and well-known measure of central tendency. It
can be used with both discrete and continuous data, although it is most often used with
continuous data. The mean is equal to the sum of all the values in the data set divided by the
number of values in the data set. So, if we have $n$ values in a data set and they have values
$x_1, x_2, \ldots, x_n$, the sample mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The above formula refers to the sample mean. So, why have we called it a sample mean? This
is because, in statistics, samples and populations have very different meanings and these
differences are very important, even if, in the case of the mean, they are calculated in the
same way. To acknowledge that we are calculating the population mean and not the sample
mean, we use the Greek lower-case letter "mu", denoted as $\mu$:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where $N$ is the number of values in the population.
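The formula translates directly into code. A minimal sketch, assuming hypothetical daily milk yields for five cows, computes the sample mean by summing and dividing, and checks it against Python's `statistics.mean`.

```python
import statistics

# Hypothetical daily milk yields (litres) for a sample of 5 cows.
milk = [12.5, 15.0, 9.5, 18.0, 10.0]

n = len(milk)
x_bar = sum(milk) / n   # sum of the values divided by the number of values

print(f"sample mean by formula         = {x_bar}")                  # 13.0
print(f"sample mean by statistics.mean = {statistics.mean(milk)}")  # 13.0
```

The same code gives the population mean μ if `milk` holds every value in the population rather than a sample.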

E-tivity 2.2.1 - The mean

Numbering, pacing and sequencing: 2.2.1
Title: The mean
Purpose: To enable you to understand and be able to compute the mean
Brief summary of overall task: Read the linked document
http://www.cimt.org.uk/cmmss/S1/Text.pdf
and watch this video:
https://www.youtube.com/watch?v=zjHfAhcU6kE&pbjreload=101
Spark
Individual task:
a) Following the example in the video, answer the questions at the end of the topic.
b) Post your answers in this section to discussion forum 2.2.1.
Interaction begins:
a) Post your answers to the questions.
b) Provide positive descriptive comments on your team learners' answers with a view to
enhancing further thinking. Do this on discussion forum 2.2.1.
E-moderator interventions:
a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
Schedule and time: This activity should take two hours.
Next: Mode

2.2.2 THE MODE

The mode is the most frequently occurring score in a data set. It corresponds to the highest bar
in a bar chart or histogram. You can, therefore, sometimes consider the mode as being the most
prevalent option.

Figure 1: Description of the Mode

The mode has two major weaknesses:
1. It is not appropriate for continuous data, because with continuous measurements it is
unlikely that any two values are exactly the same.
2. The mode does not provide a very good measure of central tendency when the most
common value is far away from the rest of the data in the data set.
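Finding the mode amounts to counting how often each value occurs. This sketch, assuming hypothetical litter sizes, uses `collections.Counter` and the `statistics` module; `statistics.multimode` (Python 3.8 or later) returns every mode when a data set has more than one.

```python
import statistics
from collections import Counter

# Hypothetical litter sizes (piglets per sow) for 8 sows.
litters = [8, 10, 9, 10, 11, 10, 8, 12]

counts = Counter(litters)           # value -> how many times it occurs
print(counts.most_common(1))        # [(10, 3)]
print(statistics.mode(litters))     # 10

# A data set may have more than one mode; multimode returns all of them.
print(statistics.multimode([2, 3, 3, 4, 4]))   # [3, 4]
```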

E-tivity 2.2.2 - Mode

Numbering, pacing and sequencing: 2.2.2
Title: Mode
Purpose: To enable you to understand and be able to compute the mode
Brief summary of overall task: Read the document at the link below
http://www.cimt.org.uk/cmmss/S1/Text.pdf
and watch these videos:
https://www.youtube.com/watch?v=zjHfAhcU6kE&pbjreload=101
https://www.youtube.com/watch?v=V7o2C61IQjQ
After reading:
- Distinguish between the population mean and the sample mean.
Spark
Individual task:
- After reading, attempt the exercise found at the end of the topic.
- Save the answers in your portfolio.
Interaction begins:
a) Post your answers to the exercise.
b) Read your colleagues' comments and provide positive descriptive comments on their answers
with a view to enhancing further thinking. Do this on discussion forum 2.2.2.
E-moderator interventions:
a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
Schedule and time: This activity should take two hours.
Next: Median

2.2.3 THE MEDIAN

The median is the middle score for a set of data that has been arranged in order of magnitude.

The median is the value of the variable that divides the group into two equal parts, one part
comprising all values greater than the median and the other all values less than it.

The median is less affected by outliers and skewed data than the mean. To calculate the median,
suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange the data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case 56. It is the middle mark because there are
5 scores before it and 5 scores after it. This works fine when you have an odd number of
scores, but what happens when you have an even number of scores? What if you had only 10
scores? Then you simply take the two middle scores and average them. Consider the example
below:
65 55 89 56 35 14 56 55 87 45

We again rearrange the data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Now we have to take the 5th and 6th scores in our data set (55 and 56) and average them to get a
median of 55.5.
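The two cases above (odd and even numbers of scores) can be written as a short function. This sketch reproduces the worked examples and checks them against Python's `statistics.median`; the function name `median` is our own choice for illustration.

```python
import statistics

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle scores

odd = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 scores
even = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]      # 10 scores

print(median(odd), statistics.median(odd))    # 56 56
print(median(even), statistics.median(even))  # 55.5 55.5
```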

E-tivity 2.2.3 - Median

Numbering, pacing and sequencing: 2.2.3
Title: Median
Purpose: To be able to calculate the median of a given data set
Brief summary of overall task: Read the linked document
http://www.cimt.org.uk/cmmss/S1/Text.pdf
and watch these videos:
https://www.youtube.com/watch?v=zjHfAhcU6kE&pbjreload=101
https://www.youtube.com/watch?v=V7o2C61IQjQ
Follow the worked examples and then attempt the exercises that follow.
Spark
Individual task:
a) Attempt the exercises given at the end of the topic.
b) Post your answers in this section to discussion forum 2.2.1.
Interaction begins:
a) Post your work on discussion forum 2.2.1.
E-moderator interventions:
a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
Schedule and time: This activity should take two hours.
Next: Measures of dispersion

2.3 Assessment Questions
1. Suppose you have the data below. Calculate the median:

65 55 89 56 35 14 56 55 87 45 92

2. Differentiate between the following terms.

i. Probabilistic sampling and non-probabilistic sampling.

ii. Cluster sampling and Stratified sampling

iii. Census and sampling

iv. Descriptive statistics and inferential statistics

3. While conducting research, the following information was collected and used for
preliminary statistics:

Age group (class)    Frequency
0-30                 16
30-60                43
60-90                56
90-120               32
120-150              19

Use the information to calculate

i. Mean

ii. Median

iii. Mode

4. The stem diameter was measured for each of 10 randomly selected maize plants, and the
following measurements (mm) were recorded: 45.9, 52.4, 65.0, 65.3, 69.2, 57.8, 72.5,
69.9, 64.7, 72.6. Calculate the

i. Median

ii. Semi-interquartile range

iii. Variance

iv. Standard deviation

For each of these questions, choose the option (A, B, C or D) that is TRUE.

1. The range of a sample gives an indication of the

(A) Way in which the values cluster about a particular point

(B) Number of observations bearing the same value

(C) Maximum variation in the sample

(D) Degree to which the mean value differs from its expected value.

2. The observation which occurs most frequently in a sample is the

(A) Median

(B) Mean deviation

(C) Standard deviation


(D) Mode

3. What is the median of the sample 5, 5, 11, 9, 8, 5, 8 ?

(A) 5

(B) 6

(C) 8

(D) 9

Items 4 - 5 refer to the information below.

The following scores were obtained by eleven footballers in a goal-shoot competition:

5 3 6 8 7 8 3 11 6 3 2

4. The modal score was

(A) 3

(B) 6

(C) 8

(D) 11

5. The median score was

(A) 3

(B) 6

(C) 8

(D) 11

6. The mean of ten numbers is 58. If one of the numbers is 40,

what is the mean of the other nine?

(A) 18

(B) 60
(C) 162

(D) 540

7. The mean of 11 numbers is 7. One of the numbers, 13, is deleted.

What is the mean of the remaining 10 numbers?

(A) 7.7

(B) 6.4

(C) 6.0

(D) 5.8

2.4 References
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly, R. A. (2004). The complete idiot’s guide to statistics. Indianapolis, IN: Alpha.

Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.). New York: Wiley.

Kaps, M., & Lamberson, W. R. (2004). Biostatistics for animal science.
Wallingford: CABI Publishing.

LESSON 3
MEASURES OF DISPERSION

3.1 Introduction
Welcome to lesson three. In this lesson, we introduce you to the measures of dispersion.
Measures of central tendency give us a bird’s eye view of the entire data set. They are called
averages of the first order and serve to identify the center of a distribution, but they do not tell
us how the items are spread out on either side of the central value. The scattering of items in a
distribution about the average is called dispersion: it measures the extent to which the items
vary from some central value. Note that measures of dispersion (or variation) measure only the
degree of the variation, not its direction. Measures of dispersion are also called averages of the
second order because they are based on the deviations of the individual values from the mean
or another measure of central tendency (an average of the first order).
There are three main measures of dispersion:
• The range
• The semi-interquartile range (SIR)
• Variance / standard deviation
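To see why a measure of spread is needed alongside an average, the minimal Python sketch below compares two hypothetical sets of plot yields (the numbers are invented for illustration) that have the same mean but very different spread. The helper name `data_range` is our own; it computes the simplest measure listed above, the range, as the largest value minus the smallest.

```python
import statistics

# Hypothetical yields (e.g. kg per plot), invented for illustration only.
plot_a = [48, 49, 50, 51, 52]
plot_b = [30, 40, 50, 60, 70]


def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


for name, values in [("A", plot_a), ("B", plot_b)]:
    # Same mean (an average of the first order), very different spread.
    print(name, "mean =", statistics.mean(values), "range =", data_range(values))
```

Both plots have a mean of 50, but the range is 4 for plot A and 40 for plot B; the average alone hides this difference.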
3.2 Lesson learning outcomes
By the end of the lesson, you will be able to:
3.2.1 Compute the range
3.2.2 Compute the semi-interquartile range
3.2.3 Compute the variance and standard deviation

E-tivity 3.2.1 Range

Numbering, pacing and sequencing: 3.2.1
Title: Range
Purpose: The purpose of this e-tivity is to enable you to understand and compute the range of a
given set of data.
Brief summary of overall task: Watch this video
https://www.youtube.com/watch?v=xHYi14VktKs&frags=wn
and read the following document
https://byjus.com/maths/dispersion/

Spark

Individual task:
a) Write short notes on the range, stating its meaning, how it is calculated and its uses.

Interaction begins:
a) Post your work.
b) Read posts from other students and provide two comments on their thoughts and ideas.
c) Post your response on discussion forum 3.2.1.

E-moderator interventions:
a) Ensure that learners are focused on the contents and context of the discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
d) Close the discussions.

Schedule and time: This activity should take two hours.
Next: Semi-interquartile range

3.2.2 Semi-interquartile range (SIR)


The semi-interquartile range is the second measure of dispersion and is an improvement over the
range. It is based on the quartiles: to calculate it, you need the upper quartile (Q3) and the lower
quartile (Q1), and the difference between them is then divided by 2. Because it is half of the
difference between the two quartiles (half of the interquartile range), it is called the
semi-interquartile range; it is also known as the quartile deviation (QD). The formula is

$$ QD = \frac{Q_3 - Q_1}{2} $$
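The formula can be checked with a short Python sketch. It assumes Python 3.8 or later, where `statistics.quantiles(data, n=4)` returns the three quartile cut points; the data values and the helper name `semi_interquartile_range` are hypothetical. Textbooks use slightly different rules for locating Q1 and Q3, so the sketch shows the two conventions the standard library offers.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
data = [12, 4, 9, 2, 7, 4, 10, 5]


def semi_interquartile_range(values, method="exclusive"):
    """Return (Q1, Q3, SIR), where SIR = (Q3 - Q1) / 2.

    statistics.quantiles sorts the data itself; with n=4 it returns the
    three quartile cut points [Q1, Q2, Q3]. `method` selects one of two
    common conventions ("exclusive" or "inclusive") for locating them.
    """
    q1, _q2, q3 = statistics.quantiles(values, n=4, method=method)
    return q1, q3, (q3 - q1) / 2


print(semi_interquartile_range(data))
print(semi_interquartile_range(data, method="inclusive"))
```

With this data the exclusive method gives Q1 = 4.0, Q3 = 9.75 and SIR = 2.875, while the inclusive method gives Q3 = 9.25 and SIR = 2.625; hand calculations that follow a particular textbook rule may therefore differ slightly.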
E-tivity 3.2.2 Semi-interquartile range

Numbering, pacing and sequencing: 3.2.2
Title: Semi-interquartile range (SIR)
Purpose: The purpose of this e-tivity is to enable you to appreciate and compute the
semi-interquartile range (SIR).
Brief summary of overall task: Read the following article
https://byjus.com/maths/dispersion/
Watch this video
https://www.youtube.com/watch?v=xHYi14VktKs&frags=wn

Spark

Individual task:
a) Describe what quartiles are.
b) Describe how the SIR is calculated.

Interaction begins:
a) Read posts from other students and provide two comments on their thoughts and ideas.
b) Post your response on discussion forum 3.2.2.

E-moderator interventions:
a) Ensure that learners are focused on the contents and context of the discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
d) Close the discussions.

Schedule and time: This task should take two hours.
Next: Variance and standard deviation

3.2.3 Variance and standard deviation

Variance is a measure of how much the elements of a data set differ from one another; it
indicates how spread out the elements are from the central point (the mean). Two kinds of
variance exist: population variance and sample variance. Population variance is the variance of
the entire population and is denoted by σ²; sample variance is computed from a sample and is
denoted by s². The standard deviation (SD) is the square root of the variance. It indicates how far
the elements of a data set typically lie from the mean, and it is the most commonly used measure
of dispersion. The population SD is the square root of the sum of squared deviations from the
mean divided by the number of observations N; for a sample, the sum of squared deviations is
usually divided by n - 1 instead:

$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
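The two formulas can be compared in a minimal Python sketch. The data values and the helper names (`sum_sq_dev`, `population_variance`, `sample_variance`) are hypothetical, chosen so the arithmetic is easy to follow; the results are cross-checked against Python's built-in `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

# Hypothetical observations, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    # sigma^2: divide by N, the number of observations
    return sum_sq_dev(values) / len(values)


def sample_variance(values):
    # s^2: divide by n - 1
    return sum_sq_dev(values) / (len(values) - 1)


pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var, math.sqrt(pop_var))    # population variance and SD
print(samp_var, math.sqrt(samp_var))  # sample variance and SD

# Cross-check against the standard library implementations.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
```

Here the mean is 5 and the sum of squared deviations is 32, so σ² = 32/8 = 4 (σ = 2), while s² = 32/7 ≈ 4.57 (s ≈ 2.14).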

E-tivity 3.2.3 Variance and standard deviation

Numbering, pacing and sequencing: 3.2.3
Title: Variance and standard deviation
Purpose: The purpose of this e-tivity is to enable you to calculate the variance and the standard
deviation and to understand how they are used in statistics.
Brief summary of overall task: Read this
https://byjus.com/maths/dispersion/
Watch the videos linked
https://www.youtube.com/watch?v=lp2nTFdYGec&pbjreload=101
https://www.youtube.com/watch?v=Ks_rGi7_-yc

Spark

Individual task:
a) Write short notes on variance and standard deviation, explaining their meaning, how they are
calculated and their uses in statistics.
b) Post your work on the discussion forum.

Interaction begins:
a) Read posts from other students and provide two comments on their thoughts and ideas.
b) Post your response on discussion forum 3.2.3.
c) Refine your answer based on any new insight acquired from your colleagues’ posts and save it
in your portfolio.

E-moderator interventions:
a) Ensure that learners are focused on the contents and context of the discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
d) Close the discussions.

Schedule and time: This task should take two hours.
Next: Lesson 4: Random variables and probability distributions

3.3 Assessment Questions


1. The scatter in a series of values about the average is called:
(a) Central tendency (b) Dispersion (c) Skewness (d) Symmetry
2. The measures of spread or scatter of the individual values around the central point are
called:
(a) Measures of dispersion (b) Measures of central tendency (c) Measures of skewness
(d) Measures of kurtosis
3. The measures used to calculate the variation present among the observations in the unit of
the variable are called:
(a) Relative measures of dispersion (b) Coefficient of skewness (c) Absolute measures of
dispersion (d) Coefficient of variation
4. The measures used to calculate the variation present among the observations relative to
their average are called:
(a) Coefficient of kurtosis (b) Absolute measures of dispersion (c) Quartile deviation
(d) Relative measures of dispersion
5. The degree to which numerical data tend to spread about an average value is called:
(a) Constant (b) Flatness (c) Variation (d) Skewness
6. The measures of dispersion can never be:
(a) Positive (b) Zero (c) Negative (d) Equal to 2
7. If all the scores on an examination cluster around the mean, the dispersion is said to be:
(a) Small (b) Large (c) Normal (d) Symmetrical

3.4 References

Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.

Donnelly, R. A. (2004). The complete idiot’s guide to statistics. Indianapolis, IN: Alpha.

Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.). New York: Wiley.

Kaps, M., & Lamberson, W. R. (2004). Biostatistics for animal science. Wallingford: CABI
Publishing.

Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-Hill.

McDonald, J. H. (2014). Handbook of biological statistics (3rd ed.). Baltimore, Maryland:
Sparky House Publishing. http://www.biostathandbook.com/index.html

Weiss, N. A. (1999). Elementary statistics (4th ed.). Reading, Massachusetts: Addison-Wesley
Publishing Company.

LESSON 4: RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
4.1 Introduction
As explained in the previous section, research involves the collection of data, which is normally
done by observing variables of interest. A random variable, usually written X, is a variable whose
possible values are numerical outcomes of a random phenomenon. There are two types of
random variables: discrete and continuous. All random variables (discrete and continuous) have
a cumulative distribution function (CDF), F(x) = P(X ≤ x), which gives the probability that the
random variable X is less than or equal to x, for every value x. For a discrete random variable,
the cumulative distribution function is found by summing the probabilities of all values up to and
including x. A probability distribution is a table or an equation that links each possible value that
a random variable can assume with its probability of occurrence. This lesson elaborates on these
concepts.
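The rule "sum the probabilities up to and including x" can be sketched in Python. The distribution below is hypothetical: X is taken to be the number of seeds (out of three) that germinate in a tray, with invented probabilities. `fractions.Fraction` keeps the running totals exact, and `itertools.accumulate` produces them.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical probability distribution: X = number of seeds (out of 3)
# that germinate. Probabilities are invented for illustration only.
pmf = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
}
assert sum(pmf.values()) == 1  # a valid distribution sums to 1

values = sorted(pmf)
# F(x) = P(X <= x): running total of the probabilities
cdf = dict(zip(values, accumulate(pmf[v] for v in values)))

for x in values:
    print(f"P(X = {x}) = {pmf[x]}, F({x}) = P(X <= {x}) = {cdf[x]}")
```

This gives F(0) = 1/10, F(1) = 3/10, F(2) = 3/5 and F(3) = 1; the CDF of a discrete variable always ends at 1.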

In life, there is no certainty: every event that comes our way is associated with some level of
uncertainty. When you plant corn seeds, there is some probability that each seed will germinate.
This probability may be improved if certain conditions are fulfilled; for example, germination is
more likely if the soil is well watered. As you go through this unit, you will understand the
concept of probability distributions as it relates to agricultural experiments and research.

4.2 Learning Outcomes

By the end of the lesson, you will be able to:
4.2.1 Distinguish between discrete and continuous random variables
4.2.2 Describe the probability distribution of a discrete random variable
4.2.3 Construct a frequency distribution for a continuous variable
4.2.4 Identify the skew of a distribution

4.2.1 Discrete and continuous random variables

A discrete random variable is one which may take on only a countable number of distinct
values, such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily)
counts. If a random variable can take only a finite number of distinct values, then it must be
discrete. A continuous random variable, on the other hand, can take any value on a continuous
scale, e.g. weights, strengths, times or lengths, so it has infinitely many possible values.
Continuous random variables are usually measurements.

E-tivity 4.2.1 Discrete/continuous random variables

Numbering, pacing and sequencing: 4.2.1
Title: Discrete/continuous random variables
Purpose: The purpose of this e-tivity is to enable you to distinguish between discrete and
continuous variables.
Brief summary of overall task: Read the slides at this link
https://ocw.mit.edu/courses/sloan-school-of-management/15-063-communicating-with-data-summer-2003/lecture-notes/lecture6.pdf
Read this article
https://www.uplifteducation.org/cms/lib/TX01001293/Centricity/Domain/273/RANDOM%20-%20discrete%20and%20continuous%20-%20VARIABLE.pdf
and watch these videos
https://www.youtube.com/watch?v=PlUsFNLRUOc
https://www.youtube.com/watch?v=gPAxuMKZ-w8
After reading and watching, respond to the following questions:
a. Distinguish between discrete and continuous variables.
b. Differentiate between a probability density function and a cumulative distribution function.

Spark

Individual task:
1. Click on the links provided.
2. Watch the videos and view the slides.
3. Read the articles.
4. Give your answers in discussion forum 4.2.1.

Interaction begins:
1. Post your answer in discussion forum 4.2.1.
2. Read the posted work of two colleagues.
3. Post at least one comment on their work, giving constructive criticism.

E-moderator interventions:
1. Summarize the threads.
2. Give feedback.
3. Provide teaching points.
4. Close the E-tivity.

Schedule and time: This task will take two hours.
Next: Probability distributions for discrete random variables
4.2.2 Probability distributions for discrete random variables

A random variable is a variable that takes on a numerical value as a result of an experiment. The
value is not known with certainty before the experiment, but the sample space of the experiment
is known. The probability distribution of a discrete random variable X is the table, graph or
formula that assigns a probability P(X = x) to each possible value x of the variable. For example,
when a single die is rolled, P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) =
P(X = 6) = 1/6. The probabilities of all possible values sum to 1.
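A discrete probability distribution can be written down directly and checked to sum to 1. The Python sketch below does this for the single-die example, assuming a fair die, and then simulates 60,000 rolls (the number of rolls and the random seed are arbitrary choices) to show that the relative frequency of each face settles close to its probability of 1/6.

```python
import random
from fractions import Fraction

# Probability distribution for one roll of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(pmf.values()) == 1  # the probabilities sum to 1

# Simulate many rolls and compare relative frequencies with 1/6.
random.seed(1)  # fixed seed so the run is repeatable
n_rolls = 60_000
counts = {face: 0 for face in pmf}
for _ in range(n_rolls):
    counts[random.randint(1, 6)] += 1

for face in sorted(pmf):
    # face, theoretical probability, observed relative frequency
    print(face, pmf[face], round(counts[face] / n_rolls, 3))
```

Each observed relative frequency should be close to 1/6 ≈ 0.167; the gap shrinks as the number of rolls grows.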

E-tivity 4.2.2 Probability distributions for discrete random variables

Numbering, pacing and sequencing: 4.2.2
Title: Probability distributions for discrete random variables
Purpose: The purpose of this e-tivity is to enable you to understand the probability distributions
of discrete random variables.
Brief summary of overall task: Watch this video
https://www.youtube.com/watch?v=UnzbuqgU2LE
After watching, answer the following questions:
1. Define a random variable.
2. Define a discrete random variable.
3. Construct a probability distribution table.

Spark

Individual task:
1. Click on the link provided.
2. Watch the video.
3. Give your answers in discussion forum 4.2.2.
4. Post your scanned work in discussion forum 4.2.2.

Interaction begins:
1. Post your answer in discussion forum 4.2.2.
2. Read the posted work of two colleagues.
3. Post at least one comment on their work, giving constructive criticism.

E-moderator interventions:
1. Summarize the threads.
2. Give feedback.
3. Provide teaching points.
4. Close the E-tivity.

Schedule and time: This task will take two hours.
Next: Probability distribution of a continuous random variable

4.2.3 Probability distribution of a continuous random variable

The probability distribution of a continuous random variable is represented by an equation
called the probability density function (PDF). The PDF, or density, of a continuous random
variable is a function that describes the relative likelihood for this random variable to take on a
given value. The probability of the random variable falling within a particular range of values is
given by the integral of the density over that range; that is, it is the area under the density
curve, above the horizontal axis and between the vertical lines at the lowest and greatest values
of the range. The integral of a PDF over its entire range is equal to one:

$$ P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1 $$
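The idea that probability equals area under the density curve can be checked numerically. The Python sketch below assumes a simple hypothetical density, f(x) = 2x for 0 ≤ x ≤ 1 and 0 elsewhere, chosen only because its areas are easy to verify by hand, and approximates the integral with the midpoint rule; the helper names `pdf` and `area_under` are our own.

```python
def pdf(x):
    """Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def area_under(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b using the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total = area_under(pdf, 0, 1)           # whole support: should be 1
p_interval = area_under(pdf, 0.2, 0.5)  # P(0.2 <= X <= 0.5)
print(round(total, 6), round(p_interval, 6))
```

By hand, P(0.2 ≤ X ≤ 0.5) = 0.5² - 0.2² = 0.21 and the total area is 1, which matches the numerical results.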
E-tivity 4.2.3 Probability distribution of a continuous random variable

Numbering, pacing and sequencing: 4.2.3
Title: Probability distribution of a continuous random variable
Purpose: The purpose of this e-tivity is to enable you to explain why we use probability densities
for continuous random variables.
Brief summary of overall task: Read the notes at the links below
https://www.pnw.edu/wp-content/uploads/2020/03/lecture
notes5-10.pdf
https://www.colorado.edu/amath/sites/default/files/attached-files/ch4.pdf
Watch this video
https://www.youtube.com/watch?v=9KVR1hJ8SxI
After reading, respond to the following questions:
1. Define a continuous random variable.
2. Define the probability density function and the cumulative distribution function.
3. Write down the characteristics of a probability density function.

Spark

Individual task:
• Read the notes and watch the video provided.
• Answer the questions above.

Interaction begins:
• Post your answer in discussion forum 4.2.3.
• Read the posted work of two colleagues.
• Post at least one comment on their work, giving constructive criticism.

E-moderator interventions:
• Summarize the threads.
• Give feedback.
• Provide teaching points.
• Close the E-tivity.

Schedule and time: This task will take two hours.
Next: Normal probability distribution

4.2.4 Normal probability distribution

The normal distribution is a continuous probability distribution with a symmetric, bell-shaped
curve that is completely described by its mean (μ) and standard deviation (σ). As on any
probability distribution plot, the proportion of the area under the curve between two points
indicates the probability that a value will fall within that interval.
The normal probability distribution is commonly used in statistics. When measurements such as
people's heights or weights are collected and graphed, the resulting curve is often approximately
normal.
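Areas under a normal curve can be computed with Python's built-in `statistics.NormalDist` (Python 3.8 or later). The sketch below assumes a hypothetical population of plant heights with mean 170 cm and standard deviation 10 cm, and shows the familiar rule that about 68% of values lie within one standard deviation of the mean and about 95% within two.

```python
from statistics import NormalDist

# Hypothetical population: plant heights (cm) with mean 170 and SD 10.
heights = NormalDist(mu=170, sigma=10)

# P(a < X < b) is the area under the curve between a and b: F(b) - F(a).
within_1_sd = heights.cdf(180) - heights.cdf(160)
within_2_sd = heights.cdf(190) - heights.cdf(150)
taller_than_185 = 1 - heights.cdf(185)

print(round(within_1_sd, 4))      # about 0.6827
print(round(within_2_sd, 4))      # about 0.9545
print(round(taller_than_185, 4))  # about 0.0668
```

Because the curve is symmetric about the mean, the same proportions hold for any normal distribution once distances are measured in standard deviations.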

E-tivity 4.2.4 Normal probability distribution

Numbering, pacing and sequencing: 4.2.4
Title: Normal distribution
Purpose: The purpose of this e-tivity is to enable you to better understand the concept of the
normal distribution and why it is important in statistics.
Brief summary of overall task: Read the notes at this link
https://www.intmath.com/counting-probability/14-normal-probability-distribution.php
Watch this video
https://www.youtube.com/watch?v=gI5y3RZe9fk
After reading, respond to the following questions:
1. Explain what a normal probability distribution is.
2. Draw a normal distribution curve and describe its properties.

Spark

Individual task:
• Read the notes provided.
• Scan your work and share it with two classmates.
• Refine your answers using any new knowledge acquired from discussing with your colleagues
and post them in your portfolio.

Interaction begins:
• Post your answer in discussion forum 4.2.4.
• Read the posted work of two colleagues.
• Post at least one comment on their work, giving constructive criticism.

E-moderator interventions:
• Summarize the threads.
• Give feedback.
• Provide teaching points.
• Close the E-tivity.

Schedule and time: This task will take two hours.
Next: Lesson 5: Statistical inference and hypothesis testing

4.3 Assessment Questions

1 Which of the following statements best describes the relationship between a
parameter and a statistic?

a) A parameter has a sampling distribution with the statistic as its mean.

b) A parameter has a sampling distribution that can be used to determine what values
the statistic is likely to have in repeated samples.

c) A parameter is used to estimate a statistic.

d) A statistic is used to estimate a parameter.

2 What is the total area under a probability density function (PDF)?

a) 0

b) Infinity

c) 1

d) Changes

3 A table with all possible values of a random variable and their corresponding
probabilities is called ___________

a) Probability Mass Function

b) Probability Density Function

c) Cumulative distribution function

d) Probability Distribution

4 In a wildlife conservation survey of birds, it was found that one out of ten
quails was trapped, using a mist net, in a given season. If 20 birds are selected
at random, find the probability that 6 of the birds were trapped in the previous
season.

5 A farmer has the capacity to keep just four goats.

i. A farmer’s goat is expecting twins. What is the sample space for this
farmer’s expectation? (Hint: any combination of the two sexes is possible.)

ii. What is the probability of the goat giving birth to at least one male?

iii. What is the probability of the goat giving birth to two females?

iv. Construct a probability distribution for your answers.

6 What is a normal distribution?


i. What is the relationship between a normal distribution, its mean and its
standard deviation?

4.4 References
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.

Donnelly, R. A. (2004). The complete idiot’s guide to statistics. Indianapolis, IN: Alpha.

Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.). New York: Wiley.

Kaps, M., & Lamberson, W. R. (2004). Biostatistics for animal science. Wallingford: CABI
Publishing.

Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-Hill.

McDonald, J. H. (2014). Handbook of biological statistics (3rd ed.). Baltimore, Maryland:
Sparky House Publishing. http://www.biostathandbook.com/index.html

Weiss, N. A. (1999). Elementary statistics (4th ed.). Reading, Massachusetts: Addison-Wesley
Publishing Company.

LESSON 5: STATISTICAL INFERENCE AND HYPOTHESIS TESTING
5.1 Introduction
The main goal of statistical inference and hypothesis testing is to enable a researcher to make a
statement, with a stated level of uncertainty, about something that is not directly observed.
Inferences are normally based on a sample, and the information obtained is generalized to a
population; i.e. the objective is to understand the population based on the sample. A population
is the collection of objects that we want to study or test. For example, if you are studying the
quality of products from a production line on a given day, then the whole production for that day
is the population. In the real world, it may be hard to test every product, so we draw a sample
from the population and use the results from the sample to make inferences about the whole
population.

The statistical model in this case offers an abstract representation of the population and of how
the elements of the population relate to each other. Parameters are numbers that represent
features or associations of the population and are usually estimated from the data. A
parameter is a summary description of a fixed characteristic or measure of the target
population: it denotes the true value that would be obtained if we carried out a census
(instead of a sample). Parameters include the mean (μ), variance (σ²), standard deviation (σ)
and proportion (π). The corresponding values calculated from a sample (x̄, s², s and p) are each
called a statistic. A sampling distribution is the probability distribution of a statistic obtained
from a large number of samples drawn from the population. In sampling, the confidence interval
provides a more continuous measure of uncertainty: it proposes a range of plausible values for
an unknown parameter (for example, the mean). In other words, the confidence interval
represents a range of values in which we are fairly sure the true value lies. For example,
suppose the mean height for a sample group is 146 cm and a 95% confidence interval is
calculated around it. The 95% means that if similar experiments were repeated many times,
about 95% of the intervals constructed in this way would include the true mean, while about 5%
would not.
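A confidence interval for a mean can be sketched in Python under stated assumptions. This minimal example assumes the population standard deviation σ is known (or the sample is large), so the critical value comes from the standard normal distribution via `statistics.NormalDist` (Python 3.8 or later); with a small sample and unknown σ, a t critical value would be used instead. The sample mean of 146 follows the example above, while n = 36 and σ = 12 are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean, assuming sigma is known."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)  # z times the standard error
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: n = 36 and sigma = 12 are assumptions.
low, high = z_confidence_interval(sample_mean=146, sigma=12, n=36)
print(f"95% CI: {low:.2f} to {high:.2f} cm")
```

Here the standard error is 12/√36 = 2, so the margin is about 1.96 × 2 ≈ 3.92 and the interval runs from about 142.08 to 149.92 cm.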
5.2 Learning Outcomes

By the end of the lesson, you will be able to:

5.2.1 Define hypothesis testing
5.2.2 Distinguish between one-tailed and two-tailed tests
5.2.3 Carry out chi-square tests
5.2.4 Carry out goodness-of-fit tests

5.2.1 Hypothesis testing


A statistical hypothesis is an assumption about a population parameter which may or may not
be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or
reject statistical hypotheses. The best way to determine whether a statistical hypothesis is true
would be to examine the entire population. Since that is often impractical, researchers
typically examine a random sample from the population. If the sample data are not consistent
with the statistical hypothesis, the hypothesis is rejected. When analyzing data and fitting
models to your data, it is important to test different hypotheses. Scientific statements can be
split into testable hypotheses. The hypothesis or prediction that comes from your theory
usually says that an effect will be present. This hypothesis is called the alternative
hypothesis and is denoted by H1. (It is sometimes also called the experimental hypothesis, but
because this term relates to a specific type of methodology it is probably best to use
‘alternative hypothesis’.) This is the hypothesis that sample observations are influenced by
some non-random cause. There is another type of hypothesis, called the null hypothesis and
denoted by H0. This hypothesis is the opposite of the alternative hypothesis and so would
usually state that an effect is absent. The null hypothesis is usually the hypothesis that sample
observations result purely from chance.
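The steps of a hypothesis test can be illustrated with a one-sample z-test, which assumes the population standard deviation σ is known. In the hypothetical Python sketch below (all numbers invented), H0: μ = 50 is tested against the two-sided alternative H1: μ ≠ 50 at the 5% significance level, using a sample of n = 36 with mean 52 and σ = 6. The helper name `one_sample_z_test` is our own.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: mu = mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a result at least this extreme
    # in either tail, assuming H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05  # significance level
z, p = one_sample_z_test(sample_mean=52, mu0=50, sigma=6, n=36)
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

With a standard error of 6/√36 = 1, z = 2.0 and p ≈ 0.0455, which is below 0.05, so H0 would be rejected in favour of H1.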

E-tivity 5.2.1 Hypothesis testing

Numbering, pacing and sequencing: 5.2.1
Title: Hypothesis testing
Purpose: The purpose of this e-tivity is to enable you to understand and apply the concept of
hypothesis testing.
Brief summary of overall task: Read these notes from the links provided
http://spots.gru.edu/nsmith12/openstats/chapter9_stats.pdf
http://pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6_-_Statistical_Inference/6.1_-_One_Sample.pdf
Then watch the following video to further understand the concept of hypothesis testing:
https://www.youtube.com/watch?v=zJ8e_wAWUzE
After reading, respond to the following questions:
1. Define hypothesis testing.
2. Distinguish between the two types of hypotheses.
3. Explain what the Z-test and the T-test are.
4. Explain how and when to use the Z-test and the T-test.
5. Explain how to draw a conclusion when using either test.

Spark

Individual task:
• Read the documents provided and watch the video clip, and use them to answer the questions
above.
• Post your answers on discussion board 5.2.1.

Interaction begins:
• Post your answer in discussion forum 5.2.1.
• Read the posted work of two colleagues.

E-moderator interventions:
• Summarize the threads and review the scanned forms.
• Give feedback.
• Provide teaching points.
• Close the E-tivity.

Schedule and time: This task will take two hours.
Next: One-tail versus two-tail test

5.2.2 One-tail verses two-tail test


It is important to understand the meaning of a two-tailed test. When using a significance level of
0.05, a two-tailed test allots half of the alpha to testing for statistical significance in one
direction and the other half to the other direction. This means that 0.025 is in each tail of the
distribution of the test statistic. When using a two-tailed test, regardless of the direction of the
relationship you hypothesize, you are testing for the possibility of the relationship in both
directions. For example, we may wish to compare the mean of a sample to a given value x using
a t-test. Our null hypothesis is that the mean is equal to x. A two-tailed test checks both whether
the mean is significantly greater than x and whether it is significantly less than x. A one-tailed
test, by contrast, places all of the alpha (0.05) in one tail and tests for a difference in only one
direction.
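The effect of splitting alpha between two tails can be seen in a short Python sketch using the standard normal distribution from `statistics.NormalDist` (Python 3.8 or later). It computes the critical values for a two-tailed and an upper one-tailed test at α = 0.05, then the p-values for a hypothetical observed statistic z = 1.8 (an invented value).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

# Two-tailed: alpha / 2 = 0.025 in each tail.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed (upper tail): all of alpha in one tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)

# Hypothetical observed test statistic.
z = 1.8
p_one_tailed = 1 - std_normal.cdf(z)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(round(two_tailed_critical, 3), round(one_tailed_critical, 3))
print(round(p_one_tailed, 4), round(p_two_tailed, 4))
```

The critical values are about 1.96 (two-tailed) and 1.645 (one-tailed). With z = 1.8 the one-tailed p-value (about 0.036) is below 0.05 but the two-tailed p-value (about 0.072) is not, which is why the type of test should be chosen before the data are examined.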

E-tivity 5.2.2 One-tail versus two-tailed test

Numbering, pacing and sequencing: 5.2.2
Title: One-tail versus two-tailed test
Purpose: The purpose of this e-tivity is to enable you to distinguish between one-tailed and
two-tailed tests.
Brief summary of overall task: Read these notes
https://www.nipissingu.ca/sites/default/files/One-tailed-Test-or-Two-tailed-Test.pdf
and then watch the video at this link
https://www.youtube.com/watch?v=XHPIEp-3yC0
After watching, respond to the following questions:
a. Discuss the differences between a one-tailed test and a two-tailed test.
b. How are the tests carried out?

Spark

Individual task:
1. Read the notes provided and watch the video clip.

Interaction begins:
1. Post your answer in discussion forum 5.2.2.
2. Read the posted work of two colleagues.
3. Post at least one comment on their work, giving constructive criticism.
4. Save your answers in your portfolio.

E-moderator interventions:
1. Summarize the threads.
2. Give feedback.
3. Provide teaching points.
4. Close the E-tivity.

Schedule and time: This task will take a minimum of three hours.
Next: Chi-square testing
5.2.3 Chi-square testing
The chi-square test is used to determine whether there is a significant difference between the
expected frequencies and the observed frequencies in one or more categories. It answers
questions such as: Does the number of individuals or objects that fall in each category differ
significantly from the number you would expect? What is the source of the differences
observed? Is the difference between the expected and observed frequencies due to sampling
error? The test statistic compares the observed (O) and expected (E) frequencies in every
category:

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

A chi-square test enables you to assess whether a relationship exists between categorical
variables, but it does not tell you how strongly the variables are related. Chi-square tests allow
you to perform hypothesis testing on nominal and ordinal data.
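A chi-square test of independence on a 2 x 2 table can be sketched in plain Python. The counts below are hypothetical (seed treatment versus germination). Expected counts are computed as row total × column total / grand total, and the statistic sums (O - E)²/E over all cells. Because the standard library has no chi-square distribution, the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal one, so this shortcut is valid for df = 1 only.

```python
import math

# Hypothetical 2 x 2 table of observed counts (invented for illustration):
# rows = seed treatment (treated, untreated)
# columns = outcome (germinated, not germinated)
observed = [
    [40, 10],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if treatment and outcome were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.4f}")
```

Here chi-square ≈ 4.762 with 1 degree of freedom, which exceeds the table value of 3.841 for α = 0.05 (p ≈ 0.029), so independence would be rejected. No Yates continuity correction is applied in this sketch; some textbooks use one for 2 x 2 tables.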

E-tivity 5.2.3 Chi-square testing

Numbering, pacing and sequencing: 5.2.3
Title: Chi-square testing
Purpose: The purpose of this e-tivity is to enable you to understand and apply the concept of
chi-square testing.
Brief summary of overall task: Read these notes from the link provided
https://www.westga.edu/academics/research/vrc/assets/docs/ChiSquareTest_LectureNotes.pdf
Then watch the following video to further understand the concept of chi-square testing:
https://www.youtube.com/watch?v=QyRJ0720u98
After reading, respond to the following questions:
1. Define chi-square testing.
2. Construct the chi-square table.

Spark

Individual task:
• Read the document provided and watch the video clip, and use them to answer the questions
above.
• Post your answers on discussion board 5.2.3.

Interaction begins:
• Post your answer in discussion forum 5.2.3.
• Read the posted work of two colleagues.

E-moderator interventions:
• Summarize the threads and review the scanned forms.
• Give feedback.
• Provide teaching points.
• Close the E-tivity.

Schedule and time: This task will take two hours.
Next: Goodness of fit

5.2.4 Goodness-of-fit
A goodness-of-fit test compares the observed frequencies in the data with the expected
frequencies predicted by the null hypothesis.
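A goodness-of-fit test can be sketched in Python for a hypothetical genetic cross where the null hypothesis predicts a 3:1 ratio of two kernel colours; the observed counts are invented for illustration. Expected counts follow from the ratio, the statistic sums (O - E)²/E, and the result is compared with the tabulated critical value 3.841 for 1 degree of freedom at α = 0.05.

```python
# Hypothetical example: H0 says the cross gives a 3:1 ratio of two
# kernel colours. Observed counts are invented for illustration.
observed = {"purple": 290, "white": 110}
expected_ratio = {"purple": 3, "white": 1}

n = sum(observed.values())
ratio_total = sum(expected_ratio.values())

chi_square = 0.0
for category, obs in observed.items():
    expected = n * expected_ratio[category] / ratio_total  # 300 and 100
    chi_square += (obs - expected) ** 2 / expected

df = len(observed) - 1   # number of categories minus 1
critical_value = 3.841   # chi-square table value for df = 1, alpha = 0.05
decision = "reject H0" if chi_square > critical_value else "do not reject H0"
print(f"chi-square = {chi_square:.3f}, df = {df}: {decision}")
```

Here chi-square = 100/300 + 100/100 ≈ 1.333, which is below 3.841, so the observed counts are consistent with the 3:1 ratio at the 5% level.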

E-tivity 5.2.4 Goodness-of-fit

Numbering, pacing and sequencing: 5.2.4
Title: Goodness-of-fit
Purpose: The purpose of this e-tivity is to enable you to understand and apply the concept of
goodness of fit.
Brief summary of overall task: Read these notes from the link provided
https://www.studocu.com/en-gb/document/newcastle-university/quantitative-methods-for-business-management/lecture-notes/mas1403-2016-2017-lecture-notes-chapter-4-goodness-of-fit-tests/1242556/view
Then watch the following video to further understand the concept of goodness of fit:
https://www.youtube.com/watch?v=kUqLtRVtTs4
After reading, respond to the following question:
1. Define goodness of fit.

Spark

Individual task:
• Read the document provided and watch the video clip, and use them to answer the question
above.
• Post your answer on discussion board 5.2.4.

Interaction begins:
• Post your answer in discussion forum 5.2.4.
• Read the posted work of two colleagues.

E-moderator interventions:
• Summarize the threads and review the scanned forms.
• Give feedback.
• Provide teaching points.
• Close the E-tivity.

Schedule and time: This task will take two hours.
Next: Samples and sampling distribution
5.3 Assessment Questions

1. A statement made about a population for testing purposes is called?


a) Statistic
b) Hypothesis
c) Level of Significance
d) Test-Statistic

2. A hypothesis that is assumed to be true and tested for possible rejection is called?


a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis

3. A statement whose validity is tested on the basis of a sample is called?


a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis
4. A hypothesis which defines the population distribution is called?
a) Null Hypothesis
b) Statistical Hypothesis
c) Simple Hypothesis
d) Composite Hypothesis

5. If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.

6. The probability of rejecting the null hypothesis when it is true is called?


a) Level of Confidence
b) Level of Significance
c) Level of Margin
d) Level of Rejection
7. Consider the hypothesis H0: ϕ = 23 against H1: ϕ < 23. The test is?
a) Right tailed
b) Left tailed
c) Center tailed
d) Cross tailed

8. A Type 1 error occurs when?

a) We reject H0 if it is True
b) We reject H0 if it is False
c) We accept H0 if it is True
d) We accept H0 if it is False

9. The probability of a Type 1 error is referred to as?

a) 1-α
b) β
c) α
d) 1-β

10. If a hypothesis is rejected at the 5% level of significance, it
a) will always be rejected at the 1% level
b) will always be accepted at the 1% level
c) will never be tested at the 1% level
d) may be rejected or not rejected at the 1% level

11. Formulate a hypothesis statement for the following claim: “The average Maasai cow
produces 6 litres of milk daily.” A sample of 40 Maasai cows produced an average of 8 litres of milk
per day. Assume the population standard deviation is 2.5 litres. Using α = 0.05, test your
hypothesis. What is your conclusion?

LESSON 6: SAMPLES AND SAMPLING DISTRIBUTION
1.1 Introduction
Sampling distributions are important in inferential statistics. In principle, one would specify a
population and then determine the sampling distribution of a statistic such as the mean or the
range. In practice, one usually starts by collecting sample data and uses these data to estimate
the parameters of the sampling distribution. Knowledge of the sampling distribution is very
valuable because it shows the degree to which means from different samples differ from each
other and from the population mean, and hence how close a particular sample mean is likely
to be to the population mean. The standard deviation of the sampling distribution of the mean
is the most common measure of how much sample means differ from each other; it is called
the standard error of the mean. If all the sample means were very close to the population
mean, the standard error of the mean would be small. Conversely, if the sample means varied
considerably, the standard error of the mean would be large.

For example, suppose a sample mean is 125 and the estimated standard error of the mean is 5.
If the sampling distribution is normal, the sample mean is likely to be within 10 units (two
standard errors) of the population mean, since about 95% of a normal distribution lies within
two standard deviations of the mean. Keep in mind that every statistic, not just the mean, has
a sampling distribution.
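In practice the standard error of the mean is estimated from a single sample as s/√n, where s is the sample standard deviation and n is the sample size. The Python sketch below computes it for a hypothetical set of eight measurements (invented for illustration) and then reproduces the arithmetic of the example above: 125 ± 2 × 5 gives the range 115 to 135.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))

# Hypothetical sample of 8 measurements (illustration only)
sample = [118, 131, 122, 127, 135, 119, 126, 122]
print(f"mean = {statistics.mean(sample):.2f}, SE = {standard_error(sample):.2f}")

# Worked example from the text: mean 125, SE 5, +/- two standard errors
mean, se = 125, 5
low, high = mean - 2 * se, mean + 2 * se
print(low, high)  # 115 135
```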

1.2 Learning outcomes


By the end of the lesson, you will be able to:

1.2.1 Describe the concepts of inferential statistics
1.2.2 Discuss the concept of sampling distribution
1.2.3 Construct a confidence interval

6.2.1 Inferential statistics


Inferential statistics is concerned with making predictions or inferences about a population
from observations and analyses of a sample. This means that we can take the results of an
analysis of a sample and generalize them to the whole population from which the sample was
drawn. This can only be done if the sample is a good representative of the population to
which the results are being generalized. Whether such a generalization is justified can be
assessed by performing significance tests such as the chi-square test or the t-test. These tests
tell us the probability of obtaining results at least as extreme as ours by chance alone if, in
the population, there were no relationship at all between the variables we studied. After data
are collected, inferential statistics are used to reach conclusions that extend beyond the
immediate data alone.
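As an illustration of such a significance test, the Python sketch below computes a one-sample t statistic, t = (x̄ − μ0)/(s/√n), for a hypothetical sample of 10 maize plot yields tested against an assumed population mean μ0 = 5.0 t/ha. The yields and μ0 are assumptions for illustration only; the critical value 2.262 is the two-tailed 5% point of the t distribution with 9 degrees of freedom from standard tables.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# Hypothetical yields (t/ha) from 10 plots, tested against H0: mu = 5.0
yields = [5.4, 5.1, 4.8, 5.6, 5.3, 5.0, 5.7, 5.2, 4.9, 5.5]
t, df = one_sample_t(yields, mu0=5.0)
t_crit = 2.262  # two-tailed, alpha = 0.05, df = 9 (from t tables)

print(f"t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```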
E-tivity 6.2.1 Inferential statistics

Numbering, pacing and sequencing 6.2.1
Title Inferential statistics
Purpose The purpose of this e-tivity is to enable you to understand the concept of
inferential statistics
Brief summary of overall task Listen to the following video:
https://www.youtube.com/watch?v=EPWH91UcpZw

Read the following book from the link provided
https://www.acsu.buffalo.edu/~deannaal/Statistics_Textbook.pdf

a) Define what you understand by inferential statistics


b) Define the standard error of the mean

c) Describe the role of the sampling distribution in inferential statistics
Spark

Individual task a) Click on the link provided
b) Download the book
c) Listen to the video
d) Read the book sections
e) Give your answers in discussion forum 6.2.1
Interaction begins 1. Post your answer in the discussion forum 6.2.1
2. Read posted work from 2 colleagues
3. Post at least one comment on their work and give constructive
criticism
E-moderator 1. Summarize the threads
interventions 2. Give feedback
3. Provide teaching points
4. Close the E-tivity
Schedule and time This task will take two hours
Next Sampling distribution

6.2.2 Sampling distribution
The sampling distribution is the distribution of a sample statistic. Like the population
distribution, it is a model of a distribution of scores, except that the scores are not raw scores but statistics.
It represents a thought experiment of what would happen if a person repeatedly took samples
of size N from the population distribution and computed a particular statistic each time. The
resulting distribution of statistics is called the sampling distribution of that statistic. For
example, suppose that a sample of size sixteen (N=16) is taken from some population. The
mean of the sixteen numbers is computed. Next a new sample of sixteen is taken, and the
mean is again computed. If this process were repeated an infinite number of times, the
distribution of the now infinite number of sample means would be called the sampling
distribution of the mean. Every statistic has a sampling distribution. For example, suppose
that instead of the mean, medians were computed for each sample. The infinite number of
medians would be called the sampling distribution of the median.
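The thought experiment can be approximated by simulation. The Python sketch below draws many samples of size N = 16 from a hypothetical normal population (mean 50 and standard deviation 10, both assumptions) and records the mean and the median of each sample; 10,000 repetitions stand in for the "infinite" number in the text. The spread of the simulated sample means should be close to σ/√N = 10/4 = 2.5, and for normal data the medians typically vary somewhat more than the means.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

POP_MEAN, POP_SD = 50, 10  # hypothetical normal population
N = 16                     # sample size, as in the text
REPS = 10_000              # finite stand-in for "infinitely many" samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# The standard deviation of the sampling distribution of the mean is the
# standard error; theory gives POP_SD / sqrt(N) = 2.5 here.
print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"SD of sample means   = {statistics.stdev(means):.2f}")
print(f"SD of sample medians = {statistics.stdev(medians):.2f}")
```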

E-tivity 6.2.2 Sampling distribution


Numbering, pacing and sequencing 6.2.2

Title Sampling distribution


Purpose The purpose of this e-tivity is to enable you to
understand sampling distribution
Brief summary of overall task Read the following PowerPoint slides:
http://www.personal.kent.edu/~mshanker/personal/Classes/f06/ch06_F06.pdf

Listen to the following videos:
https://www.youtube.com/watch?v=EOlNb1XXC_M
https://www.youtube.com/watch?v=IiV6blF1crE

a. Define what you understand by a sampling distribution
b. What do you understand by the central limit theorem?
c. What is a sample mean?

Spark

Individual task

1. Click on the link provided
2. Listen to the videos and read the PowerPoint slides
3. Provide a summary of the two links
4. Give your answers and summarized notes in discussion forum 6.2.2

Interaction begins

a. Post your answer in the discussion forum 6.2.2
b. Read posted work from 2 colleagues
c. Post at least one comment on their work and give constructive criticism

E-moderator interventions

a. Summarize the threads


b. Give feedback
c. Provide teaching points
d. Close the E-tivity

Schedule and time This task will take two hours


Next Construct confidence interval

6.2.3 Construct confidence interval


A confidence interval (CI) is a type of interval estimate of a population parameter that
conveys the reliability of an estimate. It is calculated from the observations in a sample, and if
the experiment were repeated many times, intervals constructed in the same way would
frequently include the parameter of interest. How frequently such intervals contain the
parameter is determined by the confidence level, or confidence coefficient. The term
confidence level is commonly used when confidence intervals are constructed across many
separate data analyses of repeated (and possibly different) experiments: in the long run, the
proportion of such intervals that contain the true value of the parameter will match the
confidence level, provided the assumptions underlying the construction hold. Whereas
two-sided confidence limits form a confidence interval, their one-sided counterparts are
referred to as lower or upper confidence limits.
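For a population mean with known population standard deviation σ, the two-sided interval is x̄ ± z·σ/√n, where z is the standard normal critical value for the chosen confidence level (1.96 for 95%, about 2.576 for 99%); when σ is unknown and the sample is small, a Student t critical value is used instead. The Python sketch below obtains z from the standard library's NormalDist and applies it to hypothetical figures (sample mean 52, σ = 6, n = 36) chosen only for illustration.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical figures: sample mean 52, population SD 6, sample size 36
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(52, 6, 36, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Note that the interval widens as the confidence level rises: more confidence is bought with less precision.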

E-tivity 6.2.3 Constructing confidence intervals
Numbering, pacing and sequencing 6.2.3
Title Constructing confidence intervals
Purpose The purpose of this e-tivity is to enable you to
construct confidence intervals
Brief summary of overall task Listen to the following videos:
https://www.youtube.com/watch?v=DT-fPG0Hff8
https://www.youtube.com/watch?v=UetYS3PaHIo
https://www.youtube.com/watch?v=MUD390jtgQs
After that, answer the following questions:
a. Why and when do you use the Student t distribution or the Z-test?
b. Using the same example, construct a 99% confidence interval for the population mean

Spark

Individual task 1. Click on the links provided
2. Listen to the videos
3. Do the work in your exercise book
4. Scan the work and share it in discussion forum 6.2.3
Interaction begins 1. Post your work in the discussion forum 6.2.3
2. Review your colleagues’ work and provide feedback
3. Post at least one comment on their work and give
constructive criticism
E-moderator interventions a. Summarize the threads and review the sketches
b. Give feedback
c. Provide teaching points
d. Close the E-tivity
Schedule and time This task will take a minimum of 2 hours
Next Lesson 7: experimental design

1.3 Assessment Questions


1. The level of significance of a statistical test indicates

a. How significant the difference between means is

b. The chance we are wrong in rejecting the null hypothesis

c. The chance we are right in accepting the null hypothesis

d. Whether to accept or reject the null hypothesis

2. Which of the following statements regarding a researcher’s use of inferential statistics is
true?

a. Descriptive statistics from a sample are used to estimate the characteristics of the population.

b. It is best to measure every member of a population if possible.

c. A random sample provides a perfect estimate of the population values.

d. We usually need to take several samples to obtain a good estimate of the population values.

3. If you drew all possible samples from some population, calculated the mean for each of
the samples, and constructed a line graph (showing the shape of the distribution) based on
all of those means, what would you have?

a. A population distribution

b. A sample distribution

c. A sampling distribution

d. A parameter distribution

4. What does it mean when you calculate a 95% confidence interval?

a. The process you used will capture the true parameter 95% of the time in the
long run

b. You can be “95% confident” that your interval will include the population
parameter

c. You can be “5% confident” that your interval will not include the population
parameter

d. All of the above statements are true

LESSON 7: EXPERIMENTAL DESIGN
7.1 Introduction
In this lesson you will learn some basics of experimental design and the analysis of
experimental data. Experimental design, also called design of experiments, is a structured and
organized way of conducting and analyzing controlled tests in order to evaluate the factors
that affect a response variable. The design of experiments specifies the particular
combinations of factor settings at which the individual runs of the experiment are to be
conducted. Data obtained from observational studies, or other data not collected in
accordance with a design of experiments approach, can only establish correlation, not
causality. The traditional experimental method of changing one factor at a time also has
problems; for example, it cannot reveal interactions between factors.

7.2 Learning Outcomes


By the end of this lesson, you will be able to:
7.2.1 Describe basic principles of experimental design
7.2.2 Implement an experimental design

7.2.1 Principles of experimental design


The fundamental principles of experimental design are solutions to the problems posed in
experimentation by the two types of nuisance factors, and they serve to improve the
efficiency of experiments. These fundamental principles are:

• Randomization
• Replication
• Local control
• Blocking
• Factorial experiments
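Randomization, replication and blocking can be illustrated together by generating a field layout. The Python sketch below randomly allocates four hypothetical fertilizer treatments (labelled A to D) within each of three blocks, so that every treatment is replicated once per block and the order within each block is random, as in a randomized complete block design. The treatment labels, the number of blocks and the seed are assumptions for illustration.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomized complete block layout: each block contains every
    treatment once, in an independently randomized order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)  # copy so the caller's list is untouched
        rng.shuffle(block)        # randomization within the block
        layout.append(block)
    return layout

# Hypothetical: 4 fertilizer treatments, 3 blocks (replications)
plan = rcbd_layout(["A", "B", "C", "D"], n_blocks=3, seed=42)
for i, block in enumerate(plan, start=1):
    print(f"Block {i}: {' '.join(block)}")
```

Using a fixed seed makes the layout reproducible, which is useful for recording the field plan; a fresh random seed would be used for each new experiment.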

E-tivity 7.2.1 Principles of experimental design

Numbering, pacing and sequencing 7.2.1
Purpose The purpose of this e-tivity is to enable you to understand the principles
of experimental design.
Brief summary of overall task Read these linked articles:
https://lar.msstate.edu/pdf/Basics%20of%20Experimental%20Design.pdf
https://www.biostat.wisc.edu/~kbroman/teaching/misc/humanesci_bw.pdf
https://cemast.illinoisstate.edu/downloads/hsrs/types_of_research.pdf

Watch this video:
https://www.youtube.com/watch?v=k3lUo0XYG3E

a) Why do you need to do experiments?
b) What is experimental design?
c) Why do we design experiments?
d) In not more than 150 words each, briefly explain each of the
principles of experimental design

Spark

Individual task

a) In a paragraph of not more than 150 words each, describe the basic principles of experimental design

Interaction begins a) Post your work to the discussion forum 7.2.1
b) Provide positive and constructive feedback on the class members’
views and ideas. Do this on the discussion forum 7.2.1
E-moderator interventions Your role as a moderator is to:
a) Ensure that learners are focused on the contents and context
of discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This activity should take two hours
Next Implementation of experimental design

7.2.2 Implementing experimental design

Unlike quasi-experimental design, experimental design requires active intervention in the
operation of the programme, which places high demands on project management to ensure
that the experimental design is not violated in the field. These issues are the focus of this
topic.

E-tivity 7.2.2 Implementing experimental design

Numbering, pacing and sequencing 7.2.2
Purpose The purpose of this e-tivity is to enable you to implement your
experiment
Brief summary of overall task Read this article:
https://www.aes.asn.au/images/stories/files/conferences/1999/De%20Boer%20Marc%20LS.pdf
Watch the video at the link below:
https://www.youtube.com/watch?v=8jZfgeV2rQA
After that, answer the following question:

a. Describe, step by step, the process of carrying out an experiment

Spark

Individual task

As an agricultural researcher, you intend to assess the effect of applying a new insecticide
on the population of aphids. Using the skills learned in this topic, provide a step-by-step
procedure for how you would set up an experiment to study the phenomenon of interest.

Interaction begins Post your response in forum 7.2.2
Provide positive and constructive feedback on the class members’
views and ideas. Do this on the discussion forum 7.2.2
E-moderator a) Ensure that learners are focused on the contents and context of
interventions discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This activity should take two hours
Next Lesson 8: ANOVA and ANCOVA

7.3 Assessment Questions
1. Which of the following would improve the reliability of an experiment?

a. Increase the sample size

b. Replicate the experiment

c. Use controlled variables

d. All of the above

2. You are interested in the effect of increased carbon dioxide versus normal air on the
growth of corn plants as well as the effect of green light versus full sunlight on the growth
of corn plants. Your plan is to set up your experiment inside a greenhouse where you can
control the environment. Which of the following is an aspect of the experiment that
should be considered and controlled?

a. An increase in carbon dioxide should not result in a substantial decrease of other necessary gases.

b. All seedlings come from one uniform strain.

c. The intensity or brightness of the green light equals the intensity of the full sunlight.

d. All temperatures and available water remain the same for all plants.

e. All of the choices are important considerations

3. What is the purpose of experimental control?

7.4 References

87
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete idiot's
guide).Indianapolis, IN: Alpha.

Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.). New York: Wiley.

Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal science.
Wallingford: CABI Publishing.

Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-Hill.
McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )

Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts: Addison-
Wesley Publishing Company

88
LESSON 8: ANALYSIS OF VARIANCE AND COVARIANCE

8.1 Introduction
Analysis of Variance (ANOVA) is a statistical method that is used to test differences
between two or more means. It could literally be called "Analysis of Means" rather
than "Analysis of Variance", but the name is correct because inferences about means
are made by considering variance.

ANOVA is used to test general differences among means. It can be extended to include
one or more continuous variables that predict the outcome (or dependent variable).
Continuous variables such as these, which are not part of the main experimental
manipulation but have an influence on the dependent variable, are known as
covariates, and they can be included in an ANOVA analysis.
8.2 Lesson Learning Outcomes
By the end of the lesson, you will be able to:
8.2.1 Explain the theory of ANOVA
8.2.2 Conduct analysis of covariance and interpret the results

8.2.1 Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method used to test differences between
two or more means. When applying one-way analysis of variance, there are three key
assumptions that should be satisfied:

i. The observations are obtained independently and randomly from the populations defined
by the factor levels.
ii. The population at each factor level is (approximately) normally distributed.
iii. These normal populations have a common variance, σ².

E-tivity 8.2.1 Analysis of Variance

Numbering, pacing and sequencing 8.2.1 Analysis of Variance (ANOVA)
Purpose The purpose of this e-tivity is to enable you to understand analysis of
variance (ANOVA)
Brief summary of overall task Read this linked article:
http://oak.ucc.nau.edu/rh232/courses/EPS525/Handouts/Understanding%20the%20One-way%20ANOVA.pdf
After reading, answer the following questions:
1. Explain the reason for not using multiple t-tests to compare several means
2. Explain the variables used in ANOVA
3. Describe how to formulate the hypotheses in one-way ANOVA
4. Discuss the assumptions of one-way ANOVA

Spark

Individual task Post your responses in forum 8.2.1

Interaction begins a) Post your ANOVA.


b) Do this on the discussion forum 8.2.1
E-moderator a) Ensure that learners are focused on the contents and context of
interventions discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This task should take two hours
Next Conducting ANOVA and interpreting results

8.2.2 Analysis of Variance ANOVA


Analysis of Variance (ANOVA) is a statistical method used to test differences between two
or more means. It may seem odd that the technique is called "Analysis of Variance" rather
than "Analysis of Means." As you will see, the name is appropriate because inferences about
means are made by analyzing variance. ANOVA is used to test general rather than specific
differences among means.
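The idea of analysing variance can be made concrete by computing the one-way ANOVA table by hand. The total sum of squares is split into a between-treatments part, SSB = Σ nᵢ(ȳᵢ − ȳ)², and a within-treatments part, SSW = ΣΣ (yᵢⱼ − ȳᵢ)²; each is divided by its degrees of freedom (k − 1 and N − k) to give mean squares, and F = MSB / MSW. The Python sketch below applies this to three small groups of hypothetical plant heights, invented for illustration.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    all_values = [y for g in groups for y in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-treatment sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares: deviations from group means.
    ssw = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "df_between": df_b, "df_within": df_w, "df_total": n_total - 1,
            "MSB": msb, "MSW": msw, "F": msb / msw}

# Hypothetical plant heights (cm) under three treatments, 4 plots each
groups = [[20, 22, 19, 23], [25, 27, 26, 24], [21, 20, 23, 22]]
table = one_way_anova(groups)
for key, value in table.items():
    print(f"{key:>10}: {round(value, 3)}")
```

For these data F = 10.95 on 2 and 9 degrees of freedom, which exceeds the 5% value of about 4.26 from F tables, so the hypothetical treatment means would be judged different. Note also that df_total = df_between + df_within.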

E-tivity 8.2.2 Analysis of Variance (ANOVA)

Numbering, pacing and sequencing 8.2.2
Purpose The purpose of this e-tivity is to enable you to carry out
ANOVA
Brief summary of overall task Read this document:
https://sites.calvin.edu/scofield/courses/m143/materials/handouts/anova1And2.pdf
Watch these videos:
https://www.youtube.com/watch?v=q48uKU_KWas
https://www.youtube.com/watch?v=nmHFFFpOVZs
Spark

Individual task a. Using the data from the documents and video try conducting
ANOVA in Excel on your own
b. Interpret the results
Interaction begins a) Post your responses in forum 8.2.2

E-moderator interventions a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This task should take two hours
Next ANCOVA

8.2.3 Analysis of Covariance ANCOVA


Sometimes a variable that is not of interest to the researcher may affect the outcome of an
experiment. For example, a researcher is interested in the effect of feeding three different
types of forage to cattle. The response (dependent) variable is weight gain, and the treatment
variable is the type of forage. The researcher is not interested in the initial body weight of
the individual animals used in the experiment, but it is noticed that initial body weight has
an effect on weight gain. Analysis of covariance (ANCOVA) is then used to check whether the
regression of weight gain on initial body weight is the same for each type of forage. ANCOVA
will tell you whether the regression lines differ from each other in either slope or intercept.
E-tivity 8.2.3 ANCOVA

Numbering, pacing and sequencing 8.2.3
Purpose The purpose of this e-tivity is to enable you to understand
ANCOVA
Brief summary of overall task Read this book:
https://online.stat.psu.edu/stat502/book/export/html/813
Then view this slide presentation:
https://slideplayer.com/slide/9827815/

Then answer the following questions:
a) What is ANCOVA
b) Highlight the uses of ANCOVA
c) Describe the steps of ANCOVA
Spark

Individual task a. Using the data from the materials provided, try conducting ANCOVA in Excel
on your own
b. Interpret the results
Interaction begins Post your responses in forum 8.2.3
E-moderator a) Ensure that learners are focused on the contents and context of
interventions discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This task should take two hours

8.3 Assessment Questions

1. The ANOVA procedure is a statistical approach for determining whether or not
a) The means of two samples are equal
b) The means of two or more samples are equal
c) The means of more than two samples are equal
d) The means of two or more populations are equal
2. The null hypothesis for an ANOVA states that __________.
a) There are no differences between any of the population means
b) At least one of the population means is different from the others
c) All of the population means are different from each other
d) None of the other 3 choices is correct.
3. In an ANOVA, which of the following is most likely to produce a large value for the F-
ratio?
a) Large mean differences and small sample variances
b) Large mean differences and large sample variances
c) Small mean differences and small sample variances
d) Small mean differences and large sample variances
4. An analysis of variance is used to evaluate the mean differences for a research study
comparing four treatments with a separate sample of n = 5 in each treatment. If the data
produce an F-ratio of F = 3.15, then which of the following is the correct statistical
decision?
a) Reject the null hypothesis with α = .05 but not with α = .01.
b) Reject the null hypothesis with either α = .05 or α = .01.
c) Fail to reject the null hypothesis with either α = .05 or α = .01.
d) There is not enough information to make a statistical decision.
5. An analysis of variance produces df between treatments = 2 and df within treatments = 24. For this
analysis, what is df total?
a) 26
b) 27

c) 28
d) Cannot be determined without additional information
6. An undergraduate student in the School of Agriculture and Enterprise Development studied
the effect of fertilizer rate on plant height. Originally, 35 experimental units were selected
for their uniformity and randomly assigned to five fertilizer rates (0, 10, 20, 30, 40 kg/ha),
seven units per fertilizer rate. A problem in the field resulted in the loss of seven
measurements. The following are the results for the remaining cases.

Fertilizer rate
0 10 20 30 40
24 31 30 26 30
18 27 28 21 32
25 29 27 23 29
23 25 25 25 25
22 30 20 31
26 24 29
20

i) Carry out an ANOVA on the data
ii) Complete the ANOVA table

LESSON 9: CORRELATION AND REGRESSION ANALYSIS

1.5 Introduction
Welcome to the ninth lesson. As an agriculture expert, you may have many questions running
through your mind that need to be answered. Some of them could be: Is the amount of milk
produced by a cow related to the weight of her calf at weaning? Is the level of feeding of
broiler chickens related to their weight at 8 weeks, when they should be slaughtered for
marketing? By the end of this lesson, you will be able to quantify your answers to questions
of this type based on the data you have gathered.

Correlation and regression are other areas of inferential statistics; they involve determining
whether a relationship exists between two or more numerical (quantitative) variables. Two
characteristics are studied simultaneously on each member of a population in order to
examine whether they are related. For instance, a researcher may be interested in the
relationship between the weight and age of broiler chickens, or between the lactation length
of cows and the weaning weight of their calves.

Therefore, correlation and regression analyses are used to measure the association between
two variables in bivariate data.

1.6 Learning Outcomes


By the end of the lesson, you will be able to:
9.2.1 Explain how you can express the relationship between variables statistically
by looking at two measures: covariance and the correlation coefficient.
9.2.2 Explain how regression can be used to predict one variable from another.

1.6.1 Correlation
You may have wondered what the relationship is between the height of an egg and its weight.
Take 10 eggs and measure the height and weight of each. Does there appear to be a
connection between the height and weight of the eggs?
Correlation is a statistical measure that indicates the extent to which two or more variables
drawn from the same population fluctuate together. A correlation coefficient calculated from
sample data measures the strength and direction of a linear relationship between two
variables. A positive correlation indicates the extent to which the variables increase or
decrease together; a negative correlation indicates the extent to which one variable increases
as the other decreases. The correlation coefficient calculated from sample data is denoted r,
while the population correlation coefficient is denoted ρ (rho). The relationship between two
variables is rarely perfect: r ranges from −1 (perfect negative relationship) to +1 (perfect
positive relationship), with values near 0 indicating little or no linear relationship.
This lesson looks first at how we can express the relationships between variables statistically
by looking at two measures: covariance and the correlation coefficient.
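Both measures can be computed directly from their definitions. The sample covariance is cov(x, y) = Σ(x − x̄)(y − ȳ)/(n − 1), and Pearson's correlation coefficient rescales it by the two sample standard deviations, r = cov(x, y)/(sₓ s_y). The Python sketch below applies these formulas to hypothetical egg measurements (height in mm, weight in g) of the kind suggested in the egg exercise; the numbers are invented, not real data.

```python
import math

def covariance(x, y):
    """Sample covariance with an n - 1 divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    sx = math.sqrt(covariance(x, x))  # cov(x, x) is the sample variance
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Hypothetical measurements on 10 eggs (illustration only)
height_mm = [55, 57, 54, 59, 56, 58, 53, 60, 57, 55]
weight_g = [56, 60, 55, 63, 58, 61, 53, 64, 59, 57]

print(f"cov = {covariance(height_mm, weight_g):.3f}")
print(f"r   = {pearson_r(height_mm, weight_g):.3f}")
```

The covariance depends on the units of measurement (here mm × g), whereas r is unit-free, which is why the correlation coefficient is the usual way to report the strength of a linear relationship.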
E-tivity 9.2.1 Correlation
Numbering, pacing and sequencing 9.2.1
Title Correlation
Purpose The purpose of this e-tivity is to enable you to understand correlation
Brief summary of overall task Read and watch the materials from these links:
https://www.simplypsychology.org/correlation.html
http://educ.jmu.edu/~drakepp/FIN360/readings/Regression_notes.pdf

Afterwards, respond to the following questions:

a. Define correlation
b. Give four uses of correlation
c. Differentiate between correlation and causation
d. Discuss the strengths and weaknesses of correlation

Spark

Individual task 1. Click on the links provided
2. Watch the video and read the materials
3. Give your answers in discussion forum 9.2.1
Interaction begins 1 Post your answer in the discussion forum 9.2.1
2 Read posted work from 2 colleagues
3 Post at least one comment on their work and give constructive
criticism
E-moderator 1. Summarize the threads
interventions 2. Give feedback
3. Provide teaching points

4. Close the E-tivity
Schedule and time This task will take two hours
Next Regression

1.6.2 Regression
In the previous section we looked at correlation, i.e. how to measure the relationship between
two variables. Although correlation is very useful, it does not allow us to predict one
variable from another. Regression analysis goes further: it describes how one variable changes
with another, so that the value of one can be predicted from the other. A simple example might
be to try to predict levels of maize output from the amount of fertilizer applied. You would
expect this to be a positive relationship (the higher the amount of fertilizer, the higher the
output). We could then extend this basic relationship to answer a question such as ‘if a
farmer applied 25 kg/ha, how much maize would the farmer harvest?’ The essence of regression is
therefore to fit a model to our data and use it to predict values of the dependent variable
(DV) from one or more independent variables (IVs). Predicting an outcome variable from a single
predictor variable is called simple regression; predicting it from several predictor variables
is called multiple regression. This tool allows us to go a step beyond the data that we
collected.
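A minimal least squares sketch in Python of the maize and fertilizer example. The data values, the units (bags/ha) and the function name fit_simple_regression are hypothetical; the slope b = Sxy/Sxx and intercept a = ȳ - b·x̄ are the standard least squares estimates for simple regression.

```python
def fit_simple_regression(x, y):
    """Least squares estimates for the line y_hat = a + b * x.

    b = Sxy / Sxx, where Sxy = sum((x - x_bar) * (y - y_bar))
    and Sxx = sum((x - x_bar) ** 2); a = y_bar - b * x_bar.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: fertilizer applied (kg/ha) and maize output (bags/ha)
fertilizer = [0, 10, 20, 30, 40]
maize = [10, 14, 19, 21, 26]

a, b = fit_simple_regression(fertilizer, maize)
predicted = a + b * 25  # predicted output if the farmer applies 25 kg/ha
print(f"y_hat = {a:.2f} + {b:.3f}x; prediction at 25 kg/ha: {predicted:.2f} bags/ha")
```

With these made-up figures the fitted line is ŷ = 10.2 + 0.39x, so the predicted output at 25 kg/ha is about 19.95 bags/ha. Note that the prediction is only trustworthy within the range of fertilizer rates actually observed.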

E-tivity 9.2.2 Regression

Numbering, pacing and sequencing: 9.2.2
Title: Regression
Purpose: The purpose of this e-tivity is to enable you to understand and explain complex
measures of relationships as a precursor to conducting multiple regression.
Brief summary of overall task: Read the following documents:
https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/
http://pba.ucdavis.edu/files/45007.pdf
After reading, watch the following video
https://www.youtube.com/watch?v=TU2t1HDwVuA
and answer the following questions:
Question 1.

a. Define regression.
b. Distinguish between the different types of regression.
c. In one paragraph of not more than 150 words, explain the limitations of regression
analysis.

Question 2.
Watch the video at https://www.youtube.com/watch?v=owI7zxCqNY0
and answer the following questions:

1. In a brief paragraph, explain the objectives of regression.
2. Explain how to interpret regression results.
Spark

Individual task:
1. Click on the links provided.
2. Watch the videos and read the materials.
3. Give your answers in discussion forum 9.2.2.
4. Scan your assignment and post it in discussion forum 9.2.2.
Interaction begins:
1. Post your answer in discussion forum 9.2.2.
2. Read the work posted by two colleagues.
3. Post at least one comment on their work and give constructive criticism.
E-moderator interventions:
1. Summarize the threads.
2. Give feedback.
3. Provide teaching points.
4. Close the E-tivity.

Schedule and time: This task will take a minimum of two hours.
Next: Lesson 10: Data presentation

1.7 Assessment Questions
In the following multiple-choice questions, select the best answer.

MULTIPLE CHOICE QUESTIONS


1 The correlation coefficient is used to determine:


a. A specific value of the x-variable given a specific value of the y-variable
b. A specific value of the y-variable given a specific value of the x-variable
c. The strength of the relationship between the x and y variables
d. None of these

2 If there is a very strong correlation between two variables then the correlation
coefficient must be
a. much larger than 0, regardless of whether the correlation is negative or
positive
b. any value larger than 1
c. much smaller than 0, if the correlation is negative
d. None of these alternatives is correct.

3 In regression, the equation that describes how the response variable (y) is related to the
explanatory variable (x) is:
a. the regression model
b. the correlation model
c. used to compute the correlation coefficient
d. None of these alternatives is correct.

4 The relationship between the number of beers consumed (x) and blood alcohol content (y) was
studied in 16 male college students using least squares regression. The following regression
equation was obtained from this study:
ŷ = -0.0127 + 0.0180x
The above equation implies that:
a. each beer consumed increases blood alcohol by an average amount of 1.8%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by 1.27%
d. each beer consumed increases blood alcohol by exactly 0.018

5 SSE can never be


a. smaller than SST
b. larger than SST
c. equal to zero
d. equal to 1

6 Regression modeling is a statistical framework for developing a mathematical equation that
describes how
a. one response and one or more explanatory variables are related
b. one explanatory and one or more response variables are related
c. several explanatory and several response variables are related
d. All of these are correct.

7 In regression analysis, the variable that is being predicted is the


a. intervening variable
b. independent variable
c. response, or dependent, variable
d. is usually x

8 Regression analysis was applied to return rates of sparrowhawk colonies. It was used to study
the relationship between return rate (x: % of birds that return to the colony in a given year)
and immigration rate (y: % of new adults that join the colony per year). The following
regression equation was obtained:
ŷ = 31.9 - 0.34x
Based on the above estimated regression equation, if the return rate were to
decrease by 10% the rate of immigration to the colony would:
a. increase by 34%
b. decrease by 3.4%
c. decrease by 0.34%
d. increase by 3.4%

9 In least squares regression, which of the following is not a required assumption about the
error term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.

10 Larger values of r² (R²) imply that the observations are more closely grouped
about the
a. least squares line
b. average value of the independent variables
c. origin
d. average value of the dependent variable

1.8 REFERENCES
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete idiot's
guide).
Indianapolis, IN: Alpha.
Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.).
New York: Wiley.
Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal
science.
Wallingford: CABI Publishing.
Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-
Hill. McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )
Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts: Addison-
Wesley Publishing Company
http://www1.appstate.edu/~mcraelt/simpreg1.pdf

106
LESSON 10: DATA PRESENTATION
1.1. Introduction
We are now in the final lesson of Statistics for Agriculture. In this lesson, we will learn how
to present data. Once you have conducted your research, it is essential that you present your
data in a way that communicates effectively with your audience. According to Tufte (2001), a
good method of presenting data should:
i. Show the data clearly.
ii. Induce the reader to think about the data being presented (rather than some other
aspect of the presentation, such as the colour of a graph).
iii. Avoid distorting the data.
iv. Present many numbers with minimum ink.
v. Make large data sets (assuming you have one) coherent.
vi. Encourage the reader to compare different pieces of data.
vii. Reveal the data.
Data can be presented as text, in tables, or pictorially as graphs and charts.

1.2. Learning Outcomes


By the end of the lesson, you will be able to:
10.2.1 Explain the presentation of data in tables

10.2.2 Explain the presentation of data in charts and graphs

1.2.1. Presenting Data in Tables


When data are presented in tables, the tables should be self-explanatory: each table should have
a title that clearly indicates what it shows, and its columns and rows should be clearly
labelled. Tables should include only essential data and should use relatively few significant
digits; too many decimal places make the data less clear. The orientation of the table
(portrait or landscape) should also be chosen so that the table is easy to read.
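The guidelines above (a clear title, labelled columns and few decimal places) can be illustrated with a short Python sketch that prints a plain-text table. The helper name format_table and the yield figures are hypothetical and serve only to show the layout.

```python
def format_table(title, headers, rows, decimals=1):
    """Return a plain-text table with a title line, a header row and data rows.

    Floats are rounded to `decimals` places so the table shows few
    significant digits; other values are printed unchanged.
    """
    def cell(value):
        if isinstance(value, float):
            return f"{value:.{decimals}f}"
        return str(value)

    text_rows = [[str(h) for h in headers]] + [[cell(v) for v in row] for row in rows]
    # Width of each column = longest entry in that column (header included)
    widths = [max(len(r[i]) for r in text_rows) for i in range(len(headers))]
    lines = [title]
    for r in text_rows:
        lines.append("  ".join(c.rjust(w) for c, w in zip(r, widths)))
    return "\n".join(lines)


# Hypothetical results, used only to illustrate the layout
table = format_table(
    "Table 1: Mean maize yield by fertilizer rate (hypothetical data)",
    ["Fertilizer rate (kg/ha)", "Mean yield (t/ha)"],
    [[0, 1.23456], [25, 2.71828], [50, 3.14159]],
)
print(table)
```

Rounding to one decimal place turns values such as 2.71828 into 2.7, which keeps the comparison between rows easy to read.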

E-tivity 10.2.1 – Presenting data in tables

Numbering, pacing and sequencing: 10.2.1
Title: Presenting data in tables
Purpose: The purpose of this e-tivity is to enable you to understand how and when to use tables
in data presentation.
Brief summary of overall task: Review the following video
https://www.youtube.com/watch?v=Xr0BgvtXWwA
and read the notes at
https://byjus.com/commerce/tabular-presentation-of-data/
Then answer the following:

i. Describe the main parts of a table.
ii. In a short paragraph, explain the objectives of tabulation.
iii. Using five bullet points, give five limitations of tabulation.

Spark

Individual task:
1. Click on the links provided.
2. Download the video and the notes.
3. Watch the video.
4. Review the material.
5. Provide summaries of the video clip.
6. Give your answers and summarized notes in discussion forum 10.2.1.
Interaction begins:
1. Post your answer in discussion forum 10.2.1.
2. Read the work posted by two colleagues.
3. Post at least one comment on their work and give constructive criticism.
E-moderator interventions:
1. Summarize the threads.
2. Give feedback.
3. Provide teaching points.
4. Close the E-tivity.
Schedule and time: This task will take two hours.
Next: Presenting data in charts and graphs

1.2.2. Presenting data in charts and graphs


Data can also be presented using graphs, and the type of graph to use depends on the type of
data being presented. Bar charts give a clear display of simple results. They are used when the
horizontal axis is composed of categories (e.g. male / female; those attending study support
and those who do not; ethnic groups; individual pupils). Impact studies very often compare
categories, which is why bar charts are seen most often. A stacked bar chart can be used when
the emphasis is on the total while still showing the separate components that make it up. A
histogram looks similar to a bar chart, but its bars are not separated by spaces because the
horizontal axis shows class intervals of a continuous variable rather than separate categories.
Line graphs are appropriate for showing trends, when the horizontal axis is continuous (for
example, time) rather than categorical; in impact studies, they could be used to show progress
over time. Pie charts, on the other hand, show proportions of a whole, for instance the
percentage of survey responses given by each gender.
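In a pie chart, each slice's angle is the category's proportion of the total multiplied by 360 degrees. A small Python sketch of that arithmetic, using hypothetical survey counts and a hypothetical helper name pie_chart_shares:

```python
def pie_chart_shares(counts):
    """Convert category counts into (percentage, slice angle in degrees).

    Each slice's angle is its proportion of the total multiplied by 360.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not all be zero")
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Hypothetical survey: number of respondents by gender
responses = {"Male": 18, "Female": 12}
for category, (percent, angle) in pie_chart_shares(responses).items():
    print(f"{category}: {percent:.1f}% of responses, slice angle {angle:.1f} degrees")
```

With 18 male and 12 female respondents, the shares are 60% (216 degrees) and 40% (144 degrees); the angles always add up to 360 degrees.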

E-tivity 10.2.2 Presenting data in charts and graphs

Numbering, pacing and sequencing: 10.2.2
Title: Presenting data in charts and graphs
Purpose: The purpose of this e-tivity is to enable you to understand how and when to use charts
and graphs in data presentation.
Brief summary of overall task: Review the following videos to understand when and how to use
charts and graphs:
https://youtu.be/HvgwXn7EEz4
https://youtu.be/aUk4npRmjL8
Then read the following material:
https://byjus.com/maths/graphical-representation/
a) Identify the different types of graphical representation.
b) Outline the general rules of graphical representation.
Spark

Individual task:
1. Click on the links provided.
2. Download the videos and notes.
3. Review them.
4. Outline the general rules of graphical representation.
5. Give your answers and summarized notes in discussion forum 10.2.2.

Interaction begins:
1. Post your answer in discussion forum 10.2.2.
2. Read the work posted by two colleagues.
3. Post at least one comment on their work and give constructive criticism.
E-moderator interventions:
1. Summarize the threads.
2. Give feedback.
3. Provide teaching points.
4. Close the E-tivity.

Schedule and time: This task will take two hours.

1.3. Assessment Questions


1. ___________ explains the specific feature of the table which is not self-explanatory

a. Footnote

b. Source note

c. Body of table

d. Caption

2. At the top of each column in a table a column designation is provided to explain figures
of the column which is known as ___________.

a. Stub

b. Caption

c. Head note

d. Title

3. The ___________ part of a table gives information about the units used to represent the data.

a. Stub

b. Caption

c. Head note

d. Title

4. Using the data provided in Appendix 1, compute the mean of each variable disaggregated by the
gender recorded for each household. Present your results using the most appropriate method
among those covered in this lesson.

APPENDIX 1: DATA FOR EXERCISES


FarmID | Household size (AEQs) | Gender | Land allocated to crop production (ha) | Asset value (Ksh) | Fertilizer cost | Value of crop production (Ksh)
1 | 1.78 | Male | 0.56 | 63860 | 3780 | 35738
2 | 4.16 | Female | 0.66 | 344550 | 15198 | 691503
3 | 3.88 | Male | 1.07 | 524980 | 12800 | 126132
4 | 5.3 | Male | 1.07 | 263800 | 11980 | 147020
5 | 4 | Male | 0.77 | 259500 | 4610 | 69500
6 | 2 | Female | 1.43 | 171000 | 0 | 17948
7 | 5.16 | Female | 0.97 | 67800 | 890 | 30055
8 | 2.89 | Male | 0.61 | 55200 | 0 | 67779
9 | 6.33 | Male | 0.97 | 747400 | 300 | 19704
10 | 2.78 | Female | 0.9 | 1459300 | 11035 | 215265
11 | 3.89 | Male | 1.28 | 475950 | 21950 | 172060
12 | 5.02 | Male | 0.92 | 277800 | 13200 | 112016
13 | 4 | Female | 2.65 | 737500 | 34900 | 54560
14 | 2.78 | Male | 0.82 | 111600 | 6250 | 81138
15 | 4.78 | Male | 1.33 | 1298400 | 18540 | 328156
16 | 2.89 | Male | 0.92 | 893450 | 10760 | 69000
17 | 2.6 | Female | 0.87 | 354320 | 4940 | 111682
18 | 1 | Female | 0.61 | 137915 | 150 | 45085
19 | 4.65 | Male | 0.36 | 107650 | 975 | 23702
20 | 1.78 | Female | 0.46 | 293550 | 450 | 12052
21 | 4.06 | Male | 1.33 | 218500 | 18175 | 244373
22 | 7.23 | Female | 1.33 | 2410000 | 15900 | 145600
23 | 4.34 | Female | 1.48 | 1630900 | 12175 | 348680
24 | 6.3 | Male | 1.25 | 893400 | 19775 | 238435
25 | 7.1 | Male | 1.43 | 170100 | 32800 | 143603
26 | 4 | Female | 1.12 | 264300 | 7000 | 30000
27 | 4.96 | Female | 0.31 | 135530 | 1050 | 19592
28 | 5.8 | Female | 1.7 | 950200 | 4703 | 84365
29 | 3.77 | Female | 0.51 | 732900 | 6250 | 106735
30 | 7.18 | Female | 1.17 | 200800 | 6450 | 111518


Production (litres) | Midpoint X | X - x̄ | (X - x̄)² | Number of days in the month (f) | Cumulative f | fX
21-25 | 23 | -20.5 | 420.25 | 1 | 1 | 23
26-30 | 28 | -15.5 | 240.25 | 2 | 3 | 56
31-35 | 33 | -10.5 | 110.25 | 3 | 6 | 99
36-40 | 38 | -5.5 | 30.25 | 4 | 10 | 152
41-45 | 43 | -0.5 | 0.25 | 6 | 16 | 258
46-50 | 48 | 4.5 | 20.25 | 7 | 23 | 336
51-55 | 53 | 9.5 | 90.25 | 5 | 28 | 265
56-60 | 58 | 14.5 | 210.25 | 2 | 30 | 116
Total | | | 1,122 | Σf = 30 | | 1,305
Required:
(i) Compute the mean milk production in the month. (7 Marks)
43.5
(ii) Compute the median milk production in the month. (5 Marks)

(iii) What is the modal milk production? (3 Marks)

(iv) Compute the variance in milk production. (6 Marks)

78.9 (each squared deviation is weighted by its frequency: Σf(X - x̄)² = 2,367.5, and
2,367.5/30 ≈ 78.9)

(v) Compute the standard deviation in milk production. (6 Marks)

8.9

(vi) Range in milk production for the month (3 Marks).
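A minimal Python sketch of the grouped-data calculations for the milk production table, using class midpoints and the frequencies (number of days) from the table. Each squared deviation is weighted by its frequency f, and the variance divides by Σf (the population formula); dividing by Σf - 1 instead would give the sample variance. The function name grouped_stats is hypothetical.

```python
from math import sqrt

# Class limits (litres) and frequencies (number of days) from the milk production table
classes = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50), (51, 55), (56, 60)]
freqs = [1, 2, 3, 4, 6, 7, 5, 2]


def grouped_stats(classes, freqs):
    """Mean, variance and standard deviation of grouped data.

    Uses class midpoints; the variance divides the frequency-weighted sum
    of squared deviations by the total frequency (population formula).
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    return mean, variance, sqrt(variance)


mean, variance, sd = grouped_stats(classes, freqs)
print(f"mean = {mean:.1f}, variance = {variance:.1f}, SD = {sd:.1f}")
```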
