WRITTEN BY:
VETTED BY:
INTRODUCTION
Welcome to this module. Statistics is not a new term; you probably interact with it in one way or another in your day-to-day life, because it applies to almost all human endeavours. Statistics is a familiar and accepted part of the modern world that is concerned with obtaining insight into the real world by analysing numerical relationships.
Statistics for Agriculture is a first-semester course that is compulsory for all students in the School of Agriculture. It covers the analysis of numerical relationships, with emphasis on the meaning of statistics, the collection of quantitative information, methods of handling such data, and drawing inferences on the basis of observation. We will also discuss: measures of central tendency and dispersion; probability (permutations, combinations, frequency distributions, independent events and conditional probability distributions); random variables; samples and sampling distributions; sampling theory and the estimation of population means and variances; confidence intervals for samples; analysis of variance and covariance; experimental designs; correlation and linear regression; forecasting; and the presentation of qualitative and quantitative data.
This is an interactive instructional module that uses both action and collaborative learning styles to provide you with diverse online learning experiences and effective learning processes. The course units are divided into lessons. The course guide tells you briefly what the course is about. It gives a general overview of the course materials you will be using and how to use them. It also helps you to allocate the appropriate time to each unit so that you can successfully complete the course within the stipulated time. This module will enable you to learn more about data collection, management and analysis; this knowledge will be helpful when you collect and analyse data for your project. It is indeed a very fascinating field of agriculture.
STATISTICS FOR AGRICULTURE FLOW CHART
LESSONS TOPIC
The aim of the present lesson is to enable the students to understand the meaning, definition,
nature, importance and limitations of statistics
LESSON 5: Statistical inferences and hypothesis testing
This topic will address statistical inference, an important concept in inferential statistics. Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample. When making such an inference, it is important to set and test hypotheses.
This topic will address sampling distributions, an important concept for inferential statistics. Recall that inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample.
In this lesson you will learn how to design experiments. The techniques will help you to apply the concept of experimental design successfully in diverse fields of agriculture. You will learn that, when designing and implementing experiments, it is important to keep these principles in mind: randomization, replication, blocking, orthogonality and factorial experimentation.
Analysis of variance (ANOVA) is a statistical method used to test for differences between two or more means. Analysis of covariance (ANCOVA) evaluates whether the population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV). ANOVA and ANCOVA are important tools that will help you to check whether two or more groups differ in terms of a particular variable of interest.
This lesson introduces you to correlation and simple linear regression. These techniques can be used to examine the presence of a linear relationship between two variables, provided certain assumptions about the data are met. The results need to be interpreted with care, particularly when looking for a causal relationship or when using the regression equation for prediction.
In this lesson we will learn how to present data and results. The techniques will help you to present data successfully in diverse fields of agriculture. Recall from the lesson on experimental design that, when designing and implementing experiments, it is important to keep these principles in mind: randomization, replication, blocking, orthogonality and factorial experimentation.
1 Calculate measures of central tendency and dispersion (e.g., mean and standard deviation) and use these measures to understand important features of a dataset.
2 Compute and interpret probabilities, and find probabilities for both discrete and continuous random variables.
3 Apply the concept of a sampling distribution and calculate the mean and standard deviation of the sampling distribution of the mean.
6 Estimate a linear relationship between two variables and use it to predict the trend.
COURSE DESCRIPTION
As a student, or in your professional career, you will encounter situations that will
demand that you conduct some research, which is concerned with answering an
interesting question. The process begins with an observation that you want to
understand, and this observation could be anecdotal or could be based on some data.
From your initial observation you generate explanations, or theories, of those
observations, from which you can make predictions (hypotheses). Here’s where the
data comes into the process because to test your predictions you need data. First you
collect some relevant data (and to do that you need to identify aspects that can be
measured, known as variables) and then you analyze those data. The analysis of the data may
support your theory or give you cause to modify the theory. As such, the processes of
data collection and analysis and generating theories are intrinsically linked: theories
lead to data collection/analysis and data collection/analysis informs theories.
COURSE REQUIREMENTS
This is a blended learning course that uses the flex model. This means that learning materials and instructions will be given online and the lessons will be self-guided, with the lecturer available briefly for face-to-face sessions and support, and on-site (online) most of the time. Your lecturer will meet you face to face to introduce a lesson and put it into perspective, and you will actively participate in your search for knowledge by undertaking several online activities. Some of the 39 instructional hours of the course will therefore be delivered face to face, while other lessons will be taught online through various learner and lecturer activities. Note that one instructional hour is equivalent to two online hours. Three instructional hours will be needed per week. Of these, one will be used for face-to-face contact with your lecturer (also referred to as the e-moderator in the online activities), while the other two instructional hours (translating to four online hours) will be used for online activities, referred to as e-tivities in the lessons. This adds up to the requirement of 5 hours per lesson mentioned earlier. There are 27 online activities, each taking at least two hours, totalling 54 online hours. You are advised to follow the topic flow chart so that you cover at least one lesson every week.
You will be required to participate and interact online with your peers and the e-moderator, who in this case is your lecturer. Guidelines for the online activities (e-tivities) will be provided whenever there is an e-tivity. Please note that, since the e-tivities are part of the learning process, they may be graded at the discretion of your e-moderator. Any such grading will be communicated in the e-tivity guidelines, and feedback will be given as soon as possible after the e-tivity. The e-tivities will include, but will not be limited to, online assessment quizzes, assignments and discussions. There are also assessment questions that you can attempt at the end of every lesson to test your understanding of the lesson. The answers to all the assessment questions are at the end of the module, after Lesson 10. All the books used as resources in this module are listed under the resources section after the answers to the questions.
ASSESSMENT
The module has embedded formative assessment and feedback tools that will enable you to gauge your own learning progress. The tools include online collaborative discussion forums that focus on team learning and personal mastery, and will therefore provide you with peer feedback, lecturer assessment and self-reflection. You will also be required to do one major assignment or project that is meant to assess how you apply the skills and knowledge gained during the course. The project score, in combination with the scores for e-tivities (where graded), will account for 30% of your final score, with the remaining 70% coming from a face-to-face, sit-in final written examination guided by your university's examination policy and procedures. The final and mid-term examinations will be closed book, with no notes allowed, while assignments will be open book. The unit lecturer will grade the final examination and determine your grade at the end of the course. Collaboration is allowed in completing the assignments, and you are encouraged to learn from each other. Late assignments will not be accepted unless extreme circumstances can be demonstrated. Marks will be graded according to the University criteria listed below:
i. 70-100 = A
ii. 60-69 = B
iii. 50-59 = C
iv. 40-49 = D
v. <40 = F
TABLE OF CONTENTS
1.2 Learning Outcomes
4.2.3 Probability Distribution of a Continuous Random Variable
4.2.4 Normal Probability Distribution
4.3 Assessment Questions
4.4 References
LESSON 5: STATISTICAL INFERENCE AND HYPOTHESIS TESTING
5.1 Introduction
5.2 Learning Outcomes
5.2.1 Hypothesis testing
5.2.2 One-tail versus two-tail test
5.2.3 Chi-square testing
5.2.4 Goodness-of-fit
5.3 Assessment Questions
5.4 References
LESSON 6: SAMPLES AND SAMPLING DISTRIBUTION
6.1 Introduction
6.2 Learning Outcomes
6.2.1 Inferential statistics
6.2.2 Sampling distribution
6.2.3 Construct confidence interval
6.2.3 E-tivity Constructing confidence intervals
6.3 Assessment Questions
6.4 References
LESSON 7: EXPERIMENTAL DESIGN
7.1 Introduction
7.2 Learning Outcomes
7.2.1 Principles of experimental design
7.2.2 Implementing experimental design
7.3 Assessment Questions
7.4 References
LESSON 8: ANALYSIS OF VARIANCE AND COVARIANCE
8.1 Introduction
8.2 Lesson Learning Outcomes
8.2.1 Analysis of Variance (ANOVA)
8.2.2 Analysis of Variance ANOVA
8.2.3 Analysis of Covariance ANCOVA
8.3 Assessment Questions
8.4 References
LESSON 9: CORRELATION AND REGRESSION ANALYSIS
9.1 Introduction
9.2 Learning Outcomes
9.2.1 Correlation
9.2.2 Regression
9.3 Assessment Questions
9.4 References
LESSON 10: DATA PRESENTATION
10.1. Introduction
10.2. Learning Outcomes
10.2.1. Presenting Data in Tables
10.2.2. Presenting data in charts and graphs
10.3. Assessment Questions
10.4. References
LESSON 1
1.1 Introduction
In this first lesson, we lay the foundation for the entire course by defining the concept of statistics and other terms used in the subject. Throughout our teaching experience, we have found that an understanding of the basic principles behind the subject and their applications increases students' motivation. Many students view statistics as being no different from mathematics. In this unit you will be introduced to the basic concepts of statistics. Two basic and commonly used concepts in statistics are population and sample.
You will appreciate the difference between population and sample. You will also learn the
Definition: A population is the collection of all individuals or items under consideration in a statistical study (Weiss, 1999). The population is the whole set of values or individuals you are interested in; it may also be defined as the set of entities under study. An example is the weight of cattle on a farm in Githunguri. The cattle population includes all bulls and cows currently alive, those that lived and are now dead, and those that will live in the future. You will not be able to measure the weights of the entire cattle population, because many cattle are not yet born while many are already dead and unreachable. Even when it is possible to reach all of them, it is often too costly in terms of the money and time involved. In this example you are interested in the population of cattle, and the characteristic you measure is body weight; a numerical summary of the whole population, such as its mean body weight, is called a parameter.
1.2.1.2 Sample
A sample is the part of the population from which information is collected (Weiss, 1999). Since you cannot reach all the members of the population to take measurements, you will take a subset of the population. This subset is called a sample. You will then use this subset to draw inferences about the population under study, given some conditions. In the cattle example, you will take a subset of the cattle population, measure the weights of the animals in it, and calculate the average, or mean. The mean that you calculate from the sample is called a statistic. It is this statistic that you will use to draw an inference about the parameter of the population of interest. Because of the uncertainty and inaccuracy involved in drawing conclusions about the population from a sample, you can only infer, and never state with certainty, what the population value is.
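The difference between a parameter and a statistic can be illustrated with a short Python sketch. The body weights below are simulated and purely hypothetical (they are not data from any real farm), and the names `population` and `sample`, the herd size of 500 and the sample size of 30 are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical herd: simulated body weights (kg) that we pretend is the whole population.
random.seed(42)
population = [round(random.gauss(450, 40), 1) for _ in range(500)]

# Parameter: the mean of the entire population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: the mean of a random sample drawn from that population.
sample = random.sample(population, 30)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.1f} kg")
print(f"Sample mean (statistic):     {sample_mean:.1f} kg")
```

The two printed values will usually be close but not identical, which is why the sample mean supports only an inference about the population mean.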
You should take note that you will always have few numbers in your sample than the
http://www.math.niu.edu/~richard/Math101/sp07/stats3_ho.pdf
http://www.ddegjust.ac.in/studymaterial/mcom/mc-106.pdf
Spark
Individual task: (a) Using bullet points, outline the differences between a sample and a population.
(b) Discuss the different types of data.
Interaction begins
E-moderator interventions: 1. Ensure that learners are focused on the contents and context of discussion.
2. Stimulate further learning and generation of new ideas.
3. Provide feedback on the learning progress.
4. Round up the e-tivity.
Schedule and time: This task should take two hours.
Next: Define sample and sampling
1.2.2 Sample and sampling.
Sampling is a technique for selecting individual members, or a subset, of a population in order to make statistical inferences from them and to estimate characteristics of the whole population. You must ensure that your samples are randomly selected to avoid bias when the statistic is used to estimate the parameter.
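Random selection can be sketched in Python as follows. This is a minimal illustration of simple random sampling without replacement; the ear-tag labels, the frame size of 200 and the helper name `simple_random_sample` are hypothetical and chosen only for the example.

```python
import random

# Hypothetical sampling frame: ear-tag labels for 200 animals.
sampling_frame = [f"TAG-{i:03d}" for i in range(1, 201)]

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

selected = simple_random_sample(sampling_frame, 10, seed=1)
print(selected)
```

Picking, say, only the first ten animals that come to the feeding trough would not give every animal the same chance of selection, and could therefore bias the estimate.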
E-tivity 1.2.2 Sample and sampling
Spark
probabilistic sampling
Interaction begins
E-tivity 1.2.3 Types of variables
Numbering, pacing and sequencing: 1.2.3
Title: Types of Variables
Purpose: The purpose of this e-tivity is to enable you to distinguish between the different types of variables.
Brief summary of overall task: Read this document:
https://laulima.hawaii.edu/access/content/user/hallston/341website/2typesofvariables.pdf
Spark
E-moderator interventions: 1. Ensure that learners are focused on the contents and context of discussion.
2. Stimulate further learning and generation of new ideas.
3. Provide feedback on the learning progress.
4. Round up the e-tivity.
Schedule and time: This task should take two hours.
4. Define a variable. Can you differentiate between discrete and continuous variables?
c) Polledness in cattle (whether or not a cow or bull has horns)
g) Litter size
h) Litter weight
j) Milk yield
1.4 References
Agarwal, B. L. (2009). Basic Mathematics (5th ed.). Delhi: New Age International (P) Limited Publishers.
Gupta, S., & Kapoor, V. K. (1980). Fundamentals of Mathematical Statistics (7th ed.). Delhi: Sultan Chand & Sons.
Beierlein, J., Schneeberger, K., & Osburn, D. (2008). Principles of Agribusiness Management (3rd ed.). Waveland Press Inc.
Anderson, T., & Sclove, S. (1978). An Introduction to the Statistical Analysis of Data. Boston: Houghton Mifflin.
LESSON 2: MEASURES OF CENTRAL TENDENCY
2.1 Introduction
In this lesson you will learn the measures of central tendency, which are very important in statistics, and see why measures of dispersion are needed alongside them. A measure of central tendency can be defined as a single value that describes the central position in a given set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is the most commonly used measure of central tendency, but there are others, such as the median and the mode. The mean, median and mode are the common valid measures of central tendency.
However, measures of central tendency are not adequate on their own to describe data. For example, two sets of data can have the same mean without being the same. Knowing the extent of variability is therefore important when describing data. Measures of dispersion include the range, interquartile range and standard deviation.
2.2.1 THE MEAN
The mean (or average) is the most popular and well known measure of central tendency. It
can be used with both discrete and continuous data, although its use is most often with
continuous data. The mean is equal to the sum of all the values in the data set divided by the
number of values in the data set. So, if we have n values in a data set and they have values $x_1, x_2, \dots, x_n$, the sample mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as µ:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where N is the number of values in the population.
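As a quick check of the definition, the mean can be computed in Python. The sketch below uses the eleven example scores that this lesson also uses for the median; the function name `mean` is an illustrative choice, and the standard-library `statistics.mean` is shown only for comparison.

```python
import statistics

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)

print(mean(scores))             # 649 / 11 = 59.0
print(statistics.mean(scores))  # same result from the standard library
```

The same arithmetic applies whether the values are a sample or a whole population; only the notation differs.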
Purpose To enable you understand and be able to compute the mean
Spark
Individual task
a) Following the example in the video link answer the questions at the
end of the topic
b) Your answers in this section should be posted to the discussion forum
2.2.1
Interaction begins a) Post two most important internal and two external factors of
motivation.
b) Provide positive descriptive comments on your team learners’
answers with a view of enhancing further thinking. Do this on the
discussion forum 2.2.1
E-moderator interventions: a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
Schedule and time This activity should take two hours
Next mode
The mode is the most frequently occurring score in our data set. On a bar chart or histogram it is represented by the highest bar. You can, therefore, sometimes think of the mode as the most popular option.
The mode has two major weaknesses:
1. This measure is not appropriate to use for continuous data
2. The mode does not provide us with a very good measure of central tendency
when the most common mark is far away from the rest of the data in the data
set.
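The mode can be found by counting how often each value occurs. The sketch below is one possible way to do this in Python; the litter sizes are hypothetical, and the helper name `modes` is an illustrative choice. It also shows that a data set can have more than one mode, using the eleven example scores from this lesson.

```python
from collections import Counter
import statistics

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

litter_sizes = [8, 10, 9, 10, 11, 10, 9, 12]   # hypothetical counts
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

print(modes(litter_sizes))            # [10]
print(modes(scores))                  # [55, 56], two modes
print(statistics.multimode(scores))   # standard-library equivalent (Python 3.8+)
```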
After reading, distinguish between the population mean and the sample mean.
Spark
Individual task
After reading attempt the exercise found at the end of the topic
save the answers on your portfolio.
The median is the middle score for a set of data that has been arranged in order of magnitude. It is the value of the variable that divides the group into two equal parts, one part comprising all values greater than the median and the other all values less than it. The median is less affected by outliers and skewed data than the mean. To calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange the data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case the sixth value, 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average them. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange the data into order of magnitude:
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th scores in our data set (55 and 56) and average them to get a median of 55.5.
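The two cases above (odd and even numbers of scores) can be sketched in Python as follows. The function name `median` is an illustrative choice; the data are the scores from the worked example.

```python
import statistics

def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

eleven_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
ten_scores = eleven_scores[:-1]   # the same data without the last score (92)

print(median(eleven_scores))   # 56
print(median(ten_scores))      # 55.5
```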
E-tivity 2.2.3 Median
Follow the worked examples and then attempt the exercises.
Spark
Individual task
2.3 Assessment Questions
1. Suppose you have the data below. Calculate the median:
65 55 89 56 35 14 56 55 87 45 92
3. While conducting a research the following information was collected and used for
preliminary statistics-
0-30 16
30-60 43
60-90 56
90-120 32
120-150 19
i. Mean
ii. Median
iii. Mode
4. The stem diameter was measured for each of 10 randomly selected maize plants, and the following measurements (mm) were recorded: 45.9, 52.4, 65.0, 65.3, 69.2, 57.8, 72.5, 69.9, 64.7, 72.6. Calculate:
i. Median
iii. Variance
For each of these questions, choose the option (A, B, C or D) that is TRUE.
(D) Degree to which the mean value differs from its expected value.
(A) Median
(A) 5
(B) 6
(C) 8
(D) 9
5 3 6 8 7 8 3 11 6 3 2
(A) 3
(B) 6
(C) 8
(D) 11
(A) 3
(B) 6
(C) 8
(D) 11
(A) 18
(B) 60
(C) 162
(D) 540
(A) 7.7
(B) 6.4
(C) 6.0
(D) 5.8
2.4 References
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly, R. A. (2004). The complete idiot's guide to statistics. Indianapolis, IN: Alpha.
Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd ed.). New York: Wiley.
Kaps, M., & Lamberson, W. R. (2004). Biostatistics for animal science. Wallingford: CABI Publishing.
LESSON 3: MEASURES OF DISPERSION
3.1 Introduction
Welcome to lesson three. In this lesson, we introduce you to the measures of dispersion.
The measures of central tendency give us a bird's-eye view of the entire data set. They are called averages of the first order, and they serve to identify the centre of the distribution, but they do not tell us how the items are spread out on either side of the central value. The scattering of items in a distribution about the average is called dispersion. Measures of dispersion quantify the extent to which the items vary from some central value. Note that the measures of dispersion, or variation, measure only the degree, and not the direction, of the variation. The measures of dispersion are also called averages of the second order, because they are based on the deviations of the different values from the mean or from other measures of central tendency, which are called averages of the first order.
There are three main measures of dispersion:
The range
The semi-interquartile range (SIR)
Variance / standard deviation
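The first two measures can be sketched in Python as follows. The milk yields are hypothetical, and the helper names are illustrative. Note that several conventions exist for locating quartiles; this sketch assumes the "inclusive" method of the standard-library function `statistics.quantiles`, so other textbooks or software may give slightly different quartiles for the same data.

```python
import statistics

def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles (inclusive method assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2

milk_yield = [12, 15, 17, 20, 22, 25, 28, 30, 35]   # hypothetical litres per day

print(data_range(milk_yield))                 # 35 - 12 = 23
print(semi_interquartile_range(milk_yield))   # (28 - 17) / 2 = 5.5
```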
3.2 Lesson learning outcomes
By the end of the lesson, you will be able to:
3.2.1 Compute the range
3.2.2 Compute the semi-interquartile range
3.2.3 Compute the variance and standard deviation
https://byjus.com/maths/dispersion/
Spark
Individual task
Interaction begins
E-moderator interventions: a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
d) Close the discussions.
Schedule and time: This activity should take two hours.
Next Semi- interquartile range
Spark
Individual task
Interaction begins
E-moderator interventions: a) Ensure that learners are focused on the contents and context of discussion.
b) Stimulate further learning and generation of new ideas.
c) Provide feedback on the learning progress.
d) Close the discussions.
Schedule and time: This task should take two hours.
Next Variance and standard deviation
The variance is a measure of how different the elements in a given population are from one another; it indicates how spread out these elements are from the central point (the mean) of the population. Two kinds of variance exist: population variance and sample variance. Population variance is the variance of the entire population and is denoted by σ². The standard deviation, on the other hand, is the square root of the variance. It indicates the spread of the elements in a given data set about the mean: a small standard deviation means that the values lie close to the mean, whereas a large one means that they are widely spread. Standard deviation (SD) is the most commonly used measure of dispersion. For a population, SD is the square root of the sum of squared deviations from the mean divided by the number of observations.
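The calculation described above can be sketched in Python. The data values are hypothetical and the function names are illustrative. The sketch assumes the usual convention that the sample variance divides by n - 1 rather than n; that divisor is not stated in the text above and is included here as an assumption.

```python
import math
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by the number of observations."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squared deviations, divided by n - 1 (assumed convention)."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations, mean = 5

pop_var = population_variance(data)        # 32 / 8 = 4.0
pop_sd = math.sqrt(pop_var)                # 2.0
samp_var = sample_variance(data)           # 32 / 7
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd)
print(round(samp_var, 3), round(samp_sd, 3))
```

The standard library offers the same calculations as `statistics.pvariance`, `statistics.pstdev`, `statistics.variance` and `statistics.stdev`.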
Numbering, pacing and sequencing: 3.2.3
Title: Variance and standard deviation
Purpose: The purpose of this e-tivity is to enable you to calculate the variance and the standard deviation and to understand how they are used in statistics.
Brief summary of overall task: Read this: https://byjus.com/maths/dispersion/
Watch the linked videos:
https://www.youtube.com/watch?v=lp2nTFdYGec&pbjreload=101
https://www.youtube.com/watch?v=Ks_rGi7_-yc
Spark
Individual task
Interaction begins
a) Read posts from other students and provide two comments on their
thoughts and ideas.
b) Post your response on the discussion forum 3.2.3
c) Refine your answer based on any new insight acquired from your
colleagues’ posts and save it on your portfolio.
E-moderator (a) Ensure that learners are focused on the contents and context of
interventions discussion.
(b) Stimulate further learning and generation of new ideas.
(c) Provide feedback on the learning progress.
(d) Close the discussions
Schedule and time This task should take two hours
Next Lesson 4: Random variables and probability distributions
4. The measures used to calculate the variation present among the observations relative to their average are called:
(a) Coefficient of kurtosis (b) Absolute measures of dispersion (c) Quartile deviation (d) Relative measures of dispersion
5. The degree to which numerical data tend to spread about an average value is called:
(a) Constant (b) Flatness (c) Variation (d) Skewness
6. The measures of dispersion can never be:
(a) Positive (b) Zero (c) Negative (d) Equal to 2
7. If all the scores on an examination cluster around the mean, the dispersion is said to be:
(a) Small (b) Large (c) Normal (d) Symmetrical
3.4 References
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul: West Pub. Co.
Donnelly, R. A. (2004). The complete idiot's guide to statistics. Indianapolis, IN: Alpha.
Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd ed.). New York: Wiley.
Kaps, M., & Lamberson, W. R. (2004). Biostatistics for animal science. Wallingford: CABI Publishing.
Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-Hill.
McDonald, J. H. (2014). Handbook of biological statistics (3rd ed.). Baltimore, Maryland: Sparky House Publishing. http://www.biostathandbook.com/index.html
Weiss, N. A. (1999). Elementary statistics (4th ed.). Reading, Massachusetts: Addison-Wesley Publishing Company.
LESSON 4: RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
4.1 Introduction
As explained in the previous section, research involves collection of data. This data is
is a variable whose possible values are numerical outcomes of a random phenomenon. There
are two types of random variables, discrete and continuous. All random variables (discrete
probability that the random variable X is less than or equal to x, for every value x. For a
discrete random variable, the cumulative distribution function is found by summing up the
value that a random variable can assume with its probability of occurrence. This lecture is
In life, there is no certainty. Every event that comes our way is associated with some level of uncertainty. When you plant maize (corn) seeds, there is some probability that each seed will germinate. The probability of germination may be improved if certain conditions are fulfilled. For example, if the soil is well watered, the probability of germination is improved. As you go through this unit, you will understand the concept of probability.
4.2.1 Distinguish between discrete and continuous random variables
4.2.2 Probability distribution for discrete variables
4.2.3 Construct a frequency distribution for a continuous variable
4.2.4 Identify the skew of a distribution
A discrete random variable is one which may take on only a countable number of distinct values, such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. A continuous random variable, in contrast, is a random variable which can take values measured on a continuous scale, e.g. weights, strengths, times or lengths. It can take any of the infinitely many values within an interval. Continuous random variables are usually measurements.
distinguish between discrete and continuous variables
Brief summary of overall task: Read the slides at this link:
https://ocw.mit.edu/courses/sloan-school-of-management/15-063-communicating-with-data-summer-2003/lecture-notes/lecture6.pdf
Read this article:
https://www.uplifteducation.org/cms/lib/TX01001293/Centricity/Domain/273/RANDOM%20-%20discrete%20and%20continuous%20-%20VARIABLE.pdf
and watch these videos:
https://www.youtube.com/watch?v=PlUsFNLRUOc
https://www.youtube.com/watch?v=gPAxuMKZ-w8
Spark
4.2.2 Probability distributions for discrete random variables
The probability distribution of a discrete random variable x is the table, graph or formula that assigns the probability P(x) to each possible value of the variable x. A random variable is an outcome that takes on a numerical value as a result of an experiment. The value is not known with certainty before the experiment, but you know the sample space of the experiment. For example, in an experiment where a single die is rolled, P(x=1) = 1/6, P(x=2) = 1/6, P(x=3) = 1/6, P(x=4) = 1/6, P(x=5) = 1/6 and P(x=6) = 1/6. The sum of all the probabilities is 1.
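The die example can be written out as a small table in Python. This is a minimal sketch; the names `die_pmf`, `is_valid_distribution` and `cumulative_probability` are illustrative, and exact fractions are used so that the probabilities sum to exactly 1.

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_distribution(pmf):
    """Each probability lies between 0 and 1, and all of them sum to 1."""
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

def cumulative_probability(pmf, x):
    """P(X <= x): add the probabilities of all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)

print(is_valid_distribution(die_pmf))       # True
print(cumulative_probability(die_pmf, 4))   # 2/3
```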
Spark
Individual task
The density of a continuous random variable is a function that describes the relative likelihood that the random variable takes on a given value. The probability of the random variable falling within a particular range of values is given by the integral of this variable's density over that range; that is, it is given by the area under the density function, above the horizontal axis, and between the lowest and greatest values of the range. The integral of a probability density function (PDF) over the entire space is equal to one.
The distribution of a continuous random variable can be characterized through its PDF. The probability that a continuous random variable takes a value in a given interval is equal to the integral of its PDF over that interval, which in turn is equal to the area of the region in the xy-plane bounded by the x-axis, the PDF and the vertical lines corresponding to the boundaries of the interval.
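The idea of probability as an area under the PDF can be checked numerically. The sketch below uses the standard normal distribution as an example and approximates the area with the trapezoidal rule; the function names, the interval from -1 to 1 and the number of steps are illustrative assumptions. The standard-library class `statistics.NormalDist` is used only to compare against the exact value.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(pdf, a, b, steps=10_000):
    """Trapezoidal approximation of the integral of pdf from a to b."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

# P(-1 <= X <= 1) for a standard normal variable, as an area under the PDF.
approx = area_under_pdf(normal_pdf, -1.0, 1.0)
exact = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)
print(round(approx, 4), round(exact, 4))

# The area under the whole curve is (approximately) one.
print(round(area_under_pdf(normal_pdf, -8.0, 8.0), 4))
```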
E-tivity 4.2.3 Probability Distribution of a Continuous Random Variable
https://www.youtube.com/watch?v=9KVR1hJ8SxI
Spark
Individual task Read the notes and watch the video provided
Answer the questions above
Interaction begins
E-moderator interventions
normal-probability-distribution.php
Close the E-tivity
Schedule and time This task will take two hours
Next Lesson 5: Statistical inference and Hypothesis
1 Which of the following statements best describes the relationship between a
parameter and a statistic?
b) A parameter has a sampling distribution that can be used to determine what values
the statistic is likely to have in repeated samples.
a) 0
b) Infinity
c) 1
d) Changes
3 A table with all possible values of a random variable and their corresponding
probabilities is called ___________
d) Probability Distribution
4 In a wildlife conservation survey on birds, it was shown that one out of ten
quails was trapped, using a mist net, in a given season. If 20 birds are selected
at random, find the probability that 6 of the birds were trapped in the previous
season.
i. A farmer’s goat is expecting twins. What is the sample space for this
farmer’s expectation? (Hint: any combination of the two sexes is possible.)
ii. What is the probability of the goat giving birth to at least 1 male?
4.4 References
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete idiot's
guide).Indianapolis, IN: Alpha.
Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.).New York: Wiley.
Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal
science. Wallingford: CABI Publishing.
Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-
Hill. McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )
Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts: Addison-
Wesley Publishing Company
LESSON 5: STATISTICAL INFERENCE AND HYPOTHESIS TESTING
1.1 Introduction
The main goal of statistical inference and hypothesis testing is to enable a researcher to make a
statement, with a stated level of uncertainty, about something that has not been observed.
Inferences are normally based on a sample and are then generalized to a
population, i.e. the objective is to understand the population based on the sample. A population
is the collection of objects that we want to study or test. For example, if you are studying the quality
of products from a production line for a given day, then the whole production for that day is
the population. In the real world, it may be hard to test every product, so we draw a
sample from the population and use the sample results to draw conclusions about the whole population.
The statistical model in this case offers an abstract representation of the population and how
the elements of the population relate to each other. Parameters are numbers that represent
features or associations of the population and are usually estimated from the data. A
parameter is a summary description of a fixed characteristic or measure of the target
population. It denotes the true value that would be obtained if we had carried out a census
(instead of a sample). Parameters include the mean (μ), variance (σ²), standard deviation (σ) and
proportion (π). The corresponding value computed from a sample is called a statistic. A sampling
distribution is the probability distribution of a statistic obtained from a large number of samples
drawn from the population. In sampling, the confidence interval provides a more continuous measure of
uncertainty. The confidence interval proposes a range of plausible values for an unknown
parameter (for example, the mean). In other words, the confidence interval represents a range
of values that we are fairly sure contains the true value. For example, if the mean of a given
sample is 146 and a 95% confidence interval is built around it, then about 95% of intervals built
in the same way from similar samples will contain the true mean, while about 5% will not.
1.2 Learning Outcomes
By the end of the lesson, you will be able to;
E-tivity 5.2.1 Concept of design thinking
Spark
Individual task Read the document provided and listen to the video clip and use it to
fill the answers on section e. above
Post the discussion on discussion board 5.2.1
relationship in both directions. For example, we may wish to compare the mean of a sample
to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A two-
tailed test
Spark .
Individual task 1. Read the notes provided and listen to the video
clip
.
Interaction begins
E-moderator interventions
5.2.3 Chi-square testing
The chi-square test is used to determine whether there is a significant
difference between the expected frequencies and the observed frequencies in one or more
categories. The test answers questions such as: does the number of individuals or
objects that fall in each category differ significantly from the number you would expect? What
is the source of the differences observed? Is the difference between the expected and
observed frequencies due to sampling error?
Chi-square enables you to assess whether a relationship exists, but how do you know how
strongly the variables are related? Chi-square tests allow you to perform hypothesis
testing on nominal and ordinal data.
Spark
Individual task Read the document provided and listen to the video clip and use it to
fill the answers on section e. above
Post the discussion on discussion board 5.2.1
5.2.4 Goodness-of-fit
The goodness-of-fit test compares the observed frequencies from the data with the expected
frequencies predicted by the null hypothesis.
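The comparison of observed and expected frequencies can be sketched in a few lines of Python. The counts below are hypothetical (100 seedlings classified into two flower colours, with a 3:1 ratio assumed under the null hypothesis); only the critical value 3.841 for 1 degree of freedom at α = 0.05 is a standard table value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed counts in two categories.
observed = [84, 16]

# Null hypothesis (assumed for illustration): categories occur in a 3:1 ratio.
expected_ratio = [3, 1]
total = sum(observed)
expected = [total * r / sum(expected_ratio) for r in expected_ratio]

chi2 = chi_square_statistic(observed, expected)

# Degrees of freedom = number of categories - 1.
degrees_of_freedom = len(observed) - 1

# Tabulated critical value of chi-square for df = 1 at the 5% level.
critical_value = 3.841
reject_null = chi2 > critical_value

print("Expected counts:", expected)
print("Chi-square statistic:", round(chi2, 2))
print("Reject null hypothesis at 5%:", reject_null)
```

With more than two categories the same function applies unchanged, but the critical value must be looked up for the new degrees of freedom.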
Numbering, pacing 5.2.4
and sequencing
Title Goodness-of-fit
Purpose The purpose of this e-tivity is to enable you to understand and apply the
concept of goodness of fit
https://www.youtube.com/watch?v=kUqLtRVtTs4
Spark
Individual task Read the document provided and listen to the video clip and use it
to fill the answers on section e. above
1.3 Assessment Questions
5. If the null hypothesis is false then which of the following is accepted?
a) Null Hypothesis
b) Positive Hypothesis
c) Negative Hypothesis
d) Alternative Hypothesis.
a) 1-α
b) β
c) α
d) 1-β
15. If a hypothesis is rejected at the 5% level of significance, it
a. will always be rejected at the 1% level
b. will always be accepted at the 1% level
c. will never be tested at the 1% level
d. may be rejected or not rejected at the 1% level
16. Formulate a hypothesis statement for the following claim: “The average Maasai cow
produces 6 litres of milk daily.” A sample of 40 Maasai cows produced an average of 8 litres of milk
per day. Assume the population standard deviation is 2.5 litres. Using α = 0.05, test your
hypothesis. What is your conclusion?
1.4 References
1. Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St.
Paul: West Pub. Co.
2. Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete
idiot's guide). Indianapolis, IN: Alpha.
3. Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research
(2nd ed.). New York: Wiley.
4. Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal
science.Wallingford: CABI Publishing.
5. Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-
Hill. McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )
6. Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts:
Addison-Wesley Publishing Company
LESSON 6: SAMPLES AND SAMPLING DISTRIBUTION
1.1 Introduction
Sampling distributions are important for inferential statistics. Ideally, the process of research
starts with specifying a population and determining the sampling distribution of the mean and the
range. In practice, one usually starts by collecting sample data and uses these data to estimate the
parameters of the sampling distribution. Knowledge of the sampling distribution is very
valuable, especially in knowing the degree to which means from different samples differ from
each other and from the population mean. This gives a sense of how close a particular
sample mean is likely to be to the population mean. This information is established from a
sampling distribution. The standard deviation of the sampling distribution of the mean is the
most common measure of how much sample means differ from each other. This standard
deviation is called the standard error of the mean. If all the sample means
were very close to the population mean, then the standard error of the mean would be small.
Conversely, if the sample means varied considerably, then the standard error of the mean
would be large.
For example, assume a sample mean of 125 and an estimated
standard error of the mean of 5. If the sampling distribution is normal, then the sample mean
is likely to be within 10 units of the population mean, since most of a normal distribution
(about 95%) is within two standard deviations of the mean. Keep in mind that all statistics have
sampling distributions, not just the mean.
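The arithmetic behind this example can be sketched as follows. The sample mean of 125 and the standard error of 5 come from the text; the sample standard deviation of 20 and sample size of 16 are hypothetical values chosen only because they give a standard error of 5.

```python
import math

# Hypothetical sample summary that would produce the standard error in the example.
sample_sd = 20.0     # assumed sample standard deviation
sample_size = 16     # assumed number of observations

# Standard error of the mean = standard deviation / square root of sample size.
standard_error = sample_sd / math.sqrt(sample_size)

sample_mean = 125.0

# "Within two standard errors" gives the rough range used in the example.
margin = 2 * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print("Standard error:", standard_error)
print("Approximate range for the population mean:", (lower, upper))
```

The factor 2 is the rule of thumb used in the text; a formal 95% interval would use 1.96 or the appropriate t value instead.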
1.2.1 Describe the concepts of inferential statistics
1.2.2 Discuss the concept of sampling distribution
1.2.3 Construct a confidence interval
Read the following book from the link provided
https://www.acsu.buffalo.edu/~deannaal/Statistics_Textbook.pdf
6.2.2 Sampling distribution
The sampling distribution is the distribution of a sample statistic. Like the population distribution, it
is a model of a distribution of scores, except that the scores are not raw scores, but statistics.
It represents a thought experiment of what would happen if a person repeatedly took samples
of size N from the population distribution and computed a particular statistic each time. The
resulting distribution of statistics is called the sampling distribution of that statistic. For
example, suppose that a sample of size sixteen (N=16) is taken from some population. The
mean of the sixteen numbers is computed. Next a new sample of sixteen is taken, and the
mean is again computed. If this process were repeated an infinite number of times, the
distribution of the now infinite number of sample means would be called the sampling
distribution of the mean. Every statistic has a sampling distribution. For example, suppose
that instead of the mean, medians were computed for each sample. The infinite number of
medians would be called the sampling distribution of the median.
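The thought experiment above cannot be run an infinite number of times, but a simulation with many repetitions gives a good approximation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10, draws 5,000 samples of size N = 16, and compares the spread of the sample means with the theoretical standard error.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

POP_MEAN, POP_SD = 50.0, 10.0   # hypothetical population
N = 16                          # sample size used in the example
NUM_SAMPLES = 5000              # stands in for "an infinite number of times"

def draw_sample_mean():
    """Take one random sample of size N and return its mean."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    return statistics.fmean(sample)

# Approximate sampling distribution of the mean.
sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

centre = statistics.fmean(sample_means)   # should be close to the population mean
spread = statistics.stdev(sample_means)   # should be close to the standard error
theoretical_se = POP_SD / N ** 0.5        # sigma / sqrt(N)

print("Mean of sample means:", round(centre, 2))
print("SD of sample means:", round(spread, 2))
print("Theoretical standard error:", theoretical_se)
```

Replacing `statistics.fmean(sample)` with `statistics.median(sample)` in `draw_sample_mean` would approximate the sampling distribution of the median instead.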
a. Define what you understand by a sampling
distribution
b. What do you understand by central limit theorem
c. What is a sample mean
Spark
Individual task
Interaction begins
c. Post at least one comment on their work and give
constructive criticism
E-moderator interventions
6.2.1 E-tivity Constructing confidence intervals
Numbering, pacing and 6.2.3
sequencing
Title Constructing confidence intervals
Purpose The purpose of this e-tivity is to enable you
construct confidence intervals
Brief summary of overall task Listen to the following video
https://www.youtube.com/watch?v=DT-fPG0Hff8
https://www.youtube.com/watch?v=UetYS3PaHIo
Listen to the following video
https://www.youtube.com/watch?v=MUD390jtgQs
Afterwards, answer the following questions
a. Why and when do you use the Student's t distribution
instead of the Z-test?
b. Using the same example construct 99%
confidence interval for the population mean
Spark
4. Scan the work and share in the discussion forum
6.2.3
Interaction begins 1. Post your work in the discussion forum 6.2.3
2. Review your colleagues work and provide
feedback
3. Post at least one comment on their work and give
constructive criticism
E-moderator interventions a. Summarize the threads and review the sketches
b. Give feedback
c. Provide teaching points
d. Close the E-tivity
Schedule and time This task will take a minimum of 2 hours
Next Lesson 7: experimental design
2. Which of the following statements regarding a researcher’s use of inferential statistics is
true?
3. If you drew all possible samples from some population, calculated the mean for each of
the samples, and constructed a line graph (showing the shape of the distribution) based on
all of those means, what would you have?
a. A population distribution
b. A sample distribution
c. A sampling distribution
d. A parameter distribution
a. The process you used will capture the true parameter 95% of the time in the
long run
b. You can be “95% confident” that your interval will include the population
parameter
c. You can be “5% confident” that your interval will not include the population
parameter
1.4 References
1. Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St.
Paul: West Pub. Co.
2. Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete
idiot's guide). Indianapolis, IN: Alpha.
3. Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research
(2nd ed.). New York: Wiley.
4. Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal
science.Wallingford: CABI Publishing.
5. Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-
Hill. McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )
6. Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts:
Addison-Wesley Publishing Company
LESSON 7: EXPERIMENTAL DESIGN
7.1 Introduction
In this lecture you will learn some basics of experimental design and the analysis of
experimental data. Experimental design, also called design of experiments, is a
structured and organized way of conducting and analyzing controlled tests so as to evaluate
the factors that affect a response variable. The design of experiments specifies the
particular combinations of factor settings at which the individual runs in the experiment are
to be conducted. Data obtained from observational studies, or other data not collected in
accordance with a design of experiments approach, can only establish correlation, not
causality. There are also problems with the traditional experimental method of changing one
factor at a time.
Randomization
Replication
Local control
Blocking
Factorial experiments
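Randomization and replication, the first two principles listed above, can be illustrated with a completely randomized layout. The sketch below is an assumption-laden example: the five fertilizer rates, the seven replicates per rate and the function name are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def completely_randomized_layout(treatments, replicates, seed=None):
    """Randomly assign each treatment to `replicates` experimental units.

    Returns a dict mapping unit number (1, 2, 3, ...) to its treatment.
    """
    rng = random.Random(seed)
    # Replication: every treatment appears the same number of times.
    assignments = [t for t in treatments for _ in range(replicates)]
    # Randomization: the order of assignment to units is left to chance.
    rng.shuffle(assignments)
    return {unit: treatment for unit, treatment in enumerate(assignments, start=1)}

fertilizer_rates = [0, 10, 20, 30, 40]   # kg/ha, hypothetical treatments
layout = completely_randomized_layout(fertilizer_rates, replicates=7, seed=1)
counts = Counter(layout.values())

for unit in sorted(layout)[:5]:
    print("Unit", unit, "-> rate", layout[unit], "kg/ha")
print("Units per rate:", dict(counts))
```

Blocking would need one such shuffle per block rather than a single shuffle over all units; that extension is not shown here.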
E-tivity 7.2.1 Principles of experimental design
Spark
Individual task
ensure that the experimental design is not violated in the field. It is these issues that are the
focus of the topic.
Spark
Individual task
7.3 Assessment Questions
1. Which of the following would improve the reliability of an experiment?
2. You are interested in the effect of increased carbon dioxide versus normal air on the
growth of corn plants as well as the effect of green light versus full sunlight on the growth
of corn plants. Your plan is to set up your experiment inside a greenhouse where you can
control the environment. Which of the following is an aspect of the experiment that
should be considered and controlled?
c. The intensity or brightness of the green light equals the intensity of the full sunlight.
d. All temperatures and available water remain the same for all plants.
7.4 References
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete idiot's
guide).Indianapolis, IN: Alpha.
Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.). New York: Wiley.
Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal science.
Wallingford: CABI Publishing.
Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-Hill.
McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )
Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts: Addison-
Wesley Publishing Company
LESSON 8: ANALYSIS OF VARIANCE AND COVARIANCE
8.1 Introduction
Analysis of Variance (ANOVA) is a statistical method that is used to test differences
between two or more means. Literally this could be called "Analysis of Means" rather
than "Analysis of Variance." But the name is correct because inferences about means
are made by considering variance.
i. The observations are obtained independently and randomly from the populations defined
by the factor levels.
ii. The population at each factor level is (approximately) normally distributed.
iii. These normal populations have a common variance, σ².
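Under these assumptions, the F-ratio of a one-way ANOVA compares the variation between group means with the variation within groups. The following is a minimal sketch with a small hypothetical data set of three groups; the function name and the numbers are illustrative only.

```python
def one_way_anova_f(groups):
    """Return (F-ratio, df between, df within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups sum of squares: how far observations are from their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical measurements for three factor levels.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova_f(groups)

print("F =", round(f_ratio, 2), "with df =", (df_between, df_within))
```

The computed F would then be compared with the tabulated F value for the same degrees of freedom at the chosen significance level.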
Spark
Individual task Post your responses in forum 8.2.1
E-tivities 8.2.2 Analysis of Variance ANOVA
Individual task a. Using the data from the documents and video try conducting
ANOVA in Excel on your own
b. Interpret the results
Interaction begins a) Post your responses in forum 8.2.2
E-moderator a) Ensure that learners are focused on the contents and context of
interventions discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This task should take two hours
Next ANCOVA
Then answer the following question
a) What is ANCOVA
b) Highlight the uses of ANCOVA
c) Describe the steps of ANCOVA
Spark
Individual task c. Using the data from the video try conducting ANOVA in Excel
on your own
d. Interpret the results
Interaction begins Post your responses in forum 8.2.3
E-moderator a) Ensure that learners are focused on the contents and context of
interventions discussion.
b) Stimulate further learning and generation of new ideas
c) Provide feedback on the progress the learners are making.
d) Round up the e-tivity
Schedule and time This task should take two hours
1. The ANOVA procedure is a statistical approach for determining whether or not
a) The means of two samples are equal
b) The means of two or more samples are equal
c) The means of more than two samples are equal
d) The means of two or more populations are equal
2. The null hypothesis for an ANOVA states that __________.
a) There are no differences between any of the population means
b) At least one of the population means is different from the others
c) All of the population means are different from each other
d) None of the other 3 choices is correct.
3. In an ANOVA, which of the following is most likely to produce a large value for the F-
ratio?
a) Large mean differences and small sample variances
b) Large mean differences and large sample variances
c) Small mean differences and small sample variances
d) Small mean differences and large sample variances
4. An analysis of variance is used to evaluate the mean differences for a research study
comparing four treatments with a separate sample of n = 5 in each treatment. If the data
produce an F-ratio of F = 3.15, then which of the following is the correct statistical
decision?
a) Reject the null hypothesis with α = .05 but not with α = .01.
b) Reject the null hypothesis with either α = .05 or α = .01.
c) Fail to reject the null hypothesis with either α = .05 or α = .01.
d) There is not enough information to make a statistical decision.
5. An analysis of variance produces df between treatments = 2 and df within treatments = 24. For this
analysis, what is df total?
a) 26
b) 27
c) 28
d) Cannot be determined without additional information
6. An undergraduate student in the school of agriculture and enterprise development studied
the effect of fertilizer rate on plant height. Originally 35 experimental units were selected
for their uniformity and assigned randomly to five fertilizer rates (0, 10, 20, 30, 40 kg/ha);
seven units per fertilizer rate. A problem in the field resulted in the loss of seven
measurements. The following are the results for the remaining cases.
Fertilizer rate
0 10 20 30 40
24 31 30 26 30
18 27 28 21 32
25 29 27 23 29
23 25 25 25 25
22 30 20 31
26 24 29
20
8.4 References
Agarwal, B. L. (2009). Basic Mathematics Fifth Edition. Delhi: New Age International (P)
Limited Publishers.
Gupta, S., & Kapoor, V. K. (1980). Fundamentals of Mathematics Statistics 7th Edition.
Delhi: Sultan Chand & Sons.
Beierlein, J., Schneeberger, K., & Osbum, D. (2008). Principles of Agribusiness
Management.Third Edition. Waveland Press Inc.
Sunderson, T., & Scolve, S. (1978). An Introduction to the Statistical Analysis of Data.
Boston: Houghton Mifflin.
LESSON 9: CORRELATION AND REGRESSION ANALYSIS
1.5 Introduction
Welcome to the ninth lesson. As an agriculture expert, you may have many questions running
through your mind that need to be answered. Some of them could be: Is the amount of milk
produced by a cow related to the weight of her calf at weaning? Is the level of feeding of
broiler chickens related to their weight at 8 weeks, when they should be
slaughtered for marketing? At the end of this unit, you will be able to quantify your answers to
questions of this type based on the data you have gathered.
Correlation and regression are other areas of inferential statistics, which involve determining
whether a relationship exists between two or more numerical or quantitative variables. This is
the case when two characteristics are studied simultaneously on each member of a population in order
to examine whether they are related. For instance, a researcher may be interested in finding
out the relationship between the weight and age of broiler chickens or the lactation length of cows.
Therefore, correlation and regression analyses are used to measure association between
9.2.2 Explain how you can express the relationship between variables statistically
by looking at two measures: covariance and correlation coefficient.
1.6.1 Correlation
You may often wonder what the relationship is between the height of an egg and its weight. Now
take 10 eggs. Measure the height and the weight of each egg. Does there appear to be a
connection between the height and weight of the eggs?
Correlation is a statistical measure that indicates the extent to which two or more variables
drawn from the same population fluctuate together. The correlation coefficient calculated from
sample data measures the strength and direction of a linear relationship between two
variables. A positive correlation indicates the extent to which those variables increase or
decrease together; a negative correlation indicates the extent to which one variable increases as the
other decreases. The symbol for the correlation coefficient calculated from sample data is r,
while the symbol for the population correlation coefficient is ρ (rho). In practice, the relationship
between two variables is rarely perfect.
This lesson looks first at how we can express the relationships between variables statistically
by looking at two measures: covariance and the correlation coefficient.
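Both measures can be computed directly from paired observations. The sketch below uses hypothetical heights and weights for 10 eggs (not measured data) and the usual sample formulas, with n - 1 in the denominator of the covariance.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Sample correlation coefficient r = cov(x, y) / (sd(x) * sd(y))."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical measurements for 10 eggs.
egg_height_cm = [5.2, 5.5, 5.6, 5.8, 5.9, 6.0, 6.1, 6.3, 6.4, 6.6]
egg_weight_g = [48, 52, 53, 56, 58, 59, 61, 64, 65, 69]

cov_hw = covariance(egg_height_cm, egg_weight_g)
r = correlation(egg_height_cm, egg_weight_g)

print("Covariance:", round(cov_hw, 3))
print("Correlation coefficient r:", round(r, 3))
```

Covariance depends on the units of measurement, whereas r always lies between -1 and +1, which is why r is easier to interpret.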
E-tivity 9.2.1 Correlation
Numbering, pacing 9.2.1
and sequencing
Title Correlation
Purpose The purpose of this e-tivity is to enable you to understand the concept of
correlation
Brief summary of Read and watch the materials and video from this link
overall task https://www.simplypsychology.org/correlation.html
http://educ.jmu.edu/~drakepp/FIN360/readings/Regression_notes.pdf
a. Define correlation
b. Give four uses of correlation
c. Differentiate between correlation and causation
d. Discuss the strengths and weaknesses of correlation
Spark
4. Close the E-tivity
Schedule and time This task will take two hours
Next Regression
1.6.2 Regression
In the previous section we looked at correlation, i.e. how to measure the relationship between
two variables. Although correlation is very useful, it does not by itself let us predict one variable from
another. Regression analysis can be used to predict how much one variable
influences the other. A simple example might be to try to predict levels of maize
output from the amount of fertilizer applied. You would expect this to be a positive relationship
(the higher the amount of fertilizer, the higher the output). We could then extend this basic
relationship to answer a question such as ‘if the farmer applied 25 kg/ha, how much maize
would the farmer harvest?’ The essence of regression is therefore to fit a model to our data
and use it to predict values of the dependent variable (DV) from one or more independent
variables (IVs). Predicting an outcome variable from one predictor variable
is called simple regression; predicting it from several predictor
variables is called multiple regression. This tool allows us to go a step beyond the data
that we collected.
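The maize and fertilizer example can be sketched as a simple least-squares regression. The data below are hypothetical (they are not from a real trial), and the function name is illustrative; the slope and intercept use the standard least-squares formulas.

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: fertilizer applied (IV) and maize output (DV).
fertilizer_kg_per_ha = [0, 10, 20, 30, 40]
maize_output_t_per_ha = [1.0, 1.6, 2.1, 2.7, 3.1]

intercept, slope = fit_simple_regression(fertilizer_kg_per_ha, maize_output_t_per_ha)

# Predict the output for a rate that was not in the data: 25 kg/ha.
predicted_at_25 = intercept + slope * 25

print("Fitted line: output =", round(intercept, 3), "+", round(slope, 3), "* fertilizer")
print("Predicted output at 25 kg/ha:", round(predicted_at_25, 3), "t/ha")
```

Predictions are most trustworthy inside the range of fertilizer rates that were actually observed (here 0 to 40 kg/ha).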
Brief summary of Read the following documents on this link -
overall task https://corporatefinanceinstitute.com/resources/knowledge/finance/
regression-analysis/
http://pba.ucdavis.edu/files/45007.pdf
After reading, watch the following video
https://www.youtube.com/watch?v=TU2t1HDwVuA
and answer the following questions;
Question 1.
a. Define regression
b. Distinguish the different types of regression.
c. In one paragraph of not more than 150 words, explain the
disadvantages of incorporation.
Question 2.
Go through this link https://www.youtube.com/watch?v=owI7zxCqNY0
and answer the following questions.
Spark
Individual task
Schedule and time This task will take a minimum of two hours
Next Lesson 10: Data presentation
1.7 Assessment Questions
In the following multiple-choice questions, select the best answer.
2 If there is a very strong correlation between two variables then the correlation
coefficient must be
a. much larger than 0, regardless of whether the correlation is negative or
positive
b. any value larger than 1
c. much smaller than 0, if the correlation is negative
d. None of these alternatives is correct.
4 The relationship between number of beers consumed (x) and blood alcohol
content (y) was studied in 16 male college students by using least squares
regression. The following regression equation was obtained from this study:
ŷ = -0.0127 + 0.0180x
The above equation implies that:
a. each beer consumed increases blood alcohol by an average amount of 1.8%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by 1.27%
d. each beer consumed increases blood alcohol by exactly 0.018
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
10 Larger values of r² (R²) imply that the observations are more closely grouped
about the
a. least squares line
b. average value of the independent variables
c. origin
d. average value of the dependent variable
1.8 REFERENCES
Devore, J. L., & Peck, R. (1986). Statistics: The exploration and analysis of data. St. Paul:
West Pub. Co.
Donnelly R. A. (2004). The complete idiot’s guide to statistics (Vol. The complete idiot's
guide).
Indianapolis, IN: Alpha.
Gomez, K. A., & Gomez, A. A. (1984). Statistical procedures for agricultural research (2nd
ed.).
New York: Wiley.
Kaps, M., Lamberson, W. R., & Lamberson, W. (2004). Biostatistics for animal
science.
Wallingford: CABI Publishing.
Jaisingh, L. R. (2006). Statistics for the utterly confused (2nd ed.). New York: McGraw-
Hill. McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House
Publishing, Baltimore, Maryland (http://www.biostathandbook.com/index.html )
Weiss N.A. 1999. Elementary Statistics, fourth edition. Reading, Massachusetts: Addison-
Wesley Publishing Company
http://www1.appstate.edu/~mcraelt/simpreg1.pdf
LESSON 10: DATA PRESENTATION
1.1. Introduction
We are in our final lesson of Statistics for Agriculture. In this lesson, you will learn how to
present your data. Once you have conducted your research, it is paramount that you present
your data in a way that communicates effectively to your audience. According to Tufte
(2001), a good method of presenting data should:
i. Show the data clearly.
ii. Induce the reader to think about the data being presented (rather than some other
aspect of the method of presentation, such as the colour of a graph).
iii. Avoid distorting the data.
iv. Present many numbers with minimum ink.
v. Make large data sets (assuming you have one) coherent.
vi. Encourage the reader to compare different pieces of data.
vii. Reveal data.
Data can be presented as text, in tables, or pictorially as graphs and charts.
should include only essential data and should try to use relatively few significant digits.
Too many decimal places should be avoided because they make data less clear. One
should however consider the orientation of the table so as to make the table visible
enough.
Spark
4. Review the material
5. Provide summaries from the video clip
6. Give your answers and summarized notes in discussion forum
10.2.1
Interaction begins 1. Post your answer in the discussion forum 10.2.1
2. Read posted work from 2 colleagues
3. Post at least one comment on their work and give constructive
criticism
E-moderator 1. Summarize the threads
interventions 2. Give feedback
3. Provide teaching points
4. Close the E-tivity
Schedule and time This task will take two hours
Next Presenting data in charts and graphs
E-tivity 10.2.3 presenting data in charts and graphs
Individual task
10.2.2
Interaction begins
E-moderator
interventions
1. Summarize the threads
2. Give feedback
3. Provide teaching points
4. Close the E-tivity
a. Footnote
b. Source note
c. Body of table
d. Caption
2. At the top of each column in a table a column designation is provided to explain figures
of the column which is known as ___________.
a. Stub
b. Caption
c. Head note
d. Title
3. ___________ part of table gives information about unit used in table to represent data.
a. Stub
b. Caption
c. Head note
d. Title
4. From the Data provided in Appendix 1, compute the mean of all the variables
disaggregated by the gender of the household. Present your results using the most
appropriate method among those provided in the lesson
5 4 Male 0.77 259500 4610 69500
6 2 Female 1.43 171000 0 17948
7 5.16 Female 0.97 67800 890 30055
8 2.89 Male 0.61 55200 0 67779
9 6.33 Male 0.97 747400 300 19704
10 2.78 Female 0.9 1459300 11035 215265
11 3.89 Male 1.28 475950 21950 172060
12 5.02 Male 0.92 277800 13200 112016
13 4 Female 2.65 737500 34900 54560
14 2.78 Male 0.82 111600 6250 81138
15 4.78 Male 1.33 1298400 18540 328156
16 2.89 Male 0.92 893450 10760 69000
17 2.6 Female 0.87 354320 4940 111682
18 1 Female 0.61 137915 150 45085
19 4.65 Male 0.36 107650 975 23702
20 1.78 Female 0.46 293550 450 12052
21 4.06 Male 1.33 218500 18175 244373
22 7.23 Female 1.33 2410000 15900 145600
23 4.34 Female 1.48 1630900 12175 348680
24 6.3 Male 1.25 893400 19775 238435
25 7.1 Male 1.43 170100 32800 143603
26 4 Female 1.12 264300 7000 30000
27 4.96 Female 0.31 135530 1050 19592
28 5.8 Female 1.7 950200 4703 84365
29 3.77 Female 0.51 732900 6250 106735
30 7.18 Female 1.17 200800 6450 111518
1.4. References
1. Anderson, D.R, Sweeny D.J., Williams T.A.1999. Statistics for Business and
Economics. West Publishing, Saint Paul.
3. Maxwell S.E and Delaney, H.D.1990.Designing Experiments and Analyzing
Data. Belmont, CA: Wadsworth.
Press. Nairobi
6. Rees. D.G,. 2001. Essential Statistics, 4th Edition, Chapman and Hall/CRC
8. http://stattrek.com/probability-distributions/t-distribution.aspx
9. http://www.canterbury.ac.uk/education/quality-in-study-support/docs/
5%20-%20Statistics%20and%20presentation.pdf
10. http://www.stat.yale.edu/Courses/1997-98/101/confint.htm
51 - 55 53 9.5 90.25 5 28 265
56 - 60 58 14.5 210.25 2 30 116
1,122 Σf = 30 1,305
Required; (i) Compute the mean milk production in the month (7 Marks)
43.5
(ii) Compute the median milk production in the month (5 Marks)
37.4
6.1
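The printed answers can be checked against the totals row of the table. The sketch below assumes that 1,305 is Σfx, that 30 is Σf, and that 1,122 is Σf(x - mean)²; the column headings are not shown in this part of the document, so these readings are assumptions. Under them, the mean is 43.5, and the printed values 37.4 and 6.1 coincide with the variance and the standard deviation.

```python
import math

# Totals taken from the last row of the table (interpretation assumed, see above).
sum_f = 30             # total frequency
sum_fx = 1305          # sum of frequency x class midpoint
sum_f_dev_sq = 1122    # assumed: sum of frequency x squared deviation from the mean

# Mean of grouped data = sum(fx) / sum(f).
mean = sum_fx / sum_f

# Variance and standard deviation of grouped data, dividing by sum(f).
variance = sum_f_dev_sq / sum_f
std_dev = math.sqrt(variance)

# Consistency check on the 51 - 55 class: midpoint 53, frequency 5.
deviation_51_55 = 53 - mean
fx_51_55 = 5 * 53

print("Mean:", mean)
print("Variance:", round(variance, 1))
print("Standard deviation:", round(std_dev, 1))
```

The deviation 9.5, its square 90.25 and the product 265 for the 51 - 55 class all match the table row, which supports the assumed reading of the columns.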