
Sudershan Mishra, ID-51063

Applications of statistical tools for data analysis and interpretation in crops

Introduction
• Statistical methods involved in carrying out a study
include planning, designing, collecting data, analyzing,
drawing meaningful interpretation and reporting of the
research findings

• Statistical analysis gives meaning to otherwise meaningless
numbers, thereby breathing life into lifeless data. The
results and inferences are precise only if proper
statistical tests are used.

• Convenient software packages are now available to perform
these tests
Winters et al., 2010 2
Variable
• A variable is a characteristic that varies from one individual
member of a population to another

Kaur, 2013
3
Variables ………….. Contd.
• Categorical or nominal variables are unordered. The data are merely
classified into categories and cannot be arranged in any particular order. If
only two categories exist (as in gender: male and female), the data are called
dichotomous (or binary) data

• Ordinal variables have a clear ordering between the categories. However,
the ordered data may not have equal intervals

• Interval variables are similar to ordinal variables, except that the intervals
between the values of an interval variable are equally spaced

• Ratio scales are similar to interval scales, in that equal differences between
scale values have equal quantitative meaning. However, ratio scales also
have a true zero point, which gives them an additional property.
Kaur, 2013 4
Statistics

Descriptive

• About the relationship between variables in a sample or population

Inferential

• About estimation of population parameter from sample

Winters et al., 2010 5


Components of Descriptive Statistics

• Measures of central tendency


• Measures of dispersion
• Skewness
• Kurtosis
• Gaussian distribution
• Skewed distribution

Winters et al., 2010 6
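
The components listed above can be computed directly. The following is a minimal sketch using only the Python standard library; the yield values are hypothetical illustration data, and the skewness and kurtosis formulas are the simple moment-based versions (other software may apply small-sample corrections and report slightly different values).

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments, used for the moment-based shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,                          # central tendency
        "median": statistics.median(values),   # central tendency
        "sd": statistics.stdev(values),        # dispersion (n - 1 denominator)
        "range": max(values) - min(values),    # dispersion
        "skewness": m3 / m2 ** 1.5,            # > 0: right tail, < 0: left tail
        "excess_kurtosis": m4 / m2 ** 2 - 3.0, # 0 for a Gaussian distribution
    }

# Hypothetical grain yields (t/ha) from ten plots, for illustration only.
yields = [4.1, 4.3, 4.4, 4.5, 4.6, 4.6, 4.8, 5.0, 5.4, 6.3]
summary = describe(yields)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample the mean exceeds the median and the skewness is positive, which is the pattern expected for a positively skewed distribution.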


Fig1- Normal/Gaussian distribution curve

Winters et al., 2010 7


Fig 2 – Curves showing negatively and positively skewed distribution

Winters et al., 2010 8


Inferential Statistics
• Data from a sample are analyzed to obtain a sample statistic

• The purpose is to test the validity of the calculated sample statistic
against the population parameter

• In inferential statistics, the term 'null hypothesis' (H0,
'H-naught,' 'H-null') denotes that there is no relationship
(difference) between the population variables in question

• The alternative hypothesis (H1 or Ha) denotes that a relationship
(difference) between the variables is expected to be true.

Winters et al., 2010 9


Inferential statistics continued..
• The P value (or the calculated probability) is the probability of
obtaining a result at least as extreme as the one observed, purely
by chance, if the null hypothesis is true.
The P value is a number between 0 and 1 and is interpreted
by researchers in deciding whether to reject or retain the null
hypothesis

Table 1- P values with interpretation

Winters et al., 2010 10
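
One way to make the definition of the P value concrete is an exact permutation test: if the null hypothesis is true, the group labels are interchangeable, so the P value is the share of all possible relabelings that give a difference at least as large as the observed one. This is a sketch of that idea, not a method taken from the slides; the function name and the plot data are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical yields (t/ha): four control plots and four treated plots.
control = [4.0, 4.2, 4.1, 4.3]
treated = [4.9, 5.1, 5.0, 5.2]
p_value = permutation_p_value(control, treated)
print(f"P = {p_value:.4f}")  # 2 of the 70 possible splits are this extreme
```

Because every treated plot out-yields every control plot, only the observed split and its mirror image are as extreme, giving P = 2/70. Enumeration is practical only for small samples; larger data sets would normally use random resampling or a standard test.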


Parametric tests
• Assumptions
1. The data are on a quantitative (numerical) scale
2. The data are normally distributed
3. Variances are equal among the samples (populations) being compared
4. The sample is selected at random
• Examples: t-test, ANOVA, etc.

Winters et al., 2010 11
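
As an illustration of a parametric test, the sketch below computes the two-sample t statistic with a pooled variance, which relies on the equal-variance assumption listed above. The variety data are hypothetical, and the code stops at the test statistic and degrees of freedom because the Python standard library has no t distribution; the P value would come from t tables or a statistics package.

```python
import math
import statistics

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / standard_error
    return t, df

# Hypothetical yields (t/ha) of two varieties, five plots each.
variety_a = [5.0, 5.2, 5.1, 5.3, 5.4]
variety_b = [4.5, 4.7, 4.6, 4.8, 4.9]
t_value, df = pooled_t_statistic(variety_a, variety_b)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

For 8 degrees of freedom the two-sided 5% critical value is about 2.306, so a t statistic larger than that in absolute value would lead to rejecting the null hypothesis of equal means.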


Non-parametric tests
• Assumptions
1. Non-parametric tests are typically used when the data are
on a qualitative (categorical or ordinal) scale
2. The data need not be normally distributed (distribution-free
tests)
• Non-parametric tests may fail to detect a significant
difference when compared with a parametric test;
that is, they usually have less power
• Examples: sign test, Mann-Whitney test

Winters et al., 2010 12
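
The sign test mentioned above uses only the direction of each paired difference, so it needs no normality assumption. A minimal sketch follows; the paired scores are hypothetical, ties are dropped (a common convention, assumed here), and the P value comes from the binomial distribution with probability 0.5.

```python
from math import comb

def sign_test_p_value(before, after):
    """Exact two-sided sign test P value for paired observations."""
    # Keep only pairs that actually differ; ties carry no sign.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical disease scores on nine plants before and after a treatment.
before = [12, 15, 11, 14, 13, 16, 12, 15, 14]
after = [14, 17, 12, 16, 15, 15, 14, 18, 14]
p_sign = sign_test_p_value(before, after)
print(f"P = {p_sign:.4f}")
```

Here one pair is tied and dropped, seven of the remaining eight differences are positive, and the resulting P value is 18/256 (about 0.07). The test ignores the size of each difference, which is one reason it tends to have less power than a paired t-test on the same data.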


Table 2- Analogous parametric and non-parametric tests

Winters et al., 2010 13


SPSS contains all basic statistical tests and
multivariate analyses such as
• t-tests;
• chi-square tests;
• ANOVA;
• correlations and other association measures;
• regression;
• nonparametric tests;
• factor analysis;
• cluster analysis.

20
R (R Foundation for Statistical Computing)

21
MATLAB

22
SAS

23
CropStat

24
• CropStat is a computer program for data management and basic statistical analysis of
experimental data. It can be run in any 32-bit Windows operating system. It has been developed
primarily for the analysis of data from agricultural field trials, but many of the features can be
used for analysis of data from other sources.
The main modules and facilities are

• Data management with a spreadsheet
• Text editor
• Descriptive statistics and Scatterplot Graphics
• Balanced analysis of variance
• Unbalanced analysis (generalized linear models)
• Linear Mixed Models
• Combined analysis of variance
• Analysis of repeated measures
• Regression and correlation analysis 
• Single-site analysis for variety trials
• Spatial Analysis
• Genotype × environment interaction analysis
• Pattern Analysis
• Quantitative trait loci analysis 
• Graphics
• Utilities for randomization and layout, and orthogonal polynomial
25
STAR

26
STAR

27
FieldLab
FieldLab is an application for Android tablets that is used for data
collection in the field. IRRI's researchers and technicians are
using this application to go paperless and thus promote the
digital revolution.

Features
• Import ICIS workbook as a study
• Export observation data collected to an ICIS workbook format
• With validation, range entry and look-up values on data entry
form.
• Integration with wireless bar-code reader (Baracoda brand)
• Manages traits to be measured
• Manages images and audio captured
28
Online statistical packages
• IASRI Stat
• ICAR GOA STAT 2.0

29
Multivariate analysis
Ordination

Discrimination

Canonical analysis

Adeyanju, 2015 30
Multivariate Analysis
• Ordination aims at describing data by identifying a
reduced data dimension of a few variables accounting
for the greatest amount of variability in the data.

• Discrimination aims at delineating experimental
groups or classifying observations into experimental
groups based on a set of variables.

• Canonical analysis aims at describing and predicting the
relationship between two sets of variables.

Adeyanju, 2015 31
Ordination Methods
• Principal component analysis (PCA)
• Principal coordinate analysis
• Correspondence analysis
• Multidimensional scaling
• Factor analysis (FA)

Adeyanju, 2015 32
Discrimination Methods
• Discriminant analysis
• Multiple logistic regression analysis
• Multivariate analysis of variance (MANOVA)
• Cluster analysis (CA)

Adeyanju, 2015 33
Canonical Analysis
• Canonical correlation
• Canonical redundancy
• Canonical correspondence

Adeyanju, 2015 34
PCA
• PCA is a multivariate statistical technique which reduces
the dimension of a p-dimensional array by introducing a
set of linear combinations of the original variables.

• It has been suggested that PCA provides a means to
quantitatively evaluate the relative importance of curve
elements (Madden and Pennypacker, 1979).

• They identified three principal curve components,
indicating the level of effect of a factor curve, the rate of
yield increase, and the variation in shape or skewness
from the mean curve

Adeyanju, 2015 35
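
The idea of replacing p variables with a few linear combinations can be sketched in plain Python. The example below builds the covariance matrix of three hypothetical plant traits and finds the first principal component by power iteration; this is a teaching sketch under those assumptions, and a real analysis would normally use a statistics package that returns all components at once.

```python
import math

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue

# Hypothetical measurements on six plants: height (cm), leaf length (cm),
# leaf area (cm^2). Values are made up for illustration.
traits = [
    [80, 30, 40], [85, 32, 44], [90, 35, 47],
    [95, 36, 52], [100, 39, 55], [105, 41, 60],
]
cov, centered = covariance_matrix(traits)
loadings, eigenvalue = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_share = eigenvalue / total_variance
# Scores: each plant's position along the first principal component.
scores = [sum(c[j] * loadings[j] for j in range(len(loadings))) for c in centered]
print("loadings:", [round(x, 3) for x in loadings])
print(f"share of variance explained: {explained_share:.3f}")
```

Because the three made-up traits rise together, a single linear combination captures nearly all of the variability, which is the sense in which PCA reduces dimension. The variables here are left in their original units; standardizing them first (PCA on the correlation matrix) is a common alternative when units differ.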
Cluster Analysis
• CA is an exploratory data analysis tool which aims at
sorting different objects into groups

• Grouping is done in a way that the degree of
association between two objects is maximal if they
belong to the same group and minimal otherwise

Adeyanju, 2015 36
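
A small sketch of how such grouping can work is agglomerative clustering with single linkage: every object starts in its own group, and the two closest groups are merged repeatedly until the requested number remains. Single linkage and Euclidean distance are choices made for this illustration, not something the slides specify, and the trait values are hypothetical.

```python
import math

def single_linkage(points, n_clusters):
    """Group point indices into n_clusters by single-linkage merging."""
    clusters = [[i] for i in range(len(points))]

    def distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical standardized trait values for six genotypes.
genotypes = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
             (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
groups = single_linkage(genotypes, 2)
print(groups)
```

With these values the first three genotypes end up in one group and the last three in the other, since members of the same group are much closer to each other than to members of the other group.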
Factor analysis
• FA, as a branch of multivariate analysis, is useful to
explain the inter-correlations of variables (Maxwell,
1961)

• It helps to find out the number and nature of
causative influences on which more intensive
investigations can be concentrated

Adeyanju, 2015 37
MANOVA
• MANOVA is a procedure for assessing differences among groups
defined by one or more nonmetric independent variables, based on
a linear combination of several metric dependent variables.

• This procedure enables the simultaneous examination of several
dependent variables.

• MANOVA was first used by Golinski et al (2002) to assess the
effect of two pathogens (Fusarium avenaceum and F. culmorum)
on three yield components (1000-grain weight, and weight and
number of kernels per winter wheat head) of 14 winter wheat
cultivars in a two-year study.

Adeyanju, 2015 38
Correspondence Analysis
• Correspondence analysis describes the relationships among
two or more cross-tabulated categorical variables (contingency
table).

• The frequencies in the contingency table are transformed into
Chi-square distances, which are used to establish a perceptual
map of the relation among variables

• In one of the first studies using this method, the genomic
variability of 66 isolates of Xanthomonas arboricola pv.
juglandis from different geographic origins was investigated by
analyzing the proximities among amplified fragment length
polymorphism (AFLP) banding patterns using correspondence
analysis (Loreti et al, 2001).
Adeyanju, 2015 39
Canonical Correlation analysis
• Canonical correlation analysis describes the
association between two sets of variables.

• Canonical correlation analysis was first employed by
Schlosser et al (2000) to characterize the relationship
between plant morphological variables such as plant
height, leaf length, leaf area, and plant growth rates
and rice blast disease variables like lesion densities
and lesion types in six upland rice cultivars

Adeyanju, 2015 40
Redundancy analysis
• Redundancy analysis aims at measuring the
percentage of variation in a set of variables (considered singly)
that is accounted for by the other set of variables (considered
collectively)

• This determination is achieved by regressing each variable
from one set on all variables in the other set

• Redundancy analysis was first used by Folman et al (2003) to
describe the relationship of carbon source utilization profiles
of 20 clusters of rhizobacteria to 9 root tissue types consisting
of 3 root regions (tip, intermediate and base of root) sampled
at three developmental stages (seedling, vegetative and
generative)
Adeyanju, 2015 41
GGE Biplot

42
Mukherjee et al., 2013
Additive main effects and multiplicative interaction
analysis (AMMI)

43
Mukherjee et al., 2013
Modeling approaches
• Generalized linear model (GLM): it includes only fixed-effect
factors and allows a non-normal distribution (e.g. binomial,
Poisson or multinomial) for the response variable

• Linear mixed model (LMM): with at least one fixed
effect factor and one random effect factor excluding
residual

• Generalized linear mixed model (GLMM): it is an
extension of the LMM to non-normally distributed response
variables, which contains one or more random effects in
addition to the usual fixed effects

44
Mukherjee et al., 2013
THANK YOU

45
