Applno FST at Tools

Sudershan Mishra, ID-51063
Applications of statistical tools for data analysis and interpretation in crops
Introduction
• Statistical methods are involved in every stage of a study: planning, designing, collecting data, analyzing, drawing meaningful interpretations and reporting the research findings
• Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. Results and inferences are precise only if appropriate statistical tests are used.
• Convenient software packages are now available to perform these tests
Winters et al., 2010
Variable
• A variable is a characteristic that varies from one individual member of a population to another
Kaur, 2013
Variables (contd.)
• Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (e.g. male and female), the data are called dichotomous (or binary)
• Ordinal variables have a clear ordering between their categories. However, the intervals between the ordered categories need not be equal
• Interval variables are similar to ordinal variables, except that the intervals between their values are equally spaced
• Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property: ratios of values are meaningful (e.g. 20 kg is twice 10 kg).
Kaur, 2013
Statistics
Descriptive
• Describes the relationship between variables in a sample or population
Inferential
• Estimates population parameters from a sample

Components of Descriptive Statistics
• Measures of central tendency
• Measures of dispersion
• Skewness
• Kurtosis
• Gaussian distribution
• Skewed distribution

Fig 1 – Normal/Gaussian distribution curve

Fig 2 – Curves showing negatively and positively skewed distribution
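The descriptive measures listed above can be computed directly from a sample. The Python sketch below uses only the standard-library statistics module; the function name describe and the yield values are hypothetical, and the moment-based skewness and excess-kurtosis formulas are one common convention (statistical packages may apply small-sample corrections, so their figures can differ slightly).

```python
import statistics


def describe(values):
    """Summarize a sample: central tendency, dispersion and shape."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments with an n denominator, used for the shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        # Measures of central tendency
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of dispersion (sample variance and SD use n - 1)
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
        # Skewness > 0: longer right tail (positively skewed)
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis is 0 for a normal (Gaussian) distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


# Hypothetical plot yields (t/ha), for illustration only.
yields = [4.2, 4.8, 5.1, 5.1, 5.3, 5.6, 6.0, 7.9]
for name, value in describe(yields).items():
    print(f"{name:>16}: {value:.3f}")
```

Here the single high-yielding plot pulls the mean (5.5) above the median (5.2) and gives a positive skewness, the pattern of a positively skewed curve.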

Inferential Statistics
• Data from a sample are analyzed to obtain a sample estimate (statistic)
• The purpose is to test the validity of the calculated sample estimate against the population parameter
• In inferential statistics, the term ‘null hypothesis’ (H0, read ‘H‑naught’ or ‘H‑null’) denotes that there is no relationship (difference) between the population variables in question
• The alternative hypothesis (H1 or Ha) states that a relationship (difference) between the variables is expected to exist.

Inferential statistics (contd.)
• The P value (or calculated probability) is the probability of obtaining a result at least as extreme as the one observed purely by chance, if the null hypothesis is true. The P value is a number between 0 and 1 and is used by researchers in deciding whether to reject or retain the null hypothesis
Table 1- P values with interpretation
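To make the P value definition concrete, the sketch below computes an exact P value with a permutation test. This distribution-free technique is swapped in here for illustration and is not one of the tests named in this presentation; the function name permutation_p_value and the yield values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference between two means.

    If H0 (no difference) is true, the group labels are exchangeable, so
    every way of splitting the pooled data into groups of the original sizes
    is equally likely. The P value is the fraction of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point noise does not hide ties.
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical yields (t/ha) of two varieties, four plots each.
variety_a = [5.1, 5.4, 5.6, 5.9]
variety_b = [4.2, 4.5, 4.7, 4.9]
p = permutation_p_value(variety_a, variety_b)
print(f"P = {p:.4f}")  # 2 of the 70 possible splits are as extreme
```

With P below 0.05 the null hypothesis of equal means would be rejected at the 5% level. Full enumeration grows combinatorially with sample size, so larger data sets are usually handled with a random subset of permutations.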

Parametric tests
• Assumptions
1. The parametric tests assume that the data are on a quantitative (numerical) scale.
2. The data are normally distributed
3. Variances are equal (homogeneous) across the groups being compared
4. Samples are selected randomly
• E.g. t-test, ANOVA
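As an illustration of a parametric test that relies on these assumptions, the sketch below computes Student's two-sample t statistic with a pooled variance. It is a minimal sketch assuming two independent, normally distributed samples with equal variances; the function name pooled_t_statistic and the yield values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with a pooled (common) variance.

    Pooling is justified only under the equal-variance assumption of the
    parametric t-test. Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their d.f.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean(a) - mean(b)) / se, df


# Hypothetical yields (t/ha) of two varieties, four plots each.
variety_a = [5.1, 5.4, 5.6, 5.9]
variety_b = [4.2, 4.5, 4.7, 4.9]
t, df = pooled_t_statistic(variety_a, variety_b)
print(f"t = {t:.3f} with {df} d.f.")
```

The computed t is then compared with the tabulated critical value for the same degrees of freedom (2.447 for 6 d.f., two-sided test at the 5% level), or the P value is obtained from statistical software.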

Non-parametric tests
• Characteristics
1. The nonparametric tests can be applied to data on a qualitative (categorical or ordinal) scale.
2. They do not assume that the data are normally distributed (distribution-free tests)
3. Non‑parametric tests may fail to detect a significant difference that a parametric test would detect; that is, they usually have less power.
• E.g. sign test, Mann-Whitney test
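The sign test named above uses only the direction of each paired change, which is why it needs no normality assumption. The sketch below is a minimal exact version based on the binomial distribution; the function name sign_test and the vigour scores are hypothetical.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for paired observations.

    Only the direction of each change is used (ties are discarded), so the
    test suits ordinal scores as well as non-normal measurements. Under H0
    an increase and a decrease are equally likely, i.e. the number of
    increases follows a Binomial(n, 0.5) distribution.
    """
    diffs = [y - x for x, y in zip(before, after) if y != x]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)  # number of increases
    # Probability of a split at least as lopsided as the one observed.
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(1.0, 2 * one_sided)


# Hypothetical seedling vigour scores (1-9 scale) on ten plots,
# before and after a treatment.
before = [3, 4, 2, 5, 3, 4, 2, 3, 4, 5]
after = [5, 6, 4, 6, 5, 3, 4, 5, 6, 7]
k, n, p = sign_test(before, after)
print(f"{k} of {n} non-tied pairs increased, P = {p:.4f}")
```

Nine increases out of ten give P = 22/1024 (about 0.021), even though the size of each change is ignored.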

Table 2- Analogous parametric and non-parametric tests

SPSS contains all the basic statistical tests and multivariate analyses, such as
• t-tests;
• chi-square tests;
• ANOVA;
• correlations and other association measures;
• regression;
• nonparametric tests;
• factor analysis;
• cluster analysis.
R (R Foundation for Statistical Computing)
MATLAB
SAS
CropStat
• CropStat is a computer program for data management and basic statistical analysis of experimental data. It runs on any 32-bit Windows operating system. It was developed primarily for the analysis of data from agricultural field trials, but many of its features can also be used to analyze data from other sources.
The main modules and facilities are:
• Data management with a spreadsheet
• Text editor
• Descriptive statistics and Scatterplot Graphics
• Balanced analysis of variance
• Unbalanced analysis (generalized linear models)
• Linear Mixed Models
• Combined analysis of variance
• Analysis of repeated measures
• Regression and correlation analysis
• Single-site analysis for variety trials
• Spatial Analysis
• Genotype × environment interaction analysis
• Pattern Analysis
• Quantitative trait loci analysis
• Graphics
• Utilities for randomization and layout, and orthogonal polynomials
STAR
FieldLab
FieldLab is an Android tablet application used for data collection in the field. IRRI’s researchers and technicians use this application to go paperless and thus promote the digital revolution.
Features
• Import ICIS workbook as a study
• Export observation data collected to an ICIS workbook format
• Data-entry form with validation, range entry and look-up values
• Integration with wireless bar-code reader (Baracoda brand)
• Manages the traits to be measured
• Manages captured images and audio
Online statistical packages
• IASRI Stat
• ICAR GOA STAT 2.0
Multivariate analysis
Ordination
Discrimination
Canonical analysis
Adeyanju, 2015
Multivariate Analysis
• Ordination aims at describing data by identifying a
reduced data dimension of a few variables accounting
for the greatest amount of variability in the data.
• Discrimination aims at delineating experimental groups or classifying observations into experimental groups based on a set of variables.
• Canonical analysis aims at describing and predicting the relationship between two sets of variables.
Adeyanju, 2015
Ordination Methods
• Principal component analysis (PCA)
• Principal coordinate analysis
• Correspondence analysis
• Multidimensional scaling
• Factor analysis (FA)
Adeyanju, 2015
Discrimination Methods
• Discriminant analysis
• Multiple logistic regression analysis
• Multivariate analysis of variance (MANOVA)
• Cluster analysis (CA)
Adeyanju, 2015
Canonical Analysis
• Canonical correlation
• Canonical redundancy
• Canonical correspondence
Adeyanju, 2015
PCA
• PCA is a multivariate statistical technique that reduces the dimension of a p-dimensional data array by constructing a smaller set of uncorrelated linear combinations (principal components) of the original variables.
• It has been suggested that PCA provides a means to quantitatively evaluate the relative importance of curve elements (Madden and Pennypacker, 1979).
• They identified three principal curve components, indicating the level of effect of a factor curve, the rate of yield increase, and the variation in shape or skewness from the mean curve.
Adeyanju, 2015
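The linear combinations used by PCA are the eigenvectors of the covariance (or correlation) matrix of the variables. The sketch below, written only with the Python standard library, finds the first principal component by power iteration; the function names and trait values are hypothetical, and in practice all components are obtained with statistical software (for example R's prcomp or SAS PROC PRINCOMP). When variables are in different units they are usually standardized first, i.e. PCA is run on the correlation matrix.

```python
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows (n observations x p variables)."""
    n, p = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_component(rows, iterations=500):
    """First principal component by power iteration on the covariance matrix.

    Returns (loadings, eigenvalue, share): the unit-length weights of the
    linear combination of the original variables with the largest variance,
    that variance, and its share of the total variance (eigenvalue / trace).
    """
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0] * p  # starting vector; assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    trace = sum(cov[i][i] for i in range(p))
    return v, eigenvalue, eigenvalue / trace


# Hypothetical measurements: plant height (cm), leaf length (cm), yield (t/ha).
traits = [
    [95, 32, 4.1],
    [102, 35, 4.6],
    [88, 30, 3.8],
    [110, 38, 5.2],
    [99, 33, 4.4],
]
loadings, variance_pc1, share = first_principal_component(traits)
print("PC1 loadings:", [round(x, 3) for x in loadings])
print(f"PC1 explains {share:.1%} of the total variance")
```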
Cluster Analysis
• CA is an exploratory data analysis tool that aims at sorting different objects into groups
• Grouping is done so that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise
Adeyanju, 2015
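Cluster analysis can be carried out in many ways. The sketch below shows one common choice, assumed here for illustration: agglomerative hierarchical clustering with average linkage (UPGMA) on Euclidean distances, which repeatedly merges the two most similar groups. The function names and genotype scores are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(points, k):
    """Agglomerative clustering with average linkage (UPGMA) into k groups.

    Starts with every object in its own cluster and repeatedly merges the two
    clusters with the smallest average between-cluster distance.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is still valid
    return sorted(clusters)


# Hypothetical genotypes scored on two standardized traits.
genotypes = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0), (8.8, 1.2)]
print(average_linkage_clusters(genotypes, k=3))  # [[0, 1], [2, 3], [4, 5]]
```

Other linkage rules (single, complete, Ward) or non-hierarchical methods such as k-means can group the same data differently, so the method should be reported with the results.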
Factor analysis
• FA, as a branch of multivariate analysis, is useful for explaining the inter-correlations among variables (Maxwell, 1961)
• It helps to find out the number and nature of the causative influences on which more intensive investigations can be concentrated
Adeyanju, 2015
MANOVA
• MANOVA is a procedure for assessing differences among groups defined by one or more nonmetric independent variables (factors), based on linear combinations of several metric dependent variables.
• This procedure enables the simultaneous examination of several dependent variables.
• MANOVA was first used by Golinski et al (2002) to assess the effect of two pathogens (Fusarium avenaceum and F. culmorum) on three yield components (1000-grain weight, and kernel weight and kernel number per head) of 14 winter wheat cultivars in a two-year study.
Adeyanju, 2015
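One widely used MANOVA test statistic is Wilks' lambda, the ratio of the determinant of the within-group sums-of-squares-and-cross-products (SSCP) matrix to that of the total SSCP matrix; Pillai's trace, the Hotelling-Lawley trace and Roy's largest root are alternatives. The sketch below computes Wilks' lambda for a one-way design with two dependent variables only, so that 2 x 2 determinants suffice; the function names and measurements are hypothetical, and converting lambda to an F statistic and P value is left to statistical software (for example R's manova or the MANOVA statement of SAS PROC GLM).

```python
from statistics import mean


def sscp(rows, centre):
    """2 x 2 sums of squares and cross-products of rows about a centre."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - centre[0], r[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Wilks' lambda for a one-way MANOVA with two dependent variables.

    groups: list of groups, each a list of (y1, y2) observations.
    Lambda = det(W) / det(T), where W is the pooled within-group SSCP matrix
    and T is the total SSCP matrix. Values near 0 indicate large differences
    between group mean vectors; values near 1 indicate little difference.
    """
    all_rows = [r for g in groups for r in g]
    grand = [mean(r[0] for r in all_rows), mean(r[1] for r in all_rows)]
    total = sscp(all_rows, grand)
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        group_mean = [mean(r[0] for r in g), mean(r[1] for r in g)]
        w = sscp(g, group_mean)
        for i in range(2):
            for j in range(2):
                within[i][j] += w[i][j]
    return det2(within) / det2(total)


# Hypothetical (1000-grain weight in g, kernels per head) for two treatments.
control = [(42.1, 38), (43.0, 40), (41.5, 37), (42.8, 39)]
infected = [(36.2, 31), (35.0, 30), (37.1, 33), (36.0, 31)]
print(f"Wilks' lambda = {wilks_lambda([control, infected]):.3f}")
```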
Correspondence Analysis
• Correspondence analysis describes the relationships among two or more cross-tabulated categorical variables (a contingency table).
• The frequencies in the contingency table are transformed into chi-square distances, which are used to construct a perceptual map of the relations among the variables.
• In one of the first studies using this method, Loreti et al (2001) investigated the genomic variability of 66 isolates of Xanthomonas arboricola pv. juglandis from different geographic origins by analyzing the proximities among amplified fragment length polymorphism (AFLP) banding patterns with correspondence analysis.
Adeyanju, 2015
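The transformation of frequencies into chi-square distances can be sketched as follows: each row of the table becomes a profile of proportions, and squared differences between profiles are weighted by the inverse of the column masses. This covers only the distance step; a full correspondence analysis then decomposes the table (typically by singular value decomposition) to draw the map. The function name and counts are hypothetical.

```python
import math


def chi_square_distances(table):
    """Chi-square distances between the row profiles of a contingency table.

    Each row is turned into a profile (its counts divided by the row total).
    Squared differences between two profiles are divided by the column mass
    (column total / grand total), so differences in rare categories weigh
    more than differences in common ones.
    """
    grand = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_mass = [sum(row[j] for row in table) / grand for j in range(n_cols)]
    profiles = [[x / sum(row) for x in row] for row in table]
    return [
        [math.sqrt(sum((pa - pb) ** 2 / m
                       for pa, pb, m in zip(profile_a, profile_b, col_mass)))
         for profile_b in profiles]
        for profile_a in profiles
    ]


# Hypothetical counts of isolates in three banding-pattern classes (columns)
# from three geographic origins (rows).
counts = [
    [10, 20, 5],
    [5, 10, 3],
    [30, 5, 12],
]
for row in chi_square_distances(counts):
    print([round(d, 3) for d in row])
```

Rows with proportional counts (similar profiles) end up close together regardless of their totals, which is what allows origins with very different sample sizes to be compared on the same map.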
Canonical Correlation analysis
• Canonical correlation analysis describes the association between two sets of variables.
• Canonical correlation analysis was first employed by Schlosser et al (2000) to characterize the relationship between plant morphological variables (such as plant height, leaf length, leaf area and plant growth rate) and rice blast disease variables (such as lesion density and lesion type) in six upland rice cultivars.
Adeyanju, 2015
Redundancy analysis
• Redundancy analysis aims at measuring the percentage of variation in one set of variables (considered singly) that is accounted for by the other set of variables (considered collectively).
• This is achieved by regressing each variable from one set on all the variables in the other set.
• Redundancy analysis was first used by Folman et al (2003) to describe the relationship of the carbon source utilization profiles of 20 clusters of rhizobacteria to 9 root tissue types, consisting of 3 root regions (tip, intermediate and base of root) sampled at three developmental stages (seedling, vegetative and generative).
Adeyanju, 2015
GGE Biplot
Mukherjee et al., 2013
Additive main effects and multiplicative interaction analysis (AMMI)
Modeling approaches
• Generalized linear model (GLM): contains fixed effects only and allows a non-normal response distribution (e.g. binomial, Poisson or multinomial) through a link function
• Linear mixed model (LMM): contains at least one fixed-effect factor and at least one random-effect factor in addition to the residual
• Generalized linear mixed model (GLMM): an extension of the LMM that combines fixed and random effects with a non-normal response distribution
THANK YOU