Journal of Neuroscience Methods 220 (2013) 85–91
Contents lists available at ScienceDirect
Journal of Neuroscience Methods

journal homepage: www.elsevier.com/locate/jneumeth
Basic Neuroscience
A comparison of random forest regression and multiple linear regression for prediction in neuroscience
Paul F. Smith a,∗, Siva Ganesh c, Ping Liu b
a Department of Pharmacology and Toxicology, The Brain Health Research Centre, University of Otago, Dunedin, New Zealand
b Anatomy, School of Medical Sciences, The Brain Health Research Centre, University of Otago, Dunedin, New Zealand
c Bioinformatics and Statistics, AgResearch Ltd., Palmerston North, New Zealand
Highlights
• Multiple linear regression is often used for prediction in neuroscience.
• Random forest regression is an alternative form of regression.
• It does not make the assumptions of linear regression.
• We show that linear regression can be superior to random forest regression.
Article info

Article history:
Received 22 May 2013
Received in revised form 13 August 2013
Accepted 28 August 2013

Keywords:
Regression
Linear regression
Regression trees
Random forest regression
l-Arginine metabolism
Vestibular nucleus
Cerebellum

Abstract

Background: Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention.
New method: In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)).
Results: The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases.
Comparison with existing methods: In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error.
Conclusion: In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases.
© 2013 Elsevier B.V. All rights reserved.
∗ Corresponding author. Tel.: +64 3 479 5747.
E-mail address: paul.smith@stonebow.otago.ac.nz (P.F. Smith).
0165-0270/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jneumeth.2013.08.024

1. Introduction

Linear regression is a part of the general linear model (GLM) that is often used to predict one variable from another in neuroscience. Simple linear regression can be expanded to include more than one predictor variable to become multiple linear regression. However, formal statistical tests of multiple linear regression, like simple linear regression, make assumptions regarding the distribution of the data, which cannot always be fulfilled. These assumptions are that the data are normally distributed, with homogeneity of variance, and that they are independent of one another (e.g. not autocorrelated) (Vittinghoff et al., 2005). Furthermore, the predictor variables should be numerical, although indicator variables can be used in order to include nominal variables (e.g., binary coding to represent male and female). The violation of the assumption of normality can sometimes be redressed using data transformation, which may also correct heterogeneity of variance, but other issues such as autocorrelation are not easily dealt with and may require methods such as time series regression (Ryan, 2009).

Although modelling using regression trees has been used for over 25 years, its use in the neurosciences has been very limited. In regression tree modelling, a flow-like series of questions is asked
about each variable (‘recursive partitioning’), subdividing a sample into groups that are as homogeneous as possible by minimising the within-group variance, in order to determine a numerical response variable (Vittinghoff et al., 2005). The predictor variables can be numerical also, or they can be ordinal or nominal. By contrast with linear regression, no assumptions are made about the distribution of the data. The data are usually split into training and test data sets (e.g., 90:10) and the mean square error (MSE) between the model based on the training data and the test data is calculated as a measure of the model’s success. Variables are chosen to split the data based on the reduction in the MSE achieved after a split (i.e., the information gained). Unlike linear regression, interactions between different predictor variables are automatically incorporated into the regression tree model and variable selection is unnecessary because irrelevant predictors are excluded from the model. This makes complex, non-linear interactions between variables easier to accommodate than in linear regression modelling (Hastie et al., 2009). Breiman et al. (1984) extended the concept of regression trees by exploiting the power of computers to simultaneously generate hundreds of regression trees, known as ‘random forests’, which were based on a random selection of a subset of data from the training set. The various regression tree solutions are averaged in order to predict the target variable with the smallest MSE (Marsland, 2009).

The aim of this study was to compare the results of a conventional multiple linear regression with those of random forest regression, using data on the expression of neurochemicals related to the l-arginine metabolic pathway in the rat hindbrain as an example. Two areas of the hindbrain concerned with the control of movement were investigated: the brainstem vestibular nucleus complex (VNC) and the cerebellum (CE), in young (4 month old) and aged (24 month old) rats (Liu et al., 2010). Chemical analyses were performed to determine the concentrations of 9 related neurochemicals that form a biochemical pathway that is critical for neuronal function (see Fig. 1): agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA). Although Fig. 1 presents certain causal connections between some of these neurochemical variables, the mechanisms through which they interact with one another are not completely understood and additional pathways, particularly feedback pathways, are possible (Mori and Gotoh, 2004). It is therefore of interest to determine whether the concentrations of one part of this complex neurochemical pathway can be predicted from the other parts.

Fig. 1. The arginine metabolic pathway showing the conversion of l-arginine to the neurotransmitter, nitric oxide (NO), and l-citrulline, by the enzyme, nitric oxide synthase (NOS), of which there are 3 isoforms; the conversion of l-arginine to agmatine by the enzyme, arginine decarboxylase (ADC), which is then converted to polyamines such as putrescine, spermidine and spermine by agmatinase and ornithine decarboxylase (ODC); and the conversion of l-arginine to l-ornithine by arginase, which is then converted to the same polyamines, which are essential for cell proliferation, differentiation and communication, including neuronal synaptic plasticity in the brain. The major excitatory neurotransmitter, glutamate, is one of the end products of l-arginine, and glutamate serves as a precursor for the synthesis of the major inhibitory neurotransmitter, GABA. Therefore, all of these neurochemicals are interconnected.

2. Methods

2.1. Data set and variables

The data set was obtained from Liu et al. (2010). Male Sprague-Dawley rats (aged: 24 months old, n = 14; young: 4 months old, n = 14) were housed 3–5 per cage, maintained on a 12 h light-dark cycle and provided with ad lib. access to food and water. All experimental procedures were carried out in accordance with the regulations of the University of Otago Committee on Ethics in the Care and Use of Laboratory Animals. Animals were housed either in a standard rat cage or an enriched environment including toys and other novel objects, since enriched environments have been shown to reduce age-related memory impairment (Olson et al., 2006). Therefore, the sample sizes for the aged and young groups were divided according to the housing conditions. In order to achieve as large a sample size as possible, data from the VNC and CE were combined in the regression analyses, so that for each of the 9 neurochemical variables the total n was 58. This was considered to be a reasonable solution given the close physiological relationship between the VNC and CE (Liu et al., 2010). This meant that for the aged group with standard housing, n = 13; for aged with enriched housing, n = 16; for young with standard housing, n = 14; and for young with enriched housing, n = 15. These smaller sample sizes were less important because age and enrichment were categorical variables that were never the target variables, but they were included in the regression analyses as predictor variables. A previous study using the same data set analysed the data using multivariate analyses of variance (MANOVAs), linear discriminant and cluster analyses (Liu et al., 2010), but the main interest in the latter case was the prediction of the age of the brain tissue based on the other variables rather than predicting neurochemical concentrations using regression analyses. Determination of the concentrations of agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA) was carried out using high performance liquid chromatography (HPLC) or a highly sensitive liquid chromatography/mass spectrometric (LC/MS/MS) method and expressed as μg/g of wet tissue weight (see Liu et al., 2008a, 2010 for details).

The experimental design thus consisted of 2 main independent variables: age with 2 levels, 4 months old and 24 months old; and housing, with 2 levels, standard and enriched. There were 9 potential dependent variables corresponding to the concentrations of agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and GABA. However, in any one regression analysis, only one of these continuous neurochemical variables was the target or y variable and the other 8 were included as predictor variables. Consequently, each analysis involved 10 predictor variables, i.e. 8 continuous variables and 2 categorical ones, and one dependent continuous neurochemical variable.
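
The design described in Section 2.1 can be sketched in code. The following Python snippet is a minimal illustration only, not the authors' analysis (which was carried out in R). It enumerates the 9 regression problems, each with one neurochemical as the target and the remaining 8 neurochemicals plus binary-coded age and housing as predictors. The variable and function names are hypothetical; the abbreviations follow Tables 1 and 2, and the 0/1 coding follows Section 2.2.2.

```python
# Names of the 9 neurochemical variables (abbreviations follow Tables 1 and 2).
NEUROCHEMICALS = ["GABA", "Put", "Spd", "Spm", "Arg", "Glut", "Agm", "Orn", "Cit"]

# Group sizes reported in Section 2.1 (VNC and CE samples combined).
GROUP_SIZES = {
    ("aged", "standard"): 13,
    ("aged", "enriched"): 16,
    ("young", "standard"): 14,
    ("young", "enriched"): 15,
}


def encode_indicators(age, housing):
    """Binary coding from Section 2.2.2: young = 0, aged = 1; standard = 0, enriched = 1."""
    return {"age": int(age == "aged"), "housing": int(housing == "enriched")}


def predictors_for(target):
    """Predictor names for one regression: the other 8 neurochemicals plus 2 indicators."""
    return [name for name in NEUROCHEMICALS if name != target] + ["age", "housing"]


total_n = sum(GROUP_SIZES.values())
designs = {target: predictors_for(target) for target in NEUROCHEMICALS}

print(total_n)               # total sample size across the four groups: 58
print(len(designs["Cit"]))   # predictors when l-citrulline is the target: 10
print(encode_indicators("aged", "standard"))
```

Each of the 9 entries in `designs` corresponds to one regression analysis, so the same predictor layout can be handed to either modelling approach.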
2.2. Statistical methods

2.2.1. Preliminary data inspection

The variance of the glutamate concentrations was substantially different from that of the other neurochemicals and this raised a concern about whether the assumptions for normal parametric statistical analysis would be violated (Liu et al., 2010). Although some data mining methods such as random forest and neural network regression do not require that assumptions such as normality and homogeneity of variance be met, this is not true of multiple linear regression (Marcoulides and Hershberger, 1997; Manly, 2005). There were more measurements than dependent variables in all cases, i.e., 58 samples versus 9 variables (Tabachnick and Fidell, 2007). Inspection of normal probability plots (Q–Q plots) and residuals versus fitted value plots for the different variables suggested that the assumptions of at least univariate normality and homogeneity of variance were likely to be upheld with this sample size (see Fig. 2). Transformations were investigated for the multiple linear regression but none that was attempted (natural log, square root etc.) resolved the remaining problems. However, multiple linear regression is believed to be reasonably robust against violation of its assumptions provided that the sample sizes for the different dependent variables are reasonably large and nearly equal, which they were in this case (Marcoulides and Hershberger, 1997; Manly, 2005). The total sample size for the 9 neurochemical variables was n = 58 in most cases; therefore the central limit theorem should have provided some protection against violation of the assumption of multivariate normality (Marcoulides and Hershberger, 1997; Tabachnick and Fidell, 2007). Marcoulides and Hershberger (1997) have argued that if the assumption of multivariate normality is met, then the assumption of homoskedasticity is likely to be met also.

2.2.2. Multiple linear and random forest regression

All analyses were conducted using the computer package R (2012). The data were split 90:10 into training and test data sets. Based on the considerations described above, multiple linear regressions (MLRs) were performed on the training data set, using one neurochemical variable at a time as the response variable, and the other 8 as predictor variables, in addition to the categorical predictor variables, age and housing. In all cases, the response neurochemical variable was a continuous variable, expressed as a concentration. The other 8 predictor neurochemical variables were also continuous variables, but age and housing were nominal and these were converted to binary indicator variables, where for age, young = 0 and aged = 1, and for housing, standard = 0 and enriched = 1.

The success of the MLRs can be assessed by evaluating the magnitude of the adjusted R², the residual standard error (RSE) for the regression, the t test results for the individual predictor variables and the analysis of variance (ANOVA) for the regression. The validity of the regression can be investigated by inspecting the diagnostic plots for the residual versus fitted values, the normal Q–Q plots, the scale-location plots and the residuals versus leverage plots, including Cook’s distance (see Fig. 2). For formal significance tests, the α rate (type I error rate) is usually set at 0.05 for all comparisons. The R software function lm was utilised when fitting MLRs.

RFR modelling requires choosing m, the number of variables (a subset of available p predictor variables) used to determine the decision at a node of the tree. Since there were 10 predictor variables for any target neurochemical variable, it was decided to set m as the integer part of the square root of p, i.e. m = 3. The number of trees to be fitted was set at 1000. The optimum value for m was also determined using the tuneRF function of the R software, as an alternative to setting m = 3. The majority of the models resulted in tuneRF choosing m = 3 and in general, the overall results were very similar to the modelling under the choice of m = 3. Hence, only the results associated with the latter approach are presented here. The function randomForest of the R software was utilised when fitting RFRs.

The nature of RFR automatically provides tools for assessing its performance. Much of this information comes from using the “out-of-bag” (OOB) cases in the training set that have been left out of the bootstrapped training set. The magnitude of the residual error and of the (pseudo) R² can be computed for the OOB cases. RFR also provides a ‘proportion of variance explained’ for the overall model and a ‘variable importance’ score for each of the predictor variables. This can be regarded as comparable to the variable selection associated with stepwise MLR.

It is clear that the ‘internal’ assessments of the two models, MLR and RFR, are not directly comparable, which makes it difficult to use them to choose the better of the two models. One solution is to compute MSE and R² via the predicted values of the response of each observation in the training data set using the fitted model. Here, MSE and R² are defined as:

\[
\mathrm{MSE} = \frac{1}{58}\sum_{i=1}^{58} (y_i - \hat{y}_i)^2
\quad\text{and}\quad
R^2 = 1 - \frac{\sum_{i=1}^{58} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{58} (y_i - \bar{y})^2}
\]

where y_i and ŷ_i are, respectively, the observed and predicted responses of the ith observation, and ȳ is the mean of all responses.

This approach may be regarded as ‘over-optimistic’ because MSE and R² are obtained via ‘re-substitution’, where the regression model is built using all 58 observations and then each observation is predicted using the fitted model. An alternative solution is to use a common test data set to assess the fitted models. This can be achieved by dividing the given data into training and test sets using, for example, a 90:10 split. Alternatively, a leave-one-out cross-validation (LOO-CV) approach may be utilised. Here, each observation is removed from the training data, a model is built based on n − 1 (or 57 in our case) observations and then the removed observation is predicted using the fitted model. This approach, while producing unbiased estimates, is prone to high variation, especially when the sample size is small.

It was therefore decided to use the R² and residual standard error (RSE) criteria for MLR, and the proportion of variance explained and RSE criteria for RFR, based on a common 90:10 training:test data split, in order to evaluate the success of the regression analysis via the two modelling processes. This also makes it easier to compare the important subsets of predictor variables chosen by the two modelling processes.

3. Results

3.1. Multiple linear regression

Table 1 shows the results of the MLRs. The R² values ranged from 0.50 to 0.95. Although all of these regressions were statistically significant according to ANOVAs (data not shown), those with high R² values, e.g. ≥0.7, were GABA, spermidine, spermine, l-arginine, agmatine and l-citrulline. The highest R² (0.95) was for the prediction of l-citrulline from l-arginine, GABA and l-ornithine. However, this regression did not have the lowest RSE (12.37 compared to 0.49 for putrescine; see Table 1).

Inspection of the diagnostic plots suggested that the data for most variables were fairly normally distributed, i.e. the data were closely distributed along the straight line in the Q–Q plot (see Fig. 2 for an example for l-citrulline). Furthermore, the residuals versus the fitted values plots suggested that the residuals were approximately randomly distributed (Fig. 2). Likewise, the scale-location and residuals versus leverage plots did not indicate any serious violation of the assumptions of MLR (Fig. 2). Therefore, it was concluded that the regression analyses were valid.
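
The MSE and R² definitions in Section 2.2.2, and the contrast drawn there between re-substitution and leave-one-out cross-validation, can be sketched in a few lines. The Python code below is an illustrative sketch only: the study itself used the R functions lm and randomForest, the single-predictor least-squares fit here merely stands in for a full MLR, and the toy data and all function names are hypothetical.

```python
def mse(y, y_hat):
    """Mean square error: average squared difference between observed and predicted."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def resubstitution_predictions(x, y):
    """Fit on all observations, then predict those same observations."""
    a, b = fit_line(x, y)
    return [a + b * xi for xi in x]


def loo_cv_predictions(x, y):
    """Leave-one-out: refit on n - 1 observations and predict the held-out one."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds


# Hypothetical toy data (not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

resub = resubstitution_predictions(x, y)
loo = loo_cv_predictions(x, y)
print("re-substitution MSE:", mse(y, resub), "R^2:", r_squared(y, resub))
print("LOO-CV MSE:", mse(y, loo), "R^2:", r_squared(y, loo))
```

For a least-squares fit the leave-one-out MSE is never smaller than the re-substitution MSE, which is one way to see why re-substitution is described as ‘over-optimistic’.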
Fig. 2. Diagnostic plots for l-citrulline following multiple linear regression showing residuals versus fitted values, normal Q–Q, scale location and residuals versus leverage plots.

Table 1
Multiple linear regression.

                      GABA     Put      Spd      Spm      Arg      Glut     Agm      Orn      Cit
R²                    0.78     0.68     0.85     0.93     0.92     0.61     0.76     0.50     0.95
RSE                   32.03    0.49     14.24    6.38     16.97    258.5    0.38     15.35    12.37
Significant           glut***  ag***    spm***   spd***   cit***   GABA***  put***   age***   arg***
predictor variables   cit**    age*     age***   glut**            spm**    age*     cit*     GABA**
                                                                                              orn*

Results of the multiple linear regression analyses (MLRs) showing the R² values, the RSEs, and the significant input variables.
*** P ≤ 0.0001.
** P ≤ 0.001.
* P ≤ 0.05.

3.2. Random forest regression

Table 2 shows the results of the RFRs. The proportion of variance explained values ranged from 0.27 for l-ornithine to 0.94 for spermine. The RSEs ranged from 0.54 for agmatine to 293.71 for glutamate. Fig. 3 shows the order of variable importance for the RFR for spermine and Fig. 4 the decrease in error as a function of the number of trees. Fig. 5 shows the predicted versus the observed values for the test data, based on only 6 observations (i.e. 10% of 58). The pseudo R² was 0.98, although the sample size is very small.

Table 2
Random forest regression.

                      GABA     Put      Spd      Spm      Arg      Glut     Agm      Orn      Cit
Prop. Var.            0.66     0.43     0.72     0.94     0.92     0.49     0.52     0.27     0.90
RSE                   39.40    0.64     19.01    5.88     16.59    293.71   0.54     18.40    16.32
Most important        cit      agm      spm      arg      cit      spm      arg      age      arg
predictor variables   glut     spm      arg      cit               GABA     cit      put      spm
                                                                                              GABA

Results of the random forest regression models (RFRs) showing the proportion of variance explained values, the RSEs, and the input variables chosen by the stepwise process.

Fig. 3. Variables in order of importance for the RFR for spermine, which had the highest proportion of variance explained (0.94).

Fig. 4. Decrease in error as a function of the number of trees for the RFR for spermine, which had the highest proportion of variance explained (0.94).

Fig. 5. Predicted versus observed values for the spermine test data.

3.3. Comparison of multiple linear regression and random forest regression

In order to compare the results of the two different kinds of regression, the R² values for the MLRs were compared to the proportion of variance explained values for the RFRs. Tables 1 and 2 show these and the RSE values for the 2 kinds of regressions for the 9 neurochemical variables, with the predictors listed in order of importance. For the RFRs, these variables were the ones that had the largest effect on the MSE (see Fig. 3) and for the MLRs, they were the statistically significant variables, listed in order from the smallest to the largest P value. In order to facilitate comparison, the same number of variables is shown for the RFRs as for the MLRs, i.e. if only 2 variables were significant for the MLR, then only the 2 most important variables for the RFR are shown.

It was apparent that the R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥0.70 compared to 4/9 for the RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained
values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases.

The most important variables in the prediction of the target variables differed in some cases between the two types of regression. For GABA: glutamate and l-citrulline were common to the two regression analyses (2/2). For putrescine: only agmatine was common (1/2). For spermidine: only spermine was common (1/2). For spermine: there was no common variable (0/2). For l-arginine: only l-citrulline was common to the two regressions (1/1). For glutamate: spermine and GABA were common (2/2). For agmatine: there was no common variable (0/2). For l-ornithine: age was common (1/2). Finally, for l-citrulline: l-arginine and GABA were common (2/3). The fact that the R² values for the MLRs were higher than the proportion of variance explained values for the RFRs in 7/9 cases, and the RSEs were lower in 7/9 cases, suggested that the predictive value of the MLRs was greater than for the RFRs in the case of this data set.

4. Conclusions

Experimental phenomena in biology in general, and in neuroscience in particular, usually involve the complex, non-linear interaction of multiple variables, and yet historically, statistical analysis has focussed on comparison between treatment groups, of one variable at a time. This approach not only tends to inflate the type I error rate as a result of large numbers of statistical analyses, but neglects the fact that changes may occur at the level of the interaction within a system of variables that cannot be detected in individual variables (Liu et al., 2010; Smith et al., 2013). Consequently, in areas such as the analysis of gene microarray data, protein interaction and medical diagnostics, multivariate statistical analyses and data mining approaches are now being employed in an attempt to understand complex interactions between systems of variables (e.g., Pang et al., 2006; Krafczyk et al., 2006; Ryan et al., 2011; Brandt et al., 2012; Smith et al., 2013).

The process of ageing is associated with major neurophysiological and neurochemical changes, some of which result in neurological deficits such as memory loss and impaired motor control. A biochemical pathway responsible for l-arginine metabolism is critically involved in the production of several neurochemicals that are necessary for communication between neurons (e.g. the neurotransmitters glutamate, nitric oxide, and GABA, which is synthesised from glutamate) and is involved in their maintenance or degeneration (e.g., l-ornithine and the polyamines, spermine, spermidine and putrescine) (e.g., Liu et al., 2003a,b, 2004a,b, 2005, 2008a,b). The neurochemicals that make up this system interact in a complex, non-linear way that may have multiple positive and negative feedback loops. Although Fig. 1 summarises what is currently known of this system, there are almost certainly other interactions that occur. Therefore, it is not possible to provide a simple linear causal model of how one neurochemical affects another. The traditional approach has been to measure the concentrations of many of the variables in the biochemical system (e.g., agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and GABA) and then relate individual changes in them to ageing. Such studies have demonstrated that age-related neurological impairment is associated with, and probably caused by, changes in these neurochemical variables (e.g., Liu et al., 2003a,b, 2004a,b, 2005, 2008a,b); however, some of these studies have used multiple univariate analyses and may have been undermined by an escalating type I error rate (Quinn and Keough, 2006). More recent studies have used multivariate statistical analyses involving conventional approaches such as MANOVA and linear discriminant analyses (Liu et al., 2010). However, these statistical methods, which are part of the GLM, require that certain assumptions be met. Therefore, the aim of this study was to compare the results of a conventional multivariate approach using MLR, with RFR, which does not involve the assumptions of the GLM.

MLR was found to be generally effective in predicting any one of the 9 neurochemicals from the other 8. However, for the RFRs, the proportion of variance explained values were lower than the R² values for the MLRs in 7/9 cases, although the largest differences were for l-ornithine, agmatine and putrescine. The proportion of variance explained values were still >0.6 for 5/9 RFRs. However, in terms of the R² and proportion of variance explained values, as well as in terms of the RSE values, the MLRs seemed to be superior to the RFRs for this set of data.

In a previous study (Liu et al., 2010), linear discriminant functions were highly successful (100% accuracy for the VNC, based on 6/9 variables, Wilks’ significant at P = 0.000; 90% for the CE, based on only 2/9 variables, Wilks’ significant at P = 0.000) in predicting the age of the animals on the basis of a subset of the 9 neurochemical variables. Likewise, a MANOVA showed that age was a very important factor in determining the concentrations of these variables. Consistent with this result, the MLR analyses showed that age was an important predictor for 4/9 neurochemical variables (putrescine, spermidine, agmatine and l-ornithine). By comparison, the RFR analyses showed that age was an important predictor for only 1/9 neurochemical variables (l-ornithine).

In summary, in the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive power in some cases.

References

Brandt T, Strupp M, Novozhilov S, Krafczyk S. Artificial neural network posturography detects the transition of vestibular neuritis to phobic postural vertigo. J Neurol 2012;259:182–4.
Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. 1st Ed. Boca Raton: CRC Press; 1984.
Hastie T, Tibshirani R, Friedman J. Elements of statistical learning: data mining, inference and prediction. 2nd Ed. Heidelberg: Springer Verlag; 2009.
Krafczyk S, Tietze S, Swoboda W, Valkovic P, Brandt T. Artificial neural network: a new diagnostic posturographic tool for disorders of stance. Clin Neurophysiol 2006;117:1692–8.
Liu P, Smith PF, Appleton I, Darlington CL, Bilkey D. Nitric oxide synthase and arginase expression and activity in the rat hippocampus and the entorhinal, perirhinal, postrhinal and temporal cortices: regional variations and effects of aging. Hippocampus 2003a;13:859–67.
Liu P, Smith PF, Appleton I, Darlington CL, Bilkey D. Regional variations and age-related changes in nitric oxide synthase and arginase in the subregions of the hippocampus. Neuroscience 2003b;119:679–87.
Liu P, Smith PF, Appleton I, Darlington CL, Bilkey DK. Potential involvement of nitric oxide synthase and arginase in age-related behavioural impairments. Exp Gerontol 2004a;39:1207–22.
Liu P, Smith PF, Appleton I, Darlington CL, Bilkey DK. Age-related changes in nitric oxide synthase and arginase in prefrontal cortex. Neurobiol Aging 2004b;25:547–52.
Liu P, Smith PF, Appleton I, Darlington CL, Bilkey DK. Hippocampal NOS and arginase and age-associated behavioural deficits. Hippocampus 2005;15:642–55.
Liu P, Chary S, Devaraj R, Jing Y, Darlington CL, Smith PF, et al. Effects of aging on agmatine levels in memory-associated brain structures. Hippocampus 2008a;18:853–6.
Liu P, Smith PF, Darlington CL. Glutamate receptor subunit expression in memory-associated brain structures: regional variations and effects of aging. Synapse 2008b;62:834–41.
Liu P, Zhang H, Devaraj R, Ganesalingam G, Smith PF. A multivariate analysis of the effects of aging on glutamate, GABA and arginine metabolites in the rat vestibular nucleus. Hear Res 2010;269:122–33.
Manly BFJ. Multivariate statistical analysis. A primer. 3rd Ed. London: Chapman and Hall/CRC; 2005.
Marcoulides GA, Hershberger SL. Multivariate statistical methods. A first course. Mahwah, New Jersey: Lawrence Erlbaum Assoc; 1997.
Marsland S. Machine learning. An algorithmic perspective. Boca Raton: CRC Press; 2009.
Mori M, Gotoh T. Arginine metabolic enzymes, nitric oxide and infection. J Nutr 2004;134, 2820S-2028S.
Olson AK, Eadie BD, Ernst C, Christie BR. Environmental enrichment and voluntary exercise massively increase neurogenesis in the adult hippocampus via dissociable pathways. Hippocampus 2006;16:250–60.
Pang H, Lin A, Holford M, Enerson BE, Lu B, Lawton MP, et al. Pathway analysis using random forests classification and regression. Bioinformatics 2006;22:2028–36.
Quinn GP, Keough HJ. Experimental design and data analysis for biologists. Cambridge: Cambridge University Press; 2006.
Ryan TP. Modern regression methods. New Jersey: Wiley; 2009.
Ryan M, Mason-Parker SE, Tate WP, Abraham WC, Williams JM. Rapidly induced gene networks following induction of long term potentiation at perforant synapses in vivo. Hippocampus 2011;21:541–53.
Smith PF, Haslett SJ, Zheng Y. A multivariate statistical and data mining analysis of spatial memory-related behavior following bilateral vestibular deafferentation in the rat. Behav Brain Res 2013;246:15–23.
Tabachnick BG, Fidell LS. Using multivariate statistics. 5th Ed. Boston: Pearson Education Inc.; 2007.
Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. Regression methods in statistics: linear, logistic, survival and repeated measures models. New York: Springer; 2005.
