
Title: Biodiversity is overlooked in the diets of different social groups in Brazil

Journal name and publication year: Scientific Reports – 2023


DOI: https://doi.org/10.1038/s41598-023-34543-8
Aria Foroughi 2531465
Summary
The article investigates the consumption of biodiverse foods in Brazil
and the socioeconomic factors that influence it. The study used
data from a Brazilian national dietary survey and applied machine learning and
statistical methods to analyze the data. The study found that the consumption of
biodiverse foods was low, limited to 1.3% of the population, and varied according
to area, ethnicity, age, food insecurity, sex, and educational level. The findings
suggest a significant mismatch between the rich biodiversity of Brazil and its
representation in the human diet.

Figure A. Levels of food security in the households of mushroom consumers, wild meat consumers, and
unconventional food plant consumers, compared to a reference group of people who did not report
consuming the biodiverse foods analyzed in this study, with a 95% confidence interval.
Figure B. SHAP values for the predictors of unconventional food plant consumption, considering the
direction (values to the right of the central axis have a positive impact on consumption) and magnitude
(indicated by the colors) of the relationships between the variables.

Q1. Figure B shows the factors investigated as predictors of the consumption of
unconventional food plants. Living in a rural area, being non-white,
being older, living in a household with food insecurity, being a woman, and having
more years of schooling are the main factors affecting the consumption of
unconventional food plants. Living in a rural area is the most influential factor.
The figure plots the SHAP values of each predictor. Values to the right of the
central axis have a positive impact on consumption, and values to the left have a
negative impact. The colors show the magnitude of the relationship, with blue
marking the lowest values.
Q2. An observational study identifies correlations simply by observing, without
applying any treatment. An experimental study implements an intervention (a
treatment) and observes its results. This research is an observational study
because the researchers applied no treatment: all the variables were recorded as
they were reported in the survey, and the analysis concerns the
relationships between them.
Q3. On the one hand, the explanatory variable, also called the independent
variable, is what you manipulate or observe in order to explain the outcome of a
study or experiment. On the other hand, the response variable, also called the
dependent variable, is the outcome that is the focus of the study or
experiment. In other words, a response variable is what changes as a result.
Hence, UFP consumption is the response variable. The explanatory variables are
living in a rural area, being non-white, being older, living in a household with
food insecurity, being a woman, and having more years of schooling.

Q4.
Variable               Type
Living in rural area   Categorical-nominal
Being non-white        Categorical-nominal
Age                    Numerical-discrete
Food security          Categorical-ordinal
Woman                  Categorical-nominal
Years of schooling     Numerical-discrete
UFP                    Categorical-nominal

Q5. A bar chart is used for the levels of food security in the households. This
graph is appropriate for showing the differences between the levels. Histograms can
be useful for showing the distribution of numerical variables such as age and years
of schooling.

Q6. No, there is no error in the graphs. The graphs could be merged into one bar
chart that presents the information along two dimensions. One dimension
categorizes the degrees of UFP. The other categorizes the consumption of
biodiverse foods.
Q7.

In statistical hypothesis testing, the null hypothesis (H0) and the alternative
hypothesis (Ha) are formal statements about a population parameter. Sample data
are then used to decide between them.
The null hypothesis states that there is no difference, relationship, or effect
in the population. The alternative hypothesis states that there is an
association, difference, preference, or effect in the population.

1.
H0: Area does not have a significant effect on UFP.
Ha: Area has a significant effect on UFP.
2.
H0: Ethnicity does not have a significant effect on UFP.
Ha: Ethnicity has a significant effect on UFP.
3.
H0: Age does not have a significant effect on UFP.
Ha: Age has a significant effect on UFP.
4.
H0: Food insecurity does not have a significant effect on UFP.
Ha: Food insecurity has a significant effect on UFP.
5.
H0: Income does not have a significant effect on UFP.
Ha: Income has a significant effect on UFP.
6.
H0: Sex does not have a significant effect on UFP.
Ha: Sex has a significant effect on UFP.
7.
H0: Educational level does not have a significant effect on UFP.
Ha: Educational level has a significant effect on UFP.

Q8.

According to the data analysis section of the article, a descriptive analysis was
performed to describe and illustrate the food groups and the socioeconomic and
demographic variables, using relative frequencies, means, and 95% confidence
intervals. A 95% confidence interval is a range of values that, with 95%
confidence, contains the true mean of the population. The authors also accounted
for the sample weights so that the results accurately represent the study
population according to the sample design of the research.
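
The two ideas in this paragraph, a 95% confidence interval for a mean and a mean that accounts for sample weights, can be illustrated with a minimal Python sketch. The ages and weights below are hypothetical and not taken from the article, and the interval uses the simple normal approximation. A real survey analysis would also use the sample design when estimating the variance, which this sketch does not attempt.

```python
import math

def weighted_mean(values, weights):
    """Mean in which each observation counts in proportion to its sample weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def mean_ci95(values):
    """Normal-approximation 95% confidence interval for an unweighted mean."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with n - 1 in the denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    margin = 1.96 * math.sqrt(variance / n)
    return mean - margin, mean + margin

# Hypothetical ages and sample weights (not data from the article).
ages = [23, 35, 41, 52, 60, 29, 47, 38]
weights = [1.2, 0.8, 1.0, 1.5, 0.7, 1.1, 0.9, 0.8]

low, high = mean_ci95(ages)
print(f"Unweighted mean 95% CI: ({low:.1f}, {high:.1f})")
print(f"Weighted mean: {weighted_mean(ages, weights):.1f}")
```

Respondents with larger weights stand for more people in the population, so they pull the weighted mean toward their own values.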
The given table provides information on various variables and their
corresponding estimates, standard errors, and p-values. By analyzing the table,
we can classify the variables into two types. Firstly, binary or categorical
variables are represented by "Area (rural)", "Ethnicity (non-white)", "Food
insecurity", "Sex", and "Educational Level". These variables most likely have
binary or categorical responses, indicating the existence or absence of a certain
trait or membership in a group. Secondly, continuous variables are represented by
"Age" and "Income". These variables are measured on a continuous scale and can
take any numerical value within a certain range.
Unfortunately, we could not use the statistical methods that we learned up to
chapter 12. However, we did learn about the Poisson distribution, and the method
used in this study takes the Poisson distribution as its underlying probability
distribution to analyze count data and estimate the relationship between the
response variable and the predictor variables. While the Poisson distribution
describes the probability of observing different counts of events, the Poisson
regression model describes the relationship between the response variable and the
predictor variables. It estimates the parameters of the regression
equation, which links the mean of the Poisson distribution (representing the
response variable) to the predictor variables through a logarithmic link function.
The model therefore allows for prediction, inference, and interpretation of the
effects of the predictors on the expected counts of events.
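
As a rough illustration of the logarithmic link described above, the following Python sketch computes the expected count from a linear predictor and then the Poisson probability of a given count. The intercept, the coefficients, and the choice of predictors (a rural indicator and age in decades) are hypothetical and are not estimates from the article.

```python
import math

def expected_count(intercept, coefficients, predictors):
    """Mean of the Poisson response under a log link: exp(b0 + sum(b_j * x_j))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return math.exp(linear_predictor)

def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the mean count is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

# Hypothetical coefficients (not estimates from the article):
# an intercept, one coefficient for a rural indicator, one for age in decades.
b0 = -1.0
betas = [0.5, 0.1]

urban_40 = expected_count(b0, betas, [0, 4.0])
rural_40 = expected_count(b0, betas, [1, 4.0])

# With a log link a coefficient acts multiplicatively: exp(0.5) is the rate ratio.
print(f"Expected servings/day, urban: {urban_40:.3f}")
print(f"Expected servings/day, rural: {rural_40:.3f}")
print(f"Rate ratio, rural vs urban: {rural_40 / urban_40:.3f}")
print(f"P(0 servings | rural): {poisson_pmf(0, rural_40):.3f}")
```

Because of the log link, each coefficient is interpreted as a multiplicative change in the expected count rather than an additive one.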
Based on the information provided in the table, it appears that a regression
analysis was conducted to examine the effect of socioeconomic and demographic
variables on unconventional food plant (UFP) consumption. Specifically, it is
likely that a Poisson regression model was employed, considering the mention of
"UFP intake-serving/day (Poisson part)" and the estimated coefficients (Estimation)
and standard errors (SE) reported for each covariate.

In Poisson regression, the dependent variable is assumed to follow a Poisson
distribution, which is suitable for count data. The parameters of the model are
estimated by maximum likelihood. The way the p-values are calculated
in Poisson regression depends on the test statistic employed, which could be the
Wald statistic or the likelihood ratio statistic.
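
To make the Wald approach concrete, here is a minimal Python sketch that turns an estimate and its standard error into a two-sided p-value using the normal approximation. The numbers are hypothetical and not taken from the article's table, and the article does not state which test statistic was actually used.

```python
import math

def wald_p_value(estimate, standard_error):
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical estimate and standard error (not values from the article).
z, p = wald_p_value(0.50, 0.25)
print(f"z = {z:.2f}, p = {p:.4f}")
```

An estimate that is large relative to its standard error gives a large z and therefore a small p-value.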

In other words, the study employed a regression model, likely a Poisson
regression model, to examine the influence of socioeconomic and demographic
factors on the intake of unconventional food plants (UFP). Poisson regression
works well with count data, and in this instance the dependent variable is the
daily consumption of UFP in servings. The goal of the analysis was to determine
how several factors, including area, ethnicity, age, food insecurity, income, sex,
and educational level, affected people's consumption of UFP. The estimated
coefficients and standard errors give information on the size and precision of
these effects. Zero-inflated regression, on the other hand, is also used
for count data analysis, but it addresses situations where the data contain more
zeros than would be expected from a Poisson distribution. Zero-inflated regression
models have two components: a binary component that models the excess zeros
(i.e., the probability of an excess zero) and a count component that models the
count distribution. The model therefore allows for the identification
of predictors associated with both the probability of observing an excess zero
and the size of the count.

The formula for the ZIP regression model can be expressed as follows:

P(Y = 0) = π + (1 - π) e^(-λ)
P(Y = y) = (1 - π) λ^y e^(-λ) / y!,   y = 1, 2, 3, ...

Y represents the count variable, π is the probability of an excess zero, and λ is
the mean of the Poisson count component.
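
Using the symbols Y, π, and λ defined above, the following Python sketch evaluates the standard zero-inflated Poisson probability function, in which a zero can come either from the excess-zero part or from the Poisson part. The parameter values are hypothetical and are not estimates from the article.

```python
import math

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson probability of observing the count y."""
    poisson = lam ** y * math.exp(-lam) / math.factorial(y)
    if y == 0:
        # A zero can be an excess zero or an ordinary Poisson zero.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

# Hypothetical parameters (not estimates from the article).
pi, lam = 0.9, 1.5

total = sum(zip_pmf(y, pi, lam) for y in range(100))
mean = sum(y * zip_pmf(y, pi, lam) for y in range(100))

print(f"P(Y = 0) = {zip_pmf(0, pi, lam):.4f}")
print(f"Probabilities sum to {total:.6f}")
print(f"Mean count = {mean:.4f} (equals (1 - pi) * lam)")
```

With a large π most of the probability sits at zero, which is the situation the summary describes for biodiverse foods consumed by only a small share of the population.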

The Poisson regression model is not a part of the Poisson distribution. Rather, the
Poisson regression model is based on the assumption that the response variable
follows a Poisson distribution. The Poisson distribution is a standalone
probability distribution that describes the probability of a certain number of
events occurring in a fixed interval of time or space, given a known average rate
of events. The Poisson regression model, on the other hand, is a
statistical method used to analyze count data by modeling the relationship
between the response variable (which represents the counts) and the predictor
variables. It extends beyond the distribution itself to incorporate covariates
and estimate regression coefficients. To conclude, the Poisson
regression model uses the Poisson distribution as a theoretical foundation but
is a distinct statistical modeling technique.

According to the article, to confirm the relationship between UFP consumption and
the predictor variables, the authors used a multiple zero-inflated Poisson (ZIP)
regression model. This mixture model predicts the distribution of the outcome by
mixing two distributions: a logistic regression model for the zero component of
the model and a Poisson regression for the count portion of the model (reference
27 in the article). It is used to study skewed distributions with a high fraction
of zeros.

In summary, a multiple zero-inflated Poisson (ZIP) regression model was used to
verify the association between UFP consumption and the predictor variables. Yes,
this method is suitable.

Q9. The p-value associated with each covariate indicates the statistical
significance of its estimated effect, that is, whether there is evidence of a
relationship between the covariate and UFP intake. A lower p-value
indicates stronger evidence against the null hypothesis; it does not by itself
measure the size of the effect. The p-values for
“area” and “income” are 0.042 and 0.015, respectively. These values are less than
0.05. Therefore, it can be concluded that these two variables both have a
significant effect on UFP. Thus, in just two cases (area and income) we are able
to reject the null hypothesis.
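
The decision rule applied in this answer can be written as a short Python sketch. It uses only the two p-values quoted above and assumes the conventional 0.05 significance level. The p-values of the other covariates are not quoted in this answer, so they are not included.

```python
ALPHA = 0.05  # conventional significance level, as used in the answer above

# The only two p-values quoted in the text above.
p_values = {"area": 0.042, "income": 0.015}

def reject_null(p_value, alpha=ALPHA):
    """Reject H0 when the p-value falls below the significance level."""
    return p_value < alpha

decisions = {name: reject_null(p) for name, p in p_values.items()}
for name, rejected in decisions.items():
    verdict = "reject H0" if rejected else "fail to reject H0"
    print(f"{name}: p = {p_values[name]} -> {verdict}")
```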
