
Predicting school outcomes using Mixed Effects Logistic Regression
Ciske Schreuder
c.c.schreuder@tilburguniversity.edu
STUDENT NUMBER: 282287

PROPOSAL SUBMITTED IN PARTIAL FULFILLMENT


OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE IN DATA SCIENCE & SOCIETY
DEPARTMENT OF COGNITIVE SCIENCE & ARTIFICIAL INTELLIGENCE
SCHOOL OF HUMANITIES AND DIGITAL SCIENCES

TILBURG UNIVERSITY

Thesis committee:
Dr. Bruno Nicenboim
Dr. Silvy Colin

Thesis Internship at:


Dienst Uitvoering Onderwijs (DUO)

External supervisor at DUO:


Erik Fleur

Tilburg University
School of Humanities and Digital Sciences
Department of Cognitive Science & Artificial Intelligence
Tilburg, The Netherlands
January 2021

Word count: 8157



Abstract
In the final year of Dutch primary education, pupils receive an advice score from their teacher and take a final test. Rather than the test score, the advice score determines the level of secondary education. A lower advice score than the test score, in this study referred to as a negative discrepancy, is reported to negatively affect performance in secondary education.
Furthermore, research indicates that education level of the parents, cultural background, gender
and single parenthood seem to be related to receiving a negative discrepancy. However, the relation
between the newly introduced school weights and a negative discrepancy has yet to be assessed.
Also, previous studies show conflicting results concerning the relation between urbanization and
negative discrepancy. Finally, previous research has mostly been of an inferential nature. Using a
mixed effects logistic regression approach, the current study compares the predictive performance
of school weights on negative discrepancy with the predictive performance of education level of
parents on negative discrepancy, while also including cultural background, gender, urbanization and single parenthood as predictors. The predictive performance of an interaction by
urbanization on the relation between school weights and negative discrepancy is also assessed. Data
was collected from the Dienst Uitvoering Onderwijs (DUO) database and consists of 179,770 pupils in
primary education during school year 2018/2019. Results show that, rather than the included
predictors and interactions between predictors, predictive performance is mostly attributable to
differences between schools. Implications of these results are discussed.

Data source
Work on this thesis did involve collecting data from human participants. As the data was obtained
from the database of Dienst Uitvoering Onderwijs (DUO), DUO, the original owner of the data and code used in this thesis, retains ownership of the data and code during and after the completion of
this thesis. The author of this thesis acknowledges that they do not have any legal claim to this data
or code. The data used in this thesis will not be publicly available. The author of this thesis has
evaluated his/her project according to the “Ethics checklist Student research with human
participants.”

Introduction
In the final year of Dutch primary education, at the age of 12, pupils complete a final test and
receive an advice score from their teacher. The advice score supersedes the test score when
determining the level of secondary education a pupil can have access to. Receiving a lower advice score than the test score, in this study referred to as a negative discrepancy, seems to have a negative effect on children’s further path in education (Smeets et al., 2007). Furthermore, contrary to the
meritocratic ideal in education, various studies show a relation between a negative discrepancy and
cultural background, gender, single-parent families and urbanization (see background section). The
goal of the current study is to assess the relation between various pupil characteristics, school
characteristics, and a discrepancy between the test and advice score that pupils receive in the final
year of primary school in the Netherlands.
The scientific relevance of this study is threefold. First, an algorithm is used to predict a negative
discrepancy between advice and test score. Although there is a wide body of research using machine
learning algorithms for predicting school outcomes (Ara et al., 2015; Sorensen, 2019; Fedushko &
Ustyianovych, 2019), research towards Dutch education, and more specifically, the discrepancy
between test and advice score, has mostly been of an inferential nature. In the current research, a
comparison will be made between the predictive performance of the education level of the parents
and school weights, with negative discrepancy between test and advice score as the dependent
variable. Gender, cultural background, urbanization and single parenthood are also included as
predictors. Second, this research includes the newly introduced school weights as a predictor. Recently, school weights (calculated by the Central Bureau of Statistics) have been introduced to evaluate primary
school performance. However, their relation with negative discrepancy has not been assessed yet.
Third, this research also takes into account urbanization. Previous research has not yet determined if
urbanization rate contributes to a negative discrepancy. Some studies seem to suggest so (Driessen
et al., 2007), but other studies seem to suggest otherwise (Kansenkaart EUR, 2019). In the current
research, urbanization will therefore also be taken into account as a possible interaction effect on
the relation between school weights and negative discrepancy.
The general research question follows from the background, and more specifically from the gaps
in current research and investigative reports. The general research question of the current study is
the following: What is the relation between various factors related to the parents' background and schools' characteristics, and the discrepancy between test and advice score that Dutch pupils receive in the last year of primary school? In order to further operationalize the general research question, and to
be able to answer it by the outputs of the modelling strategy, the following three sub-questions are
defined:

(1) What is the predictive performance of education level of parents, gender, cultural background, urbanization and single-parent families on negative discrepancy between test and advice score?
(2) Do school weights, gender, cultural background, urbanization and single-parent families predict the negative discrepancy between test score and advice score better than the previous model (which includes education level of the parents)?
(3) Do school weights, gender, cultural background, urbanization, single-parent families and the interaction between urbanization and school weights predict the negative discrepancy between test and advice score better than the previous model?
For each sub-question, a model will be evaluated based on predictive performance, which will be
compared with the predictive performance of the previous model. However, the performance of the
first model will be compared to the base model. The base model only contains a fixed intercept and
a term for random intercept variance by the schools. Note that in the second sub-question education
level of the parents is replaced by school weights. The motives for this are twofold. First, school
weights already include education level of the parents. Second, this allows us to see if school weights
matter in terms of predictive performance, rather than education level of the parents (as school
weights, rather than education level of the parents, are now used for evaluating primary school
performance).
Results show that the predictive performance of the education level of parents, gender, cultural
background, single-parent families and urbanization on negative discrepancy, is not better than its
baseline. Replacing the education level of the parents with school weights does not improve the predictive performance, nor does adding an interaction between urbanization and school weights. Interestingly, an increase in school weights seems to increase the likelihood of a
negative discrepancy. Regarding urbanization, pupils in municipalities with over 50,000 inhabitants
seem to be less likely to have a negative discrepancy.
Background
In the Dutch education system, the meritocratic ideal is a dominant idea when it concerns
discussions on inequality of opportunities (SER, 2019). The meritocratic ideal refers to the idea that
differences in outcome should be attributable to individual actions and efforts, and not to a lack of
opportunities (Nissen et al., 2019). However, various studies point out that inequality of
opportunities still is attributable to various factors other than individual action, such as gender,
cultural background and socio-economic status (SER, 2019). Education in this light is often seen as
the great equalizer (SER, 2019). Therefore, creating equality of opportunities is one of the pillars of
the Dutch education system (Inspectie van het Onderwijs, 2018). However, various reports and
studies show that in the Dutch education system, inequality of opportunities exists on multiple levels
(SER, 2019). To illustrate this, in the period 2016 – 2018 there was a growing inequality of
opportunities in Dutch education (Inspectie van het Onderwijs, 2018). Even though this inequality of opportunities stabilized in 2018, the Inspectorate still reports inequality of opportunities
(Inspectie van het Onderwijs, 2019).
In the Netherlands, the education system is hierarchically structured from secondary education onwards.
At the age of 12, in the final year of primary education, pupils complete a final test and receive an
advice score from the teacher. The combination of the advice score and the test score determines
the level of secondary education pupils can have access to. There are six levels of secondary
education: VMBO-PRO, VMBO-BB/KB, VMBO-KB/GT, VMBO-HAVO, HAVO - VWO and VWO. VMBO
levels are a four-year track, which prepares pupils for vocational training. Next, HAVO is a five-year track which prepares pupils for higher professional education (HBO), and VWO is a six-year track which prepares pupils for university. Since 2014, the advice score, rather than the
test score, determines the level of secondary education. One area where inequality of opportunities occurs is the discrepancy between test and advice score (SER, 2019).
The education level of parents seems to be associated with the discrepancy between test and
advice score. Several reports by the Dutch Education Inspectorate show that children of parents with
lower education levels receive lower advice scores and have a higher probability of receiving an
advice score below their test score (Inspectie van het Onderwijs, 2019). Furthermore, pupils with
lower educated parents have their scores updated less frequently (Inspectie van het Onderwijs,
2018). Also, pupils from a poorer background have lower chances of receiving a higher advice score
than their test score (Centraal Planbureau, 2020) and economic position is strongly related to
education level (Shavers, 2007). Research also shows that various other factors seem to be
contributing to receiving an advice score below the test score (SER, 2019). Male pupils seem to have
a higher probability of receiving a lower advice score than their test score (Herweijer 2011; KWT,
2018) and so do pupils with an immigrant background (De Jonge & Nelis, 2018). Furthermore,
urbanization rate (Kansenkaart EUR, 2019) and being a child from a single-parent family (Smeets et al., 2009) seem to also be related to a lower advice score than the test score.
The discrepancy between test and advice score seems to be most problematic when the advice
score is lower than the test score, in this study referred to as a negative discrepancy. A negative
discrepancy between test and advice score has a negative effect on children’s further path in
education (Smeets et al., 2014; De Boer et al., 2007). Furthermore, a negative discrepancy between
test and advice score causes a lasting disadvantage throughout secondary education (De Boer,
Bosker & Van der Werf, 2007). Moreover, pupils with a negative discrepancy perform worse than
other pupils in secondary education (Timmermans et al., 2013). Also, talents of pupils with a
negative discrepancy remain unused (Mulder, Roeleveld, & Vierke, 2007). A negative discrepancy,
according to Timmermans et al. (2013), functions like a self-fulfilling prophecy: obtaining a negative
discrepancy causes pupils to adapt to a lower level than the level corresponding to their capabilities.
Until recently, learning outcomes by primary schools were evaluated by the Dutch ministry of
Education, Culture and Science based on the school's overall education level of the parents. As seen
earlier, studies show that the education level of the parents seems to be related to discrepancies
between test and advice score. However, since school year 2020/2021, a new measurement has
been developed by the Central Bureau of Statistics (CBS) at the request of the ministry (Posthumus
et al., 2016). This measurement is called school weights. Each primary school will be assigned a
weight (0-40) according to a number of factors, such as the education level of the parents, whether
parents are in a debt relief program, the country of origin of the mother, the duration of the mother's stay in the Netherlands and the overall education level of parents at the school. The lower the
school weight, the higher the expected results of the school (Inspectie van het Onderwijs, 2021). As
the school weights encompass factors that are related to education level, immigration background
and socio-economic status, there might be a relation with a negative discrepancy between test and
advice score.
There are a number of gaps in previous studies on negative discrepancy that the current
research aims to resolve. First, although there is a wide body of research using machine learning
algorithms for predicting school outcomes (Ara et al., 2015; Sorensen, 2019; Fedushko &
Ustyianovych, 2019), research towards Dutch education, and more specifically the discrepancy
between test and advice score, has mostly been of an inferential nature. Second, to date, no research has examined the relation between school weights and the negative discrepancy
between test and advice score. Third, this research also takes a further look at the relation between
urbanization and negative discrepancy between test and advice score. Previous work shows
conflicting results concerning the effect of urbanization rate on learning outcomes (Driessen et al.
2007; Kansenkaart EUR, 2019): Some studies suggest that an increase in urbanization raises the
likelihood of a negative discrepancy (Driessen et al., 2007). Other studies suggest the opposite
(Kansenkaart EUR, 2019). Thus, previous research has not yet determined if urbanization rate
contributes to discrepancies between test and advice score.
The current research is aimed at filling the previously mentioned gaps. First, the current research will take a predictive modelling approach. Based on the aforementioned sub-questions,
three models will be compared based on their performance predicting a negative discrepancy
between advice score and test score. The first model will be focused on the predictive performance
of parents’ education level, gender, cultural background, single parenthood and urbanization on
negative discrepancy. In the second model, education level of parents will be replaced by school
weights as a predictor of negative discrepancy. Lastly, the predictive performance of a possible
interaction by urbanization on the relation between school weights and discrepancy, will also be
assessed.
Methods
In order to find the right modeling strategy for the sub-questions, a number of requirements had
to be taken into account. First, as the dependent variable in this study is binary and categorical
(there is either a negative discrepancy, or there is not), a classification algorithm will be used. Next,
in order to relate back to previous studies, the model should preferably produce interpretable
coefficients. Lastly, the grouped nature of the dataset needs to be taken into account: certain
variables such as gender, cultural background, are on the level of the student/pupil, as they relate to
a pupil. Other relevant variables, such as urbanization and school weights, are on the level of
municipalities and schools. To accurately account for possible group differences, the model should
be mixed effects, also called multilevel. To accurately answer the research question and at the same
time take into account the above requirements, mixed effects logistic regression was chosen as
modeling strategy.
Logistic regression allows relating a dichotomous dependent variable to one or more predictor
variables (Hosmer and Lemeshow, 2000; Menard, 2002). Logistic regression, unlike linear regression,
does not model the outcome, but rather the conditional probability that an outcome variable equals
one at a particular value of a predictor variable (e.g. the likelihood of a negative discrepancy for a
pupil whose parents have a low education level) (Sommet & Morselli, 2017). Using the logistic
function for making a binary prediction, a conditional probability (between 0 and 1) is computed for
each observation, and, if the threshold is set at .5, probabilities larger than .5 correspond to an
outcome value of one, otherwise zero. The function below (Equation 1) represents the equation for
logistic regression.
(1) log(P(Y = 1) / P(Y = 0)) = β0 + β1x
In the equation above, β0 corresponds to the intercept, the value when all predictors are set to
zero. Next, β1 corresponds to the slope, the increase or decrease of the log odds of the outcome variable Y for each one-unit change in x. The likelihood of an outcome in logistic regression is expressed as the log odds that the outcome variable equals one (Y = 1), here a negative discrepancy. Odds are the ratio between the probability that something will happen and the probability that it will not. It is important to note that logistic regression uses the natural logarithm of the odds: odds can only take values above zero, and the logarithm maps them onto the full range of real numbers. Instead of predicting the conditional probability directly, the model computes the logit: the log of the conditional probability that the outcome variable equals one over the probability that it equals zero (not having a negative discrepancy). When interpreting the coefficients, the current study uses the odds ratio,
which is obtained by exponentiating the log odds produced by the logistic regression model. The use
of odds ratio eases interpretation.
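To make the link between log odds, odds ratios and conditional probabilities concrete, the sketch below performs the conversion in R; the coefficient values are purely illustrative and are not estimates from this study.

# Illustrative (not estimated) coefficients on the log-odds scale.
beta0 <- -1.2   # hypothetical intercept
beta1 <-  0.53  # hypothetical slope per one-unit change in x

odds_ratio <- exp(beta1)             # multiplicative change in the odds per unit of x
log_odds   <- beta0 + beta1 * 1      # linear predictor for x = 1
prob       <- plogis(log_odds)       # inverse logit: conditional probability P(Y = 1)
predicted  <- as.integer(prob > .5)  # classification with the threshold set at .5

c(odds_ratio = odds_ratio, probability = prob, class = predicted)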
Logistic regression has a number of assumptions that need to be complied with in order for it to
be used as a modeling strategy (Hosmer and Lemeshow, 2000; Menard, 2002). One of these
assumptions is the independence of observations (Hosmer and Lemeshow, 2000; Menard, 2002). In
the current data set, that requirement is not met. As it is often the case with educational data, pupils
are nested in schools and these schools might have an effect on the students (O'Connell & McCoach,
2008). Pupils in the same schools are likely to behave/perform similarly. To account for this
phenomenon, a multilevel (or mixed effects) approach will be used. In a multilevel approach, the
clustered nature of the data is accounted for (Sommet & Morselli, 2017). Certain variables, called
level 1 predictors here, describe individual rows of data: pupils, in the case of this study. Other
predictors describe groups of pupils, and thus pertain to a higher level (level 2 or higher). For
example, in the current study, gender is a level 1 predictor and so is the educational background of
the parents. School weights, however, are a level 2 predictor, as they are a characteristic of schools. While level 1 variables can differ both between and within schools, level 2 predictors can only differ between schools.
There are two main ways in which multilevel logistic regression accounts for the grouped
structure of data (Sommet & Morselli, 2017). First, the log odds of the outcome variable are allowed
to vary between clusters. The below equation models an empty multilevel logistic regression model,
containing only variance between groups (schools and/or municipalities here), called random
intercept variance, and the fixed intercept.
(2) log(P(Y = 1) / P(Y = 0)) = β00 + u0j
Here (Equation 2), β00 again corresponds to the fixed intercept: the log odds that the outcome variable equals 1 (in the current study, a negative discrepancy). u0j, however,
corresponds to the random intercept variance: the variance of the between-cluster differences of
these log odds. This model contains no predictors, it is an intercept only model. Second, when the
relation between one or more predictors and the outcome variable is modeled, a multilevel
approach also allows adding between group differences in the effect of the predictor on the
outcome variable, also called random slopes.
(3) log(P(Yij = 1) / P(Yij = 0)) = β00 + (β10 + u1j) * xij + u0j
The equation above (3) includes random slopes. In this equation, not only is the between-group variation of the log odds of the outcome variable accounted for (u0j), but also the between-group variation u1j of the effect of the predictor xij, whose average (fixed) slope is β10. Note that random slopes can only be estimated for level 1 variables. For example, school weights are the same for each pupil in a school: there is no within-school variation, only between-group variation of school weights. Educational background of
the parents is a level 1 variable, because it relates to the pupil and not to the group. Thus, the
possible effect of educational background of the parents can vary both between and within clusters. In some schools there could be a strong effect and in other schools there might be no
effect. Not taking into account between cluster variation might wrongfully lead to the conclusion
that the effect is absent (Sommet & Morselli, 2017).
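As an illustration of how Equations 2 and 3 translate into lme4 syntax, the sketch below fits a random-intercept-only model and a model that additionally allows a random slope for parental education; the data frame pupils and the variable names (Negative_discrepancy, edu, brinvest) are assumed to match the dataset described in the next sections.

library(lme4)

# Equation 2: only the log odds of a negative discrepancy vary between schools (brinvest).
m_intercept <- glmer(Negative_discrepancy ~ 1 + (1 | brinvest),
                     data = pupils, family = binomial)

# Equation 3: the effect of parental education (edu) is also allowed to differ between schools.
m_slope <- glmer(Negative_discrepancy ~ edu + (1 + edu | brinvest),
                 data = pupils, family = binomial)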
Experimental Setup
Data was collected from the DUO database and consisted of 179,770 pupils in the final year of
primary education in the Netherlands, during schoolyear 2018/2019. During preprocessing the
outcome variable Negative_discrepancy was created. Included predictors are: education level of
parents (edu), school weights (SW), gender (gen), single-parenthood (sin), cultural background (cb)
and urbanization (urb). Using the three-step procedure by Sommet & Morselli (2017) the necessity of a multilevel approach was assessed. Based on this procedure, random intercept variance by schools (brinvest) and the random slope by education level of the parents were assessed. Then, four
multilevel logistic regression models were created. The models were compared and evaluated using
hold-out validation, intraclass correlation coefficient (only base model) and predictive accuracy. In
what follows, the dataset, preprocessing procedure, experimental procedure, implementation,
validation and evaluation are described in more detail.
Dataset Description
In the early stages of the research project, the target population was defined as all pupils in the
final year of primary education during schoolyear 2018/2019. Data collection was conducted in
August 2021. In order to gain access to the necessary data, two SQL queries were executed. One
query was done in the general part of the DUO database (called Stromen), containing general data
on the whole Dutch student population. Then, another query was done in another part of the
database that specifically concerns primary education (called PO), in order to obtain the weights
corresponding to the education level of the parents. Using the package odbc (Hester & Wickham, 2021), the SQL queries were executed from R to load the data. The two datasets were then connected
using the unique ID of the student and a unique ID of the school as keys. Finally, a third dataset was
connected to the queried data. This open dataset contains the school weights calculated by the
Central Bureau of Statistics (CBS).
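The sketch below illustrates this loading and merging step; the DSN, query strings, file path and join keys are placeholders rather than the exact queries and identifiers used in the DUO database.

library(DBI)
library(odbc)
library(dplyr)

# Connect to the DUO SQL Server database (connection details are placeholders).
con <- dbConnect(odbc::odbc(), dsn = "DUO_DSN")

stromen <- dbGetQuery(con, "SELECT * FROM Stromen")  # general student population data
po      <- dbGetQuery(con, "SELECT * FROM PO")       # primary-education data (parental education)

# Open CBS dataset with the school weights (file name assumed).
cbs_weights <- read.csv("schoolweights_cbs.csv")

# Join on the pupil (OWN) and school (brinvest) identifiers.
pupils <- stromen %>%
  inner_join(po, by = c("OWN", "brinvest")) %>%
  left_join(cbs_weights, by = "brinvest")

dbDisconnect(con)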
After the previous merging operations the dataset consisted of 179,770 records: all pupils in
the final year of primary school in the school year 2018/2019, divided over 6225 primary schools.
Pupils are identified by a six-digit number, called the OWN-number, and schools are identified by a six-character value called brinvest. Besides these identifiers, the preliminary dataset had 12 other variables, among them pupil characteristics (all of them binary categorical), such as gender, cultural background, single parenthood, urbanization and educational background of the parents (see Appendix A, Table A1). Furthermore, it contained variables related to performance (test type, test score and advice score); these were used to compute the outcome variable discrepancy and later discarded from the final dataset. Finally, it contained the school weights (M = 29.5, SD = 3.9), a continuous variable ranging from zero to 40. The dependent variable in this study is called
discrepancy and is the discrepancy between test and advice score. Originally, it is a variable that is
coded as 0 in case of a lower test score compared to the advice score, 1 in case of a higher test
score compared to the advice score and 2 in case of identical scores. Analysis of missing values
showed that they only occurred on the variables educational level of the parents, school weights
and discrepancy. Further analysis showed that missing values were mostly due to new schools
whose weight hasn’t been calculated yet, pupils not yet passing on to secondary education and
education levels that weren’t disclosed. These observations (13,435) were ultimately discarded, leaving a final dataset of 166,335 pupils.
Dataset Preprocessing
As stated before, the performance related variables were only used in the early stages of the
research project to create the dependent variable discrepancy, later they were discarded. The
dependent variable discrepancy was created in three steps. First, the auxiliary variable called
t_advice was created from test type and test score, by unifying the different test types and their
scoring margins (Appendix A, Table A2). Primary schools can choose from 5 different test that are
currently available, with each having its own scoring margins. The auxiliary variable t_advice
standardizes the different test score margins per test type, so that each pupil has a test score that
corresponds with one of the six levels of secondary education. Second, auxiliary variable advice_rec
was created by doing the same operation with the advice scores, a categorical variable which
contains 25 levels (Appendix A, Table A3). The levels of the advice scores were related to the 6
corresponding levels of secondary education. Third, the dependent variable discrepancy was
computed by taking the difference between t_advice and advice_rec. When advice_rec was lower than t_advice, discrepancy was encoded as 'negative'; in the opposite situation as 'positive'; and if both scores were identical as 'none'. Then, discrepancy was dummy coded, allowing the model to compare pupils with a negative discrepancy (Negative_discrepancy = 1) with the rest (Negative_discrepancy = 0). Finally, before beginning the modeling stage, our dataset consisted of 10 features (Appendix A, Table A1), excluding the OWN number and brinvest.
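A sketch of this step is shown below, assuming that t_advice and advice_rec have already been recoded onto the same six-level scale (Appendix A, Tables A2 and A3); the column names follow the text, and the per-test recoding itself is omitted.

library(dplyr)
library(fastDummies)

pupils <- pupils %>%
  mutate(
    discrepancy = case_when(
      advice_rec < t_advice ~ "negative",  # advice level below test level
      advice_rec > t_advice ~ "positive",  # advice level above test level
      TRUE                  ~ "none"       # identical levels
    )
  ) %>%
  # Dummy coding: Negative_discrepancy = 1 for a negative discrepancy, 0 for the rest.
  dummy_cols(select_columns = "discrepancy") %>%
  rename(Negative_discrepancy = discrepancy_negative)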

Experimental Procedure
Sommet & Morselli (2017) provide an intuitive and practical three-step procedure for assessing the necessity of multilevel logistic regression, which was therefore reproduced in the current study.
In the first step of the procedure (Sommet & Morselli, 2017) the extent of between-group variation
of the log odds of Negative_discrepancy is assessed. This assessment is done by interpreting the ICC,
or intraclass correlation coefficient (equation 4), of the base model (Equation 5). This gives the
proportion of variation of the log odds of the outcome variable (Negative_discrepancy) attributable
to group differences. An ICC of 0 means that none of the variation in the log odds of the outcome
variable is attributable to group differences, while an ICC of 1 means that all variation in the log odds of the outcome variable is attributable to group differences. Based on the ICC, we know whether it is worthwhile to use a multilevel approach rather than a basic logistic regression approach. In the equation below, var(u0j) refers to the random intercept variance between schools and/or municipalities, and π²/3 is the variance of the standard logistic distribution, which represents the level 1 variance within schools and/or municipalities.
(4) ICC = var(u0j) / (var(u0j) + π²/3)
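A minimal sketch of this first step, assuming a training data frame train with the variables used later in this section; the ICC is computed from the estimated random-intercept variance of the base model and the level 1 variance π²/3.

library(lme4)

base_model <- glmer(Negative_discrepancy ~ 1 + (1 | brinvest),
                    data = train, family = binomial,
                    control = glmerControl(optimizer = "bobyqa"))

var_u0 <- as.numeric(VarCorr(base_model)$brinvest)  # between-school variance of the intercept
icc    <- var_u0 / (var_u0 + pi^2 / 3)              # Equation 4
icc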
In the second step of the procedure (Sommet & Morselli, 2017) the necessity of adding a
random slope (of our main independent variable) is assessed. The main level 1 independent variable of the current study is educational level of the parents. In order to assess the between-group variation of the effect of the main level 1 variable (education) on Negative_discrepancy, two
intermediate models were compared. The comparison was made between an augmented
intermediate model and a constrained intermediate model. The constrained intermediate model
(CIM) contains the level 1 variables, level 2 variables and the random intercept variance. The
augmented intermediate model (AIM) adds the random slope for the main independent variable of interest, allowing assessment of the between-group (schools) differences in the effect of education on negative discrepancy. Model comparison was done by comparing predictive
accuracy. The result of the comparison led to the decision to only include random intercept
variance (CIM).
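The sketch below illustrates this comparison; the exact covariate set of the intermediate models is an assumption here (all level 1 and level 2 predictors), and validation accuracy is computed at the .5 threshold with allow.new.levels = TRUE so that schools absent from the training split can still be scored.

library(lme4)

ctrl <- glmerControl(optimizer = "bobyqa")

# Constrained intermediate model (CIM): random intercept only.
cim <- glmer(Negative_discrepancy ~ edu + gen + sin + cb + urb + SW + (1 | brinvest),
             data = train, family = binomial, control = ctrl)

# Augmented intermediate model (AIM): adds a random slope for education by school.
aim <- glmer(Negative_discrepancy ~ edu + gen + sin + cb + urb + SW + (1 + edu | brinvest),
             data = train, family = binomial, control = ctrl)

# Compare predictive accuracy on the validation set.
acc <- function(model, newdata) {
  p <- predict(model, newdata = newdata, type = "response", allow.new.levels = TRUE)
  mean(as.integer(p > .5) == newdata$Negative_discrepancy)
}
acc(cim, valid)
acc(aim, valid)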
In the third part of the procedure (Sommet & Morselli, 2017), assuming no relevant random
slope by education, using the glmer() function of lme4 (Bates et al., 2015), the following four
models were constructed, based on the three sub-questions (see introduction). Each model has
Negative_discrepancy (N_D) as outcome. The base model (Equation 5) contains a fixed intercept
and random intercept variance by schools (brinvest) and serves as a baseline for model 1 (Equation
6). Models are compared by accuracy with hold-out validation. The base model is also evaluated by
intraclass correlation coefficient (ICC) (Sommet & Morselli, 2017) to assess the multilevel approach.

(5) N_D ~ (1 | brinvest)


Model 1 (Equation 6): contains fixed effects education (edu), gender (gen), single-parenthood (sin),
cultural background (cb), urbanization (urb), and random intercept variance by schools (brinvest).
Model 1 serves to answer sub-question 1, by comparison with the base model accuracy.
(6) N_D ~ edu + gen + sin + cb + urb + (1 | brinvest)
Model 2 (Equation 7): contains fixed effects school weights (SW), gender (gen), single-parenthood
(sin), cultural background (cb) and urbanization (urb), and random intercept variance by schools
(brinvest). Model 2 serves to answer sub-question 2 and is evaluated using the accuracy of model 1 as
a baseline.
(7) N_D ~ SW + gen + sin + cb + urb + (1 | brinvest)
Model 3 (Equation 8): contains fixed effects school weights (SW), gender (gen), single-parenthood
(sin), cultural background (cb), urbanization (urb), the interaction between urbanization and school
weights and random intercept variance by schools. The output of model 3 serves to answer the third
sub-question and is evaluated by using the accuracy of model 2 as a baseline.
(8) N_D ~ SW + gen + sin + cb + urb + (SW * urb) + (1 | brinvest)
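Under these definitions, a sketch of the corresponding glmer() calls is shown below; N_D abbreviates Negative_discrepancy as in the equations, train denotes the training split, and the bobyqa optimizer (used during training, see the Results section) is added for completeness. Note that in R formula syntax SW * urb already expands to SW + urb + SW:urb.

library(lme4)

ctrl <- glmerControl(optimizer = "bobyqa")

base_model <- glmer(N_D ~ 1 + (1 | brinvest),                           # Equation 5
                    data = train, family = binomial, control = ctrl)
model_1    <- glmer(N_D ~ edu + gen + sin + cb + urb + (1 | brinvest),  # Equation 6
                    data = train, family = binomial, control = ctrl)
model_2    <- glmer(N_D ~ SW + gen + sin + cb + urb + (1 | brinvest),   # Equation 7
                    data = train, family = binomial, control = ctrl)
model_3    <- glmer(N_D ~ SW * urb + gen + sin + cb + (1 | brinvest),   # Equation 8
                    data = train, family = binomial, control = ctrl)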
Implementation
The software used will be SQL Server Management Studio and RStudio. SQL Server
Management Studio is used to extract data from the DUO database. RStudio is used to run the R
programming language (R Core Team, 2020), which was used for this research. The following packages were used: tidyverse (Wickham et al., 2019), odbc (Hester & Wickham, 2021), DBI (Wickham & Müller, 2021), lme4 (Bates et al., 2015), sjPlot (Lüdecke, 2021), fastDummies (Kaplan, 2021) and caTools
(Tuszynski, 2021). Tidyverse (Wickham et al., 2019) was used for efficient data manipulation, style
and visualizations. Odbc (Hester & Wickham, 2021) and DBI (Wickham & Müller, 2021) are used for
enabling the SQL queries to be executed in RStudio. lme4 (Bates et al., 2015) is used to create a
mixed effects logistic regression model, using the glmer function. sjPlot (Lüdecke, 2021) was used
to generate formatted tables containing odds ratios. fastDummies (Kaplan, 2021) was used to
create dummy variables and, finally, caTools (Tuszynski, 2021) was used for the data split for the
hold-out validation.
Validation and Evaluation
To effectively evaluate and compare models, the current study uses hold-out validation. Hold-
out validation is considered a good method for assessing predictive performance, when the full
population data is available and when dealing with computationally complex models (James et al.,
2013). In hold-out validation, the data is divided up into a train, validation and test set. In the
current study, the train set will comprise 64% of the dataset, the validation set will be 16% of the
data and the test set will be 20%. Each model is trained on the train data, which is a larger portion
of the data, optimizing the learning process. Then, in order to evaluate and/or compare the
predictive performance of the models on unseen data, each model will be run on the validation set.
Finally, in order to provide a reliable definite assessment of predictive accuracy of a model, the
model will be provided with the hold-out test data. In conclusion, each model will be trained on the
train set, validated and compared on the validation set and then finally evaluated by running it on
the test set.
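A sketch of this split, assuming the preprocessed data frame pupils; sample.split() from caTools stratifies on the outcome variable, and applying an 80/20 split twice yields the 64/16/20 proportions described above.

library(caTools)

set.seed(123)  # arbitrary seed, for reproducibility

# First split off the 20% test set.
in_modelling <- sample.split(pupils$Negative_discrepancy, SplitRatio = 0.80)
test         <- subset(pupils, !in_modelling)
modelling    <- subset(pupils, in_modelling)

# Then split the remaining 80% into train (64% overall) and validation (16% overall).
in_train <- sample.split(modelling$Negative_discrepancy, SplitRatio = 0.80)
train    <- subset(modelling, in_train)
valid    <- subset(modelling, !in_train)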
While applying hold-out validation, in order to evaluate the performance of the different
models on the validation set and the test set, the confusion matrix will be interpreted for each
model. The confusion matrix is a performance measurement for classification problems where the output can be two or more classes. For a binary outcome, it is a table with the four different combinations of predicted and actual values. Using the computed confusion matrix, the accuracy score will be computed per model. The accuracy score is the number of correctly classified (true positive and true negative) instances divided by the total number of instances.
Accuracy is used as an evaluation metric in this study because we are mostly interested in correct
predictions by the model. Besides accuracy, only the base model is also evaluated using the
intraclass correlation coefficient (ICC), which allows us to assess the necessity of a multilevel approach rather than a normal logistic regression (see Experimental Procedure).
(9) Accuracy = Number of correct predictions / Total number of predictions
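A sketch of this evaluation for one fitted model on the held-out test set (model_1 and test as in the earlier sketches): the confusion matrix is built with table() and accuracy is the sum of the diagonal divided by the total number of predictions.

pred_prob  <- predict(model_1, newdata = test, type = "response",
                      allow.new.levels = TRUE)  # schools unseen in training get no random-effect adjustment
pred_class <- as.integer(pred_prob > .5)

conf_mat <- table(Predicted = pred_class, Actual = test$Negative_discrepancy)
conf_mat

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy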
Results
The results section reports the aforementioned three-step procedure for multilevel logistic
regression (Sommet & Morselli, 2017). First, the ICC of the base model is reported and interpreted,
to assess the extent of between-group variation of the log odds of the dependent variable
Negative_discrepancy. Secondly, the random slope of our main predictor of interest is assessed.
Thirdly, the results of the three models stemming from the sub-questions are reported.
Results: ICC and Base Model
First (Sommet & Morselli, 2017), the base model was evaluated by intraclass correlation
coefficient (ICC). This base model will also serve as a baseline for the first sub-question, which is
answered using model 1 (Table 4). The model was first trained, during which a bobyqa optimizer
was added to the glmer function, which solved convergence issues (Bates et al. 2015). On
validation, the model resulted in an ICC of .21 and a predictive accuracy score of .77. Thus,
including the random intercept variance of schools was considered necessary, as the ICC was
considerably higher than 0 (Sommet & Morselli, 2017). Performance (in terms of accuracy) on the
test set was similar to the results on validation (see Table 2 for the confusion matrix). This model
will be used as a baseline for the first sub-question (model 1).

Table 1
Base model coefficients

Effect              Estimate    SE      95% CI [LL, UL]

Fixed effects
Intercept a         .790        .040    [.750, .820]

Random effects
ICC                 .210

Note: number of schools = 6800, total N = 166,335. CI = confidence interval; LL = lower limit; UL = upper limit. a Odds ratio.

Results: Random Slope Analysis


Second, to assess the random slope (between-school variation of the effect) of our main level 1
predictor education level of parents (education) on Negative_discrepancy, an augmented
intermediate model (AIM) was compared to a constrained intermediate model (CIM). Therefore,
both models were trained on the training data and then compared on the validation dataset, in
terms of accuracy. The best performing model was then run on the test set. On training, a bobyqa
optimizer was added to the glmer function of both models, to avoid non-convergence (Bates et al.,
2015). On validation, accuracy of both models amounted to .77. Performance on the test set of the
best performing model (CIM) again showed an accuracy of .77 (Table 3). As the augmented model
did not improve the accuracy of the CIM, the random slope of our main level 1 predictor (education) will not be taken into account in what follows: it is discarded to avoid possible overparameterization, failure of convergence, and uninterpretable findings (Bates et al., 2015).
Other random slopes are not taken into account in the current study.

Table 2
Confusion matrix base model

Predicted values        Negative_discrepancy (actual)
                        0            1
0                       24,580       7,009
1                       736          943

Note: Negative_discrepancy = 1 corresponds to the category of pupils with a negative discrepancy, 0 corresponds to the rest of the pupils; test N = 33,267; accuracy = .77 ((24,580 + 943)/n)

Table 3
Confusion matrix CIM

Predicted values        Negative_discrepancy (actual)
                        0            1
0                       24,715       6,923
1                       749          881

Note: Negative_discrepancy = 1 corresponds to the category of pupils with a negative discrepancy, 0 corresponds to the rest of the pupils; test N = 33,267; accuracy = .77 ((24,715 + 881)/n)

Results: Sub-Question 1
Thirdly, after deciding not to include a random slope, we can run the models to answer the
research questions. First, the predictive performance of the education level of the parents
(education), cultural background (achtergr), urbanization, single-parent families and gender (sexe)
was evaluated using model (1). Moreover, random intercept variance of schools is included as
random effect. Results were compared with the base model, containing only random intercept
variance (Table 2), as baseline. The model was first trained on the train data, where a bobyqa
optimizer was added to avoid convergence issues (Bates et al., 2015). Then, on the validation set,
the model had an accuracy of .77 (compared to .77 accuracy of the baseline model), thus failing to
outperform the base model (Table 2). Table 4 shows the similar performance of model (1) on the
test set. Interpreting the odds ratios of model (1): pupils whose parents have a low level of
education (rather than average) seem to be more likely to receive a negative discrepancy (OR =
1.70, 95% CI [1.63, 1.78]), while holding all the other variables constant (see Appendix B, Table B1).
Being male (OR = 0.92, 95% CI [0.90, 0.94]), having a Dutch background (OR = 0.88, 95% CI [0.84,
0.91]), living in a large municipality (OR = 0.87, 95% CI [0.83, 0.91]), and having both parents (OR =
0.74, 95% CI [0.69, 0.79]), seem to decrease the likelihood of a negative discrepancy (see Appendix
B, Table B1).

Table 4
Confusion matrix model (1)

Predicted values        Negative_discrepancy (actual)
                        0            1
0                       24,598       6,980
1                       718          972

Note: Negative_discrepancy = 1 corresponds to the category negative discrepancy, 0 corresponds to positive discrepancy/none; test N = 33,267; accuracy = .77 ((24,598 + 972)/n)

Table 5
Confusion matrix model (2)

Predicted values        Negative_discrepancy (actual)
                        0            1
0                       24,672       7,102
1                       644          850

Note: Negative_discrepancy = 1 corresponds to the category of pupils with a negative discrepancy, 0 corresponds to the rest of the pupils; test N = 33,267; accuracy = .77 ((24,672 + 850)/n)

Results: Sub-Question 2
Then, regarding the second sub-question, educational background of the parents (education)
was replaced by the grand-mean centered school weights. Model (2) thus includes the predictors school weights, cultural background (achtergr), single parenthood, urbanization and gender (sexe) as fixed effects, and the random intercept variance of schools (brinvest) as a random effect. A multilevel logistic regression was performed to ascertain the predictive performance of the aforementioned predictors on negative discrepancy (Negative_discrepancy),
while taking into account the random intercept variance of schools (brinvest). The model was first
trained on the train set. As before, the bobyqa optimizer was added to the glmer function of lme4
to avoid convergence issues (Bates et al., 2015). On the validation set, accuracy of model (1) and
model (2) were compared. Both had an accuracy of .77, thus model (2) failed to outperform the
previous model (Table 4). On the test set, model 2 had an accuracy score of .77 (Table 5). Thus,
performance of model (2) (containing the school weights) was similar to that of the previous model, model (1) (containing education level of the parents). Interpreting the odds ratios of model (2):
with each one unit increase in school weights, the likelihood of a negative discrepancy relative to a
positive discrepancy or no discrepancy, seems to increase by 10% (OR = 1.10, 95% CI [1.09, 1.11]),
while holding all the other variables constant (Appendix B, Table B2). Again, being male (OR = 0.92,
95% CI [0.89, 0.94]), having a Dutch background (OR = 0.88, 95% CI [0.84, 0.91]) and coming from a
household with both parents (OR = 0.73, 95% CI [0.68, 0.78]) seems to decrease the likelihood of
receiving a negative discrepancy, while holding all other variables constant (Appendix B, Table B2).
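As a sketch of how model (2) was specified, assuming the training split train: the school weights are grand-mean centered by subtracting their mean, and tab_model() from sjPlot produces the formatted table of odds ratios with 95% confidence intervals used for interpretation.

library(lme4)
library(sjPlot)

# Grand-mean centering of the school weights.
train$SW_c <- train$SW - mean(train$SW, na.rm = TRUE)

model_2 <- glmer(Negative_discrepancy ~ SW_c + gen + sin + cb + urb + (1 | brinvest),
                 data = train, family = binomial,
                 control = glmerControl(optimizer = "bobyqa"))

tab_model(model_2)  # odds ratios with 95% CIs for a binomial glmer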
Results: Sub-Question 3
Finally, the predictive performance of model (3) was evaluated. A multilevel logistic regression
was repeated to ascertain the predictive performance. Model (3) includes the interaction of
urbanization on the relation between the school weights and Negative_discrepancy. Other fixed
predictors are gender, cultural background and single parenthood. The random intercept variance
of schools (brinvest) is included as random effect.

Figure 1
Overview odds ratios model (2)

Note: sexe[0] = female; achtergr[0] = non-Dutch background; Single-parent[1] = pupil from single parent household; red
dots denote a negative association; blue dots denote a positive association.

The model was first trained on the train data, where a bobyqa optimizer was added to avoid
convergence issues (Bates et al., 2015). Then, on the validation set, model (3) had an accuracy of
.77, similar to the previous model (Model 2), thus failing to outperform the previous model (Table
5). Table 6 shows similar performance on the test set, which also amounted to an accuracy of .77.
When interpreting the odds ratios, we find urbanization seems to decrease the likelihood of a
negative discrepancy. More specifically, while keeping all other variables constant, pupils from
municipalities with over 50,000 inhabitants seem to be less likely to have a negative discrepancy in
comparison with pupils from municipalities with under 50,000 inhabitants (Appendix B, Table B3).

Table 6
Confusion matrix model (3)

Predicted values        Negative_discrepancy (actual)
                        0            1
0                       24,673       7,111
1                       643          841

Note: Negative_discrepancy = 1 corresponds to pupils with a negative discrepancy; 0 to positive or no discrepancy; n = 33,267; accuracy = .77 ((24,673 + 841)/n)

Discussion
The objective of the current study is to further investigate the relation between various pupil
and school characteristics on the one hand, and the negative discrepancy between test and advice
score in the final year of Dutch primary education on the other hand. Previous studies suggest that
receiving an advice score below the test score, in this study called a negative discrepancy, has a
negative effect on pupils’ performance in secondary education (Smeets et al., 2014; De Boer et al.,
2007). Furthermore, research seems to point out that a negative discrepancy between test and
advice score is related to cultural background (De Jonge & Nelis, 2018), gender (Herweijer 2011;
KWT, 2018), single parenthood (Smeets et al, 2009) and urbanization (Driessen et al., 2007). The
current study builds on previous research by focusing mainly on predictive performance, rather
than inference. More specifically, we compared the predictive performance of the education level
of the parents on negative discrepancy, with the predictive performance of school weights
calculated by the Central Bureau of Statistics (CBS), on receiving a negative discrepancy. Other
included predictors are cultural background, gender, single parenthood and urbanization. The
outcome variable, negative discrepancy, is a binary categorical variable denoting whether a pupil
had a negative discrepancy between test score and advice score or not.
Three sub-questions were defined. First, the predictive performance of the educational level of
the parents, gender, cultural background, single parenthood and urbanization on negative
discrepancy was assessed in terms of predictive accuracy. Next, the predictor education was
switched with school weights, to see if the predictive performance of the model improved in terms
of predictive accuracy. Finally, urbanization was added as an interaction term between school
weights and negative discrepancy, to see if the predictive performance of the model improved in
terms of predictive accuracy. The dataset of this research was provided by DUO (Dienst Uitvoering
Onderwijs) and consisted of 179,770 pupils in the final year of Dutch primary education in school year
2018/2019.
Results of a multilevel logistic regression analysis show that the included predictors education
level of the parents, school weights, cultural background, gender, single parenthood and
urbanization add little predictive value for negative discrepancy. The same goes for the
interaction of urbanization on the relation between school weights and negative discrepancy. The
predictive performance of a baseline model containing only the random intercept variance of the
log odds of negative discrepancy was not improved by including predictors education level of the
parents, cultural background, gender, single parent families and urbanization. Nor did the accuracy
improve when including school weights instead of education level of the parents, or when including
urbanization as an interaction term. Predictive performance on negative discrepancy between test
and advice score seems to be mostly attributable to between-school differences in the likelihood of
receiving a negative discrepancy.
However, it might be interesting to note that when interpreting the odds ratios of the second
model in the current study, school weights seem to be positively associated with negative
discrepancy. Furthermore, several results from previous studies seem to be confirmed. For a male
pupil, with a Dutch background, having both parents and coming from a municipality with over
50,000 inhabitants, one unit increase in school weight corresponds with a 10% increase in the
likelihood of a negative discrepancy. Further interpreting the odds ratios (Appendix B, Table B2), we
also find that pupils without an immigrant background seem to be less likely to have a negative discrepancy than pupils with an immigrant background, which lines up with findings by De Jonge & Nelis (2018). Results by Herweijer (2011) and KWT (2018), in contrast, were not supported: in the current study, male pupils are slightly less likely to receive a negative discrepancy in comparison to female pupils. Next, as can be
consulted on the interactive map by Kansenkaart (2019), the current study found that pupils in
large(r) municipalities (over 50,000 inhabitants) are less likely to have a negative discrepancy in
comparison with pupils from smaller municipalities (under 50,000 inhabitants). This goes against the opposite pattern found by Driessen et al. (2007). However, the aforementioned effect sizes were relatively small in comparison to the effect size of educational background of the parents. For a male pupil with a Dutch background, living with both parents in a municipality with over 50,000 inhabitants, a negative discrepancy is 70% more likely when the parents have a low level of education (at most primary or secondary education) than for a counterpart whose parents have an average education level (higher education); see Appendix B, Table B1. This seems to be in line with
previous research and investigative reports (Inspectie van het Onderwijs, 2018; 2019; SER, 2019).
When evaluating the results, various limitations of this study must be taken into account. First,
the absence of multicollinearity is one of the assumptions of logistic regression, and a negative discrepancy is likely related to more variables than the predictors included in this study. Results could therefore have been affected by omitted variables correlating with the included predictors. Next, the current study uses a simple validation scheme, hold-out validation. A more advanced method, such as k-fold cross-validation, could be used instead.
Furthermore, the random slopes of other predictors, besides the main lower level effect of the
current study (education level of parents), could be studied as well. Finally, measurement of
education level is simplified to only two categories. A more thorough and differentiated measure
could be used for the education level of the parents. The same limitation stands for the current
measurement of urbanization, which was simplified to only two categories (over and under 50,000
inhabitants).

Conclusion
Using a mixed effects approach, the current study set out to assess the predictive
performance of a number of predictors of a negative discrepancy between test and advice score
among Dutch pupils in the last year of primary school. Results of a multilevel logistic regression
analysis show that the predictive performance of the included predictors (education level of
parents, CBS school weights, cultural background, gender, single parent families and urbanization)
is marginal and that predictive performance is mostly attributable to differences between schools
in the likelihood of a negative discrepancy. Furthermore, the results show that school weights
calculated by the Central Bureau of Statistics, are positively associated with negative discrepancy.
Finally, urbanization seems to be negatively associated with negative discrepancy: pupils in larger municipalities seem to be less likely to have a negative discrepancy.
A number of recommendations for policymakers can be made based upon this study. First, this
study shows that between-school differences in the likelihood of negative discrepancies
for pupils are better predictors than the predictors stemming from the literature (education level of
parents, gender, single parenthood and urbanization). A negative discrepancy seems to negatively
influence performance in secondary education (Smeets et al., 2014; De Boer et al., 2007). Thus,
policymakers are advised to do further research on the factors that identify primary schools where the
likelihood of negative discrepancy is higher. Furthermore, as this study shows that an increase in
school weight is related to an increase in the likelihood of a negative discrepancy, policymakers are
advised to focus especially on schools with a higher school weight when addressing the problem of
a negative discrepancy between test and advice score. Finally, specific campaigns could be
organized informing schools, teachers and parents on the adverse effects for pupils of a negative
discrepancy between test and advice score.
Finally, a number of recommendations are given for future research. Future research could
include more variables/predictors to assess predictive performance on negative discrepancy. Also,
as this study uses only logistic regression, in future research multiple algorithms could be compared
by performance. Furthermore, as this study uses data from school year 2018/2019, in future
research comparisons could be made over various school years. Lastly, the current study has only
assessed the predictive performance of the random slope of the variable education (education level of the parents). Future research could also assess other random slopes, which could potentially
shed more light on the between-school differences in the likelihood of a negative discrepancy
between test and advice score.

Acknowledgements
The current research is part of an internship at DUO (Dienst Uitvoering Onderwijs). DUO is an
executive agency within the Ministry of Education, Culture and Science.

References
Ara, N-B., Halland, R., Igel, C., & Alstrup, S. (2015). High-school dropout prediction using machine
learning: a Danish large-scale study. In M. Verleysen (Ed.), Proceedings. ESANN 2015: 23rd
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine
Learning (pp. 319-324). i6doc.com.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using
lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Beer, P. de (2016) Meritocratie: op weg naar een nieuwe klassensamenleving?
Centraal Planbureau. (2020a). De waarde van eindtoetsen in het primair onderwijs.
Boer, H. de, Bosker, R., & Werf, M. van der (2007). De gevolgen van onder- en overadvisering. In
Inspectie van het Onderwijs, Onderadvisering in beeld (pp. 83-92). Utrecht: Inspectie van het
Onderwijs.
De Jong, M., & Nelis, H. (2018). Help onze school is gekleurd. De toekomst begint in het onderwijs.
Amsterdam: Nijgh & Van Ditmar.
Driessen, G., Smeets, E., Mulder, L., & Vierke, H. (2007). De relatie tussen prestaties en advies. Onder-
of overadvisering bij de overgang van basis-naar voortgezet onderwijs?.
EUR (2019). Kansenkaart. https://kansenkaart.nl/maps/schooladvieslager#6.58/52.28/5.285
Fedushko, S., & Ustyianovych, T. (2019, January). Predicting pupil’s successfulness factors using
machine learning algorithms and mathematical modelling methods. In International Conference
on Computer Science, Engineering and Education Applications (pp. 625-636). Springer, Cham.
Herweijer, L. (2011), Gemengd Leren. Den Haag: SCP
Hester, J., & Wickham, H. (2021). odbc: Connect to ODBC Compatible Databases (using the DBI
Interface). R package version 1.3.2. https://CRAN.R-project.org/package=odbc
Hosmer, D. W. and Lemeshow, S. (2000). Applied Logistic Regression. New York, NY: John Wiley &
Sons, Inc., DOI: https://doi.org/10.1002/0471722146
Inspectie van het Onderwijs (2018) De Staat van het Onderwijs 2018, p. 23-27.
Inspectie van het Onderwijs (2019) De Staat van het Onderwijs 2019
Inspectie van het Onderwijs (2021) ONDERWIJSRESULTATENMODEL PO
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol.
112, p. 18). New York: springer.
Kuhn, M. (2021). caret: Classification and Regression Training. R package version 6.0-88.
https://CRAN.R-project.org/package=caret
KWT 2018 (Kenniswerkplaats Rotterdams Talent (2018). Samen talenten en kansen versterken.
Manifest voor onderwijs van Kenniswerkplaats Rotterdam Talent. Den Haag: René de Haan.

Menard, S. (2002). Applied Logistic Regression Analysis. 2nd ed. Thousand Oaks, CA: Sage. (Sage
University Series on Quantitative Applications in the Social Sciences, series no. 07–106)
Mulder, L., Roeleveld, J., & Vierke, H. (2007). Onderbenutting van capaciteiten in basis- en voortgezet
onderwijs. Den Haag: Onderwijsraad.
Nissen, R., Hogervorst, W., Maatoug, S. & Ziesemer, V. (2019) Kansenongelijkheid vraagt om
aandacht bij beleid en wetenschap. ESB, 104 (4780), 568 – 571.
O'Connell, A. A., & McCoach, D. B. (Eds.). (2008). Multilevel modeling of educational data. IAP.
Posthumus, H., Bakker, B., Laan, J. van der, Mooij, M. de, Scholtus, S., Tepic, M., Tillaart, J.
van den, & Vette, S. den (2016). Herziening gewichtenregeling primair basisonderwijs – Fase
I. Den Haag: CBS.
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
SER (2019) Gelijke kansen in het onderwijs. Structureel investeren in kansengelijkheid voor iedereen
Shavers, V. L. (2007). Measurement of socioeconomic status in health disparities research. Journal of
the national medical association, 99(9), 1013.
Smeets, E., Driessen, G., Elfering, S., & Hovius, M. (2009). Allochtone leerlingen en speciale
onderwijsvoorzieningen. Nijmegen: ITS.
Smeets, E., Kuijk, J. van & Driessen, G. (2014) Handreiking bij het opstellen van het basisschooladvies.
Nijmegen: ITS
Sommet, N., & Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified
Three-Step Procedure Using Stata, R, Mplus, and SPSS. International Review of Social
Psychology, 30(1), 203–218. DOI: http://doi.org/10.5334/irsp.90
Sorensen, L. C. (2019). “Big data” in educational administration: An application for predicting school
dropout risk. Educational Administration Quarterly, 55(3), 404-446.
Timmermans, A., Kuyper, H., & van der Werf, G. (2012). Schooladviezen en onderwijsloopbanen.
Voorkomen, risicofactoren en gevolgen van onder- en overadvisering. Gronings Instituut voor Onderzoek van Onderwijs (GION), Rijksuniversiteit Groningen.
Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Wickham, H. & Müller, K. (2021). DBI: R Database Interface. R package version 1.1.1. https://CRAN.R-
project.org/package=DBI

Appendix A
Table A1
Overview included categorical variables

                          n          %

Gender
  Female                  83,849     50.40
  Male                    82,486     49.60
Cultural Background
  Dutch                   144,779    87.04
  Non-Dutch               21,536     12.96
Urbanization
  Small (<50,000)         70,655     42.48
  Large (>50,000)         95,680     57.52
Single parent a           4,661      2.80
Education b
  Average                 152,533    91.70
  Low                     13,802     8.30

a Reflects the number and percentage of pupils from a single-parent household;
b Average corresponds to parents with max higher education, low to max primary education;
Note: total N = 166,335

Table A2
Test types, their corresponding margins and related levels

Test    1           2           3           4           5           6
CET     501 - 504   505 - 523   524 - 531   532 - 538   539 - 543   544 - 550
R-8     100 - 158   159 - 182   183 - 202   203 - 219   220 - 232   233 - 300
IEP     50 - 51     52 - 68     69 - 76     77 - 84     85 - 91     92 - 100
DIA     321 - 341   342 - 349   350 - 356   357 - 364   365 - 370   371 - 400
AMN     300 - 305   306 - 326   327 - 370   371 - 427   428 - 463   464 - 500

Note: CET = Centrale Eindtoets; R-8 = Route 8; IEP = ICE Eindevaluatie Primair; DIA = DIA-eindtoets; AMN = AMN eindtoets.
1 = VMBO-PRO; 2 = VMBO-BB/KB; 3 = VMBO-KB/GT; 4 = VMBO/HAVO; 5 = HAVO; 6 = VWO

Table A3
Advice score categories and corresponding levels

Advice score    Corresponding level
10 - 21         VMBO-PRO
22 - 23         VMBO-BB/KB
30 - 35         VMBO-KB/GT
40 - 52         VMBO-HAVO
60 - 61         HAVO-VWO
70              VWO

Appendix B

Table B1
Regression coefficients model (1)

Effect                  Odds ratio   SE      95% CI [LL, UL]

Fixed effects
(Intercept)             0.438        0.020   [0.400, 0.479]
Education [low]         1.704        0.038   [1.632, 1.780]
Gender [male]           0.922        0.011   [0.900, 0.945]
Background [Dutch]      0.878        0.019   [0.842, 0.915]
Urbanization [>50k]     0.869        0.022   [0.828, 0.913]
Single parent           0.738        0.026   [0.688, 0.791]

Random effects
Marginal R2             .011
Conditional R2          .378

Note. Number of groups = 6137; total N = 166,335; CI = confidence interval; LL = lower limit; UL = upper limit.

Table B2
Regression coefficients model (2)

Effect                  Odds ratio   SE      95% CI [LL, UL]

Fixed effects
(Intercept)             0.452        0.022   [0.415, 0.494]
School weights          1.106        0.004   [1.098, 1.113]
Gender [male]           0.921        0.011   [0.899, 0.944]
Background [Dutch]      0.881        0.019   [0.845, 0.918]
Urbanization [large]    0.833        0.020   [0.795, 0.873]
Single parent [NA]      0.731        0.026   [0.682, 0.784]

Random effects
Marginal R2             .054
Conditional R2          .378

Note. Number of groups = 6137, total N = 166,335. CI = confidence interval; LL = lower limit; UL = upper limit.

Table B3
Regression coefficients model (3)

Effect                          Odds ratio   SE      95% CI [LL, UL]

Fixed effects
(Intercept)                     0.453        0.020   [0.415, 0.494]
School weights                  1.113        0.007   [1.100, 1.126]
Gender [male]                   0.921        0.011   [0.899, 0.944]
Background [Dutch]              0.879        0.044   [0.844, 0.916]
Urbanization [large]            0.835        0.020   [0.797, 0.875]
Single parent [no]              0.731        0.026   [0.682, 0.783]
School weights * Urbanization   0.991        0.007   [0.978, 1.004]

Random effects
Marginal R2                     .054
Conditional R2                  .378

Note. Number of groups = 6137, total N = 166,335. CI = confidence interval; LL = lower limit; UL = upper limit.
