
Ethnic differences in body composition
and lifestyle factors by obesity and
metabolic health
Ravi Mohan Lal
Supervisor: Prof. Stephen McKenna

Co-Supervisors: Dr. Rajna Golubic & Prof. Shumone Ray
A dissertation submitted for the Degree of Master of Science

MSc in Data Science and Engineering
School of Science and Engineering

University of Dundee
20th March 2023

Executive Summary
Obesity is a chronic, relapsing, multifactorial disease with an associated economic and public
health burden. People with obesity are at increased risk of developing type 2 diabetes and
cardiometabolic disorders. Although obesity has become a widespread public health concern,
research indicates that overweight and obesity can be prevented and
reversed through timely and appropriate interventions. Despite awareness of the health and
economic implications of obesity, no country has yet been able to reverse the growth of this
epidemic. This is mainly because obesity is multifactorial and complex, with various biological,
environmental, genetic, and biochemical factors contributing to its development and
persistence. Since it is a heterogeneous condition, it can manifest in different ways and have
varying underlying causes and risk factors. The effects of available treatments for obesity vary
between individuals. Current treatment, which relies on a one-size-fits-all measure (BMI) to
classify individuals by body shape and weight, has limitations. Therefore, detailed profiling of
individuals based on biological, behavioural, clinical, and demographic factors is needed to
generate a more accurate description of the interplay among these factors.
The purpose of this study is to understand the mutual interactions that occur between these
variables. This will help to understand the varying degrees of disease that may be caused
by these factors and will likely pave the way to stratify obesity into clinically relevant
phenotypes, which should ultimately help develop effective prevention and treatment
strategies. To investigate the heterogeneity of the condition, detailed
information on patients is needed. For this study, data are taken from the UK Biobank
resource, which supports the investigation of risk factors for major diseases such as diabetes.
This thesis aims to describe body composition, lifestyle factors, biochemical markers, and
demographic and clinical characteristics to see differences among people divided into four
categories (underweight, normal weight, overweight, and obese). Differences are assessed
using ANOVA, and linear regression is used to understand the association of BMI as a
predictor with certain continuous outcome variables. The findings are
discussed in light of the literature for clinical relevance, and recommendations for future work
are provided.
Declaration
I declare that the special study described in this dissertation has been carried out and the
dissertation composed by me, and that the dissertation has not been accepted in fulfilment
of the requirements of any other degree or professional qualification.
Signed¹:
Date:
¹ Electronic signatures are acceptable
Certificate
I certify that Ravi Mohan Lal has satisfied the conditions of the Ordinance and Regulations
and is qualified to submit this dissertation in application for the degree of Master of Science.
Signed²:
Date:
² Electronic signatures are acceptable
Acknowledgements
I would like to express my sincere gratitude to my supervisor, Professor Stephen McKenna, for
his continuous support, invaluable guidance, and insightful suggestions throughout this project.
Special thanks to Professor Shumone Ray for providing me with the opportunity to work at NNEdPro,
global institute of food, nutrition and public health, as a data science intern, and to Dr. Rajna
Golubic for her patience in answering my questions and for setting a clear direction for the
project.
I am grateful to my wife, Monika, for her encouragement and support, and for understanding
the many sacrifices that I had to make in order to complete the master’s course.
Finally, I would like to express my thanks to all faculty members who were involved in teaching
valuable courses and providing me with feedback and comments.
Thank you all for your help and encouragement.
Table of Contents
Executive Summary
Declaration
Certificate
Acknowledgements
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Obesity Phenotypes
3 Study Design
3.1 Methods
3.2 Data Availability
3.3 Data Collection and conversion
3.4 R for analysis
4 Statistical Analysis
4.1 HbA1c
4.2 Systolic Blood Pressure (SBP)
4.3 Visceral Adipose
4.4 Moderate Physical Activity
4.5 Vigorous Activity
4.6 Linear Regression
5 Discussion
6 Summary and Conclusions
7 References / Bibliography
8 List of Appendices
List of Tables
Table 1. Obesity Phenotypes and association of lifestyle factors on health outcomes
Table 2. R commands and their functions used in the study
Table 3. Mean differences (HbA1c) and their significance with “Normal weight” taken as reference.
Table 4. Mean differences (SBP) and their significance with “Normal weight” taken as reference.
Table 5. Mean differences (Visceral adipose) and their significance with “Normal weight” taken as reference.
Table 6. Mean differences (Moderate physical activity) and their significance with “Normal weight” taken as reference.
Table 7. Mean differences (Vigorous physical activity) and their significance with “Normal weight” taken as reference.
Table 8. Comparison of ANOVA with Kruskal-Wallis test
List of Figures
Figure 1. Data Downloading steps in UK Biobank
Figure 2. HbA1c stratified by body shape phenotype
Figure 3. Q-Q plot for HbA1c showing deviations from normal curve at higher values
Figure 4. Systolic blood pressure stratified by body shape phenotype
Figure 5. Q-Q plot and histogram showing type of distribution for SBP measurement
Figure 6. Visceral Adipose stratified by body shape phenotype
Figure 7. Moderate physical activity stratified by body shape phenotype
Figure 8. Vigorous physical activity stratified by body shape phenotype
Figure 9. Scatter plots between outcome variables and Age
Figure 10. Box plots showing outcome variables stratified by BMI-categories
Figure 11. Box plots showing outcome variables stratified by Gender
Figure 12. Box plots showing outcome variables stratified by Gender
Figure 13. Outcome variables stratified by Deprivation index
1 Introduction
Obesity is a long-term and complex health issue caused by many factors. Its prevalence has
grown worldwide, and the WHO has estimated that it has more than tripled since 1975 [1].
Being overweight or obese increases the risk of hospitalization, placing strain on current
healthcare infrastructure and resulting in considerable direct and indirect costs for the
affected individuals. Direct costs refer to expenses incurred through outpatient and
inpatient health services, laboratory tests, medical procedures, and doctor visits [2]. Indirect
costs, on the other hand, refer to resources forgone as a result of a health condition, such as
absenteeism from work and loss of work productivity [3]. Besides the economic and social
burden, obesity wreaks havoc on individual lives, as it is a precursor to
noncommunicable diseases such as type 2 diabetes, cardiovascular disease (CVD), and cancer,
and to other comorbidities [4].
Known measures for managing obesity include pharmacotherapy, medical devices, surgical
procedures, adopting healthy dietary patterns, engaging in regular physical activity, and
maintaining a healthy weight. While these are proven and effective approaches, the
response to each differs from one person to another. This heterogeneity in response
can be attributed to biochemical, demographic, clinical,
environmental, genetic, and socioeconomic factors. As a result, different obesity subtypes
develop, which makes individual-level obesity management complex, and many countries
have struggled to curb the progression of obesity on a larger scale.
Considering this variation, there is a growing recognition that the current clinical approach
of categorizing individuals' weights based on standard BMI requires re-evaluation, to avoid
recommending the same treatment to all individuals with the same BMI. The reason is that BMI
fails to capture differences in body fat distribution, leading to possible misclassification of
individuals with high muscle mass and low adiposity as obese, or to a failure to classify
individuals with low muscle mass and high adiposity as having an increased cardiometabolic
risk [5-7].
Given the limitations of a one-size-fits-all approach, it becomes reasonable to design
prevention and treatment programs based on an individual's unique needs and risk profile [8].
Moreover, to identify potential risk factors and areas for intervention, a comprehensive
understanding of the causes of obesity is a key step towards the development of clinically
relevant subtypes. To begin the process, a thorough analysis of biological, behavioural,
clinical, and demographic data from individuals with varying weight statuses can be
conducted to develop a comprehensive profile. This thesis lays the foundation for understanding
the mutual interactions among these factors and their relative contribution to cardiometabolic
biomarkers (HbA1c, for example). This will pave the way to stratify obesity into clinically
relevant phenotypes, which in turn should help develop effective prevention and treatment
strategies. The detailed profiling is carried out using the UK Biobank resource. The
analysis deals with variables grouped under four headings (body composition, lifestyle
factors, demographic factors, and biochemical factors), where each variable is stratified by
BMI category and differences between categories are assessed using ANOVA and regression
models. Specifically, within the scope of this thesis, the discussion covers the following
variables of interest: age, sex, ethnicity, and deprivation as independent variables; BMI as an
ordinal predictor; and systolic blood pressure, HbA1c, visceral adipose, and moderate
to vigorous activity as outcome variables. The list of variables is provided in Appendix A.
This introduction is followed by a brief review of previous work on heterogeneity and the
implications of stratifying obesity into various phenotypes for individual disease risk. This
is followed by an outline of the study design and the statistical tool used for data extraction.
The findings are then presented and discussed in the context of existing literature. Finally,
conclusions and recommendations are outlined.
2 Obesity Phenotypes
Heterogeneity in obesity implies variability in the causes and risk factors of obesity among
individuals. This means that not all people with obesity share the same underlying causes and
health outcomes. Consequently, uniform strategies to treat obesity produce variable results,
since all are designed around the same basic recommendations for prevention and treatment
without considering factors such as age, race, and ethnicity.
The diversity among patients with obesity is evident in their varied responses to weight loss
interventions and makes clinically significant and sustained weight loss very challenging.
Moreover, relying solely on BMI and individual preference to recommend a weight loss
intervention has limitations, since BMI does not accurately reflect visceral fat
accumulation and adiposity. For instance, BMI cannot
distinguish between fat and lean tissue, nor does it provide information on
body fat distribution. As a result, BMI can lead to misclassification, with some individuals
having a high BMI despite being muscular and fit, while others may be classified as obese
based on their BMI despite having little visceral adipose tissue (VAT) accumulation.
To account for the diversity within the obese population, it may be beneficial to classify
obesity into distinct subtypes. That will likely offer greater insights into its underlying causes
and lead to more tailored and effective approaches to preventing and treating obesity. In
[9], by categorizing obesity into four distinct phenotypes based on factors such as abnormal
satiation, hedonic eating, abnormal satiety, and decreased metabolic rate, the researchers
were able to adopt a more targeted approach to treatment. According to the findings, the
phenotype-guided approach was associated with a 1.75-fold greater weight loss after 12
months, with participants in this group experiencing a mean weight loss of 15.9%, compared
to 9.0% in the non-phenotype-guided group.
Previous efforts to characterise obesity focused on metabolic health defined by blood
pressure, lipid profile and glycaemic control or insulin resistance [10-15]. For instance,
the authors of [10] investigate whether people with metabolically healthy obesity (MHO)
are truly healthy by assessing the risk of developing type 2 diabetes and cardiovascular
disease (CVD) in this population. The study found that individuals with MHO had a
significantly higher risk of developing type 2 diabetes and CVD compared to those with
normal weight and without metabolic abnormalities.
Another study [11] defines phenotypes by combining cardiometabolic abnormalities
(for example elevated blood pressure and elevated triglycerides) with body size criteria
(normal weight, overweight, obese), each classed as metabolically healthy or metabolically
abnormal. The study revealed that in
individuals with a normal weight, the clustering of cardiometabolic risk factors was more
prevalent in older age groups, men, and those who had a history of smoking or were
currently smoking. The likelihood of exhibiting a metabolically healthy phenotype is lower
among overweight and obese individuals who are older, have a history of smoking, or have
a larger waist circumference [11].
Yet another study [15] showed that long-term fluctuations in body mass index (BMI) could
indicate a greater risk of cardiometabolic disease among individuals who are not obese. In
comparison to those with a consistent BMI, those with variable BMI showed 163%, 67%,
58%, and 74% greater risks of developing obesity, metabolic dysfunction, diabetes mellitus,
and hypertension, respectively. These associations were observed among non-obese
participants. Other pathophysiology- and behaviour-based subtypes of obesity described in
the literature include high insulin secretion [16], high responsiveness to external food cues
[17], learned patterns of preference for calorie-dense foods, binge eating and food addiction
[18], low reinforcing value of physical activity [19] and high reinforcing value of sedentary
behaviour [20]. A summary of findings from the related work on phenotypes is provided in
Table 1.
These studies characterize obesity in a number of ways. However, prior
research on lifestyle behaviours and on biochemical and clinical factors is limited in that it
is mainly based on cross-sectional surveys and self-report measures, without direct measures
of body composition. Additionally, variable-centred approaches have been employed, which
reveal how individual variables are related to one another but fail to capture the
complexity of how these variables cluster within individuals and change over time. A more
nuanced understanding of these interrelationships could aid in tailoring clinical
management strategies for individual patients. This research aims to help understand the
interactions among these factors in order to identify individuals at higher risk, create
personalized treatment plans, and advance research into the underlying mechanisms of
obesity and related diseases.
| Reference | Objective | Study Design | Sample size and characteristics | Key findings and results | Limitations and Implications |
|---|---|---|---|---|---|
| 10 | Examine the associations of metabolically healthy obesity with a wide range of obesity-related outcomes. | Population-based prospective cohort study | 381,363 UK Biobank participants | Metabolically healthy obesity is not actually healthy, as they have a higher risk of developing ASCVD, heart failure, and respiratory diseases | Weight management could be beneficial to all people with obesity irrespective of metabolic profile (MHO and ...) |
| 11 | Investigate the prevalence and characteristics of two phenotypes, the obese without cardiometabolic risk factor clustering and the normal weight with cardiometabolic risk factor clustering, among the US population | Cross-sectional study | 5,440 adults (≥20 years) who participated in the National Health and Nutrition Examination Survey (NHANES) 1999-2004 | The "normal weight with cardiometabolic risk factor clustering" phenotype was associated with older age, non-Hispanic white ethnicity, lower income, and higher physical activity. | Non-standardization of body type phenotype definitions. Further research examining the effects of different definitions of body size phenotypes on the risk of CVD is needed |
| 12 | Investigate whether differences in the prevalence of MHO are due to geographic variation or differences in measurements | Collaborative analysis of ten large cohort studies | 163,517 participants from European countries, 18–80 years | Considerable variation in the occurrence of MHO across the different European populations even when unified criteria or definitions were used to classify this phenotype | Although harmonized measures captured the essential information content for the MHO phenotype, there were differences between studies in the way that specific variables such as blood pressure and serum lipid levels were measured. |
| 13 | What are effective strategies for healthy weight and obesity prevention? | Literature review and expert consensus recommendations. | | Obesity prevention and healthy weight maintenance require a multifaceted approach that includes healthy eating habits, regular physical activity, adequate sleep, stress management, and social support. Early intervention and education are crucial. | Implementing the recommended strategies may reduce the risk of obesity-related health problems and improve overall health outcomes. |
| 14 | Compare differences in MHO and MUO prevalence according to the 5 most frequently used definitions | Standardized questionnaire, biological measurements and tests | A sample of 4,757 adults aged 35 years and older (male 51.1%) was enrolled. | The prevalence of MHO and MUO in the Chinese population varies according to different definitions of obesity and metabolic disorders | Different definitions, criteria, and cut-offs have been utilized by researchers to define metabolic obesity phenotypes, leading to inconsistency and difficulty in comparing results across studies. |
| 15 | Association between variability in body mass index (BMI) and metabolic health, and how does it affect cardiometabolic disease risk | Prospective cohort study. | 9,687 participants from the Multi‐Ethnic Study of Atherosclerosis (MESA). Middle-aged and older adults (45-84 years) from six different ethnic groups in the United States | Greater variability in BMI and worse metabolic health were both independently associated with a higher risk of cardiometabolic disease events (e.g., heart attack, stroke, diabetes) over a median follow-up period of 12.2 years | Both BMI variability and metabolic health are important factors to consider when assessing cardiometabolic disease risk, and stable metabolically healthy obesity may not confer the same level of protection as stable metabolically healthy normal weight |
| 16 | How does the interaction between dietary composition and insulin secretion affect weight gain in the Quebec Family Study? | Prospective cohort study. | | The combination of a high glycaemic load (HGL) diet and low insulin secretion was associated with the greatest weight gain, whereas a low glycaemic load (LGL) diet and high insulin secretion was associated with the least weight gain | The findings suggest that the interaction between dietary composition and insulin secretion may play an important role in weight gain, and individuals with low insulin secretion may be particularly susceptible to weight gain on a high glycaemic load diet. |
| 17 | How do food-cue exposure and body weight status affect food intake in overweight and lean individuals? | | Overweight (n = 52) and normal-weight (n = 52) participants were exposed to the sight and smell of a ‘cued’ food pizza for 60 seconds | Food-cue exposure increased rated hunger and desire to eat, increased prospective portion size of all savoury foods, and increased salivation | The study highlights the importance of environmental factors in regulating food intake and may have implications for the design of weight management programs that incorporate strategies to reduce exposure to food cues. |
| 18 | What is the association between common eating disorders (anorexia nervosa, bulimia nervosa, and binge eating disorder) and adverse health outcomes in young adults? | Prospective cohort study. | Participants were aged 9-15 years at baseline (in 1996) and were followed up through 2009. | Participants with anorexia nervosa or bulimia nervosa had an increased risk of depression, anxiety, and substance use disorders compared to those without eating disorders. | The findings suggest that eating disorders may have long-term consequences for mental and physical health, highlighting the importance of early detection and treatment. |
| 19 | | | | | |
| 20 | | | | The motivation to be sedentary predicted weight change when sedentary behaviour was reduced. Participants with higher motivation to be sedentary had greater weight loss when sedentary ... | The motivation to be sedentary may be an important factor in weight management, and reducing sedentary behaviour may be a useful intervention for weight loss in individuals with high motivation to be sedentary |
Table 1. Obesity Phenotypes and association of lifestyle factors on health outcomes
3 Study Design
The data were obtained from UK Biobank, a large-scale prospective cohort study in
the United Kingdom. It is a research resource that provides access to a vast array of health
and genetic data on approximately 500,000 participants aged between 40 and 69 years. The
baseline data in UK Biobank were collected through a combination of self-reported
questionnaires, physical measurements, and biological samples.
Participants completed an initial questionnaire covering topics such as demographics,
lifestyle, medical history, and family history of disease. They also underwent a range of
physical measurements, including height, weight, body composition, blood pressure, and
lung function.
In addition, biological samples were collected from participants, including blood, urine, and
saliva. These samples were used to measure a range of biomarkers, such as blood glucose
levels, cholesterol levels, and genetic information.
Finally, participants also provided consent for researchers to access their health records
from national health registries, which allows for the collection of information on any future
health events that may occur. It is worth mentioning that the data are fully anonymized.
Overall, the combination of self-reported information, physical measurements, biological
samples, and health records provides a comprehensive picture of each participant's health
status at the baseline assessment [21].
3.1 Methods
Descriptive analytics is usually employed to summarize key characteristics of a patient
population, such as age, sex, and medical history. It is also used to describe the distribution
of a particular disease or condition within a population, including its prevalence, incidence,
and risk factors, to identify outliers or implausible values, and to explore patterns in the data.
Descriptive summary statistics are computed using mean (±SD) or median (IQR) for
continuous variables and frequencies (%) for categorical variables.
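The summary statistics named above can be sketched as follows. The thesis runs its analysis in R; this Python version is purely illustrative, the function names are hypothetical, and the numbers are made-up values rather than UK Biobank data.

```python
import statistics
from collections import Counter


def summarize_continuous(values):
    """Summarize a continuous variable as mean (SD) and median (IQR)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


def summarize_categorical(labels):
    """Summarize a categorical variable as {label: (count, percent)}."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Made-up values for illustration only; these are not UK Biobank data.
example_sbp = [118.0, 124.0, 131.0, 140.0, 152.0]
print(summarize_continuous(example_sbp))
print(summarize_categorical(["Female", "Male", "Female", "Female"]))
```

Whether mean (SD) or median (IQR) is reported for a given variable depends on how skewed its distribution is; the sketch simply returns both.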
Four phenotypes are defined based on BMI: underweight (BMI < 18 kg/m²), normal weight
(18 ≤ BMI < 25 kg/m²), overweight (25 ≤ BMI < 30 kg/m²), and obese (BMI ≥ 30 kg/m²). For
each phenotype, the following are described: body composition, physical activity, sleep
duration, diet (DASH score and its individual components), and biochemical markers, by
calculating descriptive statistics [unadjusted means (SD) or median (IQR)] and marginal means
(regression models) of baseline values. The complete profile is attached in Appendix A. For the
purpose of this thesis, the interactions of the following variables are studied:
1. BMI as an ordinal predictor variable
2. Systolic blood pressure, visceral adipose, HbA1c, and moderate and vigorous activity as
continuous outcome variables
3. Age, ethnicity, gender, and deprivation index as adjustment variables
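The BMI cut-offs stated above can be turned into an ordinal category with a small helper. This is a minimal Python sketch, not the thesis's R code; the function names and the integer codes 0 to 3 are illustrative assumptions.

```python
# Upper bounds (kg/m^2) for each category, using the cut-offs stated in the text.
BMI_UPPER_BOUNDS = [
    (18.0, "underweight"),
    (25.0, "normal weight"),
    (30.0, "overweight"),
]

# Hypothetical ordinal coding for use as a predictor; the thesis may code it differently.
ORDINAL_CODE = {"underweight": 0, "normal weight": 1, "overweight": 2, "obese": 3}


def bmi_from_measurements(weight_kg, height_m):
    """BMI is weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """Return the category whose half-open interval [lower, upper) contains bmi."""
    for upper, label in BMI_UPPER_BOUNDS:
        if bmi < upper:
            return label
    return "obese"  # BMI >= 30 kg/m^2


bmi = bmi_from_measurements(weight_kg=81.0, height_m=1.80)
category = bmi_category(bmi)
print(round(bmi, 1), category, ORDINAL_CODE[category])
```

Keeping the boundaries in one list makes it easy to rerun the stratification with different cut-offs.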
Afterwards, ANOVA is used to compare the means of two or more groups or categories of
data to see if they are significantly different, followed by linear regression. These will help
identify differences in the variables of interest between different groups and to understand
the factors that contribute to these differences.
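To make the ANOVA comparison concrete, the F statistic for a one-way layout can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is in Python rather than the R used in the thesis, the function name is hypothetical, and the three groups are made-up numbers. It stops at the F statistic and its degrees of freedom, because the p-value requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up outcome values for three groups; not UK Biobank data.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

A large F relative to its reference distribution indicates that at least one group mean differs; pairwise differences against a reference group still need a follow-up comparison.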
In medical research, linear regression can be used to investigate the relationship between a
patient's clinical characteristics (such as age, sex, and medical history) and their health
outcomes (such as disease incidence or mortality). Linear regression can also be used to
predict the value of the outcome variable based on the values of the predictor variables.
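As a minimal illustration of the regression step, the sketch below fits an ordinary least squares line with a single predictor using the closed-form estimates. It is a simplification under stated assumptions: the models described in this thesis also adjust for age, ethnicity, gender, and deprivation index, which this sketch omits; the function name is hypothetical, the 0 to 3 coding of BMI category is illustrative, and the data are made up.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted outcome for a new predictor value."""
    return intercept + slope * x_new


# Made-up data: BMI category coded 0-3 and an arbitrary continuous outcome.
bmi_code = [0, 1, 2, 3]
outcome = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(bmi_code, outcome)
print(intercept, slope, predict(intercept, slope, 2))
```

With an ordinal predictor coded this way, the slope is read as the average change in the outcome per one-category increase in BMI.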
3.2 Data Availability
For this study, R is used as the programming language and statistical software for data
extraction and for running descriptive statistics, ANOVA, and regression models. The facility
for data collection and the provision of tools by the DPUK Data Portal [22] are acknowledged.
The portal allows researchers to analyse the data of relevant cohorts in a secure, remote
environment complete with data linkage and analysis packages. The author, working as a
researcher at NNEdPro, has signed the data access agreement (DAA) made between Swansea
University and NNEdPro to access the DPUK portal and the Biobank folder, where data related
to UK Biobank are stored and analysed. Access to the data is provided by the UK Biobank
Resource under approved application number 77447.
3.3 Data Collection and conversion

The process of data collection begins with downloading the data from UK Biobank. An approved
researcher authorized to access the data follows this typical process: downloading the
encrypted data, validating it, decrypting it, and converting it into the format of choice. In our
case, since we use R, helper functions are run to convert the decrypted files
into a format that can be loaded in RStudio. The steps, summarized from the data access
guide (https://biobank.ndph.ox.ac.uk/~bbdatan/Data_Access_Guide_v3.0.pdf), are given below.
1. Download the encrypted dataset from the Access Management System (AMS) of UK Biobank.
Click on Projects, select the relevant project ID and click the blue button
"View/Update". Now click on the Data tab at the top right, and then on the “Go to
Showcase to refresh or download data” button, which will lead to the Showcase
Downloads page. Click on the ID (also called the "Run ID") for the dataset you wish
to download, which will take you to the authentication screen.
Now enter the 32-character MD5 checksum, which was included in the main body of
the notification email for the dataset. Then click "Generate". Click the "Fetch" button
to download the encrypted dataset.
Figure 1. Data Downloading steps in UK Biobank
2. Once the encrypted data are downloaded, the download is validated with the helper
program ukbmd5, which verifies the integrity
of the downloaded main dataset file using the command:
ukbmd5 ukb23456.enc (replacing ukb23456.enc with the name of your main dataset
file)
3. Decrypting the dataset (ukbunpack)
Datasets are supplied in a compressed encrypted format. The ukbunpack program
decrypts and decompresses the downloaded file into a custom UK Biobank format.
4. Converting a main dataset to your preferred format
The result of the unpacking program is a dataset in a custom UK Biobank format (the
.enc_ukb file above). The ukbconv program can be used to convert this into various
other formats.
The ukbconv program is run via the command:
ukbconv ukb23456.enc_ukb option
where ukb23456.enc_ukb is the file generated from the previous unpacking step
(with 23456 replaced by the run ID of your dataset), and option is replaced by one of: docs,
csv, txt, r, sas, stata or bulk depending on the output desired.
For example, the csv option produces a comma-separated file containing the downloaded
columns, in which categorical variables keep their coded values, whereas for R output
ukbconv can arrange for the coded values to be replaced with their meanings. In
Field 31 (Sex), Female is represented by the value 0 and Male by the value 1. If the r
option is used, a tab-separated file is created that still holds the values 0 and 1
for this variable, but an R script is also generated that, when used to import the
dataset into R, recodes each 0 to "Female" and each 1 to "Male".
The two R-related files generated by the process above are ukb23456.tab and ukb23456.r.
The .tab file is the actual data file; it is imported into a data frame in R for analysis by
running the ukb23456.r file. This .r file contains the commands that import the .tab file
into a data frame and recode the categorical variables. The coding can also be changed if
needed: for example, sex, which is coded as 0 for “Female” and 1 for “Male”, can be recoded
the opposite way (Male as 0 and Female as 1).
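The import-and-recode step described above can be sketched in base R. This is a minimal illustration, not the script that ukbconv generates: the file contents are invented, and the column names f.eid and f.31.0.0 are assumed to follow the naming used by the r export.

```r
# Hypothetical miniature of a ukbconv "r" export: a tab-separated file in
# which Field 31 (Sex) is still coded 0/1. All values below are invented.
tab_file <- tempfile(fileext = ".tab")
writeLines(c("f.eid\tf.31.0.0",
             "1000001\t0",
             "1000002\t1",
             "1000003\t1"), tab_file)

# Read the tab file into a data frame (the generated .r script performs the
# equivalent step for the real ukb23456.tab).
bd <- read.delim(tab_file, header = TRUE, sep = "\t")

# Recode the 0/1 codes to labels: 0 -> "Female", 1 -> "Male".
bd$f.31.0.0 <- factor(bd$f.31.0.0, levels = c(0, 1),
                      labels = c("Female", "Male"))

print(table(bd$f.31.0.0))
```

Swapping the order of the labels in the factor() call would give the opposite coding mentioned in the text.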
3.4 R for analysis
Once the data is loaded into a data frame in R, it is ready to be analysed. The data has
the form shown in the table. eid is the unique patient identifier. In the other column
names, the first number is the data field (53 and 20002 represent the date of attending the
assessment centre and the non-cancer illness code, respectively). The second number
represents the instance (0 for baseline, 1 for the first repeat assessment, 2 for the second
repeat), and the last number, called the array index, distinguishes multiple pieces of data
gathered at the same time. Thus, for patient ID 1256847, the date of attending the
assessment centre at recruitment is 11/04/2007.
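The field, instance and array index described above can be read off a column name programmatically. The sketch below assumes column names of the form f.<field>.<instance>.<array> (for example f.20002.0.3); the helper parse_ukb_column is hypothetical and only illustrates the convention.

```r
# Split a column name of the assumed form f.<field>.<instance>.<array>
# into its three numeric components.
parse_ukb_column <- function(name) {
  parts <- strsplit(name, ".", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 4, parts[1] == "f")
  list(field    = as.integer(parts[2]),   # data field, e.g. 20002
       instance = as.integer(parts[3]),   # 0 = baseline, 1, 2 = repeats
       array    = as.integer(parts[4]))   # index within the same visit
}

info <- parse_ukb_column("f.20002.0.3")
str(info)
```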
The main R commands used to conduct the descriptive and inferential statistics, and their
functions, are shown in Table 2.
Command             Function
fread()             Read data from a tab-delimited file
ordered()           Convert data into an ordered factor
source()            Read and parse an input file into the current workspace
readRDS()           Read an RDS file into a data frame
select()            Select specific columns from a data frame
filter()            Select specific rows based on a specified condition
summarize()         Descriptive statistics (mean, standard deviation, IQR, count)
createWorkbook()    Export results from a data frame to an Excel file for analysis
CrossTable()        Two-way frequency table (e.g. between BMI group and gender)
geom_boxplot()      Draw box plots
geom_bar()          Draw bar plots
chisq.test()        Perform the chi-squared test
anova()             Perform ANOVA
lm()                Perform linear regression
kruskal.test()      Perform the Kruskal-Wallis test
hist()              Draw a histogram
resid()             Extract residuals (for residual plots)
qqnorm()            Draw quantile-quantile plots
plot()              Draw a scatter plot
Table 2. R commands and their functions used in the study
4 Statistical Analysis
Each group of variables (biochemical, baseline characteristics, clinical variables, physical
activity variables, body composition, diet, sleep time) is stratified by body shape
phenotype (underweight, normal weight, overweight and obese), and descriptive statistics
are computed as explained in the Methods section. For the chosen outcome variables (HbA1c,
systolic blood pressure, visceral adipose, and vigorous and moderate activity), ANOVA is
used to compare the mean values across the weight categories.
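The stratification and descriptive statistics described above can be sketched in base R as follows. The data are invented, and the BMI cut-offs (18.5, 25 and 30 kg/m^2, the usual WHO thresholds) are an assumption about how the four categories were formed.

```r
# Invented example values; not UK Biobank data.
d <- data.frame(
  bmi   = c(17.9, 22.4, 24.1, 27.3, 29.8, 31.5, 36.0, 23.0),
  hba1c = c(34.1, 34.9, 35.2, 36.0, 35.7, 38.9, 39.4, 33.8)
)

# Assumed cut-offs: <18.5, 18.5 to <25, 25 to <30, >=30 kg/m^2.
d$bmi_group <- cut(d$bmi,
                   breaks = c(-Inf, 18.5, 25, 30, Inf),
                   labels = c("Underweight", "Normal weight",
                              "Overweight", "Obese"),
                   right = FALSE)

# Mean, standard deviation and count of HbA1c per BMI group.
desc <- aggregate(hba1c ~ bmi_group, data = d,
                  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(desc)
```

A group with a single observation has an undefined standard deviation (NA), which is one practical reason the small underweight group needs care.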
4.1 HbA1c
The ANOVA showed significant differences in HbA1c (Fig. 2) across the four weight groups
(F ≈ 7395, p < 0.001). The F value indicates significant variation between weight
categories; to see which groups differ from each other, a post hoc test is performed
(Table 3). Obese people have a higher mean HbA1c than the normal weight reference group
(by 3.79 mmol/mol, p < 0.001) and than the overweight group (by 3.79 - 1.10 = 2.69
mmol/mol).
Figure 2. HbA1c stratified by body shape phenotype
Reference (Normal weight)   Estimate   Std. error   Statistic   p value
(Intercept)                 34.724     0.016        2053.700    0.000
Obese                       3.797      0.025        146.578     0.000
Overweight                  1.108      0.022        49.273      0.000
Underweight                 0.4709     0.179        2.630       0.008
Table 3. Mean differences (HbA1c) and their significance, with “Normal weight” taken as
reference.
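How a table of this shape arises can be sketched with simulated data. The group means below are invented, and Tukey's HSD is used as one possible post hoc procedure; the text does not state which one was applied, so that choice is an assumption.

```r
set.seed(1)
groups <- c("Normal weight", "Overweight", "Obese", "Underweight")
# Simulated HbA1c-like values; group means are invented for illustration.
d <- data.frame(
  group = factor(rep(groups, times = c(300, 300, 300, 30)), levels = groups),
  hba1c = c(rnorm(300, 34.7, 3), rnorm(300, 35.8, 3),
            rnorm(300, 38.5, 3), rnorm(30, 35.2, 3))
)

# One-way ANOVA: is there any difference in means across groups?
fit_aov <- aov(hba1c ~ group, data = d)
print(summary(fit_aov))

# The same model via lm(): the intercept is the mean of the reference level
# ("Normal weight"), and each coefficient is a difference from that mean.
fit_lm <- lm(hba1c ~ group, data = d)
print(round(coef(summary(fit_lm)), 3))

# Post hoc pairwise comparisons with Tukey's HSD (an assumed choice).
print(TukeyHSD(fit_aov))

# The difference between two non-reference groups is the difference of
# their coefficients, as in 3.797 - 1.108 = 2.689 for Table 3.
b <- coef(fit_lm)
obese_vs_overweight <- unname(b["groupObese"] - b["groupOverweight"])
```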
One of the assumptions of ANOVA is that the data come from a normal distribution, and the
Q-Q (quantile-quantile) plot is a graphical tool that helps assess this. As seen in Fig. 3,
many of the higher values deviate from normality. The box plot (Fig. 2) also indicates that
75% of the values fall between 34.45 and 40.1. It is important to note that the higher
values, though outliers in a statistical sense, are plausible and indicate that a person
falls into the diabetes range if HbA1c goes past 48 mmol/mol.
Figure 3. Q-Q plot for HbA1c showing deviations from the normal line at higher values
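The kind of upper-tail deviation shown in the Q-Q plot can be reproduced with simulated right-skewed data. The sketch below uses invented values and computes the Q-Q pairs without drawing; calling qqnorm(x) followed by qqline(x) would draw the plot itself.

```r
set.seed(2)
# Simulated right-skewed values standing in for HbA1c (invented numbers).
x <- 30 + rlnorm(2000, meanlog = 1.5, sdlog = 0.5)

# qqnorm() pairs each sample quantile with a theoretical normal quantile;
# plot.it = FALSE returns the pairs without drawing.
qq <- qqnorm(x, plot.it = FALSE)

# Reference line through the first and third quartiles, as qqline() draws.
slope <- unname(diff(quantile(x, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75))))
intercept <- unname(quantile(x, 0.25)) - slope * qnorm(0.25)

# Largest observation versus the value the reference line predicts for it.
top <- which.max(qq$y)
excess <- qq$y[top] - (intercept + slope * qq$x[top])
print(excess)
```

A positive excess means the largest values lie above the reference line, which is the pattern expected for a right-skewed variable.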
4.2 Systolic Blood Pressure (SBP)
The process of finding differences among SBP categories is the same as detailed for HbA1c.
Fig. 4 shows that mean SBP increases with weight category, and the differences are
significant (F > 5927, p < 0.001). The very high F statistic indicates that the variance
between groups is large compared with the variance within groups. As Table 4 shows, all
mean differences are significant.
Figure 4. Systolic blood pressure stratified by body shape phenotype

Reference (Normal weight)   Estimate   Std. error   Statistic   p value
(Intercept)                 134.680    0.049        2725.663    0.000
Obese                       9.348      0.075        123.520     0.000
Overweight                  6.574      0.065        99.938      0.000
Underweight                 -6.333     0.520        -12.160     0.000
Table 4. Mean differences (SBP) and their significance, with “Normal weight” taken as
reference.
Figure 5. Q-Q plot and histogram showing the type of distribution for the SBP measurement.
4.3 Visceral Adipose
Significant differences are also seen among groups for visceral adipose, whose mean
likewise increases with weight category.
Figure 6. Visceral adipose stratified by body shape phenotype

Reference (Normal weight)   Estimate   Std. error   Statistic   p value
(Intercept)                 2.273      0.018        121.125     0.000
Obese                       3.611      0.032        109.614     0.000
Overweight                  1.901      0.025        73.624      0.000
Underweight                 -1.444     0.263        -5.477      0.000
Table 5. Mean differences (Visceral adipose) and their significance, with “Normal weight”
taken as reference.
4.4 Moderate Physical Activity
Figure 7. Moderate physical activity stratified by body shape phenotype

Reference (Normal weight)   Estimate   Std. error   Statistic   p value
(Intercept)                 60.059     0.203        295.476     0.000
Obese                       -3.988     0.322        -12.377     0.000
Overweight                  0.403      0.272        1.479       0.138
Underweight                 -1.698     2.213        -0.767      0.442
Table 6. Mean differences (Moderate physical activity) and their significance, with “Normal
weight” taken as reference.
4.5 Vigorous Activity
Figure 8. Vigorous physical activity stratified by body shape phenotype

Reference (Normal weight)   Estimate   Std. error   Statistic   p value
(Intercept)                 41.337     0.147        280.277     0.000
Obese                       -2.401     0.244        -9.814      0.000
Overweight                  0.245      0.199        1.233       0.217
Underweight                 -7.323     1.783        -4.105      0.000
Table 7. Mean differences (Vigorous physical activity) and their significance, with “Normal
weight” taken as reference.
4.6 Linear Regression
To quantify the relative contribution of the factors affecting the outcome variables and to
estimate outcome values from the predictors (BMI categories), linear regression analysis is
performed. For every outcome variable the regression is run twice: first with age and sex as
adjustment variables, and then with deprivation index and ethnicity added.
Prior to running the regression model, scatter plots were used to check for linearity in the
relationship between the continuous variables, while box plots were used to visualize the
relationship between the categorical variables and the continuous outcome variable.
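The two-stage adjustment described above can be sketched as a pair of nested models. The variable names, category labels and effect sizes below are invented for illustration and are not the study's variables.

```r
set.seed(3)
n <- 500
# Simulated covariates; category names and effect sizes are invented.
d <- data.frame(
  bmi_group = factor(sample(c("Normal weight", "Overweight", "Obese"), n, TRUE),
                     levels = c("Normal weight", "Overweight", "Obese")),
  age = round(runif(n, 40, 70)),
  sex = factor(sample(c("Female", "Male"), n, TRUE)),
  deprivation_q = factor(sample(1:5, n, TRUE)),
  ethnicity = factor(sample(c("British", "Indian", "Caribbean"), n, TRUE),
                     levels = c("British", "Indian", "Caribbean"))
)
d$hba1c <- 30 + 0.08 * d$age + 2.5 * (d$bmi_group == "Obese") +
  1.0 * (d$bmi_group == "Overweight") + rnorm(n, sd = 3)

# Model 1: BMI category adjusted for age and sex.
m1 <- lm(hba1c ~ bmi_group + age + sex, data = d)
# Model 2: additionally adjusted for deprivation quintile and ethnicity.
m2 <- update(m1, . ~ . + deprivation_q + ethnicity)

print(round(coef(summary(m1)), 3))
print(anova(m1, m2))  # F-test for the terms added in model 2
```

Each coefficient for a factor level is a mean difference from that factor's reference level (here "Normal weight", "Female", quintile 1 and "British"), holding the other terms fixed.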
When age and gender were controlled for, the average increase in HbA1c was highest for
obese people and significant (βobese = 2.609, p < 0.001) compared with normal weight (the
reference category). The mean HbA1c for males was also significantly higher than for
females (βsex = 0.165, p < 0.001). When ethnicity and deprivation index were included as
adjustment variables, Asians (e.g., Pakistanis (βPK = 4.37), Indians (βIN = 3.56) and
Bangladeshis (βBD = 6.46)) had higher mean values of HbA1c than Britons. The same trend was
seen for other ethnicities, namely Caribbeans, Chinese, Africans, and Black or Black
British (βcarib = 2.87 > βCHI = 2.58 > βAF = 2.51 > βBl.British = 2.36).
Systolic blood pressure showed an average increase of 7.89 and 4.76 among obese and
overweight individuals, respectively, compared with those of normal weight, when adjusted
for age and sex. An increase was also observed in males (βsex = 4.42, p < 0.001). When
ethnic groups and deprivation index were added, Africans and Caribbeans showed a larger
rise in SBP than British individuals, while Asians showed the opposite trend (e.g.,
Pakistanis (βPK = -1.39), Indians (βIN = -0.14) and Bangladeshis (βBD = -3.93)).
For visceral adipose the same trend is observed when adjusted for age and sex: the average
increase is highest for obese individuals, and visceral adipose increases with age and is
higher in males than in females. When ethnicity and deprivation index were added, the other
ethnic groups showed lower mean visceral adipose values than British individuals. There is
also a downward trend (decreasing mean values) across deprivation quintiles for systolic
blood pressure and visceral adipose (the reference quintile is 1, i.e. the least deprived
group).
For moderate activity, obese people are on average less active than normal-weight people,
and ethnic groups such as Asians are the least active compared with Britons. Less deprived
people are also more active on average than more deprived ones.
Figure 9. Scatter plots between outcome variables and Age
Figure 10. Box plots showing outcome variables stratified by BMI categories
Figure 11. Box plots showing outcome variables stratified by Gender
Figure 12. Box plots showing outcome variables stratified by Gender
Figure 13. Outcome variables stratified by Deprivation index
5 Discussion
It is important to comment on the statistical methods that were followed and to evaluate
the results of the analysis in the context of the existing literature. First, differences
between groups are usually assessed with standard techniques such as ANOVA, which assumes
that the data follow a normal distribution. The distributions of the outcome variables
(HbA1c, visceral adipose, moderate activity, vigorous activity) deviate from normality at
extreme but valid values. This may invalidate the results, so an alternative nonparametric
test can be used to detect meaningful differences. For this purpose, the Kruskal-Wallis
test is also performed as a complementary step to confirm the results of ANOVA; the results
for HbA1c are shown in Table 8. The Kruskal-Wallis test indicates a difference between
underweight and normal weight that is significant at the 5% level, whereas the
corresponding ANOVA comparison is not (p = 0.051). The same comparison can be made for the
other outcome variables.
Kruskal-Wallis test [HbA1c]
              Normal weight   Obese   Overweight
Obese         0
Overweight    0               0
Underweight   0.000           0.000   0.028

ANOVA [HbA1c]
              Normal weight   Obese   Overweight
Obese         0
Overweight    0               0
Underweight   0.051           0.000   0.002
Table 8. Comparison of ANOVA with Kruskal-Wallis test
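A rank-based comparison of this kind can be sketched as follows. The data are simulated, and the pairwise step (Wilcoxon rank-sum tests with Holm correction) is an assumption, since the text does not state how the pairwise values in Table 8 were obtained.

```r
set.seed(4)
groups <- c("Normal weight", "Overweight", "Obese", "Underweight")
d <- data.frame(
  group = factor(rep(groups, times = c(200, 200, 200, 20)), levels = groups),
  # Right-skewed simulated outcome; shifts between groups are invented.
  y = c(rlnorm(200, 3.50, 0.15), rlnorm(200, 3.55, 0.15),
        rlnorm(200, 3.65, 0.15), rlnorm(20, 3.50, 0.15))
)

# Rank-based omnibus test across the four groups.
kw <- kruskal.test(y ~ group, data = d)
print(kw)

# Pairwise rank-sum tests with a multiplicity correction (Holm here).
pw <- pairwise.wilcox.test(d$y, d$group, p.adjust.method = "holm")
print(pw$p.value)
```

The result of pairwise.wilcox.test is a lower-triangular matrix of adjusted p-values, which has the same layout as Table 8.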
Another aspect is that the sample of underweight people is small compared with the other
weight categories. Most of the data (88%, n = 440574) comes from British people and,
moreover, 68 percent fall into the overweight and obese categories. This has implications
for studying ethnic differences and risk factor profiles across ethnic groups, and calls
for statistical methods that deal with class imbalance or unequal sample sizes. The typical
chi-squared test used to find differences among categorical subgroups may therefore be
invalid and produce misleading statistics. A better approach would be to assess the effect
size (with measures such as Cohen's d or eta-squared) to indicate whether the difference
observed between two groups is clinically relevant
(see https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00863/full). Non-parametric
tests such as the Kruskal-Wallis test or the Mann-Whitney U test can also be used.
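The two effect-size measures named above can be computed directly from their definitions. The sketch below uses simulated groups with invented means; the helper functions cohens_d and eta_squared are written here for illustration and are not taken from a package.

```r
set.seed(5)
a <- rnorm(200, mean = 34.7, sd = 3)  # simulated reference group
b <- rnorm(200, mean = 38.5, sd = 3)  # simulated comparison group

# Cohen's d: mean difference divided by the pooled standard deviation.
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(y) - mean(x)) / pooled_sd
}

# Eta-squared: between-group sum of squares over total sum of squares.
eta_squared <- function(y, g) {
  tab <- anova(lm(y ~ g))
  tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])
}

d_val <- cohens_d(a, b)
eta2 <- eta_squared(c(a, b), factor(rep(c("a", "b"), each = 200)))
print(c(cohens_d = d_val, eta_squared = eta2))
```

Unlike a p-value, neither measure shrinks or grows simply because the sample is large, which is why they are useful with a cohort of this size.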
A small sample size leads to wide 95% confidence intervals. Specifically, since data on
other ethnicities are sparse, there is greater uncertainty when inferring population
parameters, as can be seen in the estimated beta coefficients for SBP. This may lead to
non-significant p-values that reflect the large variability in the data, which in turn
makes it more challenging to identify true effects. This can also be seen in the estimated
coefficient for the 5th quintile (HbA1c).
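The effect of group size on the confidence interval can be illustrated with simulated data in which two groups share the same true mean difference from the reference but differ greatly in size. All numbers and group names below are invented.

```r
set.seed(6)
# Imbalanced groups: a large reference group, a large group and a small one.
d <- data.frame(
  group = factor(rep(c("Reference", "Large", "Small"), times = c(2000, 2000, 20)),
                 levels = c("Reference", "Large", "Small")),
  sbp = c(rnorm(2000, 135, 18), rnorm(2000, 137, 18), rnorm(20, 137, 18))
)
fit <- lm(sbp ~ group, data = d)

# 95% confidence intervals for the two group coefficients.
ci <- confint(fit)[c("groupLarge", "groupSmall"), ]
width <- ci[, 2] - ci[, 1]
print(round(cbind(ci, width), 2))
```

With a common residual standard deviation, the interval width scales with sqrt(1/n_reference + 1/n_group), so the group of 20 has an interval roughly seven times wider than the group of 2000.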
Another issue concerns the normality assumption for linear regression. As the Q-Q plot
deviates from the theoretical normal distribution at extreme values, an alternative such as
a log transformation can be adopted to reduce the skewness of the variables. A better
approach would be to use a robust regression model, which is the preferred method when the
data contain outliers or influential observations that can bias the estimated coefficients
of the linear model.
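The sensitivity of ordinary least squares to outliers, and the effect of a robust alternative, can be sketched as below. This uses rlm() from the MASS package (a Huber M-estimator by default) as one possible robust method; the text does not name a specific one, and the data are invented.

```r
library(MASS)  # provides rlm(); a recommended package normally installed with R

set.seed(7)
x <- seq(1, 100)
y <- 2 + 0.5 * x + rnorm(100)   # true slope is 0.5
# Contaminate the five largest x values with gross outliers (invented).
y[96:100] <- y[96:100] + 50

ols <- lm(y ~ x)
rob <- rlm(y ~ x)  # iteratively reweighted fit that downweights outliers

slopes <- c(ols = unname(coef(ols)["x"]), robust = unname(coef(rob)["x"]))
print(round(slopes, 3))
```

The outliers pull the least-squares slope away from 0.5, while the robust fit stays closer to it because large residuals receive smaller weights.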
The results obtained from the regression can now be analysed. In the scatter plots relating
the outcome variables to age, SBP correlates with age (Spearman correlation coefficient
0.32). Blood pressure increases as people age, and this relationship is extensively studied
in the medical literature and well recognised in the medical community. For instance, [23]
reports that mean systolic blood pressure increased with age in all countries studied, and
that this trend was consistent in both men and women. European countries showed modest
heterogeneity in BP, but had higher mean values than Canada and the US.
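The Spearman coefficient quoted above is a rank-based correlation, which can be computed as follows. The age and SBP values are simulated with an invented relationship and do not reproduce the reported 0.32.

```r
set.seed(8)
age <- round(runif(1000, 40, 70))
# Simulated SBP rising with age plus noise (invented relationship).
sbp <- 100 + 0.7 * age + rnorm(1000, sd = 17)

# Spearman's rho is the Pearson correlation of the ranks.
rho <- cor(age, sbp, method = "spearman")
res <- cor.test(age, sbp, method = "spearman", exact = FALSE)
print(rho)
print(res$p.value)
```

Because it works on ranks, the coefficient is unaffected by monotone transformations and is less sensitive to extreme values than the Pearson coefficient.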
HbA1c has a relatively weak correlation with age (correlation coefficient 0.26). Some
studies have reported a weak relationship between HbA1c and age, while others have found a
stronger association [24-25]. In [25], it was reported that HbA1c levels increased by 0.10%
per 10-year increase in age in people with NGT (normal glucose tolerance) and by 0.07% in
those with IFG (impaired fasting glucose) and/or IGT (impaired glucose tolerance),
independent of fasting and 2-h glucose on the OGTT (oral glucose tolerance test).
Considering various ethnicities, differences in HbA1c levels between ethnic groups were
also found to be independent of glycemia.
Left this page to add little bit more on discussion side.
6 Summary and Conclusions
To be done after review
7 References / Bibliography
1. WHO. "Obesity and overweight. World Health Organization." (2021).
2. Economic Costs | Obesity Prevention Source | Harvard T.H. Chan School of Public Health
3. Dee, Anne, et al. "The direct and indirect costs of both overweight and obesity: a
systematic review." BMC research notes 7.1 (2014): 1-9.
4. Di Angelantonio, Emanuele, et al. "Body-mass index and all-cause mortality: individual-
participant-data meta-analysis of 239 prospective studies in four continents." The
Lancet 388.10046 (2016): 776-786.
5. Kaess, B. M., et al. "The ratio of visceral to subcutaneous fat, a metric of body fat
distribution, is a unique correlate of cardiometabolic risk." Diabetologia 55 (2012): 2622-
2630.
6. Rosenquist, Klara J., et al. "Visceral and subcutaneous fat quality and cardiometabolic
risk." JACC: Cardiovascular Imaging 6.7 (2013): 762-771.
7. Abraham, Tobin M., et al. "Association between visceral and subcutaneous adipose
depots and incident cardiovascular disease risk factors." Circulation 132.17 (2015): 1639-
1647.
8. Field, Alison E., Carlos A. Camargo, and Shuji Ogino. "The merits of subtyping obesity: one
size does not fit all." Jama 310.20 (2013): 2147-2148.
9. Acosta, Andres, et al. "Selection of antiobesity medications based on phenotypes
enhances weight loss: a pragmatic trial in an obesity clinic." Obesity 29.4 (2021): 662-671.
10. Zhou, Ziyi, et al. "Are people with metabolically healthy obesity really healthy? A
prospective cohort study of 381,363 UK Biobank participants." Diabetologia 64.9 (2021):
1963-1972.
11. Wildman, Rachel P., et al. "The obese without cardiometabolic risk factor clustering and
the normal weight with cardiometabolic risk factor clustering: prevalence and correlates
of 2 phenotypes among the US population (NHANES 1999-2004)." Archives of internal
medicine 168.15 (2008): 1617-1624.
12. van Vliet-Ostaptchouk, Jana V., et al. "The prevalence of metabolic syndrome and
metabolically healthy obesity in Europe: a collaborative analysis of ten large cohort
studies." BMC endocrine disorders 14.1 (2014): 1-13.
13. Lavie, Carl J., et al. "Healthy weight and obesity prevention: JACC health promotion
series." Journal of the American College of Cardiology 72.13 (2018): 1506-1531.
14. Liu, Chania, et al. "The prevalence of metabolically healthy and unhealthy obesity according to
different criteria." Obesity facts 12.1 (2019): 78-90.
15. Sponholtz, Todd R., et al. "Association of variability in body mass index and metabolic health with
cardiometabolic disease risk." Journal of the American Heart Association 8.7 (2019): e010793.
16. Chaput JP, Tremblay A, Rimm EB, Bouchard C, and Ludwig DS. A novel interaction
between dietary composition and insulin secretion: effects on weight gain in the Quebec
Family Study. Am J Clin Nutr. 2008;87(2):303-9.
17. Ferriday D, and Brunstrom JM. 'I just can't help myself': effects of food-cue exposure in
overweight and lean individuals. Int J Obes (Lond). 2011;35(1):142-9.
18. Field AE, Sonneville KR, Micali N, Crosby RD, Swanson SA, Laird NM, et al. Prospective
association of common eating disorders and adverse outcomes. Pediatrics.
2012;130(2):e289-95.
19. Roemmich JN, Barkley JE, Lobarinas CL, Foster JH, White TM, and Epstein LH. Association
of liking and reinforcing value with children's physical activity. Physiol Behav. 2008;93(4-
5):1011-8.
20. Epstein LH, Roemmich JN, Cavanaugh MD, and Paluch RA. The motivation to be sedentary
predicts weight change when sedentary behaviors are reduced. Int J Behav Nutr Phys Act.
2011;8:13.
21. About our data, https://www.ukbiobank.ac.uk/enable-your-research/about-our-data,
accessed on 15 Feb 2023.
22. DPUK data portal, https://portal.dementiasplatform.uk/.
23. Wolf-Maier, Katharina, et al. "Hypertension prevalence and blood pressure levels in 6
European countries, Canada, and the United States." Jama 289.18 (2003): 2363-2369.
24. Pani, Lydie N., et al. "Effect of aging on A1C levels in individuals without diabetes:
evidence from the Framingham Offspring Study and the National Health and Nutrition
Examination Survey 2001–2004." Diabetes care 31.10 (2008): 1991-1996.
25. Davidson, Mayer B., and David L. Schriger. "Effect of age and race/ethnicity on HbA1c
levels in people without known diabetes mellitus: implications for the diagnosis of
diabetes." Diabetes research and clinical practice 87.3 (2010): 415-421.
8 List of Appendices
APPENDIX A List of Variables
APPENDIX B Project Plan
APPENDIX C Ethics Approval
