SupportED: AI-Based System to Monitor Progression of Eating Disorders

Translational Medical Science (TMED)

Navya Nori

Milton High School, Milton, GA


Introduction
In the US alone, 30 million people live with an eating disorder, and 95% of those people are

between the ages of 12 and 25, primarily between 16 and 17 [1]. Anorexia Nervosa is a mental

condition in which people starve themselves to achieve unhealthy weight loss. One out of every five

Anorexia deaths is by suicide, and hospitalizations for this condition increased by 119% in the last

decade [2]. Between 35% and 55% of adolescent girls engage in fasting, self-induced vomiting, or diet pill use [3].

Eating disorders as a whole have the highest mortality rate of any mental illness, and

characterizing a patient’s biological and psychological manifestations is the key to reducing this

statistic [4]. Due to the sudden onset of the Covid-19 pandemic, adolescents all over the world

have spent more time alone behind closed doors criticizing their eating habits and bodies,

especially due to the heavy influence of social media and lack of positive interactions [5]. These

habits increase eating disorder (ED) risk and severity and worsen mental health. Eating disorders can be triggered in a

patient by a multitude of factors, including negative media exposure, social isolation, and irregular

eating patterns. Anorexia Nervosa, a type of eating disorder, is the third most prevalent

chronic illness among adolescents. Current anorexia prevention efforts target the patient’s mental health

and emotional responses and provide a trusted individual for the patient to communicate with [6].

These options include teaching users to be mindful and reduce stress levels, offering various

therapy methods, and recognizing negative thoughts. However, these methods do not consider the

patient’s biological manifestations, monitor their progression, or provide any treatment

recommendation.

SupportED is an AI-based system that predicts the future progression of a patient’s eating

disorder, tracks both the psychological and biological manifestations, provides a treatment

plan tailored to their needs, and has the potential to reduce the staggering mortality rate of Anorexia
Nervosa in adolescents. There are two main components to this project: AI/ML models based on

social media interactions (computational component) and bacterial growth analysis (laboratory

component). The research question for the overall project is “Can a combination of mathematical

modeling and bacterial growth analysis be used to predict future mental illness severity for

Anorexia Nervosa?” and for the lab component is “What is the effect of carbohydrate starvation

(g/mL) on the gut microbiome (cells/mL)?” The hypothesis for the overall project was that a

mathematical model can be combined with lab testing to predict future mental illness severity

(MIS) in Anorexia Nervosa patients. The lab component hypothesis predicted that if the glucose

concentration reaches 60%, the gut bacteria population will begin to decline significantly. The

independent variable used in the laboratory component of this study was the carbohydrate

concentration (g/mL), and the dependent variable was the bacterial absorbance measured by a

spectrophotometer and later converted to cell count using the Beer-Lambert law. Controlled variables

in this experiment included the volume of each trial (6 mL), the amount of bacteria introduced (one

inoculating loop’s worth), the wavelength used to measure absorbance (600 nm), and the temperature

of the incubator (60 degrees Celsius). Two notable constants were the room temperature and

overall light exposure.
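
The absorbance-to-cell-count conversion mentioned above can be sketched as follows. The Beer-Lambert law states that absorbance is proportional to concentration (A = epsilon * l * c), so a single calibration constant can stand in for 1 / (epsilon * l). The function name, the blank-correction step, and the calibration constant below are assumptions for illustration; the constant actually used in this study is not reported.

```python
def absorbance_to_cells_per_ml(absorbance, cells_per_ml_per_au, blank_absorbance=0.0):
    """Estimate cell density (cells/mL) from an absorbance reading at 600 nm.

    Beer-Lambert: A = epsilon * l * c, so c is proportional to A.
    cells_per_ml_per_au folds 1 / (epsilon * l) into one calibration constant.
    """
    corrected = absorbance - blank_absorbance
    if corrected < 0:
        # A reading below its blank is treated as no detectable growth.
        corrected = 0.0
    return corrected * cells_per_ml_per_au


# Hypothetical calibration constant, NOT a value from this study.
CALIBRATION_CELLS_PER_ML_PER_AU = 8.0e8

estimate = absorbance_to_cells_per_ml(
    0.25, CALIBRATION_CELLS_PER_ML_PER_AU, blank_absorbance=0.05
)
print(f"{estimate:.3e} cells/mL")
```

Subtracting the blank first keeps the color of the diluted medium from being counted as bacterial growth.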

Materials and Methods

The items used in this study fall into two categories: equipment and materials. Three

pieces of equipment were used in this experiment: an incubator, an autoclave, and a spectrophotometer.

The materials included volumetric pipettes, 16 cuvettes, parafilm, a

cuvette rack, thioglycolate medium, Clostridium sporogenes in its liquid tube form, sterile

inoculating loops, and distilled water.


This study can be broken down into two parts: emotional monitoring and physiological monitoring. The

computational component will be discussed first.

Computational methods

The first step was to obtain a Twitter developer account. The Python library Tweepy was used to

collect streaming tweets from Twitter over two months. Then, the tweets were sorted into cases,

which were indicative of Anorexia, or healthy control tweets. These decisions were made using

the DSM-5 clinical criteria for Anorexia Nervosa. The data was then split into training, validation,

and test sets, and stop words, punctuation, and extra characters were removed. This step ensures that the

models analyze only word choice. Term frequencies were calculated using the tweets. A

regularized regression was built and statistics were computed, and then a gradient boosted machine (GBM)

and generalized linear model (GLM) were built using matrices. The resulting coefficients were analyzed

using the shap package for interpretability. Finally, receiver operating characteristic (ROC) and precision-recall

charts were created using the scores from the models. The coefficients, or values of

importance for each word, were implemented into the Android application, SupportED.
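
The cleaning and term-frequency steps described above can be illustrated with a minimal sketch using only the Python standard library. The stop-word list, the function names, and the two example tweets are made up for illustration and are not the data or code used in this study.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "i", "to", "and", "is", "so", "my", "of", "in"}


def tokenize(tweet):
    """Lowercase, strip URLs, mentions, punctuation and digits, drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [word for word in text.split() if word not in STOP_WORDS]


def term_frequency_matrix(tweets):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in tweet i."""
    tokenized = [tokenize(tweet) for tweet in tweets]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocabulary)}
    rows = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(tokens).items():
            row[index[word]] = count
        rows.append(row)
    return vocabulary, rows


# Made-up example tweets, not data from the study.
tweets = [
    "I want to look skinny, skinny is the goal!",
    "Living a healthy life is amazing @friend https://example.com",
]
vocabulary, rows = term_frequency_matrix(tweets)
for tweet_row in rows:
    print(dict(zip(vocabulary, tweet_row)))
```

Each row of the resulting matrix is one tweet and each column is one word, which is the shape a regression or boosted model expects as input.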

Laboratory methods

The purpose of the laboratory experiment was to determine the percentage of the full carbohydrate content

at which the gut microbiome’s health is compromised. Sixteen cuvettes and a cuvette rack were

sterilized in an autoclave before use to eliminate the possibility of extraneous bacterial growth. Fifteen

sterile cuvettes were placed on the cuvette rack: five staggered concentrations of glucose

with three trials of each. They were labelled according to concentration number and trial number.

For example, concentration one had 1500 g/mL of glucose and concentration 5 had 5500 g/mL

(the maximum). 0.54 mL of thioglycolate medium was pipetted into the first cuvette using a sterile
volumetric pipette, and this action was repeated 2 more times for the different trials. Then, 1.09

mL of the medium was pipetted into each cuvette for concentration 2, and so on. The final

concentration, concentration 5, had 6 mL of the medium. Then, the complementary amount of

distilled water was pipetted into each cuvette making the volume of every cuvette 6 mL. Each

cuvette was sealed with parafilm and covered with an opaque box to prevent any light degradation.

The next day, sterile inoculating loops were used to inoculate each cuvette with Clostridium sporogenes

from a liquid tube culture. This particular species of bacteria was chosen because it is

native to the gut microbiome. The tubes were re-sealed with parafilm and not opened again in

the study, making this a BSL-1 experiment. The cuvettes were then incubated at 40 degrees Celsius

overnight. The next day, the cuvettes were removed from the incubator. The spectrophotometer

was calibrated, and the wavelength was set to 600 nm in the absorbance setting. Blanks were

created for each concentration mirroring the process from day one and run before each

experimental cuvette. Creating 5 blanks accounted for the color gradient caused by the dilutions and

established that changes in absorbance value are due to the bacterial growth only and not any other

factors. Each absorbance value was recorded in a data table and the cuvettes were placed back in

the incubator. Over the next two days, the same data collection process was repeated. Finally, the

bacteria and growth medium were disposed of properly and the facility was cleaned thoroughly.
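
The dilution step in this protocol is simple arithmetic: each cuvette receives a volume of medium plus enough distilled water to reach 6 mL. A minimal sketch is given below. The function name is hypothetical, and only the medium volumes stated in the text are used; the volumes for concentrations 3 and 4 are not given there and are left out.

```python
TOTAL_VOLUME_ML = 6.0  # final volume of every cuvette, from the protocol


def water_volume_ml(medium_ml, total_ml=TOTAL_VOLUME_ML):
    """Distilled water (mL) needed to bring a cuvette up to the total volume."""
    if not 0 <= medium_ml <= total_ml:
        raise ValueError("medium volume must be between 0 and the total volume")
    return round(total_ml - medium_ml, 2)


# Medium volumes stated in the text, keyed by concentration number.
stated_medium_ml = {1: 0.54, 2: 1.09, 5: 6.0}

for concentration, medium_ml in stated_medium_ml.items():
    water_ml = water_volume_ml(medium_ml)
    print(f"concentration {concentration}: {medium_ml} mL medium + {water_ml} mL water")
```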

Mobile Application Building

The final stage of this study was the app creation, or the bridge between the computational and lab

components. The user opens the app and completes the initial setup process, and then the emotional

monitoring begins. The models score Instagram post captions for words that are indicative of

Anorexia. If the post is highly indicative, the user is sent a survey asking what they consumed in

the past 24 hours. The reported carbohydrate intake is divided by the recommended number of carbs they should have per

day. The app compares this value to the threshold calculated by the lab experiment, which was

50.13%. Depending on the comparison, the app provides a personalized treatment

recommendation and sends it, along with an alert, to the supportER, the linked account interface of the person

helping the user.
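
The decision flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the app's code: the app itself is an Android application, and the function names, the caption-score cutoff of 0.5 for "highly indicative", and the returned labels are all hypothetical. Only the 50.13% threshold comes from the lab result.

```python
CARB_THRESHOLD = 0.5013  # fraction of recommended carbohydrates, from the lab result


def carb_ratio(consumed_carbs_g, recommended_carbs_g):
    """Reported carbohydrate intake as a fraction of the recommended daily amount."""
    if recommended_carbs_g <= 0:
        raise ValueError("recommended carbohydrates must be positive")
    return consumed_carbs_g / recommended_carbs_g


def assess(caption_score, consumed_carbs_g, recommended_carbs_g, score_cutoff=0.5):
    """Return which branch of the flow applies.

    score_cutoff is an assumed cutoff for a 'highly indicative' caption.
    """
    if caption_score < score_cutoff:
        return "no_survey"
    ratio = carb_ratio(consumed_carbs_g, recommended_carbs_g)
    if ratio < CARB_THRESHOLD:
        return "below_threshold"
    return "at_or_above_threshold"


print(assess(0.2, 100, 250))
print(assess(0.9, 100, 250))
print(assess(0.9, 200, 250))
```

The recommendation and alert sent to the supportER would then be chosen from the returned label.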

A logo was created using paint software. In Android Studio, 8 screens and wireframes were

created and designed using XML layout editing. To store data, two options were evaluated: the

Firebase Realtime Database and Cloud Datastore. Each user interface screen is an Android activity,

and the MainActivity is the starting point of the application. For user authentication, Google Sign-In

was integrated, and event listeners were added to the screen for UI elements such as buttons and

dropdowns. Finally, the app was connected to the Google Firebase backend, and a cohesive color

scheme was implemented to make it visually appealing.

Results

The glucose concentrations for the lab were numbered as concentrations 1-5, where 1 is the most

diluted and 5 is the undiluted broth. A t-test was used to compare each of the concentrations (1-4)

against concentration 5. On day 1, the t value is the highest because the least growth

occurred, demonstrating the larger difference between concentrations 1-4 and 5. The threshold

calculated by this experiment is the percentage of the undiluted medium at which the bacterial

population begins to decline significantly. On day 2, the t value decreased to a greater extent,

and similar changes occurred in the p value and confidence interval. Between concentrations 2

and 3, the p value increased to above 0.05, making the difference no longer statistically significant. Over the same

interval, the confidence interval began to include 0, so the null hypothesis could not be rejected after

concentration 3 because no significant decline in bacterial growth occurs. Thus, the threshold lies

between concentrations 2 and 3 (2500 to 3500 g/mL of glucose), at approximately 50.13%.
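
A comparison of one diluted concentration against concentration 5 can be sketched as below. The text does not say which form of the t-test was used, so this sketch assumes a two-sample Student t-test with pooled variance. With three trials per group there are 4 degrees of freedom, and the two-sided 5% critical value is 2.776. The absorbance readings are hypothetical and are not data from this study.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776


def pooled_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance, and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical absorbance readings, three trials each.
undiluted = [0.80, 0.84, 0.82]
diluted = [0.40, 0.45, 0.41]

t_value, df = pooled_t(undiluted, diluted)
significant = df == 4 and abs(t_value) > T_CRITICAL_DF4
print(f"t = {t_value:.2f}, df = {df}, significant at 5%: {significant}")
```

When the absolute t value falls below the critical value, the p value is above 0.05 and the 95% confidence interval for the difference includes 0, which matches the pattern described for concentrations 3 and 4.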

This table shows the t values, p values, and confidence intervals mentioned above by day (1-3) and by

concentration (1-5), where the threshold occurs between concentration 2 and 3 at the red line.

Computational results: The GLM and GBM models produced coefficients, or values of importance,

for various words. The GLM coefficients are shown below.

Word          Importance
Skinny         2.02
Weight         1.94
Eat            1.66
Thinspo        1.54
Stop           1.33
Look           0.90
Lose           0.46
Healthy       -0.75
Optimistic    -1.27
Living        -2.09
Amazing       -2.57
Kindness      -3.22

The word most indicative of Anorexia is “skinny”, with a coefficient of 2.02. Negative

coefficients in this table indicate words associated with control tweets rather than with Anorexia. More neutral words are found in

the middle of the table, as it is ordered from the highest to the lowest coefficient.
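
To show how such coefficients can turn a caption into a score, the sketch below sums the coefficient of every listed word and passes the total through a logistic function. It assumes the GLM is a logistic regression and that the intercept is 0; neither is stated in the text, and the function names are hypothetical. Only the coefficient values come from the table.

```python
import math
import re

# Coefficients from the GLM table; words not listed contribute nothing.
GLM_COEFFICIENTS = {
    "skinny": 2.02, "weight": 1.94, "eat": 1.66, "thinspo": 1.54,
    "stop": 1.33, "look": 0.90, "lose": 0.46, "healthy": -0.75,
    "optimistic": -1.27, "living": -2.09, "amazing": -2.57, "kindness": -3.22,
}
INTERCEPT = 0.0  # assumption: the fitted intercept is not reported


def linear_score(text):
    """Sum of coefficients over every word occurrence in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return INTERCEPT + sum(GLM_COEFFICIENTS.get(word, 0.0) for word in words)


def case_probability(text):
    """Logistic transform of the linear score (assumes a logit link)."""
    return 1.0 / (1.0 + math.exp(-linear_score(text)))


for caption in ["skinny weight", "Living an amazing, healthy life"]:
    print(caption, round(linear_score(caption), 2), round(case_probability(caption), 3))
```

Positive coefficients push the probability toward the case class and negative coefficients push it toward the control class, which is why the sign of each word matters as much as its size.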

Shapley values computed using the GBM model, which show the relative importance of each word, are shown below.

Variables are ranked in descending order based on feature importance (y-axis). Each dot

represents an observation (row). The position along the x-axis is the SHAP value of that feature for that

observation. The color shows whether the value of that feature was high or low for that row

of the dataset.

A confusion matrix shows how well a model recognizes actual cases as cases and actual controls

as controls. The confusion matrix from the GLM model is shown below.

Most of the data falls in the predicted-control, actual-control and predicted-case, actual-case

sections. This indicates that the GLM model classifies most tweets correctly and that, at a threshold of 50%, the

accuracy is high. The confusion matrix for the GBM model is shown below.

Most of the data falls in the predicted-control, actual-control and predicted-case, actual-case

sections. This indicates that the GBM model classifies most tweets correctly and that, at a threshold of 50%, the

accuracy is high.
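
For readers unfamiliar with how a confusion matrix is filled in, a minimal sketch follows. Each model score is compared with the threshold (0.5 here, matching the 50% threshold above) and the prediction is tallied against the true label. The labels and scores are hypothetical and are not the study's data.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives; labels are 1 for case, 0 for control."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1  # predicted case, actual case
        elif predicted == 1 and label == 0:
            counts["fp"] += 1  # predicted case, actual control
        elif predicted == 0 and label == 0:
            counts["tn"] += 1  # predicted control, actual control
        else:
            counts["fn"] += 1  # predicted control, actual case
    return counts


# Hypothetical labels and model scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]

counts = confusion_matrix(labels, scores)
accuracy = (counts["tp"] + counts["tn"]) / len(labels)
print(counts, accuracy)
```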

These graphs show the accuracy of the GLM and GBM models on the training data.

The sensitivity, specificity, and accuracy of the GLM model at various thresholds are shown.

At a threshold of approximately 0.67, all three measures are optimal. From 0.3 to 0.64, the

accuracy is the highest.


The sensitivity, specificity, and accuracy of the GBM model at various thresholds are shown.

At a threshold of approximately 0.5, all three measures are optimal. At a similar threshold, the

accuracy is optimized.
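
The threshold curves described above come from recomputing the three measures at each candidate threshold. A minimal sketch is shown below, using hypothetical labels and scores rather than the study's data; the function names are also hypothetical. It assumes both cases and controls are present, so neither denominator is zero.

```python
def metrics_at(labels, scores, threshold):
    """Sensitivity, specificity and accuracy at one threshold (1 = case, 0 = control)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),  # share of actual cases caught
        "specificity": tn / (tn + fp),  # share of actual controls cleared
        "accuracy": (tp + tn) / len(pairs),
    }


def sweep(labels, scores, thresholds):
    """Evaluate the three measures at every threshold in the list."""
    return {t: metrics_at(labels, scores, t) for t in thresholds}


# Hypothetical labels and model scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]

results = sweep(labels, scores, [0.35, 0.5, 0.65])
for threshold, measures in results.items():
    print(threshold, measures)
```

Raising the threshold trades sensitivity for specificity, so the point where the curves meet is a common choice when both kinds of error matter.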

Conclusions and Discussion

A combination of mathematical modeling and bacterial growth analysis can be used to predict

future MIS and monitor progression of Anorexia Nervosa. The carbohydrate starvation threshold

at which the growth of gut bacteria begins to drop significantly is approximately 50.13%. This

system is not costly and acts in real time, making it a viable solution in a variety of settings. Future

enhancements to this study would include implementing a pre-trained convolutional neural network

(CNN) model for images and a sequential model such as a recurrent neural network (RNN) for text. Another possible enhancement

is the use of more data, which would improve the vocabulary and potentially the accuracy of the

models. Finally, the app could be connected to a database that contains the carbohydrate

values for various food and produce items. Currently, the carb values for several common food

items have been manually entered into the app, and automating this step would increase the solution’s

versatility.

Acknowledgements

I would like to thank Ms. Martinez and Ms. Riley for their support, and Milton High School for

allowing me to use their facility for the lab component.

References

[1] Anorexia Nervosa in Teens: The Two-Year Window. Integrated Care Clinic, 2017 [URL]

[2] Eating Disorder Statistics. U.S. News, 2020 [URL]

[3] Child Eating Disorders on the Rise. Cable News Network (CNN), 2012 [URL]

[4] Eating Disorders: New Solutions. American Psychological Association, 2009 [URL]

[5] South Carolina Department of Mental Health. Eating Disorder Statistics, 2015 [URL]

[6] Study Shows Covid-19 Has Wide-Ranging Effects on Eating Disorder Concerns. Spectrum
News 1, 2021 [URL]
