2/3/2019 Analyzing Sleep Data with R | Sean Nguyen

Bedtime and Sleep Duration (2014-2018)

Analyzing Sleep Data with R


Apr 6, 2018 · 9 min read

Data Visualization, R


https://www.seanlnguyen.com/post/analyzing-sleep-data-with-r/
I’ve been using the iOS Sleep Cycle app to track my sleep since late 2014 and have accumulated quite a bit of information about my sleeping habits. I wanted to see if I could do some exploratory data analysis and try out some different packages to clean up and visualize my data. The app exports the data as a semicolon-delimited CSV file with information like date, sleep quality, and sleep duration. More recently, it can also integrate with the pedometer, so you get the number of steps on a given day. The tidyverse is my go-to package for a ton of my data analysis because it works so well for cleaning up and exploring data.
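The export is semicolon-delimited rather than comma-delimited, so a tiny example of the layout can help. The sketch below parses two made-up rows with base R's read.table(). The values are invented, and the column names are the ones the cleaning code renames further down. The real export may contain additional columns.

```r
# Two made-up rows in a semicolon-delimited layout (hypothetical values;
# column names match the ones renamed in the post's cleaning code)
raw <- "Start;End;Sleep quality;Time in bed;Activity (steps)
2018-01-06 23:41:00;2018-01-07 07:02:00;84%;7:21;6210
2018-01-08 00:55:00;2018-01-08 07:30:00;71%;6:35;8402"

# sep = ";" plays the same role as read_delim(delim = ";");
# check.names = FALSE keeps names like "Sleep quality" as-is
sleep_raw <- read.table(text = raw, sep = ";", header = TRUE,
                        check.names = FALSE, stringsAsFactors = FALSE)
str(sleep_raw)
```

In the post itself, readr's read_delim() does the same job and also parses Start and End as date-times.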

For this analysis I wanted to use four packages to learn more about them and apply them to this dataset:

1. lubridate (http://lubridate.tidyverse.org/) helps you work with dates and times
2. ggridges (https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html) lets you make ridgeline plots
3. viridis (https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html) provides perceptually uniform, colorblind-friendly color palettes
4. here (https://github.com/r-lib/here) lets you refer to files reproducibly when importing and saving data within an R project directory (see also this repo explaining the package: https://github.com/jennybc/here_here)

library(tidyverse)
library(lubridate)
library(ggridges)
library(viridis)
library(here)
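If any of these packages isn't installed, the corresponding library() call stops with an error. The hypothetical missing_pkgs() helper below is not part of any of these packages. It lists what still needs installing before you run the rest of the code.

```r
# The packages loaded above
pkgs <- c("tidyverse", "lubridate", "ggridges", "viridis", "here")

# Hypothetical helper: return the packages that are not installed yet
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of erroring) when a package is absent
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_pkgs(pkgs)
if (length(to_install) > 0) message("Missing: ", paste(to_install, collapse = ", "))
# install.packages(to_install)  # uncomment to install them from CRAN
```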

First I used the here package to tell R that I wanted to read in my file located in the ‘data’ folder. The nice thing about here() is that it automatically resolves paths relative to the root of the R project. It therefore doesn't matter where you place the project folder on your computer: the file will still be found because here() takes care of the path for you.
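Here is a minimal sketch of how here() builds paths. It finds the project root once and then joins whatever path pieces you pass to it. As a result, here("data", "sleepdata.csv") and here("data/sleepdata.csv") point to the same file, and the file doesn't need to exist for the path to be built. Outside an RStudio project, here() typically falls back to other root markers and ultimately the working directory. The root it prints therefore depends on where you run it.

```r
library(here)

# The project root that here() detected (e.g. the folder with the .Rproj file)
here()

# Path pieces are joined onto that root; the file doesn't have to exist yet
sleep_file <- here("data", "sleepdata.csv")
sleep_file
```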

The next step was reading in the file with read_delim() using delim = ";". I then used rename() to change column names and used mutate() extensively to create new columns from existing data. I used a couple of tricks to get the columns I wanted, like str_replace() to remove the “%” from the sleep_quality column. I also used ifelse() and grepl() to create a Weekend/Weekday column to tease apart differences between days of the week. After mutate() I used select() to pick the columns I wanted. mutate_at() is one of my favorite functions for converting multiple columns to a different data type in R. After that I had a clean dataset that was ready for exploratory data analysis!
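Before these tricks are chained together in a single pipeline, you can look at each one on its own. The base-R sketch below applies them to a few made-up values, not rows from the real export:

- gsub() stands in for str_replace().
- POSIXlt fields stand in for lubridate's hour(), minute(), and wday().
- The time in bed is assumed to arrive as "H:MM" strings.

One deliberate difference: the weekend flag uses the numeric weekday (0 = Sunday, 6 = Saturday) instead of matching "Sat|Sun". This keeps it independent of the system locale.

```r
# Made-up inputs mimicking the raw columns (hypothetical values)
quality_raw <- c("81%", "67%", "100%")
start <- as.POSIXct(c("2018-01-06 23:36:00", "2018-01-08 01:00:00",
                      "2018-01-09 00:24:00"), tz = "UTC")
in_bed <- c("7:05", "6:30", "8:00")  # assumed "H:MM" format

# 1. Drop the "%" sign and convert to a number
sleep_quality <- as.numeric(gsub("%", "", quality_raw, fixed = TRUE))

# 2. Weekend/Weekday flag from the numeric weekday (0 = Sunday, 6 = Saturday)
lt <- as.POSIXlt(start)
what_day <- factor(ifelse(lt$wday %in% c(0, 6), "Weekend", "Weekday"))

# 3. Bedtime as decimal hours: hour + minute / 60
sleep_time <- lt$hour + lt$min / 60

# 4. "H:MM" time in bed converted to decimal hours
parts <- strsplit(in_bed, ":", fixed = TRUE)
time_hr <- vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
                  numeric(1))

data.frame(start, what_day, sleep_time, sleep_quality, time_hr)
```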

# Path to the export, relative to the project root
file <- here("data/sleepdata.csv")

df <- read_delim(file, delim = ";") %>%
  rename(sleep_quality = `Sleep quality`,
         time_in_bed = `Time in bed`,
         steps = `Activity (steps)`) %>%
  # Strip the "%" sign and convert sleep quality to a number
  mutate(sleep_quality = str_replace(sleep_quality, "\\%", "") %>% as.numeric(),
         day_of_week = as.Date(Start) %>% wday(label = TRUE),
         # Matching "Sat|Sun" assumes English weekday labels
         what_day = ifelse(grepl("Sat|Sun", day_of_week), "Weekend", "Weekday") %>%
           as.factor(),
         # Bedtime as decimal hours (e.g. 23.5 = 11:30 PM)
         sleep_hour = ymd_hms(Start) %>% hour(),
         sleep_min = ymd_hms(Start) %>% minute() / 60,
         sleep_time = sleep_hour + sleep_min,
         # Time in bed converted to decimal hours
         time_hr = period_to_seconds(hms(time_in_bed)) / 3600,
         date = ymd_hms(Start),
         day = day(date),
         month = month(date),
         year = year(date)) %>%
  select(year, month, day, day_of_week, what_day, sleep_time,
         Start:time_in_bed, time_hr, steps) %>%
  # Treat year, month, and day as categories rather than numbers
  mutate_at(vars(year:day), as.factor)
df

## # A tibble: 1,270 x 12
## year month day day_of_week what_day sleep_time Start
## <fct> <fct> <fct> <ord> <fct> <dbl> <dttm>
## 1 2014 9 28 Sun Weekend 3.57 2014-09-28 03:34:30
## 2 2014 9 30 Tue Weekday 1 2014-09-30 01:00:51
## 3 2014 10 1 Wed Weekday 2.12 2014-10-01 02:07:23
## 4 2014 10 2 Thu Weekday 1.4 2014-10-02 01:24:34
## 5 2014 10 3 Fri Weekday 0.4 2014-10-03 00:24:24
## 6 2014 10 5 Sun Weekend 1.65 2014-10-05 01:39:23
## 7 2014 10 6 Mon Weekday 23.6 2014-10-06 23:37:14
## 8 2014 10 8 Wed Weekday 0.7 2014-10-08 00:42:22
## 9 2014 10 8 Wed Weekday 23.6 2014-10-08 23:37:56
## 10 2014 10 10 Fri Weekday 0.433 2014-10-10 00:26:28
## # ... with 1,260 more rows, and 5 more variables: End <dttm>,
## # sleep_quality <dbl>, time_in_bed <time>, time_hr <dbl>, steps <int>
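In newer versions of dplyr, mutate_at() still works but has been superseded since dplyr 1.0.0 by across(), which is used inside mutate(). Here is a minimal sketch of the same year:day conversion on a made-up tibble:

```r
library(dplyr)

# Made-up tibble standing in for the cleaned data (hypothetical values)
toy <- tibble(year = c(2014, 2018), month = c(9, 4), day = c(28, 6),
              sleep_time = c(3.57, 23.6))

# dplyr >= 1.0.0 equivalent of mutate_at(vars(year:day), as.factor)
toy_fct <- toy %>% mutate(across(year:day, as.factor))
str(toy_fct)
```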

I plotted all the data points as a function of time to see if there was any general trend in sleep duration and sleep quality over the past couple of years. Notice how I used scale_color_viridis() to get a nice, perceptually uniform color scale for geom_point(). It looks like I get around 7-7.5 hours of sleep per night on average.


# Hours of sleep over time, colored by sleep quality, with a smoothed trend
df %>%
  ggplot(aes(x = Start, y = time_hr, color = sleep_quality)) +
  geom_point(alpha = 0.6) +
  geom_smooth() +
  scale_color_viridis() +
  labs(title = "Hours of Sleep",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Date",
       y = "Hours of Sleep",
       color = "Sleep Quality\n") +
  theme_bw()
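To put a number on the eyeballed 7-7.5 hours, you can also summarise the durations directly. summarise_sleep() below is a hypothetical helper, shown on made-up values. On the real data it would be called as summarise_sleep(df$time_hr).

```r
# Hypothetical helper: numeric summary of nightly sleep duration (hours)
summarise_sleep <- function(time_hr) {
  time_hr <- time_hr[!is.na(time_hr)]  # ignore missing nights
  c(mean   = mean(time_hr),
    median = median(time_hr),
    q25    = unname(quantile(time_hr, 0.25)),
    q75    = unname(quantile(time_hr, 0.75)))
}

# Made-up durations just to show the output shape
summarise_sleep(c(6.5, 7, 7.25, 7.5, 8, NA))
```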

Next I wanted to see what time I went to sleep and how that related to sleep quality. Quality appears to go up as I go to bed earlier.


# Bedtime over time, colored by sleep quality
df %>%
  ggplot(aes(x = Start, y = sleep_time, color = sleep_quality)) +
  geom_point() +
  scale_color_viridis() +
  # Bedtime is in decimal hours (0-24), so label the y-axis as clock times
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0, 24, 4),
                     limits = c(0, 24.1),
                     labels = c("12 AM", "4 AM", "8 AM", "12 PM", "4 PM", "8 PM", "11:59 PM")) +
  labs(title = "Bedtime and Sleep Quality",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Date",
       y = "Bedtime",
       color = "Sleep Quality\n") +
  theme_bw()
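Quantifying this bedtime pattern needs one caveat. sleep_time is a clock time that wraps at midnight, so 23.6 (11:36 PM) is numerically larger than 1.0 (1:00 AM), even though it is the earlier bedtime. The minimal sketch below uses made-up values and shifts bedtimes to hours since noon before computing a Spearman correlation. hours_since_noon() is a hypothetical helper.

```r
# Hypothetical helper: hours elapsed since noon, so that 11 PM (11) comes
# before 1 AM (13) instead of after it
hours_since_noon <- function(sleep_time) (sleep_time - 12) %% 24

# Made-up bedtimes (decimal clock hours) and sleep quality (%)
bedtime <- c(22.5, 23.6, 0.4, 1.0, 2.1)
quality <- c(88, 85, 80, 74, 70)

late <- hours_since_noon(bedtime)
late

# Spearman correlation on the shifted scale vs. on raw clock time
cor(late, quality, method = "spearman")
cor(bedtime, quality, method = "spearman")
```

With the shift, these made-up values give a correlation of -1 (later bedtime, lower quality). The raw clock times give +0.5, which points the wrong way.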

Then I wondered whether my bedtime differs between weekdays and weekends. It looks like there isn't a huge difference between the two.


# Bedtime over time, colored by Weekday vs. Weekend
df %>%
  ggplot(aes(x = Start, y = sleep_time, color = what_day)) +
  geom_point(alpha = 0.6) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0, 24, 4),
                     limits = c(0, 24.1),
                     labels = c("12 AM", "4 AM", "8 AM", "12 PM", "4 PM", "8 PM", "11:59 PM")) +
  labs(title = "Bedtime on Weekdays vs. Weekends",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Date",
       y = "Bedtime",
       color = "") +
  theme_bw()
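The same wrap-around matters when comparing typical bedtimes between weekdays and weekends. A naive median of 23.5, 0.5, 1.0, and 23.8 is 12.25, i.e. a "bedtime" of about 12:15 PM. The sketch below uses made-up values, takes the median on the shifted scale, and converts it back to clock time. On the real data, the inputs would be df$sleep_time and df$what_day.

```r
# Made-up bedtimes (decimal clock hours) with Weekday/Weekend labels
sleep_time <- c(23.5, 0.5, 1.0, 23.8, 1.5, 0.2)
what_day <- factor(c("Weekday", "Weekday", "Weekday", "Weekday",
                     "Weekend", "Weekend"))

# Shift to hours since noon, take each group's median, then shift back
late <- (sleep_time - 12) %% 24
med_late <- tapply(late, what_day, median)
med_clock <- (med_late + 12) %% 24
med_clock
```

Here the medians come out as 0.15 and 0.85, roughly 12:09 AM and 12:51 AM.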

We can then compare sleep quality with hours of sleep. Here it's readily apparent that the more I sleep, the better the quality. This isn't a groundbreaking insight, but it is pretty neat to see my own data confirm what's known in the literature. Sleep quality also seems to taper off after 8+ hours of sleep.


# Sleep quality vs. hours of sleep, colored by Weekday vs. Weekend
df %>%
  ggplot(aes(x = time_hr, y = sleep_quality, color = what_day)) +
  geom_point(size = 2, alpha = 0.3) +
  # group = 1 fits a single smoother across weekdays and weekends
  geom_smooth(aes(group = 1)) +
  scale_x_continuous(breaks = seq(0, 12, 2)) +
  labs(title = "Sleep Quality vs. Hours of Sleep",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Hours of Sleep",
       y = "Sleep Quality",
       color = "") +
  theme_bw()
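You can check the "tapers off after 8+ hours" impression without relying on the smoother. Bin hours of sleep and average sleep quality within each bin. The sketch below uses made-up values. On the real data, the inputs would be df$time_hr and df$sleep_quality.

```r
# Made-up (hours, quality) pairs just to show the mechanics
time_hr       <- c(5.2, 5.8, 6.4, 6.9, 7.3, 7.8, 8.2, 8.9, 9.4)
sleep_quality <- c(55,  60,  68,  72,  80,  86,  90,  91,  90)

# Bin hours of sleep into 1-hour bands [5,6), [6,7), ... and average quality per band
bands <- cut(time_hr, breaks = seq(5, 10, by = 1), right = FALSE)
band_means <- tapply(sleep_quality, bands, mean)
band_means
```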

Sleep Cycle can also integrate your sleep data with the pedometer, so I plotted sleep duration vs. the number of steps per day to see if there was a trend. There doesn't appear to be much of a relationship between the two. The main thing that stands out is that nights with 8+ hours of sleep tend to have good sleep quality. Side note: all the days I went over 10,000 steps were days when I was traveling, since I tend to walk everywhere when visiting new places.


# Sleep duration vs. daily steps, dropping days with no or extreme step counts
df %>%
  filter(steps > 0,
         steps < 15000) %>%
  ggplot(aes(x = steps, y = time_hr, color = sleep_quality)) +
  geom_point(alpha = 0.9) +
  geom_smooth(color = "Purple", fill = "White") +
  scale_color_viridis() +
  scale_y_continuous(breaks = seq(0, 12, 2)) +
  labs(title = "Sleep Duration vs Daily Steps",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Number of Steps",
       y = "Sleep Duration (hr)",
       color = "Sleep Quality\n") +
  theme_dark()
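A single number can back up the "no real trend" reading of this plot. The hypothetical steps_sleep_cor() below applies the same step filter as the plot and computes Spearman's rank correlation. Spearman's correlation only captures monotone relationships. The example uses made-up values. On the real data it would be called as steps_sleep_cor(df$steps, df$time_hr), and a value near 0 would support the visual impression.

```r
# Hypothetical helper: same filter as the plot (0 < steps < 15000), then
# Spearman's rank correlation between steps and hours of sleep
steps_sleep_cor <- function(steps, time_hr) {
  keep <- !is.na(steps) & !is.na(time_hr) & steps > 0 & steps < 15000
  cor(steps[keep], time_hr[keep], method = "spearman")
}

# Made-up values: the 0-step and 20,000-step days are dropped by the filter
steps_sleep_cor(steps   = c(0, 3200, 5400, 8100, 20000),
                time_hr = c(7.0, 6.8, 7.1, 7.4, 9.0))
```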

The next visualization is a ridgeline plot, a nice way of comparing distributions across groups or over time. I like to think of them as layered/stacked density plots. We can make a ridgeline plot of steps and compare the distributions across the days of the week. There isn't much of a difference in the number of steps, but there tends to be more step activity as the week progresses.


# Distribution of daily steps for each day of the week.
# ..x.. maps fill to the x value along each ridge (after_stat(x) in newer ggplot2)
df %>%
  ggplot(aes(x = steps, y = day_of_week, fill = ..x..)) +
  geom_density_ridges_gradient(scale = 3) +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(name = "Steps\n", option = "C") +
  labs(title = "Number of Steps Each Day of the Week",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Steps") +
  theme_ridges(font_size = 13, grid = TRUE) +
  theme(axis.title.y = element_blank())
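Under the hood, each ridge is a kernel density estimate of x, computed separately for each y group and drawn at its own vertical offset. The scale = 3 argument controls how far each curve may overlap the rows above it. The base-R sketch below reproduces just the density step on made-up step counts. It uses one shared bandwidth for all groups. This approximates the "joint bandwidth" that ggridges reports when it draws the plot, but it is not necessarily ggridges' exact rule.

```r
# Made-up step counts for three days of the week (hypothetical values)
set.seed(1)
steps <- c(rnorm(50, mean = 5000, sd = 1500),
           rnorm(50, mean = 6000, sd = 1500),
           rnorm(50, mean = 7000, sd = 1500))
day <- factor(rep(c("Mon", "Wed", "Fri"), each = 50),
              levels = c("Mon", "Wed", "Fri"))

# One shared bandwidth for every group (approximating a joint bandwidth)
bw <- bw.nrd0(steps)

# One kernel density estimate of steps per day: each becomes one ridge
ridges <- lapply(split(steps, day), density, bw = bw)

# x position of each ridge's peak
sapply(ridges, function(d) d$x[which.max(d$y)])
```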

Next, I looked at sleep duration across the days of the week. Here we can see that I definitely catch up on my sleep on Saturdays!


df %>%
  ggplot(aes(x = time_hr, y = day_of_week, fill = ..x..)) +
  geom_density_ridges_gradient(scale = 3) +
  scale_x_continuous(expand = c(0.01, 0),
                     breaks = seq(0, 12, 4)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(name = "Hours", option = "C") +
  labs(title = "Duration of Sleep",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Time spent sleeping (hr)") +
  theme_ridges(font_size = 13, grid = TRUE) +
  theme(axis.title.y = element_blank())
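To quantify the Saturday catch-up, compare Saturday's median duration with the median of the Monday-Friday medians. The sketch below uses made-up durations and an ordered factor like the one wday(label = TRUE) produces.

```r
# Made-up nightly durations (hours) for two weeks, Sunday through Saturday
time_hr <- c(7.5, 6.9, 7.1, 7.0, 6.8, 7.2, 8.6,
             7.4, 7.0, 6.7, 7.3, 6.9, 7.1, 8.9)
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day_of_week <- factor(rep(days, 2), levels = days, ordered = TRUE)

# Median hours per day, in calendar order thanks to the factor levels
day_medians <- tapply(time_hr, day_of_week, median)
day_medians

# Extra sleep on Saturday relative to a typical weekday
weekday_median <- median(as.numeric(day_medians[c("Mon", "Tue", "Wed", "Thu", "Fri")]))
sat_extra <- as.numeric(day_medians["Sat"]) - weekday_median
sat_extra
```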

I then wanted to look at quality of sleep across the week but didn’t see an obvious trend.


df %>%
  ggplot(aes(x = sleep_quality, y = day_of_week, fill = ..x..)) +
  geom_density_ridges_gradient(scale = 3) +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(name = "Quality of Sleep", option = "C") +
  labs(title = "Quality of Sleep Each Day of the Week",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)") +
  theme_ridges(font_size = 13, grid = TRUE) +
  theme(axis.title.y = element_blank())

One of the most interesting things I saw came from graphing my sleep quality by year. 2014 was a rough time during my graduate school education, and you can clearly see an improvement in my sleep quality after I joined a new lab in 2015. The crazy thing is that this is only sleep data from a phone app, and you can still get a ton of interesting information out of it.


df %>%
  ggplot(aes(x = sleep_quality, y = year, fill = ..x..)) +
  geom_density_ridges_gradient(scale = 3) +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(name = "Quality of Sleep", option = "C") +
  labs(title = "Quality of Sleep by Year",
       subtitle = "Source: iPhone sleep cycle data (2014-2018)",
       x = "Sleep Quality (%)") +
  theme_ridges(font_size = 13, grid = TRUE) +
  theme(axis.title.y = element_blank())

Here’s how I created the header image of this blog post. I graphed bedtime across time, mapped the color to sleep duration, and removed the axes to make it look more artistic.


p <- df %>%
  ggplot(aes(x = Start, y = sleep_time, color = time_hr)) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_viridis(option = "plasma") +
  guides(color = "none") +  # hide the color legend
  scale_y_continuous(expand = c(0, 0)) +
  theme_bw() +
  # Blank out axis titles, text, and ticks for a cleaner "artistic" look
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
p

# Save into static/img/headers relative to the project root
ggsave(here("static/img/headers", "sleep.png"), plot = p,
       height = 4, width = 7, units = "in", dpi = 600)
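If you just want a bare plot, ggplot2's theme_void() is a shorter alternative to blanking each axis element. It also drops the panel border, grid lines, and background that theme_bw() keeps, so the result looks a little different from the header above. The sketch below runs on made-up points and uses ggplot2's built-in scale_color_viridis_c() instead of the viridis package.

```r
library(ggplot2)

# Made-up points standing in for (Start, sleep_time, time_hr)
toy <- data.frame(x = 1:20,
                  y = seq(0.5, 10, by = 0.5) %% 3,
                  z = seq(5, 9, length.out = 20))

p_void <- ggplot(toy, aes(x = x, y = y, color = z)) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_viridis_c(option = "plasma") +  # viridis palettes built into ggplot2
  guides(color = "none") +                    # hide the color legend
  theme_void()                                # no axes, ticks, labels, grid, or background
```

Printing p_void, or passing it to ggsave(), renders it.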


Notice how I use here() within ggsave() to specify which folder within the R project directory to save the file to. This makes sure that the code will work on ANYONE’S computer, regardless of where they put the project folder!

So I hope you were able to learn something new from this blog post, or feel inspired to analyze an interesting dataset of your own. Feel free to comment or reach out if you have any questions or suggestions!

And one more thing


Be sure to get enough sleep!
