
Captured data and extracted indicators by BigO

Prof. Anastasios Delopoulos
Dr. Christos Diou
Vasileios Papapanagiotou

Multimedia Understanding Group, https://mug.ee.auth.gr
Electrical and Computer Engineering, http://ee.auth.gr
Aristotle University of Thessaloniki, https://auth.gr

EUROSTAT BIG DATA HACKATHON 2019: BIGO DATASET


Presentation content
BigO
◦ project overview
◦ project purpose
◦ data capturing approach
Extracted indicators
◦ physical activity
◦ visited locations
◦ transportation mode
Self reports
Raw data
◦ parsing and understanding metadata
Summary



BigO: project overview
Big data against childhood obesity
Horizon 2020 – European Union funding for research and innovation
https://bigoprogram.eu/
BigO aim
BigO collects and analyzes big data on
obesogenic behaviours and environments
to enable public health authorities to
plan and execute effective programs
against childhood obesity
Exploit sensor technologies and Big Data
analytics to:
• measure obesogenic behaviour
indicators and environment
• offer evidence and tools for targeted
actions against obesity to:
• public health authorities
• health professionals
• schools



Need for multi-level approaches
Evidence exists that interventions targeting elements of children’s behavioural patterns (what and how they eat, how they move, and how they sleep), together with environmental community factors, can have a positive effect against childhood obesity



BigO builds a technological platform
Collects behaviour and local environment data
Develops intelligent algorithms to recognize
behavioural patterns
Visualizes individual behavioural patterns for
health professionals to help them follow up
obese child patients
Visualizes aggregated evidence for public
health authorities and schools to help them
design and monitor programs
Extracts associations between environment
and obesogenic behaviours to investigate
causality



BigO on Eurostat hackathon



What happened since the 25th of January

25 January – 12 February
Participants have been using the BigO mobile app, which recorded multiple signals and uploaded them to the BigO servers

13 – 18 February
The uploaded signals were processed using signal processing and machine learning

19 February
Three groups of behavioural indicators are extracted for each participant and made available as a dataset



Recording approach
Sensor signals (such as accelerometer signals) are recorded almost constantly
When there is no activity, the recording stops to preserve battery
Recording continues when the phone is activated or moved again



Recording approach
Location is also recorded in a similar way



Behavioural indicators
Three types of behavioural indicators are extracted
• Physical activity indicators
• Visited locations
• Transportation mode between visited locations



Extracted indicators
Physical activity indicators
Visited locations
Transportation mode between visited locations



Physical activity
Values are computed in 10-minute time-slots
For each 10-minute time-slot, the following indicators are extracted
◦ Activity counts: a metric provided by Actigraph [1] devices that shows general motion activity [2]
◦ Steps: number of steps walked in the time-slot
◦ Activity type: the type of activity, from a predefined list (e.g. sitting, walking, running), performed during the time-slot. We provide the number of seconds for each activity.
Additionally, location information is provided
◦ Location is encoded as a geohash [3]
◦ In case of multiple geohashes within the 10-minute time-slot, we provide the geohash where the most time was spent
[1] https://www.actigraphcorp.com/
[2] https://www.ncbi.nlm.nih.gov/pubmed/28604558
[3] https://en.wikipedia.org/wiki/Geohash
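
Because locations are published only as geohashes, it can help to see what a geohash encodes. The following is a minimal pure-Python sketch of the standard geohash decoding scheme (base-32 alphabet, alternating longitude and latitude bits). The function names `geohash_bounds` and `geohash_centre` and the sample geohash are illustrative and are not part of the BigO tooling.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (lat_min, lat_max, lon_min, lon_max) of the geohash cell."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, MSB first
            bit = (value >> shift) & 1
            interval = lon if is_lon else lat
            mid = (interval[0] + interval[1]) / 2
            if bit:
                interval[0] = mid  # keep the upper half
            else:
                interval[1] = mid  # keep the lower half
            is_lon = not is_lon
    return lat[0], lat[1], lon[0], lon[1]


def geohash_centre(geohash):
    """Return the (latitude, longitude) of the centre of the cell."""
    lat_min, lat_max, lon_min, lon_max = geohash_bounds(geohash)
    return (lat_min + lat_max) / 2, (lon_min + lon_max) / 2


# "ezs42" is the worked example from the Wikipedia geohash article.
print(geohash_centre("ezs42"))
```

A 6-character geohash carries 15 longitude bits and 15 latitude bits, so each cell spans about 0.011° of longitude and 0.0055° of latitude; the decoded centre is therefore only an approximation of where the participant actually was.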



Physical activity: CSV file columns
utc_timestamp: UTC time of the indicator
utc_offset: Time-zone offset from UTC
geohash: 6-character geohash localization
counts: Number of actigraphy/activity counts
steps: Number of steps
biking: Number of seconds spent doing this activity type
downstairs: Number of seconds spent doing this activity type
jogging: Number of seconds spent doing this activity type
sitting: Number of seconds spent doing this activity type
standing: Number of seconds spent doing this activity type
upstairs: Number of seconds spent doing this activity type
walking: Number of seconds spent doing this activity type
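
As a rough illustration of how these columns might be used, the sketch below reads a physical activity CSV with Python's standard `csv` module, sums the steps, and picks the activity with the most seconds in each 10-minute time-slot. The sample rows are made up, and the presence of a header row, the timestamp format and the offset format are assumptions; only the column names come from the list above.

```python
import csv
import io

ACTIVITY_COLUMNS = ["biking", "downstairs", "jogging", "sitting",
                    "standing", "upstairs", "walking"]

# Made-up rows for illustration only; value formats are assumptions.
SAMPLE_CSV = """utc_timestamp,utc_offset,geohash,counts,steps,biking,downstairs,jogging,sitting,standing,upstairs,walking
2019-02-01T10:00:00,+01:00,sx0r3m,1520,310,0,5,0,120,60,0,415
2019-02-01T10:10:00,+01:00,sx0r3m,210,12,0,0,0,540,50,0,10
"""


def dominant_activity(row):
    """Return the activity type with the most seconds in one time-slot."""
    seconds = {name: float(row[name]) for name in ACTIVITY_COLUMNS}
    return max(seconds, key=seconds.get)


def summarize(csv_text):
    """Return (total steps, dominant activity per time-slot)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_steps = sum(int(row["steps"]) for row in rows)
    return total_steps, [dominant_activity(row) for row in rows]


total_steps, dominant = summarize(SAMPLE_CSV)
print(total_steps, dominant)
```

To run this on a real file, replace `io.StringIO(csv_text)` with an open file handle.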



Visited locations
Visited locations are detected based on GPS/location data
Location information
◦ To preserve privacy, the actual co-ordinates (latitude & longitude) are not provided
◦ Instead, we provide the geohash that the visited location belongs to (similarly to the physical activity indicators)
Location type information
◦ Foursquare [1] is used to provide the type of the location (e.g. park, restaurant)
Arrival and departure times are also provided
[1] https://foursquare.com/



Visited locations: CSV file columns
start_utc_timestamp: UTC time that the user arrived at the detected POI
start_utc_offset: Time-zone offset from UTC (for start_utc_timestamp)
stop_utc_timestamp: UTC time that the user departed from the detected POI
stop_utc_offset: Time-zone offset from UTC (for stop_utc_timestamp)
geohash: 6-character geohash localization
closest_category_name: The type of POI
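
One possible use of these columns is to total the time spent per type of POI. The sketch below does this under strong assumptions: the sample rows and category names are made up, and the timestamps are assumed to be ISO 8601 strings that `datetime.fromisoformat` can parse. The actual timestamp format in the dataset may differ, in which case the parsing line needs adapting.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up rows for illustration only; the timestamp format is an assumption.
SAMPLE_CSV = """start_utc_timestamp,start_utc_offset,stop_utc_timestamp,stop_utc_offset,geohash,closest_category_name
2019-02-01T08:00:00,+01:00,2019-02-01T13:30:00,+01:00,sx0r3m,School
2019-02-01T14:00:00,+01:00,2019-02-01T14:45:00,+01:00,sx0r3q,Park
2019-02-02T08:00:00,+01:00,2019-02-02T13:00:00,+01:00,sx0r3m,School
"""


def minutes_per_category(csv_text):
    """Sum the stay duration (in minutes) for each POI category."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.fromisoformat(row["start_utc_timestamp"])
        stop = datetime.fromisoformat(row["stop_utc_timestamp"])
        minutes = (stop - start).total_seconds() / 60
        totals[row["closest_category_name"]] += minutes
    return dict(totals)


print(minutes_per_category(SAMPLE_CSV))
```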



Transportation mode
Transportation mode is computed for each trip
◦ Trips are intervals between consecutive visited locations, when data are available continuously
◦ Mode is provided as a normalized histogram over time across a set of predefined modes (e.g. walk, car, train)
Departure and arrival times are provided
The geohash of the departure and arrival visited locations is also provided



Transportation mode: CSV file columns
start_utc_timestamp: UTC time that the user departed (from the previous POI)
start_utc_offset: Time-zone offset from UTC (for start_utc_timestamp)
stop_utc_timestamp: UTC time that the user arrived (at the next POI)
stop_utc_offset: Time-zone offset from UTC (for stop_utc_timestamp)
geohash_from: 6-character geohash localization of the previous POI
geohash_to: 6-character geohash localization of the next POI
foot: Percentage of travel time spent on this transportation mode
bike: Percentage of travel time spent on this transportation mode
car: Percentage of travel time spent on this transportation mode
bus: Percentage of travel time spent on this transportation mode
train: Percentage of travel time spent on this transportation mode
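
Since each trip carries a histogram over the five modes, a simple summary is the mode with the largest share. The sketch below works whether the shares are stored as fractions (0 to 1) or as percentages (0 to 100), because it renormalizes by the row total; which of the two the files use is not stated here, so treat that as an open assumption. The sample trip is made up.

```python
MODES = ["foot", "bike", "car", "bus", "train"]


def normalized_shares(trip):
    """Return the mode shares of one trip rescaled to sum to 1."""
    shares = {mode: float(trip[mode]) for mode in MODES}
    total = sum(shares.values())
    if total <= 0:
        return None  # no mode information for this trip
    return {mode: value / total for mode, value in shares.items()}


def dominant_mode(trip):
    """Return the mode with the largest share, or None if unavailable."""
    shares = normalized_shares(trip)
    if shares is None:
        return None
    return max(shares, key=shares.get)


# Made-up trip row, as csv.DictReader would return it (string values).
trip = {"foot": "20", "bike": "0", "car": "70", "bus": "10", "train": "0"}
print(dominant_mode(trip), normalized_shares(trip)["car"])
```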



Self reports
Daily answers
Meal reports



Daily answers
datetime [string]: the date and time that the answer was provided, in ISO 8601 format
mood [integer]: the answer to the question "how do you feel", 1 corresponds to grumpy and 5
to excellent
user_id [integer]: the user id (same as the name of the user folder)
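
As a small example of working with these fields, the sketch below averages the mood score per calendar day. It assumes the answers have already been loaded into a list of dictionaries with the three fields above (the storage format of the answers is not described here), and the sample values are made up. Note that `datetime.fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from collections import defaultdict
from datetime import datetime

# Made-up answers for illustration only.
SAMPLE_ANSWERS = [
    {"datetime": "2019-02-01T08:15:00", "mood": 4, "user_id": 123},
    {"datetime": "2019-02-01T20:40:00", "mood": 2, "user_id": 123},
    {"datetime": "2019-02-02T09:05:00", "mood": 5, "user_id": 123},
]


def mean_mood_per_day(answers):
    """Return {ISO date: mean mood} over all answers given on that date."""
    moods = defaultdict(list)
    for answer in answers:
        day = datetime.fromisoformat(answer["datetime"]).date()
        moods[day].append(answer["mood"])
    return {day.isoformat(): sum(values) / len(values)
            for day, values in sorted(moods.items())}


print(mean_mood_per_day(SAMPLE_ANSWERS))
```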



Meal reports
location [JSON]: a small JSON document describing the location of the advertisement:
◦ type [string]: the type of the location, should always be "Point"
◦ coordinates [Array of numbers]: the longitude and latitude (in that order)
location_accuracy [number]: the radius (in meters) of the circle within which the food advertisement could be located
location_altitude [number]: the altitude of the location where the food advertisement was recorded
location_bearing [number]: the bearing of the mobile phone while recording the food advertisement
meal_attributes [JSON]: a description of the meal
◦ food_temperature [string]: can be "warm" or "cold"
◦ fruit [boolean]: if the meal contains fruit
◦ home_prepared [boolean]: if the meal has been prepared at home (e.g. not in a restaurant)
◦ sugar [boolean]: if the meal contains sugar
meal_type [string]: can be "breakfast", "snack", "lunch", "dinner", "drink", etc.
orientation_a [number]: the phone's azimuth angle during the recording
orientation_p [number]: the phone's pitch angle during the recording
orientation_r [number]: the phone's roll angle during the recording
snack_attributes [JSON]: attributes of the meal applicable only to snack
◦ food_temperature [string]: can be "warm" or "cold"
◦ fruit [boolean]: if the meal contains fruit
◦ home_prepared [boolean]: if the meal has been prepared at home (e.g. not in a restaurant)
◦ retail [boolean]: if the meal comes in retail packaging
◦ sugar [boolean]: if the meal contains sugar
◦ other [boolean]: if the meal does not belong to any of the above categories
drink_attributes [JSON]: attributes of the meal applicable only to drink
◦ coffee_tea [boolean]: if drink is coffee or tea
◦ dairy_milk [boolean]: if drink is milk or dairy product
◦ energy_drink [boolean]: if drink is an energy drink
◦ juice [boolean]: if drink is a juice
◦ other [boolean]: if drink is something else
◦ soft_drink [boolean]: if drink is a soft drink
◦ sugar [boolean]: if drink contains sugar
◦ water [boolean]: if drink is water
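
To make the nesting of these fields concrete, the sketch below parses a made-up meal report and reads two things from it: the position, remembering that `coordinates` holds longitude first and latitude second, and the boolean attributes that are set. The sample document is an assumption about how a report might look; only the field names are taken from the list above, and the helper names are hypothetical.

```python
import json

# Made-up report for illustration only; field names follow the list above.
SAMPLE_REPORT = """
{
  "location": {"type": "Point", "coordinates": [22.96, 40.63]},
  "location_accuracy": 15.0,
  "meal_type": "snack",
  "meal_attributes": {"food_temperature": "cold", "fruit": true,
                      "home_prepared": true, "sugar": false},
  "snack_attributes": {"food_temperature": "cold", "fruit": true,
                       "home_prepared": true, "retail": false,
                       "sugar": false, "other": false}
}
"""


def latitude_longitude(report):
    """Return (latitude, longitude); the document stores longitude first."""
    longitude, latitude = report["location"]["coordinates"]
    return latitude, longitude


def true_flags(attributes):
    """Return the names of the boolean attributes that are set to true."""
    return sorted(name for name, value in attributes.items() if value is True)


report = json.loads(SAMPLE_REPORT)
print(latitude_longitude(report))
print(true_flags(report["snack_attributes"]))
```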



Raw data
Raw data structure, metadata, and parsing



Data capturing: sessions
Data capturing occurs in sessions (gray intervals)
Sessions start and stop automatically
During inactivity (00:00 to 09:00 in this example), capturing stops to preserve battery and reduce the data volume
Capturing resumes automatically when the phone is moved or used again
Sessions last at most 4 hours (if the maximum duration is reached, the session ends and a new one starts immediately)
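
The 4-hour limit means that one long period of continuous use appears in the data as several back-to-back sessions. The sketch below only illustrates that effect; it is not the app's actual logic, and the function name and example times are hypothetical.

```python
from datetime import datetime, timedelta

MAX_SESSION = timedelta(hours=4)  # maximum session duration stated above


def split_into_sessions(start, stop, max_duration=MAX_SESSION):
    """Split [start, stop) into consecutive sessions of at most max_duration."""
    sessions = []
    current = start
    while current < stop:
        end = min(current + max_duration, stop)
        sessions.append((current, end))
        current = end  # the next session starts immediately
    return sessions


# A made-up period of continuous activity from 09:00 to 19:30.
sessions = split_into_sessions(datetime(2019, 2, 1, 9, 0),
                               datetime(2019, 2, 1, 19, 30))
for session_start, session_end in sessions:
    print(session_start.time(), session_end.time())
```

When analysing the raw data, consecutive sessions linked by a “max_session_duration” cause can therefore be treated as one continuous recording.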



Data capturing: file structure
Data are organized in one folder per participant
Metadata
◦ Metadata are stored in simple comma-separated values (CSV) files
◦ The file “sessions.csv” contains one row per session (session metadata)
◦ The file “datafiles.csv” contains one row per data file (data file metadata)

Data
◦ The actual data (e.g. accelerometer files, battery files) are stored in sub-folders, one sub-folder for each session
◦ Python parsers for the binary data are provided (see slide “Parsing raw data”)

Location data are stored for convenience in a single file “location.csv”



Session metadata
Column name Description

1 device_id Should always be 1

2 session_id A numerical ID of the session; it is the table’s primary key

3 start_cause The reason the session started; e.g. “doze” (resumed after waking up from doze mode), “max_session_duration” (restarting after the previous session reached maximum duration), “checker” (internal checking mechanism)

4 start_time_utc UTC time-stamp of session start

5 start_time_local Local time time-stamp of session start

6 stop_time_utc UTC time-stamp of session stop

7 stop_time_local Local time time-stamp of session stop

8 duration Session duration in seconds (it is computed as stop_time_utc – start_time_utc)

9 stop_cause The reason the session stopped; e.g. “doze” (device is going into doze mode), “max_session_duration” (reached maximum session duration and stopping in order to start a new session), “unknown” (the application was probably killed abruptly)

10 subsessions Should always be 0

11 data_files Number of data files created in this session



Data files metadata
Column name Description

1 device_id Should always be 1

2 session_id The session ID this data file belongs to (foreign key to “session_id” column of session metadata)

3 subsession_id Should always be -1

4 sensor_type The name of the sensor that the data belong to, e.g. “accelerometer”

5 sensor_mode The recording mode of the sensor

6 start_time_utc UTC time-stamp that the file was created

7 start_time_local Local time time-stamp that the file was created
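
Because “session_id” in the data files metadata is a foreign key into the session metadata, the two tables can be cross-checked. The sketch below counts the data files listed per session and compares the count with the “data_files” column of the session metadata. It assumes that both CSV files have a header row with the column names listed on the slides; the sample rows, timestamp formats and the “n/a” sensor mode are made-up placeholders.

```python
import csv
import io
from collections import Counter

# Made-up metadata for illustration only; value formats are assumptions.
SESSIONS_CSV = """device_id,session_id,start_cause,start_time_utc,start_time_local,stop_time_utc,stop_time_local,duration,stop_cause,subsessions,data_files
1,456,doze,2019-02-01T08:00:00,2019-02-01T09:00:00,2019-02-01T12:00:00,2019-02-01T13:00:00,14400,max_session_duration,0,2
1,457,max_session_duration,2019-02-01T12:00:00,2019-02-01T13:00:00,2019-02-01T12:30:00,2019-02-01T13:30:00,1800,doze,0,1
"""

DATAFILES_CSV = """device_id,session_id,subsession_id,sensor_type,sensor_mode,start_time_utc,start_time_local
1,456,-1,accelerometer,n/a,2019-02-01T08:00:00,2019-02-01T09:00:00
1,456,-1,battery,n/a,2019-02-01T08:00:00,2019-02-01T09:00:00
1,457,-1,accelerometer,n/a,2019-02-01T12:00:00,2019-02-01T13:00:00
"""


def files_per_session(datafiles_text):
    """Count the data file rows belonging to each session_id."""
    reader = csv.DictReader(io.StringIO(datafiles_text))
    return Counter(row["session_id"] for row in reader)


def mismatched_sessions(sessions_text, datafiles_text):
    """Return session_ids whose data_files column disagrees with the count."""
    counts = files_per_session(datafiles_text)
    reader = csv.DictReader(io.StringIO(sessions_text))
    return [row["session_id"] for row in reader
            if int(row["data_files"]) != counts[row["session_id"]]]


print(files_per_session(DATAFILES_CSV))
print(mismatched_sessions(SESSIONS_CSV, DATAFILES_CSV))
```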



Parsing raw data
Sensor data are stored in simple binary files
You can import the samples in Python using the method read_session_data from the provided fileparsers module
The first input argument is the path of the session folder
The second input argument is the name of the sensor, and it can be one of the following: “accelerometer”, “proximity”, “light”, and “battery” (note that some devices do not support all sensors)

    # parsing sensor signal
    import fileparsers
    sensor_data = fileparsers.read_session_data(
        './data/123/1_456', 'accelerometer')

    # individual channels
    t = sensor_data[:, 0]
    x = sensor_data[:, 1]
    y = sensor_data[:, 2]
    z = sensor_data[:, 3]

    # visualization
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot(t, x)
    plt.plot(t, y)
    plt.plot(t, z)
    plt.xlabel('Time (s)')
    plt.ylabel('Acceleration (m/s^2)')
    plt.grid()
    plt.legend(('x-axis', 'y-axis', 'z-axis'))
    plt.title('Session acceleration plot')
    plt.show()
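
Once the accelerometer samples are loaded, two common first steps are computing the acceleration magnitude and estimating the sampling rate. The sketch below uses made-up rows in place of real parser output, since the fileparsers module ships with the dataset. It assumes each row holds time in seconds followed by the x, y and z values, matching the column order used in the slide's example; the helper names are hypothetical.

```python
import math


def acceleration_magnitude(rows):
    """Return the Euclidean norm of (x, y, z) for every sample."""
    return [math.sqrt(x * x + y * y + z * z) for _t, x, y, z in rows]


def mean_sampling_rate(rows):
    """Estimate samples per second from the first and last time-stamps."""
    duration = rows[-1][0] - rows[0][0]
    if duration <= 0:
        return None  # not enough samples to estimate a rate
    return (len(rows) - 1) / duration


# Made-up samples: (t in seconds, x, y, z in m/s^2).
rows = [
    (0.00, 0.0, 0.0, 9.81),
    (0.02, 3.0, 4.0, 0.0),
    (0.04, 0.0, 0.0, 9.81),
]
print(acceleration_magnitude(rows))
print(mean_sampling_rate(rows))
```

Both functions iterate row by row, so they should also accept a two-dimensional array such as the one indexed in the slide's example.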



Location data (1 of 2)
Column name Description

1 user_id The internal representation of user_id

2 date The date part of the time-stamp, in “YYYY-MM-DD” format (local-time)

3 hour The hour part of the time-stamp (local-time)

4 minute The minute part of the time-stamp (local-time)

5 seconds The seconds part of the time-stamp (local-time)

6 accuracy Accuracy of location, i.e. radius (in meters) of the circle around the latitude-longitude centre within which the participant could have been

7 altitude Altitude of location point

8 bearing Bearing (in degrees) of participant



Location data (2 of 2)
Column name Description

9 latitude Latitude co-ordinate of location point

10 longitude Longitude co-ordinate of location point

11 sensor_type Should always be “LOCATION_MOBILE”

12 speed Speed of participant at location point

13 timestamp ISO8601-formatted version of time-stamp (local-time)
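
With latitude, longitude and time available per row, a typical computation is the distance travelled between consecutive location points. The sketch below uses the haversine formula on a spherical Earth (mean radius 6,371 km) and rebuilds the local time from the date, hour, minute and seconds columns. The sample rows are made up, all values are assumed to arrive as strings (as `csv.DictReader` would return them), and the seconds column is assumed to be numeric.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def row_local_time(row):
    """Rebuild the local time-stamp from the date/hour/minute/seconds columns."""
    day = datetime.strptime(row["date"], "%Y-%m-%d")
    return day.replace(hour=int(row["hour"]), minute=int(row["minute"]),
                       second=int(float(row["seconds"])))


def path_length_m(rows):
    """Sum the distances between consecutive location points."""
    points = [(float(r["latitude"]), float(r["longitude"])) for r in rows]
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


# Made-up rows for illustration only.
rows = [
    {"date": "2019-02-01", "hour": "9", "minute": "0", "seconds": "5",
     "latitude": "40.6300", "longitude": "22.9600"},
    {"date": "2019-02-01", "hour": "9", "minute": "1", "seconds": "5",
     "latitude": "40.6310", "longitude": "22.9600"},
]
print(row_local_time(rows[0]), round(path_length_m(rows), 1))
```

Points with a large “accuracy” radius may be worth filtering out before summing distances, since their position error can exceed the distance actually travelled.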



Summary
BigO recording approach
Extracted behavioural indicators
Self reports
Raw data



Thank you & good luck!
Contact points
Prof. Anastasios Delopoulos
◦ adelo@eng.auth.gr
Dr. Christos Diou
◦ diou@mug.ee.auth.gr
Vasileios Papapanagiotou
◦ vassilis@mug.ee.auth.gr
