
From Big Data to Actionable Insights in Public Health


National Seminar
Faculty of Public Health
Universitas Airlangga

Setia Pramana, Ph.D


Professor in Statistics
Politeknik Statistika STIS

Saturday, 10 September 2022


Setia Pramana, Ph.D
Certified Big Data Analyst
setia.pramana@stis.ac.id

EDUCATIONAL BACKGROUND
- Ph.D (2011), Statistical Bioinformatics, Hasselt University, Belgium.
- M.Sc (2007), Biostatistics, Universiteit Hasselt, Diepenbeek, Belgium.
- M.Sc (2006), Applied Statistics, Universiteit Hasselt, Diepenbeek, Belgium.
- B.Sc (1999), Statistics, Brawijaya University, Malang, Indonesia.

CAREER PATH
- Politeknik Statistika STIS, Jakarta: Professor in Statistics.
- BPS Statistics Indonesia: Head of Sub Directorate of Statistics Modelling Development.
- Hasselt University, Belgium (2007-2011): Assistant Researcher.
- Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm (2011-2014): Postdoctoral Researcher.

MEMBERSHIPS
- Member of Global Working Group on Big Data for Official Statistics, UN Statistics Division.
- Founder and Board Member of Indonesia Data Science Association, Asosiasi Ilmuan Data Indonesia (AIDI).
- Founder and Board Member of Indonesian Society for Bioinformatics and Biodiversity, Masyarakat Bioinformatika dan Biodiversitas Indonesia (MABBI).
- Board member of Indonesia Statistical Society, Ikatan Statistisi Indonesia.

CURRENT RESEARCH
- Big Data Analytics and Development for Public Policy.
- Computational Statistics, Machine Learning and Artificial Intelligence.
- Statistical Software development.
Data, Data & Data Everywhere…

BIG DATA

VOLUME: Scale of Data
The volume of persistent usable data in the analytics system at any point in time.
Click Stream, Active/Passive Sensor, Log, Event, Printed Corpus, Speech, Social Media, Traditional

VARIETY: Forms of Data
The form and content of data: structured (RDBMS), semi-structured (Social Media) or unstructured (text/documents).
Structured, Semi-structured, Unstructured

VELOCITY: Analysis of Data-Flow
How quickly the analytics system processes the data to create insights.
Speed of Generation, Rate of Analysis

VERACITY: Uncertainty of Data
The degree to which data is accurate, precise and trusted.
Uncleansed, Untrusted

- The term: describes extremely large amounts of structured and unstructured data.
- The activity: capturing, storing, processing, sharing and reporting data beyond the ability of legacy software tools and hardware infrastructure.
- Related to many branches of "science": data analytics, data science, machine learning, artificial intelligence, IoT, and many more.
- The application: in many fields, for efficient, cost-effective, faster and more accurate decision making.
Types of Data
- Relational Data (Tables/Transaction/Legacy Data)
- Text Data (Web)
- Semi-structured Data (XML)
- Graph Data: Social Network, Semantic Web (RDF), etc.
- Streaming Data
STRUCTURED Data
High degree of organization, such as a relational database.
Example:

Column          Value
Patient         John Brown
Date of Birth   12/07/1993
Date Admitted   02/03/2011

Characteristics:
- Well defined content
- Easily understood
- Stored in RDBMS
- Easy to enter, store and analyze
- Example: data in database tables (customer data, sales data and sensor data)

UNSTRUCTURED Data
Information that is difficult to organize using traditional mechanisms.
Example:

"The patient came in complaining of chest pain, shortness of breath, and lingering headaches.. Smokes 2 packs a day.. Family history of heart disease.. Has been experiencing similar symptoms for the past 12 hours…"

Characteristics:
- Structure not obvious
- Data must be processed to be understood
- RDBMS not a good fit
- Difficult and costly to analyze
- Example: email, videos, audio, web pages, social media feeds, presentations
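
As a rough illustration of why unstructured data has to be processed before it can be analyzed, the sketch below turns a free-text note like the one quoted on this slide into a small structured record. The keyword list, the field names and the regular expressions are assumptions made up for this example; they are not a clinical NLP method and would miss most real-world phrasing.

```python
import re

# Free-text note adapted from the unstructured example on the slide.
NOTE = (
    "The patient came in complaining of chest pain, shortness of breath, "
    "and lingering headaches. Smokes 2 packs a day. Family history of "
    "heart disease. Has been experiencing similar symptoms for the past 12 hours."
)

# Hypothetical keyword list; a real system would use a clinical vocabulary.
SYMPTOMS = ["chest pain", "shortness of breath", "headache", "fever"]


def note_to_record(note: str) -> dict:
    """Extract a few structured fields from a free-text note using simple rules."""
    text = note.lower()
    packs = re.search(r"smokes (\d+) packs? a day", text)
    hours = re.search(r"past (\d+) hours?", text)
    return {
        # Keep only the keywords that literally occur in the note.
        "symptoms": [s for s in SYMPTOMS if s in text],
        "packs_per_day": int(packs.group(1)) if packs else None,
        "symptom_hours": int(hours.group(1)) if hours else None,
    }


record = note_to_record(NOTE)
print(record)
```

Once the note is reduced to columns such as `symptoms` and `packs_per_day`, it can be stored and queried like the structured example, at the cost of whatever the rules fail to capture.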
HOW LARGE IS BIG?

Byte of data: One grain of rice
Kilobyte: Cup of rice
Megabyte: 8 bags of rice
Gigabyte: 3 container lorries
Terabyte: 2 container ships
Petabyte: Covers Manhattan
Exabyte: Covers the UK (3 times)
Zettabyte: Fills the Pacific Ocean
Yottabyte: Earth-size rice ball
TAXONOMY OF BIG DATA SOURCES

Exhaust Data
Passively collected data from people's use of digital services such as mobile phones, financial transactions or web searches.

Sensing Data
Actively collected data from sensors, e.g. in smart cities or from wearables, and also through remote sensing and satellite images.

Digital Content
Open web content actively produced by people, such as social media interactions, news articles, blogs or job postings. Unlike exhaust and sensing data, digital content is intentionally edited by somebody, i.e. it may be subjective or even deceptive, depending on the intentions of the author.

Source: Letouzé (Data-Pop Alliance, 2015)


DATA SOURCES

Exhaust data
- Mobile phone data
- Financial transactions
- Online search and access logs
- Citizen card
- Postal data

Sensing data
- Satellite and UAV imagery
- Sensors in cities, transport and homes
- Sensors in nature, agriculture and water
- Wearable technology
- Biometric data
- Internet of Things (IoT)

Digital Content
- Social media data
- Web scraping
- Participatory sensing/crowdsourcing
- Health records
- Radio content

WHAT PEOPLE DO    WHAT PEOPLE SAY
ANALYTICS
• The discovery, interpretation, and communication of meaningful patterns in data (Wikipedia).
• The process of uncovering hidden patterns, unknown correlations, and other useful information that can help organizations make more informed business decisions.

Opportunity Activity Benefit

BIG DATA: Large, fast, complex data (the nV's).
DATA SCIENCE: The science of extracting knowledge/patterns from data.
SOCIAL COMPUTING: Quantification of human/social behavior.

Methodology Application

SOURCE: Review, opinion, historical data, conversation, network friendship, CCTV, Vlog, location tagging, etc.
INSIGHT: Market segmentation, information dissemination, fraud detection, personalized advertising, purchase behavior, brand awareness, etc.
What is DATA SCIENCE ?
Theories and techniques from many fields and disciplines are used to investigate and analyze a large
amount of data to help decision makers in many industries such as science, engineering, economics,
politics, finance, and education.

COMPUTER SCIENCE: Pattern recognition, visualization, data warehousing, high performance computing, databases, AI.
MATHEMATICS: Mathematical modeling.
STATISTICS: Statistical and stochastic modeling, probability.
DATA SCIENCE IS
MULTIDISCIPLINARY

DATA SCIENCE
❖ A Mashed-Up Discipline.
❖ A multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

Math and Theory: Statistics, Linear Algebra, Optimization, Time Series, etc.
Applied Algorithms: Machine Learning, Data Structures, Parallel Algorithms, etc.
Engineering and Technologies: Storage and computing platforms, Statistical tools, etc.
Domain Expertise: Text, Finance, Images, Econometrics, etc.
Art: Visualization, Infographics.
Best Practices and Hacks: Handle missing values in data, transform and represent data, etc.
DATA SCIENCE
❖ New Discipline.
❖ Very few books covering the
discipline as a whole.
❖ Interdisciplinary fields like
business analysis that
incorporate computer science,
modeling, statistics, analytics,
and mathematics.

DATA SCIENCE Body of Knowledge

DATA ANALYTICS

DATA ENGINEERING

DATA SCIENCE
Source: Monica Rogati, https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007


DATA QUALITY

Dimensions of DATA QUALITY

DATA REFINERY

Data Collection

Data Integration

Data Preprocessing

Data Cleaning
- Accuracy
- Completeness
- Consistency
- Data Dimension Reduction

Data Transformation
- Aggregation
- Normalization/Standardization
- Discretization
- Feature Engineering

Data Analysis

Data Dissemination/Visualization

https://infokomputer.grid.id/read/122132921/data-refinery-kunci-kualitas-analisis-data?
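
To make three of the transformation steps named on this slide concrete (normalization, standardization and discretization), here is a minimal sketch using only the Python standard library. The function names, the sample ages and the age-group cut points are illustrative assumptions, not part of the original material.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def discretize(values, edges, labels):
    """Label each value by the first bin whose upper edge exceeds it.

    `labels` has one more entry than `edges`; the last label catches
    everything at or above the final edge.
    """
    out = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v < edge:
                out.append(label)
                break
        else:
            out.append(labels[-1])
    return out


# Made-up ages, with hypothetical age-group cut points.
ages = [2, 15, 34, 51, 78]
groups = discretize(ages, edges=[5, 18, 60],
                    labels=["under-5", "child", "adult", "elderly"])
print(min_max_normalize(ages))
print([round(z, 2) for z in standardize(ages)])
print(groups)
```

Normalization keeps the original ordering on a fixed scale, standardization centers the data at zero, and discretization trades precision for categories that are easier to tabulate.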

ANALYTICS APPROACHES

Descriptive
What happened or what is happening now?

Diagnostic
Why did it happen or why is it happening now?

Predictive
What will happen next? What will happen under various conditions?

Prescriptive
What are the options to create the most optimal/high value result/outcome?
ANALYTICS
APPROACHES

Types of ANALYTICS TECHNIQUES

Source: https://www.bi.wygroup.net/big-data-analytics/what-are-the-different-approaches-to-advanced-analytics/
Targeting: Find the needle in the haystack

What to target?
- Target areas
- Target categories
- Target individuals

Service Issue: Difficult to identify targets in a population.
Data Science Process: Use existing data and predictive modeling to identify targets.
Service Change: Engage with target subset of population.

Result: Department resources are spent where most needed

Prioritizing

What to prioritize?

Service Issue: Backlog is tackled via first in, first out (FIFO).
Data Science Process: Create a model to categorize and group past and current cases.
Service Change: Prioritize cases based on categories in order of risk, need or opportunity.

Result: Department addresses high priority cases first
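
The difference between a FIFO backlog and a prioritized one can be sketched in a few lines of Python. The `Case` fields and the risk scores below are hypothetical; in practice the score would come from whatever model categorizes the cases.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    arrival_order: int
    risk_score: float  # hypothetical model output, higher means more urgent


def fifo_order(cases):
    """Current practice: handle cases strictly in order of arrival."""
    return sorted(cases, key=lambda c: c.arrival_order)


def risk_order(cases):
    """Prioritized practice: highest risk first, ties broken by arrival."""
    return sorted(cases, key=lambda c: (-c.risk_score, c.arrival_order))


# Made-up backlog for illustration only.
backlog = [
    Case("A", 1, 0.10),
    Case("B", 2, 0.85),
    Case("C", 3, 0.40),
    Case("D", 4, 0.85),
]

fifo_ids = [c.case_id for c in fifo_order(backlog)]
risk_ids = [c.case_id for c in risk_order(backlog)]
print("FIFO:       ", fifo_ids)
print("Prioritized:", risk_ids)
```

Breaking ties by arrival order keeps the scheme fair among cases the model rates equally.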


Predictions

How to detect?

Service Issue: Hard to predict future conditions, which leads to reactive services.
Data Science Process: Use historical and current data to create estimate ranges for potential outcomes.
Service Change: Use estimates to change and tailor intervention points.

Result: Department provides pro-active early interventions

Optimization

How to distribute?

Service Issue: Difficult to identify where to place or distribute resources to be most effective.
Data Science Process: Use geospatial and/or other data to identify optimal distribution of resources.
Service Change: Re-allocate resources to the optimal distribution.

Result: Department decreases response times; increases volume

DATA SCIENCE METHOD

- Sentiment analysis
- Time series analysis
- Data mining
- Classification and clustering
- Pattern recognition
- Missing data imputations
- Multilevel modeling
- Machine learning
- Principal component and factor analysis
- AB testing
- Survival analysis
- Forecasting
- Propensity score matching
- Logistic, multinomial and multiple linear regression techniques
- Network analysis
Data Science in HEALTHCARE

Source: https://www.techtarget.com/searchhealthit/definition/digital-health-digital-healthcare

Data Science in HEALTHCARE

Sources:
https://www.frontiersin.org/articles/10.3389/fpubh.2018.00099/full
https://cbdrh.med.unsw.edu.au/where-can-master-science-health-data-science-take-me
Big Data Initiatives and Developments in BPS

Web-crawling
- Marketplace: E-commerce data
- Flight tracker, bus booking site: Transportation analytics
- Job vacancy site: Labor analysis
- Online booking site and review: Room occupancy rate, number of tourists, etc.
- Air quality, weather reporting site: Environmental and disaster statistics
- Mobil123, rumah123: Property and vehicles statistics
- Online news and social media: Current phenomena, citizen sensing
- Google Map: Infrastructures and people activities
- Company financial report, IDX: Stock index

Other sources
- Google and Facebook mobility index: People mobility
- Satellite imagery: Economic activities, agriculture statistics, poverty mapping
- FB Relative Wealth Index: Economic activities, poverty mapping
- Mobile Positioning Data: Tourism statistics, Metropolitan Statistics Area
POTENTIAL UTILIZATION OF SATELLITE IMAGERY DATA

Multiple Valuable Features from Geospatial Data

• Most geospatial data from satellite imagery has BIG DATA properties.

• We can extract multiple valuable features from geospatial data: infrastructures, urban footprints, agricultural areas, etc.

• New types of spatial data are emerging from increasingly diverse data acquisition methods: social media, mobile phone data, etc.
POVERTY MAPPING USING SATELLITE IMAGES

Map: OFFICIAL POVERTY DATA OF SOUTH SULAWESI (PBDT 2015)
Map: PREDICTED POVERTY DATA AND MAPPING USING DAYTIME AND NIGHTTIME SATELLITE IMAGERY

With an RMSE of 18.54, the satellite-image poverty model is able to estimate the spatial poverty of the Official Poverty Database (PBDT) 2015.
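
For readers unfamiliar with the metric, RMSE (root mean squared error) is the square root of the average squared difference between observed and predicted values, so it is expressed in the same unit as the poverty measure itself. The sketch below shows the computation on made-up numbers; the values are not taken from the PBDT data or from the model on this slide.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up per-area values for illustration only.
official = [12.0, 30.0, 45.0, 8.0]
predicted = [15.0, 26.0, 45.0, 13.0]

error = rmse(official, predicted)
print(round(error, 2))
```

Because errors are squared before averaging, a few badly predicted areas raise the RMSE more than many small misses do.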

Environmental Related Data

IQair.com
Variables:
1. Air Quality Index (AQI)
2. Air temperature
3. Air pressure
4. Wind speed
5. Humidity

power.larc.nasa.gov
Variables:
1. Rainfall
2. Temperature
3. Humidity
4. Wind speed
5. Surface pressure
6. The temperature of the earth's crust
Social Media and Online News for Covid-19 Government Response

Figure: Number of Tweets
Figure: Discussion about government social aid, poverty and stunting
Figure: Social Network based on retweets of COVID-19 Information

Vaccine Side effect based on Tweets

Sinovac            AstraZeneca        Moderna
Word      Total    Word      Total    Word      Total
Sleepy    181      Fever     166      Fever     46
Sore      144      Sore      118      Pain      34
Hungry    106      Dizzy     67       Sore      33
Fever     66       Pain      53       Kipi      30
Dizzy     42       Feverish  47       Painful   26
Safe      36       Sleepy    43       Dizzy     18
Sick      34       Hungry    39       Steady    18
Heavy     34       Painful   39       Safe      18
Critical  27       Afraid    34       Thermal   15
Weak      27       Safe      28       Feverish  12

http://commdis.telkomuniversity.ac.id/jdsa/index.php/jdsa/article/view/73
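
A word-frequency table like the one on this slide can be produced with a simple count of side-effect terms per vaccine. The sketch below is only an assumption about the general approach, not the method of the cited study: the tweets are invented, they are in English rather than the original language, and the term list is an illustrative subset.

```python
import re
from collections import Counter

# Illustrative subset of side-effect terms; the real study's lexicon is not given here.
SIDE_EFFECT_TERMS = {"fever", "sore", "dizzy", "sleepy", "hungry"}

# Invented (vaccine, tweet text) pairs for demonstration.
tweets = [
    ("Sinovac", "Feeling sleepy and hungry after the shot"),
    ("Sinovac", "Arm is sore, also sleepy"),
    ("Moderna", "Fever all night, arm sore"),
]


def count_terms(labelled_tweets):
    """Count occurrences of side-effect terms, grouped by vaccine."""
    counts = {}
    for vaccine, text in labelled_tweets:
        words = re.findall(r"[a-z]+", text.lower())
        matches = (w for w in words if w in SIDE_EFFECT_TERMS)
        counts.setdefault(vaccine, Counter()).update(matches)
    return counts


counts = count_terms(tweets)
for vaccine, counter in counts.items():
    print(vaccine, counter.most_common())
```

Counts of this kind reflect what people choose to post, so they are better read as signals for follow-up than as incidence rates.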
CHALLENGES USING BIG DATA

- DATA ACQUISITION
- STATISTICAL METHODOLOGY
- DATA SOURCE QUALITY
- PRIVACY AND DATA PROTECTION
- REGULATION ON NATIONAL STATISTICAL SYSTEM

CONCLUDING REMARKS

- Big Data has a lot of potential, risks and challenges for healthcare.
- Multidisciplinary expertise is needed: collaboration with different experts and the data ecosystem.
- Collaboration among all stakeholders (penta helix).
- Numerous opportunities remain to be explored and discovered; focus on translational research.
- National AI research center for Health?

THANK YOU
setia.pramana@stis.ac.id
