EDUCATIONAL CAREER
MEMBERSHIPS
BACKGROUND PATH
BIG DATA
VOLUME (Scale of Data): The volume of persistent usable data in the analytics system at any point in time.
VARIETY (Forms of Data): The form and content of data: structured (RDBMS), semi-structured (social media) or unstructured (text/documents).
VELOCITY (Analysis of Data Flow): How quickly the analytics system processes the data to create insights.
VERACITY (Uncertainty of Data): The degree to which data is accurate, precise and trusted.
- A term -> describes extremely large amounts of structured and unstructured data.
- The activity -> capture/storage/processing/sharing/reporting of data -> beyond the ability of legacy software tools and hardware infrastructure.
- Related to many "science" branches -> data analytics, data science, machine learning, artificial intelligence, IoT, and many more.
- The application -> in many fields -> efficient, cost-effective, faster and more accurate decision making.
Types of Data
Relational Data (Tables/Transaction/Legacy Data)
Graph Data
Streaming Data
STRUCTURED Data
High degree of organization, such as a relational database.
Example :
Column           Value
Patient          John Brown
Date of Birth    12/07/1993
Date Admitted    02/03/2011
Characteristics :
Example: data in database tables (customer data, sales data and sensor data)

UNSTRUCTURED Data
Information that is difficult to organize using traditional mechanisms.
Example :
"The patient came in complaining of chest pain, shortness of breath, and lingering headaches.. Smokes 2 packs a day.. Family history of heart disease.. Has been experiencing similar symptoms for the past 12 hours…"
Characteristics :
Example: email, videos, audio, web pages, social media feeds, presentations
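The contrast between the two examples can be sketched in a few lines of Python. This is only an illustration: the dictionary stands in for a database row, and `packs_per_day` is a hypothetical helper written for this one sentence pattern, not a general clinical text parser.

```python
import re

# Structured data: every value sits under a named column,
# so a field can be read directly by its key.
structured_record = {
    "Patient": "John Brown",
    "Date of Birth": "12/07/1993",
    "Date Admitted": "02/03/2011",
}

# Unstructured data: comparable information is buried in free text.
clinical_note = (
    "The patient came in complaining of chest pain, shortness of breath, "
    "and lingering headaches. Smokes 2 packs a day. Family history of "
    "heart disease. Has been experiencing similar symptoms for the past "
    "12 hours."
)


def packs_per_day(note):
    """Hypothetical helper: extract the smoking rate from a free-text note."""
    match = re.search(r"smokes\s+(\d+)\s+packs?\s+a\s+day", note, re.IGNORECASE)
    return int(match.group(1)) if match else None


print(structured_record["Patient"])
print(packs_per_day(clinical_note))
```

The structured lookup needs no interpretation, while the free-text note needs a pattern written specifically for it. That is what "difficult to organize using traditional mechanisms" means in practice.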
HOW LARGE IS BIG?
Byte of data: one grain of rice
Zettabyte: fills the Pacific Ocean
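To put numbers on the analogy: with decimal (SI) prefixes, each unit is 1000 times the previous one, so a zettabyte is 10^21 bytes. The sketch below only does that arithmetic; the function name `bytes_in` is an illustrative choice.

```python
# Decimal (SI) prefixes: each step up is a factor of 1000.
SI_UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte",
    "terabyte", "petabyte", "exabyte", "zettabyte",
]


def bytes_in(unit):
    """Return how many bytes one `unit` contains (SI definition)."""
    return 1000 ** SI_UNITS.index(unit)


zettabyte = bytes_in("zettabyte")
terabytes_per_zettabyte = zettabyte // bytes_in("terabyte")

print(zettabyte)                 # number of "grains of rice" in one zettabyte
print(terabytes_per_zettabyte)   # how many 1 TB disks that would be
```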
TAXONOMY: BIG DATA SOURCES
Exhaust Data
Passively collected data from people's use of digital services such as mobile phones, financial transactions or web searches.
Sensing Data
Actively collected data from sensors, e.g. in smart cities or from wearables and also through remote sensing and satellite images.
Examples:
- Mobile phone data
- Online search and access logs
- Satellite and UAV imagery
- Sensors in nature, agriculture and water
- Social media data
- Participatory sensing/crowdsourcing
SOURCE -> Methodology / Application -> INSIGHT
SOURCE: Review, opinion, historical data, conversation, network friendship, CCTV, Vlog, location tagging, etc.
INSIGHT: Market segmentation, information dissemination, fraud detection, personalized advertising, purchase behavior, brand awareness, etc.
What is DATA SCIENCE?
Theories and techniques from many fields and disciplines are used to investigate and analyze a large
amount of data to help decision makers in many industries such as science, engineering, economics,
politics, finance, and education.
DATA SCIENCE IS MULTIDISCIPLINARY

DATA SCIENCE
❖ A Mashed-Up Discipline.
❖ A multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

Math and Theory: Statistics, Linear Algebra, Optimization, Time Series, etc.
Applied Algorithms: Machine Learning, Data Structures, Parallel Algorithms, etc.
Engineering and Technologies: Storage and computing platforms, statistical tools, etc.
Domain Expertise: Text, Finance, Images, Econometrics, etc.
Art: Visualization, Infographics.
Best Practices and Hacks: Handle missing values in data, transform and represent data, etc.
DATA SCIENCE
❖ New Discipline.
❖ Very few books cover the discipline as a whole.
❖ An interdisciplinary field, like business analysis, that incorporates computer science, modeling, statistics, analytics, and mathematics.
DATA SCIENCE Body of Knowledge
DATA ANALYTICS
DATA ENGINEERING
DATA SCIENCE
Dimensions of DATA QUALITY
DATA REFINERY
Data Collection
Data Integration
Data Preprocessing
Data Cleaning
- Accuracy
- Completeness
- Consistency
- Data Dimension Reduction
Data Transformation
- Aggregation
- Normalization/Standardization
- Discretization
- Feature Engineering
Data Analysis
Data Dissemination/Visualization
https://infokomputer.grid.id/read/122132921/data-refinery-kunci-kualitas-analisis-data?
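A few of the cleaning and transformation steps named in the refinery can be sketched with the Python standard library. The readings, the bin thresholds and the choice of mean imputation are assumptions made for illustration; a real pipeline would choose them per dataset.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical sensor readings; None marks a missing value.
raw = [12.0, 15.0, None, 18.0, 15.0, 30.0]

# Data cleaning (completeness): fill gaps with the mean of the observed values.
observed = [x for x in raw if x is not None]
fill_value = mean(observed)
clean = [fill_value if x is None else x for x in raw]

# Normalization: min-max rescaling to the range [0, 1].
lo, hi = min(clean), max(clean)
normalized = [(x - lo) / (hi - lo) for x in clean]

# Standardization: z-scores with zero mean and unit (population) variance.
mu, sigma = mean(clean), pstdev(clean)
standardized = [(x - mu) / sigma for x in clean]


# Discretization: map each value to a labelled bin (thresholds are assumed).
def to_bin(x):
    if x < 15:
        return "low"
    if x < 20:
        return "medium"
    return "high"


bins = [to_bin(x) for x in clean]

# Aggregation: summarize the records as a count per bin.
counts = Counter(bins)
print(clean, normalized, counts, sep="\n")
```

Each step consumes the output of the previous one, which is the point of arranging the work as a refinery instead of as isolated operations.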
ANALYTICS APPROACHES
Descriptive: What happened or what is happening now?
Diagnostic: Why did it happen or why is it happening now?
Predictive: What will happen next? What will happen under various conditions?
Prescriptive: What are the options to create the optimal/highest-value result/outcome?
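As a small illustration of how the first and third questions differ, the sketch below computes a descriptive summary and then a predictive one on the same series. The monthly figures are made up, and a straight-line least-squares fit is only one assumed choice of predictive model.

```python
from statistics import mean

# Hypothetical monthly visitor counts (illustrative numbers only).
months = [1, 2, 3, 4, 5]
visitors = [100, 110, 120, 130, 140]

# Descriptive: what happened? Summarize the observed values.
average = mean(visitors)

# Predictive: what will happen next? Fit y = a + b * x by ordinary least squares.
x_bar, y_bar = mean(months), mean(visitors)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, visitors)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast_month_6 = intercept + slope * 6

print(average)
print(forecast_month_6)
```

Diagnostic and prescriptive analytics would build on the same data, for example by relating the trend to possible causes or by comparing the forecast under alternative actions.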
ANALYTICS APPROACHES
Types of ANALYTICS TECHNIQUES
https://www.bi.wygroup.net/big-data-analytics/what-are-the-different-approaches-to-advanced-analytics/
Targeting: Find the needle in the haystack
- Target areas
- Target categories
- Target individuals
Sentiment analysis
Time series analysis
Data mining
Classification and clustering
Pattern recognition
Missing data imputations
Multilevel modeling
Machine learning
Principal component and factor analysis
A/B testing
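One of the listed techniques, A/B testing, can be sketched as a two-proportion z-test. The conversion counts are hypothetical, and the pooled z-test with a normal approximation is an assumed choice of test, not necessarily the one intended by the slide.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: variant A converts 200 of 1000, variant B 250 of 1000.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone, under the assumptions of the test.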
Data Science in HEALTHCARE
https://www.techtarget.com/searchhealthit/definition/digital-health-digital-healthcare
https://www.frontiersin.org/articles/10.3389/fpubh.2018.00099/full
https://cbdrh.med.unsw.edu.au/where-can-master-science-health-data-science-take-me
Big Data Initiatives and Developments in BPS
Data source -> Use
Flight Tracker, bus booking site -> Transportation analytics
Job vacancy site -> Labor analysis
Online booking site and review -> Room occupancy rate, number of tourists, etc.
Air quality, weather reporting site -> Environmental and disaster statistics
Mobil123, rumah 123 -> Property and vehicles statistics
Online news and social media -> Current phenomena, citizen sensing
Google Map -> Infrastructures and people activities
IDX -> Company financial report, stock index
Satellite Imagery -> Economic activities, agriculture statistics, poverty mapping
FB Relative Wealth Index -> Economic activities, poverty mapping
Mobile Positioning Data -> Tourism statistics, Metropolitan Statistics Area
POTENTIAL UTILIZATION OF SATELLITE IMAGERY DATA
POVERTY MAPPING USING SATELLITE IMAGES
Environment-Related Data
IQair.com
Variables :
1. Air Quality Index (AQI)
2. Air temperature
3. Air pressure
4. Wind speed
5. Humidity
power.larc.nasa.gov
Variables :
1. Rainfall
2. Temperature
3. Humidity
4. Wind speed
5. Surface pressure
6. The temperature of the earth's crust
Social Media and Online News for Covid-19 Government Response
Number of Tweets
Discussion about government social aid, poverty and stunting

Sinovac            Astra Zeneca        Moderna
Word      Total    Word      Total     Word      Total
Sleepy    181      Fever     166       Fever     46
Sore      144      Sore      118       Pain      34
Hungry    106      Dizzy     67        Sore      33
Fever     66       Pain      53        Kipi      30
Dizzy     42       Feverish  47        Painful   26
Safe      36       Sleepy    43        Dizzy     18
Sick      34       Hungry    39        Steady    18
Heavy     34       Painful   39        Safe      18
Critical  27       Afraid    34        Thermal   15
Weak      27       Safe      28        Feverish  12
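Word totals like those listed per vaccine can be produced by tokenizing each tweet and counting the words of interest. The sketch below shows one way to do that; the tweets and the word list are hypothetical and are not the data behind the figures above.

```python
import re
from collections import Counter

# Hypothetical example tweets (not the data behind the reported totals).
tweets = [
    "Got my shot today, arm is sore and I feel sleepy",
    "Slight fever after the vaccine but otherwise safe",
    "So sleepy and hungry after vaccination",
]

# Assumed list of words to track.
SYMPTOM_WORDS = {"sleepy", "sore", "hungry", "fever", "dizzy", "safe"}


def symptom_counts(texts):
    """Count how often each tracked word appears across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token in SYMPTOM_WORDS)
    return counts


counts = symptom_counts(tweets)
for word, total in counts.most_common():
    print(word, total)
```

Running the same function on the tweets that mention each vaccine separately would give one Word/Total list per vaccine.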
CHALLENGES USING BIG DATA
CONCLUDING REMARKS
THANK YOU
@setia.pramana@stis.ac.id