
Data Science for Government and Business
Siti Mariyah
Pusat Kajian Komputasi Statistik
Politeknik Statistika STIS

1
Relevancy of Data Science
PAST                                       PRESENT
Lack of data                               A data deluge
Software was expensive                     Software is open source and free
Expensive to store large amounts of data   For a fraction of the cost, we can have millions of datasets

There's never been a better time to be a data scientist

2
Data Science
❑ Have data
❑ Have curiosity
❑ Working with data
❑ Manipulating it
❑ Exploring it
❑ Going through the exercise of analyzing data
❑ Trying to get some answers [insights]
❑ Telling the findings through a great story

3
What kind of data?
╸ Survey or census data
╸ Social media (text, images, video)
╸ Crowdsourcing data
╸ Satellite imagery
╸ Mobile phone data, including Call Detail Records
╸ GPS-generated data
╸ IoT-generated data
╸ Click stream, transactional logs, search history
╸ Short message services, speech
╸ Financial transactions, purchase transactions
╸ Electronic health records (EHR), patient data

4
Data Science Use Case in Government

5
Price Nowcasting using Crowdsourcing, Online Shops and Instagram
• Partners: Badan Pusat Statistik, Politeknik Statistika STIS
• To understand online purchasing behaviour and the shift from offline to online transactions

6
Item Sales Location

Density map showing the location distributions of all the items currently on sale
7
Item Category Sold
Tree map showing each category's percentage of the products currently on sale

Bubble chart showing the percentage of the number of items per category
8
Average item price per category. Products in the technology category have a slightly higher average
price than the others. There are very likely outliers in the Souvenir dan Pesta (Souvenirs and Party) category.
9
Average number of items sold per product in the last 30 days, by category. The Kecantikan (Beauty) category
dominates, with more than 800 items sold per product.
10
Jabodetabek Urban Transportation Policy Integration Phase 2
• Partners: JICA, Kemenko Perekonomian, Politeknik Statistika STIS
• Human mobility, September 2018-February 2019
• In addition to the transportation needs analysis, we were curious about mobility behaviour
and how mobility reflects the well-being of a county.
Heatmap of stay-point distribution

11
Trip Estimation

Heatmap of the trip matrix across regencies/cities (kabupaten/kota) in Jabodetabek

Number of daily trips in Jabodetabek by hour of day (24 hours)
12
Radius of Gyration

Choropleth map of the radius of gyration and the farthest distance travelled from
each user's home, by district (kecamatan) in Jabodetabek
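
The slides do not show how the two mapped quantities were computed. As a minimal sketch, assuming each user's stay-points are available as (latitude, longitude) pairs in degrees, the radius of gyration can be taken as the root-mean-square distance of the stay-points from their centre of mass. The function names, the haversine distance, and the simple coordinate mean used as centre of mass are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration_km(points):
    """Root-mean-square distance (km) of stay-points from their centre of mass.

    The centre of mass is approximated by the arithmetic mean of the
    coordinates, which is an assumption that suits a metropolitan-scale area.
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    c_lat = sum(p[0] for p in points) / n
    c_lon = sum(p[1] for p in points) / n
    mean_sq = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points) / n
    return math.sqrt(mean_sq)


def max_distance_from_home_km(home, points):
    """Farthest distance (km) between a home location and any visited point."""
    return max(haversine_km(home[0], home[1], lat, lon) for lat, lon in points)
```

Averaging either value over the users who live in a district would give one number per district, which is the kind of quantity a choropleth map can display.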
13
Correlation between Human Mobility and Socioeconomic Indicators in a County

Pearson correlation of mobility volume (Jumlah mobilitas) with:
• GRDP (PDRB): 0.62
• Percentage of poor population (P0): -0.51
• Poverty gap index (P1): -0.58
• Poverty severity index (P2): -0.61
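
For readers who want to reproduce this kind of figure, the following is a minimal sketch of the Pearson correlation coefficient. The district-level data behind the slide are not included in the deck, so the two lists below are made-up values for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five hypothetical districts, for illustration only.
trips = [120, 340, 560, 610, 900]
poverty_rate = [9.1, 7.4, 6.0, 6.3, 3.2]
r = pearson_r(trips, poverty_rate)  # negative here: more trips, lower poverty rate
```

A positive coefficient (as with GRDP) means the indicator tends to rise with mobility, and a negative one (as with the three poverty measures) means it tends to fall.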
14
Transit Points

Sankey diagram of the origin, transit, and destination districts (kecamatan)
of people's inter-district trips
15
Nowcasting Food Prices in Indonesia using Social Media Signals
• How Twitter data can be used to ‘nowcast’, that is, to provide real-time food prices
• Partners: Pulse Lab Jakarta, UN World Food Programme, Bappenas,
KOICA (Food & Agriculture)
• To understand the price dynamics of basic commodities in near real-time,
for four commodities: beef, chicken, onion, and chili
• To develop a model that nowcasts each commodity's price based on the
price extracted from tweets, the volume of tweets today, and the volume
of tweets yesterday
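
The slide names the three inputs but not the model form. The sketch below assumes a plain linear regression fitted by ordinary least squares, which is an illustrative choice and not necessarily what the project used. All function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_nowcast(tweet_prices, volumes, official_prices):
    """Fit official[t] ~ b0 + b1*tweet_price[t] + b2*volume[t] + b3*volume[t-1].

    All three lists are daily series of equal length; day 0 is used only to
    supply yesterday's tweet volume for day 1.
    """
    rows, targets = [], []
    for t in range(1, len(official_prices)):
        rows.append([1.0, tweet_prices[t], volumes[t], volumes[t - 1]])
        targets.append(official_prices[t])
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def nowcast(coefs, tweet_price_today, volume_today, volume_yesterday):
    """Predict today's price from today's tweet-derived price and tweet volumes."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * tweet_price_today + b2 * volume_today + b3 * volume_yesterday
```

One model would be fitted per commodity (beef, chicken, onion, chili), using official price series as the target. Rescaling prices (for example to thousands of rupiah) keeps the normal equations well conditioned.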

16
Nowcasting Food Prices

17
Analyse Twitter Usage Patterns and Roles During Disaster Events in Indonesia
Data sources:
• Twitter scraping: tweets in Bahasa Indonesia, 2013-2018, with the keywords ‘gempa’ (earthquake), ‘kebakaran hutan’ (forest fire), and ‘banjir’ (flood)
• BNPB: disaster events in Indonesia, 2013-2018 (‘gempa’, ‘kebakaran hutan’, ‘banjir’)
• BMKG: time when each earthquake happened
DIV Komputasi Statistik
18
Data Science
Descriptive analyses
Volume of “gempa” (earthquake) tweets before, during, and after an earthquake

327,37%

19,61%

19
Descriptive analyses
Volume of “gempa” tweets before, during, and after an earthquake

Volume of “gempa” tweets in the first hour after an earthquake

20
Descriptive analyses
Volume of “gempa” tweets in the first hour after an earthquake,
by Twitter account type

21
Descriptive analyses
Volume of “gempa” tweets in the first hour after an earthquake,
by Twitter account type
Interaction Network

Volume of “gempa” tweets in the first hour after an earthquake,
by Twitter account type, excluding individual accounts

Individual
Public Figure
Government
Media
NGO

22
Descriptive analyses
Percentage of tweet message types
Pie charts of tweet message types (Informasi/information, Emosi/emotion, Bantuan/aid, Dukungan/support, Politik/politics) for three disaster types: Earthquake, Bushfires, Flood. Informasi is the largest share in every chart (69%, 77%, and 82%); the other categories each account for 2% to 15%.


23
Data Science Use Case in Business

24
Search Personalization
• Gojek Technology
• Are you looking for food? Martabak, mie ayam, burger, or coffee?
• You open the app and click “NEAR ME”
• A list of restaurants comes up, sorted from nearest to furthest

25
Search Personalization
• Mila and Husain have to sort through the list themselves and spend time
scrolling, and they may well leave the application without ordering from
any restaurant
• So?
• Data science comes in here by relying on past transaction data: if Mila
and Husain have ordered before, then we know a little about their
preferences.
• Apply

26
Search Personalization
• Rank the restaurants
Relevance score = 2 * (1/distance) + 1.2 * rating of restaurant
• Using past search data, click stream, and order data, we can assign each
restaurant a relevance judgment: level 0 if viewed, 1 if clicked, and
2 if ordered.
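
The scoring formula above can be sketched in a few lines. The weights 2 and 1.2 come from the slide; the `Restaurant` class, the kilometre unit, and the sample restaurants are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    distance_km: float  # assumed unit; the slide does not state one
    rating: float


def relevance_score(distance, rating):
    """Relevance score = 2 * (1/distance) + 1.2 * rating, as on the slide."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return 2 * (1 / distance) + 1.2 * rating


def rank_restaurants(restaurants):
    """Return restaurants sorted from highest to lowest relevance score."""
    return sorted(
        restaurants,
        key=lambda r: relevance_score(r.distance_km, r.rating),
        reverse=True,
    )


# Hypothetical restaurants, for illustration only.
ranked = rank_restaurants([
    Restaurant("Warung A", distance_km=2.0, rating=4.5),
    Restaurant("Kedai B", distance_km=0.5, rating=3.0),
])
```

With these made-up values the closer restaurant ranks first despite its lower rating, because the 1/distance term grows quickly at short distances.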

27
Search Personalization

• Treat this as a pairwise ranking problem, then develop a model using the LambdaMART algorithm to predict
relevance scores for pairs of restaurants.
credit to Jewel James (Gojek Engineering)
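
As a rough sketch of the data preparation behind a pairwise ranker, graded labels from one search session can be turned into preference pairs. This is not LambdaMART itself, which needs a gradient-boosting library; the function names and the session format are hypothetical.

```python
from itertools import combinations


def relevance_label(ordered, clicked):
    """Map logged interactions to graded labels: 2 ordered, 1 clicked, 0 viewed only."""
    if ordered:
        return 2
    if clicked:
        return 1
    return 0


def preference_pairs(session):
    """Turn one search session into (preferred, other) restaurant pairs.

    `session` maps restaurant id -> relevance label. Pairs with equal labels
    carry no preference and are skipped.
    """
    pairs = []
    for (a, label_a), (b, label_b) in combinations(session.items(), 2):
        if label_a > label_b:
            pairs.append((a, b))
        elif label_b > label_a:
            pairs.append((b, a))
    return pairs
```

A pairwise learner is then trained so that, within each session, the preferred restaurant of every pair receives the higher predicted score.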

28
29
Fraud Detection (Money Laundering) in Banking
• In carrying out business activities, every bank can be exposed to
operational risks, one of which comes from fraud.
• To minimize the occurrence of fraud, it is necessary to strengthen the
internal control system in the form of implementing an anti-fraud
strategy by the bank.
• Fraud is an act of deviation or omission intentionally carried out to
deceive, trick, or manipulate a bank, customer, or other party, which
occurs in a bank environment and/or uses bank facilities, so as to cause
the bank, customer, or other party to suffer losses and/or the perpetrators
of the fraud to obtain financial benefits, whether directly or indirectly.

30
Fraud Detection (Money Laundering) in Banking
• How?
• Using data science, conduct a fraud analysis and develop a system that
adapts to the transactions
• Look at your customer transaction data (withdrawals, credit card usage,
cash deposits & transactions, payments, loan payments, remittances) & look
at your customer profiles
• Set some adaptive rules or alerts

31
Adaptive Alerts
• Beyond Normal Transaction: If customers make transactions at
? % of the average customer transaction in the last ? months, with a
minimum value of ? amount
• Early Loan Repayment: If customers make payments before the
nominal maturity of ? amount in the period of financing (PF) (? x PF)
• Fraud Indication: If customers receive cash transfers of at least ? amount
in at least ? transactions within ? days, which are then withdrawn or
moved at a ratio of at least ? and at most ? of the incoming funds
within ? days.
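
A minimal sketch of the "Beyond Normal Transaction" rule follows. The slide leaves every threshold as "?", so they stay as required parameters with no defaults. The sketch also assumes the average is taken over the customer's own recent transactions and that the look-back window is given in days rather than months. The function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def beyond_normal_alert(history, txn_date, amount, *, pct_of_average,
                        window_days, min_amount):
    """Return True if a transaction looks unusually large for this customer.

    history        -- list of (date, amount) for the customer's past transactions
    pct_of_average -- threshold as a percentage of the recent average (the "? %")
    window_days    -- look-back window in days (the slide states months)
    min_amount     -- minimum transaction value for the alert to apply
    """
    start = txn_date - timedelta(days=window_days)
    recent = [amt for d, amt in history if start <= d < txn_date]
    if not recent:
        return False  # no baseline to compare against
    threshold = mean(recent) * pct_of_average / 100.0
    return amount >= min_amount and amount >= threshold
```

The rule becomes adaptive because the threshold moves with each customer's own recent activity instead of being one fixed amount for everyone.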

32
Adaptive Alerts
• Out of occupation profile: If there is a transaction within a period of ? days
worth more than ? x the customer's income
• Pass by: If deposits or incoming transfers made by customers are
followed by withdrawals or outgoing transfers worth the
incoming funds within ? days of receiving the funds
• U-turn: If a transaction is carried out by several customers with a minimum
transaction above ? (in ? transactions) for a period of ? days
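
The "Pass by" rule can be sketched as follows. The number of days is left as a required parameter because the slide does not give it. The transaction tuple format and the optional `tolerance` parameter are assumptions added for illustration.

```python
from datetime import date


def pass_by_alerts(transactions, *, max_days, tolerance=0.0):
    """Flag incoming funds that leave the account again within `max_days`.

    transactions -- list of (date, direction, amount), direction "in" or "out"
    max_days     -- the "? days" window from the rule
    tolerance    -- hypothetical slack: 0.05 flags outflows of at least 95%
                    of the incoming amount; 0.0 requires the full amount
    Returns a list of (date_received, amount_in, amount_moved_out).
    """
    alerts = []
    for d_in, direction, amt_in in transactions:
        if direction != "in":
            continue
        # Total outgoing funds from the day of receipt up to max_days later.
        moved_out = sum(
            amt for d, dirn, amt in transactions
            if dirn == "out" and 0 <= (d - d_in).days <= max_days
        )
        if moved_out >= amt_in * (1 - tolerance):
            alerts.append((d_in, amt_in, moved_out))
    return alerts
```

An account that receives funds and passes the same amount onward a day later would be flagged, while one that spends the money a month later would not.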

33
Netflix
• Founded in 1997 by Reed Hastings and Marc Randolph in Scotts
Valley, California, Netflix is now present in more than 190 countries,
making it the world’s leading provider of on-demand video, movie
streaming, and TV series.

34
Netflix Motto
The happier the customers are, the longer they stay
subscribed to the service.

35
Factors Impacting Customers’ Enjoyment
• Netflix captures viewers’ enjoyment through the ratings given to
shows and movies.
• With streaming video, many more data points become available:
• Time of day something was watched
• User age and gender (based on individual logins)
• Time spent selecting movies
• How often a movie or program was paused/resumed

36
Netflix predicts “Perfect situation”

• Models predict the “perfect situation” in which customers receive the
programs they enjoy
• Netflix assigns users to 3–5 different clusters among more
than 1300 clusters, based on their viewing preferences.

37
Data-Driven Categorization of Movies
• Netflix created 76,897 unique ways to describe types of movies.
• These are called “alt-genres”, and they are what lead to Netflix’s scarily
specific movie/show suggestions (e.g. “Movie-like: The Rise of
Skywalker”)

Clearly they go beyond the classical categories like drama, sci-fi, and
comedy.

38
Cover Image Personalization
• As you may have observed, each user sees different cover images based on
their movie preferences, and these may change over time.
• This is the most important thing Netflix does to bring in more
new viewers.

Netflix models a show’s cover image on the colors and styles of
successful, similarly tagged programs.
39
Approach to achieve
• Recommendation Engine

Netflix disrupted the TV industry using Data Science to provide
viewers with exactly the content they want.
40
Conclusion
• Empowering both government and business to make better decisions
• Decision making with quantifiable, data driven evidence
• Identifying opportunities, refining target customers
• Do not get stuck on one skill or one particular tool: build various
skills and use the appropriate tools
• Data scientists direct actions based on trends, which in turn helps in
defining the goals

41
Credit to
• Politeknik Statistika STIS
• Badan Pusat Statistik
• Badan Perencanaan Pembangunan Nasional
• Pulse Lab Jakarta
• Gojek
• Medium

42
References
• bigdata.stis.ac.id
• United Nations Global Pulse (2014). Nowcasting food prices in
Indonesia using social media signals. Global Pulse Project Series, no. 1
• https://blog.gojekengineering.com/the-secret-sauce-behind-search-personalisation-a856fb83c2f?gi=31038c496ab3

43
