Data Science For Government and Business
Siti Mariyah
Pusat Kajian Komputasi Statistik
Politeknik Statistika STIS
Relevancy of Data Science
PAST
• Data was scarce
• Software was expensive
• Storing large amounts of data was expensive
PRESENT
• A data deluge
• Software is open source and free
• Storage costs a fraction of what it did, so millions of datasets can be kept at very low cost
Data Science
❑ Have data
❑ Have curiosity
❑ Work with the data
❑ Manipulate it
❑ Explore it
❑ Practice going through and analyzing the data
❑ Try to get some answers [insights]
❑ Tell the findings through a great story
What kind of data?
╸ Survey or census data
╸ Social media (text, images, video)
╸ Crowdsourced data
╸ Satellite imagery
╸ Mobile phone data, including Call Detail Records
╸ GPS-generated data
╸ IoT-generated data
╸ Clickstream, transactional logs, search history
╸ Short message services, speech
╸ Financial transactions, purchase transactions
╸ Electronic health records (EHR), patient data
Data Science Use Case in Government
Price Nowcasting using Crowdsourcing, Online
Shops and Instagram
• Partners: Badan Pusat Statistik, Politeknik Statistika STIS
• To understand online purchasing behaviour and the shift from offline to online transactions
Item Sales Location
Density map showing the location distributions of all the items currently on sale
Item Category Sold
Treemap showing each category's share of the products currently on sale
Trip Estimation
Choropleth map of the radius of gyration and the farthest travel distance from the user's home, by sub-district (kecamatan) in Jabodetabek
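The two mobility measures named on this slide can be sketched as follows. The radius of gyration is commonly defined as the root-mean-square distance of a user's recorded locations from their centre of mass. This is a minimal sketch, assuming locations are (latitude, longitude) pairs in degrees and using the haversine distance; the function names and the mean-coordinate approximation of the centre are illustrative choices, not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def radius_of_gyration_km(points):
    """Root-mean-square distance of the points from their centre of mass.

    Assumption: the centre of mass is approximated by the mean latitude and
    longitude, which is reasonable at city scale.
    """
    n = len(points)
    centre = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return math.sqrt(sum(haversine_km(p, centre) ** 2 for p in points) / n)


def max_distance_from_home_km(home, points):
    """Farthest recorded distance from the user's home location."""
    return max(haversine_km(home, p) for p in points)
```

Aggregating these per-user values by sub-district (for example with a median) would give the quantities a choropleth map could display; that aggregation step is also an assumption here.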
Correlation between Human Mobility and
Socioeconomic Indicators in a County
Nowcasting Food Prices
Analyse Twitter Usage Patterns and Roles During Disaster Events in Indonesia
[Figure: data sources are Twitter scraping, BNPB, and BMKG; search keywords are ‘gempa’ (earthquake), ‘kebakaran hutan’ (forest fire), and ‘banjir’ (flood), in Bahasa Indonesia; periods shown are 2018 and 2013 - 2018]
DIV Komputasi Statistik
Data Science
Descriptive analyses
Volume of “gempa” (earthquake) tweets before, during, and after an earthquake
[Figure annotations: 327.37%, 19.61%]
Descriptive analyses
Volume of “gempa” tweets one hour after an earthquake, by type of Twitter account
Descriptive analyses
Interaction Network
Account types: Individual, Public Figure, Government, Media, NGO
Descriptive analyses
Percentage of tweets by message type
[Figure: three pie charts with the categories Informasi (information), Emosi (emotion), Bantuan (aid), Dukungan (support), and Politik (politics). Informasi is the largest category in each chart, at 69%, 77%, and 82%; the other categories range from 2% to 15%.]
Search Personalization
• Gojek Technology
• Are you looking for food? Martabak, mie ayam, a burger, or coffee?
• You open the app and click “NEAR ME”
• A list of restaurants comes up, sorted from nearest to farthest
Search Personalization
• Mila and Husain have to sort through the list and spend time scrolling, and they may well leave the app without ordering from any restaurant
• So?
• Data science comes in here by relying on past transaction data: if Mila and Husain have ordered before, we already know a little about their preferences.
• Apply
Search Personalization
• Rank the restaurants
Relevance score = 2 * (1/distance) + 1.2 * rating of restaurant
• Using past search data, clickstream, and order data, we can assign each restaurant a relevance judgment: level 0 if viewed, 1 if clicked, and 2 if ordered.
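The relevance score above can be sketched in a few lines. This is a minimal illustration, assuming distance is measured in kilometres (the slide does not state a unit); the restaurant names, distances, and ratings are made-up sample values.

```python
def relevance_score(distance_km, rating, w_distance=2.0, w_rating=1.2):
    """Relevance score = 2 * (1/distance) + 1.2 * rating, as on the slide."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return w_distance * (1.0 / distance_km) + w_rating * rating


# Hypothetical sample data for illustration only.
restaurants = [
    {"name": "Martabak A", "distance_km": 0.5, "rating": 4.0},
    {"name": "Mie Ayam B", "distance_km": 2.0, "rating": 4.8},
    {"name": "Burger C", "distance_km": 1.0, "rating": 3.5},
]

# Highest score first.
ranked = sorted(
    restaurants,
    key=lambda r: relevance_score(r["distance_km"], r["rating"]),
    reverse=True,
)

for r in ranked:
    print(r["name"], round(relevance_score(r["distance_km"], r["rating"]), 2))
```

Because distance enters as 1/distance, very close restaurants dominate the score, while the rating term separates restaurants that are similarly far away.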
Search Personalization
• Treat this as a pairwise ranking problem, then train a model with the LambdaMART algorithm to predict relevance scores for pairs of restaurants.
credit to Jewel James (Gojek Engineering)
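To make the pairwise framing concrete, the sketch below turns the 0/1/2 relevance judgments from one search session into preference pairs (restaurant A should rank above restaurant B). It covers only the construction of training pairs; it does not train LambdaMART, which is a gradient-boosted tree ranker normally fitted with a dedicated library. The session format and the restaurant IDs are assumptions for illustration.

```python
from itertools import combinations

# Relevance judgment levels as described on the slide.
LABELS = {"viewed": 0, "clicked": 1, "ordered": 2}


def pairwise_preferences(session):
    """Return (preferred_id, other_id) pairs for one search session.

    session: list of (restaurant_id, strongest_event) tuples, where
    strongest_event is "viewed", "clicked", or "ordered".
    Pairs with equal labels carry no preference and are skipped.
    """
    pairs = []
    for (id_a, ev_a), (id_b, ev_b) in combinations(session, 2):
        label_a, label_b = LABELS[ev_a], LABELS[ev_b]
        if label_a > label_b:
            pairs.append((id_a, id_b))
        elif label_b > label_a:
            pairs.append((id_b, id_a))
    return pairs


# Hypothetical session: four restaurants shown for one search.
session = [("r1", "viewed"), ("r2", "ordered"), ("r3", "clicked"), ("r4", "viewed")]
pairs = pairwise_preferences(session)
print(pairs)
```

A ranking model is then trained so that, within each session, the preferred restaurant of every pair receives the higher predicted score.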
Fraud Detection (Money Laundering) in Banking
• In carrying out its business activities, every bank is exposed to operational risks, one of which is fraud.
• To minimize the occurrence of fraud, the bank needs to strengthen its internal control system by implementing an anti-fraud strategy.
• Fraud is an act of deviation or omission carried out intentionally to mislead, deceive, or manipulate a bank, customer, or other party. It occurs in a bank environment and/or uses bank facilities, causes the bank, customer, or other party to suffer losses, and/or lets the perpetrator obtain financial benefits, directly or indirectly.
Fraud Detection (Money Laundering) in Banking
• How?
• Using data science, conduct a fraud analysis and develop a system that adapts to the transactions
• Look at your customers' transaction data (withdrawals, credit card usage, cash deposits and transactions, payments, loan payments, remittances) and at your customers' profiles
• Set some adaptive rules or alerts
Adaptive Alerts
• Beyond Normal Transaction: If customers make transactions at ? % of the average customer transactions in the last ? months, with a minimum value of ? amount
• Early Loan Repayment: If customers make payments before the nominal maturity of ? amount in the period of financing (PF) (? x PF)
• Fraud Indication: If customers receive cash transfers of at least ? amount in at least ? transactions within ? days, which are then withdrawn or moved at a ratio of at least ? and at most ? of the incoming funds within ? days.
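As an illustration of how such a rule could be coded, here is a minimal sketch of the "Beyond Normal Transaction" alert. The slide leaves every threshold as "?", so the default values below are placeholders, not recommendations. The sketch also assumes the average is taken over the same customer's own recent transactions and approximates a month as 30 days.

```python
from datetime import date, timedelta


def beyond_normal_alert(history, new_amount, today,
                        pct_threshold=300.0,     # hypothetical "? %"
                        lookback_months=3,       # hypothetical "? months"
                        min_amount=50_000_000):  # hypothetical "? amount"
    """Flag a transaction that is large relative to the recent average.

    history: list of (date, amount) tuples for one customer.
    """
    cutoff = today - timedelta(days=30 * lookback_months)  # approximate months
    recent = [amount for day, amount in history if cutoff <= day < today]
    if not recent:
        return False  # nothing to compare against
    average = sum(recent) / len(recent)
    return (new_amount >= min_amount
            and new_amount >= average * pct_threshold / 100.0)


# Hypothetical customer history: three transactions of 10 million each.
history = [
    (date(2024, 1, 5), 10_000_000),
    (date(2024, 2, 5), 10_000_000),
    (date(2024, 3, 5), 10_000_000),
]
print(beyond_normal_alert(history, 60_000_000, date(2024, 3, 20)))  # alert raised
print(beyond_normal_alert(history, 20_000_000, date(2024, 3, 20)))  # no alert
```

Keeping the thresholds as parameters is what makes the alert adaptive: they can be tuned per customer segment or re-estimated as transaction patterns change.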
Adaptive Alerts
• Out of occupation profile: If there is a transaction within a period of ? days worth more than ? the customer's income
• Pass by: If deposits or incoming transfers made by a customer are followed by withdrawals or outgoing transfers worth the incoming funds within ? days of receiving the funds
• U-turn: If a transaction is carried out by several customers with a minimum transaction above ? (in ? transactions) over a period of ? days
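The "Pass by" rule can be sketched the same way. The number of days is a "?" on the slide, so `max_days` is a placeholder. The `tolerance` parameter, which decides when an outgoing amount counts as "worth the incoming funds", is an added assumption, as is the transaction format.

```python
from datetime import date


def pass_by_alert(transactions, max_days=2, tolerance=0.05):
    """Flag incoming funds that leave the account again within max_days.

    transactions: list of (date, direction, amount) tuples for one customer,
    where direction is "in" (deposit / incoming transfer) or
    "out" (withdrawal / outgoing transfer).
    An outgoing amount matches if it is within `tolerance` (as a fraction)
    of the incoming amount.
    """
    incoming = [(d, a) for d, direction, a in transactions if direction == "in"]
    outgoing = [(d, a) for d, direction, a in transactions if direction == "out"]
    for day_in, amount_in in incoming:
        for day_out, amount_out in outgoing:
            days_apart = (day_out - day_in).days
            if (0 <= days_apart <= max_days
                    and abs(amount_out - amount_in) <= tolerance * amount_in):
                return True
    return False


# Hypothetical example: funds arrive and almost all of it leaves the next day.
suspicious = [
    (date(2024, 1, 1), "in", 100_000_000),
    (date(2024, 1, 2), "out", 99_000_000),
]
print(pass_by_alert(suspicious))
```

This compares single transactions only; a fuller version might sum several outgoing transfers within the window before comparing them with the incoming amount.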
Netflix
• Founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California, Netflix now operates in more than 190 countries, making it the world’s leading provider of on-demand video, movie streaming, and TV series.
Netflix Motto
The happier the customers are, the longer they stay
subscribed to the service.
Factors Impacting Customers’ Enjoyment
• Netflix captures viewers’ enjoyment through the ratings given to shows and movies.
• With streaming video, many more data points become available:
• Time of day something was watched
• User age and gender (based on individual logins)
• Time spent selecting movies
• How often a movie or program was paused/resumed
Netflix predicts “Perfect situation”
Data-Driven Categorization of Movies
• Netflix created 76,897 unique ways to describe types of movies.
• These are called “alt-genres”, and they are what lead to Netflix’s scarily specific movie/show suggestions (e.g. “Movie-like: The Rise of Skywalker”)
Cover Image Personalization
• As you may have observed, users see different cover images based on their movie preferences, and these may also change over time.
• This is the most important thing Netflix does to bring in new viewers.
Credit to
• Politeknik Statistika STIS
• Badan Pusat Statistik
• Badan Perencanaan Pembangunan Nasional
• Pulse Lab Jakarta
• Gojek
• Medium
References
• bigdata.stis.ac.id
• United Nations Global Pulse (2014). Nowcasting Food Prices in Indonesia Using Social Media Signals. Global Pulse Project Series, no. 1
• https://blog.gojekengineering.com/the-secret-sauce-behind-search-personalisation-a856fb83c2f?gi=31038c496ab3