9 Analytics
2
Instructor
M. Hamdi Özçelik
– Marketing Analytics and Optimization Manager at YKB Retail Banking
– Portfolio Analytics and Optimization Manager at YKB Retail Banking
– Datamining & Analytical Applications Manager at YKB IT
– Innovation & R&D project leader at YKB IT
– Worked as research assistant, software developer, project manager, software development
manager, and part-time lecturer
hamdi.ozcelik@yapikredi.com.tr
3
WHAT AGE ARE WE LIVING IN?
• Industrial Age ?
• Computer Age ?
• Internet Age ?
• Data Age ?
• Information Age ? Knowledge Age ?
• Communication Age ? Mobile Age ?
• Innovation Age ?
4
VIDEO 1: EXPONENTIAL TIMES
5
SOURCES OF BIG DATA
• The big data challenge comes with its four main dimensions (*)
– Volume
– Velocity
– Variety
– Veracity
• Tsunami of information comes from
– Internet of Things (IoT)
– Unstructured data
– Social Media
– Customer data
– External sources
(*) http://www-01.ibm.com/software/data/bigdata/
6
AGE OF DIGITAL CUSTOMER
Customers Banks
7
HADOOP and MAPREDUCE
Hadoop Datawarehouses
• HDFS: Hadoop Distributed File System
• Not common in most corporations; usually found at businesses like Google and Amazon
• Used for log parsing, indexing, batch processing, and most importantly storing huge volumes of
unstructured data
• Applications mainly in web log analysis, visitor behavior, image processing, search indexes,
analyzing and indexing textual content, and research in natural language processing
MapReduce
• A new programming model/paradigm/architecture/framework
• Provides automatic parallelization and distribution, so massively
parallel processing of large datasets across many end nodes
becomes feasible
• Different from relational databases
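The map, shuffle, and reduce steps behind this model can be sketched on a single machine. This is a minimal illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample log lines are assumptions made for the example, and a thread pool stands in for the cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Reduce: collapse the list of values for one key into a single count."""
    key, values = item
    return key, sum(values)


def word_count(chunks, workers=2):
    """Run map and reduce in parallel; the pool stands in for cluster nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        reduced = list(pool.map(reduce_phase, shuffle(mapped).items()))
    return dict(reduced)


# Hypothetical input: each string plays the role of one data block on one node.
logs = ["big data big value", "data to value", "big opportunity"]
counts = word_count(logs)
print(counts)
```

The programmer writes only the map and reduce functions; splitting the input, running the pieces in parallel, and grouping by key are handled by the framework, which is what makes the model scale across many nodes.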
YARN
• A resource management framework for scheduling and handling resource requests from distributed
applications.
8
BIG DATA – BIG OPPORTUNITY
• Paradigm shift: "lots of data as a problem" to "lots of data as an opportunity".
• Firms that adopt data-driven decision making (DDD) have output and productivity that is
5-6% higher than what would be expected given their other investments and information
technology usage (*).
9
FROM DATA TO VALUE
BIG DATA
• Volume
– Distributed file system (Hadoop)
– In-database analytics
• Variety
– Sandboxes
– Information discovery
• Velocity
– Real-time decisions
– Complex event processing
• Veracity
– Data governance: privacy, security, quality, metadata
• Data characteristics (paired as on the slide)
– Structured / Unstructured
– Mostly internal / External
– Mostly offline / Mostly online
– Known identity / May be anonymous
– Own channels / Social media
– Transactional / Interactions
– Credit bureau, demographics, financial behavior / Location data, every behavior
• Targeted Actions
BIG VALUE
11
VIDEO 2: EXPLAINING BIG DATA
12
WHAT IS A DECISION?
13
KINDS OF DECISIONS
14
COMPARISON OF DECISION TYPES
15
BUSINESS ANALYTICS
«Data Poor, Insight Rich» is much better than «Data Rich, Insight Poor» !!
16
BIG DATA – BUSINESS ANALYTICS
• Business Analytics = Extracting value from the new digital wealth surrounding us.
• The ease of capturing big data’s value, and the magnitude of its potential, vary
across sectors:
17
HIGH PERFORMANCE ANALYTICS - HPA
18
VIDEO 3: BIG DATA WILL CHANGE OUR WORLD
19
DATA SCIENCE
• Movie Recommendations
• Weather Forecasts
• Stock Market Predictions
• Production Process Improvements
• Health Diagnosis
• Flu Trend Predictions
• Targeted Advertising
20
INVENTING USE CASES: CREDIT CARD BEHAVIOR
Customer behavior detected: The customer uses her credit card only in the first half of
the statement period
Behavioral pattern Credit bureau data Metric Detect fraud Sell another card
21
USING DATA = PLAYING IN THE SAND
22
IMPORTANT ISSUES WITH BIG DATA ANALYTICS
1) OVERENGINEERING
2) ANALYTICAL TALENT NEED
3) PARALYSIS BY ANALYSIS
4) ANALYSIS IS NOT ENOUGH
5) DATA WASTE
23
ISSUE 1: OVERENGINEERING
Weight: 220 tons
Speed: 13 km/h
Mileage: 62 km
Armament: 32 rounds
Fuel tank: 2,700 liters
• Results in
– Overcomplexity
– Higher costs
– Low usability
– High maintenance costs
• Design principles
– “Simple is best”
– “Less is more”
– “Balanced” design
– Design as simple as possible, but not simpler
"That’s been one of my mantras: focus and simplicity. Simple can be harder than complex: You
have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because
once you get there, you can move mountains." Steve Jobs
24
DESIGNING THE SIMPLE ONE
When you first start off trying to solve a problem, the first solutions
you come up with are very complex, and most people stop there. But if you
keep going, and live with the problem and peel more layers of the onion off,
you can often times arrive at some very elegant and simple solutions.
Steve Jobs
25
ISSUE 2: ANALYTICAL TALENT NEED
• In the United States alone, research (**) shows that the demand for people with
deep analytical skills in big data could outstrip current projections of supply by 50
to 60 percent.
– By 2018, as many as 140,000 to 190,000 additional specialists will be required.
– In addition, 1.5 million more managers and analysts will be needed who have a “sharp”
understanding of “how big data can be applied”.
(*) Accenture
(**) McKinsey Quarterly
26
HOW TO BE AN ANALYTICAL TALENT?
• Multidisciplinary education/training in areas such as:
– Database
– Programming
– Mathematics
– Statistics
– Operations research
– Sociology
– Psychology
– Engineering Economics
– Behavioral Economics
– Marketing
– Finance
– Artificial Intelligence
– Neuroscience
– Game Theory
– Machine Learning
– ….
27
DATA SCIENTIST
• Curiosity
• Creativity
• Focus
• Attention to detail
Domain Expertise, Computer Science, Math
Information Technologies (IT) needs to spend
• more time on the «I»
• less on the «T».
28
ISSUE 3: PARALYSIS BY ANALYSIS
• It occurs when the cost of decision analysis exceeds the benefits that could be gained by
enacting some decision.
• The cures:
– Good prioritization
– Using stochastic methods
– Using simpler methods
– Understanding the business, the customer and the data
– Common goal: value creation
– Doing more than “the analysis”!
29
ISSUE 4: ANALYSIS IS NOT ENOUGH
• Doing analysis is not enough. Analytic propositions are not useful alone.
“For although analytical judgments are highly important and necessary, they are so,
only to arrive at that clearness of conceptions which is requisite for a sure and
extended synthesis.”
30
ISSUE 5: DATA WASTE
It occurs when we do not fully utilize the wealth of data that we already have.
31
VIDEO 4: BIG DATA FOR SMARTER CUSTOMER EXPERIENCES
32
DEEP LEARNING
• Deep learning technology is widely used in the IT industry for a range
of applications such as computer vision, sound recognition, and natural
language processing. Google, Microsoft, and Apple are some of the leading
companies in deep learning applications.
• Deep learning uses an artificial neural network with a large number of layers;
it starts recognizing patterns at the lowest layer and builds up from there to
form a prototype.
33
MODELING – SIMPLE START
Selection Rules
Prior Probability
The ratio of the number of elements with a desired property to the number of elements in the whole set.
Example: Let's assume we have 10 customers and 4 of them have casco insurance. The ownership ratio
is then 40%. In other words, if we select a customer at random, the probability that the selected
customer has casco insurance is 40%.
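The prior probability in this example can be computed directly. The sketch below assumes a hypothetical list of ten flags, four of them true, to mirror the 10 customers and 4 casco owners; the name `prior_probability` is illustrative.

```python
# Hypothetical portfolio: True means the customer owns casco insurance.
has_casco = [True, True, True, True, False, False, False, False, False, False]


def prior_probability(flags):
    """Share of elements in the whole set that have the desired property."""
    return sum(flags) / len(flags)


prior = prior_probability(has_casco)
print(f"{prior:.0%}")  # 40%
```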
Selection Rule
A selection rule divides the set into two subsets. The average rate in each subset can differ from
that of the whole set. Such rule-based splitting can be illustrated by decision trees, as branches and
leaves, or shown in decision tables:
                  Subset 1   Subset 2
# customers       6          4
# casco owners    4          0
ownership         67%        0%
prior probability → Selection (select criteria) → posterior probability
$$\text{Lift} = \frac{\text{posterior probability}}{\text{prior probability}}$$
Lift shows how dense the selected subset is in terms of ownership relative to the
overall average. If the selection is made at random, the ownership rate equals the
overall rate, so the lift comes out as 1. This ratio shows how concentrated the selected
audience is with respect to that feature. Its minimum value is 0 and its maximum value is max_lift.
$$\text{max\_lift} = \frac{1}{\text{prior probability}}, \qquad 0 \le \text{Lift} \le \text{max\_lift}$$
Gain shows how much density the selection adds on top of the overall average:
$$\text{Gain} = \text{Lift} - 1, \qquad -1 \le \text{Gain} \le \text{max\_lift} - 1$$
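As a worked check of these definitions, the sketch below applies them to the running example: 10 customers with 4 casco owners, and a rule that selects 6 customers including all 4 owners. The `Customer` class and the `meets_rule` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    meets_rule: bool  # hypothetical flag: does the selection rule pick this customer?
    has_casco: bool


# 4 selected owners, 2 selected non-owners, 4 unselected non-owners.
customers = (
    [Customer(True, True)] * 4
    + [Customer(True, False)] * 2
    + [Customer(False, False)] * 4
)


def ownership(group):
    """Ownership rate inside a group of customers."""
    return sum(c.has_casco for c in group) / len(group)


prior = ownership(customers)                        # 4 / 10
selected = [c for c in customers if c.meets_rule]
rejected = [c for c in customers if not c.meets_rule]
posterior = ownership(selected)                     # 4 / 6

lift = posterior / prior
max_lift = 1 / prior   # reached when every selected customer is an owner
gain = lift - 1

print(f"prior={prior:.0%} posterior={posterior:.0%} "
      f"lift={lift:.2f} max_lift={max_lift:.2f} gain={gain:.2f}")
# prior=40% posterior=67% lift=1.67 max_lift=2.50 gain=0.67
```

The unselected subset has no owners, so its lift is 0 and its gain is -1, the lower bounds given above.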
Decision Tree - 1
Root: # customers = 10, # casco owners = 4, ownership = 40%
Leaves: (# customers = 6, # casco owners = 4, ownership = 67%) | (# customers = 4, # casco owners = 0, ownership = 0%)
Female | Male
Leaves: (# customers = 5, # casco owners = 3, ownership = 60%) | (# customers = 5, # casco owners = 1, ownership = 20%)
Car owners | Not car owners | Car owners | Not car owners
Leaves: (# customers = 5, # casco owners = 2, ownership = 40%) | (# customers = 5, # casco owners = 2, ownership = 40%)
• If the number of elements in a leaf of the tree is too low, the measurement is not statistically
reliable. Unreliable measurements lead to memorization rather than learning from the data, and
the values we expect are not realized when the tree is applied to new data.
• If the average ownership ratios of the leaves formed by a branch are close to each other, the
branching produces no benefit, and it even reduces the reliability of subsequent
branches because it divides the population into small pieces.
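Both checks can be sketched as a small helper that scores a candidate split from its leaf counts. The thresholds `min_leaf_size` and `min_spread` and the split names are assumptions chosen for illustration; the leaf counts are the three pairs of leaves shown on the decision tree slide.

```python
def evaluate_split(leaves, min_leaf_size=5, min_spread=0.05):
    """Score one candidate split.

    leaves: list of (n_customers, n_owners) pairs, one pair per leaf.
    min_leaf_size and min_spread are illustrative thresholds, not fixed rules.
    """
    rates = [owners / n for n, owners in leaves]
    spread = max(rates) - min(rates)
    return {
        "rates": rates,
        "spread": spread,
        # Leaves too small for a statistically reliable ownership estimate.
        "small_leaves": [n for n, _ in leaves if n < min_leaf_size],
        # A split whose leaves have nearly equal rates adds no information.
        "informative": spread >= min_spread,
    }


# Leaf counts taken from the slide; the split names are hypothetical labels.
candidate_splits = {
    "split_a": [(6, 4), (4, 0)],
    "split_b": [(5, 3), (5, 1)],
    "split_c": [(5, 2), (5, 2)],
}

for name, leaves in candidate_splits.items():
    result = evaluate_split(leaves)
    print(name, [f"{r:.0%}" for r in result["rates"]],
          "informative" if result["informative"] else "no benefit",
          "small leaves:", result["small_leaves"])
```

With only 10 customers every leaf here is small in absolute terms, so the numbers serve to show the mechanics rather than to support a real decision.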
A/B Testing
A/B testing is an experiment in which multiple versions of a concept are applied to different,
randomly defined subsets of the target population. The responses of these subsets to the different
versions of the concept are used to measure the success of each version.
For an A/B test to be successful, the subsets exposed to the different versions must be of sufficient size.
Otherwise, the results obtained from the test will not be meaningful.
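The effect of subset size can be illustrated with a two-proportion z-test, a standard statistical check that the slides do not name and that is used here as an assumption. The function names and the response counts below are hypothetical.

```python
import math
import random


def assign_groups(customer_ids, seed=42):
    """Randomly split the target population into two subsets of (near) equal size."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two response rates.

    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


group_a, group_b = assign_groups(range(200))

# Same response rates (10% vs 12%) measured on small and on large subsets.
z_small, p_small = two_proportion_z_test(10, 100, 12, 100)
z_large, p_large = two_proportion_z_test(1000, 10_000, 1200, 10_000)
print(f"small subsets: z={z_small:.2f}, p={p_small:.3f}")
print(f"large subsets: z={z_large:.2f}, p={p_large:.6f}")
```

With 100 customers per version the two-point difference cannot be distinguished from chance, while the same difference on 10,000 customers per version is clearly significant.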
Base (champion) model vs. alternative (challenger) model
Repeat
DON’T SETTLE !
Your work is going to fill a large part of your life, and the
only way to be truly satisfied is to do what you believe is great
work. And the only way to do great work is to love what you do.
If you haven’t found it yet, keep looking. Don’t settle. As with all
matters of the heart, you’ll know when you find it. And, like any
great relationship, it just gets better and better as the years roll
on. So keep looking until you find it. Don’t settle!
Steve Jobs
Commencement speech at Stanford University, June 2005
41
Thank you!
42
YAPIKREDİ'S INTELLECTUAL AND INDUSTRIAL PROPERTY RIGHTS
Confidentiality:
The parties (University, YapıKredi, Students) are responsible, with respect to any confidential information (*) disclosed to them by another party, for:
• Protecting it,
• Not giving it to any third party in any way and/or not making it public,
• Not using it, directly or indirectly, for purposes other than those of the educational relationship between them.
• The parties may give this information, only when necessary and as required by their work, to subordinate employees and other persons
working under them who need to know it; however, they shall warn those subordinate employees and other persons
about the confidentiality of the information and shall assume responsibility for their subordinate employees.
Rights:
• The intellectual and industrial property rights to all material used by the YapıKredi instructor for the training (presentations, demos,
software code, publications) belong to YapıKredi A.Ş., unless other ownership is stated or the material is in the public domain.
• Reproducing the training content (given or presented to students), sharing it in other media, and publishing work
based on it are subject to YapıKredi's permission.
• Before work produced by drawing on training provided by the Bank is used in publications such as articles or conference presentations,
approval must be obtained from the YapıKredi Intellectual Property Officer via the Publication Approval Form.
• The parties accept that Turkish law shall apply to the resolution of any disputes that may arise, that the Istanbul Courts and Enforcement
Offices and the courts and enforcement offices at the location of the Bank's Head Office shall have jurisdiction,
and that the jurisdiction of the courts and enforcement offices authorized by law is reserved.
(*): “Confidential Information” covers all information and documents, including: any information, software and software code, documents, and
information on licenses and services that belong to the BANK or to the BANK's own customers and/or personnel, concerning any commercial,
financial, technical, or similar matters that the BANK owns, will own, or is engaged in, and that are conveyed to the parties orally, in writing,
on magnetic media, or in any other way and/or that the parties may come to learn; any magnetic tapes, cartridges, documents, manuals,
specifications, circulars, program listings, and data files; the BANK's licenses and services that are in force or not yet announced to the public;
every service the BANK is engaged in, and their discovery, invention, research, development, production, and sale; processes and general
commercial activities; sales costs, profit, pricing methods, organization, and personnel lists.
43