▪ Computing is both the highlight and the emphasis of this course, and it is done in R
▪ Data used in demonstrations will be sourced explicitly, so that students can access the data
directly and replicate the results
Course Resources
▪ Recommended Textbooks
▪ An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten,
Trevor Hastie, and Robert Tibshirani. Free online.
▪ R for Data Science by Hadley Wickham and Garrett Grolemund. Free online.
▪ An Introduction to R by W. N. Venables, D. M. Smith, and the R Core Team. Free online.
Data Science
▪ An interdisciplinary field concerned with methods for extracting knowledge from data in its various forms:
▪ Databases,
▪ Data mining,
▪ Data visualization,
▪ Machine learning/statistical learning
▪ Predictive analytics and big data have a significant impact on the financial & actuarial sectors
▪ Considerably improve the quality of decision-making processes
▪ Give rise to FinTech (Financial Technology) & InsurTech (Insurance Technology)
Motivating FinTech Examples
(pay attention to the data)
Asset Management & Credit Risk Assessment
1. Scientific and objective asset management (e.g., Automated Trading & Robo-Advising)
▪ Asset models built from massive amounts of data ➔ profitable automated trading algorithms
▪ Robo-advisors consider all available data, historical trends, and a wide spectrum of investable assets
▪ Removes emotion/sentiment from investment decisions & offers an impressive ability to diversify
2. Timely and credible credit risk assessment (e.g., identifying good/bad borrowers)
▪ Precise evaluation opens an entirely new client base and sharply lowers credit risk
▪ Credit scores and unexpected items such as typing speed & word usage can be used to build a more
credible credit risk model
▪ Online capital lenders and others can determine the creditworthiness of an individual by evaluating, say,
15,000 data points all at once.
Fraud Detection & Fraud Prevention
▪ Identifying fraud has long been critically important in finance and insurance
▪ E.g., fraudulent credit card transactions, fraudulent insurance claims
▪ Fraud prevention is a high priority for FinTech firms, and many resources have been poured in this direction
▪ The ability to monitor transactions in real time and flag the ones that fall outside the “average
norm” is a powerful tool in the war waged against fraud.
▪ Use, say, behavioral data to identify the specific “average norm” for each customer
▪ Some early warning systems have displayed good predictive ability
▪ The main goal of some FinTech firms is to provide fraud protection services to other firms
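The idea of a customer-specific “average norm” can be sketched in a few lines of R. This is only an illustration under stated assumptions, not a production fraud system: the transaction data are simulated, and the rule used here (flag a transaction whose z-score against the customer's own history exceeds 3 in absolute value) is an assumed stand-in for whatever rule a real system would use.

```r
# Simulated (hypothetical) transaction history for two customers
# with very different spending habits.
set.seed(1)
history <- data.frame(
  customer = rep(c("A", "B"), each = 50),
  amount   = c(rnorm(50, mean = 20,  sd = 5),
               rnorm(50, mean = 500, sd = 100))
)

# Each customer's own "average norm": mean and standard deviation of past amounts.
norm_mean <- tapply(history$amount, history$customer, mean)
norm_sd   <- tapply(history$amount, history$customer, sd)

# Two new transactions of the same size, one per customer.
new_tx <- data.frame(customer = c("A", "B"), amount = c(400, 400))

# z-score of each new transaction relative to that customer's history.
id <- as.character(new_tx$customer)
z  <- as.numeric((new_tx$amount - norm_mean[id]) / norm_sd[id])

# Assumed rule: flag anything more than 3 standard deviations from the norm.
new_tx$flagged <- abs(z) > 3
print(new_tx)
```

The same amount is unusual for customer A but ordinary for customer B, which is why the norm is estimated per customer rather than over all transactions.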
Determining Lifetime Customer Value
▪ Instead of treating customers in a one-time transaction mode, FinTech allows the entire potential
lifetime purchase volume to be assessed
▪ Effective and efficient use of resources on customers most likely to be of high value
▪ Data ranging from social media feeds to direct feedback via surveys can be used to build a
lifetime value model
Payment & Purchasing Habits
▪ Data Science allows a customer’s payment and purchase history to be assessed at a granular level
▪ Opens the door for precise prediction models as to what behavior to expect going forward
▪ Useful for target marketing, loyalty rewards, and other forms of active customer interfacing
Main Transforming FinTech Areas
▪ Three main FinTech areas where technology is the main driver in transforming financial services:
▪ It is not an exaggeration to say that Data Science has revolutionized our business/financial world
Machine Learning/Statistical Learning
Machine Learning/Statistical Learning (ML/SL)
▪ Machine learning and statistical learning (ML/SL) are both important subsets of Data Science
▪ Similarity: ML/SL studies the design of algorithms that can learn, particularly learning from data
▪ Feeding data into ML/SL algorithms to improve the learning
▪ Eventually, the task becomes automated, i.e., without human interference in the activity
▪ Differences:
▪ ML arises from AI in CS; it focuses more on large-scale applications and prediction accuracy
▪ SL arises from Statistics; it places more emphasis on models and interpretability, and on precision & uncertainty
The Data Analytics Cycle
[Figure: the data analytics cycle around Operations Management. Descriptive Analysis: observe, measure, collect data. Prescriptive Analysis: optimize, decide, execute.]
Types of ML/SL Problems
▪ ML/SL problems can be categorized into supervised learning and unsupervised learning
▪ Sometimes treated as synonymous with predictive analytics and descriptive analytics, respectively
▪ Unsupervised learning: Given $X_1, \dots, X_n$, identify some underlying patterns or structure in the data
▪ $X_i$ can be high-dimensional or unstructured (image/text/video)
▪ “unsupervised” because there is no label
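As a minimal sketch of unsupervised learning, the R code below runs k-means clustering on simulated two-dimensional data. The choice of k-means, the two-group data, and the setting `centers = 2` are all assumptions made for illustration; the algorithm is never shown the true group labels.

```r
# Simulated (hypothetical) data: two groups of 50 points in two dimensions.
set.seed(42)
X <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 1), ncol = 2),  # group 1, centred at (0, 0)
  matrix(rnorm(100, mean = 10, sd = 1), ncol = 2)   # group 2, centred at (10, 10)
)

# k-means sees only X, never the group labels.
fit <- kmeans(X, centers = 2, nstart = 10)

# Compare the discovered clusters with the groups used in the simulation.
true_group <- rep(1:2, each = 50)
print(table(cluster = fit$cluster, true_group = true_group))
```

The cluster numbers themselves are arbitrary; what matters is whether points from the same simulated group end up in the same cluster.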
ML/SL Problems
▪ Examples of supervised learning:
▪ Based on the characteristics of past credit applicants and their repayment history, predict how likely a
new applicant is to repay a loan.
▪ Based on the characteristics of past insureds and their ages of death, predict the age of death for a new
insurance applicant.
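The credit example can be sketched as a classification problem in R. Everything specific here is an assumption for illustration: the data are simulated, the predictors `income` and `debt_ratio` are hypothetical, and logistic regression via `glm()` is just one of many possible supervised methods.

```r
# Simulated (hypothetical) records of past applicants.
set.seed(123)
n <- 500
past <- data.frame(
  income     = runif(n, 20, 100),  # hypothetical: annual income in thousands
  debt_ratio = runif(n, 0, 1)      # hypothetical: debt-to-income ratio
)

# Simulated repayment outcome (the label): 1 = repaid, 0 = defaulted.
p_true <- plogis(-1 + 0.04 * past$income - 2 * past$debt_ratio)
past$repaid <- rbinom(n, size = 1, prob = p_true)

# Supervised learning: fit a model of the label given the characteristics.
fit <- glm(repaid ~ income + debt_ratio, data = past, family = binomial)

# Predict how likely a new applicant is to repay.
new_applicant <- data.frame(income = 60, debt_ratio = 0.3)
p_hat <- predict(fit, newdata = new_applicant, type = "response")
print(p_hat)
```

The age-of-death example has the same structure but a numeric label, so a regression model such as `lm()` would replace the classifier.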
Central Themes
Predictive Analytics: What are we trying to do?
▪ The advertising data set
▪ Ideally, we want a joint model of the form
$\text{Sales} = f(\text{TV}, \text{Radio}, \text{Newspaper})$
▪ Find a function $f(\cdot)$ such that $f(\text{TV}, \text{Radio}, \text{Newspaper})$ is a good predictor
of $\text{Sales}$
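One simple candidate for $f$ is a linear model. The sketch below uses a simulated stand-in for the advertising data, not the real data set: the column names follow the slide, but the sample size, the budgets, and the coefficients used to generate `Sales` are all made up for illustration.

```r
# Simulated stand-in for the advertising data (NOT the real data set).
set.seed(2024)
n <- 200
ads <- data.frame(
  TV        = runif(n, 0, 300),
  Radio     = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
# Assumed data-generating process: Newspaper has no effect on Sales.
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

# Estimate f jointly from all three media budgets with a linear model.
f_hat <- lm(Sales ~ TV + Radio + Newspaper, data = ads)
print(coef(f_hat))

# Use the fitted f to predict Sales for a new combination of budgets.
new_budget <- data.frame(TV = 150, Radio = 25, Newspaper = 30)
pred <- predict(f_hat, newdata = new_budget)
print(pred)
```

A linear model is only one choice of $f$; more flexible methods fit the same `Sales ~ TV + Radio + Newspaper` relationship with fewer structural assumptions.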
Theme 1: Generalizability
▪ We want to construct predictors that generalize well to unseen data
▪ Capture useful trends in the data (don’t underfit)
▪ Ignore meaningless random fluctuations in the data (don’t overfit)
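Underfitting and overfitting can be seen by comparing error on the training data with error on held-out data. The following is a sketch on simulated data: the true function, the noise level, the sample sizes, and the polynomial degrees 1, 3, and 12 are all assumptions chosen for illustration.

```r
# Simulated (hypothetical) data from a smooth curve plus noise.
set.seed(7)
f_true <- function(x) sin(2 * pi * x)
train <- data.frame(x = runif(40))
train$y <- f_true(train$x) + rnorm(40, sd = 0.3)
test <- data.frame(x = runif(1000))          # unseen data
test$y <- f_true(test$x) + rnorm(1000, sd = 0.3)

# Mean squared prediction error of a fitted model on a data set.
mse <- function(fit, data) mean((data$y - predict(fit, newdata = data))^2)

# Fit polynomials of increasing flexibility to the training data only.
degrees <- c(1, 3, 12)
results <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d, train_mse = mse(fit, train), test_mse = mse(fit, test))
}))
print(results)
```

Training error can only go down as the degree grows, because each model contains the previous one. Error on the unseen data is the quantity that measures generalizability, and it need not follow the same pattern.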
Theme 2: Bias-Variance Tradeoff
▪ The bias-variance tradeoff relates to the fact that, given a predictor $\hat{f}$,
$\text{Expected Prediction Error}(\hat{f}) = \text{Variance}(\hat{f}) + \text{Bias}^2(\hat{f}) + \text{Noise}$
▪ Cannot control the “unexplained noise”, but want to balance the variance vs. bias
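The decomposition can be written out more formally at a fixed input point $x_0$, where $\hat{f}$ denotes the fitted predictor. The statement below is the standard squared-error version, given under the assumptions that $Y = f(X) + \varepsilon$ with mean-zero noise independent of the training data, and that the expectation is taken over both training sets and the new noise term.

```latex
Y = f(X) + \varepsilon, \qquad E[\varepsilon] = 0, \qquad \mathrm{Var}(\varepsilon) = \sigma^2

E\big[(Y - \hat{f}(x_0))^2\big]
  = \underbrace{\mathrm{Var}\big(\hat{f}(x_0)\big)}_{\text{Variance}}
  + \underbrace{\big(E[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2}
  + \underbrace{\sigma^2}_{\text{Noise}}
```

The term $\sigma^2$ does not depend on $\hat{f}$, which is why only the variance and the bias can be traded off against each other.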
Theme 3: Interpretability-Flexibility Tradeoff
▪ Some models are highly structured and interpretable,
some are highly flexible but less interpretable
Theme 4: Feature Engineering
▪ Given unlimited data, sufficiently flexible models can learn nearly any complex pattern or structure
▪ In reality, we have limited data, and often many variables, not all of which are useful
▪ Identify useful variables and sometimes combine and transform them into useful forms
▪ “Feature engineering is the process of transforming raw data into features that better represent the
underlying problem to the predictive models, resulting in improved model accuracy on unseen
data.” — Jason Brownlee, Machine Learning Mastery
▪ “…some machine learning projects succeed and some fail. What makes the difference? Easily the
most important factor is the features used.” — Pedro Domingos, A Few Useful Things to Know
about Machine Learning
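A small sketch of the effect of a single engineered feature, on simulated data. The logarithmic relationship between `x` and `y`, the 150/50 train/test split, and the use of `lm()` are assumptions made for illustration.

```r
# Simulated (hypothetical) data: y depends on log(x), not on x linearly.
set.seed(99)
n <- 200
dat <- data.frame(x = runif(n, 1, 100))
dat$y <- 2 * log(dat$x) + rnorm(n, sd = 0.3)

# Engineered feature: a transformation of the raw variable.
dat$log_x <- log(dat$x)

# Hold out the last 50 rows as unseen data.
train <- dat[1:150, ]
test  <- dat[151:200, ]

# Same model class, raw feature versus engineered feature.
fit_raw <- lm(y ~ x,     data = train)
fit_eng <- lm(y ~ log_x, data = train)

# Compare accuracy on the unseen data.
mse <- function(fit) mean((test$y - predict(fit, newdata = test))^2)
mse_raw <- mse(fit_raw)
mse_eng <- mse(fit_eng)
print(c(raw = mse_raw, engineered = mse_eng))
```

The model class is identical in both fits, so any difference in error on the unseen rows comes from how the input is represented.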
An Illustrative Example in R & R Markdown