Lecture1 Intro

Introduction to FinTech & InsurTech

1/17/2021
Presented by: Ben Feng

READI Short Course on Predictive Analytics and Big Data
Course Description
▪ Introduce basic concepts, tools, and working skillsets in FinTech Data Science. You will learn:
▪ how and when to apply ML/SL techniques.
▪ their comparative strengths and weaknesses.
▪ how to critically evaluate the performance of learning algorithms
▪ Computing is both the highlight and the emphasis of this course, and it is done in R
▪ Data used in demonstrations will be sourced explicitly, so students can access the data directly and replicate the results
PAGE 2
Course Resources
▪ Recommended Textbooks
▪ An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Free online.
▪ R for Data Science by Hadley Wickham and Garrett Grolemund. Free online.
▪ An Introduction to R by W. N. Venables, D. M. Smith, and the R Core Team. Free online.
▪ You are encouraged to bring a laptop with R & RStudio installed
PAGE 3
Data Science
▪ An interdisciplinary field concerned with methods of extracting knowledge from data in different forms:
▪ Databases
▪ Data mining
▪ Data visualization
▪ Machine learning/statistical learning
▪ Predictive analytics and big data have a significant impact in the financial & actuarial sectors
▪ Considerably improve the quality of decision-making processes
▪ Give rise to FinTech (Financial Technology) & InsurTech (Insurance Technology)
PAGE 4
Motivating FinTech Examples
(pay attention to the data)
PAGE 5
Asset Management & Credit Risk Assessment
1. Scientific and objective asset management (e.g., Automated Trading & Robo-Advising)
▪ Asset models built from massive amounts of data ➔ profitable automated trading algorithms
▪ Robo-advisors consider all available data, historical trends, and a wide spectrum of investable assets
▪ Removes emotion/sentiment from investment decisions and offers an impressive ability to diversify
2. Timely and credible credit risk assessment (e.g., identifying good/bad borrowers)
▪ Precise evaluation can open up an entirely new client base and lower credit risk
▪ Credit scores, together with unexpected items such as typing speed & word usage, can be used to build a more credible credit risk model
▪ Online capital lenders and others can determine the creditworthiness of an individual by evaluating, say, 15,000 data points at once
PAGE 6
Fraud Detection & Fraud Prevention
▪ Identifying fraud has long been critically important in finance and insurance
▪ E.g., fraudulent credit card transactions, fraudulent insurance claims
▪ Fraud prevention is a high priority for FinTech firms; many resources have been poured in this direction
▪ The ability to monitor transactions in real time and flag the ones that fall outside of the “average norm” is a powerful tool in the war waged against fraud
▪ Use, say, behavioral data to identify the specific “average norm” for each customer
▪ Some early warning systems have displayed good predictive ability
▪ The main goal of some FinTech firms is to provide fraud protection services to other firms
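The per-customer “average norm” idea can be sketched in a few lines of R. This is only an illustration on simulated data: the data frame `tx`, its columns, and the 3-standard-deviation threshold are assumptions made here, not a production fraud rule.

```r
# Simulated transaction amounts for three hypothetical customers
set.seed(1)
tx <- data.frame(
  customer = rep(c("A", "B", "C"), each = 50),
  amount   = c(rnorm(50, 40, 5), rnorm(50, 200, 30), rnorm(50, 15, 2))
)
# Inject one purchase that is unremarkable for customer B but unusual for C
tx$amount[150] <- 150

# Each customer's own norm: mean and standard deviation of their amounts
# (simplification: the suspicious transaction is included in its own norm)
mu    <- ave(tx$amount, tx$customer, FUN = mean)
sigma <- ave(tx$amount, tx$customer, FUN = sd)

# Flag transactions more than 3 standard deviations from that customer's mean
tx$z    <- (tx$amount - mu) / sigma
tx$flag <- abs(tx$z) > 3

tx[tx$flag, ]
```

A single global threshold on `amount` would either miss this transaction or flag most of customer B's purchases, which is why the norm is computed per customer.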
PAGE 7
Determining Lifetime Customer Value
▪ Instead of treating each customer as a one-time transaction, FinTech allows the customer's entire potential lifetime purchase volume to be assessed
▪ Enables effective and efficient use of resources on the customers most likely to be of high value
▪ Creates the opportunity for upselling and targeted marketing
▪ E.g., many “pre-approved” credit card offers after setting up a salary direct deposit
▪ E.g., lots of “financial advisor calls” from the bank when holding a sizable savings account
▪ Data ranging from social media feeds to direct feedback via surveys can be used to build a lifetime value model
PAGE 8
Payment & Purchasing Habits
▪ Data Science allows a customer’s payment and purchase history to be assessed at a granular level
▪ Opens the door for precise prediction models as to what behavior to expect going forward
▪ This evaluation can vary in complexity:
▪ basic analytical scores built on month-to-month volume of spending
▪ more complex calculations that use payment records and spending habits
▪ Useful for targeted marketing, loyalty rewards, and other forms of active customer interfacing
PAGE 9
Main Transforming FinTech Areas
▪ Three main FinTech areas where technology is the main driver in transforming financial services:
1. Lending/Banking Services (P2P Lending Determinants; Credit Risk Assessment/Scoring; Financial Fraud Detection; etc.)
2. Clearing (Blockchain: cryptography, network, and incentives; Ledger architectures and applications; etc.)
3. Trading (Quantitative/Algo trading; Selling information: building businesses that collect and analyze non-standard datasets for trading; etc.)
▪ It is not an exaggeration to say that Data Science has revolutionized our business/financial world
PAGE 10
Machine Learning/Statistical Learning
PAGE 11
Machine Learning/Statistical Learning (ML/SL)
▪ Machine learning and statistical learning (ML/SL) are both important subsets of Data Science
▪ Similarity: both study the design of algorithms that can learn, particularly from data
▪ Feeding more data into ML/SL algorithms improves the learning
▪ Eventually, the task becomes automated, i.e., performed without human intervention
▪ Differences:
▪ ML arises from AI in CS; focuses more on large-scale applications and prediction accuracy
▪ SL arises from Statistics; puts more emphasis on models and interpretability, and on precision & uncertainty
▪ The distinction between the two has become blurred over time
PAGE 12
The Data Analytics Cycle
[Figure: the data analytics cycle, connecting Descriptive Analysis, Predictive Analysis, and Prescriptive Analysis. Surviving labels: Observe, Measure, Collect data; Detect, Explain, Understand; Forecast, Diagnose, Scale; Optimize, Decide, Execute; Operations, Management.]
PAGE 13
Types of ML/SL Problems
▪ ML/SL problems can be categorized into supervised learning and unsupervised learning
▪ Sometimes treated as synonymous with predictive analytics and descriptive analytics, respectively
▪ Supervised learning: Given data $(X_1, Y_1), \dots, (X_n, Y_n)$, learn a model to predict $Y^*$ from some $X^*$
▪ $X_i$ is called the feature, $Y_i$ is called the label
▪ If $Y_i$ is continuous, this is called a regression problem (E.g., stock price, survival time)
▪ If $Y_i$ is discrete or symbolic, this is called a classification problem (E.g., spam email, high risk group)
▪ Unsupervised learning: Given $X_1, \dots, X_n$, identify some underlying patterns or structure in the data
▪ $X_i$ can be high-dimensional or unstructured (image/text/video)
▪ “unsupervised” because there is no label
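The two kinds of supervised problem can be illustrated with a short R sketch on simulated data. The data-generating choices below (one feature, the coefficients, the noise level) are assumptions for illustration only. Linear and logistic regression are used as familiar stand-ins for "a model", not as recommended methods.

```r
set.seed(42)
n <- 200
x <- runif(n, 0, 10)                       # a single feature X

# Regression: the label is continuous
y_cont  <- 3 + 2 * x + rnorm(n, sd = 1)
reg_fit <- lm(y_cont ~ x)
predict(reg_fit, newdata = data.frame(x = 5))   # predicted Y* at X* = 5

# Classification: the label is binary (1 = "high risk", 0 = otherwise)
p       <- plogis(-4 + 0.8 * x)            # assumed true P(Y = 1 | X)
y_bin   <- rbinom(n, size = 1, prob = p)
clf_fit <- glm(y_bin ~ x, family = binomial)
p_hat   <- predict(clf_fit, newdata = data.frame(x = 8), type = "response")
as.integer(p_hat > 0.5)                    # predicted class at X* = 8
```

In both cases the model is learned from labelled pairs and then applied to a new feature value. Only the type of label changes.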
PAGE 14
ML/SL Problems
▪ Examples of supervised learning:
▪ Based on the characteristics of past credit applicants and their repayment history, predict how likely a
new applicant is to repay a loan.
▪ Based on the characteristics of past insureds and their ages of death, predict the age of death for a new
insurance applicant.
▪ Examples of unsupervised learning:
▪ Cluster customers into groups with similar spending habits
▪ Learn association rules, e.g., 50% of clients who {recently got promoted, had a baby} want to {get a mortgage}
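As a minimal sketch of the clustering example, the R code below groups simulated customers by spending habits using k-means. The two features, the two underlying customer groups, and the choice of two clusters are all assumptions made for illustration.

```r
set.seed(7)
# Hypothetical customers: monthly spend ($) and number of transactions
spend <- data.frame(
  monthly_spend = c(rnorm(50, 500, 50), rnorm(50, 2000, 200)),
  n_trans       = c(rnorm(50, 10, 2),   rnorm(50, 40, 5))
)

# Standardize so both features contribute on a comparable scale,
# then run k-means from several random starts
km <- kmeans(scale(spend), centers = 2, nstart = 20)

table(km$cluster)                                         # cluster sizes
aggregate(spend, by = list(cluster = km$cluster), FUN = mean)  # cluster profiles
```

No label is supplied to `kmeans`. The groups are inferred from the features alone, which is what makes the problem unsupervised.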
PAGE 15
Central Themes
PAGE 16
Predictive Analytics: What are we trying to do?
▪ The advertising data set
▪ Label: $Y_i$ = sales, in thousands of units
▪ Features: advertising budgets $X_1, X_2, X_3$ for TV, Radio, and Newspaper, in thousands of dollars
▪ Ideally, we want a joint model of the form
$$\text{Sales} = f(\text{TV}, \text{Radio}, \text{Newspaper})$$
▪ Find a function $f(\cdot)$ such that $f(\text{TV}, \text{Radio}, \text{Newspaper})$ is a good predictor of Sales
▪ What does it mean to be a good predictor?
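One common way to make "good predictor" concrete is a small prediction error on data not used for fitting. The R sketch below uses a simulated stand-in for the advertising data, since the real file is not bundled with base R. The data-generating process, the 25% hold-out, and the use of a linear model for f are assumptions for illustration.

```r
set.seed(123)
n <- 200
# Simulated stand-in for the advertising data (budgets in $1,000's)
ads <- data.frame(
  TV        = runif(n, 0, 300),
  Radio     = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
# Assumed data-generating process, chosen only for illustration
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

# Hold out 25% of the rows to judge predictions on unseen data
test_id <- sample(n, size = n / 4)
train   <- ads[-test_id, ]
test    <- ads[test_id, ]

fit  <- lm(Sales ~ TV + Radio + Newspaper, data = train)
pred <- predict(fit, newdata = test)

# Compare test mean squared error with a naive "predict the mean" baseline
test_mse     <- mean((test$Sales - pred)^2)
baseline_mse <- mean((test$Sales - mean(train$Sales))^2)
c(model = test_mse, baseline = baseline_mse)
```

The baseline gives the error a reference point: a predictor is only useful if it clearly beats ignoring the features altogether.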
PAGE 17
Theme 1: Generalizability
▪ We want to construct predictors that generalize well to unseen data
▪ Capture useful trends in the data (don’t underfit)
▪ Ignore meaningless random fluctuations in the data (don’t overfit)
▪ Avoid unjustifiably extrapolating beyond the scope of given data
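Underfitting and overfitting can be seen by comparing training and test error as model flexibility grows. The R sketch below fits polynomials of degree 1, 3, and 15 to simulated data. The true function, noise level, sample sizes, and degrees are assumptions chosen for illustration.

```r
set.seed(2021)
f <- function(x) sin(x)                      # assumed true relationship
train   <- data.frame(x = runif(50, 0, 2 * pi))
train$y <- f(train$x) + rnorm(50, sd = 0.3)
test    <- data.frame(x = runif(1000, 0, 2 * pi))
test$y  <- f(test$x) + rnorm(1000, sd = 0.3)

# Mean squared error of a fitted model on a given data set
mse <- function(fit, data) mean((data$y - predict(fit, newdata = data))^2)

degrees <- c(1, 3, 15)
results <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d, train_mse = mse(fit, train), test_mse = mse(fit, test))
}))
results
```

Training error can only go down as the degree increases, so it is not a reliable guide. The straight line misses the trend (underfit), while the degree-15 fit also chases noise and typically does worse on the test set than the cubic (overfit).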
PAGE 18
Theme 2: Bias-Variance Tradeoff
▪ The bias-variance tradeoff relates to the fact that, given a predictor $\hat{f}$,
$$\text{Expected Prediction Error}(\hat{f}) = \text{Variance}(\hat{f}) + \text{Bias}^2(\hat{f}) + \text{Noise}$$
▪ We cannot control the “unexplained noise”, but we want to balance variance against bias
▪ In the language of Theme 1: high bias corresponds to underfitting, and high variance corresponds to overfitting
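The decomposition can be checked numerically with a small Monte Carlo experiment. The R sketch below estimates variance and squared bias at a single point for a rigid and a flexible polynomial fit. It also simulates the expected prediction error directly so the two sides can be compared. The true function, noise level, fixed design, and degrees are assumptions for illustration.

```r
set.seed(99)
f        <- function(x) sin(x)               # assumed true regression function
sd_noise <- 0.3                              # assumed noise level
x        <- seq(0, 2 * pi, length.out = 30)  # fixed training inputs
x0       <- 1.5                              # point at which we predict

# Refit a degree-d polynomial on many fresh training samples and record
# the prediction at x0 together with a fresh noisy observation at x0
simulate <- function(d, n_rep = 2000) {
  t(replicate(n_rep, {
    y   <- f(x) + rnorm(length(x), sd = sd_noise)
    fit <- lm(y ~ poly(x, d))
    c(pred = unname(predict(fit, newdata = data.frame(x = x0))),
      y0   = f(x0) + rnorm(1, sd = sd_noise))
  }))
}

decompose <- function(sim) {
  pred <- sim[, "pred"]
  c(variance = var(pred),
    bias2    = (mean(pred) - f(x0))^2,
    noise    = sd_noise^2,
    epe      = mean((sim[, "y0"] - pred)^2))  # simulated prediction error
}

out <- rbind(degree_1 = decompose(simulate(1)),
             degree_8 = decompose(simulate(8)))
round(out, 4)
```

In each row, `epe` should be close to the sum of the first three columns, up to simulation error. The rigid fit carries most of its error in the bias term, the flexible fit in the variance term.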
PAGE 19
Theme 3: Interpretability-Flexibility Tradeoff
▪ Some models are highly structured and interpretable; some are highly flexible but less interpretable
▪ The best predictor may turn out to be an uninterpretable or hard-to-interpret black box
▪ Real input-output relationships are complicated!
▪ Sometimes we may prefer a more interpretable, worse-performing model to a black box
▪ E.g., modeling for regulatory purposes
PAGE 20
Theme 4: Feature Engineering
▪ Given unlimited data, sufficiently flexible models can learn nearly any complex pattern or structure
▪ In reality, we have limited data and often many variables, not all of which are useful
▪ Identify useful variables and, where helpful, combine and transform them into more useful forms
▪ “Feature engineering is the process of transforming raw data into features that better represent the
underlying problem to the predictive models, resulting in improved model accuracy on unseen
data.” — Jason Brownlee, Machine Learning Mastery
▪ “…some machine learning projects succeed and some fail. What makes the difference? Easily the
most important factor is the features used.” — Pedro Domingos, A Few Useful Things to Know
about Machine Learning
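A small R sketch of the idea: the same simple model is fitted once on raw variables and once on an engineered feature that combines them. The variables (`income`, `debt`), the debt-to-income ratio `dti`, and the assumed relationship with `risk` are hypothetical and chosen only to illustrate the point.

```r
set.seed(5)
n <- 1000
# Hypothetical raw variables for loan applicants
raw <- data.frame(
  income = runif(n, 20, 150),   # annual income, $1,000's
  debt   = runif(n, 0, 60)      # outstanding debt, $1,000's
)
# Assumed truth: the risk score is driven by the debt-to-income ratio
raw$risk <- 10 * raw$debt / raw$income + rnorm(n, sd = 1)

# Engineered feature: combine two raw variables into one ratio
raw$dti <- raw$debt / raw$income

test_id <- sample(n, 250)
train   <- raw[-test_id, ]
test    <- raw[test_id, ]

fit_raw <- lm(risk ~ income + debt, data = train)   # raw features only
fit_eng <- lm(risk ~ dti, data = train)             # engineered feature

test_mse <- function(fit) mean((test$risk - predict(fit, newdata = test))^2)
c(raw_features = test_mse(fit_raw), engineered = test_mse(fit_eng))
```

Both fits see the same information. The engineered version presents it in a form a linear model can use, which here lowers the error on unseen data without a more flexible model.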
PAGE 21
An Illustrative Example in R & R Markdown
PAGE 22
PAGE 23
