
Beyond GLMs

Colin Priest Director, Customer Success, Asia, , DataRobot


Xavier Conort - Chief Data Scientist, DataRobot

Agenda
1. GLMs and Actuaries
2. Extensions to GLMs
3. Automating GLM model building
4. Best practice predictive modelling
5. Conclusion

1) GLMs
Linear models for statistical distributions that aren't Normal
Taught in the actuarial education process
Widely used by actuaries for:
- Pricing
- Understanding lapse rates
- Marketing
- Claims reserving

GLMs are old
Developed in 1972, in an era of small data and no PCs
Used by actuaries for decades
There are newer techniques to choose from

2) Extensions to GLMs

GAMs
GAMLSS
GLMMs
Regularized GLMs

GAMs
Generalized additive models
Automatically fit non-linear relationships
Work a bit like rolling averages

GAMs
Good for:
- Designing data transformations for GLMs
Unsuitable for:
- Pricing, or whenever a formula is required
- Data containing only categorical variables
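The "works a bit like rolling averages" intuition can be sketched in a few lines. A minimal, illustrative sketch (assumption: a centred moving average stands in for a real GAM smoother, and the sine-plus-noise data are invented for the example):

```python
import numpy as np

def rolling_average_smoother(x, y, window=15):
    """Smooth y against sorted x with a centred moving average."""
    order = np.argsort(x)
    y_sorted = y[order]
    half = window // 2
    smoothed = np.array([
        y_sorted[max(0, i - half):i + half + 1].mean()
        for i in range(len(y_sorted))
    ])
    return x[order], smoothed

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.3, 200)  # non-linear signal plus noise

# The smoothed curve recovers the sine shape that a straight line misses.
xs, ys = rolling_average_smoother(x, y)
```

A real GAM does this with penalised splines and backfitting rather than a fixed window, but the effect on the fitted curve is similar.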

GAMLSS
Generalized additive models for location, scale and shape
Allow you to model how the variance and skewness vary

GAMLSS
Good for:
- Understanding variability
- Simulating risk, e.g. internal models
- Data exhibiting heteroskedasticity
Unsuitable for:
- When you only want the prediction
- Data exhibiting homoskedasticity

GLMMs
Generalized linear mixed models
Allow for automatic credibility weighting

GLMMs
Good for:
- Categorical variables, and interactions between categorical and numeric variables
- Sparse data
- Hierarchical relationships, e.g. vehicle make and model
Unsuitable for:
- Large amounts of highly credible data
- Data without categorical variables

Regularized GLMs
Regularisation is any method that penalises overfitting or complexity in models
Automatically chooses predictors
Automatically allows for credibility

Regularized GLMs
Good for:
- Collinearity in the data
- Making GLMs more reliable
- Lots of input variables
- Sparse data
Unsuitable for:
- Complex interactions between variables
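The credibility-like behaviour of regularisation can be sketched with sklearn's Ridge: the L2 penalty shrinks coefficients toward zero, much as credibility weighting shrinks an estimate toward the prior mean. A minimal sketch (simulated data; the penalty strength alpha is illustrative and would normally be chosen by cross-validation):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 10))
true_beta = np.array([2.0, -1.5] + [0.0] * 8)  # only 2 real effects
y = X @ true_beta + rng.normal(0, 1.0, n)

glm = LinearRegression().fit(X, y)      # plain (unpenalised) fit
ridge = Ridge(alpha=10.0).fit(X, y)     # L2-regularised fit

# The ridge coefficient vector is systematically smaller in magnitude:
print(np.linalg.norm(glm.coef_), np.linalg.norm(ridge.coef_))
```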

3) Automating GLM Building
Why automate?
- Variable selection (feature selection)
- Linearising (feature engineering)
- Dimensionality reduction

Why Automate?
Building GLMs requires time and people, and both of these are expensive!
Most of the resource-intensive work is rule-driven, without much complex judgement required
It's just like when books were copied by hand!

Variable Selection
Variable Importance using machine learning

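Variable importance from a machine-learning model gives a quick ranking of candidate predictors. A minimal sketch (simulated data; in practice X would hold the candidate rating factors):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))
# Only the first two columns carry signal; the rest are noise.
y = 3 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(0, 0.5, 500)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = np.argsort(forest.feature_importances_)[::-1]
print(ranked[:2])  # the two informative predictors rank on top
```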

Variable Selection
Using lasso regularized GLMs

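The lasso approach works because the L1 penalty drives the coefficients of uninformative predictors exactly to zero, so whatever survives is the selected variable set (glmnet in R behaves the same way). A minimal sketch (simulated data; alpha is illustrative and would normally come from cross-validation):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
# Only columns 0 and 3 carry signal.
y = 4 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 1.0, 200)

lasso = Lasso(alpha=0.2).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(selected)  # indices of the predictors the lasso kept
```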

Variable Selection
Using genetic algorithms

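A genetic algorithm treats each candidate variable subset as an individual, scores it, and evolves the population. A deliberately minimal toy sketch (assumptions: mutation-and-selection only, no crossover; fitness is in-sample R² minus a size penalty; data simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))
y = 2 * X[:, 2] + 3 * X[:, 7] + rng.normal(0, 0.5, 300)  # signal in 2 and 7

def fitness(mask):
    """R-squared of an OLS fit on the subset, penalised for subset size."""
    if not mask.any():
        return -np.inf
    beta, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
    resid = y - X[:, mask] @ beta
    return 1 - resid.var() / y.var() - 0.01 * mask.sum()

pop = rng.random((20, 10)) < 0.5                 # random initial subsets
for _ in range(30):                              # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]      # keep the fittest half
    children = parents[rng.integers(0, 10, 10)].copy()
    flips = rng.random(children.shape) < 0.1     # mutation
    children[flips] = ~children[flips]
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(np.flatnonzero(best))  # should settle on a subset containing 2 and 7
```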

Linearising
Using GAMs


Linearising
Using GBMs

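One way to linearise with a GBM: fit a small GBM on a single raw feature and use its predictions as a transformed feature that enters a GLM linearly. A one-dimensional sketch of the idea (simulated data; the curved log relationship is illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 500)
y = np.log1p(x) ** 2 + rng.normal(0, 0.2, 500)  # curved relationship

gbm = GradientBoostingRegressor(n_estimators=100, max_depth=2,
                                learning_rate=0.1, random_state=0)
gbm.fit(x.reshape(-1, 1), y)

# The GBM's prediction is an (almost) linear transform of the response,
# so it can be fed to a GLM as an engineered feature.
x_linearised = gbm.predict(x.reshape(-1, 1))
print(np.corrcoef(x_linearised, y)[0, 1])
```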

Dimensionality Reduction
variable importance for reducing number of
categories

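The simplest version of category reduction: keep the most important levels (here, by frequency as a stand-in for importance) and pool the rest. A minimal pandas sketch (vehicle makes and the cut-off of 2 levels are invented for illustration):

```python
import pandas as pd

s = pd.Series(["Toyota", "Toyota", "Honda", "Honda", "Honda",
               "Lada", "Proton", "Trabant"])

# Keep the 2 most frequent levels, lump everything else into "Other".
top = s.value_counts().nlargest(2).index
reduced = s.where(s.isin(top), "Other")
print(reduced.nunique())  # 3 levels instead of 5
```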

Dimensionality Reduction
Text mining for grouping categories together
and reducing the number of categories

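Text mining on the category labels themselves can group similar levels. A minimal sketch (assumptions: invented vehicle labels; character n-gram TF-IDF embeddings clustered with k-means, with k chosen by hand):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

labels = ["FORD FIESTA", "FORD FIESTA ST", "FORD FOCUS",
          "VW GOLF", "VW GOLF GTI", "VW POLO"]

# Embed each label with character n-grams, then cluster the embeddings.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(labels)
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(groups)  # Ford labels and VW labels should fall into separate clusters
```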

4) Best practices observed by Kaggle

What is Kaggle?
A social fight club for data geeks
In 2010, Anthony Goldbloom took the SIGKDD and Netflix's model
And attracted 371,397 data geeks as of Sept 17, 2015!
Kaggle has worked with more than 20 Fortune 500 companies, including 3 leading insurance companies + 1 Australian insurer represented by Deloitte

Why do geeks like to fight?
"My motivation has been to learn new things."

Key takeaways from competing in Kaggle
- The Machine works much faster and harder than me
- Feature engineering is key to success, and actuaries are good at this. They can, however, learn more by being exposed to problems and datasets outside the insurance industry
- Top Kagglers use actuarial tricks such as credibility estimates
- The most popular and powerful machine learning algorithms used by the data science community are open source

Machine Learning works for insurance too!
Won by Xavier and his colleague Owen Zhang!

Typical learning curve of a Kaggler
Previously... now!

Lessons learnt
- The Machine seems much smarter than I am at capturing complexity in the data, even for simple datasets!
- Humans can help the Machine too! But don't oversimplify and discard any data.
- Don't be impatient. My best GBM had 24,500 trees with learning rate = 0.01!
- SVMs and feature selection matter too!

Lessons learnt
- Word n-grams and character n-grams can make a big difference
- Parallel processing and big servers can help with complex feature engineering!
- Glmnet can do a great job!
- Sklearn in Python is cool too!

R & Python machine learning algos to know
... to automatically capture complexity in the data
Gradient Boosting Machine packages:
1. R gbm
2. R xgboost
3. Sklearn GradientBoostingClassifier and GradientBoostingRegressor
Forest packages:
1. R randomForest
2. Sklearn RandomForestClassifier and RandomForestRegressor
3. R extraTrees
4. Sklearn ExtraTreesClassifier and ExtraTreesRegressor
Support Vector Machine packages:
1. R e1071
2. Sklearn SVC and SVR
3. Sklearn Nystroem

R & Python machine learning algos to know
... to take advantage of high-cardinality categorical features or text data
Regularized generalized linear models:
1. R glmnet
2. Sklearn Ridge
3. Sklearn LogisticRegression
Feature extraction for categorical features or text data:
1. R Matrix
2. Sklearn OneHotEncoder and DictVectorizer
3. R tau
4. Sklearn TfidfVectorizer
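The feature-extraction step above can be sketched with sklearn's OneHotEncoder: a high-cardinality categorical column becomes a sparse matrix, one column per level, which regularized GLMs handle efficiently. A minimal sketch (the vehicle makes are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

makes = np.array([["Toyota"], ["Honda"], ["Toyota"], ["Lada"]])

enc = OneHotEncoder()
X = enc.fit_transform(makes)  # sparse matrix, one column per level
print(X.shape)                # 4 rows, 3 distinct levels -> 3 columns
```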

R & Python tools to know
... to make your code efficient
Data manipulation at faster speed:
1. R data.table
2. Python pandas
Parallel computing:
1. R foreach / doMC
2. Python joblib
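Parallel computing with joblib (Python's counterpart to R's foreach/doMC) takes one pattern: wrap the function with `delayed` and fan it out over workers. A minimal sketch (the squaring function stands in for slow feature-engineering work):

```python
from joblib import Parallel, delayed

def expensive_step(i):
    return i ** 2  # stand-in for a slow feature-engineering step

# Run the steps on 2 workers; results come back in input order.
results = Parallel(n_jobs=2)(delayed(expensive_step)(i) for i in range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```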

Adapt feature engineering to the ML algo

| Machine Learning (ML) algo | Categorical variables support | Feature subsampling | Sparse support | Insensitive to scale & uniform transformations | Automated non-linear and interaction modelling | Handles missing values |
|---|---|---|---|---|---|---|
| R Random Forest | Yes, up to 32 levels | Yes | No | Yes | Yes | No |
| Sklearn Random Forest | No | Yes | Yes, but slow | Yes | Yes | No |
| R Gradient Boosting Machine | Yes, up to 1024 levels | No | No | Yes | Yes | Yes |
| Sklearn Gradient Boosted Regression Trees | No | Yes | Yes, but slow | Yes | Yes | No |
| eXtreme Gradient Boosting | No | Yes | Yes | Yes | Yes | Yes |
| Regularized GLMs | No | No | Yes | No | No | No |
| Support Vector Machine | No | No | Yes | No | Yes | No |

Other popular algorithms

Most important:
Don't forget to use your actuarial intuition to help the Machine!
- Always consider simple feature engineering that makes sense for your business, such as differences / ratios of features
- Be creative; feature engineering is often key to success
- Don't trust features that are too good:
  - They can make the Machine lazy! An example: GE Flight Quest
  - Or they are likely to be caused by a bug or a leak!
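The differences / ratios point can be sketched in pandas. A minimal sketch (the column names, values and valuation year are invented for illustration):

```python
import pandas as pd

policies = pd.DataFrame({
    "sum_insured": [200_000, 350_000, 500_000],
    "annual_premium": [400, 900, 800],
    "vehicle_year": [2010, 2014, 2008],
})

# Ratio feature: premium rate per mille of sum insured.
policies["rate_per_mille"] = (
    1000 * policies["annual_premium"] / policies["sum_insured"]
)
# Difference feature: vehicle age at an assumed 2015 valuation date.
policies["vehicle_age"] = 2015 - policies["vehicle_year"]

print(policies[["rate_per_mille", "vehicle_age"]])
```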

5) Conclusion
It's time to become actuaries of the fifth kind:
- Actuaries of the First Kind: 17th century: life insurance, deterministic methods
- Actuaries of the Second Kind: early 20th century: general insurance, probabilistic methods
- Actuaries of the Third Kind (Hans Bühlmann, 1987): 1980s: assets/derivatives, contingencies, stochastic processes
- Actuaries of the Fourth Kind (Paul Embrechts, 2005): early 21st century: ERM
- Actuaries of the Fifth Kind (Big Data Working Party): second decade of the 21st century: Big Data

Conclusion
...so that we aren't replaced by robots (or data scientists)

Thank You