
Data Mining for Business Decisions

Jesper N. Wulff

Aarhus University
jwulff@econ.au.dk

Jesper N. Wulff (AU) Data Mining 1 / 19


Me (research)
Keeping it within bounds: Regression analysis of proportions in international business



Me (research)

Generalized two-part fractional regression with cmp



Me (other)



The book



The other book



Each week

Two things to do each week:


Out of class: Read, watch, and practice
Read chapters and watch videos
Practice the stuff you learn in R!
In class: Apply and practice
I will go through code mainly from HOML
I will mix in a lot of exercises - you learn this stuff by using it
Before each session, make sure you have installed the necessary packages
and checked that they work



The exam

You are provided with a business problem, a medium-sized dataset, and 72 hours.
Your task is to translate the business problem into a data mining
problem, process the data, build a prediction model, argue for your
decisions, report and explain your results, and finally recommend
business action.
Much more detail, including assessment criteria and a proposed structure,
will be provided near the end of the course.



Prediction competition

To gain access to the exam of the course, you (or your team) must make
at least three submissions for this competition.
1 Sign up for an account on Kaggle.com
2 Join the competition
3 Get to work! Solo or in teams of at most 4.



Expectations

From last year’s evaluation: “The teacher who was teaching first part
– is terrible, unprepared and boring. Nobody even showed up for his
class. Also, I feel that the teachers must have realistic expectations.
The curriculum is overloaded.”
To master this stuff takes many years. Why not start practicing now?
I recommend you spend at least 12 hours a week on this stuff (only
≈ 10% did so last year)



Advice

Pieces of advice
What you get out of this course is very much up to you
If you invest your time and skill, I am here for you
The biggest mistake you can make is to postpone everything until the
exam and then blame me
Ask yourselves why you are here. To learn something difficult,
fundamental and fascinating about the universe. Or to get a diploma?



The modeling process

“Much like EDA, the ML process is very iterative and heuristic-based.
With minimal knowledge of the problem or data at
hand, it is difficult to know which ML method will perform best. This
is known as the no free lunch theorem for ML (Wolpert 1996).
Consequently, it is common for many ML approaches to be applied,
evaluated, and modified before a final, optimal model can be
determined. Performing this process correctly provides great
confidence in our outcomes. If not, the results will be useless
and, potentially, damaging.”

— Boehmke and Greenwell, HOML CH 2

HOML: CH2 - Modeling Process
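
The "apply, evaluate, modify" loop described in the quote can be sketched in a few lines. The example below is only an illustration under stated assumptions: it is written in plain Python (the course itself works in R), the data are synthetic, and the two candidate models and the helper names (fit_mean, fit_line, cv_rmse) are hypothetical and not taken from HOML. It compares the candidates by 5-fold cross-validated RMSE, so each model is always scored on observations it did not see during fitting.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Second candidate: least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_rmse(fit, xs, ys, k=5, seed=1):
    """Average RMSE over k folds; every fold is held out exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - model(xs[i])) ** 2 for i in held_out]
        scores.append(mean(sq_err) ** 0.5)
    return mean(scores)


# Synthetic data (an assumption for illustration): linear signal plus noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Apply and evaluate every candidate with the same resampling scheme.
candidates = [("mean baseline", fit_mean), ("linear model", fit_line)]
results = {name: cv_rmse(fit, xs, ys) for name, fit in candidates}
for name, score in results.items():
    print(f"{name}: CV RMSE = {score:.2f}")
```

Because both candidates are scored on the same folds, the comparison is fair; adding a third approach only requires another entry in the candidates list.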



Bias example



Variance example



Bias-variance trade-off



Cross-validation

