
BUS265 Machine Learning and Digital Technology
Lecture 1: Introduction

Dr Valentin Danchev
School of Business and Management
Queen Mary University of London

Logistics
• Module Convenor: Dr Valentin Danchev
• Email: v.danchev@qmul.ac.uk
• Lectures: Monday, 15:00–16:00, Laws: 112
• Seminars/labs: Monday, 16:00–17:00, Bancroft: 3.40
• Assessment
- Assessment will be based on what is done in the lectures and the seminars/labs
- In-class Test (30%), Multiple choice questions (MCQ); Week 8
- Report (70%), 2000 words, 19 May 2023

2
Week | Topic | Hands-on coding labs
1 | Introduction to Machine Learning and Digital Technology |
2 | Python for Reproducible Machine Learning | Python & Jupyter basics
3 | Data Design and Predictive Modelling | Human mobility data
4 | Building a Machine Learning Model for Prediction | Predicting wine quality
5 | Supervised Learning 1: Classification | Vaccine hesitancy
6 | Supervised Learning 2: Ridge and Lasso Regression | Predicting Airbnb prices
7 | Reading week |
8 | In-class test (Multiple choice questions); Unsupervised Learning: Clustering | Clustering movie ratings
9 | Networks | Community detection
10 | Text Analytics 1: Text Processing and Sentiment Analysis | Amazon food reviews
11 | Text Analytics 2: Topic Discovery and Word Embeddings | Movie reviews
12 | Ethics, Bias, and Fairness in Machine Learning | Fairness of criminal risk models

3
Essential textbooks

• Link to the ebook via the QMUL subscription to O’Reilly Platform
• Link to the ebook via QMUL library
• Link to related tutorials in the form of Python Jupyter notebooks
• Link to code and data

4
Essential textbooks

• Ebook is freely available here
• Link to the ebook via the QMUL subscription to O’Reilly Platform

Jupyter notebooks with Python code/data are available here

5
Advanced textbooks

Link to the ebook via the QMUL subscription to O’Reilly Platform Link to the ebook via the QMUL subscription to O’Reilly Platform

Jupyter notebooks with Python code/data are available here

6
O'Reilly Learning Platform via QMUL Databases A–Z

Free access to a wide range of textbooks and courses about ML/AI, Data Science, Digital Technology, & Business

https://go.oreilly.com/queen-mary-university-of-london

7
An open learning resource—links to many other resources

https://valdanchev.github.io/reproducible-data-science-python

8
Responsible Machine Learning (ML) & Data Analytics

[Figure: Venn diagram. Circles: Computing and Digital Technology; Maths and Statistics; Business Knowledge. Overlaps: Machine Learning; Software Development; Traditional Research. Centre: Responsible ML/Data Analytics. Additional labels: Ethics; Fairness.]

9
Learning Goals

• Approach business problems data-analytically.
• Address data analytics problems using machine learning applications to improve business decision-making.
• Know the basic terminology and principles, and the advantages and limitations, of the main machine learning algorithms used in business research.
• Analyse tabular, network, and text data using machine learning techniques.
• Build basic machine learning models and evaluate model generalizability and performance.
• Articulate and address issues of data ethics and fairness of machine learning technologies in business.
• Gain hands-on experience in machine learning for business using Python.

10
Data Opportunities and Computing

• Volume of data
• Variety of data (tabular, text, networks)
• Big data (data sets that are too large for traditional data processing systems)
• Powerful computers & cloud compute
• Better algorithms

11
From Data to Business Decision Making

Why are ‘big data’ and computing power important?

Data and computing make it possible to:
• transform business problems into data problems
• use machine learning techniques and business knowledge to enable data-driven solutions

12
What is Machine Learning?

Machine learning is a technology that allows computer systems to:
• improve their performance
• at some task
• with experience
• without being explicitly programmed

Machine learning is the science of discovering structure and making predictions in large data sets.

13
More definitions

14
Example: Spam filtering [Spam or Ham]
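
As a rough illustration of learning from experience rather than from hand-written rules, the sketch below trains a tiny word-count spam filter. The messages, labels, and function names are invented for illustration; a real filter would use far more data and an established library.

```python
from collections import Counter

# Hypothetical labelled messages (the "experience"); not real data.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting moved to monday", "ham"),
    ("lecture notes for monday", "ham"),
]

def train(examples):
    """Count how often each word appears in spam and in ham messages."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label a message by the class in which its words were seen more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

model = train(TRAINING)
prediction_1 = predict(model, "claim your free prize")
prediction_2 = predict(model, "notes for the monday meeting")
```

No rule such as "messages containing 'free' are spam" is written by hand; the behaviour comes from the labelled examples, and adding more examples changes the predictions.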

15
Business use cases

16
Business example: Predicting customer churn
• A large telecommunications firm is having a major problem with customer retention: 20% of cell phone customers leave when their contracts expire
• Telecommunications companies are now engaged in battles to attract each other’s customers while retaining their own
• Marketing has already designed a special retention offer
• Your task is to devise a precise, step-by-step plan for how the data science team should use the company’s vast data resources to solve the problem

17
Business example: Predicting customer churn

• What data might you use?
• How would they be used?
• How should the company choose a set of customers to receive the offer in order to best reduce churn?
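
One possible answer to the last question, sketched under strong assumptions: rank customers by predicted churn probability and send the offer to as many of the highest-risk customers as the budget allows. The customer IDs and probabilities are hypothetical and are assumed to come from a model that has already been trained.

```python
# Hypothetical churn probabilities, assumed to come from an already-trained model.
churn_probability = {
    "C001": 0.82,
    "C002": 0.10,
    "C003": 0.55,
    "C004": 0.91,
    "C005": 0.30,
}

def select_for_offer(probabilities, budget):
    """Return the `budget` customers with the highest predicted churn risk."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:budget]

# With a budget of two offers, the two highest-risk customers are chosen.
targets = select_for_offer(churn_probability, budget=2)
```

Ranking on risk alone ignores how valuable each customer is and whether the offer would actually change their decision, so a fuller plan would weigh those factors too.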

18
Machine Learning Tasks in Business

19
Machine Learning Tasks in Business: Examples

• Customer segmentation
- Discover similar customer groups who behave in a similar way to customize marketing.
- Question: Do customers naturally fall into different groups?
• Predict future events
- Identifying customers at risk
- Question: Which customers are likely to churn?
• Draw raw causal insights
- Answering ‘Why?’ questions.
- Question: What is causing customers to churn? Satisfaction? Content quality?

20
Types of Machine Learning

21
Supervised Machine Learning
• Is there a specific, quantifiable target that we are interested in or trying to predict?
- Think about the decision to churn
• Do we have data on this target?
- Do we have enough data on this target?
- As a rule of thumb, need a minimum of about 500 examples of each class
• Do we have relevant data prior to the decision?
- Think about the timing of the decision and the action
• The result of supervised data mining is a model that predicts some quantity
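
The "enough data on this target" check can be sketched as a simple count per class. The 500 figure is the rule of thumb from the list above, and the label names and counts are hypothetical.

```python
from collections import Counter

MIN_PER_CLASS = 500  # rule of thumb from the slide, not a hard requirement

def classes_below_minimum(labels, minimum=MIN_PER_CLASS):
    """Return each class whose number of examples is below `minimum`."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum}

# Hypothetical target column: 4,000 customers stayed and 350 churned.
labels = ["stayed"] * 4000 + ["churned"] * 350
too_small = classes_below_minimum(labels)
```

Here only the "churned" class falls short, which suggests collecting more churn examples before modelling.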

22
Subclasses of Supervised Machine Learning
• Classification
- Categorical target
• Often binary (yes/no) but could also be multiclass (three or more classes)
• Regression
- Numeric target

23
Subclasses of Supervised Machine Learning
• ‘Will this customer purchase service 𝑆1 if given incentive 𝐼1?’
- Classification problem
- Binary target (the customer either purchases or does not)
• ‘Which service package (𝑆1, 𝑆2, or none) will a customer likely purchase if given incentive 𝐼1?’
- Classification problem
- Three-valued target
• ‘How much will this customer use the service?’
- Regression problem
- Numeric target
- Target variable: amount of usage per customer
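
The distinction above depends only on the kind of target. A minimal sketch follows, assuming categorical targets are stored as strings and numeric targets as numbers; the function name and example values are hypothetical.

```python
def task_type(target_values):
    """Guess the supervised task from the kind of target values supplied."""
    distinct = set(target_values)
    # Numeric targets (excluding True/False) are treated as regression.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "regression"
    if len(distinct) == 2:
        return "binary classification"
    return "multiclass classification"

purchase = ["yes", "no", "yes", "no"]        # buys service S1 or not
package = ["S1", "S2", "none", "S1"]         # which package is bought
usage_minutes = [120.5, 30.0, 75.2, 210.0]   # amount of usage per customer

purchase_task = task_type(purchase)
package_task = task_type(package)
usage_task = task_type(usage_minutes)
```

A class label coded as 0/1 would be misread as regression by this sketch, which is why the string-label assumption matters.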

24
How does supervised machine learning work?

25
Using supervised machine learning to induce a prediction model from training dataset

26
Example dataset
The target feature is what we want to predict. In this example, it is whether the loan will be repaid or will default. The learning is supervised because there is a target variable, which supervises what the model is optimizing for.
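
To make the roles of features and target concrete, the sketch below uses a hypothetical six-row loan dataset with one feature (annual income) and learns the simplest possible model: a single income threshold. The data and function names are invented and are not the dataset shown on the slide.

```python
# Hypothetical loan records: (annual_income_in_thousands, loan_outcome).
LOANS = [
    (18, "default"),
    (22, "default"),
    (30, "default"),
    (35, "repaid"),
    (48, "repaid"),
    (60, "repaid"),
]

features = [income for income, _ in LOANS]   # input the model sees
target = [outcome for _, outcome in LOANS]   # what the model should predict

def fit_threshold(xs, ys):
    """Pick the income threshold that misclassifies the fewest training rows."""
    best_threshold, best_errors = None, len(xs) + 1
    for threshold in sorted(xs):
        predictions = ["repaid" if x >= threshold else "default" for x in xs]
        errors = sum(p != y for p, y in zip(predictions, ys))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, income):
    """Predict the outcome for a new applicant from the learned threshold."""
    return "repaid" if income >= threshold else "default"

threshold = fit_threshold(features, target)
new_applicant = predict(threshold, 40)
```

The target column is what "supervises" the fit: the threshold is chosen because it best reproduces the known outcomes.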

27
Simple supervised machine learning model for prediction

28
Unsupervised machine learning
• Goal: discover structure in data
• We have input observations, but no target feature.
• Used to identify groups of similar observations using clustering techniques. For example, the model might segment transactions into different groups based on the amount, currency, payment device, and other variables.
• Other examples:
- Topic discovery in text
- Community detection in network data
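
As a hedged sketch of clustering without a target, the code below groups hypothetical transaction amounts using a bare-bones one-dimensional k-means. The amounts and starting centres are invented, and a realistic segmentation would use several scaled variables and a library implementation.

```python
def k_means_1d(values, centres, iterations=10):
    """Cluster numbers by repeatedly assigning each to its nearest centre."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

# Hypothetical transaction amounts; note that no target labels are provided.
amounts = [5, 7, 6, 8, 210, 190, 205, 995, 1010]
centres, clusters = k_means_1d(amounts, centres=[0, 100, 1000])
```

The algorithm is never told which transactions are "small", "medium", or "large"; the three groups emerge from the amounts alone.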

29
Machine Learning Process in Business

• Business Understanding
• Data Understanding
• Data Preparation
• Modelling
• Evaluation
• Deployment

30
Hands-on learning using accessible and user-friendly computational tools

31
Python & Jupyter for AI/ML Research
Python
• Python is a free, open-source programming language
• Python is one of the world’s most popular programming languages, with a growing community
• Python programming skills are in high demand on the job market
• The Python ecosystem includes fast, powerful, and flexible open-source tools for doing data science and AI/ML, such as Pandas, Seaborn, and scikit-learn
Jupyter Notebook and Colab
• Jupyter Notebook is an open-source web application that allows you to create and share documents that contain code, equations, visualisations, and text
• Supports a wide range of workflows in data science and machine learning
• Colab is a free environment that runs Jupyter notebooks on the Google Cloud and requires no installation or setup

32
33
User-friendly interactive computational tools

• Prior knowledge of programming is not required
• Coding for ML/AI will be taught from first principles

34
Next week
Week 2: Python for Reproducible Machine Learning

Lecture Reading
• [HIML] Chapter 2
• Chapter 3 in David Amos et al., Python Basics: A Practical Introduction to Python
• Watch https://www.youtube.com/watch?v=inN8seMm7UI

Seminar Preparation
• Hands-on lab: Danchev, V. Python for data analysis on the cloud
• Review Welcome to Colab
• Watch Introduction to Colab

35
Acknowledgements

• Courses/slides by Foster Provost, Panos Adamopoulos, Karolis Urbonas, Leonid Zhukov, Mladen Kolar, John Kelleher, Chirag Shah

36
Thank you
