Lecture 5: 7/10/2019
Housekeeping (reminder)
• Lectures
  • Monday 11:35 – 12:25, Laver 212
  • Tuesday 13:35 – 14:25, Queens 1B
• Labs
  • Wednesday 9:35 – 10:25, Harrison 207
• Office hours: Monday after class.
• Assessment
  • 2 courseworks, 20% each: 8 November and 13 December
  • 1 exam, winter term, 60%: date in January
• https://vle.exeter.ac.uk/course/view.php?id=8310
Course Content
• Introduction: History of Artificial Intelligence and Machine Learning
• Data:
  • the nature of data,
  • how to represent data: text, sound, images, networks;
• AI and ML applications to real-world cases
• Data Representation:
  • feature selection,
  • feature construction;
• Machine Learning Paradigms:
  • supervised,
  • unsupervised,
  • reinforcement learning;
• Error Measures for Different Machine Learning Tasks:
  • classification, regression, ranking, clustering;
• Algorithms: k-nearest neighbours, linear models, naïve Bayes, k-means, neural networks;
• Theoretical Notions in Machine Learning:
  • model capacity and overfitting,
  • curse of dimensionality.
What is data?
Qualitative
• Descriptive information
• Difficult/impossible to encode as numerical values
• Examples:
  • names,
  • feelings,
  • aesthetics,
  • subjective interpretation

Quantitative
• Quantifiable
• Able to encode as numerical values
• Examples:
  • Quantities, length, mass
  • Categories
Representing quantitative data
• Quantitative data
  • Numerical
    • Integers: signed, unsigned
    • Real numbers
  • Not numerical
    • Text: free text, categories
    • Images: colour, B&W
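
To make these representations concrete, here is a minimal Python sketch that uses only the standard library. All values (the byte 200, the accept/reject labels, the tiny 2x2 images) are made up for illustration; they are assumptions, not course data.

```python
import struct

# Integers: the same byte reads differently as unsigned or signed (8 bits).
raw = bytes([200])
unsigned_value = struct.unpack("B", raw)[0]  # unsigned 8-bit interpretation
signed_value = struct.unpack("b", raw)[0]    # signed (two's complement) interpretation

# Real numbers: commonly stored as 64-bit IEEE 754 floating point (8 bytes).
real_bytes = struct.pack("<d", 0.5)

# Categories: map each label to an integer code.
categories = ["accept", "reject", "accept"]
codes = {label: i for i, label in enumerate(sorted(set(categories)))}
encoded = [codes[c] for c in categories]

# Free text: a sequence of bytes under some encoding (UTF-8 here).
text_bytes = "na\u00efve".encode("utf-8")

# Images: a B&W image as a grid of intensities, a colour image as a grid of
# (red, green, blue) triples.
bw_image = [[0, 255], [255, 0]]
colour_image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
```

Note that the integer codes for categories carry no order or magnitude. They are only labels, which is one reason categories sit on the "not numerical" branch.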
OptimiseRx
Prescribing decision support solution in primary care. Used in more than 4,000 GP practices (England and Wales).
● How does it work?
○ When a prescription is made during a medical consultation, OptimiseRx might trigger a message.
○ The GP can accept or reject the message (these are not the only options).
● What data is stored?
○ The time and practice of the medical consultation
○ The decision (accept or reject) and the message ID
○ Feedback from the GP (text)
Relational database (tables)
● Tidy data
● How many variables, and of what type (numerical or not numerical)?
● Using the rest of the tables we can add new variables, e.g. Practice Code -> GPs, coordinates, CCG
● In reality we have a large number of variables (~30)
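
Adding new variables from another table amounts to a join on a shared key. The sketch below uses Python's built-in sqlite3 module. The table names (events, practices), the column names, and every row are hypothetical, chosen only to mirror the "Practice Code -> GPs, CCG" idea; the real schema is not shown in these slides.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (practice_code TEXT, message_id INTEGER, decision TEXT);
CREATE TABLE practices (practice_code TEXT PRIMARY KEY, n_gps INTEGER, ccg TEXT);
INSERT INTO events VALUES
    ('P001', 17, 'accept'), ('P001', 42, 'reject'), ('P002', 17, 'reject');
INSERT INTO practices VALUES ('P001', 4, 'CCG-A'), ('P002', 9, 'CCG-B');
""")

# Join on the practice code: each event row gains the practice-level variables.
rows = conn.execute("""
SELECT e.practice_code, e.message_id, e.decision, p.n_gps, p.ccg
FROM events AS e
JOIN practices AS p ON p.practice_code = e.practice_code
ORDER BY e.practice_code, e.message_id
""").fetchall()

for row in rows:
    print(row)
```

Each row is still one observation (one triggered message), so the joined result stays tidy while gaining two columns.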
Types of data (for some of the variables)
● Numerical
○ Rejects/accepts/hits
○ Date?
○ Number of GPs
○ Coordinates?
○ Index of Multiple Deprivation
● Not numerical
○ Type, intent, focus of the message
○ Rejection reason (text)
○ Clinical Commissioning Group (CCG)
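
The question marks next to "Date" and "Coordinates" can be explored with a small sketch. The record below is a hypothetical row: the field names and values are invented for illustration and do not come from the real tables. It shows a simple type check, then two ways a date could be turned into a number.

```python
from datetime import date

# One hypothetical row; names and values are illustrative assumptions.
record = {
    "rejects": 12,
    "n_gps": 4,
    "imd": 21.7,
    "date": date(2019, 10, 7),
    "message_type": "Safety",
    "ccg": "CCG-A",
}

def is_numerical(value):
    """True for int/float values; bool is excluded although it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

numerical = sorted(k for k, v in record.items() if is_numerical(v))
not_numerical = sorted(k for k, v in record.items() if not is_numerical(v))

# A date is not a number as stored, but it can be encoded as one:
day_number = record["date"].toordinal()  # days on a continuous timeline
weekday = record["date"].weekday()       # 0 = Monday, ..., 6 = Sunday
```

Which encoding is appropriate depends on the question asked: a day count suits trends over time, while a weekday behaves more like a category.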
What is the goal of this project?
● Provide a description of the data (identify variables affecting the number of rejections)
● Suggest ways to improve the system
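
A first descriptive step could be to compare rejection rates across the values of one variable. The sketch below does this for a hypothetical "message type" variable. The type names ("Safety", "Cost") and the six events are invented for illustration, and this is only one possible summary, not the project's actual method.

```python
from collections import defaultdict

# Hypothetical (message_type, decision) pairs.
events = [
    ("Safety", "reject"), ("Safety", "accept"), ("Safety", "accept"),
    ("Cost", "reject"), ("Cost", "reject"), ("Cost", "accept"),
]

# Count accepts and rejects per message type.
counts = defaultdict(lambda: {"accept": 0, "reject": 0})
for message_type, decision in events:
    counts[message_type][decision] += 1

# Rejection rate = rejects / (accepts + rejects) for each type.
rejection_rate = {
    message_type: c["reject"] / (c["accept"] + c["reject"])
    for message_type, c in counts.items()
}

for message_type, rate in sorted(rejection_rate.items()):
    print(f"{message_type}: {rate:.2f}")
```

A large gap between groups would flag the variable as worth a closer look. It would not by itself show that the variable causes rejections.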