
CME538 Introduction to Data Science

Week 1 | Lecture 1 (1.1)


Introduction to the Data Science landscape and its application in engineering.
Seb
▪ Completed CivMin PhD in 2014 where I
studied rock fracture and seismology
(RFDF).
▪ AI Lead at KORE Geosystems, a mining
tech startup.
▪ Senior Research Scientist at SickKids.
▪ Joined UofT CivMin in January 2020.
▪ Research topics: rock mechanics,
ultrasonics, signal processing, computer
vision, applied machine learning in
mining.
Building Tech in Mining
▪ Mining companies drill thousands of meters
every year and the recovered rock (core) needs
to be visually described by humans.
Building Tech in Mining
SPECTOR OPTICS | SPECTOR AI | SPECTOR GEO
▪ Scanned images are transmitted to the cloud for analysis and storage.
▪ Machine learning algorithms initiate core logging by classifying samples in minutes.
Building Tech in Healthcare
▪ Up to 29% of critically ill children experience arrhythmias.
▪ Each year at SickKids 700 critically ill children suffer arrhythmias.
▪ In 2020, arrhythmia was directly implicated as the cause of morbidity and mortality in 114 children.
▪ Time to diagnosis is the most important factor in determining patient outcomes.
[Figure: two panels labelled "Normal" and "Not Normal".]
The Critical Care Unit: Reality
▪ 19 Rooms
▪ 42 Beds
▪ 2 Staff Physicians Working
The Critical Care Unit: AI
▪ 19 Rooms
▪ 42 Beds
▪ Mjaye trains expert AI system for arrhythmia detection & diagnosis.
How is this task done today?
[Figure: timeline from JET onset to the start of treatment. Staff shown: Nurse, Nurse, Fellow, Staff Physician, Electrophysiologist. Intervals shown: 30 minutes later, 1 hour later, 12 hours later, 1 hour later, then treatment begins.]
How is this task done with AI?
[Figure: timeline from JET onset to the start of treatment. Steps shown: AI detects JET (AI Diagnosis: JET 98%); Nurse; Pages Mjaye; Staff Physician; Confirm 12-Lead; treatment begins 5 minutes later.]
What is the impact?
▪ Impact: Early diagnosis means early treatment, which means a reduction in mortality and improved patient outcomes, and we can measure it.
[Figure: two timelines. Without AI: Onset, Hours, Treatment. With AI: Onset, Seconds, Treatment.]

I like doing other things

▪ I love travelling and have visited
over 20 countries including France,
Switzerland, Italy, Austria, Croatia,
Costa Rica, Nicaragua, Panama,
Peru, Bolivia, United States, Israel,
Jordan, West Bank, Egypt, Turkey,
Morocco, South Africa, and
Namibia.
▪ I was a founding member of
Ottawa’s premier Beatles cover
band.
▪ I love camping, hiking, and surfing.
▪ I have a wonderful baby boy named
Avery Goodfellow and baby girl
named August.
▪ I love building things with code
(Seb-Good).
What is Data Science?
Data Science, what is it?
▪ The application of data-centric, computational, and inferential
thinking to:
▪ understand the world (Science)
▪ solve problems (Engineering)

* Data Science is fundamentally interdisciplinary.


Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Statistics & Math.
▪ Computer Science.
▪ Domain Expertise.
[Venn diagram regions: Domain Expertise, Danger Zone!, Data Analyst, Data Science, Machine Learning.]
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Statistics.
▪ A discipline concerned with the collection, organization, analysis, interpretation and presentation of data.
▪ Math.
▪ Linear Algebra
▪ Calculus
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Computer Science.
▪ The study of computers and computational systems, including their theoretical and algorithmic foundations, hardware and software, and their uses for processing information.
▪ Practically, you do not need a degree in CS to be a Data Scientist.
▪ But you do need hacking skills and to think algorithmically.
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Domain Expertise.
▪ Implies knowledge and understanding of the essential aspects of a specific field of inquiry.
▪ Traffic Engineer.
▪ Climate Scientist.
▪ Investor.
▪ Medical Doctor.
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Data Analyst.
▪ This is the group most Engineers would fall into.
▪ Engineers have domain expertise, for example Traffic Engineering, and basic Statistics and Math knowledge.
▪ However, their knowledge of computer science is limited.
Skills of Data Science
▪ Data Analyst
▪ With Data Analysis, we’re interested in explaining a past event. For example, how and/or why did the event occur.
▪ Example:
▪ A mining haul truck breaks down and we have access to a data logger. Can we analyze the data and understand the event?
[Figure: data logger channels (Oil Pressure & Temp, Operator, Load, Air Flow, RPM) on a timeline running from Past to Future.]
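The haul truck question above can be explored with a few lines of Python. The sketch below is only an illustration: the log values, the channel choice, and the `OIL_TEMP_LIMIT` threshold are all invented, and a real data logger would need its own parsing and domain-appropriate limits.

```python
# Hypothetical data-logger records: (minutes relative to breakdown, oil temp in °C, engine RPM).
# All values are invented for illustration.
log = [
    (-60, 92.0, 1500),
    (-50, 93.5, 1520),
    (-40, 95.0, 1510),
    (-30, 104.0, 1530),
    (-20, 118.0, 1490),
    (-10, 131.0, 1200),
    (0, 140.0, 0),  # breakdown
]

OIL_TEMP_LIMIT = 110.0  # assumed alarm threshold, not a manufacturer value


def first_exceedance(records, limit):
    """Return the earliest timestamp at which oil temperature exceeded the limit."""
    for minute, oil_temp, _rpm in records:
        if oil_temp > limit:
            return minute
    return None


def largest_jump(records):
    """Return (timestamp, increase) for the biggest oil temperature rise between samples."""
    jumps = [
        (curr[0], curr[1] - prev[1])
        for prev, curr in zip(records, records[1:])
    ]
    return max(jumps, key=lambda item: item[1])


warning_minute = first_exceedance(log, OIL_TEMP_LIMIT)
jump_minute, jump_size = largest_jump(log)
print(f"Oil temperature first exceeded {OIL_TEMP_LIMIT} °C at t = {warning_minute} min")
print(f"Largest rise between samples: {jump_size:.1f} °C, ending at t = {jump_minute} min")
```

This is explanatory analysis of a past event: it describes when things started to go wrong, but it does not predict future failures.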
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Danger Zone.
▪ Domain Expertise + Computer Science gives people the ability to create what appears to be a legitimate analysis without any understanding of how they got there or what they have created.
▪ This has the potential to result in misuse.
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Machine Learning.
▪ The science of getting computers to act intelligently by learning from examples and without being explicitly programmed.
▪ X → f → y
▪ Learn a function that maps from X to y.
▪ image → f → cat or dog
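To make X → f → y concrete, here is a minimal sketch that learns f from labelled examples with a 1-nearest-neighbour rule. This is a stand-in chosen for brevity, not the method used in the lecture, and the two features (weight and ear length) and all numbers are made up; a real cat-or-dog classifier would work on image pixels.

```python
import math

# Toy labelled examples: X = (weight in kg, ear length in cm), y = label.
# The features and numbers are invented for illustration.
X_train = [(4.0, 6.5), (3.5, 6.0), (5.0, 7.0), (20.0, 10.0), (30.0, 12.0), (25.0, 11.0)]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]


def fit_nearest_neighbour(X, y):
    """'Learn' f from examples: f remembers the labelled data and copies the closest label."""
    def f(x_new):
        distances = [math.dist(x_new, x) for x in X]
        nearest = distances.index(min(distances))
        return y[nearest]
    return f


f = fit_nearest_neighbour(X_train, y_train)
print(f((4.2, 6.4)))    # a small animal, close to the cat examples
print(f((28.0, 11.5)))  # a large animal, close to the dog examples
```

Nothing in `f` was hand-written as a rule about cats or dogs; its behaviour comes entirely from the examples it was given.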
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Machine Learning.
▪ It is easy to get caught up on the idea that you only need technical skills to solve problems using Machine Learning.
▪ The reality is that you’ll have a hard time getting very far if you only think of the problem in front of you in terms of just numbers and algorithms.
Skills of Data Science
▪ Machine Learning.
▪ Danger Zone
X → f → y
X (inputs):
▪ Blood Pressure
▪ Heart Rate
▪ Age
▪ Diagnosis
▪ Lab Tests (white blood cell count)
y (output):
▪ Risk of Sepsis
98% Accuracy
Danger Zone pitfalls:
▪ Missing knowledge of the data.
▪ Missing knowledge of how the model would be implemented.
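One way a headline accuracy figure can mislead when you do not know the data is class imbalance. The slide does not say this is what happened in the sepsis example, so treat the sketch below as an assumed scenario: a hypothetical cohort where only 2% of patients develop sepsis, and a "model" that never predicts sepsis at all.

```python
# Hypothetical cohort: 1000 patients, 20 of whom develop sepsis (2% prevalence).
y_true = [1] * 20 + [0] * 980

# A "model" that ignores every input and always predicts "no sepsis".
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of true sepsis cases that the model actually catches.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"Accuracy: {accuracy:.0%}")
print(f"Recall on sepsis cases: {recall:.0%}")
```

Under these assumptions the model scores 98% accuracy while detecting none of the patients it was built to find, which is why domain knowledge about the data matters before trusting a metric.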
Drew Conway’s Venn Diagram of Data Science.

Skills of Data Science


▪ Machine Learning.
▪ Danger Zone
▪ Solutions:
▪ Take the time to learn about the domain.
▪ Approach problems where you have domain knowledge (Civil & Mineral Engineering).
▪ Partner with domain experts.
Buzz-Word Overload!
Buzz-Word Overload
What exactly do all of these mean?
▪ Data Science
▪ Data Analytics
▪ Big Data
▪ Data Engineering
▪ Artificial Intelligence
▪ Machine Learning
▪ Deep Learning
Drew Conway’s Venn Diagram of Data Science.

Buzz-Word Overload
▪ Data Science
▪ An inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data.
Buzz-Word Overload
▪ Data Analytics
▪ Data Analytics is often
conducted with a specific goal
in mind.
▪ With Data Analytics,
information is often split into
two groups: what companies
know and what they are
aware that they do not know.
▪ Employing Data Analytics, a
company can sort through
data to find specific insights
targeted to its needs and
goals.
Buzz-Word Overload
▪ Big Data
▪ Big data is a term that describes
the large volume of data that
inundates a business on a day-
to-day basis.
▪ Four V’s of Big Data:
▪ Volume
▪ Velocity
▪ Variety
▪ Veracity

Buzz-Word Overload
▪ Data Engineering
▪ The development,
construction, testing
and maintenance of
architectures, such as
databases and large-
scale processing
systems.
▪ Data Engineers
transform data into a
useful format for
analysis.
Buzz-Word Overload
▪ Artificial Intelligence (AI)
▪ Any technique that enables
computers to execute tasks in an
intelligent manner:
▪ Robotics
▪ Expert Systems
▪ Natural Language Processing
▪ Machine Learning
“Just as electricity transformed almost everything 100 years ago, today
I actually have a hard time thinking of an industry that I don’t think AI
will transform in the next several years.”
- Andrew Ng
Buzz-Word Overload
▪ Machine Learning (ML)
▪ The science of getting computers to
act intelligently by learning from
examples and without being explicitly
programmed.
▪ Applications:
▪ Fraud Detection
▪ Spam Filtering
▪ Netflix and Amazon recommendations
▪ Models:
▪ Linear Regression
▪ Decision Trees
▪ Random Forests
▪ Neural Networks
Buzz-Word Overload
▪ Deep Learning (DL)
▪ A sub-domain of machine
learning that uses deep neural
networks.
▪ Applications:
▪ Speech Recognition (Siri, Cortana,
Alexa)
▪ Natural Language Processing
(Google Translate)
▪ Face Recognition (iPhone X)
What does it mean to be a
Data Scientist today?
What does it mean to be a Data Scientist
▪ Asked people involved with Data Science to complete an online survey.
▪ Self-reported → selection bias.
▪ First survey:
▪ 983 Respondents.
▪ 2016.
▪ Survey Bias: More ML focused.
▪ Second survey:
▪ 19,717 Respondents.
▪ 2019 (more recent).
▪ Survey Bias: More ML focused.
▪ Charts focus on 21% with Data Scientist title.
Country
▪ The largest number of
responses to the
survey were from the
United States and
India. Brazil and
Russia were the next-
most common
locations. Countries
not shown (such as
many in Central
Africa) had no
responses.
Remember, self-reported → selection bias.
Country
▪ California has the
highest median salary
of any state or
country, even though
its per capita GDP
($62K) does not rank
as high.
▪ The anomaly is likely
due to the San
Francisco Bay Area,
where per capita GDP
is $80K–$90K.
Remember, self-reported → selection bias.
Education
▪ The data scientist
community is highly
educated.
▪ Looking at only
employed data
scientists, over 70% of
respondents have a
degree above a
bachelor’s degree,
with a majority
(~52%) having a
master’s degree.
Remember, self-reported → selection bias.
Age
▪ Millennials dominate
data science, with
25-29-year-olds being
the most common age
bracket.
Salary
▪ United States Data
Scientists average
higher wages than
others surveyed,
followed by Germany
and Japan.

Remember, self-reported → selection bias.




Activities
▪ What do users say is
the most common
duty of being a data
scientist?
▪ More than complex
machine learning,
over 75% suggested
understanding and
analyzing the data is a
common activity.
Activities
▪ Less than half of respondents had major
involvement in ML activities.
▪ BUT! Developing prototype models had the largest
impact on salary.
What you’ll learn in CIV1498
Programming Languages
▪ Python and R are the
most popular
scripting languages
for performing most
Data Science tasks.
▪ SQL is a database
query language and
because most data is
stored in SQL
databases, SQL is
required for
extracting data.
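As a rough illustration of extracting data with SQL from Python, the sketch below uses the standard-library `sqlite3` module with an in-memory database. The `core_samples` table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical table of drill-core samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_samples (hole_id TEXT, depth_m REAL, rock_type TEXT)")
conn.executemany(
    "INSERT INTO core_samples VALUES (?, ?, ?)",
    [
        ("DH-01", 12.5, "granite"),
        ("DH-01", 48.0, "basalt"),
        ("DH-02", 30.2, "granite"),
        ("DH-02", 75.9, "schist"),
    ],
)

# Extract only the rows needed for analysis: granite samples, shallowest first.
# The "?" placeholder passes the value safely instead of pasting it into the query string.
rows = conn.execute(
    "SELECT hole_id, depth_m FROM core_samples "
    "WHERE rock_type = ? ORDER BY depth_m",
    ("granite",),
).fetchall()
conn.close()

print(rows)
```

The SQL query does the filtering and sorting inside the database, so Python only receives the rows it actually needs.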
Dev Tools
What you’ll learn in CIV1498
▪ The most common
analytics tools are by
far local development
environments.
▪ Out of those,
JupyterLab and its
offshoots are the most
common, with 83% of
data scientists using it
on a regular basis.
Algorithms
What you’ll learn in CIV1498
▪ Respondents are big
fans of keeping it
simple.
▪ The most common
methods are linear or
logistic regression,
followed by decision
trees.
▪ While not as powerful
as more complex
techniques, they can
still be quite effective
and are easier to
interpret.
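A small sketch of why a simple linear model is easy to interpret: with one feature, ordinary least squares reduces to a slope and an intercept that can be read directly. The payload and fuel numbers below are invented, and the closed-form formula is written out by hand rather than taken from any particular library.

```python
# Made-up data: truck payload (tonnes) vs. fuel burned per trip (litres).
payload = [100.0, 150.0, 200.0, 250.0, 300.0]
fuel = [52.0, 61.0, 72.0, 80.0, 91.0]

n = len(payload)
mean_x = sum(payload) / n
mean_y = sum(fuel) / n

# Ordinary least squares for a single feature:
#   slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(payload, fuel)) / sum(
    (x - mean_x) ** 2 for x in payload
)
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted fuel use (litres) for a payload of x tonnes."""
    return intercept + slope * x


print(f"fuel ≈ {intercept:.1f} + {slope:.3f} × payload")
print(f"Predicted fuel for 220 t: {predict(220.0):.1f} L")
```

The fitted slope reads directly as "extra litres of fuel per extra tonne of payload", which is the kind of statement that is much harder to extract from a more complex model.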
Frameworks
What you’ll learn in CIV1498
▪ As for the machine
learning frameworks
used to employ their
techniques, data
scientists use multiple
tools.
▪ Over 80% use Scikit-
learn, a Python package
containing popular data
science algorithms.
▪ TensorFlow and Keras,
often used in
combination, continue
to be the dominant
deep learning
frameworks.
Potential, concerns, and a lot
of hype!
A lot of hype
▪ Gartner Hype Cycle
methodology gives you a
view of how a technology or
application will evolve over
time, providing a sound
source of insight to manage
its deployment within the
context of your specific
business goals.
A lot of hype
▪ Massive Open Online Courses (MOOCs).
[Figure: MOOC timeline.]
▪ Open Education
▪ 2008: CCK09, first MOOC
▪ 2012: edX, Coursera, Udacity, Udemy
▪ 2012: The year of the MOOC
▪ 2013: Year of the anti-MOOC
▪ 2013-2015: Increase in MOOC research
▪ 2015: MOOCs as catalyst for higher education
▪ 2015+: MOOCs evolution
A lot of hype
▪ The glamorized headlines are
only part of the picture.
▪ We need to focus on the
realistic applications of AI and
Data Science that tell the full
story.
▪ When thinking about AI and
Data Science in the short-term,
keep in mind that AI takes years
to design and even more time
to perfect, especially due to the
ever-changing and
accumulating nature of data.
Real Concerns
▪ Reshaping the labor market:
▪ Automation will create new jobs but kill many
more.
▪ Obscuring complex decisions:
▪ Mortgage-backed securities → market crash.
▪ Teaching scores & job advancement.
▪ Reinforcing historical biases:
▪ Hiring based on previous hiring data.
▪ Police use of facial recognition AI.
▪ Polarization through AI-created echo
chambers:
▪ Social media, news, elections (2016).
▪ We will discuss the ethics of Data Science
throughout the class.
Real Potential
▪ Data Science is changing the
world.
▪ We interact with Data Science
technology multiple times a
day without even knowing it.
▪ Recommendations
▪ Authentication
▪ Customer Service Chatbots
▪ Targeted Advertising
▪ Airline Route Planning
▪ Medical Imaging
Real Potential
▪ Data Science is already
having a profound impact
on human life.
▪ There are many exciting
opportunities to have real
impact in diverse industries
around the world.
▪ Particularly for engineers!
Data Science and
Civil & Mineral Engineering.
Applications in Civil & Mineral Engineering
▪ The most visible applications of Data Science are in the tech,
ecommerce, advertising, and social media space.
Applications in Civil & Mineral Engineering
▪ Data Science is an essential skill for Civil & Mineral Engineers
in the 21st century.
▪ Why?

The Cloud | Cheap Sensors | Artificial Intelligence | Big Data | The Internet of Things
Applications in Civil & Mineral Engineering
▪ Civil Engineering:
▪ Construction site safety.
▪ Generative design.
▪ Predict cost overruns.
▪ Assessing indoor air quality.
▪ Autonomous machinery (cranes,
loaders, haul trucks).
▪ Project planning and tracking.
▪ Predictive maintenance.
▪ Traffic Optimization.
▪ Ride share planning.
▪ Water planning and management.
Applications in Civ & Min
▪ Mineral Engineering:
▪ Mineral exploration.
▪ Generative mine design.
▪ Equipment health monitoring.
▪ Mine site safety.
▪ Autonomous vehicles.
▪ Deep mine hazard assessment.
▪ Assisted core logging.

[Figure: Prospectivity Map.]
[Figure: equipment sensor channels: Oil Pressure & Temp, Operator, Load, Air Flow, RPM.]
"A single missing tooth can result in productivity losses of USD $430k per incident"
ML Applications
Opportunities in Civil & Mineral Engineering
▪ Domain knowledge is essential for successful AI.
# Import 3rd party dependencies
from keras import applications
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

# Load dataset (extract_transform_load_dataset is a placeholder helper, not defined here)
x_train, y_train, x_test, y_test, classes = extract_transform_load_dataset()

# Set data parameters
image_height, image_width = 64, 64
num_classes = 6

# Get ResNet50 graph
base_model = applications.resnet50.ResNet50(weights=None, include_top=False,
                                            input_shape=(image_height, image_width, 3))

# Add custom top
out = base_model.output
out = GlobalAveragePooling2D()(out)

# Add softmax layer corresponding to the number of class labels in our dataset
out = Dense(num_classes, activation='softmax')(out)

# Define model
model = Model(inputs=base_model.input, outputs=out)

# Compile model (assumes y_train and y_test are one-hot encoded)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model on images (x_train) and their labels (y_train)
model.fit(x_train, y_train, epochs=100, batch_size=64)

# Evaluate model (returns the loss and the accuracy metric)
loss, accuracy = model.evaluate(x_test, y_test)

# Save model weights (recent Keras versions expect the '.weights.h5' suffix)
model.save_weights('model.weights.h5')
Opportunities in CivMin

▪ Industry is waking up to the potential of Data Science.
▪ The CivMin-Tech community is welcoming, vibrant, and growing.
▪ Meetups
▪ Hackathons
▪ Startup Incubators
▪ Competitions #BIGMONEY
[Figure: $1,000,000 Investment. Panel captions: Owner of McEwen Mining Inc.; President of Cisco Systems Canada; COO of Goldcorp; CEO of Franco-Nevada Corporation; SHARK on ABC’s Shark Tank.]
Opportunities in Civil & Mineral Engineering
▪ You’re early.

You’re Here
Data Science = Essential CivMin Skill
▪ You will need to be able to access, clean, explore, and interpret
data.
▪ Even if you’re not going to be the one building ML models, you
NEED to be able to speak the language.

The Cloud | Cheap Sensors | Artificial Intelligence | Big Data | The Internet of Things
Applications in Civil & Mineral Engineering

You’re at ground zero for the CivMin Technology Revolution, and it’s just getting started.
- lucky you! ;)
CME538, about this course.
Teaching Team
▪ Sebastian Goodfellow (sebastian.goodfellow@utoronto.ca)
▪ Navid Kayhani (navid.kayhani@mail.utoronto.ca)
▪ Marc Saleh (marc.saleh@mail.utoronto.ca)
▪ Soroush Sobhkhiz (s.sobhkhiz@mail.utoronto.ca)

Seb Navid Marc Soroush


Who is this course for?
▪ Civil & Mineral Engineering Undergraduate and Graduate students
with minimal to no Data Science exposure.
▪ Students are expected to have completed these CivMin
undergraduate courses or equivalent:
▪ MAT186 - Calculus I
▪ MAT187 - Calculus II
▪ MAT188 - Linear Algebra
▪ APS106 - Fundamentals of Computer Programming
▪ CME261 - Engineering Mathematics I
▪ CME263 - Probability Theory for Civil and Mineral Engineers
▪ Students are expected to be familiar with standard coding concepts
and Python syntax (APS106).
▪ Kaggle - Python
Learning Objectives
▪ Gain a high-level understanding of the various domains that make up the
Data Science landscape.
▪ Develop an intermediate level of proficiency with Python and common
Data Science libraries.
▪ Develop a solid grounding in core Data Science areas including data
collection and cleaning, EDA, visualization, and applied machine learning.
▪ Learn how to navigate through the Data Science lifecycle: Get Data, Clean
Data, Analyze Data, Visualize Data, Answer Questions with Data, Predict
with Data.
▪ Learn how to tell stories with data. #communication
▪ Be prepared to enter more advanced Machine Learning courses
offered by MIE, ECE, and CS.
▪ Empower students to apply learnings from this course to their research,
other courses and beyond to address real-world engineering problems.
Educational Philosophy
▪ Understand the basics and you can apply that understanding to
more complicated problems, models, and tools.
▪ Linear Regression and Logistic Regression
▪ Reinforcement Learning and XGBoost
▪ Real Data Science is messy!
▪ 80/20 rule.
▪ train.csv + test.csv → model.fit(x_train, y_train) → model.predict(x_test)
▪ This isn’t real life.
▪ We spend a lot of time focusing on EDA and visualization.
▪ We focus on the basics to give you a solid foundation to
continue your learning.
▪ This course is just the start of your Data Science education.
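To illustrate the point that real Data Science is messy, here is a minimal sketch of the cleaning that usually has to happen before any `model.fit` call. The CSV content, station names, and column names are invented, and the cleaning rules (drop unreadable values, normalise names, drop duplicates) are just one reasonable set of assumptions.

```python
import csv
import io

# A small, deliberately messy CSV (invented values): a blank reading, a
# non-numeric placeholder, inconsistent capitalisation, and a duplicate row.
raw = """station,flow_m3s
Don River,12.4
don river ,12.4
Humber River,
Humber River,n/a
Humber River,18.9
Rouge River,7.1
"""


def clean(text):
    """Return de-duplicated (station, flow) pairs with valid numeric flow values."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        station = row["station"].strip().title()  # normalise whitespace and case
        try:
            flow = float(row["flow_m3s"])
        except ValueError:
            continue  # skip missing or non-numeric readings
        if (station, flow) not in seen:
            seen.add((station, flow))
            cleaned.append((station, flow))
    return cleaned


records = clean(raw)
print(records)
```

Half of the six raw rows are dropped before any modelling starts, which is a small-scale version of why so much project time goes into cleaning and exploring data.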
My Motivation
▪ I wanted to develop this course to
make Data Science skills accessible to
Civil and Mineral Engineering
students.
▪ Having spent the past 5 years in the
private sector, it’s clear to me that
Data Science is an essential skill for
Civil and Mineral engineers.
▪ There are many other Data Science
courses at UofT but they are often too
advanced for many CivMin students.
Term Schedule
▪ Week 1
▪ 1.1 Course introduction
▪ 1.2 Data Science toolbox
▪ 1.3 Python basics review
▪ Week 2
▪ 2.1 Working with DataFrames in Pandas I
▪ 2.2 Working with DataFrames in Pandas II
▪ 2.3 Working with DataFrames in Pandas III
▪ Week 3
▪ 3.1 Exploratory data analysis
▪ 3.2 Importing data from different sources
▪ 3.3 Cleaning Data
▪ Week 4
▪ 4.1 Working with text and datetimes
▪ 4.2 Data visualization I
▪ 4.3 Data visualization II
▪ Week 5
▪ 5.1 Time series data
▪ 5.2 Geospatial data
▪ 5.3 Introduction to machine learning
▪ Week 6
▪ 6.1 Loss functions
▪ 6.2 Gradient descent I
▪ 6.3 Gradient descent II
Term Schedule
▪ Week 7
▪ 7.1 Regression
▪ 7.2 Feature engineering
▪ 7.3 Generalization
▪ Week 8
▪ 8.1 Classification I
▪ 8.2 Classification II
▪ 8.3 Cross-validation and class imbalance
▪ Reading Week
▪ Week 9
▪ 9.1 Diagnosing bias and variance
▪ 9.2 Regularization
▪ 9.3 Feature selection and hyper-parameter tuning
▪ Week 10
▪ 10.1 Feature extraction
▪ 10.2 Clustering
▪ 10.3 Random Forests
▪ Week 11
▪ 11.1 Bias and variance revisited
▪ 11.2 MLOps
▪ 11.3 AI Ethics
▪ Week 12
▪ Project presentations
Weekly Schedule
Lectures
▪ 3 hours!!!?? Ouch!
▪ Live Lectures: Mondays, 12-3 pm
▪ Content Release: Mondays, 1 am
▪ Lecture Demo (link)
Tutorials
▪ Content Release: Mondays, 1 am
▪ Tutorial Demo (link)
Assignments (35%)

Code Quality
Assignment Demo (link)
Project 1 (25%)
▪ In this project, you'll
compete against your
classmates in a private
CME538 Kaggle machine
learning competition.
▪ We will be releasing more
information on this project
over the coming weeks.
▪ The competition will start
on November 14th and
close on November 22nd.
▪ All code will be checked for
plagiarism.
Project 1 (25%)

Assignment 8 (Optional)
▪ Option 1
▪ Project 1 is worth 25%.
▪ Option 2
▪ Project 1 is worth 20%.
▪ Assignment 8 is worth 5%.
Project 2 (40%)
▪ In this project, you'll take everything you've learned in lectures,
tutorials, assignments and project 1, and venture out on your own.
▪ Teams of 4 (Teams must be assembled by Friday, October 7th).
▪ You have a lot of freedom with how you choose to approach the
project and with what libraries, visualizations, and models you
choose to use.
▪ Start Date: Now.
▪ Due Date: December 7th.
Project 2 (40%)
▪ Project Topic
▪ Option 1: Choose your own dataset,
problem, question.
▪ Option 2: Use the bikeshare dataset.
Project 2 (40%)
▪ Deliverables
▪ Project Proposal (Due November 4th, 10 marks)
▪ Medium Article (Due December 7th, 15 marks)
▪ Good "The bar chart below shows a significant drop in rides on February 17.
Further investigation revealed this was the date of the famed 2006 blizzard, which
shut down the city. A newspaper article linked here documents over 60 cm of
snowfall and widespread power outages which resulted in the military being
called in."
▪ Bad "The bar chart below shows a drop in rides in February."
▪ Live Presentation (Week 12, 10 marks)
▪ Publish GitHub Repository (Due December 7th, 5 marks)
▪ Your code and project structure will be evaluated.
▪ Ensure you're writing #cleancode and that anyone can easily reproduce your
results.
Industry Spotlights
Structure
▪ Lectures
▪ High-level overview of concepts and methods.
▪ Tutorials
▪ Step-by-step code-along.
▪ Assignments
▪ Individual work where students are guided through a problem and must fill in answers.
▪ Project 1
▪ Test your ML skills.
▪ Project 2
▪ Completely open end-to-end Data Science project.
▪ Industry Spotlights
▪ Provide context (Why are you learning this stuff?).
[Callout: Independence, Confidence & Context.]
This is going to be a lot of fun.
