• Name one problem you would like to solve with data science
In-class rules
• Active participation is highly encouraged; no pressure though.
• Interrupt and speak up if you have questions (no need to raise
hands)
What is data science?
[Figure: diagram with the labels Input Data, Output Data, Data Science, and Model]
Example – chemical engineering
You know all the gene expression pathways, and use this
knowledge to predict a phenotype based on a gene sequence;
this is not a data science approach.
You have a lot of gene sequence and phenotype data, and you
develop a model based on the data to predict which gene(s)
control a certain phenotype, and how; this is a data science
approach.
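The contrast above can be made concrete with a minimal sketch that uses synthetic data only. Everything in it is an illustrative assumption: the gene names, the toy rule tying `gene_B` to the phenotype, and the helpers `make_synthetic_data` and `association_score` are hypothetical. A simple difference in phenotype rate stands in for whatever model one would actually train.

```python
import random

def make_synthetic_data(n_samples, n_genes, causal_gene, seed=0):
    """Generate toy binary genotypes and phenotypes (assumed rule, not real biology)."""
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    for _ in range(n_samples):
        genotype = [rng.randint(0, 1) for _ in range(n_genes)]
        # Assumption: the phenotype follows the causal gene 90% of the time.
        if rng.random() < 0.9:
            phenotype = genotype[causal_gene]
        else:
            phenotype = 1 - genotype[causal_gene]
        genotypes.append(genotype)
        phenotypes.append(phenotype)
    return genotypes, phenotypes

def association_score(genotypes, phenotypes, gene_index):
    """Absolute difference in phenotype rate with vs. without the gene variant."""
    with_variant = [p for g, p in zip(genotypes, phenotypes) if g[gene_index] == 1]
    without_variant = [p for g, p in zip(genotypes, phenotypes) if g[gene_index] == 0]
    if not with_variant or not without_variant:
        return 0.0
    rate_with = sum(with_variant) / len(with_variant)
    rate_without = sum(without_variant) / len(without_variant)
    return abs(rate_with - rate_without)

GENE_NAMES = ["gene_A", "gene_B", "gene_C", "gene_D"]  # hypothetical names
genotypes, phenotypes = make_synthetic_data(500, len(GENE_NAMES), causal_gene=1)

# No pathway knowledge is used: genes are ranked purely from the data.
scores = {
    name: association_score(genotypes, phenotypes, i)
    for i, name in enumerate(GENE_NAMES)
}
top_gene = max(scores, key=scores.get)
print("Most strongly associated gene:", top_gene)
```

Under the toy rule, the gene that was made causal should come out on top of the ranking, even though the code never encodes which gene that is. This only shows association; answering the "and how" part would need a richer model.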
Quiz
• Describe how you would tackle the following problem using a
data science approach
ASKCOS (MIT)
RXN4Chemistry (IBM)
……
Other examples in engineering
Other examples in life sciences
Antibody discovery
What this course is about…
Intended learning outcomes:
• 1. Identify problems that can be formulated as a data science problem
• 2. Process different types of data to be ready for model training
• 3. Understand the principles of supervised learning methods
• 4. Perform model training, validation and testing
• 5. Clearly interpret model predictions and present model results
• 6. Know how data science methods are applied to problems in
molecular science
Assessment
• Homework 20%
• Final Exam 40%
• Course project presentation 25%
• Literature critique 15%
Homework rules
• Collaboration is okay, but it must be acknowledged
• Please submit on time. Late submissions must be requested
24 hours in advance with justification; otherwise points will be
deducted.
Course project
• Choose a data science problem (preferably in molecular
engineering)
• Define the problem
• Collect data
• Choose/develop models
• Train and evaluate models
• Analyze and present the results/findings
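The project steps above can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not a project template: the data are synthetic (a noisy line, `y = 2x + 1`), the model is a one-variable least-squares fit, and all function names (`collect_data`, `train_test_split`, `fit_line`, `mean_squared_error`) are hypothetical helpers written for this example.

```python
import random

# Define the problem: predict a numeric property y from one descriptor x.

def collect_data(n, seed=0):
    """Collect data (here: synthetic points from an assumed noisy line)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

def train_test_split(xs, ys, test_fraction=0.2, seed=0):
    """Shuffle the indices and hold out a fraction of the data for testing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(xs) * test_fraction)

    def pick(idx):
        return [xs[i] for i in idx], [ys[i] for i in idx]

    return pick(indices[n_test:]), pick(indices[:n_test])

def fit_line(xs, ys):
    """Choose and train a model: closed-form simple linear regression."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mean_squared_error(xs, ys, slope, intercept):
    """Evaluate the model: average squared prediction error."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = collect_data(200)
(train_x, train_y), (test_x, test_y) = train_test_split(xs, ys)
slope, intercept = fit_line(train_x, train_y)

# Analyze and present: report the fit and compare train and test error.
train_mse = mean_squared_error(train_x, train_y, slope, intercept)
test_mse = mean_squared_error(test_x, test_y, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"train MSE={train_mse:.3f} test MSE={test_mse:.3f}")
```

The model is fitted on the training portion only and scored on held-out data, so the test error reflects how it behaves on points it has not seen.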
Literature Critique
• Numerous papers are published every day on machine learning
applied to chemistry- and biology-related problems
• Not all of the applications of machine learning methods are
appropriate – even when they are published!
• Pick a paper and analyze its limitations and what could be
improved (an example will be provided in a future lecture)
Course Format
• Concepts followed by interactive coding sessions
• Bring laptops to Friday lectures
• Tutorials
• Will be arranged as needed, based on the progress of the course
• TA will be available to answer questions about homework and practice exercises
Weekly schedule
Part 0. Introduction
Week 1 Real-world applications of data science in physical, chemical and life sciences