Professional Documents
Culture Documents
Students will learn the methodology in lectures, sections, and individual homeworks. They will
then use it to carry out data analyses through individual and group projects. After projects are
completed, there will be structured opportunities for peer-to-peer discussion and critique.
1
Objectives: To develop a sense for practical applications of statistical thinking and tools in biomed-
ical sciences. To learn how to make and defend statistical modeling choices, and to constructively
critique the analyses of others. To develop a working knowledge of the R programming language,
as relevant for biomedical data science applications.
Prerequisites:
Required: Stat 110 and (AP Stat or 102 or 104)/111/139.
Recommended: Stat 115.
Lectures and Sections: Lectures will cover general concepts and project critiques. Sections will
cover hands-on examples and programming. Attendance is mandatory for both.
One-hour weekly sections will begin the second week of the course. Sections will be the place to
work through examples, review lecture material in a more detailed way, solve problems, and learn
how to use the statistics packages for implementing the various methods taught in the course.
Table 2 is a draft lecture-by-lecture outline of material covered in the course. Of course we will
leave some wiggle room and make variations.
Homework: The workload will consist of four homework assignments and three projects. Home-
work will help you master analysis and conding skills to carry out the projects.
Homework assignments will be made available on the course web page. You will be told when the
assignment is posted online. Homework solutions are due by 4:00pm on the due date. You are
free to discuss and work on homework problems with other students, but you should write up your
solutions independently (see the collaboration policy statement).
The official course policy is that no late homework will be accepted. In return for your timely
submission of homework, we will make every effort to return graded homework promptly. See
Table 2 for key dates.
Project Assessment and Grading. The course requires three projects. See Table 2 for key
dates.
Projects will be graded by the TF on the same scale as the homework. Separately, projects will also
be reviewed by a peer group, and some will be discussed in class. After projects are turned in, we
will form mini-groups (4-5 students) to discuss each other’s projects. Each group will hold a guided
discussion of project results, and nominates one project for class discussion in the next lecture. The
instructor will review nominated projects and choose two or three. No additional credit will be
given for this selection as they will be identified for controversy/discussion value as much as quality.
2
Class Date Methodological Topic Key Dates
Jan 28 L1 Course introduction
Jan 30 L2 Gene Expression Biomarkers; OCa Data
Feb 4 L3 Biomarker Modeling and Evaluation
Feb 6 L4 Decision Analysis
Feb 11 L5 Bayesian Analysis
Feb 13 L6 MCMC Homework 1 Due on Feb 13th
Feb 18 L7 Time-to-event Data
Feb 20 L8 Last minute Q&A with AW Project 1 Due
Feb 25 L9 Comparing Criteria for Biomarker Discovery
Feb 27 L10 False Discovery Rates
Mar 3 L11 Project 1 Discussion
Mar 5 L12 Empirical Bayes FDR Homework 2 Due Mar 5th
Mar 10 L13 Biomarker Validation
Mar 12 L14 Biomarker Meta-analysis Take-home midterm exam on Mar 11,12,13
SPRING BREAK
Mar 24 L15 MCMC for Hierarchical Models, Binary
Mar 26 L16 MCMC for Hierarchical Models, Continuous
Mar 31 L17 Joint versus Conditional Models
Apr 2 L18 Prediction of Future Studies Homework 3, Due Apr 4
Apr 7 L19 Top Scoring Pairs
Apr 9 L20 Clipping vs LDA vs TSP vs Logistic Project 2 Due
Apr 14 L21 Project 2 Discussion
Apr 16 Cross-Study Validation Homework 4 Due
Apr 21 Class Presentations for Final Projects
Apr 23 Class Presentations for Final Projects
Apr 28 Class Presentations for Final Projects
Final Project Due May 9th
The final project will be due sometimes after the end of classes. Students will make a class presen-
tation of the final project’s results. We will set aside three lecture slots at the end of the semester
for this. The final project will incorporate any feedback, and a discussion of how it was addressed.
Additional more specific instructions will be given when the final project is posted.
Exam: The course will have one take-home midterm exam during the semester. The midterm will
test methodological concepts from the first half of the course. Students will be required to take it
online in a three-hour session within a 24-hour window.
3
Grades: Course grades will be determined by the following components, with the weights shown:
Class participation will be evaluated based on contributions to the peer-review process and class
discussions.
Computing: The computer package for this course is R, which runs on Windows, MacOS, and
Linux systems, and can be used natively or within the Rstudio environment. R is free and available
online through www.r-project.org. The free version of Rstudio is at
https://www.rstudio.com/products/rstudio/download/ R is straightforward to learn, but is
sufficiently powerful and versatile to be useful for many projects that you might carry out after
this course. Reference guides on R and on the specific R packages required by the projects will be
placed on the course web site. No prior knowledge of R is needed, general concepts and examples
will be provided, but a fair amount of independence is expected in figuring out details. The online
resources are very helpful.
Documentation of your analyses will be through the R markdown language within RStudio.
Course Materials There is no single textbook for the course. Materials will include general
references on computing and modeling, such as,
as well as articles or book chapters specific to the individual topics. All course materials will be
free and available through links on the course website.
I will make every effort to make lectures available in real time for tablet note-taking.
Logistics: A detailed schedule, including due dates for homework and projects, is in Table 2.
All course documents, including homeworks, supplementary material, links to data and software,
will be available on the course web site on canvas. Students will receive course announcements
through canvas e-mail.
4
of the term. Failure to do so may result in us being unable to respond in a timely manner. All
discussions will remain confidential.
Collaboration policy statement: University policies against plagiarism will be strictly enforced.
You are encouraged to discuss problem sets with your classmates, but each student must write up
and code solutions separately. Be sure that you have worked through each problem yourself and
that the answers you submit are the results of your own efforts. You also may not share or view
another student’s computer code, submit output from another student’s computer session, or allow
another student to view your code or output. A good rule of thumb: if a fellow student asks if you
would like to discuss a homework problem, we encourage you to say “yes”; if a fellow student asks
to see your answer to a homework problem or R code, the answer is “no.”