Hello and welcome to Data Analysis. I'm Jeff Leek, and I'll be teaching the course.

I'm making this video to introduce myself and to tell you a little bit about where we will be going over the next eight weeks together. First, a little bit about me. I grew up in Pocatello, Idaho. I went to college at Utah State University. Then I moved west and got an MS and a PhD in Biostatistics at the University of Washington in Seattle. For my dissertation, I worked on statistical methods for removing hidden variables from large-scale genomic data sets. While at UW, I also met and married my wife, who is also a statistician. After my PhD, I did two brief postdoctoral fellowships, one in Stem Cell Biology and another in Computational Biology, before landing here in the Biostatistics Department at the Johns Hopkins Bloomberg School of Public Health.

My current research focuses on analyzing the data produced by the latest genomic technologies, such as microarrays or next-generation sequencers. These data can be used to help understand natural biological variation between human beings and the biochemical progression of disease, or to try to predict clinical outcomes, like how people will respond to treatment. One of the coolest things about genomics is that most of the large data sets being produced are available for free from sites like the Gene Expression Omnibus, ArrayExpress, or the Short Read Archive. But the data are large, noisy, and sensitive to errors, so they require careful data analysis, which keeps me busy. My interest in data and statistics also led me to start a blog called Simply Statistics, which focuses on the most pressing issues in data and statistical analysis. So, that's a little bit about me.

Now, on to data analysis. The course will consist of around two hours of video lectures per week. My goal is to keep the lectures short so that you can get through them quickly. There will also be a quiz each week, and there will be two longer data analysis projects, which will be peer graded using a data analysis rubric we will go over during the class. These projects will run during Weeks 3 to 4 and Weeks 6 to 7 of the course.

Data analysis lies at the intersection of several disciplines, including Statistics, Computer Science, English composition, and business or scientific applications. There are no formal prerequisites for the course, but all data analyses will be performed in R using the RStudio editor. You will use R and RStudio because, one, it is free; two, it's the most popular language for data analysis; three, code is the best way to document and communicate components of a data analysis to other people; and four, there is a large user community that is available for support outside of the course, along with the support that you will get inside the course.

We'll start the course by focusing on the structure of a data analysis problem and how to frame data analysis questions. Then we will cover how to obtain, clean, and process data using R. We will cover the tools of exploratory data analysis, such as data visualization, clustering, and principal components. We'll also cover basic methods for statistical inference using linear models, confidence intervals, and basic hypothesis testing. The course will then move on to cover the fundamentals of prediction, including study design, cross-validation, and a few basic prediction models. We will also cover approaches for simulating data, for modeling predictions, and for evaluating models and predictions. We'll finish up with a discussion of confounding, multiple testing, and knowing when to quit, or at least be suspicious, when performing a data analysis. This can be one of the hardest parts of the whole process. While we cover the basics of data analysis, I will try to point out resources for further in-depth study.
I'm excited about teaching you the fundamentals to get you started, but it is worth keeping in mind that eight weeks won't be nearly enough time to become an expert data analyst. Just like eight weeks isn't enough to become an expert writer or surgeon. There are no data analysis prodigies. The only way to become a good data analyst is to gain experience through practice, repetition, and learning from mistakes. I hope this class will give you a good start down the road and inspire you to learn about the data you care about most. See you on the message boards.