
Abstract

Course: Solving Problems in Statistics and Data Science

Course Description
Past surveys of students indicate that most join AIMS with little understanding or
experience of statistics or data science, and with little appreciation of the importance
of these subjects. This course is designed to address this gap. Its main aim is to
introduce students to both subjects and to show their roles in solving real-world problems.
Students are exposed to the broader processes within data science and statistics, including
the design, collection, management and analysis of data, and the interpretation and
presentation of results. The overall objective is to equip learners with the skills and
understanding needed to use data and statistical methods to solve problems. The course uses
case studies with real datasets, through which students experience each step of the
problem-solving process. The skills taught in this course are useful in a wide range of
disciplines and are in high demand in the real world.

Learning Outcomes
By the end of the course, students are expected to:
1. Understand the role of data science and statistics in solving real problems with data
2. Be comfortable with producing and interpreting simple descriptive tools
3. Understand the broader processes within statistics from design and collection to
interpretation and presentation of results
4. Work on real-world problems and research-related projects that have genuine complexity
(without single “correct” answers) and that are adapted to the level and experience of
the students involved.

Course Content
The course is organised into three weeks, as shown below. The materials for the first two
weeks mainly comprise lectures, tutorial sessions and practical work. Quizzes and practical
assignments are used to help students master the skills and concepts. The final week involves
a mini-project in which students work in groups on a problem from beginning to end,
culminating in a presentation.
Week 1
• Broad definition of data science and statistics that includes design, collection,
organisation, interpretation and communication
• What does a data scientist or statistician do? Consultancy questions largely
concerning design.
• Two statistical games simulating an experiment and a survey - design the study,
collect data, enter data, analyse and then write a report in the style of a short paper
• Tutorials: Introduce pivot tables in a spreadsheet & a statistics package (R-Instat)
Week 2
• Working with data using a spreadsheet and a statistics package & interpreting results
• Exposing students to handling and analysing large data sets, e.g. randomised,
simulated data from a large (digital) version of the experimental “game”, with a
different dataset for each student.
• ANOVA as a descriptive tool for data analysis
• Relating the analysis to the objectives of the study
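To illustrate what "ANOVA as a descriptive tool" can mean in practice, the sketch below splits the total variation in a small dataset into a between-group and a within-group part and reports the share of variation associated with the groups. It is only an illustration under stated assumptions: it is written in Python rather than in R-Instat, the function name `anova_table` is hypothetical, and the yields are made-up numbers, not course data.

```python
from statistics import mean


def anova_table(groups):
    """Split the total sum of squares into between- and within-group parts.

    `groups` maps a group label to a list of observations.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Variation of individual observations around their own group mean.
    ss_within = sum(
        (y - mean(ys)) ** 2 for ys in groups.values() for y in ys
    )
    # Variation of all observations around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "share_explained": ss_between / ss_total,
    }


# Made-up yields for three hypothetical treatments (illustration only).
yields = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}
table = anova_table(yields)
print(table)
```

Used descriptively, the point of interest is how much of the total variation is associated with the treatments, read alongside the group means, rather than a significance test.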
Week 3
The mini-projects may include, but are not limited to, the following:
1. Corruption “red flags” in public procurement. Use open World Bank data (200,000
records from over 140 countries) and/or data from an individual country.
2. Cameroon climatic data analysis using daily data from Cameroon Met Service.
3. Analysis of timber trees using multi-level data from farms and plots.
4. Designing and administering a survey using ODK.
5. IFAD poverty survey data of 1,300 respondents and over 400 variables from 2018 in
Lesotho.
6. Moving into R. This uses a guide on how to write R commands from within R-Instat,
and how to transfer to writing scripts in R/Studio itself. See number 11 below
instead.
7. Tidy data. Based partly on a paper by Hadley Wickham and including his data from
Mexico (500,000 cases) plus messy climatic data. Also uses ideas of data wrangling
from the data science book by Rafael A. Irizarry.
8. Data from a 2017 on-farm trial of low-cost fertilisers involving 1,700 mainly women
farmers from Niger.
9. Collecting and analysing data from the “Islands”. Use of this example to illustrate
either the process or the use of the ideas in teaching, or both.
10. Machine learning using the caret package, based on materials in the data science
book by Rafael A. Irizarry.
11. Visualising data is central to both data science and statistics. What does this involve,
and how is it best done, using R, by users with differing levels of computing
competence?
12. A survey in Ghana to evaluate the PICSA project on the use of climate information to
support innovation in agricultural practices.
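As an illustration of the tidy-data idea in mini-project 7, the sketch below reshapes a small "wide" table, with one column per month, into a "long" table with one row per observation. It is a minimal sketch under stated assumptions: it is written in Python rather than R, the helper `to_long` and the column names are hypothetical, and the rainfall values are made up rather than taken from any of the datasets listed above.

```python
def to_long(rows, id_column, name_column, value_column):
    """Reshape wide rows into long rows: one output row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {
                    id_column: row[id_column],
                    name_column: column,
                    value_column: value,
                }
            )
    return long_rows


# Made-up monthly rainfall totals for two hypothetical stations.
wide = [
    {"station": "S1", "jan": 12.0, "feb": 30.5},
    {"station": "S2", "jan": 0.0, "feb": 8.2},
]

long = to_long(wide, "station", "month", "rainfall_mm")
for record in long:
    print(record)
```

In the long form each variable (station, month, rainfall) has its own column and each observation has its own row, which is the layout that most summary and plotting tools expect.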

Core References
1. Computer Assisted Statistics Textbook (CAST) https://cast.idems.international/
2. R for Data Science online book https://r4ds.had.co.nz/
3. Book: Good statistical practices for natural resources research
