
STAT 630: Advanced Data Analysis

Procedure for Project

May 29, 2020

A couple of guidelines for Projects

• Should clearly show the assignment date and your name(s).

• Should be stapled properly, or put into a folder which keeps all pages
together.

• The use of external sources (books, articles) when solving problems is
encouraged. Consult the library and/or Internet! References should be
mentioned in a reference list, following a standard scientific convention
(for example, see textbooks). Sloppy reference lists, and sloppy
citations in the text, will lead to a lower score!

• Should be turned in on time. It is possible to allow for exceptions
based on an agreement (individually or for the whole group). But
exceptions should be exceptional!

• Your performance on the project will be taken into account in your
final result.

• One aim of the Advanced Data Analysis class is to further train
students in writing (short) reports about solving a problem. Indeed, a
statistician often has to write a report for a non-technical audience.
This differs from the “question-response” style familiar from secondary
school.

Guidelines for writing the report

• Try to have an idea about the content of a dataset. Possibly generate
some hypotheses in your head, to prevent you from presenting totally
impossible conclusions, such as an average age in a human population
of 254 years (e.g. due to typing errors in data entry).

• When writing your own R code, it is mandatory to present the programs
you used, but they ought to be put into an “appendix”. The main body of
the answer to a question should consist of (1) an introduction, containing
a concise description of the problem and probably some descriptive
statistics; (2) the methods used (you do not need to repeat the whole
course, but just point to the methodology used, perhaps supplemented
with references); (3) the data analysis, supported with selected
computer output; and (4) discussion and concluding remarks. Computer
output should be digested: when using R, we all know that it presents
enormous amounts of output, and there is no need to present everything
in the main part of your answer. Pages of output with a couple of
handwritten comments on them are totally unacceptable and will imply
failure on that question. Should you think that extra output is
necessary, add it to the appendix. In the concluding section, you may
want to add some ideas about further analysis of the data. It does no
harm to use or suggest techniques that are, strictly speaking, outside
the subjects covered in the lecture to which the homework assignment
refers (e.g. references to earlier chapters, or to different courses
such as regression, ANOVA, multivariate analysis, nonparametrics).

• Most students write a report to convince themselves about their
understanding and/or to show the lecturer that they master the field. This
is not the idea, nor is it necessary to duplicate the course or to mimic
the teacher in writing a report. Rather, put yourself in the position of
your reader (a medical doctor, a biologist, a health economist), who
has limited time and knowledge about statistics, and who wants to
learn about your results. The idea is further that a knowledgeable
reader (say, another statistician) should obtain enough information
to check the analysis. To such a reader, it is, for example,
sufficient to mention one-way ANOVA, rather than to spend 3 pages
on the basics of one-way ANOVA.

• For data analysis problems, students will work together in groups of
three or four students. In case a question requires individual work, this
will be mentioned. Further, if students work together on a question,
only one report per group is necessary. (This may mean that for one
part of a homework there are individual pages and for another part
there are group pages.) ALWAYS keep a copy of your homework when
turning it in. When data have to be entered in the computer, it is
sufficient that a few students perform this task, to avoid unnecessary
work. It is wise to enter data TWICE (double entry), as a check on
the correctness of the entry. When you work together, all members of
the team should be listed on the group's single report.

• You can choose software freely (unless otherwise stated). R is an
obvious option, but others are allowed. When you want to use an
alternative package, try to collaborate with somebody doing the same
exercise with another package, in order to be able to compare the
results. R should be familiar by now. It is always a good idea to
expand one’s knowledge of software. R, Stata, SPSS and SAS get
extremely good coverage on the Web, by the respective companies,
but also by many third-party users. It never hurts to browse the Web
a bit.
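
Two of the checks suggested in the guidelines above, screening for impossible values and comparing a double entry, can be sketched in a few lines. This is only an illustration in Python (any package would do): the function names, the made-up records and the plausible age range of 0 to 120 years are assumptions for the example, not part of the course material.

```python
# Minimal sketch of two data checks; all names, data and limits are hypothetical.


def out_of_range(values, low, high):
    """Return (index, value) pairs that fall outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


def double_entry_mismatches(first_entry, second_entry):
    """Compare two independent entries of the same records.

    Returns (row, column, first_value, second_value) for every differing cell.
    """
    if len(first_entry) != len(second_entry):
        raise ValueError("the two entries have different numbers of rows")
    mismatches = []
    for row, (a, b) in enumerate(zip(first_entry, second_entry)):
        if len(a) != len(b):
            raise ValueError(f"row {row} has different numbers of columns")
        for col, (x, y) in enumerate(zip(a, b)):
            if x != y:
                mismatches.append((row, col, x, y))
    return mismatches


# Made-up ages in years; 254 stands in for a typing error.
ages = [34, 51, 254, 47, 29]
mean_age = sum(ages) / len(ages)                  # inflated by the one bad value
suspicious = out_of_range(ages, low=0, high=120)  # assumed plausible range
print(mean_age)    # 83.0
print(suspicious)  # [(2, 254)]

# Made-up records (age, weight in kg), typed in twice by two team members.
entry_a = [(34, 70.5), (51, 82.0), (47, 64.2)]
entry_b = [(34, 70.5), (15, 82.0), (47, 64.2)]
print(double_entry_mismatches(entry_a, entry_b))  # [(1, 0, 51, 15)]
```

A flagged value or mismatch only tells you where to look; the correct value still has to be recovered from the original data source.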

Note: Advanced Data Analysis (like other courses) is not confined to
what is covered in class, in the course notes, or the textbook. Rather, it
is a very dynamic environment where the student (and later the practicing
statistician) should find the information where it is available.

Grading
Your report will be graded using the proposed rubric below.

Category                                                              Score
Title/Title page                                                          2
TOC, TOF                                                                  5
Introduction                                                             10
Literature review                                                         8
Problem statement                                                         5
Research objectives/questions                                             5
Body (methodology) (thoughtful paragraphs)                               15
Structure and arguments                                                  25
Conclusion (other considerations: ethics, data sources, timelines)        5
Mechanics (punctuation, spelling, capitalization)                         5
Sentence structure and word usage                                         5
Citations (sufficient, all work cited)                                    5
Bibliography                                                              5
Total                                                                   100
