Strategic Scientific and Medical Writing
Pieter H. Joubert • Silvia M. Rogers
Need to write a short, succinct paper, a report which will capture the attention of the
reader and influence the outcome? Help is at hand in this punchy manual written by
two knowledgeable scientists and teachers with experience in the worlds of academia,
industry, and regulation.
This book is written in easily accessible sections, each dealing with the practical
problems that a newcomer to the field may experience and seasoned writers need to
be reminded of. It is written with a lightness of touch, combining common sense
with illustrative examples of how to address different types of situations.
An attractive feature is that you, the reader, can test yourself on your planning
skills and performance and detect the errors you may not even have been aware of,
thereby improving your success rate in making important submissions.
But as the authors say, while learning these skills requires care and attention, it does
not need to be all drudgery and can also be enjoyable once the basic principles have been
mastered. Having read the book, you may say “but I knew all these things before.”
Good! So now is the opportunity to put them into practice, and by reading and noting the
plans outlined in this book, you will become an even better communicator.
Contents
1 Introduction
1.1 Why Bother with Writing Skills?
1.2 The Key Components of Good Medical/Scientific Writing
1.2.1 Strategy
1.2.2 Science
1.2.3 Guidelines
1.2.4 Language
1.2.5 Tools
1.3 How to Plan a Document
1.3.1 The Nature of the Document
1.3.2 The Desired Outcome
1.3.3 Guidelines
1.3.4 Target Audience
1.3.5 Key Messages
1.3.6 Sources of Information
1.4 Using a Template
1.5 Final Thoughts
2 Written Communication in Drug Development
2.1 Where Is Written Communication Used in Drug Development?
2.1.1 Recording Nonclinical Findings
2.1.2 Preparing Drug Development Documents
2.1.3 Communicating with Regulatory Authorities and Other Important Institutions
2.2 Final Thoughts
3 Written Communication in Academic Settings
3.1 Where Is Communication Used in the Academic Setting?
3.2 Scientific Papers
3.3 Theses and Dissertations
3.3.1 Master’s Thesis/Dissertation
3.3.2 Doctoral Dissertation/Thesis
Appendix
Glossary of Abbreviations Used in This Book
References
Books
Published Literature
List of Tables
Table 8.1 Age, mass, height, and body mass index in four groups of professional
athletes
Table 8.2 Trough plasma concentrations in healthy volunteers after a single dose
(750 mg) of four different formulations of a new drug intended for a
phase III clinical study
Table 9.1 ICH quality guidelines
Table 9.2 ICH safety guidelines
Table 9.3 ICH efficacy guidelines
Table 9.4 Multidisciplinary guidelines
Table 9.5 ICH multidisciplinary guidelines
Table 9.6 Key regulatory guidelines
Table 10.1 ICH guidelines on the Investigator’s Brochure
Table 10.2 Contents of the Investigator’s Brochure
Table 10.3 Template for planning an IB in conjunction with a specific study protocol,
aimed at the investigator and staff
Table 11.1 FDA guidelines for IND submissions
Table 11.2 Key elements of the IND application
Table 11.3 European Commission guidance on CTA submissions
Table 11.4 Key elements of the CTA application
Table 11.5 Suggested preparation template for CTA (IMPD) or IND
Table 12.1 Regulatory guidance on Module 1 of the CTD
Table 12.2 Components of Module 2 of the CTD
Table 12.3 ICH guidance on preparing an electronic CTD
Table 12.4 Proposed planning template for the CTD
Table 12.5 Regulatory guidance on nonclinical overview preparation
Table 12.6 Structure of nonclinical overview
Table 12.7 Regulatory guidance on clinical overview preparation
Table 12.8 Structure of clinical overview
Table 12.9 Common mistakes when compiling a CTD
Table 13.1 Regulatory and other guidance on protocol preparation
Table 13.2 Template for planning a study protocol
Table 13.3 Regulatory guidance on studies in children
Table 13.4 Example of a simple schedule of assessments
Table 13.5 Guidance on ethics and informed consent
Table 13.6 Contents of a clinical study report according to the ICH E3 guideline
Table 13.7 Suggested numbering of study report contents
Table 13.8 Regulatory guidance on abbreviated studies
Table 14.1 Impact factor for the top 10 medical journals in 2015
List of Figures
Fig. 8.9 Kaplan-Meier plot showing survival in patients with a specific type of
cancer receiving either standard treatment (ST) with an investigational
drug or placebo added to their treatment
Fig. 8.10 Average number of cases of carbon monoxide poisoning seen per
month (average of 5 years) in an African village with no electricity
Fig. 8.11 Data shown in Fig. 8.10, broken down into age groups
Fig. 8.12 Responder rates for drug A and drug B. An inappropriate Y-axis scale
is misleading and makes a small difference appear large
Fig. 8.13 Plasma concentrations in dogs receiving identical doses directly into
the stomach, small bowel, or large bowel (a log Y-axis makes the differences
between the absorption sites appear smaller than they really are)
Fig. 8.14 Cmax versus dose, shown as the mean Cmax versus dose (left graph) and
individual values of Cmax versus dose (right graph). Using the mean
data makes the linear relationship appear clear and statistically significant.
Using all the individual data points shows that there is no statistical
significance
Fig. 9.1 Organization and numbering of the ICH topics and guidelines
Fig. 10.1 Development of the Investigator’s Brochure and changes over time
Fig. 10.2 The IB viewed primarily as a GCP document
Fig. 11.1 Summary of the planning information required in IND and CTA
documentation
Fig. 12.1 Components of the common technical document (CTD)
Fig. 12.2 Summary of the overall approach for planning and producing the clinical
overview
About the Authors
There is wide consensus that the writings of William Shakespeare are of a high
literary standard that has survived the test of time. Clearly, however, if the text of the scene
from The Merchant of Venice were used as a bedtime story for a 3-year-old, with
the intention of imprinting on a young mind the concept that compassion and
sympathy are useful attributes, it simply would not work. On the other hand, a story
in simple contemporary English about the mouse that took a thorn out of the
elephant’s foot would work as a bedtime story with a message on compassion.
In the area of medical and scientific writing, the underlying science in a document
may be solid and the use of language perfect, but if the text does not convey the
intended message to the target audience, the document might be a dismal failure. A
good document should not only be based on solid medical/scientific data but should
additionally convey the correct messages to the target audience to achieve the
desired outcome.
As the authors of this book, we aim to help you become successful medical
and scientific communicators as well as good writers. The key to
achieving this is to put yourself in the shoes of the target reader(s) of
your document and to anticipate their response.
In most cases, we will discuss medical/scientific writing in the context of
pharmaceutical medicine and drug development. The principles, however, also
apply to academic fields and other areas of medical/scientific writing, as well as to
non-written means of communication.
[Figure: diagram with the labels Science, Tools, Guidelines, and Language]
1.2.1 Strategy
The key to writing successful documents is to use the correct strategic approach. To
plan and execute a successful strategy, you have to familiarize yourself with the type
of document you wish to produce. You need to identify and understand your target
audience so that you can convey the key messages clearly, convincingly, and
concisely and achieve the desired outcome.
Documents always have to be reader-friendly. If you antagonize, bore, confuse,
or irritate the reader, your chances of a successful outcome are greatly diminished.
Documents should be well organized with appropriate headings and skillful use of
graphics. There should be a natural and logical flow, and the reader should not have
to hunt for information. If your document is easy to read and captures the attention
of the reader, you have a high probability of a positive outcome.
1.2.2 Science
1.2.3 Guidelines
There are guidelines for most documents. Examples are internal company guidelines,
guidelines of regulatory authorities, or journal guidelines for authors. In general, it
is advisable to follow guidelines closely, as they represent the expectations of your
target reader(s). Guidelines are, however, exactly what the word suggests, namely,
guidance for the preparation of a document and nothing more. They cover the general
requirements but are usually not cast in stone. Sometimes, the guidelines do not fit
the issue you are concerned with, and you may have to make some adjustments.
However, any deviation from the guidelines should be made clear upfront, and your
decision should be justified and well motivated. Any deviation from a guideline
should add value, make the key message clearer, and enhance your chances of a
successful outcome.
1.2.4 Language
In this book, we focus on English as it is the most widely used scientific language,
but the principles we cover are universal, irrespective of the language you use for
producing a document.
Many scientists like using complicated and impressive scientific jargon, forgetting
that the people they communicate with do not necessarily have the same scientific
expertise and might not be proficient in the language they use. In general, the
language you use should be simple, correct, clear, and unambiguous. A helpful
principle is to use simple words and keep sentences short. You might communicate
with regulatory authorities, for example, where most of the staff members are
nonnative English speakers. For them, English might be a second, third, or even
fourth language. I once heard a politician speak about the “first exposure of the young
mind to the formal educational process.” He meant when children start going to
school, but for nonnative speakers, this may not have been clear at once. “Sloppy”
and overblown language will plant the seeds of doubt in the readers’ minds; they may
ask themselves whether the scientific work underlying the document may have been
as careless as the language used to describe the findings. Thus, the credibility of your
medical/scientific message may be jeopardized on the grounds of sloppy language.
1.2.5 Tools
Your planning is driven by strategy. Assuming that you work with solid data, your
chances of success will be determined by the way you manage the key elements as
discussed below.
The major determinant of how you will proceed is the nature and purpose of the
document you aim to produce. Many types of documents are integral parts of the
drug development process, such as the Investigational New Drug (IND) application,
the Investigational Medicinal Product Dossier (IMPD), the Investigator’s Brochure (IB),
study protocols, study reports, the Common Technical Document (CTD), and
publications in a scientific journal. Once you know the type of document you
need to write, the following are key considerations:
1.3.3 Guidelines
If you work for a pharmaceutical company, there will be internal guidelines and
templates for documents such as study protocols and reports. The current guidance
documents of regulatory authorities are readily accessible on their websites. Before
starting to write a document, you should familiarize yourself with the appropriate
guidelines. Sometimes you will find that you need to deviate from a regulatory
guideline because it does not entirely fit your situation. This is fine, provided you
make the deviation clear upfront and can justify your approach. It is also important
to decide whether your document is a final document, such as a final study report,
or a “living” document, such as an IB, where changes should be made as new
information becomes available. A good medical/scientific writer will not only add
new information but will also remove information that has become irrelevant to
prevent the document from becoming too large and difficult to read.
A common mistake is the urge of some authors to write something under a
particular heading at all costs. If no clinical data are available at the time you are
writing an IB (see Chap. 10), Section 7.3.6 of the ICH guideline (Effects in
Humans) should not include any speculation on what may be found, but should
merely contain a simple statement that no data in humans are available at the time
of writing the document.
Your target audience may vary greatly. For example, the audience may include
employees of regulatory authorities (with varying backgrounds in terms of scientific
and/or medical training), ethics committees (which usually include lay members,
people with legal background, scientists, and clinicians), or journal editors and
reviewers. It is imperative to consider the background of the target audience in
choosing the appropriate style of language and the use and explanation of scientific
and clinical terminology. A sentence such as “Deposition of hydroxyapatite in the
intima, irrespective of the presence of atheromatous changes, impacts on vascular
compliance and tissue perfusion” might be fine for a clinician, but for a lay person in
an ethics committee, a more appropriate sentence would be: “When calcium crystals
form on the inner surface of blood vessels, the blood vessels may become stiff and
hard or blocked, and this can decrease the blood supply to important organs.”
The key messages should convey the essential information needed to convince the
responsible person(s) to make a decision in line with your desired outcome. For
example, when compiling an IND application for the first study in humans, you
would primarily like to tell the regulatory authority that you have sufficient and
convincing nonclinical data to justify a study in humans and that all potential safety
issues are addressed in the proposed protocol.
Using a suitable template (Table 1.1) helps you to compile a document that follows
a logical train of thought and covers all important issues. The template we
recommend resulted from years of trial and error. In our experience, the template
1.5 Final Thoughts
We hope that your journey through this book will be pleasant and informative and
that sharing our experience with you will help you to optimize your communication
skills, both spoken and written.
Exercise
Before proceeding to the next chapters, try to compile the following document
templates:
• A scientific publication: You are working for a company that has developed
a novel antihypertensive drug (use your imagination to create a target
profile), and the CTD has been submitted. Your target prescribers will be
general practitioners and you would like to establish your drug as first-line
therapy. You have the data from two major pivotal studies in patients with
mild to moderate essential hypertension. Plan a scientific publication that
will be complementary to your marketing campaign.
• An IB
• A study protocol
• A study report
Retain these templates while reading the book and repeat the exercise
when you have finished, without looking at your first set of templates. Then
compare. Have fun!
Chapter 2
Written Communication in Drug Development
Imagine what would happen to your experiments or studies if you failed to record their
outcome in writing. No one would know of your important findings, and it would be
almost impossible to prove to others that you had actually carried out the research.
Painstaking efforts and elaborate work may be lost, simply because there is no
written account of them. Essentially, nonclinical findings form the basis of future
studies in humans.
It follows that written communication in drug development is of critical
importance. For this reason, it should be our main concern to document new findings
efficiently, effectively, and truthfully. A concise summary of the documents written
during the preclinical stages of drug development is provided by Rogge and Taft [1].
As pointed out in Chap. 1, the drug development process encompasses many types
of documents, such as IBs (see Chap. 10), INDs and IMPDs (see Chap. 11), CTDs (see
Chap. 12), study protocols (see Chap. 13), study reports (see Chap. 13), and
manuscripts intended for publication in a scientific journal (see Chaps. 3 and 14).
In drug development, failure to document our findings inevitably results in delays
in obtaining marketing authorization of a new drug. This translates into substantial
sums of money being lost by the sponsor company.
Although many factors influence the speed and efficiency of a drug development
program, the value of effective communication during the drug development and
approval process is unquestioned, especially for the development of novel
medications for which regulatory guidelines have not yet been established [10].
While early consultation with the authorities is indispensable, other bodies relevant
to the target indication may have to be addressed. Such consultations should,
however, only be undertaken when the pertinent medical and drug information data
are available.
The FDA’s Center for Drug Evaluation and Research (CDER) typically approves
more than 100 new medications every year. In 2014, as many as 41 of the newly
approved agents were novel molecular entities or new therapeutic biologics, which
is considerably more than in previous years [11]. Many of these new drugs are
expected to make a significant contribution to the management of serious or
life-threatening diseases. In addition, an exceptionally large number of drugs
(n = 17) to treat so-called rare diseases were approved in 2014. This achievement is
of particular merit because there are often no (or insufficiently effective) drugs
available to treat diseases occurring in small populations.
Early and regular communication between drug developers and health authorities
allows the authorities to apply tailor-made review and approval procedures, with the
aim of ensuring the fast availability of important new medications. In the USA, such
regulatory procedures include Fast Track, Breakthrough Therapy, Priority Review,
and Accelerated Approval [11]. Fast Track and Breakthrough Therapy designations
are intended for drugs to treat serious conditions with unmet medical needs, while
Priority Review is granted for drugs expected to provide a significant advance in
medical care. For such medications, CDER shortens their review period from 10
months to 6 months. The Accelerated Approval program allows early approval of
drugs to treat serious or life-threatening illnesses for which less effective treatments
are available. In these cases, approval is based on a “surrogate endpoint” (e.g.,
laboratory value or biological marker) or intermediate clinical endpoint that is
thought to be “reasonably likely to predict clinical benefit” [11]. After approval of
such drugs, additional clinical studies are usually required to confirm the predicted
clinical benefit [11].
There are special areas where there are timeline and/or financial incentives to
encourage drug development:
• Obtaining orphan drug designation encourages the development of drugs for rare
diseases. Incentives include free advice and possible acceptance of innovative
study designs.
• Pediatric population: Many “adult” diseases affect a small subpopulation of
pediatric patients (e.g., rheumatoid arthritis, essential hypertension, type II
diabetes). The FDA encourages pediatric studies as part of drug development
in adults by offering an additional 6 months of marketing exclusivity, for
example.
Early contact with the regulatory authorities helps to expedite the development
process, and regular communication between the authorities and drug developers is
instrumental in streamlining the review of new products. Thus, health authorities
encourage regular exchange with drug developers and are usually prepared to
provide guidance at an early stage. Responsible professionals in the pharmaceutical
industry should make use of this opportunity; in this way, issues in connection with
manufacture, formulation, and/or testing of the new drug candidate can be addressed
at an early stage, thus preventing unexpected delays.
Chapter 3
Written Communication in Academic Settings
As discussed in Chap. 14, publishing data in medical and scientific journals is the
most important means of communicating research results, in both the academic and
commercial settings. In either environment, authors face three main
questions: why they wish to publish, which journal they should choose, and
how they are going to unveil their scientific “story.”
While the reason for publishing in the commercial setting is usually connected
with marketing strategies, academic publishing primarily aims to advance scientific
and medical knowledge. In the academic environment, the publishing procedure is
usually less regulated than that in the commercial environment where company
policies oversee the publication strategy (see also Chap. 15). Thus, academic
researchers bear sole responsibility for their publishing efforts – a fact that can be
an advantage or a disadvantage.
Chapter 14 provides detailed guidance on the planning and preparing of a
scientific manuscript intended for publication.
For many students, the master’s thesis (also referred to as master’s dissertation)
represents the first attempt at writing a formal scientific document. Although the
Internet supplies ample advice on how to write a thesis and universities tend to
supply good templates, most students face a major challenge when embarking on
their master’s thesis. They are aware of the importance of this document in that it
represents the “formal product” of their studies, on the basis of which their
performance and achievements can be assessed.
Ideally, the master’s thesis is written in a manner that renders it suitable for
subsequent publication in an appropriate scientific journal. For scientists, the
publication track record is of fundamental importance, and the sooner they can
establish themselves in the scientific community, the better their chances of
advancement. However, not all master’s projects are suitable for eventual publication,
and this may not necessarily reflect on the student’s ability to address a scientific
question. Some projects simply do not deliver publishable results, or they just form
a part of a larger study that will be published by other authors. Students whose
master’s projects involve collaboration with a pharmaceutical company may
additionally be faced with confidentiality issues that prevent them from making
their findings available to a broader audience.
In any case, the master’s thesis has to meet high standards in terms of contents,
format, and writing style, but there is no general consensus on how to present and
structure the data. The type of structure chosen depends primarily on the nature of
the study, as well as on guidelines and example documents provided by the university
or other institution at which the research was conducted. If the outcome of a master’s
project is suitable for publication, the structure of the manuscript is essentially
guided by the specific author instructions of the chosen journal. Most journals
follow the classic IMRAD structure (an acronym based on the first letters of
Introduction, Methods, Results, and Discussion), or a modification of this. For
example, Introduction may be replaced by Background, Methods by Procedures,
and Results by Findings. Clearly, this simplifies the task of writing a master’s thesis
to some extent because author guidance tends to be concise, and novice writers can
consult examples of papers published by the journal in question.
If you do not intend to publish the data generated within the study for whatever
reason, the thesis should be written in the form of a book consisting of chapters.
Although the number and nature of chapters depend on the specific research project
and extent of information accumulated, the structure suggested in Table 3.1 can be
applied to most situations.
In a section placed before or after the main text, you may wish to acknowledge
any help you have received during the studies. This may include supervisory efforts,
laboratory assistance, statistical help, or even editorial support.
The doctoral dissertation (also termed doctoral thesis) constitutes a more extensive
treatise than the master’s thesis, reflecting the prolonged research period involved.
Commonly, doctoral projects last at least 3 years; occasionally, they can take
considerably longer. If funding of the project is limited to 3 years (as is typical in
certain countries, e.g., the UK), students and supervisors have a vested interest in
completing the studies within a reasonable time frame. However, funding for a limited
time period may put students under undue pressure to complete their studies more
quickly than would be appropriate to solve the research question. Such constraints
are liable to mislead students into careless or even sloppy work, which severely
impinges on the quality and credibility of the generated data. Even worse, students
may be tempted to falsify results or copy data from other authors, thus making them
guilty of plagiarism (see Sect. 6.3.3).
The doctoral dissertation constitutes an exposition of original research and
should reflect not only mastery of research techniques but also the ability to deal
competently with an important research question. In addition, the writing of a
doctoral dissertation challenges your skills as a scientific communicator. It would be
unforgivable to present your hard-earned results in a dissertation that is difficult to
read and understand. As is true for any type of scientific treatise, the doctoral
dissertation aims to inform – rather than confuse – the reader. Although writing a
doctoral thesis may be a daunting task, it is perhaps the most important investment
in your future as a scientist. It will be a passport to acceptance into the scientific
community, and the nature of your work will set the scene for your future scientific
endeavors. For these reasons, it is more than worth your while to invest sufficient
time and energy in the completion of your thesis.
Table 3.1 Suggested structure of a master’s thesis
Whereas in former times a doctoral thesis followed roughly the same format as the
one used for a master’s thesis (see Table 3.1), a more popular approach nowadays is
to structure it as a series of articles suitable for publication in scientific journals. As
pointed out in Sect. 3.3.1, scientific endeavors thrive on sharing information and
contributing to the “knowledge pool.” Moreover, a scientist’s success hinges on his
or her publication track record, and early visibility in the scientific community is
clearly advantageous. In addition, the thesis structure based on individual manuscripts
spares students the effort of having to write the thesis and articles for publication
separately. Nonetheless, not all universities support this approach, and your
institution and/or supervisor will have to advise you on this.
The Internet provides ample advice on the preparation of a doctoral thesis, but
the most appropriate inspiration usually comes from good examples written by
members of your group or department. There is no hard and fast rule on the
composition, contents, and structure of a doctoral thesis; you are the author and the
expert!
For most science students, the writing of a laboratory report constitutes their very
first attempt at presenting data in a structured and logical manner. Therefore, it is
not surprising that the quality of laboratory reports ranges from clumsy compilations
of methods and materials to rather sophisticated scientific papers. Universities and
other research institutions often provide insufficient advice for novice writers, thus
leaving students alone with a difficult first encounter with scientific communication.
This effort may be additionally impeded by language hurdles; most laboratory
reports are written in English, but this may not be the native tongue of the author
(see also Chap. 5).
Although laboratory reports are written for several reasons, the main reason in all
cases is to communicate the experimental work to your instructor, supervisor, or
other interested reader. Without the written record of your laboratory experiments,
there is no proof of your findings, and your efforts may be lost forever. Much
unnecessary time goes into repeating laboratory experiments if we fail to archive the
work. In the pharmaceutical industry, such deficiencies translate into serious delays
in the development of new drugs and unnecessary (and costly) prolongation of the
“time to market.”
Like all scientific expositions, laboratory reports should be brief, concise, and to
the point. This may be easier said than done; inexperience and language problems
invariably lead to wordiness and redundancies. Students are often tempted to
compensate for their lack of expertise with inappropriate detail and awkward
descriptions, thus making it hard for the reader to work out the “story.” A useful tip
is to rely on short sentences and commonly known words rather than obscure terms
that are not understood by your readers. A good laboratory report thrives on clear,
transparent, and logical messages that can be grasped on first reading!
Structural organization of a laboratory report is dictated to some extent by the
nature of the experimental work, but the conventional IMRAD (see also Sect. 3.3.1)
is, at least, a good start. Table 3.2 provides a suggestion for organizing your report
if the IMRAD structure can be applied.
Research proposals may be written at various stages of your scientific career. Here,
we focus on those you may have to write early on in your development as a scientist,
e.g., when elaborating a scientific question for your master’s or doctoral studies.
A research proposal aims not only to “sell” the research project; it additionally
serves as an advertisement for your capability as a scientist and scientific
communicator. You may have an impressive research idea, but if you fail to convince
the financial sponsor and/or your project supervisor(s), the proposal is likely to be
turned down. On the other hand, a well-prepared proposal may stand a good chance
of being approved even if the research idea is not groundbreaking. The quality of
your research proposal depends not only on the quality of your proposed project, but
also on the quality of your proposal writing. Therefore, the writing should be
coherent, clear, and compelling.
In the academic setting, grant applications play an important role at various stages
of a scientist’s career. The earliest encounter with grant applications may even be
before you enter university if you apply for a study grant. Here, we are primarily
concerned with applications for research grants, although the points made may
apply to other grant forms as well.
Why would you want to write a grant application as a researcher? Most of us are
aware that grant applications are hard work, demanding, laborious, and time-
consuming – so why would anyone invest their precious time into such an
undertaking?
The answer is obvious: research is costly, and university funds are limited. Thus,
academic institutions depend on financial support from other sources. As pointed
out in Sect. 3.2, advancement and success of a researcher primarily depend on
his/her publication track record, and how can you publish in the absence of data? In
other words, grants enable you to pursue an interesting research question that you
could not have addressed without financial support.
When applying for a research grant, you are expected to hand in a detailed and
precise description of the planned study or research proposal, as well as any previous
studies or other information of particular relevance to your project. It is also useful
if you include a realistic budget proposal. Your application for funding should
convince the sponsoring organization that your scientific question is of interest to
the scientific community and that you are able to answer the question on the grounds
of your training, experience, and technical facilities. Moreover, the proposal should
entail a realistic time frame and reasonable expenses. Open-ended proposals tend to
be “suspect” in the eyes of reviewers as it will be difficult to estimate the overall
costs.
A grant application that stands a reasonable chance of being successful must
fulfill a number of criteria. First, the research topic in question must be creative,
novel, and of interest to the scientific community as a whole. Second, your
experimental plan must be realistic, sound, and compelling. Finally, you must make
sure that the information is organized in a logical structure and that the wording of
your text is clear, unambiguous, and free of language mistakes. As with all good
writing, a grant application stands the highest chances of being successful if you are
able to anticipate the reviewers’ questions and work around them. Table 3.4 lists the
most pertinent issues reviewers tend to bring up.
The format and extent of your proposal may vary, depending on the complexity
of the proposed topic and the level of detail expected by the sponsoring organization.
If guidelines for the specific grant application are available, make sure your
document complies with them in full. Many organizations provide forms and
templates that facilitate the process of completing a grant application. In the absence
of such guidance, your best bet is to use a layout similar to that suggested for a
research proposal (see Sect. 3.4.2). In addition, it is good advice to consult an
internal advisor with experience in writing successful grant applications. Bearing in
mind that you are asking for something (i.e., financial support), a favorable decision
on the part of the “jury” or selection committee is much more likely if your proposal
is enticing, legible, and well structured.
In both their learning and teaching roles, members of academic institutions have to
pay great attention to the quality of their texts. Poorly written or disorganized
student papers will hinder the advancement of the student, and manuscripts, research
proposals, or grant applications whose contents, structure, or style are deficient will
not be approved.
Chapter 4
Language Pitfalls: Native English Speakers
All men make mistakes, but only wise men learn from their
mistakes.
Winston Churchill
Clearly, to know a language as your mother tongue makes it easier for you to write
a professional text. Native speakers do not usually have to make an effort to find the
right expression or phrase and therefore save much time when writing a scientific
paper. However, being overly familiar with a language may predispose them to
casual and colloquial formulations, and occasionally, even native speakers are prone
to careless use of words, expressions, and punctuation marks.
This chapter deals specifically with issues commonly experienced by native
English speakers.
There are many terms and expressions prone to erroneous use, even by native
speakers of English. Table 4.1 shows some of the troublesome words and expressions.
Please note that this table is by no means exhaustive; there are many more “tricky”
words and phrases in English, and native speakers may struggle with them as much
as do nonnative speakers.
On the level of verbs, several typical mistakes tend to occur. Often, verb matching is
a problem. In principle, a singular noun takes a singular verb, and a plural noun requires
a plural verb. “There was three mice” should be “There were three mice.” “There was
not enough samples available” should be “There were not enough samples available.”
Some writers find it difficult to correctly apply verbs that look similar. Table 4.2
lists some of the verbs frequently confused in scientific texts.
For choosing the correct verb form (singular versus plural), we clearly have to know
whether the word in question is singular or plural in nature and meaning. This is not
always obvious to scientific communicators, even those who grew up with English
as their main language. Latin and Greek terms, in particular, are often difficult to
categorize correctly as singular or plural. Commonly, Latin terms ending in -us in
the singular change to -i in the plural, terms ending in -a change to -ae, and terms
ending in -um change to -a. Table 4.3 shows the typical singular and plural suffixes
for Latin and Greek words.
Table 4.3 Latin and Greek singular and plural suffixes

Latin terms    Singular    Plural
Feminine       -a          -ae
               -en         -ina
               -ex         -ices
               -ix         -ices
               -itis       -itides
Masculine      -us         -i
Neuter         -um         -a

Greek terms    Singular    Plural
Feminine       -is         -es
Masculine      -os         -oi
Neuter         -on         -a
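For readers who like to check such endings programmatically, the suffix pairs of Table 4.3 can be written down as a small lookup. The following Python sketch is only an illustration: the function name `classical_plural` and the list `SUFFIX_RULES` are hypothetical, and the result is merely a candidate plural. Many words do not follow the table (e.g., "genus" becomes "genera," and "specimen" takes the English plural "specimens"), so a dictionary remains the final authority.

```python
# Suffix pairs copied from Table 4.3 as (singular ending, plural ending).
# Longer endings come first so that "-itis" is tried before "-is".
SUFFIX_RULES = [
    ("itis", "itides"),
    ("ex", "ices"),
    ("ix", "ices"),
    ("en", "ina"),
    ("us", "i"),
    ("um", "a"),
    ("is", "es"),
    ("os", "oi"),
    ("on", "a"),
    ("a", "ae"),
]


def classical_plural(singular: str) -> str:
    """Return a candidate plural by swapping the first matching suffix."""
    for old, new in SUFFIX_RULES:
        if singular.endswith(old):
            return singular[: -len(old)] + new
    return singular  # no rule applies; leave the word unchanged


for word in ("bacterium", "nucleus", "alga", "matrix", "analysis", "criterion"):
    print(word, "->", classical_plural(word))
```

Once you know that "bacteria" or "criteria" is the plural form, the choice of verb follows directly ("the bacteria were," "the criterion is").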
these experts suggest singular use (i.e., “….data is”), while in situations where the
emphasis is on individual data points, the plural verb should be used (i.e., “….the
data are”). Nonetheless, you should be aware that not everyone (including me)
concurs with this rule. For instance, the most recent edition of Pocket Fowler’s
Guide to Modern English Usage accepts this mass noun use only in general and
computing contexts [3]. The American Garner’s Dictionary of Legal Usage states:
“The Oxford Guide allows the singular use of data in computing and allied
disciplines; whether lawyers own computers or not, they should use data as a plural.”
I wholeheartedly endorse this advice! (see also Rogers, 2014 [5].)
Other special nouns frequently causing problems in science writing are terms
that are plural in appearance but singular in meaning. Examples are measles and
mumps, as well as kinetics, dynamics, politics, mathematics, and other words
ending in -ics. Thus, the verb form in the sentence “Measles is often associated with
complications” is correct. Similarly, a sentence such as “Genetics is an important
scientific field” is correct. However, most of these terms may be correctly used in
the plural if we mean the sum of characteristics. For example, the pharmacokinetics
of a new drug are (rather than is) studied, and the genetics of the fruit fly are (rather
than is) well researched.
4.4 Punctuation
We tend to expect native English speakers to apply proper punctuation, but this may
not always be the case. The main reason for this is the rather different handling of
punctuation in creative and professional texts. While creative writing is highly
flexible with the use of punctuation marks, professional writing, especially within
the sciences, imposes binding rules and important constraints on the writer.
The following sections deal with the punctuation marks that are most troublesome
to writers whose first language is English.
4.4.1 Comma
The comma is the most common punctuation mark in scientific texts. It marks a
slight break between different parts of a sentence and makes the meaning of
sentences clear by grouping and separating words, phrases, and clauses. Correct use
of commas is often essential for proper understanding of the meaning of a sentence,
and incorrect punctuation can distort your message or make your sentence
meaningless.
The use of commas in English differs from that in most other languages, which
adds to the challenges nonnative English speakers may experience. However, even
native speakers often struggle with the proper use of commas in scientific texts.
Uncertain of the rules, writers tend to sprinkle commas throughout the text in a
rather haphazard fashion and thus use either too many or too few.
In contrast to many languages, English uses no comma before most subordinate
clauses and causal clauses (e.g., those introduced by “because,” “since,” “if,” or
“that”). Although English generally uses fewer commas than other languages, there
are some specific commas used in addition (e.g., the serial comma or a comma after
an introductory phrase).
Table 4.4 shows the comma rules that should be consistently applied in scientific
and other technical texts.
4.4.2 Hyphen
Hyphens are valuable tools to clarify the meaning of terms and avoid ambiguity.
The hyphen connects full words or prefixes and suffixes with their main word. The
nature of this connection can be either permanent or transient, for example, in word
breaks at the end of a line. As Cheryl Iverson puts it in the American Medical
Association Style Manual [6], the hyphen may join “what is similar and also what is
disjunctive….it divides as well as marries.”
4.4.3.1 Apostrophe
The acute and grave accents (´ and `) are often erroneously used in place of the
apostrophe (’). An accent, when applied without a vowel, looks similar to an
apostrophe, but its erroneous use is annoying to the eyes of experienced writers.
Even more annoying is the erroneous use of the apostrophe in possessive
pronouns such as “its” or “theirs” (see also Table 4.1). If you are uncertain about the
use of an apostrophe, a simple rule is that most apostrophes in science texts are
Nonbreaking spaces are required to keep terms (e.g., a number and its unit) together,
i.e., to prevent them from being separated at line breaks. Usually, the key combination
to insert a nonbreaking space is “shift + ctrl + space.” To prevent hyphenated terms
from being split, you can use the so-called nonbreaking hyphen. It is usually inserted
by the key combination “ctrl + shift + hyphen.”
4.5 Jargonized Writing 33
Even though a specific term containing two components that belong together
may not be at the end of the line when you prepare the manuscript, use nonbreaking
spaces and hyphens right from the beginning because it is tedious to introduce them
afterwards.
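If your manuscript exists as plain text, the same protection can be applied in bulk after drafting. The sketch below is an assumption-laden illustration rather than a recommended workflow: it uses the Unicode no-break space (U+00A0) and non-breaking hyphen (U+2011), and the names `UNITS`, `protect_number_unit`, and `protect_hyphen` are hypothetical. The unit list is deliberately short and would have to be extended for your own field.

```python
import re

NBSP = "\u00a0"       # Unicode NO-BREAK SPACE
NB_HYPHEN = "\u2011"  # Unicode NON-BREAKING HYPHEN

# Hypothetical, deliberately short list of units; extend it as needed.
UNITS = ("mg", "kg", "mL", "min", "h", "weeks")


def protect_number_unit(text: str) -> str:
    """Replace the whitespace between a number and a listed unit with a no-break space."""
    pattern = r"(\d)\s+(" + "|".join(UNITS) + r")\b"
    return re.sub(pattern, r"\1" + NBSP + r"\2", text)


def protect_hyphen(term: str) -> str:
    """Replace ordinary hyphens in a term with non-breaking hyphens."""
    return term.replace("-", NB_HYPHEN)


print(repr(protect_number_unit("Mean body weights were 72 kg in group 1.")))
print(repr(protect_hyphen("time-consuming")))
```

Whether the special characters survive depends on the word processor or typesetting system that receives the text, so check the result before submission.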
The better you know a language, the more prone you are to use jargon and careless
formulations.
Jargonized writing within the sciences encompasses the use of nonspecific words
or phrases, careless or inconsistent application of abbreviations and technical terms,
or the use of “insider” terminology not commonly known by other people. Thus,
jargonized writing could be described as a language that employs an uncommon or
pretentious vocabulary and convoluted syntax that is vague in meaning. The most
extreme form of jargonized writing and speaking is often referred to as “pidgin
English,” a term used to label unintelligible English that often arises as a means of
communication among people who do not share a common language. Pidgin
English is a simplified form of English, relying on a few verbs and other words to
express more complex issues. The word “pidgin” first appeared in print in the late
nineteenth century, but its origin remains unclear. The most broadly accepted
etymology is attributed to the Chinese pronunciation of the word “business.” It goes
without saying that pidgin English is bad enough in spoken communication but is
completely unacceptable in written texts, particularly those dealing with science or
medicine. The resulting message may not be intelligible to the intended readers,
thus making jargonized writing a serious problem in scientific communication.
Remember, our main task as communicating scientists is to inform (rather than
confuse) our fellow scientists!
However, we sometimes use the term “jargon” to mean the vocabulary that is
peculiar to a particular profession or trade, as, for example, “medical jargon” or
“legal jargon.” Here, “jargon” refers to peculiarity rather than lack of specificity.
Terms and expressions that are perfectly fine in verbal communication may be
unacceptable in written texts. Typical examples are contractions (isn’t, shouldn’t,
hasn’t, etc.), which are very common in speech and informal writing but strictly disallowed
in professional texts of any kind, especially those reporting scientific information.
Another example is the use of “as” and “since” as conjunctions. The two words
often cause confusion because both “since” and “as” have functions in addition to
that of a conjunction. For this reason, you may be on safer ground using “because”
in place of “as” or “since” in such cases.
A major troublemaker in scientific texts is the personal pronoun “it” (see also
Sect. 4.4.3.1). Often, the reader is uncertain to what “it” refers, and there may be
erroneous interpretation of information on the grounds of vague pronoun reference.
A vague pronoun reference occurs when there are several possible antecedents, or if
no antecedent is stated.
Consider this sentence:
We established our assay with a commonly known antibody, and we used it in all
subsequent studies.
What does “it” refer to? The assay? The antibody? Nobody knows, and readers
have to guess. To prevent such ambiguity, avoid using “it” if there is any doubt as to
what “it” refers to. It also helps to keep the structure of your sentences as neat and
clear as possible.
4.5.2 Terminology
One of the most disturbing deficiencies in scientific texts is the lack of parallelism.
Interestingly, both native and nonnative speakers of English find it difficult to avoid
nonparallel structures.
Parallelism refers to matching grammatical structures in a sentence containing a
conjunction, such as “and” or “or.” Elements in a sentence that have the same
function or express similar ideas should be grammatically parallel or grammatically
matched. Careful phrasing and the use of short sentences can help avoid such errors
right from the beginning. Make sure that subsets of a complex sentence are logically
linked and that the verb and subject agree.
Consider this sentence:
It was both a complex experiment and very time-consuming.
In either example, “and” now connects two equals, i.e., two adjectives.
When your sentence includes a series, make sure you have used identical
grammatical structures for the items.
Consider this sentence:
He is an expert in studying the sleeping pattern in adolescents, observing the REM sleep
behavior in adults, and to develop strategies for improved sleeping quality. (nonparallel)
Here, the first two items in the series are gerunds (studying, observing), but the
third is an infinitive (to develop). The sentence is corrected by using gerunds in all
three cases:
He is an expert in studying the sleeping pattern in adolescents, observing the REM sleep
behavior in adults, and developing strategies for improved sleeping quality. (parallel)
Or: He is an expert in the study of the sleeping pattern in adolescents, the observation
of REM sleep behavior in adults, and the development of strategies for improved sleeping
quality. (parallel)
Errors in parallelism often occur with correlative conjunctions, e.g., either …or,
neither …nor, both …and, not only …but also, and whether …or. The sentence
structure following the second half of the correlative conjunction should mirror the
sentence structure following the first half.
The reviewers criticized not only the layout of the tables but also they criticized the length
of the article. (nonparallel)
The reviewers criticized not only the layout of the tables but also the length of the article.
(parallel)
Either we can use the validated assay or not. (nonparallel)
Either we can use the validated assay, or we cannot. (parallel)
They neither applied proper statistics, nor did they report standard deviations.
(nonparallel)
They neither applied proper statistics nor reported standard deviations. (parallel)
If you use more than one verb in a sentence, be sure to make the verbs parallel by
not shifting tenses or active/passive voice.
Prof. Higgins wrote his lecture on the train, and it was presented by him at the medical
school only one hour after his arrival. (nonparallel)
Prof. Higgins wrote his lecture on the train and presented it at the medical school only
one hour after his arrival. (parallel)
Sometimes, sentences use a single verb form with two helping verbs. Look at the
following example:
The department has in the past and will in the future expand the new learning tool.
(nonparallel)
The department has expanded the new learning tool in the past and will continue to
expand it in the future. (parallel)
Again, correct sentence structure may be a concern to both native and nonnative
English speakers. Here, we focus on some common issues in connection with word
order (syntax).
The Merriam-Webster dictionary [8] defines the verb “to dangle” as “to hang loosely
and usually so as to be able to swing freely, to be a hanger-on or a dependent, or to
occur in a sentence without having a normally expected syntactic relation to the rest
of the sentence.” Thus, dangling modifiers are those that do not correctly modify the
subject of the sentence. A special case of a dangling modifier is the dangling
participle (see also Rogers SM, 2014 [5]). Consider the following:
Determining the minimal inhibitory concentrations (MICs) of the antibiotics, the novel
compound proved more effective than the comparators. (dangling present participle)
To avoid such danglers, ask who or what is doing the action and make sure the
implied subject is really responsible for the action. The sentence above would
correctly be rephrased like this:
The novel compound developed in our laboratory proved more effective than the
comparators when we determined the minimal inhibitory concentrations (MICs) of the antibiotics.
(correct syntax)
Of course, the style of the above sentence could be improved, but at least the
current version no longer contains a dangler. Passive-voice sentences encourage
the use of dangling participles, which is a good argument in favor of using active
voice wherever possible (see also Sect. 5.2.3).
Not all danglers are present participles; in some cases, past participles or
prepositional or adverbial phrases cause similar confusion.
Based on our findings, the toxicity profile was favorable. (dangling past participle)
With the largest safety margin achieved, we proposed the new formulation for further
development. (dangling prepositional phrase)
When reviewing the abstracts, it became apparent that the students need more writing
advice. (dangling adverbial phrase)
Most danglers cause a certain degree of confusion and discomfort. You can usually
expose a dangler by finding the subject of the sentence and applying common sense.
The problem here is not a dangler but poor word order (syntax). Rephrase the
sentence like this, for example:
Scientists have identified a protein in peanuts that can increase libido.
It follows that words (or groups of words) should be placed as close as possible to
the term they are intended to modify. This applies also to “only,” “almost,” and “even,” all of which
should be placed immediately before the word they modify.
It goes without saying that someone who plays an instrument well will be able to
produce pleasant music. This works for languages, too. Thus, writers who do not
have a good command of the English language will find it harder to write with
virtuosity. On the other hand, many nonnative English speakers are careful writers
because they are far more conscious of possible mistakes than their native
English-speaking colleagues. Many writers of non-English origin have learned the English language
systematically and thoroughly and may thus be able to name the underlying rules
and principles far better than those whose mother tongue is English.
While the points mentioned in Chap. 4 are of relevance to writers of non-English
language origin as well, this section deals with problems specifically experienced
by nonnative English speakers. The self-help guide “Mastering Scientific and
Medical Writing” [5] provides a more complete overview of the difficulties
commonly experienced.
Table 5.1 lists the most common English mistakes in texts written by scientists and
other professionals whose native language is not English. These are referred to as
English-as-second-language (ESL) mistakes. Please note that the list is by no means
exhaustive; depending on your experience and working field, there may be other
issues experienced.
As pointed out in Table 5.1, the correct tense is one of the most important aspects of
clear writing in scientific English, in that a clear distinction between “new” and
“old” knowledge is mandatory. This is in contrast to some other languages, e.g.,
German, where the tense appears to be less critical to meaning.
The main tenses used in science reporting are the present tense and past tense,
with other tenses, such as the perfect or future tenses, used rather sparingly.
Essentially, we merely have to know when to use the present or past tense. Table 5.2
shows the rules governing the choice of tense in scientific texts.
42 5 Language Pitfalls: Nonnative English Speakers
one.” This is clearly incorrect since the finding is neither published nor a generally
known fact. Here are some examples of proper use of tense when reporting results:
Method A was superior to Method B in our study.
We observed large intra- and interindividual variability.
The authors concluded that the trial population was too small and terminated the study
prematurely.
When you refer to a table, figure, or other visual contained in your manuscript,
make sure to use the present tense, e.g.:
Table 2 lists the individual percentages.
Figure 1 shows the concentration versus time profile.
Appendix A contains the raw data.
It helps to use active voice in such sentences since the passive voice may
encourage the erroneous use of the past tense. Thus, we have two good reasons for
applying the active voice here (see also Sect. 5.2.3).
Finally, make sure to use the past tense when referring to other researchers or
attributing previous findings to other authors:
Jones et al. reported similar findings.
Miller et al. did not use the same study design.
Some authors prefer to use the present perfect for attributions, as in:
Jones et al. have reported similar findings.
This is not wrong by any standard, but if we want to limit the tenses to the simple
present and simple past, it makes sense to use the past tense also in attributions.
Hardly any other topic is as heatedly debated as active versus passive writing. For
some reason, many authors hold a strong view on the tradition of passive writing,
and sometimes they can hardly be convinced of the many advantages of active
writing. Proponents of passive writing claim that the “doers” (e.g., the scientists) are
not of relevance, and the main emphasis should be on the outcomes. Although the
notion of modesty is appealing, this view no longer complies with the scientific
community’s expectations. These days, our peers demand to know who carried out
the work reported, and they require the transparency and clarity that come from
using the active voice. However, active voice is more than just using the personal
pronouns “I” or “we”; it also concerns the use of active verbs in place of the
ubiquitous verb “to be” (see also Table 5.3).
5.2 Main Troublemakers for Nonnative English Speakers 45
What do we learn from these statements? The first sentence tells us that
someone had conducted a trial in 30 healthy subjects, but we do not know whether
the authors themselves carried out the study or not. The second sentence makes this
clear by using the personal pronoun “we.” The third sentence would actually win the
top prize in a competition of word economy and clarity, although it remains unclear
who carried out the study. This may become clear from the context if the authors
generally use personal pronouns in the article.
We believe that it is good advice to apply the active voice wherever possible and
to limit the passive voice to statements that are more “natural” in the passive than
the active. Consider a sentence like this:
After the terrible incident in the mountains, he was rushed into hospital by helicopter.
(passive)
Although you may be able to rephrase the sentence in the active voice, especially
if you were the nurse or pilot involved, there is little sense in making a truly passive
situation grammatically active. The rule usually applied in science writing states
that no more than 30% of all verbs in the article should be passive. In a standard
research paper, we tend to use most of the (allowed) passive verbs in the section
describing the methods. Here, active sentences involving “we” are sometimes stilted
if used throughout.
In conclusion, the best guide is your common sense, as long as you bear in mind
that active verbs are preferred in most situations.
The pronouns “which” and “that” introduce either a nonessential or essential clause.
Life used to be fairly straightforward when “which” exclusively introduced a
nonessential clause and “that” was reserved for the essential clause. Liberal and
interchangeable use of the two pronouns has led to the confusion many scientific
authors encounter nowadays. Writers need to ensure that it is absolutely clear what
“which” refers to. This may be the term immediately preceding “which” (most
common), or it may refer back to the main subject of the sentence. Let us look at an
example:
The cells sedimented to the bottom of the tube which was associated with a change of color.
If there is any doubt about such a sentence, rephrase it completely. The above
sentence could be rewritten as follows:
Cells sedimented to the bottom of the tube, resulting in a change of color.
Or: Cell sedimentation to the bottom of the tube led to a change of color.
The first sentence tells us that the laboratory is located in the city center and that
it possesses two dark rooms. The latter information is, however, not essential to the
message. The main sentence simply says that the laboratory is located in the city
center. In contrast, the second sentence implies that there are several laboratories,
and the one that has two dark rooms is located in the city center. If you were to
replace the “that” in this sentence with “which,” make sure not to use a comma
before the “which.” If you do nonetheless, your sentence will be misread as implying the
meaning of the first example above.
The rule then would be to be sensitive to the change of meaning that occurs by
using or omitting a comma. To make things easier, at least for you as a writer, stick
to “that” in essential clauses and reserve “which” for nonessential ones.
This implies that the patients in group 1 experienced disease progression after
a mean of 5 weeks, while those in group 2 had a mean time to progression of 9
weeks. When describing something shorter than the time to disease progression,
such as average weights, for example, “respectively” is not necessary.
Mean body weights were 72 kg in group 1 and 83 kg in group 2.
Nowadays, most publications within the sciences are written in English. Authors
who are insufficiently well acquainted with the English language sometimes opt for
their native language when drafting a manuscript. In a second step, the text is
translated into English, usually by a professional translator or colleague whose
native language is English.
Clearly, the quality of the final manuscript depends substantially on the language
skills of the translator. You may have written an impressive paper in your own
language whose beauty may be lost in translation. Thus, translators should be
selected with the greatest possible care. In most cases, it does not suffice to know
the language well; for the translation to be accurate and precise, the translator must
fully understand the science and concepts described. Many terms may be correctly
translated but may be completely inappropriate for the intended meaning. In this
way, ridiculous, if not dangerous, confusion may arise. An example that springs to
mind is the frequently used phrase “not statistically significant.” In a translation
from German to English, the translator used the term “statistically insignificant,”
which is linguistically correct but scientifically inappropriate. If statistical testing
revealed the absence of a statistically significant difference between groups, the
result is said to be “not statistically significant.” To call this finding “insignificant”
is incorrect because the result may be of considerable meaning and significance
although it was not statistically significant.
“Correctness” and meaning are two different things, and the professional who
transfers your reasoning into another language must be able to fully understand the
meaning that you have intended. Much confusion in scientific and medical papers
originates from careless, incomplete, or even incorrect translation. Authors
sometimes have an insufficient understanding of a term even in their own language;
thus, when translating it with the help of a dictionary or thesaurus, they may pick the
wrong translation for the term. In addition, words may have the same spelling and
pronunciation, but their use may vary considerably. Writers with a language origin
other than English have a disadvantage here because correct usage of terms clearly
comes from experience. Moreover, not being familiar with the proper use of terms
can predispose to rather exotic translations involving fancy words and uncommon
formulations. Remember, we do not show off our scientific writing skill by using
words no one knows; we rather impress the readers if we succeed in conveying the
message with few (well-known) words and short sentences.
Literal translation of scientific texts often results in complicated, long, and
obscure sentences. English is a highly precise and powerful language requiring
fewer words than other languages to express informative content. It is helpful to
try to cap the core message in English rather than your native language, using as few
words as possible. At any rate, every manuscript written by nonnative speakers of
English should be scrutinized for spelling and grammar mistakes, and an experienced
writer with a sound knowledge of English should edit the article before submission
for publication.
Chapter 6
Scientific Misconduct
Without any doubt, fraud, forgery, and any other form of scientific misconduct
have always been around. In the past, however, much of this remained undetected
because the tools enabling exposure of such misconduct were largely missing. In
recent years, articles on scientific misconduct have become ubiquitous, and even
daily newspapers increasingly descend on the subject, frequently using rather
provocative headlines. In January this year, the Swiss newspaper Neue Zürcher
Zeitung (NZZ) stated that “Science fights forgery but nurtures bluff” [13]. The
Welt am Sonntag of May 17, 2015, headed their interesting article on scientific
misconduct with “Lies from the laboratory” [14]. The author, Thomas Vitzthum,
went on to say that deception and fraud are particularly common in disciplines that
hold the greatest hope for mankind, such as medical, genetic, or psychological
research.
Today, scientific misconduct is being fought on all levels. Most universities have
meanwhile appointed an “integrity officer” whose task it is to detect any irregularity
that could infringe on the university’s integrity and reputation. Many organizations,
authorities, and other official bodies have issued guidelines in connection with the
avoidance and detection of scientific misconduct. Most journal editors successfully
use software to detect plagiarism, redundancy, and falsification. In a position paper
issued in April 2015, the German Science Council (Wissenschaftsrat) recommended
setting up an information platform on which cases of scientific misconduct can be
debated and archived. The council’s position paper alludes to the risks of nurturing
scientific misconduct within the science community and makes a plea for a cultural
change.
Scientists are charged with an important mission, namely, to contribute to the “pool
of knowledge.” It goes without saying that scientists have to comply with the highest
ethical standards and must be of unquestioned integrity. And yet, scientific
misconduct is omnipresent across all disciplines – a fact that necessitates a closer
look at the origin of such unacceptable behavior.
Science is about furthering knowledge, and this is achieved by making new
findings available by publishing them in the relevant literature. A scientist who fails
to publish regularly tends to fight a losing battle. Successful application for research
grants and other financial support hinges on the visibility, credibility, and
international recognition of the applying researchers (see also Chap. 14). Thus,
there is substantial pressure on scientists to ensure their visibility and strengthen
their profile. Their competence is measured by the number of articles produced per
year as well as the “value” of the journals that have published them. In addition,
scientists have to confirm their hypotheses if they want to be successful. This means
that researchers are under much pressure to achieve statistically significant findings,
which are expressed as having a p-value of ≤ 0.05. A finding is regarded as “true”
only if it reaches the level of p ≤ 0.05 (see also Chap. 7). Thus, a common practice
in academic circles is to go on a “fishing expedition” by looking at correlations
between large numbers of variables and then reporting the ones with statistically
significant associations. If you estimate 200 correlations with p = 0.05 as the cutoff,
you would expect 10 statistically significant findings by chance. It is valid to report
these, provided that you disclose upfront that multiple correlations were computed.
Hiding this fact would be dishonest.
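The arithmetic behind such a fishing expedition can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the 200 tests are independent and that none of the variables is truly associated with another; the variable names are our own.

```python
# Expected number of chance findings when many correlations are tested.
# Assumption (illustration only): the tests are independent and no real
# association exists between any of the variables.
n_tests = 200
alpha = 0.05  # cutoff for statistical significance

expected_false_positives = n_tests * alpha
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected chance findings: {expected_false_positives:.0f}")
print(f"Probability of at least one chance finding: {prob_at_least_one:.5f}")
```

Under these assumptions, at least one "significant" correlation is virtually certain to turn up, which is why the number of correlations computed has to be disclosed.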
In the light of the pressures researchers face nowadays, it is hardly surprising that
scientists are tempted to present their data in the best possible way, especially in
those cases where borderline significance, i.e., a p-value only slightly above the
critical level of p = 0.05, was achieved. By applying a different statistical test or
eliminating certain data points (outliers), a marginally significant finding may
become statistically significant. Is this cheating? Is it forgery? The line between
optimizing and “massaging” data is rather narrow, and the answer to the above
questions is not always obvious. In principle though, it is our duty as scientists to
report all findings truthfully, accurately, and completely.
Along with other institutions, the German Science Council urges that the
“rewarding system” of the science community should be critically reviewed. At the
same time, the scientific community should learn to appreciate the value of negative
findings. Many journals exclusively publish studies with statistically significant
results although negative findings may be of equal importance to the advancement
of knowledge (see also Chap. 15). This situation is particularly frustrating for
students whose early research questions may not have been answered affirmatively.
Some years ago, a group of PhD students at the University of Mainz initiated
the founding of a journal dedicated to the publication of studies with a negative or
inconclusive outcome. They named it Journal of Unresolved Questions (JUnQ).
The purpose of the journal is to alert other scientists to erroneous hypotheses and to
save them from pursuing the same (unsuccessful) experimental route.
The NZZ article [13] mentioned in Sect. 6.1 states that the most prestigious
scientific journals, i.e., those with a high rank and impact factor (see also Chap. 14),
publish a particularly large number of papers that are based on falsified data. The
desire to publish in a renowned scientific journal appears to promote dishonesty and
greed. This rather disturbing fact can only be successfully addressed if we change
our attitude towards “scientific success” and modify the system of how scientific
performance and communication are being rewarded.
The term “scientific misconduct” encompasses all forms of improper scientific and
medical work and communication. Commonly, we differentiate between fabrication,
falsification, and plagiarism, in line with the American National Science Foundation
[15]. Let us look at these forms in more detail:
Fabrication of data is a serious offense. It implies that results are made up and are
reported as genuine findings, your own, or someone else’s. Luckily, deliberate
falsification of data is rare because such an undertaking is grossly incompatible with
the ethical principles of scientific work. However, more subtle forms of falsification
are prevalent. The most ubiquitous of them concerns the inclusion of literature
references in a paper, e.g., in the discussion section, that are inappropriate, outdated,
or – even worse – nonexistent. Authors are often in a rush to complete a manuscript
or thesis, and the temptation to add ill-reviewed references is overwhelming. If
arguments are supported by literature citations, they appear to be more impressive
and credible, and the reader will believe that the author’s interpretation of findings
is generally accepted.
It is almost impossible to estimate the frequency of data falsification in the
literature because experiments and studies would have to be reproduced using
methods identical to those applied in the original work. Discrepant results would
not necessarily be seen as a sign of data fabrication; methodological differences and
experimental limitations would rather be named as reasons for the different findings.
Similarly, reviewers rarely spot faked reference citations, unless they know the
literature intimately or the citation is grossly suspect. Thus, most falsified data tend
to be discovered by chance.
In a systematic review and meta-analysis of 21 surveys, Fanelli (2009) analyzed
the frequency of data fabrication and falsification in research [16]. In the various
surveys, scientists were asked whether they had ever fabricated or falsified data or
whether they were aware of someone who had done so. Fewer than 2% of interviewees
confessed to data fabrication or falsification, but nearly 34% of questioned scientists
admitted other questionable research practices. As would be expected, the figures
given for colleagues were markedly higher, reaching a value as high as 72% for
other questionable research practices. Fanelli concluded that these rates were
probably underestimated in view of the sensitive nature of the issue in question.
Clearly, data fabrication and falsification in science are overlapping, and they are
equally reprehensible. In their recommendation paper, the members of the
International Committee of Medical Journal Editors (ICMJE) define data falsification
as “the falsification of data, information, or citations in any formal academic
exercise” [17]. Thus, falsification encompasses manipulating research materials,
equipment, or processes. In contrast to data fabrication, data falsification also
includes the omission of data in an attempt to “optimize” the mean values or trends.
For example, if outliers are excluded from analyses, a nonsignificant finding may
become statistically significant, making the outcome far more convincing than it
would have been otherwise.
In short, falsification includes all forms of manipulation of data or information
that compromise the accurate, truthful, and complete record of scientific work.
6.3.3 Plagiarism
authors) without due acknowledgment” [17]. The New Shorter Oxford English
Dictionary defines plagiarism as “the taking and using as one’s own … the thoughts,
writings, or inventions of another” [9].
Although we do not wish to whitewash plagiaristic behavior by any standard, our
experience shows that acts of plagiarism are rarely committed willfully. More often
than not, failure to appropriately credit contributions by others results from the
authors’ negligence, obliviousness, or carelessness. This is sometimes referred to as
“citation amnesia,” “disregard syndrome,” or “bibliographic negligence” [18].
6.3.3.1 Self-Plagiarism
As pointed out in Sect. 6.2, there is a publication bias for “positive” findings, i.e.,
those that prove the researchers’ hypotheses and achieve statistical significance (see
also Chap. 7). Thus, many studies that did not conclusively answer the underlying
research question go unpublished. We feel, however, that even non-proven or
disproven hypotheses should be communicated because the absence of an assumed
phenomenon may be an equally important finding. Moreover, negative findings may
encourage other scientists to explore new experimental avenues.
The subject of omitting publication of unexpected or unwanted results is highly
controversial. Nonetheless, most people would agree that failure to publish the
results of clinical trials and other studies in humans is a form of scientific misconduct
because any finding, positive or negative, that potentially impacts on the well-being
of mankind must be communicated. If the study was conducted as part of a drug
development plan, the sponsoring company will, however, decide on the audience
and timing of publication (see also Chap. 15).
Some researchers refrain from publishing significant findings because they fear
that the study outcome could adversely impact on their interests (or the interests of
their sponsor, e.g., a pharmaceutical company). This happens if study results that
should have confirmed previously published findings actually contradict them.
Undoubtedly, the practice of selected publication in the interest of hiding new
knowledge is incompatible with scientific publication ethics and is regarded as a
serious form of misconduct.
As pointed out in Sect. 15.1.1.2, the list of authors should be restricted to those who
made a major contribution to the study, with minor contributors acknowledged at
the end of the paper. Omission of significant contributors from the author list,
however, is a form of scientific misconduct. Conversely, conferring authorship on
individuals that have not made substantial contributions to the research is equally
questionable. This practice is very common among group leaders and student
supervisors who take advantage of inexperienced junior researchers. In his
publication, Kwak referred to abusive co-authorship as the “white bull effect” [20].
Enforced authorship is, unfortunately, difficult to prove because consistent
definitions of “authorship” and “substantial contribution” are missing [21]. The
worst form of this is called “guest authorship” or “ghost authorship,” in which case
the stated authors had no involvement at all.
Successful publication of your work may lull you into a false sense of security. You
may feel that nothing can happen after your study has appeared in print. There are,
however, certain post-publication pitfalls to be considered.
Failure to properly archive data and inability to retrieve them on demand are forms
of post-publication misconduct. All data collected within a study of any kind must
be retained for later examination, even after publication. Because national laws
regulating the storage and retrieval of data collected during a study vary to some
extent, it is essential to familiarize yourself with the specific (and current)
requirements applicable to the study in question.
For clinical studies, sponsors are required to retain all data collected in connection
with the study medication until the expiry date of the final batch of the medication
studied, albeit for no less than 10 years after completion or premature termination
of the study. If the study involved implantation of a medicinal product, all data
concerning the product must be retained for at least 15 years. Similarly, the
investigators are obliged to archive all original data and all data necessary to identify
and follow up the participants for at least 10 years after study completion. If
implanted products were involved, all data pertaining to the study participants must
be retained for at least 15 years.
easier to amend a published article, but journals have to provide appropriate systems
and instructions to ensure that readers are alerted to any changes made to the original
work.
Correction may become necessary if there was a mistake in the original publication
that escaped the attention of the author and editor. Corrections usually concern
errors in calculations, chemical structures, dosages, or the spelling of drugs and
other chemical compounds. A correction may also become necessary if the original
list of authors or other contributors was incorrect or incomplete. Corrections of
published articles are permitted only if they do not alter the main study findings and
conclusions and if the article is otherwise sound. Needless to say, corrections of
published articles should be avoided at all cost because they impact on the credibility
of the scientific work and the responsible researcher(s) alike.
Retraction of an article is, of course, considerably more far reaching than correction.
Retraction implies that the main findings, conclusions, or implications of a study are
no longer valid. Thus, a retraction indicates that the work should not have been
published and that its outcome must not be used as the basis of further research. The
most common reasons for retracting an article are scientific misconduct including
plagiarism and duplicate/concurrent publishing (self-plagiarism; see also
Sect. 6.3.3.1). In some cases, however, a paper is retracted on the grounds of a
“genuine error” that seriously affects the study outcome.
The retraction may be initiated by the authors themselves or their institution, or
it may be enforced by the editors of a journal. Retractions should be published
formally, clearly stating the reason for the retraction to help readers to distinguish
cases of misconduct from those caused by genuine error. Occasionally, authors
provide an apology for previous error, especially if the error was, in fact, an “honest”
one. This may help to limit the damage caused, but it will never undo it. In addition,
we should bear in mind that even retractions are occasionally incorrect or insincere.
They may have been made for personal gain or under external pressure. This means that
we, the readers, essentially have to rely on our intelligent judgment and common
sense.
In 2010, the science writer Ivan Oransky and journal editor Adam Marcus
launched a blog they named “Retraction Watch” [26]. The blog updates on new
retractions and discusses general issues in relation to retractions. You may find this
useful reading when being faced with a publication retraction.
Journal editors may issue an expression of concern if they have reasons to doubt the
credibility of the research or ethical conduct of the study or if they suspect any form
of publication misconduct. An expression of concern is particularly due in situations
where the authors’ institution is not willing to investigate the alleged misconduct or
where there is good reason to assume that such an investigation would not be
objective or conclusive.
Like retraction notices, expressions of concern should be clearly linked to the
original article in electronic databases and should specify the reasons for the
concern. If the work is subsequently shown to be credible and reliable, an exonerating
statement will be added to the expression of concern. If, however, further evidence
corroborates the concern, the expression of concern is replaced by a retraction
notice.
Science is about advancing knowledge and solving mysteries. Any form of obscuring
study findings, be it deliberate or undeliberate, is grossly incompatible with scientific
ethics. Thus, we owe it to the scientific community to conduct all studies truthfully
and to report all findings accurately, correctly, and completely.
Chapter 7
Key Statistical Concepts
A major component of medical and scientific work is the collection and interpretation
of data. A competent medical/scientific writer should be able to present and interpret
data honestly and objectively (see also Chap. 6). Although there are many good
textbooks on statistics, we thought it worthwhile to provide a concise and
uncomplicated review of some basic statistical concepts and tests commonly used
in the context of medical and scientific writing.
Data in a scientific document can be analyzed descriptively and/or by formal
(inferential) statistics. These procedures are described below.
Table 7.2 Determining variance and standard deviation (square root of the mean variance)
Subject no.    Mass (kg)    Deviation from mean    Variance (deviation from mean: squared)
1              88           12.1                   146.41
2              72           −3.9                   15.21
3              110          34.1                   1162.81
4              65           −10.9                  118.81
5              56           −19.9                  396.01
6              71           −4.9                   24.01
7              81           5.1                    26.01
8              52           −23.9                  571.21
9              95           19.1                   364.81
10             69           −6.9                   47.61
Total          759          0                      2872.90
Mean           75.9         0                      319.21
Standard deviation (square root of the mean variance)    –    –    17.87
$$SD = \sqrt{\frac{\sum (v_i - m)^2}{n - 1}}$$
The symbols used are SD (standard deviation), vi (value for an individual), m
(mean), and n (number of values). In our dataset, the mean ± SD = 75.9 ± 17.87 kg.
The SD can also be expressed as a percentage of the mean. This is referred to as the
coefficient of variation (CV): CV = (SD/mean) × 100.
For our example in this paragraph, CV = (17.87/75.9) × 100 = 23.5%.
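For readers who prefer to check such figures by computer, the calculation of Table 7.2 can be retraced with Python's standard library. This is a minimal sketch; the variable names are our own.

```python
import math
import statistics

# Body mass (kg) of the 10 subjects in Table 7.2
mass = [88, 72, 110, 65, 56, 71, 81, 52, 95, 69]

mean = statistics.mean(mass)
squared_dev = [(v - mean) ** 2 for v in mass]        # last column of Table 7.2
sd = math.sqrt(sum(squared_dev) / (len(mass) - 1))   # divide by n - 1, then take the root
cv = sd / mean * 100                                 # coefficient of variation in %

print(f"mean = {mean:.1f} kg, SD = {sd:.2f} kg, CV = {cv:.1f}%")

# statistics.stdev applies the same n - 1 formula
assert math.isclose(sd, statistics.stdev(mass))
```

The printed values agree with those given above (75.9 kg, 17.87 kg, and 23.5%).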
Most biological values show a normal distribution with a symmetrical distribution
of data around the mean (Fig. 7.1).
[Fig. 7.1: normal distribution curve; y-axis: number of subjects; the mean is marked]
The mean value ± SD represents roughly 2/3 of the area of the normal distribution
curve (68% to be precise), and the mean value ± 2 SD represents about 95% of the
area (to be precise, it is the mean value ± 1.96 SD). For our dataset, the mean ± SD
(75.9 ± 17.87 kg) therefore describes about 68% of the population, i.e., those with
a body mass between roughly 58.0 and 93.8 kg.
Most biological normal-range values represent 95% of the population (roughly
the mean ± 2 SD). It would therefore be expected that 5% of the population undergoing
a laboratory test, for example, would have values outside the “normal” range, with
2.5% below and 2.5% above the normal range. Another way of referring to mean ± SD
is to define it as the 68% confidence interval (CI). Similarly, the mean ± 2 SD
represents the 95% CI, and a CI of 99% is described by the mean ± 2.58 SD.
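The percentages quoted above follow directly from the normal distribution and can be verified with the NormalDist class of Python's statistics module. The helper functions coverage and sd_multiple are our own names, introduced only for this sketch.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution (mean 0, SD 1)

def coverage(k):
    """Fraction of a normal population lying within mean ± k SD."""
    return z.cdf(k) - z.cdf(-k)

def sd_multiple(level):
    """Number of SDs on either side of the mean that encloses `level` of the area."""
    return z.inv_cdf(0.5 + level / 2)

for k in (1, 1.96, 2, 2.58):
    print(f"mean ± {k} SD covers {coverage(k):.1%}")
for level in (0.95, 0.99):
    print(f"{level:.0%} of values lie within mean ± {sd_multiple(level):.2f} SD")
```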
Note that the discussion above is based on large sample sizes. With small sample
sizes, the curve might not be quite symmetrical although a parameter may be
normally distributed. Thus, the smaller the sample size, the wider are the 95%
confidence limits. This is described by the t-distribution. With a sample size of 15,
for example, the 95% confidence limits are the mean ± 2.14 SD. The value of t (i.e.,
the deviation from the mean that describes a particular confidence interval) can be
looked up in available t-distribution tables. Table 7.3 shows some of the values.
In a normal distribution curve, the 95% CI is the mean ± 1.96 SD. The smaller the
number of data points, the less sure we are that the mean ± 1.96 SD describes the 95%
CI, and this margin has to be increased. In the table with 6 data points (5 degrees of
freedom), the 95% CI is described by the mean ± 2.57 SD. As n increases, t gets
closer to 1.96. The concept of statistical significance means that if we know the 95%
CI of a dataset, the probability of a value outside the 95% CI is 0.05, and the latter
value is reflected in the t-distribution table (Table 7.3).
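The t-values quoted in this section can also be reproduced without a printed table. The following sketch uses only the standard library: it integrates the density of the t-distribution numerically (Simpson's rule) and searches for the multiple of SD that encloses 95% of the area. The function names and the numerical settings (2000 integration steps, 60 bisection rounds) are our own choices, not part of any statistics package.

```python
import math

def central_area(t, df, steps=2000):
    """Area under the t-distribution curve between -t and +t (Simpson's rule)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The curve is symmetrical, so the area on [0, t] is doubled
    return 2 * total * h / 3

def t_critical(df, level=0.95):
    """Value of t for which mean ± t SD encloses `level` of the area (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 9, 10, 14):
    print(f"{df} degrees of freedom: t = {t_critical(df):.2f}")
```

With 5 and 14 degrees of freedom, the sketch returns the values 2.57 and 2.14 mentioned in the text.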
Apart from the mean, there are two other terms, namely, the median and mode,
that can be representative of a dataset. The median is the middle value (i.e., 50% of
the dataset lie below and 50% above the median). If you have a dataset with an even
number of data points, e.g., 5, 8, 11, and 14, the median is the mean of the two
middle numbers ((8 + 11)/2 = 9.5). If you have an odd number of data points, e.g.,
5, 8, 9, 11, and 14, the median is the middle number (9 for this example). The mode
is the value that occurs most frequently in a dataset. We will confine our discussion
to the median and mean.
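The two median examples above can be checked with Python's statistics module. The list used to illustrate the mode is hypothetical and serves only as an example.

```python
import statistics

even = [5, 8, 11, 14]
odd = [5, 8, 9, 11, 14]

median_even = statistics.median(even)  # mean of the two middle values, 8 and 11
median_odd = statistics.median(odd)    # the middle value

# The mode is the most frequent value (hypothetical data)
mode_example = statistics.mode([3, 5, 5, 7, 5, 9])

print(median_even, median_odd, mode_example)
```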
In a dataset with a normal (symmetrical) distribution, the median and mean will
be the same or nearly the same, with 50% of the data above and 50% below these
values. Some datasets may have a skewed distribution (Fig. 7.2).
With data skewed to the right as in graph A above, the median is less than the
mean, whereas with data skewed to the left (graph B), the median is more than the
mean. Data with a distribution curve skewed to the right might indicate that the data
are log-normally distributed. In such cases, plotting the data values (X-axis) as
logarithms will “normalize” the distribution (see Chap. 8).
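A small sketch with hypothetical values shows the effect of the logarithmic transformation: in the raw data the median lies far below the mean, whereas after taking logarithms the two coincide.

```python
import math
import statistics

# Hypothetical right-skewed data (not taken from any study)
values = [1, 10, 100, 1000, 10000]
logs = [math.log10(v) for v in values]

raw_mean, raw_median = statistics.mean(values), statistics.median(values)
log_mean, log_median = statistics.mean(logs), statistics.median(logs)

print(f"raw data: mean = {raw_mean}, median = {raw_median}")
print(f"log data: mean = {log_mean:.1f}, median = {log_median:.1f}")
```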
A population might show more than one peak in the distribution of data. A
typical example is a bimodal distribution as seen with drug polymorphisms. In these
instances, there is usually a characteristic which is determined by a specific gene
resulting in two populations that may or may not overlap. For instance, this might
concern a gene affecting the activity of an enzyme that is responsible for the
metabolism of a certain class of drugs. The resulting curve will show two distributions
(see Fig. 7.3), i.e., one for slow and one for rapid metabolizers.
To conclude, descriptive statistics are valuable for representing and summarizing
data from a study, such as demographic data or laboratory values, for example, in an
appropriate table or graph (see Chap. 8).
Fig. 7.2 Schematic presentation of a skewed distribution (a skewed to the right, b skewed to the left; the median and mean are marked in each panel)
Fig. 7.3 Schematic presentation of a bimodal distribution (two subpopulations)
p ≤ 0.05, the null hypothesis is rejected. This value is the gold standard and assumes
statistical significance if the probability of a difference occurring by chance is less
than 1 in 20 (falling outside the 95% CI). This, of course, means that if there is no real
difference and the experiment is repeated many times, a p-value ≤ 0.05 would occur by
chance on average in one of every 20 experiments and a p-value ≤ 0.01 once in every
100 experiments. Two types of
error are possible:
• Type 1 error (false positive): incorrect rejection of the null hypothesis. Thus, an
effect that is not present is erroneously assumed to be present.
• Type 2 error (false negative): erroneous failure to reject the null hypothesis. Thus, a
genuine effect that is present is not detected.
This can be better understood if we consider an example, e.g., how normal ranges
for laboratory tests are determined. Normal values are determined by establishing the
95% confidence limits, assuming a normal distribution. This means that the probability
of a value outside the normal range is 0.05. For example, the normal range for serum
potassium is 3.5–5.0 mEq/L. If serum potassium values are obtained routinely, an
average of 5% (one in 20) of subjects will be wrongly diagnosed as having an
$$SE = \frac{SD}{\sqrt{n}}$$
Probably the most widely used statistical test for comparing two small datasets is
the Student’s t-test. In essence, the test determines the 95% confidence limits of the
difference between the means of two samples. The basis of the test is to determine
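the t-value (the SD factor that determines the 95% CI). The data required are the
means, standard deviations, and sample sizes of the two datasets. The t-value is the
difference between the two means divided by the standard error of this difference
(SEdiff), which is determined by the following formula:
$$SE_{diff} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$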
Table 7.4 Body mass of football players and Olympic marathon runners (raw data, means, and standard deviations)
Group       Body mass (kg) of individual athletes               Mean     SD
FBP (x1)    93   82   122   117   98   137   85   101   96   109    103.9    17.4
OMR (x2)    72   59   63    48    81   62    53   70    59   64     63.1     9.5
FBP football players, OMR Olympic marathon runners
Please note that there are several ways of doing this. For illustrative purposes, we
chose one of the easier formulas (s1 is the SD of x1 and s2 the SD of x2). If we
substitute the values, we get the following:
$$SE_{diff} = \sqrt{\frac{17.4^2}{10} + \frac{9.5^2}{10}}$$
Solving the equation, the SEdiff is the square root of 39.301, which is 6.27. Dividing
the difference between the two means (103.9 − 63.1 = 40.8 kg) by the SEdiff gives a
t-value of 40.8/6.27 = 6.51. In Table 7.3, the value for 10 degrees of freedom is 2.23
(9 degrees of freedom is not shown, but it is 2.26); with the 18 degrees of freedom of
two independent samples of 10, the value is smaller still (2.10). Our t-value is far
larger than the value required for statistical significance, which means that the
p-value is below 0.05 (in fact, it is < 0.001). Thus, we can
reject the null hypothesis claiming that there is no difference between the weights
of football players and marathon runners. We can conclude that in our study
population, football players are significantly heavier than marathon runners.
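The calculation can be retraced in a few lines of Python. The sketch below starts from the means and standard deviations listed in Table 7.4 and assumes, as is standard for this test, that the t-value is the difference between the two means divided by the SEdiff; the variable names are our own.

```python
import math

# Summary statistics as listed in Table 7.4
mean_fbp, sd_fbp, n_fbp = 103.9, 17.4, 10   # football players
mean_omr, sd_omr, n_omr = 63.1, 9.5, 10     # Olympic marathon runners

# Standard error of the difference between the two means
se_diff = math.sqrt(sd_fbp ** 2 / n_fbp + sd_omr ** 2 / n_omr)

# t-value: difference between the means in units of SEdiff
t_value = (mean_fbp - mean_omr) / se_diff

print(f"SEdiff = {se_diff:.2f}, t = {t_value:.2f}")
```

A t-value of this size lies far beyond the tabulated values for p = 0.05, which is in line with the conclusion drawn above.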
This is a relatively simple test used to compare the means of two datasets that can
be paired. This usually means that there are two datasets obtained from the same
subjects. An example would be the comparison of a value before with a value after
treatment or comparing two treatments. Table 7.5 provides an example of such
datasets.
The concept of the paired t-test in this situation is to test the null hypothesis that
there is no difference in diastolic blood pressures (DBP) before and after treatment.
The first step is to find the difference between the two datasets in each pair
(Table 7.5). Note that it is important to distinguish between positive values (increase
in DBP) and negative values (decrease in DBP). If we add up all values and divide
the sum by 10, we obtain the mean of the difference (in this case −12.5 mmHg). The
next step is to determine the SD; in this case it is ±12.3 mmHg. This means that the
difference in DBP seen was −12.5 ± 12.3 mmHg.
We now need to find the SE of the difference (SD/square root of n), i.e.,
12.3/3.16 = 3.89. We can now determine the t-value (t = mean difference/SEdifference)
which, ignoring the sign, would be 12.5/3.89 = 3.21. The t-value for p = 0.05 for 9 degrees
of freedom is 2.26, and consequently the p-value is <0.05. We can therefore reject the null hypothesis
and conclude that the drug tested produced a statistically significant decrease in DBP.
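The steps of the paired t-test can be sketched in Python using the raw data of Table 7.5. The variable names are our own, and the t-value is computed from unrounded intermediate results.

```python
import math
import statistics

# Diastolic blood pressure (mmHg) of the 10 patients in Table 7.5
before = [95, 112, 98, 112, 95, 114, 109, 97, 115, 116]
after = [98, 103, 87, 101, 88, 85, 82, 101, 107, 86]

# Step 1: difference within each pair (negative = decrease in DBP)
change = [a - b for a, b in zip(after, before)]

# Step 2: mean and SD of the differences
mean_change = statistics.mean(change)
sd_change = statistics.stdev(change)

# Step 3: SE of the mean difference and the t-value
se_change = sd_change / math.sqrt(len(change))
t_value = mean_change / se_change

print(f"mean change = {mean_change:.1f} mmHg, SD = {sd_change:.1f} mmHg")
print(f"SE = {se_change:.2f}, t = {abs(t_value):.2f} (sign ignored)")
```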
Table 7.5 Diastolic blood pressure in patients with essential hypertension before and after 4 weeks of treatment with a beta-blocker
Individual measurements of DBP (mmHg)
Patient no.         1     2     3     4     5     6     7     8     9     10
Before treatment    95    112   98    112   95    114   109   97    115   116
After treatment     98    103   87    101   88    85    82    101   107   86
Change              3     −9    −11   −11   −7    −29   −27   4     −8    −30
DBP diastolic blood pressure
When dealing with datasets, it is always valuable to look at the raw data. A very
useful way of looking at paired data is to do a scatter plot of the two datasets with
the points connected (see also Chap. 8). Figure 7.4 shows a scatter plot of the above
example.
Figure 7.4 shows that the vast majority of lines decrease, and this suggests
visually that the treatment was effective. Scatter plots are particularly valuable with
larger datasets where it might not be easy to look at the raw data in a table.
For small datasets that are nonparametric (not normally distributed), we use tests
based on ranking the data. This is similar to Student’s t-test for unpaired and paired
data. There are some nonparametric biological parameters with a positively skewed
distribution (see Fig. 7.2). This is particularly true for pharmacokinetic data, and
these can be normalized by using the log data (see Chap. 8).
Fig. 7.4 Diastolic blood pressure in 10 patients before and after treatment with a beta-blocker for 4 weeks (y-axis: diastolic BP in mm Hg; x-axis: 1 = before treatment, 2 = after 4 weeks of treatment)
This test is used for nonparametric data and is equivalent to the t-test for parametric
data. Instead of using the distribution of t, it uses the u-distribution. If we determine
the median of a dataset of n = 8 with two groups (x and y) of n = 4, the null hypothesis
(H0) would suggest that each individual data point in group x (xi) would have the
same chance as each individual data point in group y (yi) of being above or below
the median:
$$H_0: p(x_i > y_i) = 0.5$$
If we have four values of x and four values of y and we rank them from high to
low, there are 70 possible combinations. At the one extreme, all x-values would
be lower than all y-values (sequence x, x, x, x, y, y, y, y); the probability of this
occurring by chance would be 1/70 (p = 0.014). The other extreme (y, y, y, y, x, x,
x, x) also has a p-value of 0.014. The probability of a particular sequence
occurring by chance follows a u-shaped distribution, hence the term u-distribution.
The value of u is obtained by listing the data in sequence and determining u for
either x or y. Below each x in the column, you insert the number of y-values to the
left of it. Table 7.6 shows some possible sequences and the value for each
sequence.
If you look at all possible sequences for two samples of four, u ≤ 3 or u ≥ 13 has
a p < 0.05. Statistical tables of the u-distribution for various sample sizes are readily
available.
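For two samples of four, the 70 possible sequences are few enough to be listed by computer. The sketch below counts, for every x, the number of y-values to its left and then tallies how many sequences produce each value of u. It rests on our reading of Table 7.6, namely that the p-value given there is the share of the 70 sequences that yield exactly that u; the function and variable names are our own.

```python
from collections import Counter
from itertools import combinations

def u_value(sequence):
    """Sum, over every x, of the number of y-values to its left."""
    u = 0
    y_seen = 0
    for symbol in sequence:
        if symbol == "y":
            y_seen += 1
        else:
            u += y_seen
    return u

# All ways of placing four x-values among eight ranked positions
sequences = []
for x_positions in combinations(range(8), 4):
    sequences.append("".join("x" if i in x_positions else "y" for i in range(8)))

# Number of sequences that produce each value of u
counts = Counter(u_value(s) for s in sequences)

print(f"{len(sequences)} possible sequences")
for s in ("xxxxyyyy", "xyxxxyyy", "xyxxyyxy", "yyyyxxxx"):
    u = u_value(s)
    print(s, u, round(counts[u] / len(sequences), 3))
```

The four sequences printed are those of Table 7.6, and the sketch returns the same u-values and probabilities.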
Let us look at determining u from a dataset comparing the age of Olympic gold
medal winners in gymnastics (A) with gold medal winners in running 1500 m (B).
Let us assume we have data from n = 4 athletes in each group. In the first two rows,
there are the individual age data points in sequence. In the next two rows, we have
the calculation of u using the values to the left of A as well as values to the left of
B. You only need one of these as the totals obtained give the same p-values (Table 7.7).
We obtained a u-value of 5 using the position of A- relative to B-values. This
gives a p-value of 0.071, so the null hypothesis is not rejected. You could also use
the position of B, and you would obtain a u-value of 11 which has the same p-value
of 0.071. As the p-value is close to being statistically significant, this might be an
example of a type 2 error (false-negative outcome), and it might be worthwhile
repeating this study with a larger number of athletes.
Table 7.6 Illustration of how u is determined in two samples of four data points each
Data sequence      Number of y-values to the left of each x    u     p
x x x x y y y y    0, 0, 0, 0                                  0     0.014
x y x x x y y y    0, 1, 1, 1                                  3     0.043
x y x x y y x y    0, 1, 1, 3                                  5     0.071
y y y y x x x x    4, 4, 4, 4                                  16    0.014
differences, the next step is to add the positive and negative ranks separately to obtain
the sum of the ranks. For this test, the null hypothesis is that the median difference
between the two samples is zero. The bigger the difference between the sums of positive
and negative ranks, the smaller is the probability that this is due to chance. Let us look
at a dataset of pain scores 2 h after treatment in two groups of 6 patients receiving either
placebo or an analgesic for tension headache in a crossover study (Table 7.8).
The lowest rank sum value represents t. In our example, t = 4 (sum of the ranks of
negative values). If we consult the appropriate tables for n = 6 (Table 7.9), we will
find that for t = 4, p < 0.05. Thus, our value is statistically significant, and the null
hypothesis is rejected.
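The rank sums for this example can be reproduced with a short Python sketch. It uses the pain scores from Table 7.8; the variable names are hypothetical, and the simple ranking used here assumes that there are no tied or zero differences, which holds for this dataset but not in general. The sketch stops at the value of t and does not derive a p-value.

```python
# Pain scores from Table 7.8 (one pair of values per patient).
placebo = [2, 3, 3, 8, 6, 8]
analgesic = [4, 0, 7, 3, 0, 1]

# Difference for each patient (placebo minus analgesic).
differences = [p - a for p, a in zip(placebo, analgesic)]

# Rank the differences by absolute size, smallest first.
# Assumption: no ties and no zero differences, so ranks are simply 1, 2, 3, ...
ordered = sorted(differences, key=abs)

positive_rank_sum = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
negative_rank_sum = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

# The smaller of the two rank sums is the test statistic t.
t = min(positive_rank_sum, negative_rank_sum)

print(differences, positive_rank_sum, negative_rank_sum, t)
```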
While the statistical evaluations discussed thus far assess numerical variance, the
chi-squared test assesses categorical variance. Let us look at the data in Table 7.8.
In the example we computed numerical statistics looking at the actual pain scores.
Another approach could have been to define what is perceived as adequate pain
relief. Let us assume that a decrease of 3 on a 10-point scale is regarded as a
response. We could then categorize the placebo group and treatment group into
Table 7.8 Pain scores on a 10-point scale after treatment with a placebo or an analgesic
Treatment               Parameters
Placebo    Analgesic    Difference    Rank (positive)    Rank (negative)
2          4            −2                               1
3          0            3             2
3          7            −4                               3
8          3            5             4
6          0            6             5
8          1            7             6
Sum                                   17                 4
Table 7.10 2 × 2 Contingency table for two treatments with two categories of response (2 × 2
table)
Variable Placebo Active Total
Responders 13 (a) 35 (b) 48 (a + b)
Nonresponders 37 (c) 15 (d) 52 (c + d)
Total 50 (a + c) 50 (b + d) 100 (a + b + c + d)
$$X^2 = \frac{(ad - bc)^2 \, (a + b + c + d)}{(a + b)(c + d)(b + d)(a + c)}$$
If we enter the values from the table in the equation, we obtain a value of $X^2$ = 19.4. To
calculate the p-value, we need to determine the degrees of freedom (DF) using the
number of data columns as n. As DF = n − 1, DF = 2 − 1 = 1. We can then consult the
chi-squared distribution table (Table 7.11) to find the p-value for a $X^2$ of 19.4, which shows
p < 0.001.
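As a check on the arithmetic, the calculation can be sketched in Python with the cell counts from Table 7.10, using the row totals (a + b) and (c + d) and the column totals (a + c) and (b + d) in the denominator. The function names are hypothetical. Instead of a printed table, the sketch obtains the p-value for one degree of freedom from the complementary error function in the standard library; this shortcut applies only when DF = 1.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2 x 2 table with cells a, b (top row) and c, d (bottom row)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (b + d) * (a + c))


def p_value_one_df(x2):
    """Upper-tail probability of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2))


# Cell counts from Table 7.10: responders (placebo, active), nonresponders (placebo, active).
x2 = chi_squared_2x2(13, 35, 37, 15)
p = p_value_one_df(x2)

print(round(x2, 1), p < 0.001)
```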
The calculation of $X^2$ with more than two columns is more complex than for a 2 × 2
table and is outside the scope of this book. The basic concepts, however, are the same.
70 7 Key Statistical Concepts
Our discussion thus far has mainly looked at comparing two datasets. These tests
depend on p ≤ 0.05 (i.e., the finding is outside the 95% CI) for rejecting the null
hypothesis. Thus, if a test is repeated 20 times, one statistically significant finding
(p = 0.05) can be expected to occur by chance. Putting it differently, the chance
of finding an outcome that is nonsignificant is 0.95. If you repeat a test n
times, the chance that every finding is nonsignificant is $0.95^n$. If you perform two tests,
the chance of a nonsignificant finding is $0.95^2 = 0.90$. This means that the CI has
decreased to 0.90, and consequently p = 0.10. Thus, the possibility of finding a
statistically significant finding by chance is increased.
Let us do the same with p = 0.01. The chance of finding a nonsignificant value
with 5 tests would be $0.99^5 = 0.95$, consequently p = 0.05. This means that if you do
a Student’s t-test five times on five subsets of a database, a t-value that has a p-value
of 0.01 in reality has a p-value of only 0.05. This is called the Bonferroni principle.
This means that if you do multiple two-sample tests, the p required for statistical
significance has to be lowered (Bonferroni correction). Thus, if you do five tests,
you require p = 0.01 to reject the null hypothesis.
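The arithmetic behind this reasoning can be sketched in Python. The function names are hypothetical, and the calculation assumes, as the text does, that the tests are independent of one another.

```python
def chance_of_false_positive(alpha, n_tests):
    """Chance of at least one significant finding by chance in n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """p-value required per test so that the overall level stays at about alpha."""
    return alpha / n_tests


# Two tests at p = 0.05, and five tests at p = 0.01, as in the text.
print(round(chance_of_false_positive(0.05, 2), 2))
print(round(chance_of_false_positive(0.01, 5), 2))

# Threshold needed per test when five tests are performed.
print(bonferroni_threshold(0.05, 5))
```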
Properly designed tables and graphs are powerful tools for getting across key
messages and breaking the monotony of uninterrupted text. Poorly designed tables
and graphs can, however, irritate and confuse the reader. There are some important
golden rules:
• Do not duplicate information by describing something in text and then giving the
same information in a table and/or graph. Tables and graphs should illustrate a
statement and/or complement text.
• A table or graph should be a clear stand-alone component of the document, and
the reader should not have to hunt in the text to find explanations.
• The format of tables or graphs should be consistent throughout a document. Pay
particular attention to consistency of units, symbols, headings, and labeling of
axes.
Table 8.1 Age, mass, height, and body mass index in four groups of professional athletes
Athlete group                          Age (years)   Mass (kg)   Height (m)   BMI (kg/m²)
Long-distance runners (n = 522)        27 ± 12       62 ± 18     1.67 ± 15    22.2 ± 2.1
Gymnasts (n = 615)                     17 ± 14       54 ± 15     1.5 ± 21     24.2 ± 1.3
American football players (n = 398)    23 ± 8        87 ± 25     1.82 ± 12    26.2 ± 3.7
Basketball players (n = 487)           21 ± 10       78 ± 20     1.90 ± 14    21.5 ± 1.1
Values are means ± SD. BMI body mass index
As one tends to read from left to right, use as few vertical lines as possible to
allow for uninterrupted data flow from left to right. We recommend that you
stay with this simple template unless you have a strong rationale for deviating
from it. Table 8.1 is an example of a table designed according to the standard
template.
The key features of successful medical and scientific writing are that documents
produced are reviewer-friendly, get the key messages across effectively, and achieve the
desired outcome (see also Chap. 1). Tables in the main text should display sufficient
data to convey key messages, but detailed tabulations of data should be placed in an
appendix rather than the main document, with the appropriate reference given in the
text. A common mistake is trying to include too much information in a table. To
make a table less complicated, you can split it into separate simple tables.
Table 7.2 in Chap. 7 deviates from our advice on tables in that it shows raw data
(body mass) as well as derived parameters. In a study report, for instance, it is
usually sufficient to show the mean and SD to convey a message or illustrate a point.
You could then include a reference to the raw data, as some regulatory reviewers
might want to analyze the data themselves. In Table 7.2, we illustrate how parameters
are calculated, and therefore, we included the raw data and variance calculations as
well as the total, mean, and SD. We have, in fact, produced two tables in a single
table to explain how SD is calculated.
To conclude, remember that tables should be clear and informative and that they
should complement rather than duplicate text.
As with tables, a sensible graph should not duplicate but complement text or be used
to illustrate a point (e.g., a key finding). The main functions of graphs are to illustrate
the characteristics of a database, compare datasets, examine relationships between
parameters, or show trends over time. As with tables, a graph should have a clear
legend or caption and should make it obvious what the x- and y-axes represent.
8.2 Sensible Use of Graphs 73
Much valuable information can often be derived by just looking at the individual
data points and the descriptive statistics of a dataset without doing formal inferential
statistics. Let us consider a database representing maximum plasma concentration
(Cmax) values in mg/L after administering a tablet of a specific medicine (values
shown in ascending order):
5, 8, 10, 11, 12, 15, 15, 15, 15, 16, 16, 16, 16, 16, 17, 17, 18, 22, 23, 27
The key representative values are the mean ± SD (15.5 ± 5.0 mg/L) and the
median (middle value) which is 16 mg/L. A very useful tool for summarizing a
database is a box-and-whisker plot. To draw it, we require the median and have to
divide the data into quartiles, each quartile representing 25% of the data. In the
dataset above, the first 10 data points represent 50% (the first and second quartiles)
of the dataset (5, 8, 10, 11, 12, 15, 15, 15, 15, 16). To identify the two quartiles, we
have to determine the median of these 10 values (i.e., the mean of 12 and 15), which is 13.5. The first
quartile is represented by the 5 data points below this value (5, 8, 10, 11, 12) and
the second quartile by the 5 data points above it (15, 15, 15, 15, 16). In the
same way, the third and fourth quartiles can be obtained from the second 10 data
points (16, 16, 16, 16, 17, 17, 18, 22, 23, 27).
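The summary values needed for a box-and-whisker plot can be computed with a short Python sketch based on the Cmax values listed above. It follows the approach described in the text, splitting the sorted data into two halves and taking the median of each half. The variable names are hypothetical, and statistical packages may use slightly different conventions for quartile boundaries, so their results can differ from this simple split.

```python
import statistics

# Cmax values (mg/L) from the text.
cmax = sorted([5, 8, 10, 11, 12, 15, 15, 15, 15, 16,
               16, 16, 16, 16, 17, 17, 18, 22, 23, 27])

mean = statistics.mean(cmax)
sd = statistics.stdev(cmax)        # sample standard deviation
median = statistics.median(cmax)

# Split the data at the median into a lower and an upper half.
half = len(cmax) // 2
lower_half, upper_half = cmax[:half], cmax[half:]

# The medians of the two halves separate quartiles 1 and 2, and quartiles 3 and 4.
q1_boundary = statistics.median(lower_half)
q3_boundary = statistics.median(upper_half)

print(mean, round(sd, 1), median, q1_boundary, q3_boundary)
```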
The box-and-whisker plot has a box (showing the second and third quartiles on
either side of the median) and whiskers showing the first and fourth quartiles. An
alternative option is to use the whiskers to show 95% CIs and individual data points
for values outside the 95% confidence limits.
Figure 8.1 shows the dataset presented as a scatter plot, as means ± SD, and as a
box-and-whisker plot:
Fig. 8.1 Plasma concentrations (mg/L) shown as A, scatter plot; B, scatter plot with overlapping
values spread horizontally; C, means ± SD; D, box-and-whisker plot (box, median between second
and third quartiles; whiskers, first quartile below the box and fourth quartile above the box)
74 8 Tables and Graphs
A plain scatter plot (A) may be misleading. If there are overlapping values,
separating values horizontally (B) gives a better view of the data. With small
datasets, a scatter plot is mostly sufficient to show the distribution of data. With
large datasets, we recommend the use of mean ± SD or a box-and-whisker plot. The
latter gives more information because this plot clearly shows the spread of the whole
dataset, as well as the distribution of each quartile around the median. As a general
principle, the simpler the graph, the better the chance of conveying the intended
message to the reader.
Table 8.2 Trough plasma concentrations in healthy volunteers after a single dose (750 mg) of four
different formulations of a new drug intended for a phase III clinical study
Parameter Tablet 1 Tablet 2 Capsule A Capsule B
Mean (mg/L) 152.5 163.5 109.2 97.7
SD (mg/L) ±18.0 ±47.0 ±37.1 ±7.0
CV (%) ±11.8 ±34.0 ±33.0 ±7.2
Fig. 8.2 Trough plasma concentrations (mg/L) of four different potential formulations of a new drug
for a phase III study (y-axis: 0 to 2500; series labeled Tablet A, Tablet B, Capsule 1, Capsule 2)
In science, we often show the relationship between two parameters (e.g., relationship
between a dependent and independent variable). Examples could be dietary sodium
intake correlating with diastolic blood pressure, the relationship between the dose
and effect of a drug (e.g., warfarin dosage versus prothrombin time), correlating the
dose of a drug with pharmacokinetic parameters, or correlating the degree of renal
impairment (e.g., glomerular filtration rate or serum creatinine) with a factor
affected by renal failure (e.g., vitamin D, hemoglobin, phosphate, etc.). A simple
and common method for showing a relationship is by means of fitting a straight line
(linear regression analysis). The y-axis shows the dependent variable (e.g., plasma
concentration of a drug) and the x-axis the independent variable (e.g., dose of a
drug). For the examples above, you may, for instance, plot plasma phosphate versus
creatinine concentrations in a cohort of patients or a pharmacokinetic parameter
versus the dose of a drug.
During drug development, the impact of impaired renal function on drugs with
significant urinary excretion needs to be studied. Let us illustrate this with a simple
dataset (Fig. 8.3). The graph plots the area under the concentration versus time
curve (AUC) of the drug (i.e., the dependent variable) against renal function as
represented by creatinine clearance (independent variable). A linear regression line
Fig. 8.3 Plasma concentrations (expressed as AUC, mg.h/mL) plotted against creatinine clearance
(mL/min) after a single dose of a new drug given to 32 subjects with varying degrees of renal
impairment. A linear regression (straight-line fit) was obtained using y = mx + c; the fitted line is
y = 335.31x − 1586.9 with $R^2$ = 0.6702. Note: x = creatinine clearance, m = slope, y = AUC,
c = intercept on the y-axis
can be fitted using the equation y = mx + c (see legend of Fig. 8.3). The closeness of
fit determined statistically is expressed as the r-value. A perfect fit would be r = 1. A
better indication of fit is to use the $r^2$ value. In the example in Fig. 8.3, the value of
r = 0.8187 gives an $r^2$ = 0.6702. The latter value means that 0.67 or 67% of the
variability can be explained by the hypothesis (i.e., a linear relationship between
AUC and renal function). Additionally, the p-value can be derived from a statistical
table of r-values; for this dataset, p = 0.0001.
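How the slope, intercept, and r-value of a straight-line fit are obtained can be sketched in Python using ordinary least squares. The function name `fit_line` is hypothetical, and the data points below are invented for illustration only; they are not the values plotted in Fig. 8.3.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c; returns slope m, intercept c, and correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return m, c, r


# Hypothetical data: creatinine clearance (x) and AUC (y). Not taken from Fig. 8.3.
clearance = [20, 40, 60, 80, 100, 120]
auc = [4000, 14000, 16000, 30000, 27000, 40000]

m, c, r = fit_line(clearance, auc)

# r squared is the fraction of the variability explained by the straight line.
print(round(m, 2), round(c, 1), round(r, 4), round(r ** 2, 4))
```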
Another widely used method in nonclinical and clinical research is to look at
the dose–effect relationship or the concentration–effect relationship of drugs
(Fig. 8.4).
The point where the effects become measurable (e.g., when the curve goes above
zero) reflects the intrinsic activity of the drug, i.e., the ability to bind to a receptor
and produce a response. This typical sigmoid (s-shaped) dose–response curve is
obtained when plotting effect versus log dose. In biological systems, it might not be
possible to determine Emax for safety reasons. Furthermore, the area of major interest
in the sigmoid curve is a straight line; therefore, we could apply a simple linear log
dose versus effect approach to the data in Fig. 8.4, as shown in Fig. 8.5. Using a
linear regression model covers the concentrations responsible for most of the
measured effects. However, this model excludes effects close to zero at the lower
end and concentrations near the Emax.
As pointed out above, the model fits the linear regression equation, but the drug
concentrations are logarithmic rather than numerical values.
$$E = \frac{E_{max} \times C}{EC_{50} + C}$$
Fig. 8.4 A typical concentration–effect curve shown as an Emax model (y-axis: effect; x-axis: log
concentration). E effect, Emax maximum effect, EC50 concentration that produces 50% of the
maximum effect, C concentration
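The Emax model given with Fig. 8.4, E = Emax × C / (EC50 + C), can be explored with a few lines of Python. The function name and the parameter values (Emax = 100, EC50 = 10) are hypothetical and chosen only to show the shape of the curve: the effect is half of Emax when the concentration equals EC50 and levels off below Emax at high concentrations.

```python
def emax_effect(concentration, emax, ec50):
    """Effect predicted by the Emax model for a given concentration."""
    return emax * concentration / (ec50 + concentration)


# Hypothetical parameters, for illustration only.
EMAX = 100.0
EC50 = 10.0

# Concentrations spaced evenly on a log scale, as on the x-axis of Fig. 8.4.
for concentration in (0.1, 1, 10, 100, 1000):
    print(concentration, round(emax_effect(concentration, EMAX, EC50), 1))
```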
Fig. 8.6 Typical concentration versus time curve (y-axis: drug plasma concentration, mg/L; x-axis:
time, hours; Cmax, Tmax, and T1/2 are marked on the curve). Note that the T1/2 is obtained from the
time it took for the concentration to decrease by 50% (100 to 50 mg/L)
Most drugs given at therapeutic doses are eliminated by a fixed fraction of drug
per unit time (first-order kinetics). For such drugs, the half-life represents the time
it takes to eliminate 50% of the drug (in Fig. 8.6, this is 10 h). A concentration
versus time curve, however, is determined by three processes that may overlap,
namely, absorption, distribution, and elimination. When determining the T1/2, you
should avoid the first part of the curve where concentrations are increasing as well
as the first series of decreasing concentrations because they might still be influenced
by absorption and/or distribution.
As an exercise, you are encouraged to plot a theoretical curve starting with a
plasma concentration of 100 units and observe what happens when drug elimination
is the only process occurring with a specific T1/2. If you plot concentration against
time expressed in half-lives, the concentration after one half-life would be 50%, after two 25%, after three
12.5%, etc. This would be a curve that approaches, but never reaches, zero. If you
plot the y-axis (concentration) on a logarithmic scale, the curve becomes a straight line. For this reason,
concentration versus time curves are often shown with the concentrations as
logarithms (Fig. 8.7).
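The exercise can also be worked through numerically. The Python sketch below assumes a starting concentration of 100 units and elimination as the only process; the function name is hypothetical. It shows that the concentration halves with every half-life, while the logarithm of the concentration falls by the same amount each time, which is why the log plot is a straight line.

```python
import math


def concentration_after(half_lives, start=100.0):
    """Concentration remaining after a given number of half-lives."""
    return start * 0.5 ** half_lives


values = [concentration_after(n) for n in range(6)]
logs = [math.log10(v) for v in values]

# Drop in log concentration from one half-life to the next (constant for a straight line).
steps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]

print(values)
print([round(s, 4) for s in steps])
```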
Figure 8.7 shows clearly that the straight line representing elimination is only
apparent from 8 h onwards. Thus, the half-life can only be determined during the
linear part of the log concentration versus time graph. Several pharmacokinetic
(PK) parameters can be obtained from the linear regression equation of this part of
the curve. For instance, the slope reflects the rate of elimination (elimination
constant) and can be used to determine the T1/2.
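As a rough sketch of how the half-life follows from the slope, the Python example below fits a straight line to the natural logarithm of concentrations from the elimination phase and uses the relationship T1/2 = ln 2 / k, where k is the elimination constant (the negative of the slope). The function name and the sample values are hypothetical; the concentrations are generated from an assumed half-life of 10 h so that the result can be checked.

```python
import math


def half_life_from_log_linear(times, concentrations):
    """Estimate T1/2 from a straight-line fit of ln(concentration) versus time."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    slope = (sum((t - mean_t) * (l - mean_log) for t, l in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    k = -slope                     # elimination constant (per hour)
    return math.log(2) / k


# Hypothetical samples from the linear (elimination) part of the curve, 8 h onwards,
# generated with an assumed half-life of 10 h.
times = [8, 12, 16, 20, 24]
concentrations = [200 * 0.5 ** (t / 10) for t in times]

half_life = half_life_from_log_linear(times, concentrations)
print(round(half_life, 2))
```

With real data the points scatter around the line, so the estimate depends on which time points are judged to belong to the elimination phase.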
Because we often use logarithmic values in the presentation of biological data, it
is important to briefly look at the difference between actual values and their
logarithms. Figure 8.8 shows the change in the populations of wild dogs and
elephants in a game reserve, expressed as actual values as well as their logarithms.
In the graph on the left, the difference between the number of elephants and wild dogs reflects the
actual numerical difference. The graph on the right shows log values reflecting the
relative change in numbers. This graph shows that the rate of increase for the two
species is the same.
Fig. 8.7 Data of Fig. 8.6 plotted with the plasma concentrations as log values (y-axis: log drug
plasma concentration, mg/L; x-axis: time, hours)
In many clinical studies, events are monitored over varying time periods. For
instance, you might follow the occurrence of cardiovascular events over time in
populations with varying plasma cholesterol values, the incidence of stroke in
people in various categories of hypertension, or the survival rates in patients with
different types of cancer. The Kaplan–Meier plot is a way of visually representing
event rates over time (often survival) in a stepwise fashion (Fig. 8.9). These plots
can be assessed with various techniques to determine the statistical significance of
the separation between the curves.
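The stepwise shape of a Kaplan–Meier plot comes from the way the survival estimate is calculated: at each time point where an event occurs, the current estimate is multiplied by the fraction of subjects at risk who did not have the event. The Python sketch below illustrates this under stated assumptions. The function name and the four-subject dataset are hypothetical, and a subject recorded with event = 0 is treated as censored (lost to follow-up without the event).

```python
def kaplan_meier(times, events):
    """Return (time, survival estimate) at each time point with at least one event.

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical data: four subjects, one of them censored at 2 years.
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(curve)
```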
Fig. 8.8 Change in numbers of two animal populations (wild dogs and elephants; Pasella Game
Reserve census data, 1920 to 2020) over time in a nature reserve, shown as actual numbers on the
left and their logarithms on the right
Fig. 8.9 Kaplan–Meier plot showing survival in patients with a specific type of cancer receiving
either standard treatment (ST) with an investigational drug or placebo added to their treatment
(y-axis: survival, %; x-axis: time, years)
Column charts are used to compare the magnitude of variables using vertical
columns. They are used to demonstrate differences in magnitude of a variable at
different time points. Column charts should not be used to show a trend over time;
this should rather be shown in an arithmetic line graph.
Figure 8.10 is a simple column chart showing data from an African village
without electrical power. The village relies on open fires for heating. The chart
shows that carbon monoxide poisoning occurs most commonly in June to August,
which is the local winter time.
If you compare several populations, a grouped column chart (Fig. 8.11) can be
used. Figure 8.11 is probably a better representation of the data shown in Fig. 8.10
because it shows that fatal CO poisoning occurs mainly in winter and that children
aged 0–10 years are more vulnerable than the other age groups.
Fig. 8.10 Average number of cases of carbon monoxide poisoning seen per month (average of 5
years) in an African village with no electricity (y-axis: number of CO poisoning cases; x-axis:
month, January to December)
A quick look at the graph on the left of Fig. 8.12 suggests a much higher responder rate
with drug A than drug B, because the y-axis has been restricted to a narrow range
(80–90% responders). The graph on the right shows the true picture, where the full y-axis is
shown.
At the other end of the spectrum, differences can be minimized by using a
logarithmic y-axis.
Fig. 8.11 Data shown in Fig. 8.10, broken down into age groups (0–10 years, 10–20 years, and
>20 years; y-axis: number of CO poisoning cases; x-axis: month)
Fig. 8.12 Responder rates for drug A and drug B. An inappropriate y-axis scale is misleading and
makes a small difference appear large (left: y-axis restricted to 80–90% responders; right: full
y-axis from 0 to 100% responders)
Fig. 8.13 Plasma concentrations of drug A (ng/mL) over time (min) in dogs receiving identical doses
directly into the stomach, small bowel, or large bowel, shown on a logarithmic y-axis (left) and a
linear y-axis (right). A log y-axis makes the differences between the absorption sites appear
smaller than they really are
Fig. 8.14 Cmax (mg/L) versus oral dose (mg), shown as the mean Cmax versus dose (left graph;
y = 0.0069x + 2.7814, $R^2$ = 0.9872) and individual values of Cmax versus dose (right graph;
y = 0.0069x + 2.7814, $R^2$ = 0.154). Using the mean data makes the linear relationship appear clear
and statistically significant. Using all the individual data points shows that there is no statistical
significance
The graph on the left in Fig. 8.13 shows the plasma concentrations of a new drug
given to dogs, either directly into the stomach, the small bowel, or the large bowel.
The graph was accompanied by the statement: “The differences in absorption from
the three different sites of administration are minimal, and therefore it is
recommended that a slow-release formulation should be developed to overcome the
short half-life of drug A.”
In the left part of Fig. 8.13, the differences between the three sites of administration
appear to be small because the concentrations are shown on a logarithmic y-axis. If
the same data are shown on a linear scale (on the right), the differences are rather
large.
The last example of inappropriate use of graphs concerns the enhancement of
linear correlations by using mean data instead of individual data (Fig. 8.14).
The figure on the left using the mean Cmax values for each dose seems to show an
excellent correlation between dose and Cmax. On the right part of the figure, all Cmax
data points are plotted against the dose, resulting in a poor linear correlation. While
there are “legitimate” forms of manipulation of data presentation (see Fig. 8.1), the
use of mean values for linear regression analysis is an incorrect and invalid method.
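The effect of averaging before fitting can be demonstrated with a small Python sketch. The dataset is entirely hypothetical (three subjects at each of four doses) and was constructed so that the dose-group means lie on a straight line while the individual values scatter widely; the function name `r_squared` is also hypothetical.

```python
def r_squared(xs, ys):
    """Squared correlation coefficient of a straight-line fit of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Hypothetical Cmax values (mg/L) for three subjects at each oral dose (mg).
cmax_by_dose = {
    100: [1, 4, 4],
    200: [2, 6, 4],
    300: [8, 3, 4],
    400: [3, 9, 6],
}

# Individual data: one (dose, Cmax) point per subject.
individual_x = [dose for dose, values in cmax_by_dose.items() for _ in values]
individual_y = [v for values in cmax_by_dose.values() for v in values]

# Mean data: one (dose, mean Cmax) point per dose group.
mean_x = list(cmax_by_dose)
mean_y = [sum(values) / len(values) for values in cmax_by_dose.values()]

r2_individual = r_squared(individual_x, individual_y)
r2_means = r_squared(mean_x, mean_y)

print(round(r2_individual, 3), round(r2_means, 3))
```

Averaging removes the between-subject variability from the calculation, so the fit to the means looks far better than the individual data justify.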
8.3 Final Thoughts 83
Similarly, the use of SE for data with large variability is not acceptable although
such graphs are visually more pleasing. We advise against this practice and
recommend using SD as the standard way of showing the variability of data around
the mean value.
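The reason SE error bars look tidier is that the SE is the SD divided by the square root of the number of observations, so it shrinks as the sample grows even when the spread of the data does not. The short Python sketch below illustrates this with the Cmax values listed earlier in this chapter; the variable names are hypothetical.

```python
import math
import statistics

# Cmax values (mg/L) from the dataset used for Fig. 8.1.
cmax = [5, 8, 10, 11, 12, 15, 15, 15, 15, 16,
        16, 16, 16, 16, 17, 17, 18, 22, 23, 27]

sd = statistics.stdev(cmax)          # spread of the individual values
se = sd / math.sqrt(len(cmax))       # uncertainty of the mean, always smaller than SD

print(round(sd, 1), round(se, 2))
```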
If used correctly, graphs and tables are powerful visual communication tools and
break the monotony of lengthy narratives. For this chapter, we have selected the key
elements that cover most of the needs of everyday scientific and medical writing.
The excellent software packages available nowadays make the production of
informative graphs quite easy but also carry the risk of overdoing things. We have
deliberately not discussed pie charts and three-dimensional graphs; as a general
principle, we recommend that you avoid them. There are always exceptions, and if
you do decide to use them, be sure that you are adding value rather than confusion.
Chapter 9
International Conference on Harmonization
(ICH) and Other Guidelines
You have to learn the rules of the game. And then you have to
play better than anyone else.
Albert Einstein
Multidisciplinary
M1–M8
Table 9.1 shows the key topics, most of them with several subsections, in connection
with the quality guidelines.
9.5 Multidisciplinary Guidelines 87
This is perhaps the most important category of ICH guidelines for medical/scientific
writers working in the area of drug development. The multidisciplinary guidelines
are integrated guidelines that do not fit exclusively into one of the quality, safety, or
efficacy categories. They are available on the ICH website as Word documents in a
ZIP file (Table 9.4).
Currently, the multidisciplinary guidelines issued by the ICH consist of eight
separate guidelines numbered M1–M8 (Table 9.5).
Several ICH and regulatory agency guidelines are discussed in more detail in other
sections. Readers are strongly advised to consult regulatory websites when starting to
compose regulatory documents. Particularly consider the guidelines listed in Table 9.6.
9.6 Consistency
The drug development plan for a specific drug usually makes provision for several
studies. It is useful for the project team to agree on a common format within the
framework of the regulatory guidelines. Such agreements at the outset speed up the
production of documents and ensure consistency. There are standard text components
that can be used in several or all protocols and study reports.
9.7 Final Thoughts 89
Before continuing to read the remainder of this chapter, the reader should complete
a strategic planning template (see Table 1.1) for the Investigator’s Brochure (IB).
After reading the complete Chap. 10, review and adapt your template if necessary.
Eventually, you can compare it to the template proposal at the end of this section.
As the name indicates, the IB is, in the first instance, directed at the investigator(s)
and staff performing a study that is part of the overall drug development plan for a
new medicinal product. It is, however, not quite as simple as that. The IB is also a
part of various submissions to regulatory authorities, particularly the documentation
to seek regulatory approval to initiate studies in humans. It is also an important part
of submissions to ethics committees and of the documentation supplied to a potential
investor or a business partner seeking a licensing agreement.
The ICH guideline on GCP defines the IB as follows:
The Investigator’s Brochure (IB) is a compilation of the clinical and nonclinical data on the
investigational product(s) that are relevant to the study of the product(s) in human subjects.
Its purpose is to provide the investigators and others involved in the trial with the information
to facilitate their understanding of the rationale for, and their compliance with, many key
features of the protocol, such as the dose, dose frequency/interval, methods of administration,
and safety monitoring procedures. The IB also provides insight to support the clinical
management of the study subjects during the course of the clinical trial. The information
should be presented in a concise, simple, objective, balanced, and non-promotional form
that enables a clinician or potential investigator to understand it and make his/her own
unbiased benefit–risk assessment of the appropriateness of the proposed trial.
Fig. 10.1 (diagram labels: preclinical data; mix of nonclinical and clinical data; mainly clinical
data, i.e., human efficacy and late safety data)
The IB is a “living” document that should be updated regularly (Fig. 10.1). Thus,
the contents of the IB evolve with time, depending on the available data. This means
that the information should be kept up to date, and obsolete data should be removed.
For example, safety data from animal studies are crucial before human studies have
been initiated, but once data from studies in humans are available, they can replace
most of the animal data. A common mistake is that clinical data are simply added
without reducing the nonclinical information. This makes the IB unduly bulky. Bear
in mind that busy clinicians often do not have the time to peruse reader-unfriendly
documentation.
From the investigator’s perspective, overloading the IB in an attempt to satisfy
regulatory authorities might prevent the transfer of key messages. Key messages
that are not conveyed clearly and concisely might impact on the safety of subjects
participating in a study and the quality of data collection. Thus, a good IB should be
easy to read and give the reader clear insight into the characteristics of the substance
being investigated. In addition, a well-prepared IB will clearly describe the rationale
and key aspects of nonclinical and clinical studies and their contribution to the drug
development program.
10.2 Guidelines
We advise you to read the ICH guideline on the IB in conjunction with this section.
The guidelines are part of the E6 GCP document and can be downloaded as indicated
in Table 10.1.
The IB should contain the information listed in Table 10.2.
10.3 Emphasis
Although ICH, FDA, and the European Medicines Agency (EMA) provide the same
guidelines, the ICH and regulatory authorities have a different emphasis. At the one
end of the spectrum, the ICH regards the IB as a good clinical practice (GCP)
document to properly inform the investigator, while at the other end of the spectrum,
the FDA sees it as a part of the IND application. The IND should justify the rationale
for conducting a study in humans with particular emphasis on safety issues.
94 10 The Investigator’s Brochure
Fig. 10.2 (diagram labels: investigators/staff UNDERSTAND the rationale and key features of the
protocol and COMPLY with the protocol; key features listed: dose, dose frequency/interval,
methods of administration, safety monitoring)
The EMA also regards the IB as a GCP document and therefore requires
additional information in the clinical trial application (CTA) in the form of an
Investigational Medicinal Product Dossier (IMPD). The latter requires review of the
quality of the data, provision of nonclinical toxicology and pharmacology data,
available human data, and an overall benefit–risk assessment. The EMA approach
makes it possible to keep the IB simple and investigator-oriented, while the
regulatory concerns are addressed in the IMPD (Fig. 10.2).
The key messages might differ depending on the target audience. Because the IB is
primarily aimed at the investigator and associated staff, let us consider what the key
messages to the investigator should be:
• Good understanding of the drug involved: safety, efficacy, and the value the drug
will add if it is approved for clinical use
• Good understanding of the rationale and design of the study protocol
If these messages are clear, one would expect the following outcomes on the part
of the investigator:
• Willingness to do the study
• Adherence to the protocol and timelines
• Assurance of the safety of study subjects
• Assurance of accurate and high-quality data
On the other hand, the messages to the regulatory authorities have a quite
different emphasis. Let us consider the key messages in a submission to a regulatory
authority for seeking approval of the first study in humans:
10.5 Final Thoughts 95
Table 10.3 Template for planning an IB in conjunction with a specific study protocol, aimed at
the investigator and staff
Outcome                Willingness to perform the study
                       Adherence to protocol and timelines: safety of subjects assured
                       Collection of good-quality data
Guideline              ICH E6
Target audience        Team conducting the study
Key messages           Understanding the drug (pharmacokinetics, safety, and efficacy) and the
                       drug development plan
                       Understanding the key elements of the protocol
Information sources    Company documentation
                       Literature
• The nonclinical data provide enough nonclinical efficacy and safety information
to justify studies in humans.
• The protocol is ethical and rational, addresses subject safety issues appropriately,
and provides essential data to decide whether further studies are justified.
The outcome expected on the part of the regulatory authority is straightforward:
• Approval for conducting the study
Table 10.3 shows a template that can be used for planning an IB directed at
investigators and their staff for the performance of a specific study with a specific
drug. We leave it to the reader to construct templates aimed at the FDA/EMA, an
EC, or a potential investor in the program.
The works must be conceived with fire in the soul but executed
with clinical coolness.
Joan Miro
11.1 Background
Table 11.3 shows the source of the European Commission guidance on CTA
submissions.
The CTA application should have the components shown in Table 11.4.
Table 11.5 shows a potential planning template.
Table 11.5 Suggested preparation template for the CTA (IMPD) or IND
Outcome: Approval of the submitted protocol
Guideline: European Commission/FDA
Target audience: Regulatory authority (e.g., EMA/FDA)
Key messages:
  • There is a clear rationale for doing studies in humans with the new
    medicinal product
  • There are sufficient nonclinical data to make a clear benefit–risk
    assessment
  • Adequate safety precautions have been built into the protocol
  • The galenics (quality) of the new drug is acceptable
Information sources:
  • The IB
  • Literature
Remember that for planning studies in humans (healthy volunteers and/or patients),
the available data should show that:
• There is a clear rationale in terms of the benefit–risk ratio based on the nonclinical
data for conducting clinical trials in humans
• The safety issues have been thoroughly investigated and that adequate safety
precautions have been incorporated in the protocol
Chapter 12
The Common Technical Document: Overviews
and Summary Documents
The agreement within the ICH on the Common Technical Document (CTD), a
common format for applications to register new drugs, was a major step forward in
the standardization and simplification of the approval process (see also Chap. 9 as
well as the ICH website http://www.ich.org/products/ctd.html). The CTD is the
basis of the 4th multidisciplinary ICH guideline (M4). It allows the submission of a
global dossier to the USA and the ICH member states. This section only highlights
certain key elements of the CTD. For detailed information, the reader should refer
to the actual ICH guidance documents.
The CTD includes information on quality, safety, and efficacy and is organized in
three basic components as shown in the pyramid in Figure 12.1.
[Figure 12.1: the CTD pyramid; recoverable labels: Quality Overall Summary (2.3), Nonclinical Overview (2.4), Clinical Overview (2.5), Nonclinical Summaries (2.6), Clinical Summary (2.7)]
Module 1 is not actually part of the CTD. It allows for the adaptation of administrative
information to regional (mainly specific country) requirements. In the European
community, it should contain the following:
• Table of contents of the submission
• Application form
• Product information (e.g., package insert)
• Information on the experts
• Specific requirements
• Environmental risk assessment
• Orphan drug status (if applicable)
In the USA, Module 1 should contain Form FDA 356h, including:
• Administrative documents, e.g., patent, environment, and exclusivity
• Prescribing information, e.g., labels, package insert
Table 12.1 indicates the source for obtaining specific guidance on the compilation
of Module 1.
Module 2 makes provision for summaries of quality and both nonclinical and
clinical data and should have the components listed in Table 12.2.
Module 3 contains the data (quality, nonclinical, and clinical) on which Module 2 is
based. In this section, we will focus mainly on Module 2 and in particular on the overview
documents. The key concept to remember is that the summaries, and particularly the
overviews, are in general more than just summaries and should include the rationale for
the application as well as a critical appraisal of the program and data. It should be noted
that an ICH guideline (M8) for electronic CTDs is under development (Table 12.3).
Before the CTD was agreed upon, a major strategic component of the submission
documentation in the European Union was the so-called expert report. Because this
eventually formed the basis for the CTD summaries, it is useful to turn back the
clock and examine the strategy of the expert report. This document was extremely
important and required the involvement of an expert. The expert could be internal
(from the pharmaceutical company) or external (e.g., an academic expert).
Three expert reports were required (pharmaceutical, nonclinical, and clinical).
Writing these documents was a challenge, as they were each limited to a length of 25
pages. In essence, the expert report had to make the case that the new drug should be
approved by briefly summarizing the data, giving a critical appraisal of the data,
discussing issues upfront, and showing that the claims are supported by solid data. Key
messages had to be consistent between the three types of reports, and it was particularly
Apart from the summary data, this module also contains the overviews (short critical
summaries) with a recommended length of 30 pages. It is an art to summarize large
amounts of data in a short and critical document that gets across key messages. It is
worthwhile looking at the expert report, the document that preceded the overviews.
The following statement, which applied to the expert reports, can also be applied to
the overview documents:
It is important to note that the expert report should include a critical discussion of the
properties of the product. The expert is expected to take and defend a clear position on the
product in the light of current scientific knowledge. A simple summary of the information
contained in the application is not sufficient.
Table 12.5 shows the source of guidance for the preparation of the nonclinical
overview.
The nonclinical overview should provide an integrated overall analysis of the
information in the CTD. Specifically, it should
• be an integrated and critical assessment of the pharmacological, pharmacokinetic,
and toxicological data
• discuss and justify any deviation from study guidelines
• discuss and justify the nonclinical testing strategy
• comment on the GLP status, associations between nonclinical findings and
quality characteristics of the drug, the results of clinical trials, or effects seen
with related products
• describe impurities
• cite relevant literature and discuss properties of related drugs
• provide references from the literature and list the available study reports
Table 12.6 shows the structure of the nonclinical overview document (numbered
as subdivisions of Module 2.4).
The clinical overview (section 2.5) is part of Module 2 of the CTD. Table 12.7
shows the source of guidance for the preparation of the clinical overview.
The recommended length of this document is 30 pages, and it should be signed
by an appropriate expert. The clinical overview should describe the clinical
component of the dossier and show how the data are linked to the product
information. The messages should be consistent with those in the quality and
nonclinical components. In essence, it is a tool for dialogue that raises critical issues
upfront. In a nutshell, the document should make a convincing case that the new
medicinal product should be approved because solid data support its safety and
efficacy and the claims made. The overview should also discuss any
critical issues and indicate how they have been dealt with.
Table 12.8 shows the structure of the clinical overview document (numbered as
subdivisions of Module 2.5).
Fig. 12.2 Summary of the overall approach for planning and producing the clinical overview
The expert should be someone with appropriate expertise relevant to the new
medicinal product. It can, for instance, be an academic clinical expert in the specific
disease area or an internal expert from the company filing the CTD. It is important
to remember that although the overview documents are usually signed off by a
single person (the expert), they are team efforts with input from many people.
Table 12.9 lists the mistakes commonly experienced in the preparation of a CTD.
12.7 Final Thoughts
The main purpose of the summaries and overviews is marketing approval with a
package insert that assures optimal patient benefit and financial success. The
overviews basically show the regulatory authority that the scientific program was of
high quality and produced convincing data (quality, clinical, and nonclinical) to
support the claims and prescribing information, that all issues have been satisfactorily
dealt with, and that the drug should be approved. A good tip is to write as if you
were the reviewer from the regulatory agency. This implies that you can anticipate
the questions and queries the reviewer might have.
Chapter 13
Study Protocols and Reports
In Chap. 1, we asked you to prepare planning templates for writing a clinical study
protocol and a clinical study report. Please write these templates again without
looking at the previous ones or our suggestions further down. Compare the new
version you wrote with the previous one and combine them to prepare your optimal
version, before continuing to read this chapter.
Study protocols and study reports are closely linked. The protocol is essentially
an action document (i.e., the recipe for performing a study), whereas the study
report is an outcome document (i.e., describing the results of the study).
Consequently, the two documents contain the same basic elements. This section
focuses on clinical study protocols and clinical study reports, but the concepts of
nonclinical studies are essentially the same.
Clinical studies form the backbone of the drug-development process, and they
drive internal (i.e., how to proceed with a program) and external (i.e., regulatory
approval and labeling) decision-making. An optimal drug development program
depends on a high-quality and unambiguous target profile based on nonclinical,
clinical, regulatory, and marketing input. This should not be a wish list with
constantly changing goal posts, but should describe the properties the new drug
should have to meet a medical need and to be a marketing success. The target profile
should be fixed unless new scientific, medical, or marketing evidence necessitates a
change. For example, the emergence of a new competitor might require a more
stringent target profile, whereas the withdrawal of a competitor from the market
might allow a relaxation of the target profile.
The target profile will determine the clinical studies to be conducted. The study
reports will determine what will be done internally in proceeding or stopping the
program. If the program is successful, it will provide the evidence of safety and
efficacy required for regulatory approval. With successful drug development, the
target profile requirements will eventually appear as claims in the package insert
and summary of product characteristics (SPC or SmPC). A close match between the
target profile and the package insert reflects optimal planning.
Unfortunately, many protocols and study reports are excessively long, complex,
difficult to read, and repetitive, probably due to fear of litigation. It must be
remembered that lay members of an ethics committee, for example, should be able
to understand these documents. Consequently, particular efforts should go into the
summary sections of these documents. Crucial information lost in the maze of a
long, complicated document that is difficult to read has the same impact as a
document that is too brief and leaves out essential information!
13.1.2.1 Rationale
In simple terms, the rationale for doing a study is the reason for doing the study (i.e.,
looking for a particular outcome). This is usually based on the current state of
knowledge and the requirements stipulated in the target profile. The rationale is
dependent on the nature of the drug, condition to be treated, stage of development,
and overall drug development strategy.
Let us look at the example of a first study in humans with single ascending doses
of a new drug. Here, we argue that we have enough nonclinical safety and efficacy data
to justify a study in humans scientifically and ethically.
As a second example, let us consider a pivotal clinical trial for a specific disease.
Here, we should argue that we have enough nonclinical and clinical safety and efficacy
data to justify the dose selection and expect a positive outcome in a pivotal trial.
If the study has any unusual design features, particularly if they deviate from
standard practice and regulatory guidelines, they should be mentioned upfront and
should be justified. It can be valuable to discuss such deviations with regulatory
authorities before commencing the study.
13.1.2.2 Objective(s)
In the first instance, the objective should be clear and achievable. If there are
multiple objectives, they should not conflict with each other.
The objective of the study is often closely linked to a target-profile requirement
in terms of galenics (i.e., issues around the physical properties and formulation of
the drug), pharmacokinetics, efficacy, and/or safety. Depending on the stage of
development, the objective of the planned study might be to obtain data to decide
whether to continue or discontinue the project (go/no go decision) and, if the project
continues, how to proceed.
Let us return to the example of a first study in humans. Main objectives could be:
• to show that a certain range of single doses (including the anticipated therapeutic
dose with a reasonable safety margin beyond that) are safe and well tolerated in
healthy volunteers.
• to show that the drug has an appropriate pharmacokinetic profile. For example,
the target profile might require once-daily dosing and no influence of food on
absorption (this could be a go/no go decision).
13.1.2.3 Population(s)
When doing a study in humans, there are three options for selecting the subjects:
• Healthy volunteers: in early studies, the emphasis is on safety and tolerability as
well as pharmacokinetics. Due to the disease symptomatology and co-medication,
there is clearly a greater potential for interindividual variability in patients than in
healthy volunteers. The homogeneity of a group of volunteers can also be enhanced
by selecting criteria such as age, gender, and body mass index. There are situations
where the risk in healthy volunteers may be too high, such as testing chemotherapeutic
agents. In this situation the first studies are done in cancer patients where there is
at least some potential therapeutic benefit. It is obvious that the most relevant
information on a drug can be obtained from the patient population for which the
new drug is intended. It is therefore useful to match the healthy volunteers in a first
clinical study as closely as possible to the target population in terms of age and
gender. If, for instance, a new drug is developed to treat osteoporosis in
postmenopausal women, a study in healthy male volunteers aged 20–40 years
makes no sense as the population should be postmenopausal women.
• Patients: it is obvious that dose-finding studies and pivotal therapeutic studies
should be done in patients. As a general rule, patients should be studied as early
as possible in the drug development program. After a single ascending-dose
(SAD) study in healthy volunteers, it might be more informative to test multiple
doses (MAD study) in patients, particularly if it is possible to measure a
therapeutic response. There are also situations where it makes sense to study the
effect of a disease other than the intended indication, e.g., impaired renal or
hepatic function.
• Healthy volunteers and patients: it often makes sense to combine patients and
healthy volunteers in the same study. For instance, if a drug is intended for the
treatment of rheumatoid arthritis, you might choose to administer ascending
doses until the potential therapeutic dose is nearly reached and then continue in
patients with rheumatoid arthritis. You could also do a complete SAD study in
healthy volunteers adding one cohort of the patient population for one of the
doses to ascertain whether the pharmacokinetic profiles and tolerability are
similar in the two groups.
It is important to decide which subjects should be selected for a study and which
ones should be excluded. This is done for several reasons, such as reducing
variability, optimizing the chance of a response, or minimizing the risk of a particular
adverse event. The issues around inclusion/exclusion criteria should ideally be
sorted out before pivotal trials commence. Remember that what you study is what
you get in the label. If you study only a subgroup of the disease population in a
pivotal trial, the label will only cover that subpopulation. Similarly, if you exclude a
certain subpopulation, this subpopulation will be excluded from the label.
There are many diseases in adults that can also affect children (e.g., type II
diabetes, rheumatoid arthritis, or hypertension). Because children are often excluded
from studies, it should be remembered that there are regulatory incentives for
conducting studies in pediatric populations. Table 13.3 shows the link for the
relevant information.
In clinical programs in connection with the development of a new drug, most studies
are randomized and double-blind, and they tend to be either placebo-controlled or
involve the comparison with an approved drug, either in parallel groups or with a
crossover design. Probably the simplest pivotal trial is a bioequivalence study
comparing the plasma concentrations of a generic drug formulation with the plasma
concentrations of the formulation of an approved drug. There are many variants and
refinements in terms of trial design which are outside the scope of this book.
We have already alluded to the importance of the schedule of assessments. The
schedule of assessments is a key component of the protocol, reflecting the various
observations and their timing (Table 13.4). The study design includes definition of
the endpoints (also called study variables, which can be primary or secondary),
construction of the database, and data analysis, including statistical analyses.
In pivotal studies, the design is usually fixed although it is possible, for example,
to schedule an interim analysis to select one of two treatments to be continued. Such
variants need to be discussed with regulatory authorities and often incur a statistical
penalty (e.g., more stringent requirement for claiming statistical significance).
In early studies, particularly SAD and MAD studies, flexibility is allowed,
provided it is well described and motivated. Such flexibility can avoid delays caused
by protocol amendments. Although allometric scaling based on animal data can
predict human PK, it is important to allow for changing the timing or number of
blood samples in a typical SAD study. The half-life, for example, might turn out to
be shorter or longer than expected. You can specify the number and timing of blood
samples for PK analysis and the maximum volume of blood that can be collected for
PK and safety assessments. If, for example, the half-life after the first dose differs
from the expected one, there is the possibility of changing the timing and/or number
of blood samples, provided you had made provision for this in the protocol, and the
amount of blood taken is within the specified volume limits.
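The volume check described above can be sketched in a few lines of Python. This is only an illustration: the function names, sampling times, sample volumes, and the 100 mL limit are hypothetical values chosen for the example, not figures from any protocol or guideline.

```python
def total_blood_volume_ml(n_pk_samples, pk_sample_ml, n_safety_samples, safety_sample_ml):
    """Total blood drawn per subject for PK and safety assessments (mL)."""
    return n_pk_samples * pk_sample_ml + n_safety_samples * safety_sample_ml


def schedule_is_allowed(pk_times_h, pk_sample_ml, n_safety_samples,
                        safety_sample_ml, max_volume_ml):
    """Return True if a (possibly revised) PK sampling schedule stays
    within the maximum blood volume specified in the protocol."""
    total = total_blood_volume_ml(len(pk_times_h), pk_sample_ml,
                                  n_safety_samples, safety_sample_ml)
    return total <= max_volume_ml


# Hypothetical numbers for illustration only.
planned_times_h = [0.5, 1, 2, 4, 8, 12, 24]
# Suppose the half-life is longer than expected: add later sampling times.
revised_times_h = planned_times_h + [36, 48, 72]

print(schedule_is_allowed(revised_times_h, pk_sample_ml=5,
                          n_safety_samples=4, safety_sample_ml=10,
                          max_volume_ml=100))
```

With these assumed values, the revised schedule needs 10 PK samples of 5 mL plus 4 safety samples of 10 mL (90 mL in total), so it remains within the assumed limit and no protocol amendment would be triggered by the volume alone.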
The case report form (CRF) is an extremely important document. It records all
relevant information such as demographics, study procedures, measurements,
observations, etc. It is the link between the protocol and database. The CRF should be
user-friendly and should have a format that makes review and analysis of data easy.
As far as possible, the CRF should consist of numbers or choices. Lengthy text entries
by investigators should be avoided as much as possible, particularly in large studies.
However, in early tolerability studies, text entries cannot be avoided because the
description of adverse events and other tolerability information are not standardized.
To facilitate study monitoring, CRF information should, as far as possible, be
verifiable from source documents. The data should be recorded in a format that is
ready for data entry and analysis. If, for instance, time after dosing is recorded, 1 h
and 15 min should be recorded as 1.25 and not 1.15.
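The conversion behind this example is simple but easy to get wrong by hand. A minimal sketch, assuming time after dosing is captured as whole hours and minutes (the function name is hypothetical):

```python
def decimal_hours(hours, minutes):
    """Convert a time after dosing given as hours and minutes
    into decimal hours, the format suited to data entry and analysis."""
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be between 0 and 59")
    return hours + minutes / 60


print(decimal_hours(1, 15))  # 1.25
```

Writing 1.15 instead would correspond to 1 h and 9 min, which is why the decimal form must be used consistently.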
The construction of the database and methods of data analysis should be defined in
the protocol. In pivotal studies, the endpoints should be clearly defined. The design
should facilitate data analysis and presentation in the study report.
The protocol should be designed in line with the principles outlined in the latest
version of the Declaration of Helsinki, and a comment to this effect must be provided.
Furthermore, all subjects should be adequately informed (written and verbal
information) about all the relevant issues in connection with the study, particularly
the potential risks involved. All subjects participating in the study should give
written consent prior to study start. Table 13.5 lists the specific information items.
Table 13.6 shows the key components of a typical report of a study involving safety
and efficacy assessments, based on the ICH E3 guideline. In the ICH documentation,
these categories are numbered from 1 to 16, but this does not mean that the same
numbers should necessarily be used to number the appropriate sections in the actual
report. We are of the opinion that the first six headings in Table 13.6 should not be
numbered and that the numbering should logically start with 1. Introduction as
shown in Table 13.7.
The submission requires the inclusion of supportive information in addition to
the core report. As indicated below, all elements are required by the FDA, but only
the first two need to be submitted to the European Union (EU) authorities.
The core report contains the elements shown in Table 13.7. For clinical pharmacology
studies, the efficacy section is replaced by pharmacokinetic and pharmacodynamic
data. The appendices should include the list of investigators, exclusions from
analysis, secondary analyses, adverse event listings, and any study-related
publication(s).
This should include the protocol and amendments, CRFs, informed consent
documentation, list of ECs, randomization code list, and glossary of terms.
This should include demographic data, previous and concomitant conditions, previous
and concomitant treatments, and primary and secondary efficacy parameters.
These should include methods of analysis, results and conclusions, center displays
of demographics, efficacy, and safety.
Writing abbreviated study reports to avoid unnecessary full study reports can save
much time. Table 13.8 shows the source of the FDA guidance on abbreviated study
reports.
The guidelines state that “during the development of a product, studies may be
conducted that ultimately do not contribute to the evaluation of the effectiveness of
a product for a specific indication. Such studies should be submitted as either
abbreviated reports or synopses.”
In essence, key pivotal efficacy studies should be fully reported. There are,
however, many important studies that are crucial to internal decision-making that
contribute little to regulatory claims. Such studies can be reported in abbreviated
form. For example, three different formulations might be evaluated to select a
formulation suitable for phase III pivotal studies and eventual marketing. Once the
data are available, the selection can be made, and it becomes a waste of time writing
a full report after the key decision has already been made. It is, however, important
to remember that regulatory decisions in terms of approval for marketing are made
on the basis of efficacy and safety evaluation. An abbreviated study report should
always include all safety data.
Planning can avoid many problems and delays. The efficient production of protocols
requires good teamwork (local or international team) making optimal use of the
expertise of each individual member under experienced leadership. Prototyping and
project-specific protocol templates (e.g., synopsis structure, consistent table and
figure formats for both clinical pharmacology and therapeutic studies) can greatly
facilitate the rapid production of good-quality protocols. It is also valuable to
develop standard text components, for example background information on the
clinical indication, nature of the compound, drug development plan, main nonclinical
information, etc.
When completing the study report, the key messages must be clear and
unambiguous. In a nutshell, these messages should support the efficacy and safety
of the drug under development and should clearly show how the compound will
impact on the target indication. Regulatory authorities may want to reanalyze certain
data or investigate a specific subset of patients. It is therefore important to use a
reviewer-friendly format for data listings because the perusal of such documents
can be excessively cumbersome. In terms of safety, it is useful to show data not only
in relation to normal ranges but also as the change from baseline. A useful
attitude is to imagine yourself as the reviewer!
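To illustrate why both views of a safety value are informative, the following sketch reports a laboratory result against its normal range and as the change from baseline. The function name, the values, and the reference range are hypothetical and serve only as an example.

```python
def summarize_lab_value(baseline, on_treatment, lower_limit, upper_limit):
    """Describe one safety laboratory value in two ways:
    relative to the normal range and as change from baseline."""
    if on_treatment < lower_limit:
        flag = "low"
    elif on_treatment > upper_limit:
        flag = "high"
    else:
        flag = "normal"
    return {
        "change_from_baseline": on_treatment - baseline,
        "range_flag": flag,
    }


# Hypothetical enzyme value (U/L) with an assumed normal range of 10 to 40.
print(summarize_lab_value(baseline=20, on_treatment=38,
                          lower_limit=10, upper_limit=40))
```

In this invented case the value is still flagged as normal, yet it has risen by 18 U/L and nearly doubled. A listing that shows only the normal-range flag would hide a change the reviewer may well want to see.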
Study reports should focus on the “customer” (i.e., the regulatory authority). They
should be reviewer-friendly and should be written in clear English (English might
not be the native language of the reviewers). Reports should be consistent across the
project in terms of format, structure, and content. Be unbiased and objective and
avoid superlatives! The main task is to ensure that the key messages are clear,
reviewer-friendly, unambiguous, consistent, and convincing!
Chapter 14
Scientific Papers
Clearly, the scientific community has a substantial interest in the proper, truthful,
and ethically correct publication of new findings. It is, however, not always easy to
spot potential breaches of these principles. Thus, the creation of a “body” that
oversees publication practices and issues relevant guidelines was urgently needed.
The Committee on Publication Ethics (COPE) was established in 1997 by a
small group of British Medical Journal editors (see also Chap. 6). Currently the
committee consists of more than 9000 members worldwide from many
academic fields. The COPE homepage states that many major publishers (including
Elsevier, Wiley-Blackwell, Springer, Taylor & Francis, Palgrave Macmillan, and
Wolters Kluwer) have signed up as COPE members [23]. All members are expected
to follow the code of conduct for journal editors as defined by COPE. This ensures
better monitoring of publication quality across all fields of research.
COPE’s homepage states that “COPE provides advice to editors and publishers
on all aspects of publication ethics and, in particular, how to handle cases of research
and publication misconduct” (see also Chap. 6). In addition, COPE maintains a
forum for its members to discuss certain issues. COPE does not investigate individual
cases but encourages editors to ensure that cases are investigated by the appropriate
authorities (usually a research institution or employer).
Table 14.1 Impact factor for the top 10 medical journals in 2015
Rank Journal Impact factor
1 New England Journal of Medicine 54.42
2 The Lancet 39.21
3 Journal of the American Medical Association (JAMA) 30.39
4 British Medical Journal (BMJ) 16.38
5 Annals of Internal Medicine 16.10
6 PLoS Medicine 14.00
7 Arch Intern Med 13.24
8 J Cachexia Sarcopeni 7.41
9 BMC Med 7.28
10 Cochrane Database Syst Rev 5.94
Source: http://impactfactor.weebly.com
There are many factors that influence the selection of the journal, e.g., language,
target audience, nature of the paper (e.g., reporting original research or writing a
state-of-the-art review), and the impact factor of the journal. The latter is a measure
of how often the papers in a journal are cited. It is determined by dividing the number of
citations by the number of citable papers. Table 14.1 shows the 10 medical
journals with the highest impact factor in 2015.
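The division described above can be written out as a short calculation. The sketch below assumes the commonly used two-year window (citations received in one year to papers the journal published in the two preceding years), a detail not spelled out in the text; the function name and the numbers are invented for illustration.

```python
def impact_factor(citations, citable_papers):
    """Impact factor as the number of citations divided by the number
    of citable papers. Assumption: both counts refer to the same
    two-year publication window."""
    if citable_papers <= 0:
        raise ValueError("number of citable papers must be positive")
    return citations / citable_papers


# Invented example: 1200 citations to 400 citable papers.
print(impact_factor(1200, 400))  # 3.0
```

An impact factor of 3.0 in this example means that, on average, each citable paper from the window was cited three times in the year assessed.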
It is important to decide whether your target audience consists of generalists or
specialists because this will determine your selection of a suitable journal.
In most cases, the decision to write a paper occurs when you have meaningful research
results to report. Most review papers are written by invitation. If you feel there is a
need and you have a good idea for a review paper, it is advisable to contact the editor
of the chosen journal to find out whether the publishers have an interest before you
embark on extensive work. If you have an interesting or novel concept, there are
opportunities such as submitting a hypothesis paper to The Lancet, for example.
In this section, we primarily focus on reporting scientific research findings (see
also Sect. 3.2). As a start, you have to consider the authors. They should all have
made a substantial contribution to the paper and should be involved in the
planning, writing, and review of the paper (see also Sect. 6.4.3). It is advisable to
identify a suitable journal early on so that you can follow its instructions for
authors from the outset. If you are not able to select a journal at the planning stage
for whatever reason, you should work within a standard framework consisting of
the following:
• Title
• Authors and affiliations
• Abstract
• Keywords
• Introduction
• Materials and methods
• Results
• Discussion
• Conclusions
• Acknowledgments
• Disclosures
• References
• Figures and tables
If you work within this framework, you can always adapt to the requirements of
the journal you eventually select.
To give yourself a good chance of success, you need to write from the perspective
of the reader, i.e., the editorial reviewer(s) and the eventual readers of the paper after
publication. It is our opinion that the title, abstract, conclusions, and visuals are the
key elements determining the impact of a paper. A good place to start is by writing
your conclusions as bullet points reflecting the key messages. You can then decide
what to add, e.g., a sentence leading to the bullet points and a concluding sentence
reflecting the value of your findings. You can also decide against using bullet points
and write a paragraph incorporating the bullet points as a narrative. You can then use
your conclusions to create an informative title and an introduction with aims linked
to the bullet points in the conclusion.
A valuable second step is to decide which visuals will best emphasize the key
messages and create them to clearly reflect the key messages, without having to
refer to the text.
Study protocol titles should not be used as titles of papers. Protocols usually have boring
descriptive titles such as “Single-ascending-dose, placebo-controlled study of the
pharmacokinetics and tolerability of oral HTY 519 in healthy male volunteers.” If the
results of a study with the title above are published, the publication title should clearly
reflect the nature of the drug and the outcome of the study to entice the reader to read
14.5 Where Do I Start? 129
the paper. A possible title would be “First human study with a novel oral renin antagonist
for the treatment of hypertension supports further development.” Not all journals accept
“conclusion-type titles” that reveal the outcome in the title. For these, the title could be
“First human study with a novel oral renin antagonist for the treatment of hypertension.”
The affiliations of authors should be included and the corresponding author, with
the correct contact details, clearly shown. Other contributors should be mentioned
in the acknowledgments.
14.5.2 Keywords
Keywords are important. They should have a high likelihood of matching the search
terms used by someone doing a literature search. In terms of medical/scientific
papers, important keywords should include the disease, target organ or tissue, name
and class of drugs studied, and words reflecting the outcome (e.g., survival, toxicity,
cure, and progression). Many papers might contain important information on a
method, although the aim of the paper was not the development or evaluation of a
method. In such papers, it is useful to include keywords that will allow someone
looking for information on methods to pick up your publication in a literature search.
14.5.3 Abstract
An abstract should conform to the length, headings, and format specified by the
instructions for authors of the journal you select. It should be concise and easy to
read. It should clearly describe the nature of the study and any key findings and their
implications/applications and should entice the reader to read the complete paper.
An abstract should be a stand-alone summary of the paper that can be understood
without having to consult the full paper.
14.5.4 Introduction
This section should set the scene in terms of the existing knowledge based on the
literature and the experience of the authors. It should briefly review the current
knowledge on the topic, any open issues, and the rationale for the work that is the
subject of the publication.
14.5.5 Materials and Methods
This section should clearly describe how the study was done and analyzed. Clear
headings such as patient selection, treatment schedules, laboratory analyses, or
statistical analysis make this section easier to read. In essence, someone reading this
section should be able to duplicate the study. Instead of describing the methods
applied in detail, you can refer to a paper where a detailed description can be found.
If you do refer to the methods used in a former study, make sure to give full and
accurate references to the original work.
14.5.6 Results
As with methods, clear headings will make this section easier to read. Where possible,
try to use the same subheadings in both the Methods and Results sections, so that the
reader can easily do a cross-reference. Extensive detailed text may be difficult to follow,
and you should carefully consider the inclusion of visual aids (tables and figures) to
enhance the impact of key findings. Avoid duplication of information, e.g., text, table,
and figure, all showing the same data. Adhere to the limits for the number of tables and
graphs allowed (see also Chap. 8).
14.5.7 Discussion
Your primary goal should be to clearly and convincingly communicate the key
messages to the reader. Data should be put into context in terms of what is known from
the literature. You should show how the data reported in a paper agree and/or disagree
with the literature and discuss the possible explanations of these findings. Two key
reasons for publishing are to strengthen current knowledge and to communicate new
findings. If you communicate new findings, you should discuss the implications, e.g.,
better and earlier diagnosis of a disease or improved treatment options. Give careful
attention to the final conclusions which should reflect the key messages.
14.5.8 Acknowledgments
14.5.9 Disclosures
Any potential conflicts of interest should be disclosed. The following quote from an
editorial in the New England Journal of Medicine gives a succinct view of the key
issues:
“We ask authors to disclose four types of information. First, their associations with
commercial entities that provided support for the work reported in the submitted manuscript
(the time frame for disclosure in this section of the form is the life span of the work being
reported). Second, their associations with commercial entities that could be viewed as
having an interest in the general area of the submitted manuscript (the time frame for
disclosure in this section is the 36 months before submission of the manuscript). Third, any
similar financial associations involving their spouse or their children under 18 years of age.
Fourth, non-financial associations that may be relevant to the submitted manuscript.” [27]
14.5.10 References
Reference formats should conform to the journal's instructions for authors. The references
should be carefully checked for consistency and correctness. The reader should be
able to find the references quoted!
The most common format used in biomedical journals is the Vancouver style (see
http://www.library.uq.edu.au/training/citation/vancouv.pdf). This system numbers
references in the text in the order in which they first appear and lists them in the
same numerical order in the reference list. The following information must be included:
family name and initials of the author(s), title of article, abbreviated title of journal,
publication year, month, day (month and day only if available), volume (issue), and
pages. The following is an example of such a citation:
Snowdon J. Severe depression in old age. Medicine Today. 2002 Dec;3(12):40–47.
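The Vancouver elements listed above can be assembled mechanically. The following Python sketch is illustrative only: the names JournalArticle, format_vancouver, and number_citations are hypothetical and not part of any citation tool, and the punctuation simply mirrors the example above. Always check the output against the instructions for authors of your chosen journal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JournalArticle:
    """Hypothetical container for the elements of a journal reference."""
    authors: List[str]            # each as "Family-name Initials", e.g. "Snowdon J"
    title: str
    journal_abbrev: str
    year: int
    volume: str
    pages: str
    issue: Optional[str] = None
    month: Optional[str] = None   # abbreviated month, e.g. "Dec"
    day: Optional[int] = None


def format_vancouver(article: JournalArticle) -> str:
    """Build a reference string in the pattern of the example above."""
    authors = ", ".join(article.authors)
    date = str(article.year)
    # Month and day are added only if available.
    if article.month:
        date += f" {article.month}"
        if article.day is not None:
            date += f" {article.day}"
    volume = article.volume
    if article.issue:
        volume += f"({article.issue})"
    return (f"{authors}. {article.title}. {article.journal_abbrev}. "
            f"{date};{volume}:{article.pages}.")


def number_citations(cited_keys: List[str]) -> Dict[str, int]:
    """Number references in the order in which they first appear in the text."""
    numbers: Dict[str, int] = {}
    for key in cited_keys:
        if key not in numbers:
            numbers[key] = len(numbers) + 1
    return numbers


# The example citation from the text, rebuilt from its elements.
example = JournalArticle(
    authors=["Snowdon J"],
    title="Severe depression in old age",
    journal_abbrev="Medicine Today",
    year=2002,
    month="Dec",
    volume="3",
    issue="12",
    pages="40–47",
)
```

Calling format_vancouver(example) rebuilds the citation shown above. number_citations takes the citation keys in the order they occur in the manuscript and gives a repeated key the number it received at its first appearance.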
Another common style, the Harvard system, cites the author’s name and the year
of publication in the text. The complete references are then listed in alphabetical
order according to the surname of the first author. Detailed information on both
styles is readily available on the Internet. An easy way of getting the format right
is to look at the reference list of a recent paper from the journal to which you
intend to submit the paper.
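The two Harvard conventions described above (author and year in the text, alphabetical reference list) can be sketched in a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, and the "et al." rule for three or more authors and the lack of a comma before the year are one common variant among several. The sample entries are taken from the book list in the References.

```python
from typing import Dict, List


def harvard_in_text(surnames: List[str], year: int) -> str:
    """Return an author-year citation for use in the running text."""
    if len(surnames) == 1:
        names = surnames[0]
    elif len(surnames) == 2:
        names = f"{surnames[0]} and {surnames[1]}"
    else:
        # Assumed variant: shorten three or more authors to "et al."
        names = f"{surnames[0]} et al."
    return f"({names} {year})"


def sort_reference_list(references: List[Dict]) -> List[Dict]:
    """Order references alphabetically by first author's surname, then by year."""
    return sorted(references, key=lambda ref: (ref["surnames"][0].lower(), ref["year"]))


# Sample entries drawn from the book list in the References.
references = [
    {"surnames": ["Rogge", "Taft"], "year": 2009},
    {"surnames": ["Rogers"], "year": 2014},
    {"surnames": ["Bernstein"], "year": 1995},
    {"surnames": ["Allen"], "year": 2008},
]

ordered = sort_reference_list(references)
in_text = [harvard_in_text(ref["surnames"], ref["year"]) for ref in ordered]
```

Here ordered begins with Allen and ends with Rogge and Taft, and in_text holds the matching author-year citations.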
14.6 Final Thoughts
Writing a high-quality paper that will be accepted by a good journal and will be read
widely by the targeted readership requires much planning and forethought. Once a
draft conforming to the instructions for authors has been prepared, the manuscript
should be read and reviewed carefully by all contributors. Input from an experienced
person who has published extensively and successfully can be invaluable. The final
review should always include checking the correctness and completeness of the
references.
Chapter 15
Publication Strategy
Publication strategy can be thought of as a road map delineating what, when, and
how research will be published. It should include the type of article and journal,
based on the target audience. The environment in which research is done has a
major impact on delineating the publication strategy.
To a large extent, these choices are based on the target audience and anticipated
impact of the published article. If a new drug is intended for use by general
practitioners, the journal chosen should have a wide reader base among general
practitioners. If, on the other hand, the new drug is intended for use by nephrologists,
the obvious choice is a high-impact journal in this area. At the time of launch,
research papers focusing on the outcome of clinical studies are invaluable.
Detailed pharmacokinetics or assay methods should probably not be part of a
report on a clinical trial in a clinical journal, but should be reported in an appropriate
journal (e.g., a journal on pharmacology or laboratory methods). As mentioned
above, a review of current concepts in a specific area of treatment, highlighting the
position of a new drug in the therapeutic armamentarium, can have major value for
marketing the new medication.
In the early part of your career, you might start with posters or oral presentations
to notify your peers of your work and present preliminary results. This will also give
you feedback that might be important in developing your project. Your ultimate aim
would be a full paper in a reputable journal.
You should avoid publishing the same information in different formats. This is,
in essence, dishonest (see also Chap. 6). There is nothing wrong, however, with a
paper that reviews previously published data and reports new perspectives that add
value to your work.
References
Books
1. Rogge M, Taft DR. Preclinical drug development. Drugs and the pharmaceutical sciences.
Abingdon, UK: Taylor & Francis Group, CRC Press; 2009.
2. Berube MS, et al., editors. The American Heritage guide to contemporary usage and style.
Boston: Houghton Mifflin Harcourt; 2005.
3. Allen R. Pocket Fowler’s guide to modern English usage. 2nd ed. Oxford, UK: Oxford
University Press; 2008.
4. Garner B. American Garner’s dictionary of legal usage. 3rd ed. Oxford, UK: Oxford University
Press; 2011.
5. Rogers SM. Mastering scientific and medical writing. A self-help guide. Heidelberg: Springer;
2014.
6. The JAMA Network. AMA manual of style: a guide for authors and editors. 10th ed. Oxford
University Press; 2009. ISBN 019539205. Available from: www.amamanualofstyle.com.
7. Bernstein TM. The careful writer. New York: Free Press; 1995.
8. Merriam-Webster. Merriam-Webster’s collegiate dictionary. 11th ed. Springfield: Merriam-
Webster Inc.; 2003.
9. Brown L, Stevenson A. New shorter Oxford English dictionary. Revised ed. Oxford, UK:
Oxford University Press; 2012.
Published Literature
10. Pariser A. Early communication: a key to reduced drug development and approval times. FDA
Voice. 6 Feb 2013. Available from: http://blogs.fda.gov/fdavoice/index.
11. Jenkins J. CDER approved many innovative drugs in 2014. FDA Voice. 14 Jan 2015. Available
from: http://blogs.fda.gov/fdavoice/index/priority-review.
12. Jana N, Barik S, Arora N. Current use of medical eponyms – a need for global uniformity in
science publications. BMC Med Res Methodol. 2009;9:18. doi:10.1186/1471-2288-9-18.
PMC 2667526.
13. Hafner U. Plagiat und Fälschung. Die Wissenschaft bekämpft den Betrug und fördert den Bluff
(Plagiarism and falsification. The sciences fight fraud but encourage bluff). Neue Zürcher
Zeitung (NZZ); 2015.
14. Vitzthum T. Lüge aus dem Labor (Lies from the laboratory). Welt am Sonntag.
2015;20:4.