Date: 06/09/2019
Title of research proposal: How to Save More Lives: A Competence Assessment for
Radiologists
Total word count (excluding in-text references, the reference list, and this title page): 1808
words
Xinmiao Jin
44200194
Dear Jo,
Thank you very much for giving me the opportunity to revise my article. I have addressed all the points you raised, as documented below (the reviewer's original comments are in bold).
Yours sincerely,
Xinmiao Jin
Reviewer Jo
2. More care was required with your written expression throughout this
document. Please provide a final proofread before submitting your work
or look to the PSST tutors for help in future. Alternatively, look to
journal articles and try to emulate how they write and structure their
sentences to get a sense of how to improve your writing style. This
would boost your work, as it helps to communicate more clearly what
you want to your reader and hence conveys your intended message with
more impact.
3. The core problem issue here was not clear enough i.e., radiologists
differ greatly in their ability to interpret mammogram screens and we
have no national standardised test in place in Australia to assess for
this ability. You then needed to highlight more clearly the negative
ramifications of this problem given the high rates of breast cancer in
Australia to clarify the significance of the problem. That is, speak to
what errors of interpretation mean in real life for patients and their loved
ones.
I deleted the original content and added new text as suggested. The part now reads: "If radiologists fail to interpret mammograms correctly, they may tell their patients that they are healthy and need no treatment. Patients would then suffer more pain, could miss the best window for treatment, and may eventually die. Families and friends would also suffer the pain of losing loved ones."
4. This outline of the new test's methodology required more detail in order
to be properly understood.
More detail has been added as suggested. It now reads: "The new test will use a screening measure to compare mammogram-interpretation ability between novice and expert groups and determine which group performs better."
6. You needed to actually stipulate these two groups properly. I.e., who
did you sample who have a known good and poor ability at mammogram
interpretation?
I have clarified this. It now reads: "a novice group (poor ability at interpreting mammograms) and an expert group (good ability at interpreting mammograms)."
8. Great work!
I thank the reviewer for their positive feedback on reliability.
9. While good, please give the associated time frame for this.
10. The Aims and Significance section needed to start by framing the
problem issue about there being an alarming rate of breast cancer in
Australian women (drawing on relevant statistics for this) before
highlighting that mammograms are the only reliable tool for detecting
breast cancer early.
11. Apostrophe missing.
Fixed (See page 11).
44. No proper design was specified for the study. You needed to indicate
the general form that the study would take e.g., longitudinal design with
X testing time points involving two independent groups.
45. How many mammograms exactly will be used?
46. Key details were missing here. These mammogram screens should
be deliberately chosen based on representativeness of Australian
women’s breasts (in appropriate proportions of the various ethnicities
and ages that go for screens), difficulty (i.e., have a range of easy-to-
hard screens to better discern ability levels) and content (i.e., ratio of
cancer: no cancer). This all needed to be specified.
47. No description was offered as to what was considered to be a ‘correct’
answer for each of these mammograms, as often the true outcome or
real-world state for these screens is unknown. This was important as it
underpins the entire scoring system and currently there is no way to
determine if a participant’s response is correct or incorrect. Consider
using a separate panel of three radiologist experts to determine these
‘correct’ answers beforehand using the BI-RADS rating system, where
each mammogram screen is allocated a gold standard ‘correct’ answer
on the BI-RADS by consensus among these subject matter
experts. Then the correlation between a participant's responses and these 'correct' answers on each mammogram screen can be used to gauge their degree of competence. This would allow for a more refined view into their
competence level. This was a critical issue that needed to be
addressed.
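As an illustrative sketch of the consensus step the reviewer describes (the screen identifiers, panel size of three, and BI-RADS ratings below are invented for the example, not taken from the proposal):

```python
from collections import Counter

def gold_standard(panel_ratings):
    """Derive a 'correct' BI-RADS answer for each mammogram by
    majority consensus among a panel of expert radiologists."""
    gold = {}
    for screen_id, ratings in panel_ratings.items():
        # The most common rating among the experts is the consensus.
        rating, count = Counter(ratings).most_common(1)[0]
        if count < 2:  # no majority among three raters
            gold[screen_id] = None  # flag the screen for re-review
        else:
            gold[screen_id] = rating
    return gold

# Three hypothetical experts rate each screen on the BI-RADS scale (0-6).
panel = {
    "screen_01": [2, 2, 3],   # consensus: 2
    "screen_02": [4, 4, 4],   # consensus: 4
    "screen_03": [1, 3, 5],   # no majority, so no gold standard yet
}
print(gold_standard(panel))
```

Participants' responses could then be compared against these gold-standard answers to gauge competence, as the comment suggests.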
48. This is confusing, as it is a "divided by" sign. I think you mean "or" here
instead.
49. This proposed study procedure was highly problematic. Using the exact
same test at both testing time points (for test-retest reliability) was not
appropriate, as practice effects would likely occur. As such,
participants’ scores would be expected to improve from time 1 to time 2
testing purely due to familiarity with the test materials. Therefore, the
resulting reliability assessment would be undermined. This was a major
methodological issue that needed to be addressed. Think about
whether parallel or alternate forms reliability may be a better approach
to this i.e., use of different versions of the test.
50. While this was a solid suggested assessment of the internal reliability of the items within the test, given that item difficulty would differ within the test, it would have been more prudent to evaluate Cronbach's alpha for each level of item difficulty (like subscales), rather than just for the overall test version, which contains items of heterogeneous difficulty.
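The per-difficulty alpha the reviewer suggests could be computed as sketched below (a minimal hand-rolled Cronbach's alpha; the 0/1 item scores and the easy/hard grouping are invented for illustration):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a set of items, where
    items[i][p] is participant p's score on item i."""
    k = len(items)               # number of items
    n = len(items[0])            # number of participants

    def var(xs):                 # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[p] for item in items) for p in range(n)]
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Hypothetical 0/1 scores for five participants, grouped by item difficulty.
easy = [[1, 1, 1, 0, 1], [1, 1, 0, 0, 1], [1, 1, 1, 0, 1]]
hard = [[0, 1, 0, 0, 1], [0, 1, 0, 0, 0], [1, 1, 0, 0, 1]]

# Report alpha per difficulty 'subscale' instead of one overall alpha.
for name, subscale in (("easy", easy), ("hard", hard)):
    print(name, round(cronbach_alpha(subscale), 2))
```

Reporting alpha separately for each difficulty band avoids mixing items of heterogeneous difficulty into a single coefficient, which is the reviewer's point.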
51. There was no second test version created in the study design provided
above. Therefore, there is no way to assess this form of reliability. This
was a major methodological error.
52. The timing of this completely undermines assessment of any form of
reliability. Given that you have deliberately placed training between the
time 1 and time 2 testing, you would expect participants' scores to
change due to this intervention. As such, a reliability evaluation (which
measures consistency of scores) would be completely useless. This
does not demonstrate your understanding and application of this core
concept.
53. The actual groups used, as stated earlier in your study design, needed
to be stipulated for this prediction to make proper sense.
54. The actual measure on which these two groups would be compared was
also not stated i.e., the new test.
55. This is not predictive validity i.e., the time 1 score would not predict the
time 2 score. Further, no training interventions were to be employed for
this study (see Assignment Briefing).
56. Predictive validity would be shown if participants' scores on your new test predicted their true on-the-job sensitivity and specificity ratings at a 2-year follow-up.
57. Lovely recognition of this key fact - well done!
I thank the reviewer for leaving their positive comment.
Executive Summary
Compared with other cancers, breast cancer is more likely to be detected at an early stage through mammograms, and patients can receive treatment early to prevent the disease from getting worse or causing death. However, this prevention largely depends on whether the radiologist correctly interprets the screening result. Radiologists differ greatly in their ability to interpret mammograms, and so far no national standardised test is in place in Australia to assess this ability. Most existing studies collected and analysed data on radiologists and their diagnoses without controlling variables. Thus, it is essential to assess which factors influence the competence of radiologists and, especially, to find ways to improve their ability to detect tumours and to save lives. If radiologists fail to interpret mammograms correctly, they may tell their patients that they are healthy and need no treatment. Patients would then suffer more pain, could miss the best window for treatment, and may eventually die. Families and friends would also suffer the pain of losing loved ones. I propose that we develop an online simulation test in which radiologists view a large set of mammograms and decide which ones contain a tumour. The new test will use a screening measure to compare mammogram-interpretation ability between novices and experts and examine whether practice influences the ability to successfully detect tumours. The test results could also be used to develop new training programs and methods for novice radiologists and improve their skills. Criterion validity will be assessed by comparing the performance of a novice group (poor ability at interpreting mammograms) and an expert group (good ability at interpreting mammograms). Predictive validity will be assessed by comparing student performance before and after the training program. Internal reliability will be assessed via Cronbach's alpha, and parallel-forms reliability will be assessed through different versions of the same simulation test.
Aims and Significance
Mammography is currently the only reliable screening test that could reduce breast cancer mortality (Elmore et al., 2009). Statistics show that it can reduce mortality from breast cancer by approximately 15% in females aged 40 to 74 (Barlow et al., 2004), as early-stage breast cancer is highly likely to be cured through surgery and treatment (Hassett, O'Malley, Pakes, Newhouse & Earle, 2006). However, this all depends on whether a patient's tumour is actually detected by radiologists. According to Beam, Layde and Sullivan (1996), radiologists' ability to detect cancer in mammography varies by 11%. Thus, it is necessary to develop a new test that assesses the factors contributing to the competence of radiologists, and to improve their ability to detect tumours based on the test results. Currently, most studies simply analysed information about radiologists and drew conclusions without controlling variables that possibly influence ability; some studies did control variables but drew conclusions from small samples. The proposed test will try to avoid or minimize these disadvantages and provide a valid and reliable assessment tool that lets radiologists, as well as people in related areas, know which factors influence their competence.
Background
Most existing literature simply analysed information collected from different institutes, hospitals and cancer centres, including gender, work type, work hours, experience, etc. (Barlow et al., 2004; Beam et al., 1996; Elmore, Wells, Lee, Howard, & Feinstein, 1994). This is likely problematic because, even though we might see a general trend, there is no way to tell which specific factor(s) lead to better performance. For instance, consistent with some of the studies mentioned above, male radiologists outnumber female radiologists (Elmore et al., 1994). However, according to Carney et al. (2004), male radiologists report more intense reactions than female radiologists. Thus, gender could be a factor influencing performance, but these studies did not control for it properly.

Therefore, I would argue that it is not appropriate to simply analyse the data; variables also need to be controlled. The study of Nodine, Kundel, Lauver and Toto (1996) recruited professionals and laypeople as participants and had them identify lesions in mammograms. However, its conclusions are limited, as the sample contained only 15 participants. The study conducted by Elmore, Wells, and Howard (1998) also used a small sample in its test settings, which affected the reliability of their experiments (Van Der Boon, de Jaegere & van Domburg, 2012).
Previous research has suggested that, compared with other training methods, the use of simulation can effectively enhance ability (Bowman, Russell, Boekeloo, Rafi, & Rabin, 1992; Stone, 1975), as it simulates real-world conditions and provides an intuitive view of actual mammograms (Vallejo, Jordán, Díaz, Comeche & Ortega, 2007). It is also convenient to conduct, as it is not limited by location and is easy to access. An online version also offers potential advantages such as automatic analysis and statistical feedback (Frein, 2011).

The new measure only tests whether or not a tumour is present in the image, which means the format is basically a true-or-false question: the participant clicks either a yes or a no button. True-or-false questions are even easier to guess than multiple-choice questions, and random guessing in either format reduces test reliability (Burton, 2001; Cronbach, 1941). However, requiring participants to point out the exact tumour area minimizes the guessing rate (Budescu & Bar-Hillel, 1993) and ensures that participants' actual ability is measured.

The test formats mentioned above have limitations: they might be valid, but their reliability is doubtful. Therefore, I propose that we develop a simulation test for radiologists. In the test, participants answer a yes-or-no question about whether the current image contains a tumour and, if their answer is yes, additionally circle the exact area of the tumour. This technique will effectively reduce the accuracy achievable by guessing. I propose a test that will: 1) have a larger sample size, 2) be easier to assess, and 3) control other variables and focus on a single factor: practice. To achieve these aims, I propose that the simulation test be conducted online using computer software. Location will not be a limitation, as one can finish the test anywhere with Internet access, and the software will automatically record and analyse responses.

This new test will provide inspiration for 1) universities and organizations developing training programs, and 2) students considering becoming a radiologist. The test can also assess other factors after the test content is modified.
Study Design
300 new graduate students studying radiology at The University of Kingsland will be recruited as the novice group. A background check will be performed to make sure they have no prior experience. There will be a six-month training program, after which the students will be tested again with a different version of the same simulation test. The other group will be the expert group. "Expert" in this experiment refers to people who have been in this field for more than 20 years and have a reputation in the area. 100 experts will be recruited through invitation emails sent to their personal email addresses or the official email addresses of institutes, hospitals, breast cancer centres, etc. This experiment will be a non-factorial design.

All participants will complete the online simulation test, which includes hundreds of mammograms collected from real patients in institutes, hospitals and breast cancer centres. Participants click a yes or no button to decide whether or not a tumour is present in the current mammogram. If the answer is "YES", the participant will also need to circle the exact location of the tumour; if it is "NO", the participant will be shown the next image. The simulation software will record and calculate accuracy in terms of detections (a tumour exists, the participant clicks yes and circles the right area, or no tumour exists and the participant clicks no), misses (a tumour exists but the participant clicks no) and false alarms (no tumour exists but the participant clicks yes).
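The scoring the software would perform can be sketched as follows (a minimal illustration only: the trial format, the decision to count a yes-with-wrong-area as a miss, and the example data are assumptions, not a specification of the actual software):

```python
def score_responses(trials):
    """Tally detections, misses and false alarms from simulation trials.
    Each trial is (tumour_present, clicked_yes, circled_correct_area)."""
    hits = misses = false_alarms = correct_rejections = 0
    for present, clicked_yes, circled_ok in trials:
        if present:
            if clicked_yes and circled_ok:
                hits += 1            # detection: yes, and right area circled
            else:
                # clicked no, or clicked yes but circled the wrong area
                # (counting the latter as a miss is an assumption here)
                misses += 1
        elif clicked_yes:
            false_alarms += 1        # no tumour, but clicked yes
        else:
            correct_rejections += 1  # no tumour, clicked no
    sensitivity = hits / (hits + misses) if hits + misses else 0.0
    specificity = (correct_rejections / (correct_rejections + false_alarms)
                   if correct_rejections + false_alarms else 0.0)
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms,
            "sensitivity": sensitivity, "specificity": specificity}

# Hypothetical participant: three tumour screens, two normal screens.
trials = [(True, True, True), (True, False, False), (True, True, True),
          (False, True, False), (False, False, False)]
print(score_responses(trials))
```

Summarising the tallies as sensitivity and specificity would also line up with the follow-up ratings mentioned in the reviewer's comment on predictive validity.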
Procedure
Student participants will complete the online simulation test in computer labs under supervision. Experts will complete the test remotely: considering that they might be in other cities or overseas, as well as our budget, it is not realistic to ask them to come to the computer labs. Students will be tested twice, before and after the training program, but experts will complete the test only once.
Test Evaluation: Assessment of Reliability and Validity
Reliability Evaluation
Parallel-forms reliability is also predicted for the new test. It is expected to be shown by a strong, positive correlation between performance on the different versions of the simulation test administered before and after the training program, which would confirm the parallel-forms reliability of this new measure.
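The expected evidence, a strong positive correlation between scores on the two test versions, could be checked with a Pearson correlation; the accuracy scores below are invented purely for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired scores on two test forms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical accuracy scores for five participants on the two forms.
form_a = [0.62, 0.71, 0.55, 0.80, 0.67]
form_b = [0.60, 0.74, 0.58, 0.77, 0.66]
r = pearson_r(form_a, form_b)
print(round(r, 2))  # a value near 1 would support parallel-forms reliability
```

Note, though, that with training deliberately placed between the two administrations (as the reviewer points out), score changes due to the intervention would be confounded with form equivalence.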
Validity Evaluation
The test's criterion validity will be evaluated through the performance of contrasted groups: the expert group (good at predicting tumours) is expected to perform significantly better than the student group (poor at predicting tumours; Elmore et al.). It is also expected that students will perform significantly better after the training program than on their first attempt, consistent with the finding of Nodine et al. (1996) that training can produce improvement.

It should be noted that the proposed test could face ethical and practical constraints. Although all the mammograms are collected from real patients, the participants are not responsible for the choices they make. This is clearly different from the real-world situation, where radiologists have to face anxious or desperate patients and take full responsibility for their diagnoses. This might cause participants to respond inconsistently with their real-world behaviour. Future studies are therefore needed to evaluate the new test and to find evidence, or add more elements to the test, to support and improve it.
Breast cancer is more likely to be cured at an early stage. Thus, if we can detect it as early as possible, we can prevent people from suffering or even dying. Radiologists' skill varies by as much as 11% (Beam et al., 1996), which means some radiologists could miss tumours and patients could lose the best window for treatment. The proposed test could help change this situation by assessing and improving radiologists' ability to detect tumours in mammograms, thereby saving thousands of lives. Training programs will increase the likelihood that radiologists correctly interpret mammograms. This will benefit not only the radiologists, for their own reputation and the industry environment, but also patients and their families.
Barlow, W. E., Chi, C., Carney, P. A., Taplin, S. H., D'Orsi, C., Cutter, G., et al. (2004).
Beam, C. A., Layde, P. M., & Sullivan, D. C. (1996). Variability in the interpretation of screening mammograms by US radiologists: Findings from a national sample. Archives of Internal Medicine, 156(2), 209-213.
Bluth, E. I., Bansal, S., Macura, K. J., Fielding, J., & Truong, H. (2015). Gender and the radiology workforce: Results of the 2014 ACR workforce survey. Journal of the American College of Radiology.
Bowman, M., Russell, N., Boekeloo, B., Rafi, I., & Rabin, D. (1992). The effects of
doi:10.1001/archinte.1992.00400210053009
Budescu, D., & Bar-Hillel, M. (1993). To guess or not to guess: A decision-theoretic view of formula scoring. Journal of Educational Measurement, 30(4), 277-291. doi:10.1111/j.1745-3984.1993.tb00427.x
Burton, R. F. (2001). Quantifying the effects of chance in multiple choice and true/false
Carney, P. A., Elmore, J. G., Abraham, L. A., Gerrity, M. S., Hendrick, R. E., Taplin, S.
doi:10.1177/0272989X04265480
Elmore, J. G., Jackson, S. L., Abraham, L., Miglioretti, D. L., Carney, P. A., Geller, B. M.,
doi:10.1148/radiol.2533082308
Elmore, J. G., Wells, C. K., & Howard, D. H. (1998). Does diagnostic accuracy
Elmore, J. G., Wells, C. K., Lee, C. H., Howard, D. H., & Feinstein, A. R. (1994). Variability in radiologists' interpretations of mammograms. New England Journal of Medicine, 331(22), 1493-1499.
Hassett, M. J., O'Malley, A. J., Pakes, J. R., Newhouse, J. P., & Earle, C. C. (2006).
Legault, C., & Vora, N. (2018). Stroke code simulation has sustained benefit on
neurology resident education and preparedness for stroke call. Neurology, 90(s15).
Nodine, C. F., Kundel, H. L., Lauver, S. C., & Toto, L. C. (1996). Nature of expertise in searching mammograms for breast masses. Academic Radiology, 3(12), 1000-1006. doi:10.1016/S1076-6332(96)80032-8
Vallejo, M. A., Jordán, C. M., Díaz, M. I., Comeche, M. I., & Ortega, J. (2007). Psychological assessment via the internet: A reliability and validity study of online (vs paper-and-pencil) versions of the General Health Questionnaire-28 (GHQ-28) and the Symptoms Check-List-90-Revised (SCL-90-R). Journal of Medical Internet Research, 9(1), e2.
Van Der Boon, R. M., de Jaegere, P. P., & van Domburg, R. T. (2012). Multivariate