
PSYC3020 Assignment – Title Page

Student’s name (First name, Last name): Xinmiao Jin

Student number: 44200194

Date: 06/09/2019

Title of research proposal: How to Save More Lives: A Competence Assessment for
Radiologists

Tutor’s name (First name, Last name): Jo Brown

Tutorial (e.g., P01): T20

Total word count (excluding in-text references, the reference list, and this title page): 1808
words
Xinmiao Jin

44200194

PSYC3020 Semester 2 2019

Cover Letter for Second Submission for Written Assignment

Dear Jo,

Thank you very much for giving me the opportunity to revise my article. I have
addressed all the points made by you, as documented below (reviewer’s original
comments are in bold).

Yours sincerely,

Xinmiao Jin

Reviewer Jo

1. Cool title given to secure your reader's attention - well done!


I thank the reviewer for leaving their positive comment on the title.

2. More care was required with your written expression throughout this
document.  Please provide a final proofread before submitting your work
or look to the PSST tutors for help in future.  Alternatively, look to
journal articles and try to emulate how they write and structure their
sentences to get a sense of how to improve your writing style.  This
would boost your work, as it helps to communicate more clearly what
you want to say to your reader and hence conveys your intended message
with more impact.

To be continued

3. The core problem issue here was not clear enough i.e., radiologists
differ greatly in their ability to interpret mammogram screens and we
have no national standardised test in place in Australia to assess for
this ability.  You then needed to highlight more clearly the negative
ramifications of this problem given the high rates of breast cancer in
Australia to clarify the significance of the problem.  That is, speak to
what errors of interpretation mean in real life for patients and their loved
ones. 

I deleted the original content and added new material as suggested. The part now reads: “If radiologists are unable to interpret mammograms correctly, they may tell patients with cancer that they are healthy and need no treatment. Those patients then suffer needlessly, may miss the optimal window for treatment, and may eventually die, while their families and friends endure the pain of losing a loved one.”

4. This outline of the new test's methodology required more detail in order
to be properly understood.
More detail has been added as suggested. It now reads: “The new test will use a screening measure to compare mammogram-interpretation ability between novice and expert groups and determine which group performs better.”

5. Be very careful!  This sounds as though you plan to implement an


intervention here.  Recall that the core purpose of this assignment was
to develop and evaluate a novel test/test battery of ability only.  It was
NOT to put in place any interventions such as training programs or
therapy aimed at boosting people’s ability levels.  The latter is highly
inappropriate, as you have not yet demonstrated that your new test/test
battery is psychometrically sound (and therefore, it has no empirical
basis for appearing in any intervention studies yet).  This study should
seek solely to establish the psychometric soundness of this measure,
so that FUTURE research may use it in studies to evaluate interventions
etc.
All the inappropriate content has been deleted.

6. You needed to actually stipulate these two groups properly.  I.e., who
did you sample who have a known good and poor ability at mammogram
interpretation?
I have made it clearer. It now reads: “a novice group (with poor ability at interpreting mammograms) and an expert group (with good ability at interpreting mammograms).”

7. Again, this was highly inappropriate, as no training/interventions were


to be performed as part of this research proposal (see Assignment
Briefing).  This was a fatal flaw that needed to be amended.
I have replaced predictive validity with ? validity. This part now reads: “”.

8. Great work!
I thank the reviewer for their positive feedback on reliability.
9. While good, please give the associated time frame for this.
10. The Aims and Significance section needed to start by framing the
problem issue about there being an alarming rate of breast cancer in
Australian women (drawing on relevant statistics for this) before
highlighting that mammograms are the only reliable tool for detecting
breast cancer early.
11. Apostrophe missing.
Fixed (See page 11).

12. Beautiful identification of this part of the core problem issue!


I thank the reviewer for their positive feedback on reliability of this test.

13. This is wayward.  We need a test that assesses their actual competence,


as we currently have no national standardised test in Australia that does
this.
14. The negative implications of failing to interpret mammograms
correctly were not identified for all the relevant key stakeholders
involved (i.e., for patients, their loved ones, and the medical industry).
15. Please make use of Australian English for an Australian audience.
All fixed.
16. How the new test aimed to address the problem issue also had an
incorrect approach.  You are not to put in place any form of
intervention/training as part of this assignment (see Assignment
Briefing).  This was a major flaw.
17. You needed to speak to the current measures in place i.e., recognise
explicitly that Australia currently has no standardised national test for
radiologist competence in interpreting mammograms.  This helps
establish why your new test is needed and what you are seeking to
replace (in this case, to fill a deficit).
18. A brief review of the literature dealing with the variability of radiologists
in reading and interpreting mammograms was missing here.  This was a
key omission.
19. The Background section really needed to speak to the idea that a great
degree of variability exists in how radiologists interpret mammograms,
and the factors known to affect how accurately they do this (e.g.,
amount of specialised training/experience, breast density, eye
gaze/search patterns) - gender is not one of them and there would be
huge ethical and legal ramifications if you tried to argue that only
radiologists of a given sex be hired.  Further, you needed to speak to the
fact that no standardised measure of radiologist competence in
interpreting mammograms currently exists in Australia, and review the
systems/tests in place in other countries that may be useful to employ
(e.g., the BI-RADS rating system to accompany a series of mammogram
screens taken from a real mammogram screen bank).  As such, there
was a large amount of crucial material that was not covered and
critically analysed here.
20. The actual methodology employed for this study needed to be clearer
and there was no discussion of the results.  As such, how does your
reader know if this is a suitable (i.e., valid) measure of radiologist
ability?
21. Again, most of the key details were missing from the review of this
study.  Therefore, it's not useful to your reader.
22. Much of the key literature was overlooked.  That is, there was no
address and review of factors that affect radiologist competence, such
as experience, specialised training, breast density and eye gaze
patterns.  These were key omissions.  Noting these factors leads nicely
into why simulations would be the best method for assessing
competence, as they take all of these factors into account.
23. Spelling: "research".
Fixed.
24. As before, you should not be looking to put training into effect here.
25. While this is good, take it further and discuss the ecological validity they
provide, which helps with the overall validity of scores produced from
these types of tests.
26. The actual psychometrics for this needed to be reviewed.  Otherwise,
how can your reader trust that this is a reliable and valid measure to
employ for the new test?
27. Lovely work!
28. As before, this is not appropriate as you should not be implementing
any training/interventions.
29. A review of the research in other countries employing the BI-RADS
rating system should have been covered, as it circumvents many
problems raised in this research area e.g., 50% chance guess rates and
lack of realistic response scales for competence tests that employ
correct/incorrect scoring systems.  Likewise, a review of tests that
employ real mammograms but make use of high cancer: no cancer ratio
rates should also have been covered here.  These were key omissions.
30. This would have been a great time to discuss search patterns/strategies,
as this is an outcome of training.  I.e., trained radiologists know exactly
what to look for in mammograms, which is why they display better
search strategies than less trained radiologists, as shown by their
shorter time to locate masses, less time fixating on other structural
features and less time fixating on the mass before arriving at a decision.
31. Given that a reasonable degree of relevant literature was overlooked
here, the evidence base for the new test was not as strong as it should
be.  Particularly, the use of a simulation test needed to be bolstered by
reviewing the psychometric properties of other tests similar to this.
32. This does not address the limitation identified with the previous
measures.  Please consider use of the BI-RADS rating system to combat
this 50% guess rate problem and make the response more realistic to
what radiologists do on-the-job.
33. Solid outline provided as to the intended new test and content :)
34. If you wanted to do this, why not just use an eye-tracker in conjunction
with the task?  This could track search strategies and fixation points to
determine if they are looking at the right location.  It would also gauge
time to find the problem and arrive at a decision.
35. This was a problem with the test's assessment, not the test itself.
36. This is hard to determine, as the test descriptions provided earlier were
not adequate.
37. As before, this is highly inappropriate.  The proposed study actually
represents a training intervention.  Recall that the fundamental purpose
of this assignment was to design and evaluate a novel test to measure a
psychological ability, to establish it as being a psychometrically sound
real world measure.  You were not to assume the measure
was already psychometrically sound and put it in place in an
intervention (e.g., in a training or therapy program).  This was highly
inappropriate as the basic assumption that it is psychometrically sound
has not yet been demonstrated.  This was a fatal study flaw.
38. Future uses for the test/test battery should have covered all the relevant
key tangible and intangible outcomes that could arise from its use i.e.,
Universities can employ it as a competency test for graduating radiology
students, it can provide hospitals with a tool to select the best
radiologist applicants when hiring, identify current
radiologists operating at sub-par levels who are in need of training, and
be used for employee training purposes.  These applications are likely
to result in better patient care as radiologists will likely make more true
positive and fewer false positive or false negative determinations when
interpreting mammograms.  This will ultimately save more lives and
reduce unnecessary patient procedures/surgeries.
39. Capital letters are required at the start of all major words for Level 2
headings (APA issue).  This was a repeated problem in this section.
Fixed. (See Question 41)
40. ? Does this university exist?
No. I thought I should use an assumed name.
41. This does not make sense.  How are new graduates still at uni? Use of
third-year radiology students would have been a better choice.  Further,
these should be sampled from universities across Australia.
Fixed. It now reads: “Three hundred third-year radiology students from universities across Australia will be recruited by university research participation departments.”
42. This was NOT an appropriate study design.  The Assignment Briefing
and tutorial slides made clear that you were NOT to design an
intervention or experiment of any kind.  Therefore, this was a major
methodological issue.
I deleted the inappropriate part.
43. Solid choice of this expert group :)
I thank the reviewer for leaving a positive comment.

44. No proper design was specified for the study.  You needed to indicate
the general form that the study would take e.g., longitudinal design with
X testing time points involving two independent groups.
45. How many mammograms exactly will be used?
46. Key details were missing here.  These mammogram screens should
be deliberately chosen based on representativeness of Australian
women’s breasts (in appropriate proportions of the various ethnicities
and ages that go for screens), difficulty (i.e., have a range of easy-to-
hard screens to better discern ability levels) and content (i.e., ratio of
cancer: no cancer).  This all needed to be specified.
47. No description was offered as to what was considered to be a ‘correct’
answer for each of these mammograms, as often the true outcome or
real-world state for these screens is unknown.  This was important as it
underpins the entire scoring system and currently there is no way to
determine if a participant’s response is correct or incorrect.  Consider
using a separate panel of three radiologist experts to determine these
‘correct’ answers beforehand using the BI-RADS rating system, where
each mammogram screen is allocated a gold standard ‘correct’ answer
on the BI-RADS by consensus among these subject matter
experts.  Then the correlation between a participant’s responses and these
‘correct’ answers on each mammogram screen can be used to gauge their
level/degree of competence.  This would allow for a more refined view into their
competence level.  This was a critical issue that needed to be
addressed.
48. This is confusing, as it is a "divided by" sign.  I think you mean "or" here
instead.
49. This proposed study procedure was highly problematic.  Using the exact
same test at both testing time points (for test-retest reliability) was not
appropriate, as practice effects would likely occur.  As such,
participants’ scores would be expected to improve from time 1 to time 2
testing purely due to familiarity with the test materials.  Therefore, the
resulting reliability assessment would be undermined.  This was a major
methodological issue that needed to be addressed.  Think about
whether parallel or alternate forms reliability may be a better approach
to this i.e., use of different versions of the test.
50. While this was a solid suggested assessment of internal reliability of the
items within the test, given that the difficulty level of items would
differ within the test, it would have been more prudent to instead
evaluate the Cronbach's alpha levels for each level of item difficulty (like
subscales, rather than just the overall test version which contains items
of heterogeneous difficulty).
51. There was no second test version created in the study design provided
above.  Therefore, there is no way to assess this form of reliability.  This
was a major methodological error.
52. The timing of this completely undermines assessment of any form of
reliability.  Given that you have deliberately placed training between the
time 1 and time 2 testing, you would expect participants' scores to
change due to this intervention.  As such, a reliability evaluation (which
measures consistency of scores) would be completely useless.  This
does not demonstrate your understanding and application of this core
concept.
53. The actual groups used, as stated earlier in your study design, needed
to be stipulated for this prediction to make proper sense.
54. The actual measure on which these two groups would be compared was
also not stated i.e., the new test.
55. This is not predictive validity i.e., the time 1 score would not predict the
time 2 score.  Further, no training interventions were to be employed for
this study (see Assignment Briefing).
56. Predictive validity would be shown if participants' scores on your new
test predicted their true job sensitivity and specificity ratings at a 2-year
follow-up.
57. Lovely recognition of this key fact - well done!
I thank the reviewer for leaving their positive comment.

58. Solid recap of the core problem issue here!


59. It was not stated briefly what would constitute your new test.  As such,
your reader has not been recapped on what components make up this
measure, nor how it is to be presented to prospective participants (i.e.,
the general test presentation mode and response format).
60. No reference was made to the need for this new test to be
psychometrically sound before proceeding to implement it in the
future.  You needed to make it clear that this tool first had to
demonstrate the expected high reliability and validity levels before it
was suitable to measure your chosen ability in this context and thus
help combat the key problem issue at hand.
61. While you touched on the novel contribution(s) of this new test, they
were not entirely clear.  Please provide more clarity/specification as to
exactly what gap and/or limitation the new tool addresses in its
measurement of your chosen ability and how it does so.
62. Absolutely brilliant job on this “big sell” here to secure the dire need for
this new measurement tool through highlight of the key outcomes it
would bring about for all the key stakeholders involved – awesome
work!
63. Evidence was presented here that you consulted an acceptable range of
appropriate literature whilst writing this assessment piece – lovely
job! :)
64. Beautiful work on the APA formatting of this section - well done!

Executive Summary

Compared with other cancers, breast cancer is more likely to be detected at an early stage through mammograms, allowing patients to begin treatment before the disease progresses or becomes fatal. However, this prevention depends largely on whether the radiologist correctly interprets the screening result. Radiologists differ greatly in their ability to interpret mammogram screens, and Australia currently has no national standardised test to assess this ability. Most existing studies collected and analysed data on radiologists and their diagnoses without controlling variables. Thus, it is essential to assess which factors influence radiologists’ competence, and especially to find ways to improve their tumour-detection ability and save lives. If radiologists are unable to interpret mammograms correctly, they may tell patients with cancer that they are healthy and need no treatment. Those patients then suffer needlessly, may miss the optimal window for treatment, and may eventually die, while their families and friends endure the pain of losing a loved one. I propose developing an online simulation test in which radiologists view a series of mammograms and decide which ones contain a tumour. The new test will use a screening measure to compare mammogram-interpretation ability between novices and experts and examine whether practice influences successful tumour detection. The results could be used to develop new training programs and methods that help novice radiologists improve their skills. Criterion validity will be assessed by comparing the performance of a novice group (with poor ability at interpreting mammograms) and an expert group (with good ability at interpreting mammograms). Predictive validity will be assessed through the correlation between participants’ performance before training and at a six-month follow-up. For reliability, internal consistency will be assessed by calculating Cronbach’s alpha, and parallel-forms reliability will be assessed using different versions of the same simulation test.
Aims and Significance

In clinical trials, mammography is the only breast cancer screening test shown to reduce breast cancer mortality (Elmore et al., 2009). Statistics show that it can reduce breast cancer mortality by approximately 15% in women aged 40 to 74 (Barlow et al., 2004), as early-stage breast cancer can very often be cured through surgery and treatment (Hassett, O’Malley, Pakes, Newhouse & Earle, 2006). However, this benefit depends entirely on whether a patient’s tumour is actually detected by the radiologist. According to Beam, Layde and Sullivan (1996), radiologists’ ability to detect cancer in mammography varies by 11%. Thus, it is necessary to develop a new test that assesses the factors contributing to radiologists’ competence, so that their tumour-detection ability can be improved based on the results. Currently, most studies simply analysed information about radiologists and drew conclusions without controlling variables that may influence ability; the studies that did control variables drew their conclusions from small samples. The proposed test aims to avoid or minimise these shortcomings and to provide a valid and reliable assessment tool that shows radiologists, and others in related fields, what can be done to improve tumour-detection ability.

Background

Current Relevant Measures

Most existing literature simply analysed information collected from different institutes, hospitals and cancer centres, including gender, work type, work hours, experience, and so on (Barlow et al., 2004; Beam et al., 1996; Elmore, Wells, Lee, Howard, & Feinstein, 1994). This is problematic because, even though a general trend may be visible, there is no way to tell which specific factor(s) lead to better performance. For instance, gender could be a factor contributing to performance. As observed by Bluth, Bansal, Macura, Fielding and Truong (2015), female radiologists are more likely to work in academic/university environments, while male radiologists tend to be in private practice. This is also consistent with studies mentioned above in which male radiologists greatly outnumbered female radiologists (Elmore et al., 1994). However, according to Carney et al. (2004), male radiologists report more intense reactions than female radiologists. Thus, gender could be a factor influencing performance, yet the studies did not control for it properly.

Therefore, I would argue that it is not appropriate to simply analyse existing data; variables must also be controlled. The study by Nodine, Kundel, Lauver and Toto (1996) was conducted under experimental conditions with variables controlled. They recruited professionals and laypeople as participants and asked them to identify lesions in mammograms. However, the conclusions are limited, as the sample comprised only 15 participants. The study conducted by Elmore, Wells, and Howard (1998) also had a small sample in its test settings, which affected the reliability of the experiments (Van Der Boon, de Jaegere & van Domburg, 2012).

The Need for Simulations

Previous research has suggested that, compared with other training methods, simulation can effectively enhance ability (Bowman, Russell, Boekeloo, Rafi, & Rabin, 1992; Stone, 1975), as it simulates real-world conditions and provides an intuitive view for participants (Legault & Vora, 2018).

An online version of the simulation test will be used, as online tests provide psychometric properties similar to traditional formats (Vallejo, Jordán, Díaz, Comeche & Ortega, 2007) and are convenient to administer: they are not limited by location and are easy to access. The online version also offers potential advantages such as automatic analysis and statistical feedback (Frein, 2011).

The new measure tests only whether a tumour is present in the image, so the format is essentially a true/false question: the participant clicks either a yes or a no button. True/false questions are even easier to guess than multiple-choice questions, and random guessing in either format reduces test reliability (Burton, 2001; Cronbach, 1941). However, requiring participants to point out the exact tumour area minimises the guessing rate (Budescu & Bar‐Hillel, 1993) and ensures that participants’ actual ability is measured. This is a key advantage of the new test.
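
To illustrate this advantage with a rough calculation (the number of plausible regions per image, m, is purely an assumption for illustration): on a tumour-present screen, a blind guesser must both answer “yes” and circle the correct region, so

\[ P(\text{correct guess}) \;=\; \frac{1}{2} \times \frac{1}{m}. \]

With, say, m = 10 plausible regions per image, the chance rate falls from 50% to 5%.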


Proposed Test/Test Battery and Rationale

The test formats mentioned above have limitations: they may be valid, but their reliability is doubtful. Therefore, I propose that we develop a simulation test for radiologists. In the test, participants answer yes/no questions about whether the current image contains a tumour. If their answer is yes, participants must additionally circle the exact area of the tumour. This technique will effectively reduce the inflation of accuracy caused by guessing. To minimise the limitations of existing assessments, I propose a test that will: 1) use a larger sample size, 2) be easier to administer and score, and 3) control other variables and focus on a single factor: practice. To achieve these aims, the simulation test should be conducted using online computer software. Location will not be a constraint, as the test can be completed anywhere with Internet access, and the software will automatically record and analyse results for the experimenters.

This new test will help 1) universities and organisations develop new training programs, and 2) government departments update the criteria for becoming a radiologist. With modified content, the test could also be used to assess other factors.

Study Design

Participants and Design

Three hundred third-year radiology students from universities across Australia will be recruited by university research participation departments to complete the online simulation test as the novice group. A background check will be performed to ensure they have no prior experience. A six-month training program will follow, after which the students will be tested again using a different version of the same simulation test. The second group will be an expert group; here, ‘expert’ refers to people who have worked in this field for more than 20 years and have established reputations in the area. One hundred experts will be recruited through invitation emails sent to their personal email addresses or to the official email addresses of institutes, hospitals and breast cancer centres. This experiment will use a non-factorial design.

Materials and Measures

All participants will complete the online simulation test, which includes hundreds of mammograms collected from real patients at institutes, hospitals and breast cancer centres. For each mammogram, participants click a yes or no button to indicate whether a tumour is present. If the answer is “YES”, the participant must also circle the exact location of the tumour; if it is “NO”, the next image is shown. The simulation software will record responses and calculate accuracy in identifying tumours as the number of correct detections (tumour present and “yes” clicked with the correct area circled, or tumour absent and “no” clicked) relative to the total of correct detections, misses (tumour present but “no” clicked) and false alarms (tumour absent but “yes” clicked).
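
As a minimal sketch of this scoring rule (the response format and names below are assumptions for illustration, not the actual software):

def score_responses(responses):
    # Each response: (tumour_present, clicked_yes, circled_correct_area).
    hits = misses = false_alarms = correct_rejections = 0
    for tumour_present, clicked_yes, circled_correct_area in responses:
        if tumour_present and clicked_yes and circled_correct_area:
            hits += 1                # tumour present, "yes", correct area circled
        elif tumour_present:
            misses += 1              # tumour present but "no" (or wrong area circled)
        elif clicked_yes:
            false_alarms += 1        # tumour absent but "yes"
        else:
            correct_rejections += 1  # tumour absent and "no"
    correct = hits + correct_rejections
    return correct / (correct + misses + false_alarms)

# Example: three screens with one miss gives accuracy 2/3.
print(score_responses([(True, True, True), (True, False, False), (False, False, False)]))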

Procedure

Student participants will complete the online simulation test in computer labs under supervision. The experts will be asked to complete it individually in a quiet place: given that they may be in other cities or overseas, and given our budget, it is not realistic to ask them to come to the computer labs. Students will be tested twice, before and after the training program, whereas experts will be tested only once.
Test Evaluation: Assessment of Reliability and Validity

Reliability Evaluation

Internal consistency will be evaluated using Cronbach’s alpha; a value of at least .70 is generally recognised as indicating acceptable reliability.
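
For reference, Cronbach’s alpha for a test of k items is computed from the item-score variances and the variance of total scores:

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right), \]

where \( \sigma^{2}_{Y_i} \) is the variance of scores on item i and \( \sigma^{2}_{X} \) is the variance of total test scores.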

Parallel-forms reliability will also be assessed. A strong, positive correlation between performance on the different versions of the simulation test administered before and after the training program would support the parallel-forms reliability of this new measure.
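
As a minimal sketch of this evaluation (the per-participant accuracy scores below are made up purely for illustration):

from scipy.stats import pearsonr

# Hypothetical accuracy of the same participants on the two test versions.
version_a = [0.72, 0.65, 0.80, 0.58, 0.90]
version_b = [0.70, 0.60, 0.84, 0.55, 0.88]

r, p = pearsonr(version_a, version_b)  # correlation between parallel forms
print(f"parallel-forms reliability: r = {r:.2f} (p = {p:.3f})")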

Validity Evaluation

The test’s criterion validity will be evaluated through the performance of contrasted groups: the expert group (good at detecting tumours) is expected to perform significantly better than the student group (poor at detecting tumours; Elmore et al., 1998). Predictive validity will also be evaluated. It is predicted that students will perform significantly better after the training program than at their first attempt, consistent with the finding of Nodine et al. (1996) that training produces efficient search patterns, as measured by the fastest search times.
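
One common way to test the contrasted-groups prediction is an independent-samples t-test; a minimal sketch follows (the accuracy scores are made up purely for illustration):

from scipy.stats import ttest_ind

# Hypothetical simulation-test accuracy for each group.
novice_scores = [0.55, 0.60, 0.62, 0.58, 0.65]
expert_scores = [0.82, 0.88, 0.79, 0.85, 0.90]

t, p = ttest_ind(expert_scores, novice_scores)  # compare the two groups
print(f"t = {t:.2f}, p = {p:.4f}")  # experts expected to score significantly higher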

It should be noted that the proposed test is subject to a constraint. Although all the mammograms are collected from real patients, participants bear no responsibility for the choices they make. This is quite different from the real-world situation, in which radiologists face anxious or desperate patients and take full responsibility for their diagnoses. Participants’ responses might therefore be inconsistent with their behaviour in the workplace, which influences validity to some extent. Therefore, it is necessary to evaluate the new test and to find evidence, or add further elements to the test, that support and increase its validity.


Conclusions

Breast cancer is much more likely to be cured when detected at an early stage. Thus, the earlier we can detect it, the better we can prevent suffering and even death. Radiologists’ skills vary considerably, by 11% (Beam et al., 1996), which means some radiologists may miss tumours and patients may lose the best window for treatment. The proposed test could help change this situation by assessing, and ultimately improving, radiologists’ ability to detect tumours in mammograms, thereby saving thousands of lives. Training programs will increase the likelihood that radiologists interpret mammograms correctly. This will benefit not only radiologists, through their reputations and the industry environment, but also patients, through their health.


References

Barlow, W. E., Chi, C., Carney, P. A., Taplin, S. H., D'Orsi, C., Cutter, G., et al. (2004).

Accuracy of screening mammography interpretation by characteristics of radiologists.

Journal of the National Cancer Institute, 96(24), 1840-1850. doi:10.1093/jnci/djh333

Beam, C. A., Layde, P. M., & Sullivan, D. C. (1996). Variability in the interpretation of

screening mammograms by US radiologists: Findings from a national sample.

Archives of Internal Medicine, 156(2), 209-213. doi:10.1001/archinte.156.2.209

Bluth, E. I., Bansal, S., Macura, K. J., Fielding, J., & Truong, H. (2015). Gender and the

radiology workforce: Results of the 2014 ACR workforce survey. Journal of the

American College of Radiology, 12(2), 155-157. doi:10.1016/j.jacr.2014.07.040

Bowman, M., Russell, N., Boekeloo, B., Rafi, I., & Rabin, D. (1992). The effects of

educational preparation on physician performance with a sexually transmitted disease

simulated patient. Archives of Internal Medicine, 152(9), 1823-1828.

doi:10.1001/archinte.1992.00400210053009

Budescu, D., & Bar‐Hillel, M. (1993). To guess or not to guess: A decision‐theoretic

view of formula scoring. Journal of Educational Measurement, 30(4), 277-291.

doi:10.1111/j.1745-3984.1993.tb00427.x

Burton, R. F. (2001). Quantifying the effects of chance in multiple choice and true/false

tests: Question selection and guessing of answers. Assessment & Evaluation in

Higher Education, 26(1), 41-50. doi:10.1080/02602930020022273

Carney, P. A., Elmore, J. G., Abraham, L. A., Gerrity, M. S., Hendrick, R. E., Taplin, S.

H., … D’Orsi, C. J. (2004). Radiologist uncertainty and the interpretation of

screening. Medical Decision Making, 24(3), 255-264.

doi:10.1177/0272989X04265480

Cronbach, L. J. (1941). An experimental comparison of the multiple true-false and multiple


multiple-choice tests. Journal of Educational Psychology, 32(7), 533-543.

Elmore, J. G., Jackson, S. L., Abraham, L., Miglioretti, D. L., Carney, P. A., Geller, B. M.,

et al. (2009). Variability in interpretive performance at screening mammography and

radiologists' characteristics associated with accuracy. Radiology, 253(3), 641-651.

doi:10.1148/radiol.2533082308

Elmore, J. G., Wells, C. K., & Howard, D. H. (1998). Does diagnostic accuracy

in mammography depend on radiologists' experience? Journal of Women's

Health, 7(4), 443-449. doi:10.1089/jwh.1998.7.443

Elmore, J. G., Wells, C. K., Lee, C. H., Howard, D. H., & Feinstein, A. R. (1994).

Variability in radiologists’ interpretations of mammograms. New England Journal of

Medicine, 331(22), 1493-1499. doi:10.1056/NEJM199412013312206

Frein, S. T. (2011). Comparing in-class and out-of-class computer-based tests to

traditional paper-and-pencil tests in introductory psychology courses. Teaching of

Psychology, 38(4), 282–287. doi:10.1177/0098628311421331

Hassett, M. J., O’Malley, A. J., Pakes, J. R., Newhouse, J. P., & Earle, C. C. (2006).

Frequency and cost of chemotherapy-related serious adverse effects in a

population sample of women with breast cancer. Journal of the National Cancer

Institute, 98(16), 1108-1117. doi:10.1093/jnci/djj305

Legault, C., & Vora, N. (2018). Stroke code simulation has sustained benefit on

neurology resident education and preparedness for stroke call. Neurology, 90(s15).

Nodine, C. F., Kundel, H. L., Lauver, S. C., & Toto, L. C. (1996). Nature of expertise in

searching mammograms for breast masses. Academic Radiology, 3(12), 1000-1006.

doi:10.1016/S1076-6332(96)80032-8

Vallejo, M. A., Jordán, C. M., Díaz, M. I., Comeche, M. I., & Ortega, J. (2007).

Psychological assessment via the internet: A reliability and validity study of online
(vs paper-and-pencil) versions of the General Health Questionnaire-28 (GHQ-28) and

the Symptoms Check-List-90-Revised (SCL-90-R). Journal of Medical Internet

Research, 9(1), e2–e2. doi:10.2196/jmir.9.1.e2

Van Der Boon, R. M., de Jaegere, P. P., & van Domburg, R. T. (2012). Multivariate

analysis in a small sample size, a matter of concern. The American Journal of

Cardiology, 109(3), 450. doi:10.1016/j.amjcard.2011.10.012
