
Infectious Diseases Now 54 (2024) 104884


Infectious Diseases Now


journal homepage: www.sciencedirect.com/journal/infectious-diseases-now

Original article

Evaluating ChatGPT's ability to answer urinary tract infection-related questions
Hakan Cakir a, Ufuk Caglar b, Sami Sekkeli b, *, Esra Zerdali c, Omer Sarilar b, Oguzhan Yildiz b,
Faruk Ozgor b
a Department of Urology, Fulya Acibadem Hospital, Istanbul, Turkey
b Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
c Department of Infectious Diseases and Clinical Microbiology, Haseki Training and Research Hospital, Istanbul, Turkey

A R T I C L E  I N F O

Keywords:
Artificial intelligence
ChatGPT
Guideline
Infection
Urinary tract infection

A B S T R A C T

Introduction: For the first time, the accuracy and proficiency of ChatGPT answers on urogenital tract infections (UTIs) were evaluated.
Methods: Two lists of questions were created: frequently asked questions (FAQs, public-based inquiries) on relevant topics, and questions based on guideline information (guideline-based inquiries). ChatGPT responses to FAQs and scientific questions were scored by two urologists and an infectious disease specialist. Quality and reliability of all ChatGPT answers were checked using the Global Quality Score (GQS). The reproducibility of ChatGPT answers was analyzed by asking each question twice.
Results: All in all, 96.2 % of FAQs (75/78 inquiries) related to UTIs were correctly and adequately answered by ChatGPT and scored GQS 5. None of the ChatGPT answers were classified as GQS 2 or GQS 1. Moreover, FAQs about cystitis, urethritis, and epididymo-orchitis were answered by ChatGPT with 100 % accuracy (GQS 5). For the EAU urological infections guidelines, 61 (89.7 %), 5 (7.4 %), and 2 (2.9 %) ChatGPT responses were scored GQS 5, GQS 4, and GQS 3, respectively; none were categorized as GQS 2 or GQS 1. Comparison of mean GQS values of ChatGPT answers for FAQs and EAU urological guideline questions showed that ChatGPT was similarly able to respond to both question groups (p = 0.168). The ChatGPT response reproducibility rate was highest for the FAQ subgroups of cystitis, urethritis, and epididymo-orchitis (100 % for each subgroup).
Conclusion: The present study showed that ChatGPT gave accurate and satisfactory answers to both public-based inquiries and EAU urological infection guideline-based questions. The reproducibility of ChatGPT answers exceeded 90 % for both FAQs and scientific questions.

* Corresponding author.
E-mail address: samisekkeli@yandex.com (S. Sekkeli).

https://doi.org/10.1016/j.idnow.2024.104884
Received 10 January 2024; Received in revised form 20 February 2024; Accepted 5 March 2024
Available online 8 March 2024
2666-9919/© 2024 Elsevier Masson SAS. All rights reserved.

1. Introduction

Urogenital tract infection (UTI) is a collective phrase for infections involving any location in the urinary and genital systems. Different associations and guidelines use different criteria to define UTI, whose presentation can vary from non-life-threatening cystitis to potentially fatal septic shock [1]. Previous reports have demonstrated that UTIs are the most common diseases in urology practice, and one evaluation showed that almost one-third of adults over the age of 20 have experienced a UTI [2]. Specific populations (pregnant women, elderly patients, immobile patients, patients with immune deficiency, etc.) are particularly vulnerable to UTIs. For various reasons, some patients may not wish to consult healthcare institutions for UTIs such as urethritis [3]. As a result, they and/or their relatives use internet sources to obtain knowledge about UTIs.

ChatGPT is an artificial intelligence (AI) application acting as a natural-language chatbot in multiple languages [4]. After its introduction into daily practice, its popularity increased rapidly and it began to be used in many areas of life, including the economy, education, and medicine. At present, the reliability and proficiency of ChatGPT in medicine remain controversial. Maillard et al. demonstrated that in only 59 % of 44 cases with positive blood cultures did ChatGPT propose a diagnosis and treatment plan similar to the clinician's [5]. In another study, Caglar et al. analyzed the accuracy and sufficiency of ChatGPT responses about pediatric urological diseases, and ChatGPT achieved a 92 % success rate [6]. Although previous studies have analyzed ChatGPT knowledge regarding different urological diseases, none appear to have focused on ChatGPT knowledge regarding UTIs. In this study, the accuracy and proficiency of ChatGPT answers about UTIs were evaluated for the first time.

2. Materials and Methods

The study was conducted between 1st November 2023 and 15th November 2023. Two lists of questions were created: questions about topics frequently asked by the public, and questions based on guideline information. The terms 'urogenital tract infection', 'cystitis', 'prostatitis', 'urethritis', 'human papilloma virus (HPV)', 'epididymo-orchitis', and 'pyelonephritis' were searched in English, and the results were sorted by popularity. By examining the websites appearing on the first five pages of results, the sources most frequently consulted by the public were determined. Frequently asked questions (FAQs) were identified by analyzing patients' inquiries and comments on popular social media applications including YouTube, Facebook, Instagram, and Twitter. Complementarily, the websites of health institutions, healthcare providers, and healthcare associations were used to create FAQs about UTIs. All questions categorized as FAQs are listed in Supplementary File 1. Scientific inquiries were defined according to the European Association of Urology (EAU) Urological Infections guidelines and are documented in Supplementary File 2. ChatGPT knowledge on scientific questions was analyzed in accordance with the EAU Urological Infections guidelines. When preparing the lists of questions, those requiring personal answers, questions for advertising purposes, repetitive inquiries, and questions with major grammatical inaccuracies were excluded from the present research. All in all, 78 FAQs were created, including 13 questions each about cystitis, prostatitis, urethritis, HPV, epididymo-orchitis, and pyelonephritis. In addition, 68 inquiries were identified in accordance with the relevant EAU Urological Infections guidelines.

In the present study, the free version of ChatGPT was used to answer the questions, and all answers were obtained by 15 November 2023. ChatGPT responses to FAQs and scientific questions were analyzed and scored by two urologists with 15 years of experience and an infectious disease and clinical microbiology specialist with 10 years of experience in urological infections. If the scores given by the three raters for an inquiry were not identical, the ChatGPT answer was re-checked by the three raters and the final score was determined by their joint decision.

Quality and reliability of all ChatGPT answers were checked using the Global Quality Score (GQS), which was created to determine the accuracy and proficiency of medicine-related visual sources [7]. GQS scores analyze the quality and reliability of written sources in the field of medicine. Quality and usefulness of content are scored from 1 to 5; a score of 1 denotes poor quality and proficiency, and a score of 5 denotes maximum quality and reliability of content. GQS scoring is as follows:

GQS 1: Poor quality, poor flow of the site, most information missing, not at all useful for patients.
GQS 2: Generally poor quality and poor flow, some information listed but many important topics missing, of very limited use to patients.
GQS 3: Moderate quality, suboptimal flow, some important information is adequately discussed but other information poorly discussed, somewhat useful for patients.
GQS 4: Good quality and generally good flow, most of the relevant information is listed, but some topics not covered, useful for patients.
GQS 5: Excellent quality and excellent flow, very useful for patients.

The reproducibility of ChatGPT answers was analyzed by asking each inquiry twice, and different machines were used when procuring the two ChatGPT responses. If the ChatGPT answers to the same question received similar scores on the different computers, the question was considered positive for ChatGPT repeatability. Since no patient records were used, ethics committee approval was not required.
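As an illustration only (this is not part of the authors' workflow), the repeatability bookkeeping described above can be sketched in a few lines of Python. The question identifiers, scores, and tolerance parameter below are hypothetical; in the study itself, similarity of the two scores was judged by the raters.

# Hypothetical sketch: each question is asked twice (on different machines) and
# the two GQS scores are compared; a question counts as reproducible when the
# scores are sufficiently similar.
from typing import Dict, Tuple

def reproducibility_rate(scores: Dict[str, Tuple[int, int]], tolerance: int = 0) -> float:
    """Percentage of questions whose two GQS scores differ by at most `tolerance`."""
    if not scores:
        return 0.0
    similar = sum(1 for first, second in scores.values() if abs(first - second) <= tolerance)
    return 100.0 * similar / len(scores)

# Hypothetical example: two FAQ inquiries, each scored twice on the 1-5 GQS scale.
example = {"faq_cystitis_01": (5, 5), "faq_prostatitis_02": (5, 4)}
print(f"Reproducibility: {reproducibility_rate(example):.1f} %")  # prints 50.0 %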
2.1. Statistical analysis

Statistical analyses were conducted using IBM's Statistical Package for the Social Sciences, version 27 (SPSS, Armonk, NY, USA). Normality was assessed with the Kolmogorov-Smirnov test. GQS scores associated with the various disease subcategories are presented as percentages. The independent Student's t-test was used to compare the mean GQS between FAQ and guideline inquiries.
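For readers who wish to reproduce this type of comparison outside SPSS, a minimal Python sketch is given below. It rebuilds the two score lists from the GQS distribution reported in the Results (75/2/1 for FAQs and 61/5/2 for guideline questions) and uses SciPy's Kolmogorov-Smirnov and independent t-test routines as stand-ins for the SPSS procedures; the exact statistics may differ slightly from the published values depending on the software and test options.

# Sketch of the statistical comparison (SciPy used here as a stand-in for SPSS 27).
from scipy import stats

faq_gqs = [5] * 75 + [4] * 2 + [3] * 1        # per-question GQS scores, FAQ group (n = 78)
guideline_gqs = [5] * 61 + [4] * 5 + [3] * 2  # per-question GQS scores, guideline group (n = 68)

# Normality check: one-sample Kolmogorov-Smirnov test against a fitted normal distribution.
ks = stats.kstest(faq_gqs, "norm", args=(stats.tmean(faq_gqs), stats.tstd(faq_gqs)))
print(ks)

# Independent Student's t-test comparing mean GQS between the two question groups.
t_stat, p_value = stats.ttest_ind(faq_gqs, guideline_gqs)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")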
3. Results

All in all, 98 FAQs about UTIs were initially examined according to the study scheme, while five repetitive inquiries, eight questions with significant grammatical errors, five questions requiring personal responses, and two questions related to personal health were excluded from the study. Finally, 78 FAQs were included in the study; the corresponding flowchart is documented in Fig. 1.

All in all, 96.2 % (75/78 inquiries) of FAQs on UTIs were correctly and adequately answered by ChatGPT and scored GQS 5, while two (2.6 %) and one (1.2 %) ChatGPT answers to FAQs about UTIs were scored GQS 4 and GQS 3, respectively. None of the ChatGPT answers were classified as GQS 2 or GQS 1. More specifically, FAQs about cystitis, urethritis, and epididymo-orchitis were answered by ChatGPT with 100 % accuracy and sufficiency (GQS 5), whereas among ChatGPT answers regarding the EAU urological infections guidelines, 61 (89.7 %), 5 (7.4 %), and 2 (2.9 %) were scored GQS 5, GQS 4, and GQS 3, respectively. None of the ChatGPT responses for the EAU urological infections guidelines were categorized as GQS 2 or GQS 1. The scores for ChatGPT answers to FAQs and guideline-based questions are summarized in Table 1. Comparison of mean GQS values of ChatGPT answers for FAQs and EAU urological guideline questions showed that ChatGPT was similarly able to respond to the two question groups (p = 0.168) (Fig. 2).

Similarity rates for ChatGPT responses to FAQs, FAQ subgroups, and guideline inquiries are listed in Fig. 3. The reproducibility rate was highest for the ChatGPT responses to the FAQ subgroups of cystitis, urethritis, and epididymo-orchitis (100 % for each subgroup), and lowest for the FAQ subgroups of prostatitis, HPV, and pyelonephritis (92.3 % for each subgroup).

4. Discussion

The potential benefits and drawbacks of AI in medicine are a subject of present-day debate. While some authors have claimed that AI may increase public awareness about diseases, improve the success of screening tests, reduce the workload on the healthcare system, and provide higher patient compliance with treatment processes, others have expressed concerns about the information capacity and reliability of AI [8,9]. Given these opposed approaches, we conducted a study aimed at evaluating ChatGPT knowledge about urological infections, which are the most common urological diseases worldwide. The results of the present study showed that ChatGPT had excellent accuracy and proficiency rates of 96.2 % and 89.7 % when answering FAQs related to urological infections and questions based on the EAU urological infections guidelines, respectively. In addition, ChatGPT showed 96.1 % and 92.6 % reproducibility rates for FAQs and inquiries based on the EAU urological infections guidelines.

Fig. 1. Flowchart of the frequently asked questions included in the study.

Table 1
GQS scores of ChatGPT answers to questions about urological infections.

Urological infections                    GQS 5        GQS 4      GQS 3      GQS 2  GQS 1
Frequently asked questions (n = 78)      75 (96.2 %)  2 (2.6 %)  1 (1.2 %)  -      -
  Cystitis (n = 13)                      13 (100 %)   -          -          -      -
  Prostatitis (n = 13)                   12 (92.3 %)  1 (7.7 %)  -          -      -
  Urethritis (n = 13)                    13 (100 %)   -          -          -      -
  Human papilloma virus (n = 13)         12 (92.3 %)  -          1 (7.7 %)  -      -
  Epididymo-orchitis (n = 13)            13 (100 %)   -          -          -      -
  Pyelonephritis (n = 13)                12 (92.3 %)  1 (7.7 %)  -          -      -
EAU guideline recommendations (n = 68)   61 (89.7 %)  5 (7.4 %)  2 (2.9 %)  -      -

EAU: European Association of Urology. GQS: Global Quality Score.

While internet resources are readily accessible and often free, the accuracy and adequacy of their contents are a matter of debate. Yuksel and Cakmak analyzed YouTube videos about COVID-19 and pregnancy, and stated that despite high view rankings their quality was low, and that many contained misleading information [10]. In another study, Alsyouf and colleagues focused on content about urological cancers on social media applications, and found that misleading information was significantly more common than accurate information about urological cancers [11]. In contrast, Cakir et al. were the first to use ChatGPT to answer questions related to urinary stone diseases, and found that close to 19 out of 20 inquiries were answered fully correctly and adequately [12]. Similarly, Van Bulck and Moons found that 85 % of ChatGPT answers to questions about cardiac diseases were satisfactory and reliable [13]. In the present study, the ChatGPT answers were investigated for the first time according to the GQS, and 96.2 % of ChatGPT responses to FAQs about UTIs were found to provide accurate and sufficient information. While many applications such as YouTube, Twitter, and Instagram generally provide information without any evaluation or restriction criteria before uploading, ChatGPT can draw on a very large number of information sources, and we believe that this capacity ensures the higher accuracy and reliability of ChatGPT responses.

Prepared by reviewing numerous sources including meta-analyses, reviews, original research, and case reports, scientific guidelines contain important information that impacts clinical practice. Answering questions accurately and adequately may be particularly difficult when they are based on sources with such extensive data. For example, Cinar analyzed ChatGPT knowledge about osteoporosis and found that while ChatGPT gave accurate and satisfactory responses to 80.6 % of questions asked by the public, it did so for only 61.3 % of questions based on National Osteoporosis Guideline Group guidelines [14]. In contrast, Caglar et al. evaluated ChatGPT performance in answering inquiries about pediatric urological diseases, and found that the accuracy of ChatGPT answers to questions based on strong recommendations in pediatric urological disease guidelines exceeded 90 % [6]. In the present study, almost nine out of ten ChatGPT answers related to the urological infections guidelines had GQS 5, which indicates maximally accurate and reliable answers.

Although the 90 % success rate is impressive, the remaining 10 % of incorrect answers can have serious public health implications. Inaccurate information may cause patients to follow erroneous treatment modalities, experience unnecessary anxiety or panic, or delay seeking timely and appropriate medical attention. Especially regarding conditions that are serious or require urgent intervention, misinformation can lead to delays in treatment and exacerbate the disease. As regards AI-based health counselling applications, it is therefore of utmost importance that users always confirm the information they obtain by consulting a healthcare professional. Aside from these considerations, ChatGPT has some inadequacies. The free version of the application cannot access data after 2021. Considering that the literature on infectious diseases is constantly renewed, this can be viewed as a serious limitation. Since ChatGPT does not draw on personal medical experience, it may ignore issues related to patients' emotional states. More generally, clear and consistent information matters on medical issues, yet our study showed that ChatGPT gives different answers when some questions are asked repeatedly. In addition, some medical information is personal, and doubts persist about the ethical use of artificial intelligence applications such as ChatGPT.

Although the present study is the first to analyze ChatGPT knowledge about UTIs, it nevertheless has some limitations. Firstly, it was conducted only in English, which is the most commonly used language in scientific fields and on the internet; the UTI-related capacities of ChatGPT in less widely used languages were not considered. Secondly, the study spanned a restricted time period, after which sources of information on UTIs have continued to be uploaded. Lastly, while two experienced urologists used ChatGPT and evaluated the ChatGPT answers, we believe that (a) ChatGPT use by people of different socio-cultural levels and (b) the understandability of ChatGPT answers may be the subjects of a different study.
Fig. 2. Comparison of mean GQS values for frequently asked questions and guideline questions. EAU: European Association of Urology. GQS: Global Quality Score.

Fig. 3. Similarity rates of answers to questions. EAU: European Association of Urology.

In conclusion, the present study showed for the first time that ChatGPT can provide accurate and satisfactory answers to both public-based inquiries and EAU urological infection guideline-based questions. Notwithstanding the generally successful results, it should not be forgotten that, in the field of health, present-day ChatGPT still has limitations in its practical use.

CRediT authorship contribution statement

Hakan Cakir and Ufuk Caglar designed the study. Sami Sekkeli and Oguzhan Yildiz collected the data. Esra Zerdali, Faruk Ozgor, and Omer Sarilar performed the management and the analysis of the data. All authors interpreted the results of the analysis. The first draft of the manuscript was written by Hakan Cakir and Ufuk Caglar. All authors have read the manuscript and approved the final version to be submitted.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.idnow.2024.104884.

References

[1] Foxman B. The epidemiology of urinary tract infection. Nat Rev Urol 2010;7:653–60.
[2] Laupland KB, Ross T, Pitout JD, Church DL, Gregson DB. Community-onset urinary tract infections: a population-based assessment. Infection 2007;35:150–3.
[3] Nitzan O, Elias M, Chazan B, Saliba W. Urinary tract infections in patients with type 2 diabetes mellitus: review of prevalence, diagnosis, and management. Diabetes Metab Syndr Obes 2015;26:129–36.
[4] Zhou Z. Evaluation of ChatGPT's capabilities in medical report generation. Cureus 2023;15:e37589.
[5] Maillard A, Micheli G, Lefevre L, Guyonnet C, Poyart C, Canouï E, et al. Can chatbot artificial intelligence replace infectious disease physicians in the management of bloodstream infections? A prospective cohort study. Clin Infect Dis 2023. https://doi.org/10.1093/cid/ciad632.
[6] Caglar U, Yildiz O, Meric A, Ayranci A, Gelmis M, Sarilar O, et al. Evaluating the performance of ChatGPT in answering questions related to pediatric urology. J Pediatr Urol 2023;S1477-5131:00318–20.
[7] Mangan MS, Cakir A, Yurttaser Ocak S, Tekcan H, Balci S, Ozcelik KA. Analysis of the quality, reliability, and popularity of information on strabismus on YouTube. Strabismus 2020;28:175–80.
[8] Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Andrew Taylor R, et al. How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ 2023;9:e45312. https://doi.org/10.2196/45312.
[9] Zhou Z, Wang X, Li X, Liao L. Is ChatGPT an evidence-based doctor? Eur Urol 2023;84:355–6.
[10] Yuksel B, Cakmak K. Healthcare information on YouTube: pregnancy and COVID-19. Int J Gynaecol Obstet 2020;150:189–93.
[11] Alsyouf M, Stokes P, Hur D, Amasyali A, Ruckle H, Hu B. 'Fake news' in urology: evaluating the accuracy of articles shared on social media in genitourinary malignancies. BJU Int 2019;124:701–6.
[12] Cakir H, Caglar U, Yildiz O, Meric A, Ayranci A, Ozgor F. Evaluating the performance of ChatGPT in answering questions related to urolithiasis. Int Urol Nephrol 2023. https://doi.org/10.1007/s11255-023-03773-0.
[13] Van Bulck L, Moons P. Response to the Letter to the Editor on: Dr. ChatGPT in cardiovascular nursing: a deeper dive into trustworthiness, value, and potential risks. Eur J Cardiovasc Nurs 2023;zvad049.
[14] Cinar C. Analyzing the performance of ChatGPT about osteoporosis. Cureus 2023;15:e45890.
