Data in Brief
Data Article
∗ Corresponding author: Xiao Shi. E-mail address: shixiao@shyueyanghospital.com (X. Shi).
1 These authors contributed equally.
https://doi.org/10.1016/j.dib.2020.106153
2352-3409/© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.
(http://creativecommons.org/licenses/by/4.0/)
2 D. Shi, C. Tang and S.V. Blackley et al. / Data in Brief 32 (2020) 106153
Specifications Table
Subject: Aging
Specific subject area: Diagnosis – Image and text data analysis. Hospitalized geriatric patients are a highly heterogeneous group, often with varied diseases and conditions. Physicians, and geriatricians especially, seek non-invasive testing tools to support timely, accurate diagnosis. The dataset can provide an objective test for Chinese tongue diagnosis, which is mainly based on the color and texture of the tongue.
Type of data: Free-text document; table; image. Each patient has a folder with 1 face image, 1 tongue image, and 2 narrative documents. An additional summary table is provided.
How data were acquired: We used a patented light-field camera (CN201520303463.5) called the intelligent mirror, which uses the CIELAB color space. Data acquisition was standardized as far as possible (i.e., ensuring consistent sitting height and placement of the intelligent mirror).
Data format: The face and tongue images are raw data. They were taken at 600 pixels per inch (about 42.3 µm per pixel) and saved as *.jpg files with minimal compression (at most 10%). One narrative document contains the parameters generated by the intelligent mirror when creating the face and tongue images, and the other contains the annotation results from the expert panel (e.g., vital signs, clinical imaging examination, and laboratory indicators).
Parameters for data collection: The study was conducted at a Chinese tertiary, comprehensive hospital. We recruited hospitalized subjects (excluding minority groups or other sensitive or disempowered populations) in the Geriatrics Department beginning on January 1, 2019. Images were captured via a light-field camera using the CIELAB color space (to simulate human visual perception) and were then manually labeled by a panel of subject matter experts after reviewing the patients’ clinical information documented in the hospital’s information system.
Description of data collection: Data acquisition and image annotation were conducted by subject matter experts, including four fully credentialed senior-level physicians (i.e., associate chief physician and above), one resident, and two medical students. One medical student was in charge of data acquisition. The resident consolidated patients’ previous chronic medical history, clinical imaging examination, and laboratory indicators. One physician diagnosed patients’ constitutional types. Another physician gave a final admission diagnosis by considering the patient’s constitution based on both traditional Chinese medicine (TCM) and Western medicine. Constitutional types are based on TCM analysis and differentiation of pathological conditions in accordance with the eight principal syndromes: yin and yang, exterior and interior, cold and heat, and hypofunction and hyperfunction. All the information from the free-text data labeling was documented digitally by one medical student in Chinese and translated into English.
• The data is extensible, comparable, and compatible. Data collection is standardized and takes into account the requirements and expectations of patients as well as researchers. Specifically, patients desire non-invasive, simple, and effective diagnostic tools. Clinicians sometimes want to collect data that does not exist in any pre-existing table of the database. Data analysts are interested in grouping data into categories that might not exactly fit the data. The dataset pursues at least three purposes. First, it covers almost all possible indicators of tongue diagnosis in Chinese and Western medicine and additionally includes face consultation content. Second, it adopts an epidemiological method of investigation by (1) limiting the target population to Asia’s elderly population aged 65 and over, and (2) scheduling data collection on the first day of hospitalization. Third, the data can be easily linked to data from other systems, such as CT (computerized tomography) scans, MRI (magnetic resonance imaging), and clinical laboratory indicators, relying on more than 20 years of previous HIS (hospital information system) experience.
• The data is labeled by clinicians with rich clinical experience. A total of 16 physicians in the department of geriatrics participated in manually labeling the data with the admission diagnosis. Each patient’s diagnosis is determined by a panel of subject matter experts. The data will be updated if the patient is readmitted to the hospital. The dataset meets the requirements for use as a training set and is suitable for artificial intelligence and machine learning. Some preliminary results can correct false medical information or misleading claims about tongue and face consultation on the Internet and social media.
• This continuously growing dataset (up to 688 patients) is new and original, and the data has
not been published elsewhere.
1. Data Description
Geriatric syndromes may be complicated and heterogeneous [1], and geriatric patients with
multiple diagnoses are prone to treatment complications.
Tongue diagnosis has served for millennia as a unique, non-invasive approach to monitoring health [2]. It holds that the tongue’s color and texture are outer manifestations of the status of the internal organs [3] and provide insights into patient status in conditions like inflammation, infection, and endocrine disorders. Recently, tongue diagnosis has seen gradual acceptance in modern Western medicine, with the term “geographic tongue” [4] used to describe tongue discolorations or cracks accompanying illness. One case study of multiple systemic disorders published in the New England Journal of Medicine describes a patient’s “smooth, shiny tongue” [5].
Applying machine learning to tongue images might provide useful diagnostic tools. Existing tongue appearance data is inadequate in both quality and quantity; therefore, we manually created an annotated tongue diagnosis dataset to support future work.
The study was conducted at the Yueyang Integrated Traditional Chinese Medicine and Western Medicine Hospital, a tertiary, comprehensive hospital affiliated with the Shanghai University of Traditional Chinese Medicine. We recruited hospitalized subjects (excluding minority groups or other sensitive or disempowered populations) in the Geriatrics Department beginning on January 1, 2019. This study was approved by Yueyang’s Institutional Review Board (IRB).
Since January 1, 2019, 668 adults have been recruited to the study, of whom 149 (22.3%) were male and 519 (77.7%) female. For each patient, two images (one of the face and one of the tongue) were captured, and the associated free-text notes were collected and stored in a directory. We pulled de-identified patient ID (identification), age range, gender, weight and height, initial diagnosis, and admission and discharge dates, as well as previous chronic medical history, from electronic medical records (EMRs) stored in Yueyang’s hospital information system. We also collected vital signs, clinical imaging examination, and laboratory indicators during hospitalization.
Each patient has a folder with 1 face image, 1 tongue image, and 2 narrative documents. The face and tongue images were taken at 600 pixels per inch (about 42.3 µm per pixel) and saved as *.jpg files with minimal compression (at most 10%). One narrative document contains the parameters generated by the intelligent mirror when creating the face and tongue images, and the other contains the annotation results from the expert panel (e.g., vital signs, clinical imaging examination, and laboratory indicators).
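The quoted pixel size follows from the resolution, since one inch is exactly 25.4 mm. A minimal sketch of the conversion is given here; the function name is illustrative and not part of the dataset.

```python
MICROMETERS_PER_INCH = 25_400.0  # 1 inch = 25.4 mm exactly


def pixel_pitch_um(pixels_per_inch: float) -> float:
    """Return the edge length of one pixel in micrometers."""
    if pixels_per_inch <= 0:
        raise ValueError("pixels_per_inch must be positive")
    return MICROMETERS_PER_INCH / pixels_per_inch


pitch = pixel_pitch_um(600)
print(f"{pitch:.1f} um per pixel")  # prints "42.3 um per pixel"
```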
We ensured that the data covers as many indicators as possible, including the content of face consultation, in both Chinese and Western medicine. We used a patented light-field camera (CN201520303463.5) called the intelligent mirror, which uses the International Commission on Illumination (CIE) L*a*b* color space (CIELAB) [6]. Patients’ health conditions may pose challenges during data collection. For example, some patients with cerebral infarction may have difficulty sticking out their tongue. Data acquisition was standardized as far as possible (i.e., ensuring consistent sitting height and placement of the intelligent mirror).
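For readers who want to work with the images in CIELAB, the following sketch converts an 8-bit sRGB pixel to L*, a*, b* using the standard sRGB transfer function and the D65 reference white. It rests on two assumptions: that the *.jpg files are sRGB-encoded, and that D65 is an appropriate white point. The intelligent mirror's internal color pipeline is not described here and may differ.

```python
from typing import Tuple


def srgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert an 8-bit sRGB triple to CIELAB (assumed D65 reference white)."""

    def linearize(channel: int) -> float:
        # Undo the sRGB gamma encoding to obtain linear light in [0, 1].
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ (D65), standard sRGB matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white used for normalization.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t: float) -> float:
        # CIELAB companding: cube root above a small threshold, linear below.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


print(srgb_to_lab(255, 255, 255))  # approximately (100.0, 0.0, 0.0)
```

L* encodes lightness, while a* and b* encode the green-red and blue-yellow axes, which is why CIELAB is often preferred over RGB for describing perceived color.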
Table 1
Overview of data files/data sets.
Table 2
Parameters produced by the intelligent mirror.
Data acquisition and image annotation were conducted by subject matter experts, including four fully credentialed senior-level physicians (i.e., associate chief physician and above), one resident, and two medical students. One medical student was in charge of data acquisition. The resident consolidated patients’ previous chronic medical history, clinical imaging examination, and laboratory indicators. One physician diagnosed patients’ constitutional types. Another physician gave a final admission diagnosis by considering the patient’s constitution based on both traditional Chinese medicine (TCM) and Western medicine. Constitutional types are based on TCM analysis and differentiation of pathological conditions in accordance with the eight principal syndromes: yin and yang, exterior and interior, cold and heat, and hypofunction and hyperfunction. All the information from the free-text data labeling was documented digitally by one medical student in Chinese and translated into English. The treatment plan corresponding to the admission diagnosis was reviewed and annotated by the remaining two physicians.
In the dataset, each patient has an individual folder consisting of 1 face image, 1 tongue image, and 2 narrative documents. The face and tongue images were taken at 600 pixels per inch (about 42.3 µm per pixel) and saved as *.jpg files with minimal compression (at most 10%). Of the two documents, one contains the parameters generated by the intelligent mirror when creating the face and tongue images, and the other contains the annotation results given by the expert panel for the patient (e.g., vital signs, clinical imaging examination, and laboratory indicators). The two free-text documents were initially written in Chinese, then translated into English by a medical student and approved by at least one of the experts.
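Because every patient folder is expected to hold the same four files, a quick completeness check is useful before any analysis. The sketch below assumes the images carry a .jpg or .jpeg extension, as stated above, and that the narrative documents are .txt, .doc, or .docx files. The document extensions and the function names are assumptions, since the file format of the narrative documents is not specified.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg"}        # images are stated to be *.jpg
DOC_SUFFIXES = {".txt", ".doc", ".docx"}  # assumed; document format is not specified


def check_patient_folder(folder: Path) -> List[str]:
    """Return a list of problems found in one patient folder (empty if none)."""
    files = [p for p in folder.iterdir() if p.is_file()]
    images = [p for p in files if p.suffix.lower() in IMAGE_SUFFIXES]
    docs = [p for p in files if p.suffix.lower() in DOC_SUFFIXES]

    problems = []
    if len(images) != 2:
        problems.append(f"expected 2 images (face, tongue), found {len(images)}")
    if len(docs) != 2:
        problems.append(f"expected 2 narrative documents, found {len(docs)}")
    return problems


def check_dataset(root: Path) -> Dict[str, List[str]]:
    """Map each incomplete patient folder name to its list of problems."""
    report = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        problems = check_patient_folder(folder)
        if problems:
            report[folder.name] = problems
    return report
```

Calling `check_dataset(Path("dataset_root"))`, where `dataset_root` is a placeholder for the local copy of the data, returns an empty dictionary when every folder is complete. The expected counts can be relaxed if a folder legitimately holds extra files.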
Table 2 shows the main parameters produced by the intelligent mirror, with their data types and references, including the color of the face, lip, tongue coating, and overall tongue. For example, a combination of the colors cyan, red, yellow, white, and black was used to describe the color of the face according to the TCM literature.
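One simple way to relate a measured CIELAB value to a named color is to pick the nearest reference point by the CIE76 color difference, which is the Euclidean distance in CIELAB. The sketch below illustrates that idea only; it is not the intelligent mirror's method. The anchor values are placeholders, namely the rounded CIELAB coordinates of the pure sRGB colors under D65, and are not clinical reference values from the TCM literature.

```python
import math
from typing import Dict, Tuple

Lab = Tuple[float, float, float]

# Placeholder anchors: rounded CIELAB values of the pure sRGB colors (D65).
# These are NOT the intelligent mirror's reference values, which are not given.
ANCHORS: Dict[str, Lab] = {
    "cyan": (91.1, -48.1, -14.1),
    "red": (53.2, 80.1, 67.2),
    "yellow": (97.1, -21.6, 94.5),
    "white": (100.0, 0.0, 0.0),
    "black": (0.0, 0.0, 0.0),
}


def delta_e76(c1: Lab, c2: Lab) -> float:
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(c1, c2)


def nearest_color(sample: Lab, anchors: Dict[str, Lab] = ANCHORS) -> str:
    """Name of the anchor with the smallest CIE76 distance to the sample."""
    return min(anchors, key=lambda name: delta_e76(sample, anchors[name]))
```

A description as a combination of colors, as in the text above, could be approximated by reporting the distance to every anchor instead of only the nearest one.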
Table 3 indicates other items that were manually evaluated via expert judgment, including tongue shape (i.e., thick index), size (i.e., macroglossia index), the judgment of tooth marks or fissured tongue, and the degree of smoothness or shininess. When taking photos, we additionally documented the condition of each patient in a text file located in the same directory as the face and tongue images, including the patient’s sitting height and the distance between the patient and the intelligent mirror.
Fig. 1. The principle of tongue data acquisition. (a) An ancient instruction for tongue diagnosis recorded in Ao-shi-shang-han-jin-jing-lu, a traditional Chinese medicine (TCM) book of 36 tongue illustrations compiled during the Yuan Dynasty of Ancient China; (b) Tongue appearance as an outer manifestation of the status of the human organ systems used as a guideline in the TCM; (c) Data acquisition process: (1) take images of 2 shots (i.e., face and tongue) of each patient, (2) capture colors via a light-field camera using CIELAB, a color space defined by the CIE in 1976.
Table 3
Manually annotated items.
Thick index | free text | normal or not (thick or fat) | expert judgment
Macroglossia index | free text | normal or not | expert judgment
Judgment of fissured tongue | free text | with or without cracks | expert judgment
Judgment of tooth marks | free text | with or without marks | expert judgment
Smooth or shiny index | free text | normal or not (smooth or shiny) | expert judgment
Constitutional types | free text | 9 indicators (see Table 4) | physical factors
Scale for osteoporosis | questionnaire | primary and specific form | physical factors
Frailty assessment scale | questionnaire | SARC-F and FRAIL form | physical factors
Scale for geriatric depression | questionnaire | CGS short form | mental factors
Opinion for clinical imaging examination | free text | 10 indicators (see Table 4) | clinical observations
Opinion for laboratory indicators | free text | 20 indicators (see Table 4) | clinical observations
Additional information | free text | the patient’s sitting height and the distance between the patient and intelligent mirror | other factors
Table 4
Indicators used in the manually annotated dataset.
Table 4 lists all the indicators used in the manually annotated documents.
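When loading the manually annotated items into an analysis pipeline, it can help to give each patient's record an explicit structure. The sketch below mirrors the rows of Table 3 as a Python dataclass. The class name, field names, and the choice of optional fields are illustrative assumptions, not a schema shipped with the dataset.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ManualAnnotation:
    """One patient's manually annotated items (field names are illustrative)."""

    # Expert-judgment items, stored as free text.
    thick_index: str
    macroglossia_index: str
    fissured_tongue: str
    tooth_marks: str
    smooth_or_shiny_index: str
    # Physical and mental factors.
    constitutional_type: str
    osteoporosis_scale: Optional[str] = None
    frailty_scale: Optional[str] = None
    geriatric_depression_scale: Optional[str] = None
    # Clinical observations and other factors, stored as free text.
    imaging_opinion: str = ""
    laboratory_opinion: str = ""
    additional_information: str = ""

    def tongue_findings(self) -> Dict[str, str]:
        """Return only the five expert-judgment tongue items."""
        return {
            "thick_index": self.thick_index,
            "macroglossia_index": self.macroglossia_index,
            "fissured_tongue": self.fissured_tongue,
            "tooth_marks": self.tooth_marks,
            "smooth_or_shiny_index": self.smooth_or_shiny_index,
        }
```

Keeping the free-text fields as plain strings avoids imposing a coding scheme that the annotated documents may not follow.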
ing tongue appearance, and even developing an mHealth application to provide individualized
health suggestions based on tongue causes.
3. Limitations
Our work was based on a single organization, Yueyang Integrated Traditional Chinese Medicine and Western Medicine Hospital, and collected data from a single ethnic group in Asia (Chinese); the results therefore might not be generalizable to other racial and ethnic groups.
The two free-text annotated documents may contain inaccuracies introduced by translation from their original language, as well as algorithmic bias. These documents were initially written in Chinese, then translated into English by a medical student and approved by at least one of the experts.
Ethics Statement
We obtained trial approval at ClinicalTrials.gov. This project was approved by Yueyang Hospital’s IRB and did not involve any “minority groups or other sensitive or disempowered populations.” All participants signed the consent form and agreed to share their data, including face and tongue images.
The authors received full approval from all participants, none of whom were children or members of “minority groups or other sensitive or disempowered populations.” Participants understood that (1) the information will be published without their names attached (but that full anonymity cannot be guaranteed), (2) the text and pictures or videos published in the article will be freely available on the internet and may be seen by the general public, and (3) the pictures, videos, and text may also appear on other websites or in print, and may be translated into other languages or used for commercial purposes.
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Acknowledgements
This work was partially funded by the Shanghai Municipal Health and Family Planning Commission Fund No. ZY(2018-2020)-FWTX-4022, and the National Key R&D Program of China No. 2018YFC1704703.
The authors would like to thank Fufeng Li for providing the patented light-field camera, and all clinicians for their help with the annotation in this work.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at
doi:10.1016/j.dib.2020.106153.
References
[1] S.K. Inouye, S. Studenski, M.E. Tinetti, G.A. Kuchel, Geriatric syndromes: clinical, research, and policy implications of
a core geriatric concept, J. Am. Geriatr. Soc. 55 (5) (2007) 780–791.
[2] B. Jiang, X. Liang, Y. Chen, T. Ma, L. Liu, J. Li, R. Jiang, T. Chen, X. Zhang, S. Li, Integrating next-generation sequencing and traditional tongue diagnosis to determine tongue coating microbiome, Sci. Rep. 6 (2) (2012) 936.
[3] C.C. Chiu, A novel approach based on computerized image analysis for traditional Chinese medical diagnosis of the tongue, Comput. Methods Programs Biomed. 61 (2) (2000) 77–89.
[4] Y. Zadik, S. Drucker, S. Pallmon, Migratory stomatitis (ectopic geographic tongue) on the floor of the mouth, J. Am.
Acad. Dermatol. 65 (2) (2011 Aug) 459–460, doi:10.1016/j.jaad.2010.04.016.
[5] H.J. Lee, D.Y. Jo, A smooth, shiny tongue, N. Engl. J. Med. 360 (6) (2009) e8.
[6] Wikipedia. CIELAB color space. https://en.wikipedia.org/wiki/CIELAB_color_space. Accessed 7 April 2020.
[7] C. Tang, J.M. Plasek, Y. Xiong, Z. Zhang, D.W. Bates, L. Zhou, A clustering algorithm based on document embedding to identify clinical note templates, Ann. Data Sci. (2020), doi:10.1007/s40745-020-00296-8.