Review

Person-Centered Care Assessment Tool, a Focus on Patient-Reported Outcomes (PRO): A Systematic Review of Psychometric Properties

Lluna María Bru-Luna 1, Manuel Martí-Vilar 2, César Merino-Soto 3, José Livia-Segovia 4, Juan Garduño-Espinosa 5 and Filiberto Toledano-Toledano 5,6,7,*

1 Departamento de Educación, Facultad de Ciencias Sociales, Universidad Europea de Valencia, 46010 Valencia, Spain
2 Departamento de Psicología Básica, Universitat de València, Blasco Ibáñez Avenue 21, 46010 Valencia, Spain
3 Escuela de Psicología, Instituto de Investigación de Psicología, Universidad de San Martín de Porres, Tomás Marsano Avenue 242, Lima 34, Perú
4 Instituto Central de Gestión de la Investigación, Universidad Nacional Federico Villarreal, Carlos Gonzalez Avenue 285, San Miguel 15088, Perú
5 Unidad de Investigación en Medicina Basada en Evidencias, Hospital Infantil de México Federico Gómez National Institute of Health, Dr. Márquez 162, Doctores, Cuauhtémoc, Mexico City 06720, Mexico
6 Unidad de Investigación Multidisciplinaria en Salud, Instituto Nacional de Rehabilitación Luis Guillermo Ibarra Ibarra, México-Xochimilco Roadway 289, Arenal de Guadalupe, Tlalpan, Mexico City 14389, Mexico
7 Dirección de Investigación y Diseminación del Conocimiento, Instituto Nacional de Ciencias e Innovación para la Formación de Comunidad Científica, INDEHUS, Periférico Sur 4860, Arenal de Guadalupe, Tlalpan, Mexico City 14389, Mexico
* Correspondence: filiberto.toledano.phd@gmail.com; Tel.: +52-5580094677

Abstract: Background: The Person-centered Care Assessment Tool (P-CAT) is one of the shortest and simplest tools available to measure the person-centered care approach; however, the study of its validity has mostly focused on content, criterion and construct validity, leaving aside the interpretation and use of scores. The aim of this research was to conduct a systematic review of the evidence from P-CAT validation studies, taking the "Standards" as a frame of reference. Methods: First, a systematic review of the literature was performed following the PRISMA method. As a search strategy, the original P-CAT article was located, and all articles citing it in the Web of Science, Scopus and PubMed databases were reviewed. A protocol was registered in PROSPERO, and the search was performed based on the criteria established therein. Second, a systematic descriptive literature review of validity evidence was conducted following the "Standards" framework. Results: The empirical evidence indicated that these validations offer many sources related to test content and to internal structure in terms of dimensionality and internal consistency, and a moderate number of sources pertaining to internal structure in terms of test-retest reliability and to the relationship with other variables. There is very little evidence for response processes, internal structure in terms of measurement invariance, and test consequences. Conclusions: Validation studies continue to focus on the types of validity traditionally studied, leaving aside the interpretation of scores in terms of their intended use, which may affect clinical practice and, as a consequence, equity in the provision of care to individuals.

Keywords: person-centered care; PCC; person-centered care assessment tool; P-CAT; validity; standards
1. Introduction

1.1. Person-Centered Care Assessment Tool (P-CAT)

Quality care for people with chronic diseases and/or functional limitations has become one of the main objectives of medical and care services. In this sense, the person-centered care (PCC) approach is not only an essential element in achieving this goal but also a key element in providing high-quality care in the areas of health maintenance and medical care [1–3]. In addition to guaranteeing human rights, PCC provides numerous benefits to both the recipient and the provider [4,5]. PCC also comprises a set of competencies recognized as necessary for health care professionals to meet the changing challenges that prevail in this area [6]. Its elements include [7] an individualized care plan based on the person's preferences, support by an interprofessional team, active coordination among all medical and care providers and support services, education and training for providers, and quality improvement using feedback from the person and his or her caregivers.
However, despite a considerable body of literature, the rapid growth of PCC and the frequent inclusion of the term in health policy and research [8], PCC presents numerous complications: there is no consensus on the definition of the concept [9], which includes problematic areas such as efficacy assessment [10,11]. Furthermore, the study by Ahmed et al. [12] identifies the difficulty of measuring the subjectivity involved in identifying PCC dimensions and the infrequent use of standardized measures as acute issues to be resolved. These limitations motivated the creation of the Person-Centered Care Assessment Tool (P-CAT) [13].
Several instruments can measure PCC from different perspectives (i.e., the caregiver or the person receiving care) and in different contexts (e.g., hospitals or nursing homes). From a practical point of view, however, the P-CAT is one of the shortest and simplest tools, and it contains all the essential elements of PCC described in the literature. It was developed in Australia to measure the care approach for older people with dementia in long-term residential settings, although it is increasingly used in other healthcare settings, such as oncology units [14] and psychiatric hospitals [15]. Moreover, the P-CAT has gained wide acceptance in recent years [16], so much so that it has been adapted in several countries separated by wide cultural and linguistic differences, such as Norway [17], Sweden [18,19], China [20,21], South Korea [22], Spain [16] and Italy [23].
The P-CAT comprises 13 items rated on a 5-point ordinal scale (from "strongly disagree" to "strongly agree"), with high scores interpreted as a high degree of person-centeredness. The scale consists of three dimensions: person-centered care (7 items), organizational support (4 items) and environmental accessibility (2 items). In the original study (n = 220; [13]), the internal consistency of the instrument was satisfactory for the total scale (α = .84), and test-retest reliability at a one-week interval was good (r = .66). A reliability generalization study conducted in 2021 [24], which estimated the internal consistency of the P-CAT and analyzed possible factors that could affect it, recorded a mean α of .81 across the 25 meta-analyzed samples (some of which are part of the validations included in this study); the only variable with a statistically significant relationship to the reliability coefficient was the mean age of the sample. Regarding internal structure validity, three factors were obtained (56% of the total variance), and content validity was estimated through experts, literature reviews and stakeholders [23].
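For reference, the internal consistency statistic reported throughout these studies is the classical Cronbach's α; for a scale of k items (k = 13 for the P-CAT total score), it is computed from the item variances and the variance of the total score:

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\]

where \(\sigma_i^{2}\) is the variance of item i and \(\sigma_X^{2}\) is the variance of the total score. This is the standard textbook formula, included here only to fix notation for the discussion of reliability below.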
Although not explicitly stated, the apparent commonality among validation studies of different versions of the P-CAT was possibly influenced by an influential decades-old validity framework that differentiates three categories: content validity, construct validity, and criterion validity [25,26]. However, a reformulation of this P-CAT validity evidence within a modern framework, which provides a different definition of validity, has not been elaborated.
1.2. Scale Validity

Traditionally, validation is a process focused on the psychometric properties of a measurement instrument [27]. In the early 20th century, with the frequent appearance of standardized tests in education and psychology, two definitions emerged: the first defined validity as the degree to which a test measures what it intends to measure, while the second described the validity of an instrument in terms of its correlation with some variable [26].
Over the past century, however, validity theory has evolved toward the conception that validity should be based on specific interpretations for an intended purpose and that it should not be limited to empirically obtained psychometric properties but should also be supported by the theory underlying the construct measured. This conceptual advance led to the creation of a framework to guide, and serve as a basis for, obtaining evidence to support the use and interpretation of the scores obtained by a measure [28].
This purpose is addressed by the Standards for Educational and Psychological Testing ("Standards"), a guide created by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME) in 2014 with the aim of providing guidelines for assessing the validity of interpretations of scores from an instrument based on their intended use. Two conceptual aspects stand out in this modern view of validity. First, validity is a unitary concept centered on the construct; second, validity is defined as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests" [29]. Thus, the Standards propose several sources that serve as a reference for assessing different aspects of validity. These five sources of validity evidence are as follows [29]: test content, response processes, internal structure, relations to other variables and consequences of testing.
According to AERA et al. [29], evidence based on test content refers to the relationship of the administration process, subject matter, wording and format of test items to the construct they are intended to measure; it is gathered predominantly with qualitative methods, without excluding quantitative approaches. Evidence based on response processes rests on the analysis of the cognitive processes of respondents and their interpretation of the items and is gathered with qualitative methods. Evidence based on internal structure concerns the interrelationships among the items and the construct and is gathered with quantitative methods. Evidence based on relations to other variables compares the variable that the instrument intends to measure with other theoretically relevant external variables and is gathered by quantitative methods. Finally, evidence based on the consequences of testing analyzes the consequences, both intended and unintended, that may be due to a source of invalidity; it is gathered mainly by qualitative methods.
Thus, even though validity plays a fundamental role in providing a strong scientific basis for interpretations of test scores, validation studies in the health field have traditionally focused on content validity, criterion validity and construct validity, leaving aside the interpretation and use of scores [25].
The Standards is considered a suitable theoretical framework for conducting reviews of questionnaire validity due to its ability to analyze sources of validity from both qualitative and quantitative approaches and its evidence-based method [26]. However, whether for lack of awareness or for lack of a systematic description protocol, very few instruments have been reviewed within the framework of the Standards [28].
1.3. Justification

Due to its brevity and simplicity of application, its versatility across different medical and care contexts, and its potentially emic characteristics (i.e., constructs that are cross-culturally applicable with a reasonably similar structure and interpretation) [30], the P-CAT is one of the most widely used tests to measure PCC [16,31] and has spread to several countries with large cultural and linguistic differences. Since its creation, it has undergone seven validations [16–23]. However, its validity has not yet been analyzed within the framework of the Standards, that is, in a way that synthesizes the empirical validity evidence obtained for the P-CAT and helps build a judgment based on the available information.
A review of this type is critical given some methodological issues concerning the P-CAT that remain unresolved. For example, although the multidimensionality of the P-CAT was identified in the study that constructed it, Bru-Luna et al. [24] recently noted that the adaptations of the P-CAT [16–22] used the total score for interpretation and disregarded multidimensionality. Thus, the multidimensionality of the original study was apparently not replicated. Bru-Luna et al. [24] also indicated that the internal structure validity of the P-CAT is usually underreported, lacking sufficiently rigorous approaches to establish with certainty how its scores should be calculated.
The validity of the P-CAT, specifically of its internal structure, appears to be unresolved, yet it remains a measure that substantive research and professional practice point to as relevant for assessing PCC. This perception is contestable and judgment-based and may not be sufficient to assess the validity of the P-CAT from a cumulative and synthetic angle grounded in antecedent validation studies. An adequate assessment of validity therefore requires a model of how to conceptualize validity, followed by a review of antecedent validity studies of the P-CAT using this model.
Therefore, the main purpose of this study was to conduct a systematic review of the validity evidence reported in P-CAT validation studies, taking the Standards as a framework.

2. Materials and Methods

The present study comprises two distinct but interconnected procedures. First, a systematic literature review was conducted following the PRISMA method [32] (see Appendix A) with the aim of collecting all validations of the P-CAT that have been developed. Second, a systematic descriptive literature review of validity evidence was conducted following the Standards framework [29]. The work of Hawkins et al. [28], the first study to review sources of validity according to the guidelines proposed by the Standards, was also used as a reference. Both provided conceptual and pragmatic guidance for organizing and classifying the P-CAT validity evidence.
2.1. Systematic Review

2.1.1. Search Strategy and Information Sources

Initially, the Cochrane database was searched with the aim of finding systematic reviews of the P-CAT; none were found, so subsequent searches were conducted in the Web of Science (WoS), Scopus and PubMed databases. These databases play a fundamental role in the scientific literature, as they are the main sources of published articles that undergo high-quality content and editorial review processes [33]. The search strategy was as follows: the original P-CAT article [13] was located, and all articles citing it through 2021 were then identified and analyzed. This ensured the inclusion of all validations. No articles were excluded on the basis of language, to avoid language bias [34]. Moreover, to reduce the effects of publication bias, a complementary search in Google Scholar was performed to allow the inclusion of gray literature [35]. Finally, a manual search of the references of the included articles was conducted to collect any other articles that met the search criteria but were not present in the aforementioned databases.
This process was conducted by one of the authors and corroborated by another through the Covidence tool [36]. Although it was not necessary, a third author was available for consultation in case of doubt.
2.1.2. Eligibility Criteria and Selection Process

A protocol was registered in PROSPERO (identification code CRD42022335866), and the search was conducted according to the criteria established therein.

12
13 Healthcare 2023, 11, x FOR PEER REVIEW 5 of 25
14

202 The articles had to meet the following inclusion criteria to be included in the
203 systematic review: (a) studies with a methodological approach to P-CAT validations, (b)
204 experimental or quasiexperimental studies, (c) studies with any type of sample, and (d)
205 studies in any language. We discarded studies that presented at least one of the
206 following exclusion criteria: (a) systematic reviews, bibliometric reviews of the
207 instrument or meta-analyses, and (b) studies published after 2021.
2.1.3. Data Collection Process

After the articles were selected, the most relevant information was extracted from each one. Fundamental data for each section (introduction, methodology, results and discussion) were recorded in an Excel spreadsheet, together with the limitations mentioned in each article, the practical implications and future research directions.
In addition, in line with the aim of the study, information was collected on the sources of validity in each study: test content (judges' evaluation, literature review and translation); response processes; internal structure (factor analysis, design, estimator, factor extraction method, factors and items, interfactor correlations, internal replication, effect of the method, and factor loadings); relationship with other variables (convergent, divergent, concurrent and predictive validity); and consequences of measurement.
This process was conducted by one of the authors and corroborated by another through the Covidence tool [36]. Although it was not necessary, a third author was available for consultation in the event of a disagreement.
2.2. Description of the Validity Study

To describe the validity evidence of the studies, information on the seven articles included in the systematic review was recorded in an Excel spreadsheet. The data were extracted directly from the text of the articles and comprised information about the authors, the year of publication, the country where each P-CAT validation was produced and each of the five sources proposed in the Standards [29].
The validity source related to internal structure was divided into three sections that recorded information about dimensionality (including aspects such as factor analysis, design, estimator, factor extraction method, factors and items, interfactor correlations, internal replication, effect of the method, and factor loadings), the expression of reliability (i.e., internal consistency and test-retest) and the study of factorial invariance according to the groups into which it was divided (e.g., sex, age, profession) and the level of study (i.e., metric, intercepts). This allowed much more information to be obtained than would be possible from a single global judgment of internal structure validity. This division was made by the same researcher who performed the previous processes.

3. Results

3.1. Systematic Review

3.1.1. Study Selection and Study Characteristics

The systematic review process was developed according to the PRISMA methodology [32].
The WoS, Scopus, PubMed and Google Scholar databases were searched on February 12, 2022, yielding a total of 485 articles: 111 in WoS, 114 in Scopus, 43 in PubMed and 217 in Google Scholar. In the first phase, the title and abstract of each article were read. A total of 457 articles were eliminated because they did not include studies with a methodological approach to P-CAT validation, and 1 was eliminated because it was the original P-CAT article. This left 27 articles, 19 of which were duplicates. This process yielded a total of 8 articles that were evaluated for eligibility through a complete reading of the text. In this step, 1 article was excluded because the full text could not be accessed [23]. Finally, a manual search was performed by reviewing the references of the seven remaining studies, but none was considered suitable for inclusion. Thus, the review was conducted with a total of seven articles.
Of the seven studies, six were original validations in other languages: Norwegian [17], Swedish [18], Chinese (two studies [20,21]), Spanish [16] and Korean [22]. The seventh, by Selan et al. [19], includes a modification of the Swedish version of the P-CAT and explores the psychometric properties of both versions (i.e., the original Swedish version and the modified version).
The item selection and screening process is illustrated in detail in Figure 1.

Figure 1. Flowchart according to PRISMA.

3.2. Validity Analysis

To provide a clear view of the validity analyses, Table 1 descriptively shows the number and percentage of studies providing information about each of the five standards proposed by the Standards guide [29]. The table shows a high number of validity sources related to test content and to internal structure in terms of dimensionality and internal consistency, followed by a moderate number of sources for test-retest reliability and for the relationship with other variables. A rate of 0% is observed for validity sources related to response processes, invariance and test consequences. The sections below present the information for each of the standards in more detail.

Table 1. Number of studies and percentages for each validity test.

Validity evidence | N | % | Validity model
Content | 7 | 100 | Classical
Response processes | 0 | 0 | Classical
Internal structure | - | - | Classical
  Dimensionality | 7 | 100 | -
  Reliability | - | - | -
    Internal consistency | 7 | 100 | -
    Test-retest | 4 | 57 | -
  Invariance | 0 | 0 | -
Relation to other variables | 4 | 57 | Classical
Consequences of testing | 0 | 0 | Classical

Note. Validity model: validity theory used (classical, modern).
3.2.1. Evidence Based on Test Content

The first standard, test content, was met in all the studies (100%). Translation, which refers to the equivalence of content between the original language and the target language, was fulfilled in the six articles that carried out a validation in another language and/or culture. These studies reported that the versions had been translated by bilingual experts and/or experts in the area of care. In addition, three of them [16,20,21] reported that the translation process followed guidelines such as those of Beaton et al. [37], Guillemin [38], Hambleton et al. [39], Muñiz et al. [40] and the International Test Commission. The evaluation by judges, which refers to the relevance, clarity and importance of the content, was divided into two categories: expert evaluation and experiential evaluation. The former was fulfilled in three of the articles [18–20], while the latter was fulfilled in two [16,21]. Only one of the studies [20] reported that the scale contained items reflecting the dimensions described in the literature. The validity evidence related to test content in each article can be seen in Table 2.

Table 2. Validity tests based on test content.

Study | Judges' assessment: Experts | Judges' assessment: Experiential | Literature review | Translation
Rokstad et al. [17] | - | - | - | X
Sjögren et al. [18] | X | - | - | X
Zhong and Lou [20] | X | - | X | X
Martínez et al. [16] | - | X | - | X
Tak et al. [22] | - | - | - | X
Selan et al. [19] | X | - | - | -
Le et al. [21] | - | X | - | X
3.2.2. Evidence Based on Response Processes

The second standard, validity based on response processes, can be obtained according to the Standards from the analysis of individual responses: "questioning test takers about their performance strategies or response to particular items (...), maintaining records that monitor the development of a response to a writing task (...), documentation of other aspects of performance, like eye movement or response times..." [29] (p. 15). According to this criterion, none of the articles provides evidence of this type.
3.2.3. Evidence Based on Internal Structure

The third standard, validity based on internal structure, was divided into three sections. First, the dimensionality of each study was examined in terms of factor analysis, design, estimator, factor extraction method, factors and items, interfactor correlations, internal replication, effect of the method, and factor loadings. Le et al. [21] used an exploratory-confirmatory design, while Sjögren et al. [18] used a confirmatory-exploratory design to assess construct validity with confirmatory factor analysis (CFA) and exploratory factor analysis (EFA). The remaining articles employed only a single form of factor analysis: three used EFA and two used CFA. Only three of the articles reported the factor extraction method used; these methods included Kaiser's eigenvalue criterion, the scree plot test, parallel analysis and Velicer's MAP test. The validations yielded a total of two factors in five of the seven articles, while one yielded a single dimension [16] and another yielded three dimensions [20], as in the original instrument. Interfactor correlations are reported only in the study by Zhong and Lou [20], although in the study by Martínez et al. [16] they can be easily derived, since the final model consists of only one dimension. Internal replication was calculated in the Spanish validation by randomly splitting the sample in two to test the correlations between factors. The effect of the method was not reported in any of the articles. All this information, together with a summary of the factor loadings, is presented in Table 3.
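As an aside, some of these extraction criteria are simple to state computationally. The sketch below is a minimal illustration of Horn's parallel analysis in Python (hypothetical code, not taken from any of the reviewed validations), assuming the item responses are available as a respondents-by-items NumPy array:

```python
import numpy as np

def parallel_analysis(data: np.ndarray, n_iter: int = 500, seed: int = 0) -> int:
    """Retain leading factors whose observed eigenvalues exceed the 95th
    percentile of eigenvalues obtained from random normal data."""
    rng = np.random.default_rng(seed)
    n, k = data.shape
    # Eigenvalues of the observed correlation matrix, in descending order
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eig = np.empty((n_iter, k))
    for i in range(n_iter):
        sim = rng.standard_normal((n, k))
        rand_eig[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(rand_eig, 95, axis=0)
    # Count leading eigenvalues that beat the random benchmark
    n_factors = 0
    for observed, cutoff in zip(obs_eig, threshold):
        if observed <= cutoff:
            break
        n_factors += 1
    return n_factors
```

Kaiser's criterion, by contrast, simply counts eigenvalues greater than 1, which is one reason it tends to suggest more factors than parallel analysis does.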

Table 3. Validity tests based on internal structure: dimensionality.

Rokstad et al. [17]. Factor analysis: EFA; design: exploratory. Estimator: Expl.: PCA, varimax and direct oblimin rotation. Factor extraction method: Kaiser's eigenvalue criterion and scree plot test. Factors and items: F1 = 1, 2, 3, 4, 5, 6, 7, 13; F2 = 8, 9, 10, 11, 12. Interfactor R: N.R. Internal replication: N.R. Effect of the method: N.R. Factor loadings (summary): F1: max. .738, min. .515, average .608; F2: max. .714, min. .543, average .664.

Sjögren et al. [18]. Factor analysis: CFA and EFA; design: confirmatory-exploratory. Estimator: Conf.: ML; Expl.: PCA, direct oblimin rotation and orthogonal rotation. Factor extraction method: parallel analysis and Velicer's MAP test. Factors and items: F1 = 1, 2, 3, 4, 5, 6, 11, 13; F2 = 7, 8, 9, 10, 12. Interfactor R: N.R. Internal replication: N.R. Effect of the method: N.R. Factor loadings (summary): F1: max. .7, min. .4, average .593; F2: max. .75, min. .46, average .672.

Zhong and Lou [20]. Factor analysis: CFA; design: confirmatory. Estimator: Conf.: N.R. Factor extraction method: N.R. Factors and items: F1 = 1, 2, 3, 4, 5, 6; F2 = 7, 8, 9, 10, 11, 12; F3 = 13, 14, 15. Interfactor R: P-CAT-C1 and P-CAT-C2: .043; P-CAT-C1 and P-CAT-C3: .23; P-CAT-C2 and P-CAT-C3: .065. Internal replication: N.R. Effect of the method: N.R. Factor loadings (summary): F1: max. .666, min. .443, average .51; F2: max. .729, min. .454, average .584; F3: max. .51, min. .399, average .455.

Martínez et al. [16]. Factor analysis: CFA; design: confirmatory. Estimator: Conf.: WLSMV. Factor extraction method: N.R. Factors and items: F1 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13. Interfactor R: 1. Internal replication: the total sample was divided into two random subsamples; in the two-dimensional model, the correlation between the two factors was analyzed in both subsamples. Effect of the method: N.R. Factor loadings (summary): F1: max. .73, min. .33, average .56.

Tak et al. [22]. Factor analysis: EFA; design: exploratory. Estimator: Expl.: PCA, varimax orthogonal rotation. Factor extraction method: N.R. Factors and items: F1 = 1, 2, 3, 4, 5, 6, 7; F2 = 8, 9, 10, 11, 12, 13. Interfactor R: N.R. Internal replication: N.R. Effect of the method: N.R. Factor loadings (summary): F1: max. .78, min. .5, average .692; F2: max. .8, min. .34, average .653.

Selan et al. [19]. Factor analysis: EFA; design: exploratory. Estimator: Expl.: PCA, varimax orthogonal rotation. Factor extraction method: Kaiser's eigenvalue criterion, scree plot test and parallel analysis. Factors and items: F1 = 1, 2, 3, 4, 5, 6, 11, 13; F2 = 7, 8, 9, 10, 12. Interfactor R: N.R. Internal replication: N.R. Effect of the method: N.R. Factor loadings (summary): F1: max. .776, min. .465, average .599; F2: max. .771, min. .546, average .711.

Le et al. [21]. Factor analysis: EFA and CFA; design: exploratory-confirmatory. Estimator: Expl.: PCA, varimax rotation and oblique rotation; Conf.: SEM. Factor extraction method: N.R. Factors and items: F1 = 1, 2, 3, 4, 5, 6, 7, 10, 13; F2 = 8, 9, 11, 12. Interfactor R: N.R. Internal replication: N.R. Effect of the method: N.R. Factor loadings (summary): F1: max. .79, min. .17, average .596; F2: max. .89, min. .71, average .817.

Note. N.R.: not reported. CFA: confirmatory factor analysis. EFA: exploratory factor analysis. Expl.: exploratory. Conf.: confirmatory. PCA: principal component analysis. ML: maximum likelihood. MAP test: minimum average partial test. WLSMV: weighted least squares means and variance adjusted estimator. SEM: structural equation modeling. Max.: maximum factor loading. Min.: minimum factor loading. Average: average factor loading.

The second section examines reliability. All the studies presented measures of internal consistency, computed in every case with Cronbach's α coefficient, both for the total scale and for the subscales. In no case was McDonald's ω coefficient used. With regard to test-retest reliability, four of the seven articles reported it. Martínez et al. [16] performed the retest after a period of 7 days, Le et al. [21] and Rokstad et al. [17] performed it between 1 and 2 weeks later, and Sjögren et al. [18] allowed approximately 2 weeks to pass after the initial test.
The third section concerns the calculation of invariance, which was not reported in any of the studies.
3.2.4. Evidence Based on Relationships with Other Variables

For the fourth standard, validity based on relationships with other variables, the articles that report this evidence use only convergent validity (i.e., the hypothesis that variables related to the construct measured by the test, in this case person-centeredness, are related, positively or negatively, to another construct). Discriminant validity hypothesizes that variables related to the PCC construct are not correlated with other, theoretically unrelated variables. No article (0%) measured discriminant evidence, while four (57%) measured convergent evidence [16,19,20,22]. Convergent validity was obtained through comparison with instruments such as the Person-centered Climate Questionnaire–Staff Version (PCQ-S), the Staff-based Measures of Individualized Care for Institutionalized Persons with Dementia (IC), the Caregiver Psychological Elder Abuse Behavior Scale (CPEAB), the Organizational Climate measure (CLIOR) and the Maslach Burnout Inventory (MBI). In the case of Selan et al. [19], convergent validity was assessed against two items considered by the authors to be "crude measures of person-centered care (i.e., external constructs) giving an indication of the instruments' ability to measure PCC" (p. 4). Concurrent validity, which measures the degree to which the results of one test are similar to those of another administered at approximately the same time to the same participants, and predictive validity, which allows predictions about behavior to be made from the comparison between instrument values and a criterion, were not reported in any of the studies.
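As a purely illustrative sketch of the kind of convergent analysis reported by Selan et al. [19], the following Python snippet computes Spearman's rho between P-CAT totals and a single ordinal control item. The data here are simulated and hypothetical; they are not values from any of the reviewed studies:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
# Hypothetical data: 120 staff members' P-CAT totals (13 items, 1-5 points each)
pcat_total = rng.integers(13, 66, size=120)
# Hypothetical 5-point control item, loosely coupled to the totals
control_item = np.clip(pcat_total // 13 + rng.integers(-1, 2, size=120), 1, 5)

rho, p_value = spearmanr(pcat_total, control_item)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```

A rank-based coefficient is the natural choice here because both the P-CAT items and the control questions are ordinal.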
3.2.5. Evidence Based on the Consequences of Testing

The fifth and final standard relates to the consequences of testing; that is, it analyzes the consequences, both intended and unintended, of having applied the test to a given sample. None of the articles presents explicit or implicit evidence of this kind.
The last two sources of validity can be seen in Table 4.

Table 4. Evidence of validity based on association with other variables and on consequences of testing.

Rokstad et al. [17]. Relation to other variables: convergent: N.R.; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Sjögren et al. [18]. Relation to other variables: convergent: N.R.; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Zhong and Lou [20]. Relation to other variables: convergent: IC and CPEAB; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Martínez et al. [16]. Relation to other variables: convergent: emotional exhaustion, depersonalization, personal achievement, and organizational climate; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Tak et al. [22]. Relation to other variables: convergent: PCQ-S; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Selan et al. [19]. Relation to other variables: convergent: control questions "The care here is person-centered" and "We work from the individual's self-perceived needs"; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Le et al. [21]. Relation to other variables: convergent: N.R.; divergent: N.R.; concurrent: N.R.; predictive: N.R. Consequences of testing: N.R.

Note. N.R.: not reported. IC: Staff-Based Measures of Individualized Care for Institutionalized Persons with Dementia. CPEAB: Caregiver Psychological Elder Abuse Behavior Scale. PCQ-S: Person-centered Climate Questionnaire-Staff Version.
Table 5 shows the results of the set of validity tests for each study according to the described standards.

Table 5. Results of validity tests.

Rokstad et al. [17]. Test content: processes of translation and cultural adaptation. Response processes: N.R. Factor analysis: EFA. Internal consistency: Cronbach's alpha. Test-retest: 1-2 weeks. Invariance: N.R. Relation to other variables: N.R. Consequences of testing: N.R.

Sjögren et al. [18]. Test content: evaluation by expert judges and processes of translation and cultural adaptation. Response processes: N.R. Factor analysis: CFA and EFA. Internal consistency: Cronbach's alpha. Test-retest: 2 weeks. Invariance: N.R. Relation to other variables: N.R. Consequences of testing: N.R.

Zhong and Lou [20]. Test content: evaluation by expert judges, processes of translation and cultural adaptation, and literature review. Response processes: N.R. Factor analysis: CFA. Internal consistency: Cronbach's alpha. Test-retest: N.R. Invariance: N.R. Relation to other variables: convergent validity (IC and CPEAB). Consequences of testing: N.R.

Martínez et al. [16]. Test content: experiential evaluation by judges and processes of translation and cultural adaptation. Response processes: N.R. Factor analysis: CFA. Internal consistency: Cronbach's alpha. Test-retest: 1 week. Invariance: N.R. Relation to other variables: convergent validity (emotional exhaustion, depersonalization, personal achievement and organizational climate). Consequences of testing: N.R.

Tak et al. [22]. Test content: processes of translation and cultural adaptation. Response processes: N.R. Factor analysis: EFA. Internal consistency: Cronbach's alpha. Test-retest: N.R. Invariance: N.R. Relation to other variables: convergent validity (PCQ-S). Consequences of testing: N.R.

Selan et al. [19]. Test content: evaluation by expert judges. Response processes: N.R. Factor analysis: EFA. Internal consistency: Cronbach's alpha. Test-retest: N.R. Invariance: N.R. Relation to other variables: convergent validity, examined by calculating Spearman's rho correlation between the factors and the control questions. Consequences of testing: N.R.

Le et al. [21]. Test content: experiential evaluation by judges and processes of translation and cultural adaptation. Response processes: N.R. Factor analysis: EFA and CFA. Internal consistency: Cronbach's alpha. Test-retest: 1-2 weeks. Invariance: N.R. Relation to other variables: N.R. Consequences of testing: N.R.

Note. N.R.: not reported. CFA: confirmatory factor analysis. EFA: exploratory factor analysis. IC: Staff-Based Measures of Individualized Care for Institutionalized Persons with Dementia. CPEAB: Caregiver Psychological Elder Abuse Behavior Scale. PCQ-S: Person-centered Climate Questionnaire-Staff Version.


4. Discussion

The main purpose of this article was to analyze the validity evidence presented in the different validation studies of the P-CAT. With the aim of gathering all the existing validations, a systematic review of all the literature citing this instrument was conducted.
As seen, the publication of validation studies of the P-CAT has been constant over the years. Since the publication of the original instrument in 2010, a total of seven validation studies have been published in other languages (including the Italian version by Brugnolli et al. [23], which could not be included in this study), plus a modification of one of these versions. The very unequal distribution of validations across languages and countries is striking. A recent systematic review [41] revealed that in Europe, the countries where the PCC approach is most widely used are the United Kingdom, Sweden, the Netherlands, Northern Ireland, and Norway. It has also been observed that neighboring countries seem to influence one another [42], so that they tend to organize health care in a similar way, as is the case among the Scandinavian countries. This favors the expansion of PCC, which explains the numerous validations found in this geographical area.
However, although this approach is conceived as an essential element of health care by most governments [43], PCC varies according to the different definitions and interpretations attributed to it, which can cause confusion in its application (e.g., between Norway and the United Kingdom) [44]. Moreover, the facilitators of, and barriers to, implementation depend on the context and level of development of each country, and financial support remains one of the determining factors [43]. This offers an explanation as to why PCC is not uniformly widespread across territories. In countries where access to health care for all remains out of reach for economic reasons, the application of this approach takes a back seat, as does the validation of its assessment tools. In contrast, in much of Europe and in countries such as China and South Korea, which have experienced decades of rapid economic development, patients are willing to be involved in their medical treatment and to enjoy more satisfying and efficient medical experiences and environments [45], which has facilitated the expansion of validations of instruments such as the P-CAT.
Regarding validity evidence, the guidelines proposed by the Standards [29] were followed. None of the analyses conducted in the different validations of the P-CAT makes use of a structured theoretical framework for conducting validation. The most frequently reported validity evidence concerns the content of the test and two of the sections into which internal structure was divided (i.e., dimensionality and internal consistency). Hawkins et al. [28] authored the first study that reviewed sources of validity according to the guidelines proposed by the Standards. They found that the instruments included in their work mostly reported sources of validity related to other variables, followed by those related to test content and internal structure. Sources of validity related to response processes and the consequences of testing had rarely been studied.
The results of Hawkins et al. [28] partially coincide with the findings of this study. In the present article, the most frequently reported source of validity among the included studies was test content. This is because most of the articles were validations of the P-CAT in other languages, and the authors reported in all cases that the translation procedure was conducted by experts. In addition, several of the studies employed guidelines such as those of Beaton et al. [37], Guillemin [38], Hambleton et al. [39], Muñiz et al. [40] and the International Test Commission. Some studies also assessed the relevance, clarity and importance of the content.

The third source of validity, internal structure, was the next most often reported, although it appeared unevenly across the three sections into which this evidence was divided. Dimensionality and internal consistency were reported in all the studies, followed by test-retest reliability. In relation to the first section, factor analysis, a total of five EFAs and four CFAs were presented in the validations. Traditionally, EFA has been used in research to assess dimensionality and identify key psychological constructs, although it has numerous inconveniences, such as the difficulty of testing measurement invariance or of incorporating latent factors into subsequent analyses [46], and the major problem of factor loading matrix rotation [47]. Eventually, researchers began using CFA, a technique that overcame some of these obstacles [46] but had drawbacks of its own: the strict requirement of zero cross-loadings often does not fit the data well, and misspecification of zero loadings tends to produce distorted factors [47]. More recently, exploratory structural equation modeling (ESEM) has been proposed, a technique widely recommended both conceptually and empirically for assessing the internal structure of psychological tools [48], since it overcomes the limitations that EFA and CFA have in estimating their parameters [46,47].
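To make the contrast concrete, here is a minimal, hypothetical sketch of an EFA of the kind several validations report, using the third-party Python package factor_analyzer (an assumption of this illustration; none of the reviewed studies used this tooling, and ESEM itself is typically fitted in specialized SEM software such as Mplus or R's lavaan ecosystem):

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

# Hypothetical data: 300 respondents x 13 P-CAT-like items on a 1-5 scale
X = np.random.default_rng(0).integers(1, 6, size=(300, 13)).astype(float)

# Two-factor EFA with an oblique (oblimin) rotation, mirroring the
# exploratory designs summarized in Table 3
efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(X)

print(np.round(efa.loadings_, 3))              # 13 x 2 pattern matrix
print(np.round(efa.get_factor_variance(), 3))  # variance explained per factor
```

Unlike CFA, nothing here constrains cross-loadings to zero; unlike plain EFA, ESEM would additionally allow the rotated solution to be embedded in a full structural model with invariance constraints.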
The next section, reliability, was reported in all the studies via Cronbach's α coefficient. Reliability is defined as "a population-based quantification of measurement accuracy as a function of signal-to-noise ratio" [49], a combination of the systematic and random influences that determine the observed scores on a psychological test. Reporting a reliability measure serves to ensure that item-based scores are consistent, that the tool's responses are replicable and that they are not modified solely by random noise [50]. The most commonly employed reliability coefficient in studies containing a multi-item measurement scale (MIMS) is Cronbach's α [51]. For example, McNeish [50] analyzed 118 articles from emblematic journals in educational, social, and clinical psychology and found that 92% used only this coefficient as a measure of scale reliability. In Hayes and Coutts [51], who examined a total of 187 articles, Cronbach's α was reported in 93% of the cases, while the rest did not report any measure of reliability.
Cronbach's α [52] rests on numerous strict assumptions (e.g., the test must be unidimensional, factor loadings must be equal for all items, and item errors must not covary) to estimate internal consistency, and the importance and rigidity of these assumptions may not be recognized in empirical work [50]. Since it is difficult for all these conditions to be met in real-world situations, and since violating them may produce reliability estimates that are too small, numerous researchers and methodologists have argued that Cronbach's α is an obsolete coefficient that should be replaced by other methods. One alternative measure that stands up to α and is increasingly recommended in the scientific literature is McDonald's ω [53], a composite reliability measure. This coefficient is recommended for congeneric scales in which tau equivalence is not assumed. In addition, it has numerous advantages: its formula can be extended to account for design-based error covariances [50], and estimates of ω are usually robust when the estimated model contains more factors than the true model, even with small samples, or when skewness in univariate item distributions would produce larger biases with α [49].
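For illustration only, both coefficients are straightforward to compute. The sketch below assumes an n-respondents-by-k-items NumPy array and, for ω, standardized loadings and error variances taken from a previously fitted one-factor model; all inputs are hypothetical, not data from the reviewed studies:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha from an n_respondents x k_items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def mcdonald_omega(loadings: np.ndarray, error_vars: np.ndarray) -> float:
    """Omega: share of total-score variance due to the common factor."""
    common = loadings.sum() ** 2
    return common / (common + error_vars.sum())

rng = np.random.default_rng(0)
# Simulated congeneric data: a common factor plus item-specific noise
true_score = rng.normal(size=(220, 1))
scores = true_score + rng.normal(size=(220, 13))
print(round(cronbach_alpha(scores), 2))

# Hypothetical standardized one-factor solution for 13 items
lam = np.full(13, 0.6)
print(round(mcdonald_omega(lam, 1 - lam**2), 2))
```

The ω function makes the conceptual difference visible: it weights items by their loadings instead of treating them as exchangeable, which is why it remains defensible when tau equivalence fails.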
Test-retest reliability was the next most reported internal structure section in the studies. This type of reliability refers to the consistency of the scores of a test between two measurements separated by a period of time [54]. It is striking that test-retest reliability is not reported with a prevalence similar to that of internal consistency since, unlike internal consistency, it can be assessed for practically all types of patient-reported outcomes and is even considered by some measurement experts to be a more relevant report of reliability than internal consistency, given its fundamental role in the calculation of parameters for health measures [54]. However, little guidance is provided in the literature regarding the assessment of this type of reliability.
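As a minimal sketch with simulated data (for comparison, the original P-CAT study reported r = .66 over a one-week interval), test-retest reliability can be estimated as the correlation between two administrations:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
time1 = rng.normal(45, 8, size=80)         # hypothetical baseline P-CAT totals
time2 = time1 + rng.normal(0, 6, size=80)  # hypothetical retest ~2 weeks later

r, _ = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")
```

In practice, intraclass correlation coefficients are often preferred for this purpose because, unlike Pearson's r, they are sensitive to systematic shifts between the two administrations.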


The internal structure section least reported in the studies in this review was invariance. Noninvariance refers to differences between scores on a test that are not explained by group differences in the construct it is intended to measure [55]. The invariance of the measure should be emphasized as a prerequisite for comparisons between groups since "if scale invariance is not examined, item bias may not be fully recognized and this may lead to a distorted interpretation of the bias in a particular psychological measure" [55].
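In factor-analytic terms (a standard formulation, not one taken from the reviewed studies), invariance is usually tested as a sequence of nested multigroup models with increasingly strict equality constraints across groups g:

\[
x^{(g)} = \nu^{(g)} + \Lambda^{(g)}\eta + \varepsilon^{(g)}, \qquad g = 1, \dots, G,
\]

where configural invariance requires only the same pattern of fixed and free loadings in \(\Lambda^{(g)}\) across groups; metric (weak) invariance adds \(\Lambda^{(1)} = \dots = \Lambda^{(G)}\); and scalar (strong) invariance further adds \(\nu^{(1)} = \dots = \nu^{(G)}\), the level generally required before group means can be compared meaningfully.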
Evidence related to other variables (the most widely reported source in the case of Hawkins et al. [28]) was the next most reported source of validity in the studies included in this review. Specifically, the four studies reporting this evidence do so in terms of convergent validity, citing several instruments: the PCQ-S, a self-report designed to assess the extent to which staff perceive the climate of healthcare settings as person-centered; the CLIOR, which measures organizational climate; the MBI, which measures the degree of employee emotional burden; the IC, which measures three domains of individualized care (knowing the person/resident; resident autonomy and choice; and communication, staff to staff and staff to resident); and the CPEAB, which determines the degree of psychologically abusive behavior of caregivers toward elderly individuals. It is striking that none of the studies includes evidence of discriminant validity, although this may be because several obstacles complicate the measurement of this type of validity [56]. Different definitions are used in the applied literature, which makes its evaluation difficult, and the literature on discriminant validation is focused on techniques that require the use of multiple measurement methods, which often seem to have been introduced without sufficient evidence or are applied arbitrarily.
Validity related to response processes was not reported in any of the studies. The several methods available for analyzing this evidence have been divided into two groups: "those that directly access the psychological processes or cognitive operations (think aloud, focus group, and interviews), compared to those that provide indirect indicators which in turn require additional inference (eye tracking and response times)" [57]. However, this validity evidence has traditionally been reported less often than the others in most studies, perhaps because there are fewer clear and accepted practices on how to design such studies or report them [58].
Finally, the consequences of testing were not reported in any of the studies. There is debate surrounding this source of validity, with two main opposing streams of thought. On the one hand, Messick [59], cited by Moss [60], suggests that consequences that may appear after the application of a test do not derive from any source of test invalidity and that "adverse consequences only undermine the validity of an assessment if they can be attributed to a problem of fit between the test and the construct" (p. 6). In contrast, Cronbach [61], cited by Moss [60], notes that adverse social consequences resulting from the application of a test may call its validity into question. In any case, the potential risks that may arise from the application of a test should be minimized, especially in health assessments. To this end, it is essential that this aspect be assessed by instrument developers and that the experiences of respondents be protected through the development of comprehensive and informed practices [28].
This work is not without limitations. The first is that not all published validation studies of the P-CAT, such as the Italian version by Brugnolli et al. [23], were available, and these studies could have provided relevant information. Second, many of the sources of validity could not be analyzed because the studies provided scant or even no data: response processes, relationships with other variables, consequences of testing, and invariance in the case of internal structure, as well as interfactor correlations, internal replication and the effect of the method in the case of dimensionality. In the future, it is hoped that authors will become aware of the importance of validity, as shown in this article and many others, and provide data on unreported sources so that comprehensive validity studies can be performed. Moreover, our study had a descriptive rather than an evaluative focus (as we state several times in the manuscript). The COSMIN method is therefore not an appropriate framework here, because it is an evaluative framework. Our framework comes from the Standards for Educational and Psychological Testing [29] and is based on contemporary validity theory. This choice should not detract from the value of the study, because the Standards framework still seems innovative in the health sciences and offers a complementary and more complete view than COSMIN. The use of the validity theory framework to describe instruments has appeared in recent studies [25,27,28].
The present work also has several strengths. The search was extensive, drawing on three different databases, including WoS, one of the most widely used and authoritative databases in the world, which offers a large number and variety of articles and is not fully automated thanks to its human team [62–64]. In addition, to prevent publication bias, gray literature search engines such as Google Scholar were also used, to avoid the exclusion of unpublished research [35]. Finally, linguistic bias was avoided by not limiting the search to articles published in only one or two languages, thus avoiding overrepresentation of studies in some languages and underrepresentation in others [35].

5. Conclusions

Validity is understood as the degree to which evidence and theory support the interpretations of instrument scores for their intended use [29]. From this point of view, the various validations of the P-CAT are not framed within a structured theoretical framework such as the Standards. After integration and analysis of the results, it was observed that these validation reports offer a high number of sources of validity related to test content and to internal structure in terms of dimensionality and internal consistency, a moderate number of sources for internal structure in terms of test-retest reliability and for the relationship with other variables, and a very low number of sources for response processes, internal structure in terms of invariance, and test consequences.
Validity plays a fundamental role in ensuring a sound scientific basis for test interpretations, as it provides evidence of the extent to which the data provided by the test are valid for the intended purpose. This can affect clinical practice, as people's health may depend on it. In this sense, the Standards is considered a suitable theoretical framework for studying this modern conception of questionnaire validity, and it should be considered in future research.
Although the P-CAT is one of the most widely used instruments for assessing PCC, as this study shows, its validity has hardly been studied. The developers of measurement tests applied in health care settings, on which the health and quality of life of many people may depend, should use this validity framework to make the intended purpose of the measurement explicit. This is important because the equity of decision making by health care professionals in daily clinical practice may depend on these sources of validity. Additionally, through a more extensive study of validity, including the interpretation of scores in terms of their intended use, the P-CAT, an instrument initially developed for long-term residential care settings for elderly people, could be extended to other care settings. However, the findings of this study show that validation studies continue to focus on the types of validity traditionally studied, leaving aside the interpretation of the scores.
Author Contributions: Conceptualization, L.M.B.L., M.M.V. and C.M.S.; methodology, L.M.B.L., M.M.V. and F.T.T.; software, M.M.V., J.G.E. and C.M.S.; validation, J.L.S., C.M.S. and F.T.T.; formal analysis, L.M.B.L., M.M.V. and C.M.S.; investigation, L.M.B.L., M.M.V. and F.T.T.; resources, F.T.T.; data curation, L.M.B.L., M.M.V. and J.G.E.; writing—original draft preparation, L.M.B.L., M.M.V. and C.M.S.; writing—review and editing, L.M.B.L., M.M.V. and F.T.T.; visualization, J.L.S.; supervision, M.M.V.; project administration, C.M.S. and F.T.T.; funding acquisition, F.T.T. All authors have read and agreed to the published version of the manuscript.


Funding: This work is one of the results of research project HIM/2015/017/SSA.1207, "Effects of mindfulness training on psychological distress and quality of life of the family caregiver". Principal investigator: Filiberto Toledano-Toledano, Ph.D. The present research was conducted with federal funds for health research and was approved by the Commissions of Research, Ethics and Biosafety (Comisiones de Investigación, Ética y Bioseguridad), Hospital Infantil de México Federico Gómez, National Institute of Health. The source of federal funds did not affect the study design, data collection, analysis, interpretation or decisions regarding publication.

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Commissions of Research, Ethics and Biosafety (Comisiones de Investigación, Ética y Bioseguridad), Hospital Infantil de México Federico Gómez, National Institute of Health (HIM/2015/017/SSA.1207, "Effects of mindfulness training on psychological distress and quality of life of the family caregiver").

Informed Consent Statement: Not applicable.

Data Availability Statement: All data relevant to the study are included in the article or uploaded as additional files. Additional template data extraction forms are available from the corresponding author upon reasonable request.

Acknowledgments: The authors thank the casual helpers for their aid with information processing and searching.

Conflicts of Interest: The authors declare no conflict of interest.


Appendix

Table A1. PRISMA 2020 checklist.

Section and topic | Item | Checklist item | Page where item is reported

TITLE
Title | 1 | Identify the report as a systematic review. | 1

ABSTRACT
Abstract | 2 | See the PRISMA 2020 for Abstracts checklist. | 2

INTRODUCTION
Rationale | 3 | Describe the rationale for the review in the context of existing knowledge. | 3-9
Objectives | 4 | Provide an explicit statement of the objective(s) or question(s) the review addresses. | 9

METHODS
Eligibility criteria | 5 | Specify the inclusion and exclusion criteria for the review and how studies were grouped for the syntheses. | 11
Information sources | 6 | Specify all databases, registers, websites, organizations, reference lists and other sources searched or consulted to identify studies. Specify the date when each source was last searched or consulted. | 10
Search strategy | 7 | Present the full search strategies for all databases, registers and websites, including any filters and limits used. | 10
Selection process | 8 | Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process. | 11
Data collection process | 9 | Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and if applicable, details of automation tools used in the process. | 11
Data items | 10a | List and define all outcomes for which data were sought. Specify whether all results that were compatible with each outcome domain in each study were sought (e.g., for all measures, time points, analyses), and if not, the methods used to decide which results to collect. | 12
Data items | 10b | List and define all other variables for which data were sought (e.g., participant and intervention characteristics, funding sources). Describe any assumptions made about any missing or unclear information. | NA
Study risk of bias assessment | 11 | Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and if applicable, details of automation tools used in the process. | NA
Effect measures | 12 | Specify for each outcome the effect measure(s) (e.g., risk ratio, mean difference) used in the synthesis or presentation of results. | NA
Synthesis methods | 13a | Describe the processes used to decide which studies were eligible for each synthesis (e.g., tabulating the study intervention characteristics and comparing against the planned groups for each synthesis (item #5)). | 11
Synthesis methods | 13b | Describe any methods required to prepare the data for presentation or synthesis, such as handling of missing summary statistics, or data conversions. | NA
Synthesis methods | 13c | Describe any methods used to tabulate or visually display results of individual studies and syntheses. | NA
Synthesis methods | 13d | Describe any methods used to synthesize results and provide a rationale for the choice(s). If meta-analysis was performed, describe the model(s), method(s) to identify the presence and extent of statistical heterogeneity, and software package(s) used. | NA
Synthesis methods | 13e | Describe any methods used to explore possible causes of heterogeneity among study results (e.g., subgroup analysis, meta-regression). | NA
Synthesis methods | 13f | Describe any sensitivity analyses conducted to assess robustness of the synthesized results. | NA
Reporting bias assessment | 14 | Describe any methods used to assess risk of bias due to missing results in a synthesis (arising from reporting biases). | NA
Certainty assessment | 15 | Describe any methods used to assess certainty (or confidence) in the body of evidence for an outcome. | NA

RESULTS
Study selection | 16a | Describe the results of the search and selection process, from the number of records identified in the search to the number of studies included in the review, ideally using a flow diagram. | 12-14
Study selection | 16b | Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded. | 13
Study characteristics | 17 | Cite each included study and present its characteristics. | 13-21
Risk of bias in studies | 18 | Present assessments of risk of bias for each included study. | NA
Results of individual studies | 19 | For all outcomes in each study, present: (a) summary statistics for each group (where appropriate) and (b) an effect estimate and its precision (e.g., confidence/credible interval), ideally using structured tables or plots. | NA
Results of syntheses | 20a | For each synthesis, briefly summarize the characteristics and risk of bias among contributing studies. | NA
Results of syntheses | 20b | Present results of all statistical syntheses conducted. If meta-analysis was done, present for each the summary estimate and its precision (e.g., confidence/credible interval) and measures of statistical heterogeneity. If comparing groups, describe the direction of the effect. | NA
Results of syntheses | 20c | Present results of all investigations of possible causes of heterogeneity among study results. | NA
Results of syntheses | 20d | Present results of all sensitivity analyses conducted to assess the robustness of the synthesized results. | NA
Reporting biases | 21 | Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed. | NA
Certainty of evidence | 22 | Present assessments of certainty (or confidence) in the body of evidence for each outcome assessed. | NA

DISCUSSION
Discussion | 23a | Provide a general interpretation of the results in the context of other evidence. | 21-27
Discussion | 23b | Discuss any limitations of the evidence included in the review. | NA
Discussion | 23c | Discuss any limitations of the review processes used. | 27
Discussion | 23d | Discuss implications of the results for practice, policy, and future research. | 29

OTHER INFORMATION
Registration and protocol | 24a | Provide registration information for the review, including register name and registration number, or state that the review was not registered. | 11
Registration and protocol | 24b | Indicate where the review protocol can be accessed, or state that a protocol was not prepared. | 11
Registration and protocol | 24c | Describe and explain any amendments to information provided at registration or in the protocol. | NA
Support | 25 | Describe sources of financial or nonfinancial support for the review, and the role of the funders or sponsors in the review. | 31
Competing interests | 26 | Declare any competing interests of review authors. | 30
Availability of data, code and other materials | 27 | Report which of the following are publicly available and where they can be found: template data collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review. | NA

Note. NA: not applicable.


References
1. Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century; National Academy Press: Washington, DC, 2001.
2. International Alliance of Patients’ Organizations. What is Patient-Centred Healthcare? A Review of Definitions and Principles; International Alliance of Patients’ Organizations: London, UK, 2007.
3. World Health Organization. WHO Global Strategy on People-Centred and Integrated Health Services: Interim Report; World Health Organization: Geneva, Switzerland, 2015.
4. Britten, N.; Ekman, I.; Naldemirci, Ö.; Javinger, M.; Hedman, H.; Wolf, A. Learning from Gothenburg model of person centred healthcare. BMJ 2020, 370, m2738; DOI:10.1136/bmj.m2738.
5. van Diepen, C.; Fors, A.; Ekman, I.; Hensing, G. Association between person-centred care and healthcare providers’ job satisfaction and work-related health: A scoping review. BMJ Open 2020, 10, e042658; DOI:10.1136/bmjopen-2020-042658.
6. Ekman, N.; Taft, C.; Moons, P.; Mäkitalo, Å.; Boström, E.; Fors, A. A state-of-the-art review of direct observation tools for assessing competency in person-centred care. Int. J. Nurs. Stud. 2020, 109, 103634; DOI:10.1016/j.ijnurstu.2020.103634.
7. American Geriatrics Society Expert Panel on Person-Centered Care. Person-centered care: A definition and essential elements. J. Am. Geriatr. Soc. 2016, 64, 15–18; DOI:10.1111/jgs.13866.
8. McCormack, B.; Borg, M.; Cardiff, S.; Dewing, J.; Jacobs, G.; Janes, N.; Karlsson, B.; McCance, T.; Mekki, T.E.; Porock, D.; et al. Person-centredness – the ‘state’ of the art. Int. Pract. Dev. J. 2015, 5, 1–15; DOI:10.19043/ipdj.5sp.003.
9. Wilberforce, M.; Challis, D.; Davies, L.; Kelly, M.P.; Roberts, C.; Loynes, N. Person-centredness in the care of older adults: A systematic review of questionnaire-based scales and their measurement properties. BMC Geriatr. 2016, 16, 63; DOI:10.1186/s12877-016-0229-y.
10. Rathert, C.; Wyrwich, M.D.; Boren, S.A. Patient-centered care and outcomes: A systematic review of the literature. Med. Care Res. Rev. 2013, 70, 351–379; DOI:10.1177/1077558712465774.
11. Sharma, T.; Bamford, M.; Dodman, D. Person-centred care: An overview of reviews. Contemp. Nurse 2016, 51, 107–120; DOI:10.1080/10376178.2016.1150192.
12. Ahmed, S.; Djurkovic, A.; Manalili, K.; Sahota, B.; Santana, M.J. A qualitative study on measuring patient-centered care: Perspectives from clinician-scientists and quality improvement experts. Health Sci. Rep. 2019, 2, e140; DOI:10.1002/hsr2.140.
13. Edvardsson, D.; Fetherstonhaugh, D.; Nay, R.; Gibson, S. Development and initial testing of the Person-centered Care Assessment Tool (P-CAT). Int. Psychogeriatr. 2010, 22, 101–108; DOI:10.1017/s1041610209990688.
14. Tamagawa, R.; Groff, S.; Anderson, J.; Champ, S.; Deiure, A.; Looyis, J.; Faris, P.; Watson, L. Effects of a provincial-wide implementation of screening for distress on healthcare professionals' confidence and understanding of person-centered care in oncology. J. Natl. Compr. Canc. Netw. 2016, 14, 1259–1266; DOI:10.6004/jnccn.2016.0135.
15. Innocenti, A.D.; Wijk, H.; Kullgren, A.; Alexiou, E. The influence of evidence-based design on staff perceptions of a supportive environment for person-centered care in forensic psychiatry. J. Forensic Nurs. 2020, 16, E23–E30; DOI:10.1097/JFN.0000000000000261.
16. Martínez, T.; Suárez-Álvarez, J.; Yanguas, J.; Muñiz, J. Spanish validation of the Person-centered Care Assessment Tool (P-CAT). Aging Ment. Health 2016, 20, 550–558; DOI:10.1080/13607863.2015.1023768.
17. Rokstad, A.M.M.; Engedal, K.; Edvardsson, D.; Selbaek, G. Psychometric evaluation of the Norwegian version of the person-centred care assessment tool. Int. J. Nurs. Pract. 2012, 18, 99–105; DOI:10.1111/j.1440-172x.2011.01998.x.
18. Sjögren, K.; Lindkvist, M.; Sandman, P.O.; Zingmark, K.; Edvardsson, D. Psychometric evaluation of the Swedish version of the person-centered care assessment tool (P-CAT). Int. Psychogeriatr. 2012, 24, 406–415; DOI:10.1017/s104161021100202x.
19. Selan, D.; Jakobsson, U.; Condelius, A. The Swedish P-CAT: Modification and exploration of psychometric properties of two different versions. Scand. J. Caring Sci. 2017, 31, 527–535; DOI:10.1111/scs.12366.
20. Zhong, X.B.; Lou, V.W.Q. Person-centered care in Chinese residential care facilities: A preliminary measure. Aging Ment. Health 2013, 17, 952–958; DOI:10.1080/13607863.2013.790925.
21. Le, C.; Ma, K.; Tang, P.; Edvardsson, D.; Behm, L.; Zhang, J.; Yang, J.; Fu, H.; Ahlström, G. Psychometric evaluation of the Chinese version of the person-centred care assessment tool. BMJ Open 2020, 10, e031580; DOI:10.1136/bmjopen-2019-031580.
22. Tak, Y.R.; Woo, H.Y.; You, S.Y.; Kim, J.H. Validity and reliability of the person-centered care assessment tool in long-term care facilities in Korea. J. Korean Acad. Nurs. 2015, 45, 412–419; DOI:10.4040/jkan.2015.45.3.412.
23. Brugnolli, A.; Debiasi, M.; Zenere, A.; Baggia, M. The person-centered care assessment tool in nursing homes: Psychometric evaluation of the Italian version. J. Nurs. Meas. 2020, 28, 555–563; DOI:10.1891/jnm-d-18-00090.
24. Bru-Luna, L.M.; Martí-Vilar, M.; Merino-Soto, C.; Livia, J. Reliability generalization study of the person-centered care assessment tool. Front. Psychol. 2021, 12, 712582; DOI:10.3389/fpsyg.2021.712582.
25. Hawkins, M.; Elsworth, G.R.; Nolte, S.; Osborne, R.H. Validity arguments for patient-reported outcomes: Justifying the intended interpretation and use of data. J. Patient Rep. Outcomes 2021, 5, 64; DOI:10.1186/s41687-021-00332-y.
26. Sireci, S.G. On the validity of useless tests. Assess. Educ. Princ. Policy Pract. 2016, 23, 226–235; DOI:10.1080/0969594x.2015.1072084.
27. Hawkins, M.; Elsworth, G.R.; Osborne, R.H. Questionnaire validation practice: A protocol for a systematic descriptive literature review of health literacy assessments. BMJ Open 2019, 9, e030753; DOI:10.1136/bmjopen-2019-030753.


28. Hawkins, M.; Elsworth, G.R.; Hoban, E.; Osborne, R.H. Questionnaire validation practice within a theoretical framework: A systematic descriptive literature review of health literacy assessments. BMJ Open 2020, 10, e035974; DOI:10.1136/bmjopen-2019-035974.
29. American Educational Research Association; American Psychological Association; National Council on Measurement in Education. Standards for Educational and Psychological Testing; American Educational Research Association: Washington, DC, 2014.
30. Hulin, C.L. A psychometric theory of evaluations of item and scale translations. J. Cross Cult. Psychol. 1987, 18, 115–142; DOI:10.1177/0022002187018002001.
31. Martínez, T.; Martínez-Loredo, V.; Cuesta, M.; Muñiz, J. Assessment of person-centered care in gerontology services: A new tool for healthcare professionals. Int. J. Clin. Health Psychol. 2020, 20, 62–70; DOI:10.1016/j.ijchp.2019.07.003.
32. Page, M.J.; McKenzie, J.; Bossuyt, P.; Boutron, I.; Hoffmann, T.; Mulrow, C.; Shamseer, L.; Tetzlaff, J.; Akl, E.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. Int. J. Surg. 2021, 88, 105906; DOI:10.1016/j.ijsu.2021.105906.
33. Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. FASEB J. 2008, 22, 338–342; DOI:10.1096/fj.07-9492lsf.
34. Grégoire, G.; Derderian, F.; Le Lorier, J. Selecting the language of the publications included in a meta-analysis: Is there a Tower of Babel bias? J. Clin. Epidemiol. 1995, 48, 159–163; DOI:10.1016/0895-4356(94)00098-b.
35. Molina, M. Aspectos metodológicos del metaanálisis (1) [Methodological aspects of meta-analysis (1)]. Pediatr. Aten. Primaria 2018, 20, 297–302.
36. Veritas Health Innovation. Covidence systematic review software. Available online: https://www.covidence.org/ (accessed on 28 February 2022).
37. Beaton, D.E.; Bombardier, C.; Guillemin, F.; Ferraz, M.B. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 2000, 25, 3186–3191; DOI:10.1097/00007632-200012150-00014.
38. Guillemin, F. Cross-cultural adaptation and validation of health status measures. Scand. J. Rheumatol. 1995, 24, 61–63; DOI:10.3109/03009749509099285.
39. Hambleton, R.K.; Merenda, P.F.; Spielberger, C.D. Adapting Educational and Psychological Tests for Cross-Cultural Assessment; Lawrence Erlbaum Associates: Mahwah, NJ, 2005.
40. Muñiz, J.; Elosua, P.; Hambleton, R.K. International Test Commission guidelines for test translation and adaptation. Psicothema 2013, 25, 151–157; DOI:10.7334/psicothema2013.24.
41. Rosengren, K.; Brannefors, P.; Carlstrom, E. Adoption of the concept of person-centred care into discourse in Europe: A systematic literature review. J. Health Organ. Manag. 2021, 35, 265–280; DOI:10.1108/jhom-01-2021-0008.
42. Alharbi, T.S.J.; Olsson, L.E.; Ekman, I.; Carlström, E. The impact of organizational culture on the outcome of hospital care: After the implementation of person-centred care. Scand. J. Public Health 2014, 42, 104–110; DOI:10.1177/1403494813500593.
43. Bensbih, S.; Souadka, A.; Diez, A.G.; Bouksour, O. Patient centered care: Focus on low and middle income countries and proposition of new conceptual model. J. Med. Surg. Res. 2020, 7, 755–763; DOI:10.46327/msrjg.1.000000000000168.
44. Stranz, A.; Sörensdotter, R. Interpretations of person-centered dementia care: Same rhetoric, different practices? A comparative study of nursing homes in England and Sweden. J. Aging Stud. 2016, 38, 70–80; DOI:10.1016/j.jaging.2016.05.001.
45. Zhou, L.M.; Xu, R.H.; Xu, Y.H.; Chang, J.H.; Wang, D. Inpatients’ perception of patient-centered care in Guangdong Province, China: A cross-sectional study. Inq. J. Health Care Organ. Provis. Financ. 2021, 58, 469580211059482; DOI:10.1177/00469580211059482.
46. Marsh, H.W.; Morin, A.J.S.; Parker, P.D.; Kaur, G. Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Annu. Rev. Clin. Psychol. 2014, 10, 85–110; DOI:10.1146/annurev-clinpsy-032813-153700.
47. Asparouhov, T.; Muthén, B. Exploratory structural equation modeling. Struct. Equ. Model. Multidiscip. J. 2009, 16, 397–438; DOI:10.1080/10705510903008204.
48. Cabedo-Peris, J.; Martí-Vilar, M.; Merino-Soto, C.; Ortiz-Morán, M. Basic empathy scale: A systematic review and reliability generalization meta-analysis. Healthcare 2022, 10, 29–62; DOI:10.3390/healthcare10010029.
49. Flora, D.B. Your coefficient alpha is probably wrong, but which coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Adv. Methods Pract. Psychol. Sci. 2020, 3, 484–501; DOI:10.1177/2515245920951747.
50. McNeish, D. Thanks coefficient alpha, we’ll take it from here. Psychol. Methods 2018, 23, 412–433; DOI:10.1037/met0000144.
51. Hayes, A.F.; Coutts, J.J. Use omega rather than Cronbach’s alpha for estimating reliability. But…. Commun. Methods Meas. 2020, 14, 1–24; DOI:10.1080/19312458.2020.1718629.
52. Cronbach, L.J. Coefficient alpha and the internal structure of tests. Psychometrika 1951, 16, 297–334; DOI:10.1007/bf02310555.
53. McDonald, R.P. Test Theory: A Unified Treatment; Erlbaum: Mahwah, NJ, 1999.
54. Polit, D.F. Getting serious about test–retest reliability: A critique of retest research and some recommendations. Qual. Life Res. 2014, 23, 1713–1720; DOI:10.1007/s11136-014-0632-9.
55. Ceylan, D.; Çizel, B.; Karakaş, H. Testing destination image scale invariance for intergroup comparison. Tour. Anal. 2020, 25, 239–251; DOI:10.3727/108354220x15758301241756.
56. Rönkkö, M.; Cho, E. An updated guideline for assessing discriminant validity. Organ. Res. Methods 2022, 25, 6–14; DOI:10.1177/1094428120968614.


57. Padilla, J.L.; Benitez, I. Validity evidence based on response processes. Psicothema 2014, 26, 136–144; DOI:10.7334/psicothema2013.259.
58. Hubley, A.M.; Zumbo, B.D. Response processes in the context of validity: Setting the stage. In Understanding and Investigating Response Processes in Validation Research; Zumbo, B., Hubley, A., Eds.; Springer: Cham, 2017; pp. 1–12.
59. Messick, S. Validity of performance assessments. In Technical Issues in Large-Scale Performance Assessment; Philips, G., Goldstein, A., Eds.; Department of Education, National Center for Education Statistics: Washington, DC, 1996; pp. 1–18.
60. Moss, P.A. The role of consequences in validity theory. Educ. Meas. Issues Pract. 1998, 17, 6–12; DOI:10.1111/j.1745-3992.1998.tb00826.x.
61. Cronbach, L. Five perspectives on validity argument. In Test Validity; Wainer, H., Ed.; Erlbaum: Hillsdale, NJ, 1988; pp. 3–17.
62. Birkle, C.; Pendlebury, D.A.; Schnell, J.; Adams, J. Web of Science as a data source for research on scientific and scholarly activity. Quant. Sci. Stud. 2020, 1, 363–376; DOI:10.1162/qss_a_00018.
63. Bramer, W.M.; Rethlefsen, M.L.; Kleijnen, J.; Franco, O.H. Optimal database combinations for literature searches in systematic reviews: A prospective exploratory study. Syst. Rev. 2017, 6, 245; DOI:10.1186/s13643-017-0644-y.
64. Web of Science Group. Editorial selection process. Available online: https://clarivate.com/webofsciencegroup/solutions/%20editorial-selection-process/ (accessed on 12 September 2022).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
