PROMIS Physical Function measures

Measurement properties of the PROMIS Physical Function measures in patients with spinal disorders: A Systematic Review

1. Introduction

Musculoskeletal disorders are highly prevalent and cause a high burden of disability worldwide, accounting for 21.3% of the total Years Lived with Disability (YLD)1. According to the Global Burden of Disease Study 2010, the two conditions with the highest proportion of YLD in 2010 were spinal disorders (SD): low back pain (49.6%) and neck pain (20.1%)2.

Physical therapists use several interventions in patients with back pain, such as exercise therapy, spinal manipulation or massage, aiming to reduce pain and improve physical function (PF)3. To improve communication and patient management in patients with SD, patient-reported outcomes (PROs) are used by physical therapists and other caregivers in clinical practice4. For measuring PF, a wide variety of PROs have been developed for different measurement purposes, e.g. diagnostic, prognostic or evaluative. However, clinicians find it challenging to select the right PROs because of the limited evidence for measurement properties such as reliability, validity and responsiveness, and because of feasibility aspects5,6.

In 2004, the Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated by the National Institutes of Health (NIH). The goal of the PROMIS initiative is to improve the measurement quality and comparability of health outcome measures and to reduce the burden for respondents. This is realized by building and validating item banks for measuring specified symptoms and health status domains7,8. The PROMIS system consists of a collection of item banks. An item bank is a series of questions or items that all measure the same domain, e.g. pain interference or physical function, independent of disease9. The items in an item bank are all calibrated on the same scale, using Item Response Theory (IRT) modelling. In this way, more precise measurements can be obtained. IRT-based item banks also enable the use of Computerized Adaptive Testing (CAT)10. A CAT uses an algorithm that selects items from the item bank based on the person’s response to the previous question and the estimated latent trait or domain level for that person. The algorithm stops asking questions when a pre-defined precision is reached. In this way, the number of questions can be reduced to 4 to 7 items10.
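The adaptive logic described above can be illustrated with a minimal sketch. It is not the PROMIS algorithm: it assumes dichotomous Rasch items with made-up difficulties, a standard normal prior, expected a posteriori (EAP) scoring on a grid and maximum-information item selection. Real PROMIS items have several response categories, and all names and default values used here (run_cat, target_se, max_items, the example bank) are hypothetical.

```python
import math


def prob_endorse(theta, b):
    """Rasch model: probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at trait level theta."""
    p = prob_endorse(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, grid):
    """Posterior mean and SD of theta on a grid, standard normal prior.

    responses: list of (difficulty, answer) pairs with answer 1 or 0.
    """
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for b, answer in responses:
            p = prob_endorse(theta, b)
            w *= p if answer == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer_fn, target_se=0.5, max_items=12):
    """Ask items adaptively until the precision target or the item cap is hit."""
    grid = [x / 10.0 for x in range(-40, 41)]
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # prior mean and SD before any item is asked
    while remaining and len(responses) < max_items and se > target_se:
        # Select the unused item that is most informative at the current estimate.
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        responses.append((b, answer_fn(b)))
        theta, se = eap_estimate(responses, grid)
    return theta, se, len(responses)


if __name__ == "__main__":
    # Hypothetical bank: 25 item difficulties from -3 to 3.
    bank = [x / 4.0 for x in range(-12, 13)]
    # Hypothetical respondent who endorses every item easier than 1.0.
    theta, se, n = run_cat(bank, lambda b: 1 if b < 1.0 else 0)
    print(f"theta={theta:.2f}, SE={se:.2f}, items used={n}")
```

Dichotomous Rasch items carry little information each, so this sketch typically needs more items than the 4 to 7 reported for PROMIS CATs; it only shows the select, score and stop cycle.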

A preliminary PROMIS item bank for PF was developed using IRT and evaluated through CAT simulation; it showed improved measurement of PF and increased efficiency using CAT11. The PROMIS PF item bank was further developed, which resulted in a final bank consisting of 124 items covering central (i.e. spinal), upper extremity and lower extremity functions and activities of daily living12. The validity of the PF item bank was examined in diverse clinical samples13. Fixed-length short forms and CATs for the PROMIS PF item bank have been developed, and their measurement properties were examined in several clinical application studies.

However, the evidence on the measurement properties of the PROMIS PF item bank, short forms and CAT applications in clinical samples of patients with SD has not been reviewed systematically.

2. Aim

The aim of this study is to critically appraise the evidence on the measurement properties of the PROMIS physical function item bank, short forms and CAT applications in patients with spinal disorders.

3. Methods (700 words, currently ….)

3.1 Design

The study design is a systematic review of measurement properties. We performed the review according to the protocol for systematic reviews of measurement properties by C.B. Terwee (November 2011), downloaded from http://www.cosmin.nl/downloads.html. In accordance with this protocol, the review was reported following the PRISMA statement14.

3.2 Information sources

We searched the following electronic databases on February 24th, 2017: PubMed, EMBASE, CINAHL and PEDro. Additionally, a hand search was performed on the website www.nihpromis.com/science/PubsDomain/Physical_function.aspx, where the scientific publications on PROMIS measurements in the domain of PF are presented.

3.3 Search strategy

The search contained blocks of search terms related to the following aspects: (1) Construct of interest (physical functioning): no search terms were included; instead, studies on the construct physical functioning were selected by hand from the search results. (2) Target population (spinal disorders): no search terms were included; studies on the target population were also selected by hand from the search results. (3) Type of instrument (full item bank, short form or Computerized Adaptive Test using the PROMIS Physical Function item bank): a combination of the search terms “promis” and “patient reported outcomes measurement information system” was used, and studies on the PROMIS Physical Function item bank, short forms or CAT were selected by hand from the search results. (4) Measurement properties: in PubMed, a validated sensitive search filter for studies on measurement properties of measurement instruments was used15. Translated versions of this filter were used in EMBASE and CINAHL. The full search strategies can be found in Appendix 1.

3.4 Selection criteria

We used the following inclusion criteria: (1) the study population consisted of adult patients (18 years and older) with spinal disorders (back and neck pain), including radicular pain/hernia; (2) the main purpose of the study was the evaluation of measurement properties of the PROMIS PF item bank, PF short form or PF CAT in a clinical sample; (3) the article was an original research report; (4) the article was published in English, German or Dutch; (5) the article was a full-text article published in a peer-reviewed journal. No restrictions on the year of publication were applied.

Exclusion criteria were: (1) spinal cord injury, spine trauma/fractures, cancer, neurological disorders (e.g. multiple sclerosis), rheumatologic disorders, infection and pelvic floor pain; (2) studies on the initial development of the PF item bank (establishing face and content validity in a general population), studies on other constructs (e.g. pain or pain interference), studies on composite measurement instruments measuring several constructs, such as the PROMIS-29 and the NIH minimal dataset, and studies on the Upper Extremity and Lower Extremity CAT; (3) reviews, intervention studies, case reports, abstracts, editorials and dissertations.

3.5 Study selection

One of the reviewers (EJH) screened the search results by title and abstract to identify potentially relevant articles. For abstracts that fulfilled the inclusion criteria, full-text articles were retrieved. Reference lists of the retrieved articles were screened manually to identify additional relevant articles. After reading the full-text articles, a final decision on inclusion was made.

3.6 Data extraction

Data extraction was performed by two independent reviewers (EJH, CK) using a standardized extraction form based on the COSMIN checklist with 4-point scale16. The extracted data comprised general characteristics of the instruments, characteristics of the study populations, results on the measurement properties and evidence on the interpretability of the measures.

3.7 Measurement properties

The measurement properties that were assessed were internal consistency, structural validity, hypothesis testing for construct validity and responsiveness. We used the definitions of these measurement properties according to the COSMIN taxonomy17 (see Table 1).

3.8 Quality assessment of studies

Two reviewers (EJH, CK) independently assessed the methodological quality of the included studies using the COSMIN checklist 2.0 for PROMs (Mokkink, L. B., de Vet, H. C. W., Prinsen, C. A. C., Patrick, D. L., Alonso, J., Bouter, L. M., & Terwee, C. B. (2017). COSMIN risk of bias checklist for assessing the methodological quality of studies on the measurement properties of Patient-Reported Outcome Measures. Submitted). This is an updated version of the COSMIN checklist with 4-point scale16,18. The COSMIN checklist consists of boxes with quality criteria for each measurement property. Each item was rated “excellent”, “good”, “fair” or “poor”, and the overall rating per measurement property was determined using the “worst score counts” algorithm, i.e. the lowest item rating in a box determines the rating of that box. Disagreements between the two reviewers were resolved by consensus.
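The rule for combining item ratings into one rating per measurement property can be sketched as follows. The function name and the list-based input are assumptions made for this illustration; they are not part of the COSMIN checklist itself.

```python
# Ratings ordered from lowest to highest methodological quality.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the checklist items of one box."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The position in RATING_ORDER serves as the sort key, so min() picks
    # the worst rating; an unknown label raises ValueError.
    return min(item_ratings, key=RATING_ORDER.index)


if __name__ == "__main__":
    # Hypothetical item ratings for one box of one study.
    print(worst_score_counts(["excellent", "good", "fair", "excellent"]))
```

A single “fair” item therefore caps the rating of the whole box at “fair”, however many items were rated “excellent”.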

3.9 Quality assessment of measurement properties

The quality of the measurement properties of the instruments was determined using the criteria for measurement properties described by Prinsen et al.19 (modified from Terwee et al.20). Only the measurement properties that were assessed in this review are presented: internal consistency, construct validity and responsiveness. The possible ratings for a measurement property are “positive”, “indeterminate” or “negative” (see Table 2).
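As an illustration of how the criteria in Table 2 turn study results into a rating, the rule for hypothesis testing (at least 75% of the results in accordance with the hypotheses) could be sketched as below. The function name and the boolean input format are assumptions made for this example, and the plain hyphen stands in for the negative rating symbol.

```python
def rate_hypothesis_testing(results):
    """Rate hypothesis testing from a list of booleans.

    Each entry states whether one reported result agreed with its a priori
    hypothesis. Returns "+", "?" or "-".
    """
    if not results:
        # Nothing was reported that could be compared against hypotheses.
        return "?"
    share_confirmed = sum(results) / len(results)
    return "+" if share_confirmed >= 0.75 else "-"


if __name__ == "__main__":
    # Hypothetical study in which three of four results matched the hypotheses.
    print(rate_hypothesis_testing([True, True, True, False]))
```

The same threshold applies to responsiveness in Table 2, with changes in scores taking the place of cross-sectional results.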

3.10 Best evidence synthesis

A best evidence synthesis was performed by combining the quality ratings of the measurement properties with the methodological quality of the studies. The quality of evidence was determined as described in the consensus-based guideline by Prinsen et al.19 (see Table 3).

4. Results

4.1 Study selection/number of studies screened assessed/included

The search strategy resulted in 664 unique records. After screening on title and abstract, 20 records remained for full-text assessment. Reference checking did not reveal any additional articles. After full-text assessment, 15 articles were excluded, so five articles remained for inclusion in the review (see Figure 1).

4.2 Study characteristics

In the five included studies, three ways of assessing the PROMIS PF domain were evaluated for their measurement properties in patients with spinal disorders: the PROMIS PF item bank21, the PROMIS PF item bank administered by means of CAT13,22,23 and the PROMIS PF 10-item short form24. The mean age of the study populations varied from 53 to 63 years, and overall slightly more females than males were included. The disease characteristics of the populations were back and neck pain21, back or leg pain13,22, spine pain or disability23 and lumbar radicular pain (where patients were referred for lumbar transforaminal epidural steroid injection)24. All studies were conducted in academic clinical settings in the USA, and all measures were administered in English. The characteristics of the study populations are summarized in Table 4.

4.3 Methodological quality

Ratings for methodological quality of the studies are presented in table 5. The PROMIS PF item bank was assessed in one study where a Rasch model (IRT method) was applied21. In this study, internal consistency and structural validity were both rated “excellent”.

The PROMIS PF CAT was assessed in three studies13,22,23. In one study, a Rasch model was applied22; in this study, internal consistency, structural validity and hypothesis testing for construct validity were all rated “excellent”. In the two other studies on the PF CAT, no IRT methods were applied13,23. One study examined only hypothesis testing for construct validity, which was rated “excellent”23. The other study examined only responsiveness13, which was rated “fair” because of limited information on the comparator instruments and the lack of clearly stated a priori hypotheses.

The PROMIS PF 10-item short form was assessed in one study, in which no IRT methods were applied24. Hypothesis testing for construct validity was rated “fair” because the PF short form was compared with the (23-point) Roland Morris Disability Index (RMDI), the European Quality of Life scale 5D questionnaire (EQ-5D) and the Numeric Rating Scale for pain (NRS pain), not all of which we considered to measure (solely) the construct of physical function. Responsiveness was also rated “fair” because the magnitude of the expected change (before and after the intervention) was not formulated in the hypothesis.

4.4 Results of individual studies (measurement properties)

The measurement properties that were assessed in the included studies are presented in Table 6. None of the included studies assessed reliability in terms of test-retest, inter-rater or intra-rater reliability. Measurement error in terms of absolute measures was not assessed either. No studies that assessed content validity were included in our search. Furthermore, none of the included studies assessed cross-cultural validity or criterion validity.

Internal consistency. For the PROMIS PF item bank, internal consistency was rated positive in one study21, based on adequate evidence for unidimensionality and a person reliability of 0.99 and item reliability of 1.00. For the PROMIS PF CAT application, internal consistency was rated positive in one study22, based on evidence for unidimensionality and a demonstrated person reliability of 0.95 and item reliability of 0.95.

Structural validity was rated positive in one study on the PROMIS PF item bank, where the unexplained variance was 2.9%, indicating unidimensionality in measuring PF21. In one study on the PROMIS PF CAT, structural validity was also rated positive, with an unexplained variance in the residuals of the first dimension of 2.6%22.

Hypotheses testing (for construct validity) was rated positive for the PROMIS PF CAT in one study22 because correlations were found with the Short Form-36 Physical Function Domain (SF-36 PFD) (r = 0.81) and with the Oswestry Disability Index (ODI) (r = –0.81), in accordance with the a priori hypotheses. In another study23, which was also rated positive, a correlation was found with the ODI (r = 0.76–0.85) for back pain patients and a strong correlation with the Neck Disability Index (NDI) (r = 0.83–0.87) for neck pain patients. For the PROMIS PF 10-item short form, hypothesis testing for construct validity was rated indeterminate because a correlation of 0.7 was found with the Roland Morris Disability Index (RMDI), correlations of 0.5–0.6 with the European Quality of Life scale 5D questionnaire (EQ-5D), and lower correlations (0.35–0.50) with the Numerical Rating Scale for pain (NRS pain), which measures an unrelated construct24.

Responsiveness for the PROMIS PF CAT was rated indeterminate because only a comparison with a general health anchor was made and no correlation was calculated13. Responsiveness for the PROMIS PF 10-item short form was rated indeterminate because the changes were correlated with changes in the RMDI and the EQ-5D at 3 months but not at 6 months, which was not in accordance with the a priori hypotheses24.

4.5 Best Evidence Synthesis

A summary of ratings for methodological quality and measurement properties is presented in Table 7. In the best evidence synthesis, we combined the results from the studies on the PROMIS PF item bank, the PROMIS CAT and the PROMIS 10-item short form. The results from the best evidence synthesis are described per measurement property.

Internal consistency

Structural validity

Moderate evidence was found for a positive rating of structural validity, based on consistent findings in one study of good quality21 and one study of fair quality22.

Hypothesis testing

Moderate evidence was found for a positive rating of hypothesis testing, based on consistent findings in two studies of fair quality22,23 and one study of poor quality24.

Responsiveness

Low evidence was found for an indeterminate rating of responsiveness, based on one study of fair quality13. Very low evidence was found for a positive rating of responsiveness, based on one study of poor quality24.

5. Discussion (1200 words, no subheadings, but separate paragraphs)

5.1 Summary of evidence/statement of principal findings

– No studies found on (test-retest) reliability and measurement error, nor on cross-cultural validation

– Studies so far only in the USA, in English and in academic settings (no primary care, no physical therapy)

5.2 Strengths & limitations

Strengths:

– COSMIN protocol

– Search of 4 databases, no time limit

– 2 independent reviewers (search, title/abstract selection, full-text selection, quality appraisal)

–

Limitations:

– Possibly missed studies due to..?

– Limited number of studies found -> limited conclusions

– Content validity was not examined in this review

– Possible drawback of excluding composite measurement instruments containing the PF construct (PROMIS-29 and NIH minimal dataset)

–

In the best evidence synthesis, we combined the results from the studies on the PROMIS PF item bank, the PROMIS CAT and the PROMIS 10-item short form. All studies measured the same construct in comparable populations and settings. However, administering the item bank as a whole, as a CAT or as a short form is not the same, so combining them in one best evidence synthesis is open to debate.

5.3 Link with the literature on the subject, findings in relation to the findings of other studies/reviews

– No other reviews on PROMIS yet

– Review by Oude Voshaar on PF in RA

5.4 Relevance for clinical practice

PROMIS is a new form of PROMs with promising, improved measurement properties (at least in development studies). This enables better measurement in both research and clinical settings. The use of CAT applications in particular provides better precision, no missing items, low floor/ceiling effects and a low burden owing to the short administration time.

Underline the importance of research in clinical populations.

5.5 Recommendation for future research

The PF item bank is still under development (now version 2.0). Translations and validation studies for other countries are under way. The measurement properties of these versions still need to be examined in clinical samples, for PF in diverse populations. CAT applications are especially interesting, not only in academic settings but also in the primary care population (since physical therapists make frequent use of PROMs and the patient burden is also very high there).

It is important to pay attention to good descriptions of the IRT methods used, missing items and the appropriate comparison measurements (for construct validity and responsiveness). Recommendation: use the COSMIN V2.0 checklist and the guideline by Prinsen et al. when designing studies.

6. Conclusion (300 words)

Table 1 Overview of measurement properties and definitions

| Measurement property | Definition according to the COSMIN taxonomy17 |
| --- | --- |
| Internal consistency | The degree of interrelatedness among the items |
| Structural validity | The degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured |
| Hypotheses testing | The degree to which the scores of a measurement instrument are consistent with hypotheses based on the assumption that the measurement instrument validly measures the construct to be measured |
| Responsiveness | The ability of a measurement instrument to detect change over time in the construct to be measured |

Table 2 Criteria for measurement properties (based on Prinsen et al.19, modified from Terwee et al.20)

| Measurement property | Rating* | Criteria |
| --- | --- | --- |
| Internal consistency | + | At least limited evidence for unidimensionality or positive structural validity AND Cronbach’s alpha(s) ≥ 0.70 and ≤ 0.95 |
| | ? | Not all information for ‘+’ reported OR conflicting evidence for unidimensionality or structural validity OR evidence for lack of unidimensionality or negative structural validity |
| | – | Criteria for ‘+’ not met |
| Structural validity | + | Rasch/IRT: At least limited evidence for unidimensionality or positive structural validity AND no evidence for violation of local independence: Rasch: standardized item-person fit residuals between -2.5 and 2.5; OR IRT: residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3’s < 0.37 AND no evidence for violation of monotonicity: adequate looking graphs OR item scalability > 0.30 AND adequate model fit: Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > -2 and < 2; OR IRT: G2 > 0.01. Optional additional evidence: adequate targeting (Rasch: adequate person-item threshold distribution; IRT: adequate threshold range); no important DIF for relevant subject characteristics (such as age, gender, education), McFadden’s R2 < 0.02 |
| | ? | IRT: model fit not reported |
| | – | Criteria for ‘+’ not met |
| Hypothesis testing | + | At least 75% of the results are in accordance with the hypotheses |
| | ? | No correlations with instrument(s) measuring related construct(s) AND no differences between relevant groups reported |
| | – | Criteria for ‘+’ not met |
| Responsiveness | + | At least 75% of the results are in accordance with the hypotheses |
| | ? | No correlations with changes in instrument(s) measuring related construct(s) AND no differences between changes in relevant groups reported |
| | – | Criteria for ‘+’ not met |

DIF = differential item functioning, IRT = item response theory

*rating: + = positive rating, ? = indeterminate rating, – = negative rating

Table 3 Quality of evidence, based on Prinsen et al.19

| Quality rating | Criteria |
| --- | --- |
| High | Consistent findings in multiple studies of at least good quality OR in one study of excellent quality AND a total sample size of ≥ 100 patients |
| Moderate | Conflicting findings in multiple studies of at least good quality OR consistent findings in multiple studies of at least fair quality OR one study of good quality AND a sample size of ≥ 50 patients |
| Low | Conflicting findings in multiple studies of at least fair quality OR one study of fair quality AND a total sample size of ≥ 30 patients |
| Very low | Only studies of poor quality OR a total sample size of < 30 patients |
| Unknown | No studies |

Figure 1 Flowchart search and selection

Table 4 Characteristics of included study populations

| Instrument | Study | N | Mean age (years) | Gender (% female) | Disease characteristics | Setting | Country | Language | Sampling | Response rate (% missing) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PROMIS PF item bank | Hung et al.21 | 438 | | | Back and neck pain | University clinic | USA | English | Consecutive | 0.4 |
| PROMIS PF CAT | Brodke et al.22 | 1607 | | | Back or leg pain | University clinic | USA | English | Consecutive | |
| | Papuga et al.23 | 283 | | | Spine pain or disability | Academic hospital | USA | English | Consecutive | 0.4 |
| | Schalet et al.13 | 218 | Median age group 55-59 | | Back or leg pain | University spine center | USA | English | Convenience? | |
| PROMIS PF 10-item short form | Shahgholi et al.24 | 199 | | | Lumbar radicular pain | University spine center | USA | English | Consecutive | |

CAT = Computerized Adaptive Test, PF = Physical Function, PROMIS = Patient Reported Outcomes Measurement Information System

Table 5 Methodological quality of studies

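| Instrument | Study | IRT used | Reliability: internal consistency | Construct validity: structural validity | Construct validity: hypothesis testing | Responsiveness |
| --- | --- | --- | --- | --- | --- | --- |
| PROMIS PF item bank | Hung et al.21 | Yes | Excellent | Excellent | | |
| PROMIS PF CAT | Brodke et al.22 | Yes | Excellent | Excellent | Excellent | |
| | Papuga et al.23 | No | | | Excellent | |
| | Schalet et al.13 | No | | | | Fair |
| PROMIS PF 10-item short form | Shahgholi et al.24 | No | | | Fair | Fair |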

CAT = Computerized Adaptive Test, PF = Physical Function, PROMIS = Patient Reported Outcomes Measurement Information System

Table 6 Measurement properties of individual studies

| Instrument | Study | Reliability: internal consistency | Construct validity: structural validity | Construct validity: hypothesis testing | Responsiveness | Floor and ceiling effects |
| --- | --- | --- | --- | --- | --- | --- |
| PROMIS PF item bank | Hung et al.21 | Item reliability 1.00; person reliability 0.99 | Unidimensionality: 2.9% unexplained variance | | | Floor effect 0.2%; ceiling effect 1.7% |
| PROMIS PF CAT | Brodke et al.22 | Item reliability 0.99; person reliability 0.95 | Unidimensionality: 2.6% unexplained variance | Correlation SF-36 PFD: 0.81; correlation ODI: –0.81 | | Floor effect 3.86%; ceiling effect 0.81% |
| | Papuga et al.23 | | | Correlation ODI: 0.76–0.85 (back pain); correlation NDI: 0.83–0.87 (neck pain) | | |
| | Schalet et al.13 | | | | Compared to “general health anchor” but no correlation calculated | |
| PROMIS PF 10-item short form | Shahgholi et al.24 | | | Correlation RMDI: 0.7; correlation EQ-5D: 0.5–0.6; correlation NRS pain: 0.35–0.5 | Correlation with change in RMDI: baseline 0.5, 3 months 0.55, 6 months < 0.1. Correlation with change in EQ-5D: baseline 0.45, 3 months 0.5, 6 months –0.1. Correlation with change in NRS pain: baseline 0.51, 3 months 0.52, 6 months 0.35 | |

CAT = Computerized Adaptive Test, EQ-5D = European Quality of Life scale 5D questionnaire, NDI = Neck Disability Index, NRS pain = Numerical Rating Scale for pain, ODI = Oswestry Disability Index, PF = Physical Function, PROMIS = Patient Reported Outcomes Measurement Information System, RMDI = Roland Morris Disability Index, SF-36 PFD = Short Form-36 Physical Function Domain

Table 7 Summary of ratings for methodological quality and measurement properties

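| Instrument | Study | Reliability: internal consistency (M) | Reliability: internal consistency (Q) | Construct validity: structural validity (M) | Construct validity: structural validity (Q) | Construct validity: hypothesis testing (M) | Construct validity: hypothesis testing (Q) | Responsiveness (M) | Responsiveness (Q) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PROMIS PF item bank | Hung et al.21 | excellent | + | excellent | + | | | | |
| PROMIS PF CAT | Brodke et al.22 | excellent | + | excellent | + | excellent | + | | |
| | Papuga et al.23 | | | | | excellent | + | | |
| | Schalet et al.13 | | | | | | | fair | ? |
| PROMIS PF 10-item short form | Shahgholi et al.24 | | | | | fair | ? | fair | ? |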

M = methodological quality of the study: “excellent”, “good”, “fair” and “poor”. Q = Quality criteria for measurement property; + = positive rating, ? = indeterminate rating, – = negative rating
