Reliability of Standard Health Assessment Instruments in a Large, Population-Based Cohort Study

https://doi.org/10.1016/j.annepidem.2006.12.002

Purpose

The Millennium Cohort Study began in 2001 using mail and Internet questionnaires to gather occupational and environmental exposure, behavioral risk factor, and health outcome data from a large, population-based US military cohort. Standardized instruments, including the Patient Health Questionnaire, the Medical Outcomes Study Short Form-36 for Veterans, and the Posttraumatic Stress Disorder (PTSD) Checklist–Civilian Version, have been validated in various populations. The purpose of this study was to investigate internal consistency of standardized instruments and concordance of responses in a test-retest setting.

Methods

Cronbach alpha coefficients were used to investigate the internal consistency of standardized instruments among 76,742 participants. Kappa statistics were calculated to measure stability of aggregated responses in a subgroup of 470 participants who voluntarily submitted an additional survey within 6 months of their original submission.
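
As a rough illustration of the two statistics named above, the following minimal sketch computes a Cronbach alpha coefficient from item-level scores and an unweighted Cohen kappa from paired test-retest responses. The function names, the input layout, and the choice of an unweighted kappa are assumptions made for illustration only; the source does not specify the software or the kappa variant that was used.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a scale.

    rows: one list per respondent, each holding k numeric item scores
    (hypothetical layout, assumed for this sketch).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose respondents-by-items into one column per item.
    columns = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in columns)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def cohen_kappa(first, second):
    """Unweighted Cohen kappa for paired categorical responses.

    first, second: equal-length lists holding the same participants'
    answers on the original and the repeated survey.
    """
    n = len(first)
    categories = set(first) | set(second)
    # Proportion of participants giving the same answer both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal proportions.
    expected = sum(
        (first.count(c) / n) * (second.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)
```

For example, `cronbach_alpha([[1, 2], [2, 3], [3, 4]])` scores a two-item scale for three respondents, and `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])` compares four participants' yes/no answers across two submissions. Alpha rises toward 1 as items vary together, and kappa equals 1 only when agreement is perfect.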

Results

High internal consistency was found for 14 of 16 health components, with lower internal consistency found among two alcohol components. Substantial test-retest stability was observed for stationary variables, while moderate stability was found for more dynamic variables that measured conditions with low prevalence.

Conclusions

These results substantiate internal consistency and stability of several standard health instruments applied to this large cohort. Such reliability analyses are vital to the integrity of long-term outcome studies.

Introduction

Standardized instruments are often used in survey research. Many of these instruments are devised in clinic settings where health assessment is completed by trained health care professionals. However, the prohibitive cost of clinician-administered assessment and the relative ease of self-report make participant-assessed outcome measures a more feasible approach to obtaining constructs describing functional and mental health outcomes. With these more convenient measures of health increasingly used as primary outcomes in epidemiologic studies, selecting an appropriate assessment tool involves careful review of the many standard survey instruments available. Special consideration of whether the instruments meet the requirements of the proposed application is critical to interpretation of collected data (1). Reliability and validity of these instruments are often tested thoroughly in the populations or settings in which the instrument was originally created (2, 3). However, many questionnaires incorporate standardized survey instruments in populations that may be different from those for which the instrument was intended. In these studies, it is important to establish a level of confidence in the information being ascertained prior to declaring the instrument appropriate for the targeted population.

The Millennium Cohort, the largest cohort study ever undertaken by the US Department of Defense, was launched in 2001 to gather health outcome information along with occupational and environmental exposures employing a longitudinal approach (4, 5). In the first panel of enrollment, more than 77,000 participants joined the 22-year-long study, filling out either a mailed survey or an identical Web-based survey. The Millennium Cohort Study questionnaire is composed of more than 60 multipart questions comprising more than 400 individual data points, including questions from standardized instruments such as the Medical Outcomes Study Short Form 36-item for Veterans (SF-36V) (6, 7), the Primary Care Evaluation of Mental Disorders (PRIME-MD) Patient Health Questionnaire (PHQ) (2, 8, 9), the Posttraumatic Stress Disorder (PTSD) Checklist–Civilian Version (PCL-C) (3, 10), and the CAGE questionnaire to assess problematic drinking behavior (11), as well as questions that target areas such as medical history, vaccinations, environmental exposures, and occupation. Although the concordance of test-retest responses and internal consistency of the standard instruments have been established (6, 7, 8, 9, 10), tests of reliability of these constructs have not been performed in a large, population-based cohort where multiple independent instruments are presented simultaneously. The purpose of this study, therefore, was to establish the reliability as measured by concordance in a test-retest setting and internal consistency of several standardized instruments in a large, population-based military cohort.

Study Population

The invited Millennium Cohort Study participants were randomly selected from all US military personnel serving in the Army, Navy, Coast Guard, Air Force, and Marine Corps as of October 1, 2000. The population-based sample represented approximately 11% of the 2.3 million men and women in service and oversampled those who had been previously deployed, US Reserve and National Guard personnel, and female service members to ensure sufficient power to detect differences in smaller

Results

Of the 77,047 Millennium Cohort Panel 1 participants, 76,742 (99.6%) had complete demographic and military characteristic data. This population included 73% men, 73% born between 1960 and 1979, 49% without any college experience, 63% married, 70% white non-Hispanic, 77% enlisted personnel, 57% active duty personnel, 48% Army, 20% working as functional support specialists, and 20% combat specialists (Table 2).

Levels of internal consistency among standardized survey scales, as measured by

Discussion

Standardized instruments are often employed to enhance the value of epidemiologic survey research. Diligence in establishing consistency and comparability to promote confidence in results will become increasingly important. While the use of established survey instruments may be an enticing addition in pursuit of quality health metrics, suboptimal performance in varying populations may be found instead. In this study, the internal consistency of well-known instruments (PHQ, SF-36V, CAGE,

References (32)

  • Weathers FW, Litz BT, Herman DS, Huska JA, Keane TM. The PTSD Checklist (PCL): reliability, validity, and diagnostic...
  • J.A. Ewing. Detecting alcoholism. The CAGE questionnaire. JAMA (1984)
  • J.R. Fann et al. Validity of the Patient Health Questionnaire-9 in assessing depression following traumatic brain injury. J Head Trauma Rehabil (2005)
  • A.J. Means-Christensen et al. An efficient method of identifying major depression and panic disorder in primary care. J Behav Med (2005)
  • D. Jones et al. Health status assessments using the Veterans SF-12 and SF-36: methods for evaluating outcomes in the Veterans Health Administration. J Ambul Care Manage (2001)
  • J.E. Ware et al. SF-36 Physical and Mental Health Summary Scales: A user's manual (1994)

Cited by (103)

    • Sexual health difficulties among service women: the influence of posttraumatic stress disorder

      2021, Journal of Affective Disorders
      Citation Excerpt :

      Mental disorders were assessed at Time 1. Probable PTSD was measured using the PTSD Checklist−Civilian Version (PCL-C), a validated instrument used to rate the severity of symptoms (Blanchard et al., 1996) that has demonstrated good internal consistency (Cronbach's α = 0.94) in this cohort (Smith et al., 2007). Based on criteria from the Diagnostic and Statistical Manual of Mental Disorders 4th edition (DSM-IV), probable PTSD was defined as reporting a moderate or higher level of at least one intrusion symptom, three avoidance symptoms, and two hyperarousal symptoms (Diagnostic and statistical manual of mental disorders 4th ed.

    • A community pharmacy-led intervention for opioid medication misuse: A small-scale randomized clinical trial

      2019, Drug and Alcohol Dependence
      Citation Excerpt :

      The two-item pain subscale asked about level of bodily pain and pain-related physical functioning and is scored on a 0–200 scale. We assessed depression using the Patient Health Questionnaire (PHQ) depression subscale, a valid mental health assessment with demonstrated reliability (Hides et al., 2007; Smith et al., 2007; Spitzer et al., 1999, 2000). This subscale is scored on a 5-point scale (0=none-minimal; 1=mild; 2=moderate, 3=moderately severe; 4=severe).

    Disclosure: This work represents Report 06-24, supported by the Department of Defense, under work unit No. 60002. The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of the Navy, Department of the Army, Department of the Air Force, Department of Defense, Department of Veterans Affairs, or the US Government. This research has been conducted in compliance with all applicable federal regulations governing the protection of human subjects in research (Protocol NHRC.2000.007).

    In addition to the authors, the Millennium Cohort Study Team includes Paul J. Amoroso, MD, MPH (Madigan Army Medical Center, Tacoma, WA); Edward J. Boyko, MD, MPH (Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Puget Sound Health Care System, Seattle, WA); Gary D. Gackstetter, PhD, DVM, MPH (Department of Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences, Bethesda, MD, and Analytic Services, Inc. [ANSER], Arlington, VA); Gregory C. Gray, MD, MPH (College of Public Health, University of Iowa, Iowa City, IA); Tomoko I. Hooper, MD, MPH (Department of Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences, Bethesda, MD); and James R. Riddle, DVM, MPH, and Timothy S. Wells, PhD, DVM, MPH (both from Air Force Research Laboratory, Wright Patterson AFB, OH).