Abstract
Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance and be completely at random, or it may occur as part of the sampling design and depend upon other observed covariates. We provide an adaptation of S-PLUS code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.
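As a rough illustration of the general idea behind an influence-function variance (not the authors' S-PLUS code, and not the case-cohort estimator itself), the variance of an estimator can be approximated by summing the outer products of each subject's estimated influence contribution. The sketch below is in Python with NumPy; the function name `influence_variance` and the toy data are hypothetical, and the sample mean stands in for the Cox estimator only because its influence contributions have a simple closed form.

```python
import numpy as np


def influence_variance(influence):
    """Approximate an estimator's variance from per-subject influence values.

    `influence` holds one row per subject and one column per parameter.
    The result is the sum over subjects of the outer product of each row
    with itself (a hypothetical helper, written for illustration only).
    """
    d = np.asarray(influence, dtype=float)
    if d.ndim == 1:
        # A single parameter: treat the vector as an n-by-1 matrix.
        d = d[:, None]
    return d.T @ d


# Toy check with the sample mean, whose empirical influence
# contribution for subject i is (x_i - mean) / n.
x = np.array([2.0, 4.0, 4.0, 6.0, 9.0])
n = x.size
d_mean = (x - x.mean()) / n
var_mean = influence_variance(d_mean)[0, 0]
```

For the sample mean, this reproduces the familiar plug-in variance (the population-style variance of `x` divided by `n`). In a Cox model setting, the rows would instead be per-subject influence values for the regression coefficients, which this sketch does not compute.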
References
C. C. Abnet, C. B. Borkowf, Y. L. Qiao, P. S. Albert, E. Wang, A. H. Merrill, S. D. Mark, Z. W. Dong, P. R. Taylor, and S. M. Dawsey, “Sphingolipids as biomarkers of fumonisin exposure and risk of esophageal squamous cell carcinoma,” to appear in Cancer Epidemiology, Biomarkers and Prevention, 2001.
P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag: New York, NY, 1991.
W. E. Barlow, “Robust variance estimation for the case-cohort design,” Biometrics, vol 50 pp. 1064–1072, 1994.
O. Borgan, B. Langholz, S. O. Samuelsen, L. Goldstein and J. Pogoda, “Exposure stratified case-cohort designs,” Lifetime Data Analysis, vol 6 pp. 39–58, 2000.
O. Borgan, L. Goldstein and B. Langholz, “Methods for the analysis of sampled cohort data in the Cox proportional hazards model,” The Annals of Statistics, vol 23 pp. 1749–1778, 1995.
K. C. Cain and N. T. Lange, “Approximate case influence for the proportional hazards regression model in censored data,” Biometrics, vol 40 pp. 493–499, 1984.
Helicobacter and Cancer Collaborative Group, “Gastric cancer and Helicobacter pylori: a combined analysis of twelve case-control studies nested within prospective cohorts,” Gut, vol 49 pp. 347–353, 2001.
P. J. Huber, Robust Statistical Procedures, Society for Industrial and Applied Mathematics: Philadelphia, PA, 1977.
J. D. Kalbfleisch and J. F. Lawless, “Likelihood analysis of multi-state models for disease incidence and mortality,” Statistics in Medicine, vol 7 pp. 149–160, 1988.
S. Kim and V. De Gruttola, “Strategies for cohort sampling under the Cox proportional hazards model, application to an AIDS clinical trial,” Lifetime Data Analysis, vol 5 pp. 149–172, 1999.
P. J. Limburg, C. Q. Wang, S. D. Mark, Y. L. Qiao, G. I. Perez-Perez, M. J. Blaser, P. R. Taylor, Z. W. Dong, and S. M. Dawsey, “Helicobacter pylori seropositivity: Association with increased gastric cardia and noncardia cancer risks in Linxian, China,” Journal of the National Cancer Institute, vol 93 pp. 226–233, 2001.
D. Y. Lin and L. J. Wei, “The robust inference for the Cox proportional hazards model,” Journal of the American Statistical Association, vol 84 pp. 1074–1078, 1989.
D. Y. Lin and Z. Ying, “Cox regression with incomplete covariate measurements,” Journal of the American Statistical Association, vol 88 pp. 1341–1349, 1993.
S. D. Mark, Y. L. Qiao, S. M. Dawsey, H. Katki, E. W. Gunter, W. Yan-Ping, J. F. Fraumeni, W. J. Blot, Z. W. Dong, and P. R. Taylor, “Higher serum selenium is associated with lower esophageal and gastric cardia cancer rates,” Journal of the National Cancer Institute, vol 92 pp. 1753–1763, 2000.
H. Moller, E. Heseltine, and H. Vainio, “Working group report on schistosomes, liver flukes and Helicobacter pylori,” International Journal of Cancer, vol 60 pp. 587–589, 1994.
R. L. Prentice, “A case-cohort design for epidemiologic cohort studies and disease prevention trials,” Biometrika, vol 73 pp. 1–11, 1986.
M. G. Pugh, Inference in the Cox Proportional Hazards Model with Missing Covariate Data, thesis, Harvard School of Public Health: Boston, MA, 1993.
N. Reid and H. Crepeau, “Influence functions for proportional hazards regression,” Biometrika, vol 72 pp. 1–9, 1985.
J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Estimation of regression coefficients when some regressors are not always observed,” Journal of the American Statistical Association, vol 89 pp. 846–866, 1994.
S. G. Self and R. L. Prentice, “Asymptotic distribution theory and efficiency results for case-cohort studies,” The Annals of Statistics, vol 16 pp. 64–81, 1988.
T. M. Therneau and H. Li, “Computing the Cox Model for Case Cohort Designs,” Lifetime Data Analysis, vol 5 pp. 99–112, 1999.
Cite this article
Mark, S.D., Katki, H. Influence Function Based Variance Estimation and Missing Data Issues in Case-Cohort Studies. Lifetime Data Anal 7, 331–344 (2001). https://doi.org/10.1023/A:1012533130596
DOI: https://doi.org/10.1023/A:1012533130596