
Exposure assessment
Reconstructing past occupational exposures: how reliable are women's reports of their partner's occupation?
  1. Nara Tagiyeva1,
  2. Sean Semple2,
  3. Graham Devereux1,
  4. Andrea Sherriff3,
  5. John Henderson4,
  6. Peter Elias5,
  7. Jon G Ayres6
  1. 1Child Health, University of Aberdeen, Aberdeen, UK
  2. 2Environmental and Occupational Medicine, University of Aberdeen, Aberdeen, UK
  3. 3Dental Public Health, University of Glasgow, Dental School, Glasgow, UK
  4. 4Department of Community-Based Medicine, University of Bristol, Bristol, UK
  5. 5Institute for Employment Research, University of Warwick, Coventry, UK
  6. 6Institute of Occupational and Environmental Medicine, University of Birmingham, Birmingham, UK
  1. Correspondence to Dr Nara Tagiyeva, Child Health, Royal Aberdeen Children's Hospital, University of Aberdeen, Westburn Road, Aberdeen AB25 2ZP, UK; n.tagiyeva-milne{at}


Objectives Most of the evidence on agreement between self- and proxy-reported occupational data comes from interview-based studies. The authors aimed to examine agreement between women's reports of their partner's occupation and their partner's own description using questionnaire-based data collected as a part of the prospective, population-based Avon Longitudinal Study of Parents and Children.

Methods Information on present occupation was self-reported by women's partners and proxy-reported by women through questionnaires administered at 8 and 21 months after the birth of a child. Job titles were coded to the Standard Occupational Classification (SOC2000) using software developed by the University of Warwick (Computer-Assisted Structured Coding Tool). The accuracy of proxy-report was expressed as percentage agreement and kappa coefficients for four-, three- and two-digit SOC2000 codes obtained in automatic and semiautomatic (manually improved) coding modes. Data from 5996 couples at 8 months and 5232 couples at 21 months postnatally were included in the analyses.

Results The agreement between men's self-reported occupation and women's report of their partner's occupation in fully automatic coding mode at four-, three- and two-digit code level was 65%, 71% and 77% at 8 months and 68%, 73% and 76% at 21 months. The accuracy of agreement was slightly improved by semiautomatic coding of occupations: 73%/73%, 78%/77% and 83%/80% at 8/21 months respectively. While this suggests that women's description of their partners' occupation can be used as a valuable tool in epidemiological research where data from partners are not available, this study revealed no agreement between these young women and their partners at the two-digit level of SOC2000 coding in approximately one in five cases.

Conclusion Proxy reporting of occupation introduces a statistically significant degree of error in classification. The effects of occupational misclassification by proxy reporting in retrospective occupational epidemiological studies based on questionnaire data should be considered.

  • Proxy-respondents
  • occupation
  • questionnaires
  • retrospective exposure assessment



What this paper adds

  • Most published studies that have examined the validity of proxy-responses of occupational exposure are based on interviews.

  • Little is known about the quality of occupational data based on proxy-responses from subject-completed questionnaires.

  • The present study demonstrates that the reliability of women's reports of their partners' occupation is limited, even at a broad standard occupational classification level.

  • This raises a question of potential misclassification of exposure and appropriateness of using proxy-derived data for estimating risks in occupational epidemiological research.

Parents, family members, friends and colleagues are often used as proxy-respondents in studies, where index subjects are incapable, unwilling or unavailable to respond. One-third of all responses in the UK Labour Force Survey1 and almost half of responses in the Italian Labour Force Survey were provided by proxy-respondents.2 Accurate proxy-reporting of a patient's occupation is of crucial importance in medicolegal cases where a patient has died of an occupationally related condition, and the proxy-reporter is the only source of an occupational history for the patient.3

Research into health effects of exposure to hazards in the workplace is particularly reliant on proxy-responses if the occupationally related condition is malignant and/or rapidly fatal. A number of studies have examined the validity of proxy-responses of occupational exposure. Comparison of asbestos exposure using reports by the next of kin and an assessment by an occupational hygienist demonstrated that proxy-assessment by a relative, regardless of whether it was a spouse or other relative, had a higher agreement with the expert's assessment (κ 0.47, prevalence index (PI) 0.36, bias index (BI) 0.24) in cases than in controls (κ 0.19, PI 0.43, BI 0.35).4 Studies which compared proxy-reports of occupational exposure with self-reports found that the reliability of proxy-derived data declined with increased requirement for detail or increased recall time.5 6 The reliability of proxy-derived data also depends on the type of the data, the relationship of the proxy to the index subject, sex, age, ethnicity of proxies and other factors.5–8

To our knowledge, little is known about the quality of occupational data based on proxy-responses from questionnaires. The aim of this study was to evaluate the accuracy of women's descriptions of their partners' jobs by comparison with contemporaneously collected partners' self-reported data. The demonstration of how reliable women may be in reporting the partner's occupation is important for occupational medicolegal cases and occupational epidemiological research.


The Avon Longitudinal Study of Parents and Children (ALSPAC) is a population-based birth cohort of children born to 14 541 (about 85% of the eligible population) women recruited during pregnancy with estimated dates of delivery April 1991–December 1992 in a defined geographical region of England. Families of the 13 971 children surviving to 1 year were followed up by postal questionnaires. Full details on the study methodology have been published elsewhere.9

The questionnaires included a section about ‘Your partner’s present job or last main job: actual job, occupation, trade or profession' and were sent to women at 8 and 21 months postnatally. At the same time, the women received questionnaires which were addressed to their partners, and these contained a section that sought information about ‘Your present job or last main job: actual job, occupation, trade or profession.’ The responses to both questions were recorded as free text.

The resulting job descriptions, principally the job titles obtained in response to these questions about occupations, were coded into four-digit Standard Occupational Classification (SOC2000) codes10 using the Computer-Assisted Structured Coding Tool (CASCOT) developed by the University of Warwick.11 The SOC2000 classification system can provide coding at two-digit (eg, code 21: Science and technology professionals), three-digit (eg, code 212: Engineering professionals) and four-digit (eg, code 2121: Civil engineers) levels. The software generates a ‘certainty score’ for each job title it codes, where the certainty score represents the Bayesian probability that the computer-assigned four-digit code (automatic code) is that which would be assigned by expert human coders. Job titles, which were assigned automatic codes with a certainty score ≤50%, were coded manually to derive semiautomatic codes. The coder was blind to the automatic CASCOT codes. It has been shown that this coding strategy results in 91% agreement between semiautomatic codes and expert-assigned codes.12
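The hierarchical structure of SOC2000 and the certainty-score rule described above can be sketched in a few lines of Python. This is an illustration only, not CASCOT's interface: the `CodedJob` record, the function names and the example certainty scores are hypothetical, and the sketch assumes that three- and two-digit codes are obtained by truncating the four-digit code, as the civil engineer example (2121, 212, 21) suggests.

```python
from typing import NamedTuple


class CodedJob(NamedTuple):
    """Hypothetical record for one coded job description."""
    text: str         # free-text job title from the questionnaire
    soc4: str         # four-digit SOC2000 code assigned automatically
    certainty: float  # certainty score for the automatic code, in percent


# Threshold taken from the text: scores of 50% or less go to manual coding.
MANUAL_THRESHOLD = 50.0


def soc_at_level(soc4: str, digits: int) -> str:
    """Truncate a four-digit SOC2000 code to a broader two- or three-digit level."""
    if digits not in (2, 3, 4):
        raise ValueError("SOC2000 levels used here are 2, 3 or 4 digits")
    if len(soc4) != 4 or not soc4.isdigit():
        raise ValueError(f"not a four-digit code: {soc4!r}")
    return soc4[:digits]


def needs_manual_coding(job: CodedJob) -> bool:
    """Flag a job description whose automatic code has a low certainty score."""
    return job.certainty <= MANUAL_THRESHOLD


# Illustrative values only; the certainty scores are invented for the example.
confident = CodedJob("civil engineer", "2121", 87.0)
uncertain = CodedJob("works for the council", "2121", 32.0)
```

With these definitions, `soc_at_level("2121", 3)` gives `"212"` and `soc_at_level("2121", 2)` gives `"21"`, while only the second example record would be passed on for manual coding.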

The validity of women's reports of their partner's occupation was addressed by examining the degree of agreement between partner self-report (further referred to as ‘self-report’) and women proxy-reported information (further referred to as ‘proxy-report’) derived from questionnaires, which was analysed at four-, three- and two-digit SOC2000 levels. Agreement was expressed by the percentage of agreement and κ statistics that control for the proportion of agreement expected by chance alone. The guidelines for strength of agreement indicated with κ values are adapted from Landis and Koch.13
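For readers unfamiliar with the two agreement measures, the following Python sketch shows how percentage agreement and an unweighted Cohen's κ can be computed from paired occupational codes. The analyses in this study were run in SPSS; the function names and the four example pairs below are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(self_codes: Sequence[str], proxy_codes: Sequence[str]) -> float:
    """Share of pairs (in percent) where self- and proxy-report have the same code."""
    if len(self_codes) != len(proxy_codes) or not self_codes:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(s == p for s, p in zip(self_codes, proxy_codes))
    return 100.0 * matches / len(self_codes)


def cohens_kappa(self_codes: Sequence[str], proxy_codes: Sequence[str]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(self_codes)
    observed = percent_agreement(self_codes, proxy_codes) / 100.0
    self_counts = Counter(self_codes)
    proxy_counts = Counter(proxy_codes)
    # Chance agreement: product of the two marginal proportions, summed over codes.
    expected = sum(self_counts[c] * proxy_counts[c] for c in self_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Four invented couples coded at the two-digit level.
self_reports = ["21", "21", "53", "53"]
proxy_reports = ["21", "21", "53", "21"]
```

In this toy example three of the four pairs match, so the percentage agreement is 75%; chance agreement is 0.5, giving κ = 0.5.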

As in the ALSPAC questionnaires, the term ‘partner’ in this paper refers to a male partner of women enrolled in the study, regardless of the couple's marital status and the male's relationship to the child.

With the exception of self-reported age, demographic data relating to the women and their partners were provided by the women. The partners' socio-economic class, based on women's report of their partner's occupation at 32 weeks of pregnancy, was used to group families into higher socio-economic status (HSES; social class I, II and III non-manual) and lower socio-economic status (LSES; social class III manual, IV and V) groups. The χ2 test was applied to test the difference between HSES and LSES.

All analyses used SPSS v17.0.


There was general attrition in the ALSPAC sample (ALSPAC Study Team, accessed 15 Feb 2010). In addition, there was a loss of occupational data owing to incomplete items in returned questionnaires (table 1). Participation by partners was lower than by women at all stages of the study; however, for those participating, the partner's occupational data were more complete than the proxy-data.

Table 1

Attrition and sample selection in the Avon Longitudinal Study of Parents and Children database

There was no difference in general attrition rates by socio-economic status (SES), that is proportions of women and their partners in a given SES group remained the same over gestational and postnatal periods (for women in the HSES group: 80.1%, 80.0% and 80.0% at gestation, 8 and 21 months postnatal respectively; for partners in the HSES group: 56.1%, 55.9% and 56.0% at gestation, 8 and 21 months postnatal respectively). Availability of self- and proxy-reported occupational data from returned questionnaires reduced between 8 and 21 months in both HSES and LSES groups (p<0.001). Among partners who remained in the study at 8 or 21 months, those from HSES backgrounds were more likely to respond to the question about their occupation than those from LSES backgrounds (at 8 months: 59.3% vs 41.1%, p<0.001). Similarly, the proxy-response was higher in the HSES group compared with the LSES group (at 8 months: 82.7% vs 66.5%, p<0.001).

At 8 and 21 months, paired occupational descriptions from both the woman and her partner were available from 5996 and 5232 couples respectively. This represented 41.2% and 36.0% of the initial cohort of women and 53.5% and 51.0% of those who remained in the study at 8 and 21 months respectively. Only paired data were included in the analysis. The characteristics of proxy-respondents and self-respondents included in the analysis are presented in table 2.

Table 2

Demographic characteristics of woman–partner pairs

A total of 22 456 individual job descriptions were coded using CASCOT in this study. Manual coding was carried out for the total 3484 job descriptions that achieved a certainty score of ≤50%.

In total, 353 individual four-digit SOC2000 job codes were generated. Using the SOC2000 codes generated by CASCOT in automatic mode demonstrated that at the four-digit level, the agreement between proxy- and self-reports was 65.5% at 8 months and 68.4% at 21 months (table 3). Consequently, agreement at this level was not reached by 2067 (34.5%) and 1652 (31.6%) pairs at 8 and 21 months respectively. Of these, 1162 (56.2%) proxy-reports and 900 (43.5%) self-reports on occupation at 8 months and 647 (39.2%) proxy-reports and 775 (46.9%) self-reports on occupation at 21 months had an associated CASCOT certainty score ≤50%. These cases were then recoded manually, which provided a higher rate of agreement between proxy- and self-reports (table 3). Comparison between proxy- and self-reported occupation for both automatically and semiautomatically obtained codes carried out at a three-digit level revealed an increase in agreement in these broader classification groups. The agreement improved further at a two-digit or major occupational group level. This trend was similar for the reports at 8 and 21 months (table 3).
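The counts quoted above can be cross-checked with simple arithmetic. The short Python sketch below uses only figures reported in the text (numbers of pairs, discordant pairs and low-certainty reports); the variable names are ours, and the check is offered as a reading aid rather than as part of the original analysis.

```python
# Figures as reported in the text.
pairs = {"8 months": 5996, "21 months": 5232}
discordant = {"8 months": 2067, "21 months": 1652}
low_certainty = {
    "8 months": {"proxy": 1162, "self": 900},
    "21 months": {"proxy": 647, "self": 775},
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the text."""
    return round(100 * part / whole, 1)


# Each pair contributes two job descriptions (one self-report, one proxy-report).
total_descriptions = 2 * sum(pairs.values())

# Low-certainty reports among discordant pairs, summed over both time points.
total_manual = sum(sum(by_source.values()) for by_source in low_certainty.values())

# Share of pairs without four-digit agreement in automatic mode.
discordant_pct = {wave: pct(discordant[wave], pairs[wave]) for wave in pairs}
```

The totals reproduce the 22 456 coded job descriptions and the 3484 manually coded descriptions, and the discordant shares come out at 34.5% and 31.6%.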

Table 3

Agreement between proxy- and self-reports on partners' occupation

To investigate socio-economic influences on reporting bias, the analysis by socio-economic status was carried out with the 8-month data (table 4). A higher agreement rate between women and their partners was found among those with lower SES, and the difference between lower and higher SES was statistically significant across all levels of both automatic and semiautomatic modes.

Table 4

Agreement between proxy- and self-reports on partners' occupation by socio-economic status

Similar results were found when agreement was examined by home ownership status, another socio-economic domain (table S1 in online data supplement). Spouses who lived in council and housing association rented accommodation were significantly more likely to agree about occupation than those who lived in owned, mortgaged or privately rented homes.

Agreement at all levels and in either coding mode did not differ by the woman's educational level (table S2 in online data supplement), or by the woman's or partner's age (p>0.1) (data not shown).


The frequent use of proxy-responses in epidemiological research raises the issue of their accuracy and validity. While much has been published on the validity of proxy-reported occupational data obtained during interviews, little information is available concerning quality of questionnaire-based proxy-reported data and what is available has come mainly from exposure-assessment studies.5 14–19 The present population study investigated the agreement between proxy- and self-reports of occupation based on job titles from self-administered questionnaires.

A number of studies have demonstrated that inaccuracy in proxy-responses increases as more detailed job description and occupational history are required, and more time has elapsed since the job.6 Face-to-face interviews of wives have been shown to produce reliable information for general but not detailed questions about their husband's occupation.5 Other studies have shown that wives were more aware of their husbands' occupations (agreement 81%, κ 0.71, PI 0.038, BI 0.115) than about workplace exposures to particular hazards (agreement 58%, κ 0.18, PI 0.115, BI 0.192).20 In a multicentre case–control cancer study, regardless of the number of years in the job, proxy-respondents including wives were less accurate (although the sensitivity improved with ≥5 years in the job) in reporting exposure to occupational hazards than in reporting a job history, which was particularly accurate by wives, and did not vary by socio-economic factors.3 The overall agreement between proxy- and self-reports of working in specific industries found by others (κ=0.67) was similar to our four-digit automatic CASCOT codes, and markedly higher than agreement for reports on working with a specific material (κ=0.36).17

The present study examined agreement in the reports of occupations which were based on the question about a job title, which should ensure better concordance rates than reports of specific self-reported occupational exposures. Also, in the present study, recall bias from proxy-respondents is unlikely due to a low level of required detail and zero recall time (current or most recent job title). The fact that women's reports about their partner's job appear to be more accurate for more recent jobs has also been demonstrated in reproductive research and a study of migrant farm workers.18 21

Similar to other studies,5 18 our study included a relatively young population of women, who were unlikely to ‘forget’ their partners' occupation. On the other hand, the duration of the woman–partner relationship and/or the partner's length of employment in their current job may be short, and this may influence the ability of women to know exactly what their partner's job entails. However, our study found no effect of the proxy-respondent's age on the accuracy of proxy-reports.

As in the study by Coggon and colleagues,19 we found that differences in wording often accounted for a proportion of the disagreement between spouses in the description of occupations, and consequently for a higher percentage of agreement in broader classification groups. An additional manual check of women's and men's responses, which did not fall into the ‘agreed’ category, revealed that in some cases, respondents used different words to refer to the same occupation. Differences in the expertise of coders can also give rise to variations in coding agreement rates, especially where the job titles are ambiguous, usually due to failure to adhere to complex coding rules for the treatment of such information.22 These differences, however, were minimised in the present study, as coding of the data from both proxy and self-respondents was carried out by one person using a coding tool which rigorously applies complex coding rules where job titles are ambiguous. It was more surprising therefore to find this level of disagreement in the present study.

The nature of the relationship between the index and proxy-respondent has been found to be influential on the degree of agreement between their responses in some but not all studies.23 24 Given the present study design, where the mothers were recipients of all questionnaires, including those addressed to their partners, and also bearing in mind that questionnaires were delivered to the self- and proxy-respondents simultaneously, there is a high possibility of interaction between spouses and cross-checking of responses provided in each questionnaire. No measures were taken to avoid the information exchange and to ensure that the questionnaires were completed independently. This is a limitation of the study design which is not however unique to the present study: the ‘potential for correlated errors’ among couples has been discussed elsewhere.18 While discordant self- and proxy-reports, which were assigned automatic codes with a certainty score of ≤50%, were recoded manually, no attempt was made to recode concordant self- and proxy-reports with a certainty score ≤50% (<20% of reports). As a result, the concordance rates for semiautomatic codes presented in this study could be overestimated, which further strengthens our main message. Moreover, recent unpublished work and our experience in this study and a previous study12 demonstrate that agreement rates are generally raised by about five percentage points in semiautomatic coding mode compared with fully automatic mode, and that the disagreement rates we report here are well outside the potential ‘error band’ that could arise from errors inherent in the coding process.

Beyond the specifics of this study design, there is a general concern about the validity and reliability of data from self-administered questionnaires, which are more likely than data obtained during an interview to be affected by ambiguous or misinterpreted questions.25 In addition, more specific concerns related to occupational questionnaires, including misrecognition of exposure, overlapping categories, variability in interpretation of questions depending on occupation and difficulties in estimating time in work or on a particular task, have been recognised.26 The lower accuracy of proxy-derived occupational data from questionnaires versus interviews has been demonstrated in exposure agreement studies.27 28

In the present study, we found a significant difference in the accuracy of proxy-responses between higher and lower socio-economic classes with more accurate responses from spouses from lower socio-economic background. This is at variance with a study in reproductive research, which found that proxy-reported data were more accurate if obtained from spouses of patients in private hospitals (who had a higher level of education and were more likely to be White and working) compared with those in public healthcare (lower level of education and more likely to be Black or Hispanic and without a job) (κ coefficient 1.00 vs 0.20 for women's reporting their partner's work status), suggesting an influence of social class on knowledge about the partner's job.18 A study of paternal occupation and birth defects in approximately 6000 children also found an improvement in agreement between fathers' and mothers' responses about the fathers' jobs within 2 years before the child was born with increasing family income and fathers' and mothers' educational level.16 We can only speculate as to whether in our study couples from higher SES were more likely to complete questionnaires independently, whereas couples from a lower SES, being less confident about their partners' jobs or sharing more time, were more likely to complete questionnaires jointly. A more likely explanation for a higher concordance rate between spouses in lower SES groups is that job titles for such occupations are generally well established and unambiguous, hence a clearer description which produces more accurate and matching coding results, while higher SES occupations, being constantly evolving and newly emerging, prove to be more complicated and difficult to describe and code. Over the past 15 years, employment in social classes I, II and III has grown by 4 million in the UK, with no growth in aggregate in the remaining lower social groups.29

As is typical for a longitudinal study, there was loss to follow-up in ALSPAC, which led to a decrease in both general responses and responses to occupational questions. With regard to occupational data, those from higher socio-economic classes were more likely to be over-represented in the study population. This could further reduce the overall agreement between spouses.

As the present study examined the agreement between spouses in reporting occupation based on a recent or current job title, the proportion of couples who provided discordant occupational information in this population of young adults is surprisingly high. Even when this information was coded at the two-digit level, which is too broad to be useful for studying the effect of occupation, and in a semiautomatic mode, which takes into account differences in wording, one in every five couples still did not agree on the partner's occupation. On the other hand, a complex type of occupational question, which included actual job, occupation, trade and profession, could lead to a wide interpretation of the question and be another source of discrepancies in responses between a woman and her partner. Others found that while using SOC codes can be an inexpensive and convenient way of identifying work-related exposure in a large number of people, unless augmented with a task-specific checklist it can lead to a high degree of misclassification.30

Direct comparison of the results in the present study with published literature is constrained by the fact that most of the published evidence was obtained from interviews rather than self-completed questionnaires.21 31 32 In addition, only a few studies assessed agreement on jobs, while most of them compared occupational exposures.3 4 20 Also, most of the published evidence is derived from case–control studies, where the chance of misclassification varies between cases and controls, and this difference may lead to both non-differential and differential misclassification.24 33–35 In studies in which, similar to our study, information is sought about occupation rather than exposures, bias in the occupational data collected from proxy-respondents is likely to be random with regard to health outcomes and therefore non-differential.

The percentage agreement between proxy- and self-reports is the most commonly used and easiest way to measure their concordance, but it does not account for chance, which is accounted for by the κ statistics.6 Kappa values are very sensitive to the extreme levels of the prevalence of the observed condition or characteristic (such as exposure), and therefore a comparison of κ values between studies with different prevalence rates is difficult.23 This, however, does not apply to our study and other studies which investigate agreement between job titles or occupational codes in the general population.
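The prevalence sensitivity of κ mentioned above for binary exposure data can be illustrated with a small Python sketch. It assumes the commonly used definitions of the prevalence index, PI = |a − d|/n, and bias index, BI = |b − c|/n, for a 2×2 table with cells a (both report exposed), b and c (discordant) and d (both report unexposed); the studies cited earlier may have computed these indices slightly differently. The two example tables are invented.

```python
def agreement_indices(a: int, b: int, c: int, d: int) -> dict:
    """Agreement, kappa, prevalence index and bias index for a 2x2 table.

    a: both sources report exposure; d: both report no exposure;
    b and c: the two kinds of discordant pairs.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return {
        "agreement": observed,
        "kappa": kappa,
        "PI": abs(a - d) / n,  # assumed definition of the prevalence index
        "BI": abs(b - c) / n,  # assumed definition of the bias index
    }


# Two invented tables with identical observed agreement (80%).
balanced = agreement_indices(40, 10, 10, 40)  # exposure reported in half of pairs
skewed = agreement_indices(75, 10, 10, 5)     # exposure reported in most pairs
```

Both tables have 80% observed agreement, yet κ is 0.60 for the balanced table (PI 0) and about 0.22 for the skewed table (PI 0.70), which is why κ values from studies with different exposure prevalence are hard to compare.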

In summary, in agreement with other studies, we found a statistically significant probability of misreporting partners' occupation by women, even when a broader standard occupational classification level was applied. This raises questions on the reliability of proxy-derived data and its appropriateness for estimating risks in occupational epidemiological studies.


Supplementary materials

  • Web Only Data oem.2009.052506

  • Funding Asthma UK, Summit House, 70 Wilson Street, London EC2A 2DB.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.