Aim: To validate short term recall of mobile phone use within Interphone, an international collaborative case control study of tumours of the brain, acoustic nerve, and salivary glands related to mobile telephone use.
Methods: Mobile phone use of 672 volunteers in 11 countries was recorded by operators or through the use of software modified phones, and compared to use recalled six months later using the Interphone study questionnaire. Agreement between recalled and actual phone use was analysed using both categorical and continuous measures of number and duration of phone calls.
Results: Correlations between recalled and actual phone use were moderate to high (ranging from 0.5 to 0.8 across countries) and of the same order for number and duration of calls. The kappa statistic demonstrated fair to moderate agreement for both number and duration of calls (weighted kappa ranging from 0.20 to 0.60 across countries). On average, subjects underestimated the number of calls per month (geometric mean ratio of recalled to actual = 0.92, 95% CI 0.85 to 0.99), whereas duration of calls was overestimated (geometric mean ratio = 1.42, 95% CI 1.29 to 1.56). The ratio of recalled to actual use increased with level of use, showing underestimation in light users and overestimation in heavy users. There was substantial heterogeneity in this ratio between countries. Inter-individual variation was also large, and increased with level of use.
Conclusions: Volunteer subjects recalled their recent phone use with moderate systematic error and substantial random error. This large random error can be expected to reduce the power of the Interphone study to detect an increase in risk of brain, acoustic nerve, and parotid gland tumours with increasing mobile phone use, if one exists.
- CAPI, computer assisted personal interview
- SMP, software modified phone
- mobile phones
- validation studies
Widespread and increasing use of mobile phones over the past decade has raised concerns about their possible health effects. This has prompted a series of epidemiological studies, particularly focusing on the risk of brain tumours related to mobile phone use. Studies of brain cancer risk related to mobile phone use have used mainly the case control approach, as summarised in several recent reviews.1,2 Exposure assessments in most case control studies have relied on participants’ self reports of phone use, as it is usually impossible to obtain long term independent records of phone use. Very few studies have, however, attempted to validate recalled phone use.3
Interphone is an international collaborative case control study investigating whether mobile telephone use (and particularly radio frequency exposure from this use) is related to risk of tumours of the brain, acoustic nerve, and parotid gland.4,5 Thirteen countries worldwide are participating following a common protocol. Exposure assessment in Interphone is primarily based on a standardised personal interview and includes a full history of mobile phone use. The validity of this method of assessing exposure is important to the interpretation of Interphone results, which are now starting to be published.6,7,8,9,10 Validation studies, comparing recalled phone use with independently recorded data, have been carried out in most countries participating in Interphone.
Reports of two national components of the Interphone validation studies, in the northern UK and Germany, have recently been published and show only moderate correlation between recorded and self reported phone use (correlation coefficients between 0.5 and 0.6).11–13 The UK-North study, moreover, reported a high level of overreporting of both number and duration of calls (by factors of 1.7 and 2.8 on average, respectively) and a high level of variability in recall.11
Here, we report the combined findings of the Interphone validation studies in the 11 countries that have conducted the studies. The studies compare recalled phone use against phone use recorded by operators and/or software modified mobile phones.
Data were collected in 11 countries (table 1) following a common core protocol. In the UK-North, two separate studies were carried out, one using only software modified phones and one using only mobile phone operators’ records to collect data on individual phone use. The latter followed a somewhat different protocol.11
Selection of volunteers
In each country, between 40 and 100 volunteers were recruited to the study. In most countries (all but Australia and the UK-North operator study), volunteers were selected from colleagues and acquaintances of the investigators. It was not feasible to recruit random samples of mobile phone users in most countries because of the valuable equipment (that is, the software modified mobile phones) used in these studies. The UK-North operator study recruited volunteers through advertising in local newspapers, advertising to local council and university staff, and distribution of letters to areas profiled for their socioeconomic status.11 In Australia, subjects were chosen from controls who had already taken part in the main Interphone case control study and who had expressed enthusiasm to participate in further research. In each of the countries an effort was made to select volunteers to correspond broadly to the Interphone study population with respect to age and sex.
The studies were approved by the IARC Ethical Review Committee and by the relevant ethical committees of the participating countries.
Actual phone use
Data on actual phone use were collected from network operators in countries where this was possible (all except Sweden, Denmark, and New Zealand) and through the use of software modified phones (SMPs). Operators recorded date, time, and duration of each call made or received for periods of 3–8 months. The UK-North operator study included outgoing calls only.
SMPs are normal mobile phones whose software has been modified to record the date, starting time, and duration of each call, as well as information on power levels used for radio frequency exposure assessment purposes in Interphone. Volunteers used the SMPs with their own SIM cards, thus keeping their own phone numbers and subscriptions. The SMPs were used for a period of about one month by about 40 volunteers in each country. In New Zealand, 15 of the 35 volunteers who originally used the phones had to be excluded: seven because errors prevented matching them to their SMP data, two because their phones were stolen, and six because they moved overseas and could not be interviewed.
When both operator and SMP data were available for a subject, network operator data were used as the preferred measure of actual phone use as they were collected for longer periods of time; SMP data were used otherwise. Detailed comparisons of the two data sources in countries and time periods where both were available have shown near perfect agreement.
Actual phone use was calculated as the average number of calls per month (made and received), and the average total (per person) duration of calls per month (in minutes), from operator or SMP data.
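As a minimal sketch, assuming each operator or SMP log can be reduced to one record per call (made or received) carrying a date and a duration in seconds, the two monthly averages could be computed as below. The `CallRecord` type, the `monthly_use` function, and the average month length of 365.25/12 days are illustrative assumptions, not part of the study protocol.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: an average month of 365.25 / 12 days.
DAYS_PER_MONTH = 365.25 / 12


@dataclass
class CallRecord:
    """Hypothetical record: one call made or received, logged by an operator or SMP."""
    call_date: date
    duration_s: float  # call duration in seconds


def monthly_use(records, start, end):
    """Return (calls per month, total minutes per month) over the
    monitoring window from start to end, both dates inclusive."""
    months = ((end - start).days + 1) / DAYS_PER_MONTH
    in_window = [r for r in records if start <= r.call_date <= end]
    calls_per_month = len(in_window) / months
    minutes_per_month = sum(r.duration_s for r in in_window) / 60.0 / months
    return calls_per_month, minutes_per_month


if __name__ == "__main__":
    # Twelve 2-minute calls during 2023, plus one call outside the window.
    logs = [CallRecord(date(2023, m, 15), 120.0) for m in range(1, 13)]
    logs.append(CallRecord(date(2024, 1, 5), 600.0))
    print(monthly_use(logs, date(2023, 1, 1), date(2023, 12, 31)))
```

Calls outside the monitoring window are simply dropped, so the same function can be applied to operator logs covering 3–8 months or to SMP logs covering about one month.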
Recalled phone use
Data on recalled phone use were collected by use of the Interphone computer assisted personal interview (CAPI),5 which was administered at least six months and up to 12 months after the end of the monitoring period. The mobile phone section of the CAPI includes an “event history calendar”, structured according to factors that may change patterns of phone use. For each period of mobile phone use, more specific questions are asked concerning average duration and number of calls made or received. A shorter postal validation questionnaire was developed for use in countries that did not have the resources to interview participants in person (UK-North and Sweden). In the UK-North operator study, questionnaires were sent out directly after the end of the monitoring period.
Recalled phone use was calculated from questionnaire responses for the period overlapping the operators’ recordings or SMP use.
Analyses compared phone use reported by the volunteers in the questionnaires (recalled use) and phone use recorded by the operators or SMPs (actual use) for two measures of phone use: the number of calls made and received per month, and the total duration of call time per month (in minutes). Agreement between recalled and actual phone use according to these two measures was analysed on both a categorical and continuous scale.
Agreement between categories of number and duration of phone use was tested using the kappa statistic, which measures the amount of agreement between two measures beyond that expected by chance.14 Kappa statistics range from –1 (complete disagreement) to 1 (perfect agreement). A value of zero indicates the level of agreement expected by chance. Quintiles of the actual phone use distribution were used to categorise the data. Each country had data in at least four of the five quintiles, most in all five. Weighted kappa values and their 95% confidence intervals were calculated.
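The categorisation and kappa calculation described above can be sketched as follows. This is an illustration only: the text does not state which weights were used, so linear disagreement weights |i − j|/(k − 1) are an assumption here, as is the convention that a value equal to a cut point falls into the upper quintile. The function names are hypothetical.

```python
import bisect
import statistics


def quintile_categories(actual, recalled):
    """Assign both measures to quintiles of the *actual* use distribution (0 = lowest)."""
    cuts = statistics.quantiles(actual, n=5)  # four cut points

    def to_cat(value):
        return bisect.bisect_right(cuts, value)

    return [to_cat(v) for v in actual], [to_cat(v) for v in recalled]


def weighted_kappa(cat_a, cat_b, k=5):
    """Weighted kappa with linear disagreement weights (an assumed weighting scheme)."""
    n = len(cat_a)
    obs = [[0.0] * k for _ in range(k)]
    for i, j in zip(cat_a, cat_b):
        obs[i][j] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * obs[i][j]        # weighted observed disagreement
            expected += w * row[i] * col[j]  # weighted disagreement expected by chance
    if expected == 0:
        raise ValueError("kappa undefined: all subjects fall in a single category")
    return 1.0 - observed / expected
```

With this definition, perfect agreement gives 1, chance-level agreement gives 0, and a complete reversal of five equally filled categories gives −0.5 rather than −1, because linear weights only reach −1 in the extreme two-category case.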
On the continuous scale, all analyses used log-transformed data, as both number of calls and duration of calls have very skewed distributions. Pearson correlation coefficients were calculated between recalled and actual phone use. The level of agreement between the continuous measures of phone use was measured by calculating the ratio of recalled phone use to actual phone use. The mean of this ratio represents the average level of over- or underestimation, and its spread provides a measure of the variation between individuals and therefore of the random error in recall. For an individual, a ratio of one indicates complete agreement; a mean ratio of one indicates absence of systematic bias in recall. For presentation of results, the mean and confidence limits of the log-transformed data were exponentiated to the arithmetic scale and the geometric mean of the ratio was presented. Differences in the ratio of recalled to actual phone use between sexes, age groups, and other variables of interest were tested using analysis of variance, always adjusted for country. The graphical method of Bland and Altman was used to illustrate the mean ratio and its limits of agreement, and to assess the relation between the extent of over- or underestimation and the level of use.15,16 Following the Bland-Altman method, the ratio of the two measures (recalled to actual phone use) was plotted against their average on a log scale, and the 95% limits of agreement were calculated on the log scale as the mean log ratio plus or minus two standard deviations, then exponentiated. Linear regression was used to examine the relation between the extent of over- or underestimation (ratio of the two measures) and level of use (average of the two measures), adjusting for country. The residuals of this regression were regressed against the level of use to examine whether variance in the error increased with increasing use.
Analyses for all countries combined were carried out both with and without the UK-North operator study, as this study followed a somewhat different protocol. Additionally, analyses were carried out excluding the countries that did not use the CAPI questionnaire (UK-North and Sweden).
A total of 672 subjects with available operator or SMP data completed questionnaires (table 1). Of these, 663 subjects provided information on call duration (table 2). Figures 1A and 1B plot phone use reported in the questionnaire against actual phone use recorded by operators or SMPs, on the log scale, for the number and the duration of calls, respectively. Correlation coefficients for number of calls (table 1) ranged from 0.44 in the UK-North operator study to 0.79 in New Zealand, with a correlation of 0.69 for all countries combined. For duration of calls (table 2), correlations were similar, ranging from 0.50 in France to 0.81 in Australia, with an overall correlation of 0.69.
There was agreement between quintiles of recalled and actual phone use in 43% of subjects for the number of calls and 41% for duration (tables 3 and 4). Disagreement of two quintiles or more was found in 20% of subjects for number of calls and 21% for duration of calls. The weighted kappa statistics for quintiles of number of calls (table 1) ranged from 0.21 in New Zealand to 0.59 in Australia; for all countries combined the kappa was 0.50. For duration of calls (table 2) kappa values ranged from 0.27 in Germany to 0.63 in the UK operator study, with a value of 0.49 for all countries combined.
For number of calls, the average level of over- or underestimation, as measured by the ratio of recalled to actual number of calls, ranged across countries from underestimation by a factor of 2.4 (geometric mean ratio = 0.42) in New Zealand to overestimation by a factor of 1.6 (ratio = 1.61) in the UK-North operator study (table 1). Box plots (fig 2A) show the distribution of ratios in each country and demonstrate very wide variation between individuals. Differences between countries in the mean ratio were statistically significant (p<0.0001). For all countries combined, the ratio for number of calls was 0.92 (95% CI 0.85 to 0.99), indicating 9% underestimation on average. The mean ratio decreased slightly when restricting to countries that used CAPI (ratio = 0.86), or when excluding the UK-North operator study (ratio = 0.84). The 95% limits of agreement around the mean ratio of 0.92 ranged from 0.12 to 7.85 (table 1, fig 3A), indicating very wide variation between volunteers, from around eightfold underestimation to around eightfold overestimation. Table 1 gives the 95% limits of agreement for each country; the widest interval is seen in the UK-North operator study (0.12 to 21.82). Table 1 also shows the percentage of subjects with a ratio of recalled to actual phone use between 0.5 and 2, that is, subjects who under- or overestimated their phone use by less than a factor of 2. This percentage ranged from 42% in New Zealand to 73% in Denmark and was 57% overall.
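The translation between ratios and "factor of" statements used above, and the share of subjects within a factor of 2, can be sketched as follows. The function names are hypothetical, and the strict inequality at the 0.5 and 2 boundaries is an assumption, since the text does not say how ratios of exactly 0.5 or 2 were counted.

```python
def as_factor(ratio):
    """Express a ratio as ('under' or 'over', factor >= 1)."""
    return ("under", 1.0 / ratio) if ratio < 1 else ("over", ratio)


def share_within_factor(recalled, actual, factor=2.0):
    """Share of subjects whose recalled / actual ratio lies strictly between 1/factor and factor."""
    ratios = [r / a for r, a in zip(recalled, actual)]
    return sum(1 for q in ratios if 1.0 / factor < q < factor) / len(ratios)


# Figures from the text: New Zealand (0.42), UK-North operator study (1.61),
# and the pooled limits of agreement for number of calls (0.12 and 7.85).
for value in (0.42, 1.61, 0.12, 7.85):
    direction, f = as_factor(value)
    print(f"ratio {value}: {direction}estimation by a factor of {f:.1f}")
```

For example, a ratio of 0.42 corresponds to underestimation by a factor of 1/0.42 ≈ 2.4, and the lower limit of 0.12 to underestimation by a factor of about 8.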
Linear regression showed an increase in the ratio of recalled to actual number of calls as the average of the two increased (fig 3A, regression coefficient 0.23, p<0.0001). This regression equation predicts a ratio of 0.59 at the 10th percentile of use (15 calls/month) and a ratio of 1.13 at the 90th percentile (257 calls/month). The variance of the log ratio did not increase with level of use (p = 0.999). Because a constant spread on the log scale corresponds to errors proportional to the level of use, this implies that variance increased on the arithmetic scale. Regression analyses in each country separately demonstrated positive regression slopes, indicating increasing ratios with increasing numbers of calls, in all but two countries (UK-North operator study and Australia).
The ratio of recalled to actual duration of calls was larger on average than that for number of calls (table 2). It ranged from 0.56 in Norway (that is, underestimation by a factor of 1.8) to 2.47 in the UK-North operator study (that is, overestimation by a factor of 2.5). In most countries (all but Norway and New Zealand), subjects tended to overestimate their call duration. Box plots (fig 2B) demonstrate the distribution of ratios in each country. Differences between countries were highly significant (p<0.0001). For all countries combined, duration of calls was overestimated by 42% (ratio = 1.42, 95% CI 1.29 to 1.56). A ratio of 1.31 was found when restricting to countries that used CAPI, or when excluding the UK-North operator study. Individual variation was larger for duration of calls than for number of calls; the limits of agreement around the ratio ranged from 0.12 to 17.37 for all countries combined (table 2, fig 3B), and this interval narrowed only slightly when the UK-North operator study was excluded. Only 42% of subjects under- or overestimated their total duration of calls by less than a factor of 2 (table 2).
The ratio of recalled to actual duration of use increased with increasing duration of phone use (fig 3B, regression coefficient 0.26, p<0.0001). This regression predicts a ratio of 0.78 at the 10th percentile of duration (21 min/month) and a ratio of 2.00 at the 90th percentile (788 min/month). Again, variance in the (log) ratio did not increase with level of use (p = 0.999), meaning that variance on the arithmetic scale did increase. Similar to the analyses of number of calls, analyses in each country separately indicated increasing ratios with increasing duration of calls in all countries except the UK-North operator study and Australia.
In CAPI, duration of calls is reported either by giving the average duration of a call, or by estimating the total duration of calls per day, week, or month. Subjects who reported their phone use per call showed greater levels of overestimation of call duration (ratio = 1.47; 95% CI 1.31 to 1.64) than those who reported per day/week/month (ratio = 0.88; 95% CI 0.73 to 1.07). No important differences in under- or overestimation of number or duration of calls were found between different sex and age groups, or between short term (⩽1 year) and long term (>5 years) mobile phone users.
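The two CAPI reporting routes reach a monthly total in different ways: the per-call route multiplies two recalled quantities, whereas the per-period route rescales a single recalled total. The sketch below illustrates this under stated assumptions. The function names and the month length of 365.25/12 days are hypothetical and not taken from the CAPI's own conversion rules.

```python
# Assumption: average month length; the CAPI's own conversion rules are not given here.
DAYS_PER_MONTH = 365.25 / 12
PERIODS_PER_MONTH = {"day": DAYS_PER_MONTH, "week": DAYS_PER_MONTH / 7, "month": 1.0}


def minutes_per_month_from_call_average(calls_per_month, minutes_per_call):
    """Per-call route: two recalled quantities multiplied together."""
    return calls_per_month * minutes_per_call


def minutes_per_month_from_total(minutes, period):
    """Per day/week/month route: one recalled total, rescaled to a month."""
    return minutes * PERIODS_PER_MONTH[period]


print(minutes_per_month_from_call_average(100, 2))  # 200 minutes per month
print(minutes_per_month_from_total(70, "week"))     # about 304 minutes per month
```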
This study compares recalled mobile telephone use with data recorded by network operators or SMPs, which are assumed to be accurate, thereby providing information essential to the interpretation of studies of mobile phone use in which exposure assessments are based on recall. The findings give important indications of the level of systematic and random error associated with recall of phone use.
Correlations between recalled and actual phone use were moderate to high, ranging from 0.5 to 0.8 in the different countries. They are generally somewhat lower than those reported in a US study of recall compared to billing records (0.74).3 These correlations do not give information on the level of numerical agreement between the two measures.
Kappa statistics give a measure of agreement between categories of a variable. There are no absolute standards for interpreting kappa values.17 Values below 0.2 have been suggested to represent slight agreement, values of 0.2–0.4 fair agreement, values of 0.4–0.6 moderate agreement, and values of over 0.6 good agreement.18 The kappa values reported in this study (between 0.2 and 0.6 across countries) can thus be described as fair to moderate. It is common practice in epidemiological studies of mobile phone use to categorise continuous measures of phone use, such as cumulative number of calls and cumulative duration of calls, using quintiles (or other percentiles) of their distribution.6,7,8,9,10 This study shows that, with quintile categories, the exposure of up to 60% of subjects may be misclassified, although most of these (around 40% of all subjects) are misclassified only to an adjacent quintile.
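The misclassification breakdown above follows directly from the paired quintile categories. A minimal sketch (the function name is hypothetical):

```python
def misclassification_profile(cat_recalled, cat_actual):
    """Shares of subjects in the same, an adjacent, or a more distant category."""
    n = len(cat_actual)
    gaps = [abs(r - a) for r, a in zip(cat_recalled, cat_actual)]
    same = sum(g == 0 for g in gaps) / n
    adjacent = sum(g == 1 for g in gaps) / n
    distant = sum(g >= 2 for g in gaps) / n
    return same, adjacent, distant
```

Applied to the pooled results, 43% exact agreement and 20% disagreement by two or more quintiles for number of calls leave 37% misclassified to an adjacent quintile. For duration, 41% and 21% leave 38%.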
On a continuous scale, levels of under- and overreporting in recalled phone use found in this study show that, even though the systematic error in recalled phone use was relatively small on average, variation between subjects was very large and a substantial proportion of subjects considerably over- or underestimated their phone use. This indicates a large random error in recalled phone use. Random errors in recall of phone use can have an important impact on risk estimates based on them. Non-differential random errors usually bias risk estimates for dichotomous and continuous exposures and trend effects for ordered polytomous exposures towards the null (no effect), and increase their uncertainty, making it more likely that real associations are not detected.19 Such errors do not normally induce spurious associations if in fact no association exists between exposure and outcome. However, for comparisons of exposure categories within polytomous exposures, bias can be in either direction.19–22 Systematic under- or overestimation of phone use leads to a bias of risk estimates upwards or downwards: if all subjects overestimate, measured relative risk estimates will be lower than the true relative risk, and vice versa. Differential recall errors in cases and controls may also lead to a bias, the direction of which depends on the direction of the differences between cases and controls.
The errors estimated in the current validation studies have been used in simulations to assess their impact on tumour risk estimates. These simulations show that when random errors are large (of the level found in these validation studies) they have a large impact, biasing risk estimates for continuous exposure towards a null effect (Vrijheid et al, unpublished data). The simulations also show that random errors have a larger impact on the risk estimates than do systematic errors, even when relatively extreme systematic errors are modelled and when the systematic errors simulated differ between cases and controls.
The variation in recall error between countries requires some consideration. We find a wide range of effects: for both number of calls and duration of calls, the average error ranges from underestimation by a factor of about two in some countries to overestimation by a similar factor in others. The high levels of underestimation found in New Zealand and Norway are notable; there are no clear differences in the protocol that may explain them, although differences between convenience samples may affect comparability between countries. New Zealand and Norway have some of the lowest average levels of use, possibly explaining the higher levels of underestimation. The UK-North operator study shows a higher level of overestimation than the other studies. It used a somewhat different protocol: in particular, the study recruited subjects through advertisement, was limited to outgoing phone calls, and used a short postal questionnaire administered directly after the monitoring period.11 Other studies that used a postal questionnaire (Sweden and the UK-North SMP study) did not, however, show similar levels of overestimation. It is particularly notable that the two validation studies carried out in the same region of the UK, at different time periods and with different protocols, showed different levels of error. The main difference between the two studies is the method used to record actual phone use (operators v SMPs). Although these methods record the same data (date, time, and duration of each call), the use of the SMP may have influenced a subject’s use during the monitoring period. However, when subjects who reported a change of use in the monitoring period were excluded, the results did not change.
The UK-North operator study was also based on relatively small numbers of calls, as it was carried out in 2000, when phone use was lower than in more recent years.11 Nevertheless, when this study is excluded, the remaining countries combined still show a small underestimation of the number of calls and a larger overestimation of call duration (ratios of 0.84 and 1.31, respectively).
We find that the level of systematic error is related to the level of phone use. For both number and duration of calls, it appears that light users underestimated and heavy users overestimated their mobile phone use. This tendency was found in almost all countries. The random error in recall also increased with increasing use. These effects may have important implications for the exposure-response analysis in Interphone. In particular, it is expected that larger overestimation and random error in heavy users will lead to an underestimation of tumour risk in this group of users if these errors are non-differential (that is, not related to case or control status).
The results also suggest that the level of overestimation is related to the option chosen to report call duration (per call or per time period). This is not surprising for two reasons. Firstly, for those who reported duration per call, the total duration of calls was calculated from two values (number of calls and duration per call), both of which are subject to error. For those who reported duration per day, week, or month, only one value was estimated. Secondly, the questionnaire gave only two options for duration of calls: minutes or hours. Subjects who made many short calls of less than a minute might have overestimated their duration by having to answer in minutes per call.
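Both reasons can be made concrete with a short sketch. First, when the total duration is computed as number of calls times minutes per call, the multiplicative errors in the two inputs multiply. On the log scale they therefore add, and their variances add, plus a covariance term if the errors are correlated. Second, a whole-minute answer inflates the total for very short calls. The reporting rule below (each call counted as at least one whole minute) is a hypothetical illustration, not the questionnaire's actual coding, and the function names are also hypothetical.

```python
import math


def combined_log_sd(sd_log_calls, sd_log_minutes_per_call, rho=0.0):
    """SD of the log error in total duration = calls x minutes per call.

    rho is an assumed correlation between the two log errors.
    """
    return math.sqrt(sd_log_calls ** 2 + sd_log_minutes_per_call ** 2
                     + 2 * rho * sd_log_calls * sd_log_minutes_per_call)


def reported_minutes(call_seconds):
    """Hypothetical reporting rule: every call is counted as at least one whole minute."""
    return sum(max(1, math.ceil(s / 60)) for s in call_seconds)


short_calls = [30] * 100                       # 100 half-minute calls
actual_minutes = sum(short_calls) / 60         # 50 minutes
print(reported_minutes(short_calls) / actual_minutes)  # 2.0
```

In this example the subject recalls the number of calls correctly, yet the recalled total duration is twice the actual total.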
These validation studies were generally carried out with healthy volunteer subjects who may not have the same age, sex, and socioeconomic status as cases and controls in Interphone, and who may have been more motivated to give accurate accounts of their phone use. We also tested recall only over a six month period, whereas historical phone use extending years or decades into the past is an important component of exposure assessment in the Interphone study. These differences might suggest that recall will be less accurate among Interphone subjects, although longer term recall was found to be more accurate than short term recall in one study in France (Hours, personal communication). Volunteers in the validation studies were generally younger than Interphone cases and controls. Analyses by age, however, did not show appreciable differences in agreement between age groups. The validation studies in Australia used volunteers drawn from the Interphone control group. They had already completed CAPI once, but before the period of SMP use or operator recording. Their recall error was similar to that of all countries combined, and their correlation coefficient and kappa measure of agreement were among the highest of all countries. To address questions regarding the long term recall of Interphone cases and controls, billing records of Interphone subjects are also being collected from phone service providers. This is only possible in a few countries, but will provide important insights into the possibility of differential recall between brain tumour cases and healthy controls.
In summary, validation study subjects recalled relatively recent phone use with moderate systematic error and substantial random error. These large random errors are likely to lead to important reductions in the power of the Interphone study to detect an increase in risk if one exists. Results of this study, together with those of the simulation studies and of validation studies using billing records of Interphone subjects, will play an important role in interpretation of the results of the Interphone study and of similar studies in which exposure assessments are based on recall of mobile phone use.
On average, volunteers tend to slightly underestimate the number of mobile phone calls they make, but overestimate total duration of calls by about 40%.
There is large variation between individuals in the level of under- or overestimation of mobile phone use and variation increases with level of use.
The random errors quantified in this study are likely to reduce the power of the Interphone study to detect an increase in risk if one exists.
These findings will play an important role in the interpretation of results of studies of mobile phone use in which exposure assessments are based on recall.
The authors thank the volunteers and the network operators in all countries, as well as the SMP manufacturers, for their participation. Emilie Combalot and Monika Moisonnier assisted in the preparation of the data at IARC. Angus Cook (University of Western Australia) contributed to the design of the validation study protocol and questionnaire, and Cara Marshall coordinated the distribution of the SMPs in New Zealand.
The other members of the Interphone Study Group include: Canada: Professor Daniel Krewski, Dr Mary McBride, Dr Marie-Elise Parent, and Professor Jack Siemiatycki; Japan: Dr Naohito Yamaguchi; UK: Professor Anthony Swerdlow.
Funding sources: we acknowledge funding from the European Union Fifth Framework Program, “Quality of Life and Management of living Resources” (contract QLK4-CT-1999-01563), the International Union against Cancer (UICC), and national funding sources. The UICC received funds for this purpose from the Mobile Manufacturers’ Forum and GSM Association. Provision of funds to the Interphone study investigators via the UICC was governed by agreements that guaranteed Interphone’s complete scientific independence. These agreements are publicly available at http://www.iarc.fr/pageroot/UNITS/RCA4.html. Funding sources for the national validation studies included: Australia: National Health and Medical Research Council (EME Grant 219129), with Bruce Armstrong supported by a programme grant from the University of Sydney Medical Foundation; Finland: Emil Aaltonen Foundation, TEKES National Technology Agency, and Academy of Finland; France: French Association for Research on Cancer (ARC); Germany: Ministry for Environment, Agriculture and Consumer protection of North Rhine Westphalia; New Zealand: Health Research Council of New Zealand, New Zealand Cancer Society, Waikato Medical Research Foundation; UK: Department of Health, contract ref RRX51.
Competing interests: as stated above, the UICC received funding for Interphone from the Mobile Manufacturers’ Forum and the GSM Association, and Interphone’s complete scientific independence from these funders was guaranteed. The University of Leeds (authors: SJH, RCP, PAMcK) and the University of Lyon (author MH) received some financial support for the main Interphone study (not the validation study), from, respectively, the UK Network Operators (O2, Orange, T-Mobile, Vodafone, ‘3’) and French Network Operators (Orange, SFR, Bouygues) under legal signed contractual agreements which ensure complete independence for the scientific investigators. These funders had no involvement in the study design, data collection, statistical analysis or interpretation of the data, report writing, or decision to submit this paper. Individual authors have no competing interests to declare.