Original article
Stability of measured and modelled spatial contrasts in NO2 over time
  1. Marloes Eeftens1,
  2. Rob Beelen1,
  3. Paul Fischer2,
  4. Bert Brunekreef1,3,
  5. Kees Meliefste1,
  6. Gerard Hoek1
  1. 1Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
  2. 2Center for Environmental Health Research, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
  3. 3Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
  1. Correspondence to Marloes Eeftens, Institute for Risk Assessment Sciences (IRAS), Utrecht University, PO Box 80178, 3508 TD Utrecht, The Netherlands; m.r.eeftens{at}


Objectives Land use regression (LUR) modelling is a popular method to estimate outdoor air pollution concentrations at the home and/or work addresses of individual subjects in epidemiological studies. Typically, such models are constructed using measurements from dedicated monitoring campaigns lasting up to 1 year. It is unknown to what extent such models can adequately predict concentrations in earlier or later time periods. We tested the stability of measured and modelled spatial contrasts in outdoor nitrogen dioxide (NO2) pollution across the Netherlands over 8 years.

Methods NO2 measurements were conducted at 40 locations in the Netherlands in 1999–2000. In 2007, NO2 was again measured at 144 locations, of which 35 were the same as in 1999–2000. This enabled us to compare measurements as well as model predictions between the two time periods.

Results NO2 measurements conducted in 2007 agreed well with NO2 measurements taken in 1999–2000 at the same locations (R2=0.86). LUR models from 1999–2000 and 2007 explained 85% and 86% of observed spatial variance, respectively. The 2007 LUR model explained 77% of spatial variability in the 1999–2000 measurements and the 1999–2000 model explained 81% of variability in the 2007 measurements.

Conclusion We found good agreement between measured spatial contrasts in outdoor NO2 in 1999–2000 and 2007. LUR models predicted spatial contrast 8 years in the past (2007 model) and 8 years in the future (1999–2000 model) well. This supports the use of LUR models in epidemiological studies with health data available for a later or earlier timepoint.

  • Land use regression
  • air pollution
  • long term
  • geographic information systems
  • traffic
  • nitrogen oxides
  • exposure assessment
  • exposure monitoring
  • retrospective exposure assessment
  • pollution

What this paper adds

  • Although many studies currently assume that concentration contrasts estimated by land use regression (LUR) models are valid over long periods, this assumption had not been verified.

  • The results of this study suggest that LUR models can be applied to estimate concentrations several years forwards or backwards in time, if land use change and trends in emissions are minimal, as is the case in many developed countries.

  • The discussion provides considerations and suggestions for authors who deal with the application of LUR models backwards or forwards in time in other study areas.


Ambient levels of outdoor air pollution still pose a risk to public health in the developed world.1 2 Many studies have evaluated exposure contrasts between and within communities, demonstrating that local sources such as busy roads are important contributors to air pollution. Increasingly, studies have shown that the spatial contrasts in exposure produced by such local sources are associated with adverse health outcomes.3–7 Within-community contrasts have been related to adverse health effects including premature mortality.8–10 Research has therefore recently focused on developing methods to estimate small-scale spatial contrasts.11 These methods include interpolation of monitoring data, dispersion modelling and, increasingly, land use regression (LUR) models.11 12

Especially in prospective cohort studies, there is interest in estimating long-term exposures. LUR models require monitoring data from a fairly large number of monitoring sites, typically 40–80.12 Because routine monitoring networks typically have a low spatial density, LUR models are often based on dedicated monitoring campaigns which measured concentrations over a single year or less. Typical purpose-designed monitoring campaigns are conducted for 1–4 week-long sampling periods spread over 1 year.12 Often, the health effects of interest have occurred in the past, or have developed over a long period of time, and hence there is a growing interest in estimating historical exposures using LUR models developed in a later time period. Similarly, there is a need for extrapolation forwards in time, for instance in cohort studies planning multiple follow-up measurements of health status.

Currently, individual exposure estimates often assume that the concentrations estimated from the LUR model are valid for long periods of time, but so far there has been little verification of this. Multiple studies showed that pollution contrasts remained constant over a short time span between different monitoring rounds which were part of the same campaign.13 14 Over a longer period of time, Beelen et al3 demonstrated that traffic intensities measured on municipal roads, provincial roads and national roads in the Netherlands were highly correlated between 1986 and 1996. However, traffic intensities are just one element in LUR models. There is, therefore, a need for empirical studies on the stability of spatial pollution contrasts over a longer period of time (eg, 10 years).

In connection with the TRAPCA study (Traffic Related Air Pollution and Children's Asthma), we earlier reported development of a LUR model in three areas of the Netherlands for particulate matter air pollution,4 which was subsequently applied in a large birth cohort study.15 16 A LUR model for nitrogen dioxide (NO2) in TRAPCA was reported as well.17 To evaluate the stability of the TRAPCA NO2 model, we revisited 35 of the original monitoring sites in 2007. The aim of this paper is to compare two LUR models developed for different years and based on different measuring campaigns of outdoor NO2 concentrations. We have looked at the ability of the 1999–2000 model to estimate 2007 concentrations, and at the ability of the 2007 model to estimate the outdoor NO2 pollution levels which were measured 8 years earlier.


For this study we used concentration data from two LUR studies which were carried out 8 years apart. For the TRAPCA study, PM10, PM2.5, PM2.5 reflectance and NO2 were measured between 1 March 1999 and 20 April 2000 at 40 sites in the Netherlands. The TRACHEA study (Traffic Related Air pollution and Children's respiratory HEalth and Allergies) measured NO2 and NOX at 144 sites in 2007, including 35 of the original TRAPCA sites. In this paper, we will focus only on the common component NO2, and the main analyses are limited to the 35 sites where measurements took place in both years. We used a common method for development of the LUR models. Online supplement 1 describes the development of the TRACHEA model using all 144 sites and contains an evaluation of the differences in models depending on number of sites, modelling strategy and GIS predictor data.

The TRAPCA 1999–2000 model

The study design and LUR model development for the TRAPCA study have been described previously.12 15 17 Briefly, monitoring sites were selected to cover the main regions of the PIAMA (Prevention and Incidence of Asthma and Mite Allergy) cohort study. NO2 was measured with Palmes tubes for four 2-week periods spread over a year. The final TRAPCA model for NO2 included five predictor variables: a class variable for region, high traffic roads within a 250 m buffer, medium traffic roads within a 1000 m buffer, and the number of households within a 300 m circular buffer and within a 300–5000 m donut buffer around the monitoring sites. This model explained 85% of the total variability in NO2 concentrations.17 We developed a new TRAPCA 1999–2000 model based on the 35 sites which were also sampled in the TRACHEA study. The same predictor variables were evaluated as in the original model: distance to nearest BASNET 1–7 type road, number of households in 300, 1000 and 5000 m buffers, population in 300, 1000 and 5000 m buffers, number of roads of BASNET categories 1–3 in 250 and 1000 m buffers, number of roads of BASNET categories 5–7 in 250 and 1000 m buffers, number of roads of BASNET categories 9–11 in 250 and 1000 m buffers and an indicator variable for region.4 17 BASNET is a Dutch road network in which six different road classes are distinguished, ranging from highways to small peripheral roads.

All 14 potential predictor variables were first evaluated individually. The one which gave the highest value for adjusted explained variance (R2) was included. A predictor was only included if it contributed positively to the concentration. Additional predictors were included if the adjusted R2 increased by at least 1% compared to the previous model. New variables were not included if their addition changed the direction of effect of previously included variables. We validated the model using leave-one-out cross validation.
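The selection procedure described above can be sketched in code. What follows is a minimal illustration in Python, not the software used in the study. The function names (`ols`, `forward_select`, `loocv_r2`) and the toy data are hypothetical. The 1% threshold is interpreted as 0.01 on the adjusted R2 scale and, for simplicity, is also applied to the first predictor. Every predictor is assumed to have a positive expected direction of effect, and the cross-validation R2 is assumed to be computed as 1 − SSE/SST.

```python
# Illustrative sketch only: forward selection on adjusted R^2 plus
# leave-one-out cross validation. All names and data are hypothetical.

def ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented normal equations (A'A | A'y), solved by Gauss-Jordan elimination.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def r_squared(y, yhat):
    mean = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - mean) ** 2 for a in y)
    return 1.0 - sse / sst

def adjusted_r2(X, y):
    """Return (adjusted R^2, coefficients) for a model using all columns of X."""
    n, p = len(y), len(X[0])
    beta = ols(X, y)
    r2 = r_squared(y, [predict(beta, x) for x in X])
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1), beta

def forward_select(X, y, min_gain=0.01):
    """Add the best predictor while adjusted R^2 rises by >= min_gain
    and every slope stays positive (the assumed a priori direction)."""
    chosen, best = [], 0.0
    while True:
        trials = []
        for j in range(len(X[0])):
            if j in chosen:
                continue
            cols = chosen + [j]
            adj, beta = adjusted_r2([[r[c] for c in cols] for r in X], y)
            if all(b > 0 for b in beta[1:]):
                trials.append((adj, j))
        if not trials or max(trials)[0] - best < min_gain:
            return chosen, best
        best, j = max(trials)
        chosen.append(j)

def loocv_r2(X, y, cols):
    """R^2 of predictions made for each site by a model fitted without it."""
    Xs = [[r[c] for c in cols] for r in X]
    preds = []
    for i in range(len(y)):
        beta = ols(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(beta, Xs[i]))
    return r_squared(y, preds)

# Hypothetical toy data: 8 sites, 3 candidate predictors; y depends on column 0 only.
X = [[1, 3, 2], [2, 1, 7], [3, 4, 1], [4, 1, 8],
     [5, 5, 2], [6, 9, 8], [7, 2, 1], [8, 6, 8]]
y = [10.0 + 2.0 * row[0] for row in X]
selected, adj_r2 = forward_select(X, y)
cv_r2 = loocv_r2(X, y, selected)
```

Requiring all slopes to be positive at each step covers both the sign rule for a new predictor and the rule that a new variable must not reverse the direction of variables already in the model. A real application would supply an expected sign per predictor instead.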

The TRACHEA 2007 study

Study design

The TRACHEA study was conducted in 2007. We chose to measure only NO2 and NOX for this study as these components are relatively easy to monitor with cheap, passive samplers. Prior to TRACHEA measurements, all TRAPCA locations were re-contacted and 35 out of 40 original sites were eventually re-sampled for TRACHEA. In addition, the sampling network was extended to a total of 144 sites, to improve the coverage of the areas of residence of a number of other Dutch cohort studies, and of the 8th year addresses of the PIAMA cohort, which had spread out since recruitment. In total, 26 sites were regional background locations, 78 were urban background sites and the remaining 40 were selected close to major roads (figure 1 in online supplement 2).

Nitrogen oxides were measured simultaneously at all sites for four 1-week periods, one in each season of 2007. The exact sampling dates were 17 January 2007–24 January 2007, 18 April 2007–25 April 2007, 13 June 2007–20 June 2007 and 26 September 2007–3 October 2007. NO2 and NOX concentrations were measured with Ogawa badges using procedures described previously.18 Briefly, the Ogawa badge is a passive sampler which collects nitrogen oxides through diffusion on a precoated filter. The filter is extracted in the laboratory and analysed spectrophotometrically for nitrite using a Saltzman reaction. Four lab blanks from each batch of 40 filters were kept in the laboratory. The batch-specific average lab blank result was subtracted from all measurements. In addition, field blanks and field duplicates were taken to document detection limit and precision. Furthermore, Ogawa sampler measurements were compared with measurements from the National Air Quality Monitoring Network at two regional, two urban and two street continuous monitoring sites.
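The blank correction described above amounts to subtracting a per-batch mean. A minimal sketch, assuming raw results and lab blanks are expressed in the same units; the function name and the numbers are hypothetical and are not study data.

```python
def blank_corrected(raw_results, lab_blanks):
    """Subtract the batch-specific mean lab blank from each raw result."""
    blank_mean = sum(lab_blanks) / len(lab_blanks)
    return [value - blank_mean for value in raw_results]

# Hypothetical batch: four lab blanks and two exposed filters.
batch_blanks = [0.4, 0.6, 0.5, 0.5]
batch_raw = [20.5, 31.0]
corrected = blank_corrected(batch_raw, batch_blanks)
```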

For the development of the exposure model, each sampling location was characterised by a single pollution level, the four-period mean. Seventeen samplers (3% of the total) were lost, so four-period means were calculated per site after multiple imputation with the MI procedure (PROC MI) in SAS 9.1 (SAS Institute), imputing 10 times and stratifying by site type. The MI procedure takes account of both temporal variability between rounds and variability between sites. The Pearson correlation between crude four-period means and means after multiple imputation was above 0.99.

Predictor variables

All measurement locations were successfully geocoded using the Address Coordinates Netherlands database from the year 2000.19 In total, 33 predictor variables were evaluated to explain the variability in concentrations. We calculated the number of inhabitants and home addresses within 300 m, 1000 m and 5000 m buffers of the site. Six different land use variables were calculated in the same buffers: land use for low-density residential purposes, industry, ports, urban green, forests and agriculture (table 1 in online supplement 3). As local sources, we evaluated total average traffic intensity on the nearest road, nearest major road (≥10 000 motor vehicles (mvh) per 24 h) and nearest motorway, the distance to these three types of roads, and the total average traffic density of all roads within a 100 m or 250 m circular buffer around the site (table 1 in online supplement 3). As was previously done in TRAPCA, an indicator variable for region (North, Middle, West) was included to account for regional variation.

We selected the buffer sizes for land use, population and households to reflect known distances of pollutant dispersion,12 taking into account the resolution of available maps. The same buffer sizes were also used in previous studies.3 20 Both home address density and population density maps were available for 1999. Land use maps on a 100 m×100 m raster were available from Corine (COordination and INformation on the Environment programme, initiated by the European Commission) for the year 2000.21 All land use variables are given in fractions of the total buffer, so if a site is 100% surrounded by residential land within a 300 m buffer, the fraction of residential land in this buffer is 1. Since the original dataset recognises 44 different land use categories, we aggregated them into 10 categories considered relevant for air pollution, as was done in the EU APMoSPHERE project (Air Pollution Modelling for Support to Policy on Health and Environmental Risk in Europe).20 Ultimately, we evaluated six aggregated categories to build the model. We did not use the high-density residential category, as this did not occur in the Netherlands. We left out the transport category because we had more accurate road network data, with linked traffic intensities. We did not evaluate the airport variable as there was only one site within 5000 m of an airport. The ‘total built-up’ category was a combination of the categories low-density residential, industry, port and airport, so we evaluated these separately to avoid overlap.

A digital road network with linked estimated traffic intensity data (mvh/24 h) was available with national coverage. This allowed us to calculate the distance and traffic intensity on the nearest road, nearest national road and nearest major road (≥10 000 mvh/24 h) for each site. The 100 m and 250 m buffer variables were derived by multiplying the length of each road segment (in metres) within the buffer by the traffic intensity of that segment, hence the unit mvh/24 h m. The same buffer sizes of 100 and 250 m were used by Beelen et al in 2007.3
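The buffer traffic variable described above is a sum of length times intensity over road segments. A minimal sketch, assuming the segments have already been clipped to the buffer in a GIS; the function name and segment values are hypothetical.

```python
def traffic_load(segments):
    """Sum of (length inside buffer, m) x (traffic intensity, mvh/24 h).

    The result has the unit mvh/24 h m.
    """
    return sum(length_m * intensity for length_m, intensity in segments)

# Hypothetical segments inside a 100 m buffer: (length in m, mvh/24 h).
segments_100m = [(120.0, 15000), (80.0, 2500)]
load_100m = traffic_load(segments_100m)  # 120*15000 + 80*2500 = 2 000 000
```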

At 26 of the 144 original TRACHEA locations and at four of the 35 sites measured in both studies, the ‘nearest road’ in the GIS road network was between 200 and 3220 m from the site, with an average distance of 706 m. Although some sites were purposely selected at background locations away from roadsides, none of the sites were actually further than 200 m from the nearest road. The unrealistic distances to the nearest road found in GIS are a result of the incomplete representation of minor roads in the network: the relatively minor roads which were actually closest to the site were not included. We assumed that at distances of 200 m and more, the influence of roads on the NO2 concentration is likely to be small. To properly evaluate the predictive power of the variable ‘traffic intensity on the nearest road’ in LUR models, we set the traffic intensity of the nearest road to 1000 mvh/24 h (which was the lowest number otherwise reported) for those sites which were further than 200 m from the nearest road. For the buffer calculations, we used the original data.

Land-use regression modelling

All 33 potential TRACHEA predictor variables were first evaluated individually, using the same stepwise inclusion criteria as used in the TRAPCA model. New variables were only included if the sign of the parameter estimate had the a priori specified direction, for example, positive for industry and ports, negative for urban green.

Comparing two studies and two models

As mentioned previously, 35 of the 144 TRACHEA measurements were made in the exact same locations as in TRAPCA, enabling us to directly compare measurements taken in 1999–2000 for TRAPCA to those taken in 2007 for TRACHEA. The predictions of the TRAPCA model could also be compared to the concentrations measured in 2007. Similarly, we were able to assess the ability of the TRACHEA model to predict the concentrations measured in TRAPCA in 1999–2000. We also compared the predictions of the two models.

We further compared measured and modelled data using the original TRAPCA model based on 40 sites. We also performed a similar comparison using TRACHEA modelled data based on all 144 sites and using a different modelling strategy (online supplement 1).


TRAPCA 1999–2000

The new TRAPCA model based on 35 sites included four of the same five variables included in the original model: address density in 0–300 m and 300–5000 m buffers, roads of BASNET categories 1–3 in a 250 m buffer, and an indicator variable for region. The model explained 84.6% of the variance with an RMSE of 4.21, while in leave-one-out cross validation, 67.5% of the variance was explained (table 1). NO2 model estimates ranged between 14.21 and 50.59 μg/m3.

Table 1

Land use regression model for NO2 in TRAPCA 1999–2000, based on 35 sites, used in both TRAPCA 1999–2000 and TRACHEA 2007


Substantial variability in NO2 concentration was found between the monitoring sites (figure 1 in online supplement 4). The within-site Pearson correlation (r) between the concentrations measured in the four monitoring rounds was above 0.88, documenting the stability of spatial contrasts over a year (table 1 in online supplement 4). Ogawa NO2 measurements agreed well with the chemiluminescence measurements from the national monitoring network (figure 1 in online supplement 5).

The TRACHEA model included three variables: population in a 5000 m buffer, traffic intensity on the nearest road and an indicator variable for region. Predictor variables included in the TRACHEA 2007 regression model are listed in table 2. NO2 estimates ranged from 13.78 μg/m3 to 51.18 μg/m3 and the model explained 85.7% of the variance in NO2 concentrations with an RMSE of 4.21 μg/m3. Leave-one-out cross validation of the model, using 34 locations to predict the concentration at the one omitted, resulted in an R2 of 80.0%.

Table 2

Land use regression model for NO2 in TRACHEA 2007, based on 35 sites, used in both TRAPCA 1999–2000 and TRACHEA 2007

Comparison between the TRAPCA and TRACHEA studies

The agreement between the measurements taken in 1999–2000 and those taken in 2007 was high (R2=0.86) (figure 1A). The figure shows that despite the differences in measurement method and averaging times, concentrations not only showed a high R2 but were also very similar in both measurement periods. When revisiting the sites, we found that traffic reduction measures had been put in place at two street locations used in TRAPCA. Both streets had become one-way streets with lower traffic intensities. Both locations indeed showed a reduction in pollution level since 1999–2000. Excluding those two locations slightly increased the R2 value to 0.89.

Figure 1

(A) Comparison between measured concentrations of NO2 in TRAPCA 1999–2000 and TRACHEA 2007 (n=35, R2=0.86). (B) TRAPCA 1999–2000 model predictions of NO2 compared to TRACHEA 2007 measurements (n=35, R2=0.81). (C) TRACHEA 2007 model predictions of NO2 compared to TRAPCA 1999–2000 measurements (n=35, R2=0.77). (D) Comparison between predicted concentrations of NO2 in TRAPCA 1999–2000 and TRACHEA 2007 (n=35, R2=0.89).

To assess the validity of extrapolating LUR models backwards or forwards in time, we compared measured and predicted values for TRAPCA 1999–2000 and TRACHEA 2007. The TRAPCA model predicted the measurements of TRACHEA 2007 very well, forwards in time (R2=0.81) (figure 1B). Model predictions from the TRACHEA 2007 study also compared well to the TRAPCA 1999–2000 measurements, backwards in time, with an explained variance of 77% (figure 1C). Similarly, model predictions from TRAPCA 1999–2000 compared very well to model predictions from TRACHEA 2007 (figure 1D) with an R2 of 0.89.
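Agreement statistics of this kind can be computed from paired site values. A minimal sketch, assuming R2 is the squared Pearson correlation between model predictions and measurements (the text does not state the exact formula used); the function names and concentrations are hypothetical, not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r2(predicted, measured):
    """Squared Pearson correlation, used here as the explained variance."""
    return pearson_r(predicted, measured) ** 2

# Hypothetical paired values (ug/m3): predictions from one period's model,
# measurements from the other period at the same sites.
predicted_early_model = [18.0, 25.0, 33.0, 41.0]
measured_late = [17.0, 27.0, 30.0, 43.0]
agreement = r2(predicted_early_model, measured_late)
```

Because it is based on correlation, this statistic reflects agreement in spatial contrast and is insensitive to a uniform shift or scaling of absolute concentrations between the two periods.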

Repeating the comparison with the original TRAPCA model (based on 40 sites) and the full TRACHEA model (based on 144 sites) gave similar results (online supplement 6), supporting the robustness of the modelling. Briefly, predictions from the original TRAPCA model explained the 2007 measurements very well (R2=0.82), and predictions from the full 144-site TRACHEA model explained the 1999–2000 measurements well (R2=0.72). A more detailed analysis of the different models is presented in online supplement 6.


We compared two sets of NO2 measurements and two LUR models for different years. NO2 concentrations measured in TRAPCA 1999–2000 agreed well with those measured in TRACHEA 2007 at the same locations, documenting the stability of the spatial contrast. Both models explained a large part of the spatial contrasts in measured NO2. The TRAPCA 1999–2000 model predicted the concentrations in 2007 very well and the TRACHEA 2007 model predicted the 1999–2000 concentrations similarly well.

TRACHEA 2007 model

Compared to other studies which reported LUR models for NO2, the TRACHEA model explained a similar percentage of the spatial variance.12 Traffic indicator variables, number of inhabitants and home addresses within different buffers are common variables in many models, although the exact definitions differ.

Comparison of the 1999–2000 and 2007 models and measurements

Good agreement between the 1999–2000 and 2007 modelled and measured NO2 concentrations was found despite a number of differences between the two monitoring campaigns and the GIS data available. Most of the differences discussed below affect the absolute concentrations but not the spatial variability within each campaign. Assessment of the stability of the spatial contrast was the main goal of our study. The TRAPCA study used Palmes tubes to measure NO2, whereas TRACHEA made use of Ogawa samplers. Both methods are based on passive diffusion and use similar techniques to derive concentrations, and both compared well to chemiluminescence monitoring22 (online supplement 5). Both techniques also compared well in the AIRALLERG study (R2=0.95).23 Therefore, it is unlikely that the difference in samplers influenced the spatial contrast between sites. The different sampling periods, 2 weeks in TRAPCA and 1 week in TRACHEA, could have resulted in better precision of the annual average for TRAPCA, but probably had only a minor effect on the concentration contrasts between sites. In TRACHEA all measurements were performed simultaneously at all sites, while in TRAPCA this was not feasible and measurements were adjusted for temporal variation using a continuous reference site, possibly leading to less precision. The differences in sampling methodology and the short campaigns hamper the assessment of absolute trends in concentration, but this was not a major complication for the comparison of contrasts over time. Evaluation of NO2 concentrations at urban background sites from the National Air Quality Monitoring Network showed a small downward trend between 1999 and 2007.24

Previous studies

There is only limited information on the stability of spatial contrasts of measured and especially modelled concentrations. While Pope et al25 showed that good correlations exist between PM2.5 concentrations for 51 background monitoring stations throughout the USA between 1979–1983 and 1999–2000, the present study showed that stability of contrasts also holds true for small-area spatial variations in pollution, as occur within cities. Apart from that, we found convincing evidence that not only measured concentrations but also LUR models based on measurements from 1999–2000 and 2007 predict largely the same contrasts. We previously found that in the Netherlands, traffic intensities between 1986 and 1996 were highly correlated.3 This study shows that the spatial contrasts in NO2 as observed in a later time period were also very stable.


The measurements used to construct both models were conducted in Dutch cities, which have only marginally expanded over the course of this 8-year period. On 1 January 1999, the total population of the Netherlands was 15 760 225, while on 1 January 2007 it was 16 357 992, an increase of 3.8%. Population density increased from 465 to 484 inhabitants per km2, an increase of 4.1%. Total vehicle kilometres increased by 11.7% from 111 383 million kilometres to 124 377 million kilometres. Because engines have become cleaner, the NOX emissions from motorised vehicles in the Netherlands have been reduced by 24% (table 1 in online supplement 7). Similar conditions will apply for many other study areas, especially in Western Europe. However, caution is needed in assuming stable spatial contrasts over longer periods of time, for more dynamically changing study areas, or for sites where major air pollution interventions took place.
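The percentage changes quoted above follow directly from the reported totals. A short check, using only the figures given in the text; the helper name `pct_change` is illustrative.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

population = round(pct_change(15_760_225, 16_357_992), 1)  # 3.8
density = round(pct_change(465, 484), 1)                   # 4.1
vehicle_km = round(pct_change(111_383, 124_377), 1)        # 11.7
```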

Despite the good comparisons found in this study between two sets of measurements, and two models developed 8 years apart, it remains difficult to judge exactly over which time span we can assume spatial contrasts to be stable. More methodological work is necessary in this area. Spatial contrast can be decomposed into regional, urban and local scale components. Changes in the spatial pattern of the regional and urban background could be related to large scale changes in important predictors such as population, traffic (volume and emission factors) and industry or other land use. These could be derived from population and traffic statistics and from changes in land use. It may, however, be difficult to find historic databases of land use and traffic.26 Alternatively, one could look at sector-specific emissions for different parts of the study area. In the Netherlands, emission data are available at a resolution of 1×1 km grids. Emission data are usually available for multiple years, but not at a scale finer than 1×1 km and are therefore typically not appropriate for the local scale. For the local scale, data particularly on traffic intensity and emission factors (related to the composition of the car fleet) are needed to assess whether, for example, contrast between traffic and background locations has changed.

In conclusion, LUR models developed independently using data obtained 8 years apart showed excellent agreement with each other and with NO2 measurements. This supports their use in epidemiological studies which often assume stability of spatial concentration contrasts over long periods of time.



  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.