Introduction

The Generation R Study is a population-based prospective cohort study from fetal life until young adulthood. The study is designed to identify early environmental and genetic causes of normal and abnormal growth, development and health during fetal life, childhood and adulthood. The study focuses on four primary areas of research: (1) growth and physical development; (2) behavioural and cognitive development; (3) diseases in childhood; and (4) health and healthcare for pregnant women and children. The background and specific lines of investigation have been described in detail previously [1–3]. The main outcomes and determinants are presented in Tables 1 and 2. The general aims of the study are: (1) to describe normal and abnormal growth, development and health from fetal life until young adulthood; (2) to identify biological, environmental and social determinants of normal and abnormal growth, development and health from fetal life until young adulthood; and (3) to develop and evaluate strategies for prevention and early identification of groups at risk.

Table 1 Main outcomes per research area
Table 2 Main determinants

Of special interest is the role of early environmental exposures and genetic variants in pathways leading to common diseases during fetal life, childhood and adulthood; various examples have been published in this journal [4–15].

Main outcomes include risk factors for cardiovascular disease, type 2 diabetes, obesity, asthma and psychopathology, as well as risk factors such as impaired physical activity [16–70]. Results from the Generation R Study are expected to contribute to the development of strategies for optimizing health and healthcare for pregnant women and children.

Study area

The Generation R Study is conducted in Rotterdam, the second largest city in the Netherlands. Rotterdam is situated in the western part of the Netherlands, almost 80 km south of Amsterdam, the capital of the Netherlands. The total population consists of about 600,000 inhabitants of almost 150 different ethnicities. The study area is well defined by postal codes and covers more than half of the city's inhabitants (almost 350,000 inhabitants) [16]. The largest ethnic groups in this population are the Dutch (56%), Surinamese (9%), Turkish (7%), Moroccan (6%), Dutch Antillean (3%) and Cape Verdean (3%) groups [2]. The percentages of the non-Dutch groups are higher in younger age groups [2]. About 4,300 children are born in this study area per year. Measurements in the prenatal phase of the study were conducted in two well-equipped research centers in the study area, in close collaboration with midwives and hospitals. In the preschool period, detailed measurements were conducted in a dedicated research center in the Erasmus Medical Center–Sophia Children's Hospital, and routine care data were collected in five hospitals and sixteen child health centers located in this area.

Study design

Overview

The Generation R Study is a population-based prospective cohort study from fetal life until young adulthood. Mothers with a delivery date between April 2002 and January 2006 were eligible. Extensive assessments have been carried out in mothers and fathers and are currently being performed in their children (Tables 3, 4). Assessments in pregnancy were planned in early pregnancy (gestational age < 18 weeks), mid-pregnancy (gestational age 18–25 weeks) and late pregnancy (gestational age > 25 weeks); these are considered first, second and third trimester measurements, respectively. The fathers were assessed once during pregnancy. The children form a prenatally recruited birth cohort that will be followed until young adulthood. In the preschool period, which in the Netherlands refers to the period from birth to the age of 4 years, data were collected through a home visit at the age of 3 months, questionnaires at the ages of 2, 6, 12, 18, 24, 30, 36 and 48 months, and visits to the routine child health centres at the ages of 2, 3, 4, 6, 11, 14, 18, 24, 30, 36 and 45 months (Table 4). Additional, more detailed assessments of fetal and postnatal growth and development have been conducted in a randomly selected subgroup of Dutch children and their parents at a gestational age of 32 weeks and postnatally at the ages of 1.5, 6, 14, 24, 36 and 48 months (Tables 3, 4).
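The planned prenatal assessment windows can be written as a simple classification rule. This is only a sketch: the function name is hypothetical, and the text does not state how gestational ages between 25 and 26 weeks were assigned, so the sketch assumes that anything above 25.0 weeks counts as late pregnancy.

```python
def pregnancy_period(gestational_age_weeks: float) -> str:
    """Map gestational age (in weeks) to the planned assessment period.

    Assumption: <18 weeks is early, 18-25 weeks is mid, and >25 weeks is
    late pregnancy, with 25.0 itself treated as mid-pregnancy.
    """
    if gestational_age_weeks < 0:
        raise ValueError("gestational age must be non-negative")
    if gestational_age_weeks < 18:
        return "early"  # analysed as a first trimester measurement
    if gestational_age_weeks <= 25:
        return "mid"    # analysed as a second trimester measurement
    return "late"       # analysed as a third trimester measurement


for ga in (12.5, 20.0, 30.1):
    print(ga, pregnancy_period(ga))
```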

Table 3 Assessments in mothers, fathers and their children in the prenatal phase
Table 4 Assessments in mothers, fathers and children during the preschool period

Dutch ethnicity is defined as having two parents and four grandparents born in the Netherlands. From the age of 5 years onwards, regular, detailed hands-on assessments are performed in all children and their parents in a well-equipped, dedicated research center.

Eligibility and enrolment

Eligible mothers were those who were resident in the study area at their delivery date and had a delivery date from April 2002 until January 2006. We aimed to enroll mothers in early pregnancy (gestational age < 18 weeks), but enrolment was allowed until the birth of their child. Midwives and obstetricians informed eligible mothers about the study at their first prenatal visit in routine care, handed out the information package and asked these mothers to make an appointment for the first ultrasound examination. The study staff contacted these mothers by phone to give additional information about the study, and in person at the ultrasound examination to obtain informed consent. Mothers who were not approached in pregnancy were approached and enrolled in the first months after the birth of their child, when newborns visit the routine child health centers [2]. The fathers were not approached directly by the study staff, but the mothers were informed about the importance of involving the fathers in the study.

Study cohort

Parents

In total, 9,778 mothers were enrolled in the study (Fig. 1). Of these mothers, 91% (n = 8,880) were enrolled in pregnancy. Only partners of mothers enrolled in pregnancy were invited to participate; in total, 71% (n = 6,347) of these partners were enrolled. The general characteristics of the mothers and fathers are presented in Table 5. Of all participating mothers, enrolment was in early pregnancy in 69% (n = 6,691), in mid-pregnancy in 19% (n = 1,918), in late pregnancy in 3% (n = 271) and at birth of their child in 9% (n = 898). Of all enrolled pregnancies, 94% (n = 8,356), 6% (n = 516) and 0.1% (n = 8) were the mother's first, second and third pregnancy in the study, respectively. A total of 1,232 pregnant women and their children were enrolled in the subgroup for additional detailed studies. Ethnicity of participating mothers and partners was defined according to the classification of Statistics Netherlands [71, 72]. The largest ethnic groups were the Dutch, Surinamese, Turkish and Moroccan groups. The ethnic distribution differed only moderately from that of the population in the study area [71]. Mean household income in Rotterdam is about €1,600 per month, and the percentage of inhabitants with a secondary or higher education level is 56% [71]. The educational level of participating mothers and their partners was classified according to the classification of Statistics Netherlands [73]. Ethnic background, educational level and occupational status are of major interest and are studied as determinants of health and behavioural outcomes [74–91]. Both household income and the highest attained educational level of mothers and fathers in the study cohort suggest a selection towards a higher socioeconomic status than in the population of the study area. This pattern is similar to that in other large-scale cohort studies [92].
However, differences between the population and cohort characteristics may also be due to selectively missing values for ethnicity and socio-economic status in the questionnaires. Socio-economic status is related to various perinatal and postnatal health outcomes and is of major interest in the study [93–109].

Fig. 1 Enrolment and measurements until the age of 4 years

Table 5 Characteristics of mothers and their partners

Children at birth and overall response

Among the live births, 51% were male and 49% female. These percentages are similar to the population figures for the Netherlands and Rotterdam [71]. The percentages of children born preterm or with low birth weight are lower than expected on the basis of population figures, which seems to reflect a selection towards a relatively healthy study population. Estimating the precise number of eligible pregnant women in the study area is difficult, since there is no satisfactory registry of pregnancies. Therefore, we did not attempt to estimate the overall response rate among pregnant women. Since the children form a prenatally recruited birth cohort, the overall response of the study has been established at birth and is 61%.

Participation in postnatal follow up studies

As described above, 9,778 mothers were enrolled in the study and gave birth to 9,745 known live-born children. The logistics of the postnatal follow-up studies were embedded in the municipal routine child care system and restricted to the study area. In total, 1,163 children lived outside this defined study area at birth and were therefore not approached for the postnatal follow-up studies during the preschool period. Of the remaining 8,582 children, the mothers of 689 (8%) did not give consent for the postnatal phase of the study, leaving 7,893 children for the postnatal follow-up studies. Reasons for non-participation in the postnatal follow-up studies were primarily time restrictions or plans to move away from the study area. Of the mothers of these 7,893 children, 598 (7.5%) gave restricted consent and did not want to participate in the questionnaire studies. In total, 7,588 children (78% of all live-born children) or their parents participated in at least one measurement during the preschool period, of whom 6,845 (90%) completed at least one questionnaire.
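The participant flow described above can be retraced with simple arithmetic. The sketch below uses only counts reported in the text; the variable names and the rounding helper are illustrative, and the 78% participation figure is taken relative to all 9,745 live-born children, the denominator that reproduces it.

```python
# Counts as reported in the text above.
live_births = 9_745
outside_study_area = 1_163
no_postnatal_consent = 689
any_preschool_measurement = 7_588
any_questionnaire = 6_845


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Children living in the study area at birth, then those with postnatal consent.
eligible_postnatal = live_births - outside_study_area
consented_postnatal = eligible_postnatal - no_postnatal_consent

flow = {
    "eligible_postnatal": eligible_postnatal,
    "consented_postnatal": consented_postnatal,
    "pct_no_consent": pct(no_postnatal_consent, eligible_postnatal),
    "pct_participated": pct(any_preschool_measurement, live_births),
    "pct_questionnaire": pct(any_questionnaire, any_preschool_measurement),
}

for step, value in flow.items():
    print(f"{step}: {value}")
```

Running it gives 8,582 and 7,893 children for the first two steps, and 8.0%, 77.9% and 90.2% for the three proportions, consistent with the rounded figures in the text.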

Data collection in the prenatal phase

Physical examinations

Physical examinations were planned at each visit in early, mid- and late pregnancy and included height, weight and blood pressure measurements of both parents. Overall response rates for these specific measurements in mothers and fathers are similar to the visit percentages presented in Fig. 1. Since gestational age varied widely at each visit, these measurements are analysed as gestational-age-adjusted measurements [110].
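The adjustment method itself is described in reference [110] and not here. As a hypothetical illustration only, one common way to adjust a measurement for gestational age is to fit a linear trend of the measurement on gestational age and shift each observation along that trend to a common reference gestational age. The function name and the linear model are assumptions.

```python
from statistics import fmean


def adjust_for_gestational_age(values, gestational_ages, reference_ga=None):
    """Shift measurements to a common gestational age (hypothetical method).

    Fits value = a + b * GA by ordinary least squares and returns
    value - b * (GA - reference_ga); reference_ga defaults to the mean GA.
    """
    if len(values) != len(gestational_ages) or len(values) < 2:
        raise ValueError("need at least two paired observations")
    ga_mean = fmean(gestational_ages)
    v_mean = fmean(values)
    sxx = sum((g - ga_mean) ** 2 for g in gestational_ages)
    if sxx == 0:
        raise ValueError("gestational ages must vary")
    sxy = sum((g - ga_mean) * (v - v_mean) for g, v in zip(gestational_ages, values))
    slope = sxy / sxx
    target = ga_mean if reference_ga is None else reference_ga
    # Remove the linear gestational-age trend relative to the target age
    return [v - slope * (g - target) for v, g in zip(values, gestational_ages)]


weights_kg = [60.0, 62.0, 64.0]
ga_weeks = [10.0, 20.0, 30.0]
print(adjust_for_gestational_age(weights_kg, ga_weeks))  # all values near 62 kg
```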

Questionnaires

Mothers received four postal questionnaires and fathers received one postal questionnaire in the prenatal phase (Table 3). All questionnaires are available in three languages (Dutch, English and Turkish). If needed, support with verbal translation of the questionnaires is available in Arabic, Portuguese and French. Each questionnaire comprises about 25 pages and takes about 30–45 min to complete [2]. Topics in these questionnaires were:

  • Mother 1: medical and family history, previous pregnancies, quality of life, life style habits, housing conditions, ethnicity, educational level;

  • Mother 2: diet, including macronutrients and micronutrients;

  • Mother 3: current pregnancy, quality of life, life style habits, psychopathology;

  • Mother 4: current pregnancy, quality of life, life style habits, working conditions, household income, self-esteem;

  • Father: medical history, family history, life style habits, educational level, psychopathology.

Overall response rates for these questionnaires varied from 77 to 91% (Fig. 1). However, the response rates of specific questions may be lower due to missing values within questionnaires.

Fetal ultrasound examinations

Fetal ultrasound examinations were performed at each prenatal visit. Overall response rates for these ultrasound examinations were in general similar to the visit percentages given in Fig. 1. These examinations were used both to establish gestational age and to assess fetal growth patterns. These methods have been described in detail previously [111, 112]. Establishing gestational age from the first day of the last menstrual period is not reliable for a variety of reasons, including the large number of women who do not know their exact date, have irregular menstrual cycles or amenorrhea, use oral contraceptive pills or bleed in early pregnancy [113]. Using fetal ultrasound measurements such as crown-rump length or biparietal diameter for pregnancy dating seems to overcome these problems, but it precludes growth studies of those same measurements, because dating on a measurement assumes that it shows no growth variability between subjects. Pregnancy dating curves were derived in a subsample of the cohort with complete data on both the first day of the last menstrual period and crown-rump length or biparietal diameter, and were used to date gestational age [111]. Subsequently, longitudinal curves of all fetal growth measurements (head circumference, biparietal diameter, abdominal circumference and femur length) were created, resulting in standard deviation scores for each of these growth measurements. Various sociodemographic and lifestyle-related determinants seem to affect fetal growth and birth outcomes [114–119]. Also, specific fetal growth patterns seem to be associated with outcomes in childhood [120–123]. In a subgroup of mothers with a known and reliable last menstrual period, we have demonstrated that various lifestyle-related factors affect first trimester growth [124]. Placental haemodynamics, including resistance indices of the uterine and umbilical arteries, have been assessed in the second and third trimesters [125].
Detailed measurements of fetal brain, cardiac and kidney development have been performed in the subcohort [126–129].
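The two steps above, deriving a dating curve in a subsample and then expressing growth measurements as standard deviation scores, can be sketched as follows. This is a simplified illustration, not the study's published procedure: the straight-line dating curve is an assumption (real dating curves are usually non-linear), and the reference mean and SD functions in the usage example are hypothetical.

```python
from statistics import fmean


def fit_dating_curve(crl_mm, ga_lmp_days):
    """Fit GA = a + b * CRL by least squares in the dating subsample.

    Simplification (assumption): a straight line in CRL.
    """
    if len(crl_mm) != len(ga_lmp_days) or len(crl_mm) < 2:
        raise ValueError("need at least two paired observations")
    x_mean, y_mean = fmean(crl_mm), fmean(ga_lmp_days)
    sxx = sum((x - x_mean) ** 2 for x in crl_mm)
    if sxx == 0:
        raise ValueError("CRL values must vary")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(crl_mm, ga_lmp_days))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    def estimate_ga(crl: float) -> float:
        # Ultrasound-based gestational age (days) for a given CRL (mm)
        return intercept + slope * crl

    return estimate_ga


def sds(value, ga_days, mean_at_ga, sd_at_ga):
    """Standard deviation score of a fetal measurement at a given gestational age."""
    return (value - mean_at_ga(ga_days)) / sd_at_ga(ga_days)


# Hypothetical subsample with reliable LMP dates: CRL (mm) and GA (days)
date_by_crl = fit_dating_curve([10.0, 20.0, 30.0], [56.0, 66.0, 76.0])
ga = date_by_crl(25.0)  # gestational age assigned to a fetus with CRL 25 mm

# Hypothetical reference curves for some growth measurement
score = sds(110.0, 100.0, lambda g: 2.0 * g - 100.0, lambda g: 0.05 * g)
print(ga, score)
```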

Pregnancy complications and outcomes

The obstetric records of mothers have been retrieved from the hospitals and midwife practices. Items in these records were coded by specialists in the relevant field and used for validation studies of maternally reported data [130, 131]. The major pregnancy outcomes, including live birth, induced abortion, fetal or perinatal loss, pregnancy-induced hypertension, preeclampsia and gestational diabetes, are known for 99% of all enrolled mothers. These outcomes are related to various exposures of interest [132–148]. For all children known to be born alive, information about sex, birth weight and gestational age is available. Currently, efforts are being made to link the Generation R dataset to existing national databases for pregnancy and the neonatal period [149, 150].

Biological samples

Blood samples were collected in early and mid-pregnancy and at birth. Procedures for collection, processing and storage of biological samples have been described in detail previously [151]. The planned amounts of venous blood taken from the mother were 35 and 20 ml in early and mid-pregnancy, respectively, and 10 ml from the father. At delivery, 30 ml of cord blood was collected. Blood samples are currently available for 97, 83 and 67% of the participating mothers, partners and children, respectively. Because of clinical complications in mother and child, it was not always possible to collect cord blood. Consequently, availability rates of cord blood samples in children born preterm or with low birth weight are lower than expected. Plasma and serum samples have been divided into small aliquots (250 μl each) and stored at −80°C. Maternal and cord blood samples have been used for measuring dietary biomarkers (folate, homocysteine, total vitamin B12 and free vitamin B12); angiogenesis biomarkers (soluble fms-like tyrosine kinase-1 (sFlt-1) and placental growth factor (PlGF)); thyroid hormones (thyroid-stimulating hormone (TSH) and free thyroxine (FT4)) and thyroid antibodies; an inflammation marker (high-sensitivity C-reactive protein (hs-CRP)); and celiac disease and Helicobacter pylori antibodies [152–154]. Urine samples of mothers were collected from February 2004 until November 2005 and are stored for future measurements. Response rates for these urine samples are 85, 97 and 96% in early, mid- and late pregnancy, respectively. These urine samples are used for measuring several environmental exposures and metabolites. Urine samples have been used for measuring Chlamydia, pesticides, bisphenol A and phthalates [155, 156].

DNA and genome wide association studies

DNA from both parents (whole blood) and children (cord blood) has been extracted, normalised and plated, and is currently used for several genotyping studies. A genome-wide association scan (GWAS) using the Illumina 610 Quad platform is available for the participating children. For genotyping, we used the infrastructure of the Genetic Laboratory of the Department of Internal Medicine (www.glimdna.org), which was also used to create the GWAS datasets of the Rotterdam Study, a prospective cohort study among more than 10,000 adults [157, 158]. The GWAS dataset underwent a stringent quality control (QC) process in which individuals with low call rates or sex mismatches were excluded, as were SNPs with low call rates. In addition, the ethnic composition of the sample was estimated by principal component analysis of identity-by-state (IBS) data, and cryptic familial relationships were identified through identity-by-descent (IBD) analysis. To maximize genome coverage and allow inter-study comparisons, we used MACH (version 1.0.15) software to impute genotypes at each of the 2.5 million autosomal SNPs of the CEPH HapMap II (release 22) panel for Europeans and European-Americans [159]. Criteria such as a high level of missing data (SNP call rate < 98%), a highly significant departure from Hardy-Weinberg equilibrium (P < 1 × 10⁻⁶) or a low minor allele frequency (MAF < 1%) were used to determine which SNPs to include in the imputation step. Most GWAS analyses are embedded in the recently started Early Growth Genetics (EGG) Consortium and Early Genetics and Longitudinal Epidemiology (EAGLE) Consortium, in which several birth cohort studies combine their GWAS efforts on multiple outcomes in fetal life, childhood and adolescence [160, 161]. The power of large-scale consortia has been shown previously [162–170]. Strategies for genetic score analyses are also being developed to estimate the predictive value of identified genetic variants [171, 172].
Although a GWAS is available for the children, we use DNA from both children and parents for genotyping in candidate gene or replication studies [173–182]. Missing DNA samples in children will be collected by blood sampling or other methods at follow-up measurements.
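The SNP inclusion criteria for imputation (call rate ≥ 98%, Hardy-Weinberg P ≥ 1 × 10⁻⁶, MAF ≥ 1%) can be illustrated with per-SNP genotype counts. This is a minimal sketch, not the study's QC pipeline: the data class and function names are hypothetical, and the Hardy-Weinberg test shown is a 1-df chi-square goodness-of-fit test, swapped in for simplicity; the text does not say which HWE test was used, and exact tests are also common.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP (A = reference, a = alternative)."""
    hom_ref: int  # AA
    het: int      # Aa
    hom_alt: int  # aa
    missing: int  # samples without a genotype call


def n_called(s: SnpCounts) -> int:
    return s.hom_ref + s.het + s.hom_alt


def call_rate(s: SnpCounts) -> float:
    total = n_called(s) + s.missing
    return n_called(s) / total if total else 0.0


def minor_allele_frequency(s: SnpCounts) -> float:
    alt_freq = (2 * s.hom_alt + s.het) / (2 * n_called(s))
    return min(alt_freq, 1 - alt_freq)


def hwe_p_value(s: SnpCounts) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_called(s)
    p = (2 * s.hom_ref + s.het) / (2 * n)  # reference allele frequency
    q = 1 - p
    observed = (s.hom_ref, s.het, s.hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(s: SnpCounts, min_call_rate=0.98, min_hwe_p=1e-6, min_maf=0.01) -> bool:
    """True if the SNP would be kept for imputation under the stated criteria."""
    if n_called(s) == 0:
        return False
    return (call_rate(s) >= min_call_rate
            and hwe_p_value(s) >= min_hwe_p
            and minor_allele_frequency(s) >= min_maf)


print(passes_qc(SnpCounts(250, 500, 250, 5)))  # in equilibrium, common, well called
print(passes_qc(SnpCounts(500, 0, 500, 0)))    # no heterozygotes: fails HWE
```

In practice these filters are usually applied with dedicated tools such as PLINK rather than hand-written code; the sketch only makes the thresholds concrete.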

Data collection in the first 4 postnatal years

Physical examinations

At the age of 3 months, home visits were performed to assess neuromotor development using an adapted version of Touwen's Neurodevelopmental Examination [183–185]. Information about growth (length (height), weight and head circumference) is collected at each visit to the routine child health centres in the study area using standardized procedures. These visits are planned at the ages of 2, 3, 4, 6, 11, 14, 18, 24, 30, 36 and 48 months. Response rates for these visits vary between 70 and 97% [186].

Questionnaires

During the preschool period, mothers receive 8 questionnaires and fathers receive one questionnaire. Each questionnaire comprises about 25 pages. Items included in these questionnaires and their references are shown in Table 6 [187–223]. Response rates presented in Fig. 1 are based on the number of questionnaires sent. Owing to logistical constraints and the introduction of some questionnaires only after the first group of children had passed the relevant age, not all children received each questionnaire. Thus, although response rates may be similar, the absolute number of completed questionnaires may differ between ages.

Table 6 Themes in postnatal questionnaires until the age of 4 years

Hands on assessments and observations in subgroup

During the preschool period, children participating in the subgroup studies have been invited six times to a dedicated research center. Measurements at these visits included physical examinations (height, weight, head circumference, skinfold thickness and waist-hip ratio, and Touwen's Neurodevelopmental Examination at the ages of 1.5, 6, 14 and 24 months) and ultrasound examinations (brain structures at 1.5 months, cardiac and kidney structures at the ages of 1.5, 6 and 24 months, and abdominal fat at the age of 24 months) [224–237]. Dual-energy X-ray absorptiometry (DXA) scanning was performed in a subgroup of children at the age of 6 months [238]. Similarly, fractional exhaled nitric oxide (FeNO) was measured in only 50% of all children at the ages of 6 and 24 months. Blood pressure was measured at the age of 24 months [239]. Observations of parent-child interaction and child behaviour, such as executive function, heart rate variability, infant-parent attachment, moral development and compliance, have been performed with mother and child at the ages of 14, 36 and 48 months and with father and child at the age of 48 months [240, 241]. Biological materials have been collected, including nasal and nasopharyngeal swabs for bacterial colonization at the ages of 1.5, 6, 14 and 24 months, repeated salivary samples for the diurnal cortisol rhythm at the age of 14 months and, if parents gave consent, blood samples at the ages of 6, 14 and 24 months [242–251]. Response rates for these blood samples are about 30%, mainly because of lack of parental consent.

Measurements from the age of 5 years

From the age of 5 years onwards, we plan to invite all participating children to a well-equipped, dedicated research center in the Erasmus Medical Center–Sophia Children's Hospital every 3 years. Currently, all children aged 5–6 years are invited to participate in hands-on measurements, behavioural observations and biological sample collection. The total visit takes about 2.5 h, and all measurements are grouped in thematic 20-min blocks. These measurements focus on several health outcomes, including asthma, bacterial carriage, infectious diseases, behaviour and cognition, body composition, eye and tooth development, heart and vascular development, kidney growth and function, and obesity. These measurements will be used both as outcomes and as determinants of health outcomes in later life. Cardiovascular, metabolic and bone measurements are also conducted in mothers. Clinically relevant results are discussed with the parents and, if needed, children or mothers are referred to their general practitioner, paediatrician or other relevant health care provider. All children are expected to have visited our research center by summer 2011. Currently, ideas for follow-up studies, including detailed imaging studies using magnetic resonance imaging (MRI), are being discussed and tested in subgroups.

Ethical issues

The general design, all research aims and the specific measurements in the Generation R Study have been approved by the Medical Ethical Committee of the Erasmus Medical Center, Rotterdam. New measurements will only be embedded in the study after approval by the Medical Ethical Committee. Participants are asked for their written informed consent for the four consecutive phases of the study (prenatally, birth to 4 years, 4–16 years, and from 16 years onwards). At the start of each phase, mothers and their partners receive written and oral information about the study. Even with the consent of the parents, no measurements are performed when the child is not willing to participate actively.

Follow up and retention strategies

Thus far, loss to follow-up seems to be limited. Major efforts are made to keep the children and parents involved in the study and to minimize loss to follow-up, in order to prevent bias in future studies. Several strategies have been implemented and are currently part of the study design:

  • Addresses: new addresses of participants, which are known by the municipal health service are forwarded to the study staff;

  • Newsletters: participants receive four newsletters per year, in which several results of the study are presented and explained, questions of participants are answered and new research initiatives are presented;

  • Presents and discounts: all children who visit our research center receive small presents. Also, discount offers are regularly part of the newsletter;

  • Transport costs: all costs for transport and parking related to visits to the research center are paid by Generation R;

  • Reminders for questionnaires: when the questionnaire has not been returned within 3 weeks, a friendly reminder letter is sent to the parents. After 6 weeks, when the questionnaire has still not been returned, the parents receive a phone call. If necessary, help with completing the questionnaire is offered and the importance of filling in the questionnaire is explained once more during this phone call;

  • Individual feedback: if clinically relevant, all results of hands-on measurements are discussed with the parents at the visit. If necessary, follow-up appointments with the general practitioner or pediatrician are planned;

  • Support for ethnic minorities: all study materials, such as questionnaires, newsletters, the website and information folders, are available in three languages (Dutch, English and Turkish). Furthermore, staff members from different ethnic minorities are available and able to translate these materials verbally into Arabic, French and Portuguese. In this way, the study staff are able to communicate with all participants;

  • Care-cases: children and parents who showed low response rates for different measurements, had difficulties in completing questionnaires or require additional explanation or support are considered care-cases. Care-cases receive a more individualised approach and are proactively contacted by one dedicated member of the study staff.
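The questionnaire reminder rule in the list above (reminder letter after 3 weeks, phone call after 6 weeks) can be expressed as a small decision function. This is an illustrative sketch only: the function name and the use of calendar days (21 and 42) are assumptions, and the function reports which follow-up step applies at a given time rather than tracking whether a step has already been carried out.

```python
def reminder_action(days_since_sent: int, returned: bool) -> str:
    """Follow-up step that applies to an outstanding questionnaire.

    Assumption: 3 weeks = 21 days and 6 weeks = 42 days after sending.
    """
    if days_since_sent < 0:
        raise ValueError("days_since_sent must be non-negative")
    if returned:
        return "none"
    if days_since_sent >= 42:
        return "phone call"       # offer help and explain importance again
    if days_since_sent >= 21:
        return "reminder letter"  # friendly written reminder
    return "wait"


for days in (10, 21, 45):
    print(days, reminder_action(days, returned=False))
```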

New methods for contacting participants that have been used in other, more recently started studies, including the use of the internet, SMS and e-mail, are currently being explored [92, 252–262].

Data management and privacy protection

Data collected by measurements in the research centers are entered directly onto written forms and into the electronic database. Questionnaires are scanned, and their data are manually entered into an electronic database by a commercial bureau. Random samples of all questionnaires are double-checked by study staff members to monitor the quality of this manual data entry. The percentage of mistakes is kept as low as possible and does not exceed 3% per questionnaire. Open text fields are entered into the electronic database exactly as they are filled in on the questionnaires; in a second stage, these open text fields are cleaned and coded by a specialist in the relevant field. All measurements are centrally checked by examining their ranges, distributions, means, standard deviations, outliers and logical errors. Outliers and missing values are checked against the original forms; missing values may be imputed, after which appropriate statistical techniques are used [263–274]. The data of a specific measurement are distributed for analyses only after data collection and preparation for that measurement have been completed for the whole cohort. Datasets needed for answering specific research questions are built centrally from different databases. All information in these datasets that would enable identification of a particular participant (including the identification number used for study logistics, names and dates) is removed before distribution to the researchers. The datasets for researchers include unique subject identification numbers that enable feedback about a subject to the data manager but do not enable identification of that subject.
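Two of the checks described above, the error rate of manual data entry against a staff re-check and the screening of values for range violations and outliers, can be sketched as below. The function names, the plausible range and the z-score threshold are hypothetical; the study's actual checking rules are not specified in the text. Flagged values are meant to be checked against the original forms, not deleted.

```python
from statistics import fmean, stdev


def entry_error_rate(entered, rechecked):
    """Percentage of fields where the bureau's entry differs from the staff re-check."""
    if len(entered) != len(rechecked) or not entered:
        raise ValueError("need two equally long, non-empty sequences")
    mismatches = sum(a != b for a, b in zip(entered, rechecked))
    return 100 * mismatches / len(entered)


def flag_values(values, low, high, z_limit=4.0):
    """Indices of values outside [low, high] or more than z_limit SDs from the mean.

    Note: the mean and SD include the extreme values themselves, so this
    simple screen can miss outliers in small samples.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean, sd = fmean(values), stdev(values)
    flagged = []
    for i, v in enumerate(values):
        out_of_range = not (low <= v <= high)
        extreme = sd > 0 and abs(v - mean) / sd > z_limit
        if out_of_range or extreme:
            flagged.append(i)
    return flagged


# One mismatch in 40 fields gives 2.5%, within the 3% limit mentioned above
print(entry_error_rate(["x"] * 40, ["x"] * 39 + ["y"]))
# Hypothetical maternal weights (kg): 500 is a likely entry error
print(flag_values([50.0, 52.0, 49.0, 51.0, 500.0], low=30.0, high=150.0))
```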

Statistical power

Because of expected missing values and loss to follow-up, most analyses in the study will not be based on data from all subjects. Therefore, the power calculations presented in Tables 7 and 8 are based on 7,000 subjects in the whole cohort and 700 subjects in the focus cohort. Table 7 shows that, for a normally distributed continuous outcome, with a type I error of 5% and a type II error of 20% (power 80%), it is possible to detect a difference of 0.11 SD in the whole cohort and of 0.35 SD in the focus cohort if 10% of all subjects have the relevant exposure. Table 8 shows that, for dichotomous outcomes and with the same type I and II errors, it is possible to detect a relative risk of 1.39 in the whole cohort and of 2.48 in the focus cohort if 10% of all subjects have the relevant exposure and the 1-year incidence of the outcome of interest is 10%. Rates of most dichotomous environmental and genetic exposures in the Generation R Study are generally expected to be between 10 and 20%. The presented power calculations are rather conservative, since most studies will assess the effects of continuous rather than dichotomous exposures and may focus on outcomes collected over more than 1 year. Furthermore, the Generation R Study has a large number of measurements repeated over time, which may increase the accuracy of measuring the true underlying value and may thereby increase statistical power.
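The continuous-outcome figures above can be checked with the standard formula for the minimal detectable difference in means between two groups, assuming a two-sided test and equal outcome SD in exposed and unexposed subjects: Δ/SD = (z₁₋α/₂ + z₁₋β) × √(1/n_exposed + 1/n_unexposed). This is a sketch under that assumption; the study's exact calculation method is not stated here, but the formula gives 0.1116 SD for 7,000 subjects and 0.3530 SD for 700 subjects with 10% exposed, consistent with the 0.11 and 0.35 SD reported.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_difference(n_total, exposure_prevalence, alpha=0.05, power=0.80):
    """Smallest detectable difference in means, in SD units.

    Assumes a two-sided two-sample test with equal outcome SD in both groups.
    """
    n_exposed = n_total * exposure_prevalence
    n_unexposed = n_total - n_exposed
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    return (z_alpha + z_beta) * sqrt(1 / n_exposed + 1 / n_unexposed)


for n in (7000, 700):  # whole cohort, focus cohort
    for prevalence in (0.10, 0.20):
        print(n, prevalence, round(minimal_detectable_difference(n, prevalence), 2))
```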

Table 7 Effect sizes (in standard deviations) that can minimally be detected according to the prevalence of the exposure
Table 8 Relative risks that can minimally be detected according to the prevalence of the exposure

Collaboration

The Generation R Study is conducted by several research groups from the Erasmus Medical Center in close collaboration with the Erasmus University Rotterdam and the Municipal Health Service Rotterdam area. Since data collection is still ongoing and growing, the number of collaborating research groups in and outside the Netherlands is expected to increase in the coming years. The study has an open policy with regard to collaboration with other research groups. Requests for collaboration should be directed primarily to Vincent Jaddoe (v.jaddoe@erasmusmc.nl). These requests are discussed in the Generation R Study Management Team with regard to their study aims, overlap with ongoing studies, logistical consequences and financial contributions. After approval of the project by the Generation R Study Management Team and the Medical Ethical Committee of the Erasmus Medical Center, the collaborative research project is embedded in one of the four research areas, supervised by the relevant principal investigator.