Abstract
Within the Exposome Project for Health and Occupational Research (EPHOR), we aim to develop a protocol for efficiently updating job exposure matrices (JEMs) so that they incorporate the latest available information of the highest possible quality. The protocol will include methods for searching and collecting new data from the literature (assisted by text mining, WP4) and from exposure databases (e.g. the ECHA REACH database, reports), together with (Bayesian) decision criteria to determine whether and how to revise the exposure estimates in a JEM. As part of this work, we have started to develop a framework of semi-automated and fully automated approaches for identifying relevant literature and extracting occupational exposure measurements, which in turn may be used to create and update JEMs. Both content-level and document-level approaches are currently being explored. The content-level approach uses text mining and machine learning to interpret, analyse, and return relevant information from a text corpus (e.g. manuscripts in the PubMed Central (PMC) archive). In addition to retrieving user-specified information (e.g. ‘literature with occupational benzene measurements in PMC from 2018–2020’), the software will also have the potential to identify new patterns and relationships within the corpus (e.g. ‘the most sampled industry/occupation in literature with benzene measurements in PMC from 2018–2020’). The document-level approach uses automated keyword searches with optional filters to highlight documents that potentially contain relevant information. Once highlighted by the search algorithm, the documents may be screened manually or with other automated software to extract relevant information and data. We are currently focusing on exposure to diesel engine exhaust but plan to expand to other substances. The protocol will form part of the EPHOR toolbox being developed within the project.
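The document-level approach described above can be illustrated with a minimal sketch. It is not the EPHOR implementation: the `Document` record, the `find_candidates` helper, the `year_range` and `require_all` parameters, and the three example records are all hypothetical and chosen only to show how a keyword search with an optional filter might flag documents for later screening.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record for one article in the corpus."""
    doc_id: str
    year: int
    text: str


def find_candidates(docs, keywords, year_range=None, require_all=False):
    """Return ids of documents that match the keywords (illustrative helper).

    year_range is an optional (first_year, last_year) filter, inclusive.
    require_all=True demands every keyword; otherwise any keyword suffices.
    """
    # Case-insensitive whole-word (or whole-phrase) patterns.
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords
    ]
    combine = all if require_all else any
    hits = []
    for doc in docs:
        # Apply the optional publication-year filter first.
        if year_range is not None and not (year_range[0] <= doc.year <= year_range[1]):
            continue
        if combine(p.search(doc.text) for p in patterns):
            hits.append(doc.doc_id)
    return hits


# Made-up example records, for illustration only.
docs = [
    Document("A", 2019, "Personal benzene measurements among refinery workers."),
    Document("B", 2016, "Benzene exposure in petrol station attendants."),
    Document("C", 2020, "Noise levels in textile mills."),
]

flagged = find_candidates(docs, ["benzene"], year_range=(2018, 2020))
```

In this sketch the flagged documents would then be passed on to manual screening or to a separate extraction step; the search itself only highlights candidates and does not extract measurements.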