Data mining and dendograms: Some possibilities for using 21st computational linguistics to reinvent oral health research
Bell, E and Crocombe, L, Data mining and dendograms: Some possibilities for using 21st computational linguistics to reinvent oral health research, 6th International Meeting: The Dental Biostats Conference, 1-3 April, Adelaide, South Australia (2014) [Conference Extract]
Qualitative research, the study of language texts using non-numerical methods, comprises 1% of the entire corpus of oral health research. Yet language is the medium through which patients, dentists, professional organisations and policy-makers understand oral health, and it shapes oral health behaviours and outcomes. Too often, health scientists' understanding of qualitative research is tied to narrative or ‘story-telling’ approaches from the 1970s, before the information age. Today, advances in algorithmic, machine-based approaches to ‘natural language processing’ offer oral health research novel opportunities for the scientific understanding of language data through its quantification. In this paper, we discuss two recent studies completed by the CRE in Primary Oral Health Care. The first analyses the language of oral health policy documents from 8 OECD countries. The second compares these policy documents with 127,927 oral health abstracts published between 2000 and 2012, taken as indicative of all oral health research. Both ‘data mining’ studies have implications for understanding the content of policy and research, and for identifying where policy is evidence-based and where research is policy-relevant. The presentation also discusses and models other applications of computational linguistics to complex areas: data mining of healthcare notes; analysing the subtleties of practitioners’ career motivations or the behaviour of groups with unequal health outcomes; and tracing changes in wider community oral health perceptions and behaviours over time. We conclude with an assessment of the strengths and limitations of the current field of computational linguistics for meeting oral health priorities.