Electronic Patient Records for Better Health
How can we use clinical corpora to assist the clinician, her managers and clinical research? & Modeling factuality levels of diagnoses in Swedish clinical records for information access
Professor Hercules Dalianis and Licentiate Sumithra Velupillai (Department of Computer and Systems Sciences, (DSV), Stockholm University, Sweden)
NICTA SML SEMINAR
TIME: 11:00:00 - 12:30:00
LOCATION: NICTA - 7 London Circuit
Speaker: Hercules Dalianis, Department of Computer and Systems Sciences, (DSV), Stockholm University, Sweden
Title: How can we use clinical corpora to assist the clinician, her managers and clinical research?
Abstract: Today a large number of Electronic Patient Records (EPRs) are produced for legal reasons, but they are never reused for clinical research or for business (hospital) intelligence purposes. Moreover, it is alarming that the clinician's daily work in documenting patient status is rarely supported in a proper way. We aim to change this. Clinical corpora form an abundant source from which valuable information can be extracted for this purpose.
The Stockholm EPR Corpus is a huge clinical corpus written in Swedish, containing over one million patient records distributed over 800 clinics encompassing three years from the Stockholm area. We have explored subsets of this corpus with the aim of understanding the whole corpus and its domain(s). In one experiment we annotated a subset of the corpus for de-identification, and we created a gold standard for training and evaluation of automatic de-identification tools. In another experiment we investigated the relations between diagnosis codes (ICD-10) for co-morbidity analyses and found interesting results. We have also developed a method that gives automatic support for assigning new ICD-10 codes to newly entered clinical text, and for evaluating already assigned ICD-10 codes. Finally, we have tried to understand what exactly is written in the corpora, with the aim of constructing information extraction tools that can distinguish between factuality levels of diagnoses. Is the diagnosis certain, negated, or uncertain to some extent? Two annotators with a clinical background have annotated a subset of the corpus for factuality levels.
Speaker: Sumithra Velupillai, Department of Computer and Systems Sciences, (DSV), Stockholm University, Sweden
Title: Modeling factuality levels of diagnoses in Swedish clinical records for information access
Retrieving relevant information from Electronic Patient Records (EPRs) is a challenging task, since there are different information needs in different situations. Moreover, this document type is a good example of where traditional information retrieval methods such as those applied in search engines are not sufficient; simply searching for keywords and retrieving ranked lists of documents is an insufficient way of exploiting the knowledge, experience and information contained in EPRs.
We have initiated the creation of Swedish clinical corpora manually annotated for factuality levels in clinical records. The first experiment was carried out on a sentence and token level, manually annotated by three laymen. Following this work, a second experiment has been carried out, focusing on assessment descriptions from clinical records from an emergency department. In this task, the annotators (clinicians) are given a diagnosis to be judged for factuality levels (certainly, probably and possibly positive or negative).
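The six-level scale described above (certainly, probably and possibly positive or negative) can be sketched as a small data structure. This is only an illustration and not the speakers' annotation tooling: the class name `Factuality`, the signed integer encoding and the helper `observed_agreement` are all assumptions introduced here.

```python
from enum import Enum


class Factuality(Enum):
    """Hypothetical encoding of the six factuality levels.

    The sign carries polarity and the magnitude carries certainty.
    The numeric values are an assumption made for this sketch only.
    """

    CERTAINLY_POSITIVE = 3
    PROBABLY_POSITIVE = 2
    POSSIBLY_POSITIVE = 1
    POSSIBLY_NEGATIVE = -1
    PROBABLY_NEGATIVE = -2
    CERTAINLY_NEGATIVE = -3

    @property
    def polarity(self) -> str:
        # Positive values mean the diagnosis is affirmed, negative that it is negated.
        return "positive" if self.value > 0 else "negative"

    @property
    def certainty(self) -> str:
        # Map the magnitude back to the certainty word used in the scale.
        return {3: "certainly", 2: "probably", 1: "possibly"}[abs(self.value)]


def observed_agreement(labels_a, labels_b) -> float:
    """Fraction of items to which two annotators gave the same level."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)
```

A call such as `observed_agreement(annotator_1, annotator_2)` on two equally long lists of `Factuality` values gives the raw proportion of identical judgements; it does not correct for chance agreement.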
The created corpora will be used for automatic classification experiments. We want to be able to answer the following questions: is it feasible to automatically classify factuality levels of diagnosis descriptions in Swedish EPRs? Which diagnosis types are harder to judge when it comes to factuality levels? Which features are indicative? Are these different depending on diagnosis types? Are some diagnoses inherently uncertain/speculative? What does this imply?
In the future, we envisage information access systems that are able to distinguish these types of factuality levels automatically, information that could be utilized in information access systems, (semi-)automatic summarization applications, hypothesis generation and clinical research, etc.
Dalianis is an associate professor (docent) and tenured lecturer (universitetslektor) at the Department of Computer and Systems Sciences (DSV) at Stockholm University, Sweden, where he heads the research area IT for Health. Dalianis received his Ph.D. in 1996. Dalianis was a postdoctoral researcher at the University of Southern California/ISI in Los Angeles in 1997-98. Dalianis held a three-year guest professorship at CST, University of Copenhagen, during 2002-2005, funded by Norfa, the Nordic council.
Dalianis works at the interface between university and industry with the aim of making research results useful for society. Dalianis has specialized in the area of human language technology: making computers understand and process human language text, and also making computers produce text automatically. Examples of applications are automatic text summarization and search engines with built-in human language technology support, for example stemming, spell checking and compound splitting, to improve information extraction. Currently Dalianis works in the area of text mining and medical informatics, focused on electronic health records. Dalianis has more than 20 years of experience in his research area. Dalianis has been project leader and received funding for over 15 national, Nordic and European research projects.
Velupillai has been a PhD student at the Department of Computer and Systems Sciences at Stockholm University since April 2007. She successfully defended her Licentiate Thesis, Swedish Health Data – Information Access and Representation, on the 6th of October, 2009. Velupillai is also affiliated with the Swedish National Graduate School of Language Technology (GSLT), has participated in several research projects, and is currently part of the Nordic research network HEXAnord. Velupillai has a background in Computational Linguistics and specializes in research covering Language Technology, Information Access and Health Informatics. Velupillai has published and presented eighteen articles in renowned international conferences and journals.