Information Extraction and Natural Language Processing ...


Keynote address: Stefan Schulz, Medical University of Graz (Austria), purl.org/steschu

Annotating clinical narratives with SNOMED CT: The thorny way towards interoperability of clinical routine data

"Classical" AI workflow
[Diagram, built up over four slides. Stages: Data Acquisition, data (D), Representation, Reasoning, Output. In later builds the stages split into two branches, A and B: first Reasoning and Output, then Representation, and finally Data Acquisition, which yields two data sets DA and DB.]

Data reliability / Data interoperability
[Diagram: Data Acquisition A and Data Acquisition B yield DA and DB. Scale from high (DA = DB) to low.]

Data reliability / Data interoperability: unstructured representation vs. structured representation
[Diagram: Interpretation A and Interpretation B of an unstructured text yield structured representations DA and DB. Scale from high (DA = DB) to low.]

Focus of the talk
Structured extracts from unstructured clinical data: reliability and interoperability
- Empirical study on inter-annotator agreement
- Analysis of examples for inter-annotator disagreement
- Mechanisms to improve agreement
  - better data reliability
  - better interoperability
  - better training data
  - better gold standards

Annotating clinical narratives with SNOMED CT

[Diagram labels. Coding: observation, observed phenomena (configurations), map, metadata. Vocabulary. Annotation: symbolic representation (text), symbols, map, metadata, configurations.]

SNOMED CT
- Huge clinical reference terminology
- Representable as OWL EL
- (Quasi-)ontological definitional and qualifying axioms
- eHealth standard, maintained by transnational SDO
- Multiple hierarchies
- ~300,000 "concepts"
- Preferred terms and synonyms in several languages
- Covers disorders, procedures, body parts, substances, devices, organisms, qualities

Annotation: Sources of complexity

Clinical narrative
- sequence of tokens
- syntactic structures
- relations at various levels
Sources of complexity: compactness, agrammaticality, short forms, implicit contexts

Map
Sources of complexity: best text span to annotate? Naive or analytic annotation?

SNOMED CT
- Ontology: entities, codes; relations; logical constructors; axioms
- Terminology: preferred terms; synonyms; definitions
Sources of complexity: ill-defined concepts, similar concepts, pre-coordination vs. postcoordination, complex annotations (> 1 concept), degree of formality?

Examples: clinical text and candidate SNOMED CT concepts (FSNs)

Clinical text: "... the duodenum. The mucosa is ..."
Candidate concepts:
- 'Duodenal structure (body structure)'
- 'Duodenal mucous membrane structure (body structure)'
- 'Mucous membrane structure (body structure)'

Clinical text: "Hemorrhagic shock after RTA ..."
Candidate concepts:
- 'Traffic accident on public road (event)'
- 'Traffic accident on public road (event)', 'Renal tubular acidosis (disorder)'
- 'Traffic accident on public road (event)' or 'Renal tubular acidosis (disorder)'

Clinical text: "... travel history of ... suspected dengue"
Candidate concepts:
- 'Suspected dengue (situation)'
- 'Suspected (qualifier value)', 'Dengue (disorder)'

Coding / Annotation guidelines
Examples:
1. German coding guidelines for ICD and OPS: 171 pages
   http://www.dkgev.de/media/file/21502.Deutsche_Kodierrichtlinien_Version_2016.pdf
2. Using SNOMED CT in CDA models: 147 pages
   http://www.snomed.org/resource/resource/249
3. CHEMDNER-patents: annotation of chemical entities in patent corpus: annotation manual, 30 pages
   http://www.biocreative.org/media/store/files/2015/cemp_patent_guidelines_v1.pdf
4. CRAFT Concept Annotation guidelines: 47 pages
   http://bionlp-corpora.sourceforge.net/CRAFT/guidelines/CRAFT_concept_annotation_guidelines.pdf
5. Gene Ontology Annotation conventions: 7 pages
   http://geneontology.org/page/go-annotation-conventions
Complex rule sets, requiring intensive training

Annotation experiments in ASSESS-CT

ASSESS-CT: EU project on the fitness for purpose of SNOMED CT as a core reference terminology for the EU (www.assess-ct.eu)
- Feb 2015 to Jul 2016
- Scrutinising clinical, technical, financial, and organisational aspects of reference terminology introduction
- Summary of results: brochure published, scientific papers to appear
  http://assess-ct.eu/fileadmin/assess_ct/final_brochure/assessct_final_brochure.pdf

Annotation of clinical narratives
Comparing SNOMED CT vs. UMLS derived terminology

Resources
- Parallel corpus: 60 clinical text snippets from 6 languages, high diversity
- For each language: 2 annotators * 40 samples, 20 snippets annotated twice
- Annotators trained by webinars, following an annotation guideline (10 pages)

Annotation approach, e.g.
- chunking into noun phrases
- annotation of chunks by sets of codes
- give preference to maximally pre-coordinated codes
- understand the text and assign maximally specific codes

Sample text 1:
Nitroglycerin pump spray as required
Amantadine bds
Allopurinol 300 tablet every other day (last dose on 20091130)
Mefenamic acid 500 mg up to 3x daily for pain in conjunction with simultaneous administration of a drug to protect the stomach e. g. Pantoprazole 40mg.
Torasemide bds
Melperone 50 mg p. m.

SNOMED CT code sets shown beside sample text 1 (in slide order):
387404004;385074009;225761000
372763006;229799001
387135004;385055001;225760004
260528009
387185008;258684004;229798009;22253000
79970003;416118004;373517009;69695003
395821003;258684004
318034005;229799001
442519006;258684004;422133006

Sample text 2:
Intact teeth are in the mouth. Fractures are visible on the medians of Mandible and Maxilla the fragments are dislocated. Normal mucous membranes in mouth pharynx and on the larynx. Hyoid and thyroid cartilage are intact. Fragmental fractures of the two upper vertebrae of the cervical spine. Otherwise the cervical spine is intact. Oesophagus as well as trachea are torn at the lower end of the neck.

SNOMED CT code sets shown beside sample text 2 (in slide order):
11163003;245543004;123851003
263172003;263156006;123735002
17621005;33044003;71248005
21387005;52940008;11163003
13321001;207984009;207983003
122494005;11163003
262793000;282459005;261122009;123958008

Principal quantitative results (English text annotations)

Measure [95% CI]                                    SNOMED CT        Alternative
Concept coverage                                    .86 [.82-.88]    .88 [.86-.91]
Term coverage                                       .68 [.64-.70]    .73 [.69-.76]
Inter-annotator agreement (Krippendorff's Alpha)    .37 [.33-.41]    .36 [.32-.40]
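The agreement figures in this part of the talk are Krippendorff's Alpha values. The following is a minimal sketch of how Alpha can be computed for nominal labels, not the ASSESS-CT implementation. It assumes that each unit is one annotated text chunk and that each label is the complete code set a coder assigned to that chunk, compared as an opaque string. The slides do not state the exact unit definition or distance metric used in the study, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data.

    units: iterable of lists; each inner list holds the labels that the
    coders assigned to one unit. None marks a missing value. Units with
    fewer than two labels cannot be paired and are skipped.
    """
    # Coincidence matrix: every ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in the unit.
    coincidences = Counter()
    for labels in units:
        labels = [label for label in labels if label is not None]
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit with two or more labels")

    # Observed and expected disagreement (nominal metric: 0 if equal, 1 otherwise).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical example: two coders, four chunks, labels are code-set strings.
    sample_units = [
        ["11163003", "11163003"],
        ["11163003", "123851003"],
        ["123851003", "123851003"],
        ["123851003", "11163003"],
    ]
    print(krippendorff_alpha_nominal(sample_units))
```

Treating a whole code set as one nominal label is the strictest choice: two code sets that overlap in all but one code count as full disagreement. A set-based distance would be more lenient, but it is not what this sketch implements.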

Reference: Krippendorff, Klaus (2013). Content analysis: An introduction to its methodology, 3rd edition. Thousand Oaks, CA: Sage.

Agreement map: text annotations (English)
[Figure: agreement maps for SNOMED CT and the UMLS subset. Green: agreement. Yellow: only annotated by one coder. Red: disagreement.]

Systematic error analysis
- Creation of gold standard for SNOMED CT
- 20 English text samples annotated twice
- 208 NPs
- Analysis of English SNOMED CT annotations by two additional terminology experts
- Consensus finding, according to pre-established annotation guidelines
- Inspection, analysis and classification of text annotation disagreements
- Presentation of some disagreement cases for SNOMED CT

Reasons for disagreement

Human issues

Lack of domain knowledge / carelessness
Tokens: "IV"
Annotator #1: 'Structure of abductor hallucis muscle (body structure)'
Annotator #2: 'Abducens nerve structure (body structure)'
Gold standard: 'Abducens nerve structure (body structure)'

Retrieval error (synonym not recognised)
Tokens: "Glibenclamide"
Annotator #1: 'Glyburide (substance)'
Annotator #2: (no annotation)
Gold standard: 'Glyburide (substance)'

Non-compliance with annotation rules

Ontology issues (I)

Polysemy ("dot categories")*
Tokens: "Lymphoma"
Annotator #1: 'Malignant lymphoma (disorder)'
Annotator #2: 'Malignant lymphoma category (morphologic abnormality)'
Gold standard: 'Malignant lymphoma (disorder)'

"Pseudo-polysemy", incomplete definitions
Tokens: "Former Smoker"
Annotator #1: 'In the past (qualifier value)', 'Smoker (finding)'
Annotator #2: 'History of (contextual qualifier) (qualifier value)', 'Smoker (finding)'
Gold standard: 'Ex-smoker (finding)'

*Alexandra Arapinis, Laure Vieu: A plea for complex categories in ontologies. Applied Ontology 10(3-4): 285-296 (2015)

Ontological issues (II)

Normal findings, incomplete definitions
Tokens: "Motor: normal bulk and tone"
Annotator #1: 'Skeletal muscle structure (body structure)', 'Normal (qualifier value)'
Annotator #2: 'Muscle finding (finding)', 'Normal (qualifier value)'
Gold standard: 'Skeletal muscle normal (finding)'

Fuzziness of qualifiers
Tokens: "Significant bleeding"
Annotator #1: 'Significant (qualifier value)', 'Bleeding (finding)'
Annotator #2: 'Severe (severity modifier) (qualifier value)', 'Bleeding (finding)'
Gold standard: 'Moderate (severity modifier) (qualifier value)', 'Bleeding (finding)'

Interface term (synonym) issues

Tokens: "Blood extravasation"
Annotator #1: 'Blood (substance)', 'Extravasation (morphologic abnormality)'
Annotator #2: 'Hemorrhage (morphologic abnormality)'
Gold standard: 'Hemorrhage (morphologic abnormality)'
Interface term: "extravasation of blood"

Tokens: "anxious"
Annotator #1: 'Anxiety (finding)'
Annotator #2: 'Worried (finding)'
Gold standard: 'Anxiety (finding)'
Interface term: "anxious cognitions"

Language issues

Ellipsis / anaphora
- "Cold and wind are provoking factors." (provoking factors for angina)
- "These ailments have substantially increased since October 2013" (weakness)
- "No surface irregularities" (breast)
- "Significant bleeding" (intestinal bleeding)
Ambiguity of short forms
- "IV" (intravenous? Fourth intracranial nerve?)
Co-ordination
- "normal factors 5, 9, 10, and 11"
Scope of negation
- "no tremor, rigidity or bradykinesia"

Addressed by annotation guideline
Manageable by human annotators
Known challenges for NLP systems

Prevention and remediation of annotation disagreements

Prevention: annotation processes
- Training with continuous feedback

- Early detection of inter-annotator disagreement triggers guideline enforcement / guideline revision
Tooling
- Optimised concept retrieval (fuzzy, substring, synonyms)
- Guideline enforcement by appropriate tools
- Postcoordination support (complex syntactic expressions instead of grouping of concepts)
- Anti-patterns, e.g. avoid unrelated primitive concepts (?)

Prevention: improve terminology structure
- Fill gaps; equivalence axioms (reasoning)
- Self-explaining labels (FSNs), especially for qualifiers
- Scope notes / text definitions where necessary
- Manage polysemy
- Flag navigational and modifier concepts
Strengthen ontological foundations
- Upper-level ontology alignment
- Clear division between domain entities and information entities

- Overhaul problematic subhierarchies, especially qualifiers

Prevention: improve content maintenance
- Analysis of real data to support terminology maintenance process
- Harvest notorious disagreements between text passages and annotations from clinical datasets
- Compare concept frequency and concept co-occurrence between comparable institutions and users to detect imbalances
- Stimulate community processes for ontology-guided content evolution: crowdsourcing of interface terms by languages, dialects, specialties, user groups (separation of interface terminologies from reference terminologies is one of the ASSESS-CT recommendations)

Remediation of annotation disagreements

Exploit ontological dependencies / implications

Concept A: 'Mast cell neoplasm (disorder)'
Concept B: 'Mast cell neoplasm (morphologic abnormality)'
Dependency: A subclassOf AssociatedMorphology some B

Concept A: 'Isosorbide dinitrate (product)'
Concept B: 'Isosorbide dinitrate (substance)'
Dependency: A subclassOf HasActiveIngredient some B

Concept A: 'Palpation (procedure)'
Concept B: 'Palpation - action (qualifier value)'
Dependency: A subclassOf Method some B

Concept A: 'Blood pressure taking (procedure)'
Concept B: 'Blood pressure (observable entity)'
Dependency: A subclassOf hasOutcome some B

Concept A: 'Increased size (finding)'
Concept B: 'Increased (qualifier value)'
Dependency: A subclassOf isBearerOf some B

Concept A: 'Finding of heart rate (finding)'
Concept B: 'Heart rate (observable entity)'
Dependency: A subclassOf Interprets some B

Experiment
Gold standard expansion:
- Step 1: include concepts linked by attributive relations: A subclassOf Rel some B
- Step 2: include additional first-level taxonomic relations: A subclassOf B

Results (language of text sample: English)

Gold standard expansion    F measure
no expansion               0.28
expansion step 1           0.28
expansion step 2           0.29

- Only insignificant improvement
- Possibly due to missing relations in SNOMED CT, e.g. haemorrhage - blood

Conclusion (I)
- Low inter-annotator agreement limits successful use of clinical terminologies / ontologies
  - for manual annotation scenarios
  - for benchmarking of NLP-based annotations
  - for optimised training data for ML
- Structured data are essential for many intelligent systems, but unreliable information extracted from clinical narratives raises patient safety issues when used for decision support

Conclusion (II)
- Prevention of disagreements
  - Education, tooling, guideline support
  - Terminology content improvement: labelling, scope notes, ontological clarity, full definitions, community processes
  - High coverage interface terminologies
- Remediation of disagreements
  - So far no clear evidence of ontology-based resolution of agreement issues
  - Big data approaches?

Conclusion (III)
- R & D required: "learning systems" for improvement of terminology content / structure / tooling. Clinical "big data" is an underused resource
- Harmonization of annotation guideline creation and validation efforts
- Formulate and enforce good quality criteria for clinical terminologies used as annotation vocabularies

- Better ontological underpinning of clinical terminologies
- Ontologically founded patterns for recurring clinical documentation tasks: information extraction rather than concept mapping*

*Martínez-Costa C et al. Semantic enrichment of clinical models towards semantic interoperability. JAMIA 2015 May;22(3):565-76

Thanks for your attention
Slides will be accessible via purl.org/steschu

Acknowledgements: ASSESS CT team: Jose Antonio Miñarro-Giménez, Catalina Martínez-Costa, Daniel Karlsson, Kirstine Rosenbeck Gøeg, Kornél Markó, Benny Van Bruwaene, Ronald Cornet, Marie-Christine Jaulent, Päivi Hämäläinen, Heike Dewenter, Reza Fathollah Nejad, Sylvia Thun, Veli Stroetmann, Dipak Kalra

Contact: [email protected]

Vibhu Agarwal, Tanya Podchiyska, Juan M. Banda, Veena Goel, Tiffany I. Leung, Evan P. Minty, Timothy E. Sweeney, Elsie Gyang, Nigam H. Shah: Learning statistical models of phenotypes using noisy labeled training data. JAMIA 23(6): 1166-1173 (2016)
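The gold standard expansion experiment from the remediation part of the talk can be illustrated with a small sketch. It is not the ASSESS-CT evaluation code: the slides do not say how matches were counted, so the scoring rule below (an annotated concept is accepted if it equals a gold concept or one of its expansions) is an assumption, and all function and parameter names are hypothetical. The relation tables are plain dictionaries. The attribute example is taken from the dependency slide, while the taxonomic parent used in the usage example is a placeholder.

```python
def expansions(concept, attribute_targets, parents, steps):
    """Concepts accepted as equivalent to one gold standard concept.

    steps = 0: the concept itself only (no expansion)
    steps = 1: plus concepts linked by attributive relations
               (A subclassOf Rel some B)
    steps = 2: plus first-level taxonomic parents (A subclassOf B)
    """
    accepted = {concept}
    if steps >= 1:
        accepted |= attribute_targets.get(concept, set())
    if steps >= 2:
        accepted |= parents.get(concept, set())
    return accepted


def f_measure(predicted, gold, attribute_targets, parents, steps):
    """F measure of an annotation against an optionally expanded gold standard.

    Assumed scoring: precision counts predicted concepts that fall into any
    expanded gold set; recall counts gold concepts for which at least one
    accepted concept was predicted.
    """
    if not predicted or not gold:
        return 0.0
    accepted = {g: expansions(g, attribute_targets, parents, steps) for g in gold}
    all_accepted = set().union(*accepted.values())
    precision = len(predicted & all_accepted) / len(predicted)
    recall = sum(1 for g in gold if predicted & accepted[g]) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Attributive relation from the dependency slide.
    attribute_targets = {
        "Mast cell neoplasm (disorder)": {
            "Mast cell neoplasm (morphologic abnormality)"
        },
    }
    # Placeholder taxonomy, not taken from SNOMED CT.
    parents = {"child concept": {"parent concept"}}

    gold = {"Mast cell neoplasm (disorder)"}
    predicted = {"Mast cell neoplasm (morphologic abnormality)"}
    for steps in (0, 1, 2):
        print(steps, f_measure(predicted, gold, attribute_targets, parents, steps))
```

In this sketch the disorder/morphology disagreement is resolved by step 1 because the relation is present in the table. Where the terminology lacks a relation, as the talk suggests for haemorrhage and blood, no expansion step can help.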
