ICON 2018 - Tutorial

Natural Language Processing and Biomedical Text


Biological data are being generated and collected at an unprecedented speed and scale. For example, the adoption of electronic health records (EHRs) is producing large volumes of documented patient data. Automatically extracting different types of knowledge from authoritative biomedical texts, e.g., scientific medical literature and electronic health records, and representing it in a form that is both computer-analyzable and human-readable is an important but challenging goal. The ability to query and use such extracted knowledge bases can help scientists, doctors and other users with tasks such as question answering, diagnosis, exploring and validating hypotheses, understanding the state of the art, and identifying opportunities for new research. This tutorial covers a range of research problems in the biomedical domain that can be addressed using natural language processing. We will also discuss the challenges these problems pose and possible solutions to them.


This tutorial spans five parts, followed by a discussion, as outlined below:

Tutorial Outline

  1. Part-1: Types of Biomedical Text and Their Sources (20 Slides, 30 Minutes)
    1. Scientific Medical Literature
    2. Electronic Health Record
    3. Patient Related Textual Data
    4. Online Discussion Forums (Twitter, Blogs, etc.)
  2. Part-2: Types of Knowledge in the Biomedical Text (30 Slides, 20 Minutes)
    1. Domain-specific entities, relations, events
    2. Cause-Effect relations
    3. Sentence classification into Introduction, Methods, Results and Discussion (IMRAD) types
    4. Numerical information
    5. High polarity sentences
    6. Comparisons with related work
  3. Part-3: Text Mining Approaches for Biomedical Text (30 Slides, 30 Minutes)
    1. Rule-based approaches (Sophisticated Regular Expressions)
    2. Machine Learning Approaches (Supervised, Semi-supervised, Unsupervised)
  4. Part-4: Knowledge Extraction Systems from Literature (50 Slides, 70 Minutes)
    1. Joint Extraction of Entities and Relations from Drug Labels using an Ensemble of Neural Networks
    2. An Unsupervised Approach for Cause-Effect Relation Extraction from Biomedical Text
  5. Part-5: Hands-on Session (10 Slides, 20 Minutes)
    1. Using ready-to-use NLP tools and libraries on biomedical text
      1. To Identify Biological Entities
      2. To Identify Semantic Relations among Entities
  6. Discussion (10 Minutes)

Technical Equipment Required:

  1. Laptop with Python and the Stanford CoreNLP package installed
  2. Internet Connection
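
To give a flavour of the rule-based approaches listed in Part-3, applied to the numerical information mentioned in Part-2, here is a minimal sketch in Python. It is not taken from the tutorial material: the pattern, the short list of units, the function name extract_doses and the example sentence are all assumptions made for illustration, and a real system would need a far richer set of rules.

```python
import re

# Illustrative pattern (an assumption, not the tutorial's own rule):
# a number, optional whitespace, then one of a few common dosage units.
DOSE_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml|mL|IU)\b"
)


def extract_doses(text):
    """Return a list of (value, unit) pairs for dosage-like mentions in text."""
    return [
        (float(match.group("value")), match.group("unit"))
        for match in DOSE_PATTERN.finditer(text)
    ]


# Made-up example sentence, used only to exercise the pattern.
sentence = "Patients received 500 mg of metformin twice daily and 2.5 mL of syrup."
print(extract_doses(sentence))
```

Such hand-written patterns are easy to inspect and need no training data, but they only cover the surface forms they were written for, which is one reason the outline pairs them with machine learning approaches.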