Overview of BioNLP’09 Shared Task on Event Extraction
Jin-Dong Kim, Tomoko Ohta, Sampo Pyysalo, Yoshinobu Kano, Jun’ichi Tsujii†‡
Department of Computer Science, University of Tokyo, Tokyo, Japan
†School of Computer Science, University of Manchester, Manchester, UK
‡National Centre for Text Mining, University of Manchester, Manchester, UK
{jdkim,okap,smp,kano,tsujii}@is.s.u-tokyo.ac.jp
The introduction below is quoted from the above paper. For further information, refer to the full text or to http://www.nactem.ac.uk/tsujii/GENIA/SharedTask/
The history of text mining (TM) shows that shared tasks based on carefully curated resources, such as those organized in the MUC (Chinchor, 1998), TREC (Voorhees, 2007) and ACE (Strassel et al., 2008) events, have significantly contributed to the progress of their respective fields. This has also been the case in bio-TM. Examples include the TREC Genomics track (Hersh et al., 2007), JNLPBA (Kim et al., 2004), LLL (Nédellec, 2005), and BioCreative (Hirschman et al., 2007). While the first two addressed bio-IR (information retrieval) and bio-NER (named entity recognition), respectively, the last two focused on bio-IE (information extraction), seeking relations between bio-molecules. With the emergence of NER systems with performance capable of supporting practical applications, the recent interest of the bio-TM community is shifting toward IE. Similarly to LLL and BioCreative, the BioNLP’09 Shared Task (the BioNLP task, hereafter) also addresses bio-IE, but takes a definitive step further toward finer-grained IE.
While LLL and BioCreative focus on a rather simple representation of relations of bio-molecules, i.e. protein-protein interactions (PPI), the BioNLP task concerns the detailed behavior of bio-molecules, characterized as bio-molecular events (bio-events). The difference in focus is motivated in part by different applications envisioned as being supported by the IE methods. For example, BioCreative aims to support curation of PPI databases such as MINT (Chatr-aryamontri et al., 2007), for a long time one of the primary tasks of bioinformatics. The BioNLP task aims to support the development of more detailed and structured databases, e.g. pathway (Bader et al., 2006) or Gene Ontology Annotation (GOA) (Camon et al., 2004) databases, which are gaining increasing interest in bioinformatics research in response to recent advances in molecular biology.
As the first shared task of its type, the BioNLP task aimed to define a bounded, well-defined bio-event extraction task, considering both the actual needs and the state of the art in bio-TM technology, and to pursue it as a community-wide effort. The key challenge was in finding a good balance between the utility and the feasibility of the task, which was also limited by the resources available. Special consideration was given to providing evaluation at diverse levels and aspects, so that the results can drive continuous efforts in relevant directions. The paper discusses the design and implementation of the BioNLP task, and reports the results with analysis.
The BioNLP task targets semantically rich event extraction, involving the extraction of several different classes of information. To facilitate evaluation on different aspects of the overall task, the task is divided into three sub-tasks addressing event extraction at different levels of specificity.
Task 1. Core event detection: detection of typed, text-bound events and assignment of given proteins as their primary arguments.
Task 2. Event enrichment: recognition of secondary arguments that further specify the events extracted in Task 1.
Task 3. Negation/Speculation detection: detection of negations and speculation statements concerning extracted events.
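To make the three levels concrete, the following Python sketch shows one way an extracted event could be held in memory so that each sub-task reads a different part of it. This is an illustration only, not the shared task's annotation format: the class and function names (TextSpan, Event, find_span, fields_for_task), the role labels "Theme" and "Location", the event type label, and the example sentence are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TextSpan:
    """A text-bound annotation: character offsets plus the covered text."""
    start: int
    end: int
    text: str


@dataclass
class Event:
    """Hypothetical container for one extracted event."""
    event_type: str                                # Task 1: the event's type
    trigger: TextSpan                              # Task 1: text the event is bound to
    primary_args: List[Tuple[str, TextSpan]]       # Task 1: given proteins as arguments
    secondary_args: List[Tuple[str, TextSpan]] = field(default_factory=list)  # Task 2
    negated: bool = False                          # Task 3
    speculated: bool = False                       # Task 3


def find_span(sentence: str, phrase: str) -> TextSpan:
    """Locate the first occurrence of phrase and return it as a span."""
    start = sentence.index(phrase)  # raises ValueError if the phrase is absent
    return TextSpan(start, start + len(phrase), phrase)


def fields_for_task(event: Event, task: int) -> Dict[str, object]:
    """Return the parts of an event that a given sub-task is concerned with.

    Simplifying assumption: Tasks 2 and 3 each add one layer on top of the
    Task 1 core, and are shown independently of each other.
    """
    view: Dict[str, object] = {
        "type": event.event_type,
        "trigger": event.trigger.text,
        "primary": [(role, span.text) for role, span in event.primary_args],
    }
    if task == 2:
        view["secondary"] = [(role, span.text) for role, span in event.secondary_args]
    elif task == 3:
        view["negated"] = event.negated
        view["speculated"] = event.speculated
    return view


# Invented example sentence, used only to exercise the structure.
sentence = "Phosphorylation of TRAF2 in the cytoplasm was not observed."
event = Event(
    event_type="Phosphorylation",
    trigger=find_span(sentence, "Phosphorylation"),
    primary_args=[("Theme", find_span(sentence, "TRAF2"))],
    secondary_args=[("Location", find_span(sentence, "cytoplasm"))],
    negated=True,
)

for task in (1, 2, 3):
    print(task, fields_for_task(event, task))
```

Keeping the core fields separate from the secondary arguments and the negation/speculation flags mirrors the division above: a system could fill in only the Task 1 fields and still produce a complete record at that level.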