7.8   Further Reading

Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the web. For more examples of chunking with NLTK, please see the Chunking HOWTO at http://www.nltk.org/howto.

The popularity of chunking is due in great part to pioneering work by Abney; see, e.g., (Church, Young, & Bloothooft, 1996). Abney's Cass chunker is described in http://www.vinartus.net/spa/97a.pdf.

The word chink initially meant a sequence of stopwords, according to a 1975 paper by Ross and Tukey (Church, Young, & Bloothooft, 1996).

The IOB format (sometimes called BIO format) was developed for NP chunking by Ramshaw & Marcus (1995), and was used for the shared NP bracketing task run by the Conference on Natural Language Learning (CoNLL) in 1999. The same format was adopted by CoNLL 2000 for annotating a section of Wall Street Journal text as part of a shared task on NP chunking.
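As a reminder of what the IOB format encodes, here is a minimal sketch in plain Python. The helper `iob_to_chunks` is hypothetical and is not part of NLTK; the example sentence and its tags are illustrative only. Each token carries a tag that marks it as beginning a chunk (`B-`), continuing a chunk (`I-`), or lying outside any chunk (`O`).

```python
# Hypothetical helper (not an NLTK function): decode IOB tags into chunks.
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into (chunk_type, [words]) pairs."""
    chunks = []
    current_type, current_words = None, []
    for word, _pos, iob in tagged:
        # A chunk starts at a B- tag, or at an I- tag whose type differs
        # from the chunk currently being built.
        starts_new = iob.startswith("B-") or (
            iob.startswith("I-") and iob[2:] != current_type
        )
        if iob == "O" or starts_new:
            # Close the chunk in progress, if any.
            if current_words:
                chunks.append((current_type, current_words))
            current_type, current_words = None, []
        if iob != "O":
            if not current_words:
                current_type = iob[2:]
            current_words.append(word)
    # Flush a chunk that runs to the end of the sentence.
    if current_words:
        chunks.append((current_type, current_words))
    return chunks


# Illustrative sentence with part-of-speech and IOB chunk tags.
sentence = [
    ("We", "PRP", "B-NP"),
    ("saw", "VBD", "O"),
    ("the", "DT", "B-NP"),
    ("yellow", "JJ", "I-NP"),
    ("dog", "NN", "I-NP"),
]

print(iob_to_chunks(sentence))
# [('NP', ['We']), ('NP', ['the', 'yellow', 'dog'])]
```

NLTK's own chunk module offers conversions between chunk trees and IOB-tagged triples; the sketch above only shows the idea behind the tag scheme.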

Section 13.5 of Jurafsky & Martin (2008) contains a discussion of chunking. Chapter 22 of the same book covers information extraction, including named entity recognition. For information about text mining in biology and medicine, see (Ananiadou & McNaught, 2006).
