【Posted】: 2017-10-18 02:08:05
【Question】:
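As I understand it from the POS tagging example in the jcrfsuite examples, the training file is tab-delimited and the first token on each line is the label. What I don't understand are the BigCluster| entries. Could someone explain how I should specify the tokens in the training file?
Example below: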
o bigcluster|00 bigcluster|000000 bigcluster|00000000 bigcluster|0000000000 bigcluster|000000000000 bigcluster|00000000000000 bigcluster|0000000000000000 nextbigcluster|0100 nextbigcluster|01000101 nextbigcluster|01000101111 Postagdict|d postagdict|^ postagdict|^ postagdict|^ postagdict|^ postagdict|^ postagdict||G NextPOSTag|V 1gramSuff|i 1gramPref|i prevword| prevcurr||i nextword|predict nextword|predict currnext|i|predict Word|I Lower|i Xxdshape|X charclass|1, first-shortcap prevnext||predict t=0
Test file format:
! bigcluster|01 bigcluster|0110 bigcluster|01101100 bigcluster|0110110011 bigcluster|011011001100 bigcluster|01101100110000 bigcluster|01101100110000 bigcluster|0110110011000000 nextbigcluster|1000 nextbigcluster|10001000 nextbigcluster|10001000000 mnn 4gramSuff|mmnn 5gramSuff|mmmnn 6gramSuff|ammmnn 7gramSuff|aammmnn 8gramSuff|aaammmnn 9gramSuff|daaammmnn 1gramPref|d 2gramPref|da 3gramPref|daa 4gramPref|daaa 5gramPref|daaam 6gramPref|daaamm 7gramPref|daaammm|daaammm 8gramPref|daammn prevword| prevcurr||daaammmnn nextword|. nextword|. currnext|daaammmnn|. Word|Daaammmnn Lower|daaammmnn Xxdshape|Xxxxxxxxx charclass|1,2,2,2,2,2,2,2,2, first-initcap prevnext||. t=0
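To make the question concrete, here is a minimal sketch of how I read one such line. It assumes the CRFsuite convention that a line is tab-separated, its first field is the label, and every remaining field is an opaque attribute string. Under that assumption an entry like BigCluster|00 would be a single attribute that joins a feature name and its value with a `|`. The class `CrfLineSketch` and its methods are hypothetical helpers for illustration only and are not part of jcrfsuite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrfLineSketch {

    /** Label plus the raw attribute strings of one token line. */
    public static final class Item {
        public final String label;
        public final List<String> attributes;

        Item(String label, List<String> attributes) {
            this.label = label;
            this.attributes = attributes;
        }
    }

    /** Splits one line on tabs: the first field is the label, the rest are attributes. */
    public static Item parseLine(String line) {
        String[] fields = line.split("\t");
        List<String> attrs =
                new ArrayList<>(Arrays.asList(fields).subList(1, fields.length));
        return new Item(fields[0], attrs);
    }

    /** Splits "name|value" at the first '|'; the value is empty when there is no '|'. */
    public static String[] splitAttribute(String attr) {
        int bar = attr.indexOf('|');
        if (bar < 0) {
            return new String[] { attr, "" };
        }
        return new String[] { attr.substring(0, bar), attr.substring(bar + 1) };
    }

    public static void main(String[] args) {
        // A shortened line built from fields that appear in the training sample.
        String line = String.join("\t",
                "o", "NextPOSTag|V", "Word|I", "Lower|i", "prevcurr||i", "t=0");
        Item item = parseLine(line);
        System.out.println("label = " + item.label);
        for (String attr : item.attributes) {
            String[] nameValue = splitAttribute(attr);
            System.out.println(nameValue[0] + " -> " + nameValue[1]);
        }
    }
}
```

Splitting at the first `|` is a guess about how the sample attributes were built. If the trainer treats each field as one opaque string, the split matters only for reading the file by eye, not for training.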
【Discussion】:
Tags: java machine-learning crfsuite