【Question title】: POS tagger in Python without NLTK
【Posted】: 2018-12-11 22:48:57
【Question description】:

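I am trying to build a part-of-speech tagger for determiners and prepositions in Sorani Kurdish. I am using the following code to put a tag after every preposition or determiner in my Kurdish text.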

import os
SOR = open("SOR-1.txt", "r+", encoding = 'utf-8')
old_text = SOR.read()
punkt = [".", "!", ",", ":", ";"]
text = ""
for i in old_text:
    if i in punkt:
        text+=" "+i
    else:
        text += i

d = {"DET":["ئێمە" , "ئێوە" , "ئەم" , "ئەو" , "ئەوان" , "ئەوەی", "چەند" ], "PREP":["بۆ","بێ","بێجگە","بە","بەبێ","بەدەم","بەردەم","بەرلە","بەرەوی","بەرەوە","بەلای","بەپێی","تۆ","تێ","جگە","دوای","دەگەڵ","سەر","لێ","لە","لەبابەت","لەباتی","لەبارەی","لەبرێتی","لەبن","لەبەینی","لەبەر","لەدەم","لەرێ","لەرێگا","لەرەوی","لەسەر","لەلایەن","لەناو","لەنێو","لەو","لەپێناوی","لەژێر","لەگەڵ","ناو","نێوان","وەک","وەک","پاش","پێش","" ], "punkt":[".", ",", "!"]}

text = text.split()
for w in text:
    for pos in d:
        if w in d[pos]:
            SOR.write(w+"/"+pos+" ")
SOR.close()

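What I want is to add the POS tag inside the text, directly after each word that is defined in the dictionary. Instead, the result is a separate list of words and POS tags at the end of the file.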

【Comments】:

    Tags: python nlp pos-tagger


    【Solution 1】:

    Remember that old_text is a string. So when you loop over it like this:

    for i in old_text:
        if i in punkt:
    

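    you are looping over characters. I think you meant to loop over the lines of old_text instead. If so, you can open the file in read-write mode with a with statement. For example: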

    punkt = [".", "!", ",", ":", ";"]   # same list as in the question
    with open("SOR-1.txt", 'r+', encoding='utf-8') as f:
        old_text = f.readlines()
        for line in old_text:
            line = line.rstrip('\n')    # every line read from the file ends with a newline character '\n'
            for punctuation_mark in punkt:
                if punctuation_mark in line:
                    pass                # give more instructions
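
To get the tag directly after each word, the per-line loop can rebuild the line token by token. The following is only a minimal sketch, not the asker's code: `tag_line` and `build_lookup` are hypothetical helper names, and the dictionary here is a shortened stand-in for the full word lists in the question.

```python
import re

def build_lookup(tags):
    """Invert {tag: [words]} into {word: tag} for direct lookups."""
    return {word: tag for tag, words in tags.items() for word in words if word}

def tag_line(line, word_to_tag):
    """Return the line with /TAG appended to every token found in word_to_tag."""
    # Put a space before punctuation so it splits off as its own token.
    spaced = re.sub(r"([.!,:;])", r" \1", line)
    tagged = []
    for token in spaced.split():
        tag = word_to_tag.get(token)
        tagged.append(f"{token}/{tag}" if tag else token)
    return " ".join(tagged)

# Shortened example dictionary (assumption: the real one is the `d` from the question).
d = {"DET": ["ئەم", "ئەو"], "PREP": ["بۆ", "لە"]}
lookup = build_lookup(d)
print(tag_line("ئەم کتێب بۆ تۆ.", lookup))
```

Words that are not in the dictionary are kept unchanged, so the output is the original sentence with tags attached only to the listed words.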
    

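    【Discussion】:

    • Thanks, but there is still no tag after each word in my text. I have defined "DET" and "PREP" as my tags, but they still appear together with the dictionary words at the end of my text rather than inside it. I just need a tag after each defined word.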
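
A likely reason the tags still land at the end: after `read()` or `readlines()` the file position is at the end of the file, so any `write()` on a file opened with `'r+'` is appended after the old text. One way around this, sketched here with a hypothetical helper `rewrite_in_place` and assuming the whole file fits in memory, is to build the new text first, seek back to the start, write it, and truncate. The demo uses a temporary file and `str.upper` as a stand-in for the tagging function.

```python
import os
import tempfile

def rewrite_in_place(path, transform):
    """Apply transform to every line of the file and overwrite the file with the result."""
    with open(path, "r+", encoding="utf-8") as f:
        lines = f.readlines()
        # Every output line gets a trailing newline, including the last one.
        new_text = "".join(transform(line.rstrip("\n")) + "\n" for line in lines)
        f.seek(0)       # go back to the start; otherwise write() appends after the old text
        f.write(new_text)
        f.truncate()    # drop leftover content if the new text is shorter than the old

# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("a b\nc d\n")
    rewrite_in_place(path, str.upper)
    with open(path, encoding="utf-8") as f:
        result = f.read()
```

Passing a function that tags one line at a time as `transform` would then write the tagged text over the original instead of appending to it.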