【Question title】: Pattern table to Pandas DataFrame
【Posted】: 2017-10-20 11:40:54
【Question description】:

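I'm using the Python "Pattern.en" package, which gives me the subject, object and other details of a given sentence.

But I want to store this output in another variable or in a DataFrame for further processing, and I haven't been able to do so.

Any input on this would be helpful.

Sample code is below for reference.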

from pattern.en import parse
from pattern.en import pprint
import pandas as pd

input = parse('I want to go to the Restaurant as I am hungry very much')
print(input)    
I/PRP/B-NP/O want/VBP/B-VP/O to/TO/I-VP/O go/VB/I-VP/O to/TO/O/O the/DT/B-NP/O Restaurant/NNP/I-NP/O as/IN/B-PP/B-PNP I/PRP/B-NP/I-PNP am/VBP/B-VP/O hungry/JJ/B-ADJP/O very/RB/I-ADJP/O much/JJ/I-ADJP/O

pprint(input)

      WORD   TAG    CHUNK    ROLE   ID     PNP    LEMMA                                                
         I   PRP    NP       -      -      -      -       
      want   VBP    VP       -      -      -      -       
        to   TO     VP ^     -      -      -      -       
        go   VB     VP ^     -      -      -      -       
        to   TO     -        -      -      -      -       
       the   DT     NP       -      -      -      -       
Restaurant   NNP    NP ^     -      -      -      -       
        as   IN     PP       -      -      PNP    -       
         I   PRP    NP       -      -      PNP    -       
        am   VBP    VP       -      -      -      -       
    hungry   JJ     ADJP     -      -      -      -       
      very   RB     ADJP ^   -      -      -      -       
      much   JJ     ADJP ^   -      -      -      -       

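Note the output of the print and pprint statements. I'm trying to store either of them in a variable. It would be better if I could store the output of the pprint statement in a DataFrame, since it is printed in tabular format.

But when I try to do that, I get the error below: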

df = pd.DataFrame(input)

ValueError: DataFrame constructor not properly called!
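The error itself comes from what is being passed in: `print(input)` shows that `parse()` produces one tagged string, and `pd.DataFrame` rejects a bare string because it treats it as a scalar rather than as rows or columns. The sketch below reproduces this with a plain Python string standing in for the parse output (an assumption: it only mimics the printed format, not Pattern's own string type). Wrapping the string in a list is accepted, but it only gives a single cell; getting one row per token requires splitting the string into fields first.

```python
import pandas as pd

# A plain str standing in for the printed output of parse().
text = "I/PRP/B-NP/O want/VBP/B-VP/O"

try:
    pd.DataFrame(text)  # a bare string is treated as a scalar and rejected
except ValueError as exc:
    print("ValueError:", exc)

# Wrapping the string in a list is accepted, but yields one cell, not a table.
one_cell = pd.DataFrame([text], columns=["tagged"])
print(one_cell.shape)
```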

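【Question discussion】:

  • Looks pretty basic; have you read the Pandas documentation? pandas.pydata.org/pandas-docs/stable/generated/… Your error says you are not calling the constructor correctly, and that does seem to be the case.
  • Thanks @Jacob. But my question is not how to fix the error I'm getting; it's how to store the output of the pattern.en package in a variable or a DataFrame. So if you have any ideas on that, please let me know. Hopefully this isn't a basic question; if you agree it isn't, you could reconsider removing the downvote.

Tags: python python-3.x pandas linguistics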
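Since `print(input)` shows the parse result as space-separated tokens of the form word/POS/CHUNK/PNP, one workaround that skips pprint entirely is to split that string yourself. This is a minimal sketch assuming exactly that four-field layout (the default `parse()` call used in the question); `tagged_string_to_df` is a hypothetical helper name, and the input is the string printed above, so the sketch runs without Pattern installed.

```python
import pandas as pd

def tagged_string_to_df(tagged, columns=("word", "pos", "chunk", "pnp")):
    """Hypothetical helper: split a 'word/POS/CHUNK/PNP ...' string into rows."""
    rows = []
    for token in tagged.split():  # tokens are separated by whitespace
        # rsplit from the right so a '/' inside the word stays in the first field
        rows.append(token.rsplit("/", len(columns) - 1))
    return pd.DataFrame(rows, columns=list(columns))

# The tagged string exactly as print(input) showed it.
tagged = ("I/PRP/B-NP/O want/VBP/B-VP/O to/TO/I-VP/O go/VB/I-VP/O to/TO/O/O "
          "the/DT/B-NP/O Restaurant/NNP/I-NP/O as/IN/B-PP/B-PNP I/PRP/B-NP/I-PNP "
          "am/VBP/B-VP/O hungry/JJ/B-ADJP/O very/RB/I-ADJP/O much/JJ/I-ADJP/O")

df = tagged_string_to_df(tagged)
print(df)
```

Unlike pprint's table, the chunk column here keeps Pattern's IOB prefixes (B-NP, I-NP) rather than the NP / NP ^ notation.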


【Solution 1】:

Based on the source code of the table function, I came up with this:

from pattern.en import parse
from pattern.text.tree import WORD, POS, CHUNK, PNP, REL, ANCHOR, LEMMA, IOB, ROLE, MBSP, Text
import pandas as pd

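def sentence2df(sentence, placeholder="-"):
    """Return a DataFrame with one row per token of a pattern Sentence, laid out like pprint's table."""
    # Standard tag columns first, then any custom tags the sentence carries.
    tags  = [WORD, POS, IOB, CHUNK, ROLE, REL, PNP, ANCHOR, LEMMA]
    tags += [tag for tag in sentence.token if tag not in tags]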
    def format(token, tag):
        # Returns the token tag as a string.
        if   tag == WORD   : s = token.string
        elif tag == POS    : s = token.type
        elif tag == IOB    : s = token.chunk and (token.index == token.chunk.start and "B" or "I")
        elif tag == CHUNK  : s = token.chunk and token.chunk.type
        elif tag == ROLE   : s = token.chunk and token.chunk.role
        elif tag == REL    : s = token.chunk and token.chunk.relation and str(token.chunk.relation)
        elif tag == PNP    : s = token.chunk and token.chunk.pnp and token.chunk.pnp.type
        elif tag == ANCHOR : s = token.chunk and token.chunk.anchor_id
        elif tag == LEMMA  : s = token.lemma
        else               : s = token.custom_tags.get(tag)
        return s or placeholder

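    # One list per tag, each holding that tag's value for every token.
    columns = [[format(token, tag) for token in sentence] for tag in tags]
    # Fold the IOB column into CHUNK: tokens inside a chunk get a " ^" suffix
    # (as in pprint's output), then drop the separate IOB column.
    columns[3] = [columns[3][i]+(iob == "I" and " ^" or "") for i, iob in enumerate(columns[2])]
    del columns[2]
    header = ['word', 'tag', 'chunk', 'role', 'id', 'pnp', 'anchor', 'lemma']+tags[9:]

    # Mirror table(): drop the anchor column unless the MBSP flag is set.
    if not MBSP:
        del columns[6]
        del header[6]

    # Transpose the list of columns into a list of rows (one per token).
    return pd.DataFrame(
        [[x[i] for x in columns] for i in range(len(columns[0]))],
        columns=header,
    )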
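The `return` statement above transposes the list of columns into rows before handing it to pandas. Because `pd.DataFrame` also accepts a dict that maps column names to equal-length lists, `pd.DataFrame(dict(zip(header, columns)), columns=header)` should be an equivalent ending. The sketch below checks that on toy `header`/`columns` values (made up for illustration, not produced by Pattern). It assumes the header names are unique, since duplicate keys would collapse in the dict.

```python
import pandas as pd

# Toy stand-ins for the `header` and `columns` built inside sentence2df.
header = ["word", "tag", "chunk"]
columns = [["I", "want", "to"], ["PRP", "VBP", "TO"], ["NP", "VP", "VP ^"]]

# Approach used in the answer: transpose the columns into rows.
by_rows = pd.DataFrame(
    [[col[i] for col in columns] for i in range(len(columns[0]))],
    columns=header,
)

# Alternative: map each header name straight to its column list.
by_cols = pd.DataFrame(dict(zip(header, columns)), columns=header)

print(by_rows.to_dict("list") == by_cols.to_dict("list"))
```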

Usage

>>> string = parse('I want to go to the Restaurant as I am hungry very much')
>>> sentence = Text(string, token=[WORD, POS, CHUNK, PNP])[0]
>>> df = sentence2df(sentence)
>>> print(df)
          word  tag   chunk role id  pnp lemma
0            I  PRP      NP    -  -    -     -
1         want  VBP      VP    -  -    -     -
2           to   TO    VP ^    -  -    -     -
3           go   VB    VP ^    -  -    -     -
4           to   TO       -    -  -    -     -
5          the   DT      NP    -  -    -     -
6   Restaurant  NNP    NP ^    -  -    -     -
7           as   IN      PP    -  -  PNP     -
8            I  PRP      NP    -  -  PNP     -
9           am  VBP      VP    -  -    -     -
10      hungry   JJ    ADJP    -  -    -     -
11        very   RB  ADJP ^    -  -    -     -
12        much   JJ  ADJP ^    -  -    -     -
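Once the sentence is in a DataFrame, the "further processing" the question asks about is ordinary pandas work. As an illustration, this sketch rebuilds the word and tag columns from the output above by hand, so it runs without Pattern. It then selects the nouns by their Penn Treebank NN* tags and counts how often each tag occurs. The filtering choices are only examples.

```python
import pandas as pd

# Values copied from the word and tag columns printed above.
df = pd.DataFrame({
    "word": ["I", "want", "to", "go", "to", "the", "Restaurant",
             "as", "I", "am", "hungry", "very", "much"],
    "tag":  ["PRP", "VBP", "TO", "VB", "TO", "DT", "NNP",
             "IN", "PRP", "VBP", "JJ", "RB", "JJ"],
})

# Nouns: every tag starting with "NN" (NN, NNS, NNP, NNPS).
nouns = df.loc[df["tag"].str.startswith("NN"), "word"].tolist()

# How often each part-of-speech tag occurs in the sentence.
tag_counts = df["tag"].value_counts()

print(nouns)
print(tag_counts)
```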

【Discussion】:

  • Wow. Awesome. You're great, @pacholik