【Question title】: AttributeError: 'Example' object has no attribute 'text'
【Posted】: 2021-07-27 10:09:15
【Question description】:

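I've been trying to run BERT's example code on a different dataset. I used this website to implement the BERT model, and by following its instructions I got the code to run successfully. However, when I run the same code on my own dataset, I get the following error: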

```
model = BERT().to(device)
optimizer = optim.Adam(model.parameters(), lr=2e-5)

train(model=model, optimizer=optimizer)
``` 

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-57-e4474bff9c36> in <module>()
      2 optimizer = optim.Adam(model.parameters(), lr=2e-5)
      3 
----> 4 train(model=model, optimizer=optimizer)

5 frames
<ipython-input-45-4d0fa7f8acd5> in <lambda>(x)
     19 # Iterators
     20 
---> 21 train_iter = BucketIterator(train, batch_size=15, sort_key=lambda x: len(x.text),
     22                             device=device, train=True, sort=True, sort_within_batch=True)
     23 valid_iter = BucketIterator(valid, batch_size=32, sort_key=lambda x: len(x.text),

AttributeError: 'Example' object has no attribute 'text'
```

Here is a sample of my dataset:

```
         Label                                               Text
1178093      3                renal and urinary disorders common:
1170768      3                              orodispersible tablet
4339706      4  remotely manage the transmission bittorrent cl...
6513296      0                     what do you think of her, ay ?
7013664      0  how that could become a film is more than i ca...
```

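Following suggestions from several articles and StackOverflow questions, I also checked the data for empty rows, but there are none. These are the lines of code the error message points to: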

```
# Model parameter
MAX_SEQ_LEN = 128
PAD_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
UNK_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)

# Fields

label_field = Field(sequential=True, use_vocab=False, batch_first=True, dtype=torch.float)
text_field = Field(use_vocab=False, tokenize=tokenizer.encode, lower=False, include_lengths=False, batch_first=True,
                   fix_length=MAX_SEQ_LEN, pad_token=PAD_INDEX, unk_token=UNK_INDEX)
fields = [('Label', label_field), ('Text', text_field)]

# TabularDataset


train, valid, test = TabularDataset.splits(path=source_folder, train='train_bert.csv', validation='valid_bert.csv',
                                           test='test_bert.csv', format='CSV', fields=fields, skip_header=True)

# Iterators

train_iter = BucketIterator(train, batch_size=15, sort_key=lambda x: len(x.text),
                            device=device, train=True, sort=True, sort_within_batch=True)
valid_iter = BucketIterator(valid, batch_size=32, sort_key=lambda x: len(x.text),
                            device=device, train=True, sort=True, sort_within_batch=True)
test_iter = Iterator(test, batch_size=32, device=device, train=False, shuffle=False, sort=False)
```
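One likely explanation: in torchtext's legacy `data` API, each name in the `fields` list becomes an attribute on every `Example`. As far as I know, `Example.fromlist` does this with `setattr`, and when `fields` is a list, CSV columns are matched by position, so the header row does not rename anything. Because attribute lookup is case-sensitive, `fields = [('Label', ...), ('Text', ...)]` would produce examples with `x.Text`, not `x.text`. The sketch below uses a hypothetical `FakeExample` stand-in instead of torchtext itself. It runs without the library and only imitates that attribute-naming behaviour.

```python
class FakeExample:
    """Hypothetical stand-in imitating how torchtext's legacy Example
    stores fields: each (name, field) pair becomes an attribute `name`."""

    @classmethod
    def from_values(cls, values, fields):
        ex = cls()
        for (name, _field), value in zip(fields, values):
            setattr(ex, name, value)  # attribute name is taken verbatim
        return ex


# Same field names as in the question; the Field objects don't matter here.
fields = [("Label", None), ("Text", None)]
ex = FakeExample.from_values(["3", "orodispersible tablet"], fields)

print(hasattr(ex, "Text"))  # True
print(hasattr(ex, "text"))  # False: attribute names are case-sensitive

# A sort_key that uses the declared field name works:
sort_key = lambda x: len(x.Text)
print(sort_key(ex))
```

If this is the cause, the fix would be to use `len(x.Text)` in both `BucketIterator` calls, or to rename the fields to lowercase `'label'`/`'text'`. The training loop presumably reads batches through the same names (for example `batch.Text` rather than `batch.text`), so it should be kept consistent too.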

I've been using Google Colab. This may be a beginner question, but I've been stuck on it for days and would really appreciate any help.

【Question comments】:

  • Can you try x['Text'] instead of x.text in all of the iterators?

Tags: nlp bert-language-model torchtext


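【Solution 1】:

The problem was that I was using a multi-class dataset with code that is apparently written for binary classification. Many thanks to everyone who tried to help!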
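To illustrate the binary vs. multi-class point, here is a rough sketch. It assumes that the `BERT` wrapper (not shown in the question) is built on Hugging Face's `BertForSequenceClassification`, whose classification head defaults to two output classes. Cross-entropy-style losses expect every label to be a class index in `[0, num_labels)`, so labels such as 3 and 4 from the sample cannot fit a two-class head. The helpers `required_num_labels` and `labels_fit_head` are hypothetical plain-Python checks of that condition, not part of any library.

```python
def required_num_labels(labels):
    """Hypothetical helper: head size needed if classes are numbered 0..K-1."""
    ints = [int(label) for label in labels]
    if min(ints) < 0:
        raise ValueError("class indices must be non-negative")
    return max(ints) + 1


def labels_fit_head(labels, num_labels):
    """Hypothetical helper: True if every label is a valid index for the head."""
    return all(0 <= int(label) < num_labels for label in labels)


# Label column from the dataset sample in the question.
sample_labels = [3, 3, 4, 0, 0]

print(labels_fit_head(sample_labels, 2))   # False: a binary head has 2 classes
print(required_num_labels(sample_labels))  # 5
```

Assuming the classes are numbered 0 to K-1, the head could then be sized with `BertForSequenceClassification.from_pretrained(..., num_labels=K)`. The `num_labels` keyword is passed through to the model config. The labels would also need to be integer class indices rather than floats.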
