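【Posted at】: 2021-07-27 10:09:15
【Question】:
I have been trying to run the BERT example code on a different dataset. Specifically, this is the website I used to try to implement the BERT model, and I managed to run the code successfully by following its instructions. However, when I try to run the same code on my own dataset, I get the following error: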
```
model = BERT().to(device)
optimizer = optim.Adam(model.parameters(), lr=2e-5)
train(model=model, optimizer=optimizer)
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-57-e4474bff9c36> in <module>()
      2 optimizer = optim.Adam(model.parameters(), lr=2e-5)
      3
----> 4 train(model=model, optimizer=optimizer)

5 frames
<ipython-input-45-4d0fa7f8acd5> in <lambda>(x)
     19 # Iterators
     20
---> 21 train_iter = BucketIterator(train, batch_size=15, sort_key=lambda x: len(x.text),
     22                             device=device, train=True, sort=True, sort_within_batch=True)
     23 valid_iter = BucketIterator(valid, batch_size=32, sort_key=lambda x: len(x.text),

AttributeError: 'Example' object has no attribute 'text'
```
| | Label | Text |
|---|---|---|
| 1178093 | 3 | renal and urinary disorders common: |
| 1170768 | 3 | orodispersible tablet |
| 4339706 | 4 | remotely manage the transmission bittorrent cl... |
| 6513296 | 0 | what do you think of her, ay ? |
| 7013664 | 0 | how that could become a film is more than i ca... |
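Also, following suggestions from several articles and questions on StackOverflow, I checked the data for empty rows, but there are none. These are the relevant lines of code that the error message points to: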
```
# Model parameter
MAX_SEQ_LEN = 128
PAD_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
UNK_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)

# Fields
label_field = Field(sequential=True, use_vocab=False, batch_first=True, dtype=torch.float)
text_field = Field(use_vocab=False, tokenize=tokenizer.encode, lower=False, include_lengths=False, batch_first=True,
                   fix_length=MAX_SEQ_LEN, pad_token=PAD_INDEX, unk_token=UNK_INDEX)
fields = [('Label', label_field), ('Text', text_field)]

# TabularDataset
train, valid, test = TabularDataset.splits(path=source_folder, train='train_bert.csv', validation='valid_bert.csv',
                                           test='test_bert.csv', format='CSV', fields=fields, skip_header=True)

# Iterators
train_iter = BucketIterator(train, batch_size=15, sort_key=lambda x: len(x.text),
                            device=device, train=True, sort=True, sort_within_batch=True)
valid_iter = BucketIterator(valid, batch_size=32, sort_key=lambda x: len(x.text),
                            device=device, train=True, sort=True, sort_within_batch=True)
test_iter = Iterator(test, batch_size=32, device=device, train=False, shuffle=False, sort=False)
```
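I have been using Google Colab. This may be a newbie question, but I have been stuck on it for several days, and I would really appreciate any help.
【Comments】: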
- Could you try `x['Text']` instead of `x.text` in all the iterators?
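To illustrate why the lookup fails, here is a minimal sketch that does not use torchtext at all. It assumes that each example stores its column values as attributes named exactly after the entries in `fields`, which is how the error message reads (`'Example' object has no attribute 'text'` while the field is declared as `'Text'`). `FakeExample`, the sample rows, and the token ids are all made up for the illustration; they are not part of torchtext or of the original dataset.

```python
class FakeExample:
    """Hypothetical stand-in for a torchtext Example (not the real class)."""

    @classmethod
    def fromlist(cls, row, fields):
        ex = cls()
        # Store each column value under the name given in `fields`,
        # so the attribute name keeps the exact spelling and case.
        for (name, _field), value in zip(fields, row):
            setattr(ex, name, value)
        return ex


# Same field names as in the question; the Field objects are replaced by None.
fields = [('Label', None), ('Text', None)]

# Made-up rows: a label and a list of made-up token ids.
rows = [
    [3, [1, 2, 3]],
    [0, [1, 2, 3, 4, 5]],
]
examples = [FakeExample.fromlist(row, fields) for row in rows]


def sort_key(x):
    # Matches the declared field name 'Text' (capital T).
    return len(x.Text)


lengths = [sort_key(ex) for ex in examples]

# The lowercase spelling from the question reproduces the AttributeError.
try:
    len(examples[0].text)
    lowercase_ok = True
except AttributeError:
    lowercase_ok = False
```

If the same assumption holds for the real `Example` objects, the fix in the question's code would be to write `sort_key=lambda x: len(x.Text)` in both `BucketIterator` calls, or alternatively to declare the fields as `('label', label_field)` and `('text', text_field)` so that `x.text` exists. Attribute access is the form that matches how the names are stored; subscripting with `x['Text']` is not expected to work unless the example class supports it. Any code in the training loop that reads batch attributes would presumably need to use the same spelling as the field names.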
Tags: nlp bert-language-model torchtext