【Question title】: glob filenames | AttributeError: 'str' object has no attribute 'content'
【Posted】: 2022-01-13 13:34:01
【Question】:

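I am running my own version of the Notebook, in which the Apply DocumentClassifier section has been changed as shown below.

Each doc object in documents has dtype str. I assume it should not. What dtype should doc be?

Jupyter Labs, kernel: conda_mxnet_latest_p37

Cell: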

import glob
docs_to_classify = glob.glob('full-set-of-gri-standards-2021-english/*.pdf')
with open('filt_gri.txt', 'r') as filehandle:
    tags = [current_place.rstrip() for current_place in filehandle.readlines()]

doc_classifier = TransformersDocumentClassifier(model_name_or_path="cross-encoder/nli-distilroberta-base",
                                                task="zero-shot-classification",
                                                labels=tags,
                                                batch_size=16)

classified_docs = doc_classifier.predict(docs_to_classify)

all_docs = convert_files_to_dicts(dir_path=doc_dir)

preprocessor_sliding_window = PreProcessor(split_overlap=3,
                                           split_length=10,
                                           split_respect_sentence_boundary=False,
                                           split_by='passage')

Output:

INFO - haystack.modeling.utils -  Using devices: CUDA
INFO - haystack.modeling.utils -  Number of GPUs: 1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-75f29230cd0e> in <module>
      7                                                 batch_size=16)
      8 
----> 9 classified_docs = doc_classifier.predict(docs_to_classify)
     10 
     11 all_docs = convert_files_to_dicts(dir_path=doc_dir)

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/haystack/nodes/document_classifier/transformers.py in predict(self, documents)
    134         :return: List of Document enriched with meta information
    135         """
--> 136         texts = [doc.content if self.classification_field is None else doc.meta[self.classification_field] for doc in documents]
    137         batches = self.get_batches(texts, batch_size=self.batch_size)
    138         if self.task == 'zero-shot-classification':

~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/haystack/nodes/document_classifier/transformers.py in <listcomp>(.0)
    134         :return: List of Document enriched with meta information
    135         """
--> 136         texts = [doc.content if self.classification_field is None else doc.meta[self.classification_field] for doc in documents]
    137         batches = self.get_batches(texts, batch_size=self.batch_size)
    138         if self.task == 'zero-shot-classification':

AttributeError: 'str' object has no attribute 'content'

Please let me know if there is anything else I should add to the post or clarify.

【Question discussion】:

    Tags: python-3.x string glob attributeerror


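    【Solution 1】:

    I forgot to add the following line before classified_docs, so predict() received the list of file path strings returned by glob.glob instead of Document objects: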

    # convert to Document using a fieldmap for custom content fields the classification should run on
    docs_to_classify = [Document.from_dict(d) for d in docs_sliding_window]
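
    The reason this fixes the error can be sketched without Haystack installed. The snippet below is only an illustration: StandInDocument and extract_texts are hypothetical names, not Haystack API. extract_texts copies the list comprehension from line 136 of the traceback, and StandInDocument.from_dict is a simplified stand-in for Document.from_dict that assumes each dict has a "content" key and an optional "meta" key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StandInDocument:
    """Hypothetical stand-in for Haystack's Document (not the real class).

    It carries only the two attributes that the traceback line reads.
    """
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "StandInDocument":
        # Assumption: the dict has a "content" key and an optional "meta" key.
        return cls(content=d["content"], meta=d.get("meta", {}))


def extract_texts(documents: List[Any],
                  classification_field: Optional[str] = None) -> List[str]:
    # Same expression as line 136 of transformers.py in the traceback.
    return [doc.content if classification_field is None
            else doc.meta[classification_field]
            for doc in documents]


# glob.glob returns plain strings (file paths), so .content does not exist.
paths = ["gri/a.pdf", "gri/b.pdf"]  # illustrative paths, not the real files
error_message = ""
try:
    extract_texts(paths)
except AttributeError as exc:
    error_message = str(exc)

# Wrapping dicts in document objects gives the comprehension what it expects.
dicts = [
    {"content": "text of a", "meta": {"name": "a.pdf"}},
    {"content": "text of b", "meta": {"name": "b.pdf"}},
]
docs = [StandInDocument.from_dict(d) for d in dicts]
texts = extract_texts(docs)
names = extract_texts(docs, classification_field="name")
```

    In the real notebook the dicts would presumably come from convert_files_to_dicts followed by the PreProcessor, which is where docs_sliding_window in the line above would be defined; that step is not shown in the question, so treat it as an assumption.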
    

    【Discussion】:

    • Will accept in 2 days.