【Posted】: 2020-06-25 17:43:25
【Question】:
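I am trying to use lexnlp to read a csv of legal cases so that I can separate out different pieces of information in the text, such as all of the acts cited, the dates, and so on.
I have formatted everything exactly as the lexnlp website instructs. However, my csv is not being read correctly. My professor suggested that I write a loop that iterates through the csv so that each sentence gets read. After searching through various material on writing iterative loops, I still don't really understand how to do it.
I found this construct:
for row in text.iterrows():
but I don't know what operation I should have it run. I have asked classmates, and they seem just as lost. My code is below. Any and all help is appreciated.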
import pandas as pd
url = 'https://raw.githubusercontent.com/unt-iialab/INFO5731_Spring2020/master/In_class_exercise/01-05-1%20%20Adams%20v%20Tanner.txt'
#On pandas >= 1.3, on_bad_lines='skip' replaces error_bad_lines=False
text = pd.read_csv(url,error_bad_lines=False, names=['sentence'])
#Output appears & reads fine with this portion
#Indicates that CSV is getting read properly
print('Number of Sentences:' , len(text['sentence']))
!pip install lexnlp
#Cannot get nlp module to read csv
import lexnlp.extract.en.acts
#This Version gives back empty brackets. I believe because it is reading text as a string.
print(lexnlp.extract.en.acts.get_act_list('text'))
#This is the format used in the number of sentences. It creates an error message.
print(lexnlp.extract.en.acts.get_act_list(text['sentence']))
#This is the format that the lexnlp site recommends. It also creates an error message.
print(lexnlp.extract.en.acts.get_act_list(text))
#The following are just different features of the lexnlp module that I am going to run.
import lexnlp.extract.en.amounts
print(list(lexnlp.extract.en.amounts.get_amounts(text)))
import lexnlp.extract.en.citations
print(list(lexnlp.extract.en.citations.get_citations(text)))
import lexnlp.extract.en.entities.nltk_re
print(list(lexnlp.extract.en.entities.nltk_re.get_companies(text)))
import lexnlp.extract.en.conditions
print(list(lexnlp.extract.en.conditions.get_conditions(text)))
import lexnlp.extract.en.constraints
print(list(lexnlp.extract.en.constraints.get_constraints(text)))
import lexnlp.extract.en.copyright
print(list(lexnlp.extract.en.copyright.get_copyright(text)))
import lexnlp.extract.en.courts
import lexnlp.extract.en.cusip
print(lexnlp.extract.en.cusip.get_cusip(text))
import lexnlp.extract.en.dates
print(list(lexnlp.extract.en.dates.get_dates(text)))
import lexnlp.extract.en.definitions
print(list(lexnlp.extract.en.definitions.get_definitions(text)))
import lexnlp.extract.en.distances
print(list(lexnlp.extract.en.distances.get_distances(text)))
import lexnlp.extract.en.durations
print(list(lexnlp.extract.en.durations.get_durations(text)))
import lexnlp.extract.en.money
print(list(lexnlp.extract.en.money.get_money(text)))
import lexnlp.extract.en.percents
print(list(lexnlp.extract.en.percents.get_percents(text)))
import lexnlp.extract.en.pii
print(list(lexnlp.extract.en.pii.get_pii(text)))
import lexnlp.extract.en.ratios
print(list(lexnlp.extract.en.ratios.get_ratios(text)))
import lexnlp.extract.en.regulations
print(list(lexnlp.extract.en.regulations.get_regulations(text)))
import lexnlp.extract.en.trademarks
print(list(lexnlp.extract.en.trademarks.get_trademarks(text)))
import lexnlp.extract.en.urls
print(list(lexnlp.extract.en.urls.get_urls(text)))
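The import-then-print pattern above repeats once per extractor. A minimal sketch of the same idea as a loop over a name-to-function table is shown here. The two extractors, `find_money` and `find_percents`, are hypothetical regex stand-ins written for this example, not lexnlp functions, and the sample sentence is made up; the real lexnlp functions could be placed in the table instead, assuming each one accepts a single `str`.

```python
import re
from typing import Callable, Dict, Iterable, List


def find_money(text: str) -> List[str]:
    """Hypothetical stand-in: dollar amounts such as $1,500.00."""
    return re.findall(r"\$\d[\d,]*(?:\.\d{2})?", text)


def find_percents(text: str) -> List[str]:
    """Hypothetical stand-in: percentages such as 6% or 2.5%."""
    return re.findall(r"\d+(?:\.\d+)?%", text)


# One entry per extractor; each value takes a str and returns an iterable.
EXTRACTORS: Dict[str, Callable[[str], Iterable]] = {
    "money": find_money,
    "percents": find_percents,
}


def run_all(text: str, extractors: Dict[str, Callable[[str], Iterable]]) -> Dict[str, list]:
    """Apply every extractor to the same string and collect the results by name."""
    return {name: list(fn(text)) for name, fn in extractors.items()}


# Made-up sample sentence, passed as one plain string.
report = run_all("The court awarded $1,500.00 plus 6% interest.", EXTRACTORS)
for name, found in report.items():
    print(name, found)
```

Wrapping each result in `list(...)` mirrors the `print(list(...))` calls above, so generator-style extractors are consumed the same way.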
Below is the error code I receive:
<ipython-input-2-301f76c3c169> in <module>()
19
20 #This is the format used in the number of sentences. It creates an error message.
---> 21 print(lexnlp.extract.en.acts.get_act_list(text['sentence']))
22
23 #This is the format that the lexnlp site reccommends. It also creates an error message.
2 frames
/usr/local/lib/python3.6/dist-packages/lexnlp/extract/en/acts.py in get_acts_annotations(text)
37
38 def get_acts_annotations(text: str) -> Generator[ActAnnotation, None, None]:
---> 39 for match in ACT_PARTS_RE.finditer(text):
40 captures = match.capturesdict()
41 act_name = ''.join(captures.get('act_name') or [])
TypeError: expected string or buffer
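The traceback stops at `ACT_PARTS_RE.finditer(text)`, inside a function annotated as taking `text: str`, while `text['sentence']` is a pandas Series rather than a string. A minimal sketch of the same kind of failure, and one way around it, follows. It uses the standard-library `re` module and a plain list as stand-ins; the pattern `ACT_RE` and the sample sentences are invented for illustration and are not lexnlp's own.

```python
import re

# Hypothetical stand-in for lexnlp's ACT_PARTS_RE; not the library's real pattern.
ACT_RE = re.compile(r"(?:[A-Z][a-z]+ )+Act(?: of \d{4})?")

# Stand-in for the 'sentence' column: a container of strings, not a string.
sentences = [
    "The claim arises under the Securities Act of 1933.",
    "No statute is cited in this sentence.",
]


def find_acts(text):
    """Return every match of ACT_RE in a single string."""
    return [m.group(0) for m in ACT_RE.finditer(text)]


# Passing the whole container fails, because finditer needs a string.
try:
    find_acts(sentences)
    container_error = None
except TypeError as exc:
    container_error = exc

# Joining the pieces into one string gives the regex what it expects.
acts = find_acts(" ".join(sentences))
print(type(container_error).__name__, acts)
```

With the DataFrame from the question, the equivalent join would presumably be `" ".join(text['sentence'].astype(str))`, with the resulting string passed to `get_act_list`.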
【Comments】:
-
"Below is the error code I receive" - so... where is the error code? Please post the full traceback.
-
I have updated the error code above.
-
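The URL you are using does not take you to a .csv file, just a plain text file. So using pandas, which handles tabular data, makes no sense here. See this question (and its answers) for how to fetch a txt file from a URL and iterate (loop) over each line.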
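A rough sketch of the line-by-line approach this comment describes: treat the file as plain text, loop over its lines, and hand each line to an extractor as an ordinary `str`. The helper `extract_per_line`, the stand-in extractor `find_years`, and the in-memory sample text are all assumptions made up for this example, so it runs without a download and without lexnlp installed.

```python
import io
import re
from typing import Callable, Iterable, List, Tuple


def extract_per_line(
    lines: Iterable[str],
    extractor: Callable[[str], Iterable],
) -> List[Tuple[int, list]]:
    """Run `extractor` on each non-empty line; keep only lines that yield results."""
    results = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        found = list(extractor(line))  # the extractor receives a plain str
        if found:
            results.append((line_no, found))
    return results


def find_years(text: str) -> List[str]:
    """Hypothetical stand-in extractor: four-digit years from 1800 to 2099."""
    return re.findall(r"\b(?:1[89]|20)\d{2}\b", text)


# Made-up sample held in memory instead of the file behind the URL.
sample = io.StringIO(
    "Decided on March 3, 1943.\n"
    "\n"
    "No date here.\n"
    "Amended in 1951 and 1960.\n"
)
per_line = extract_per_line(sample, find_years)
print(per_line)
```

For the real file, the lines could come from `urllib.request.urlopen(url).read().decode('utf-8').splitlines()`, and `lexnlp.extract.en.acts.get_act_list` could be passed as the extractor, assuming it accepts one string per call as the `text: str` annotation in the traceback suggests.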
Tags: python python-3.x pandas csv iteration