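[Posted]: 2019-07-09 23:25:58
[Question]:
I want to split a text into sentences and then print the number of characters in each sentence, but my program does not count the characters in each sentence.
I tried tokenizing a user-supplied file into sentences, then looping over the sentences and printing the number of characters in each one. The code I tried is: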
from collections import defaultdict
import nltk
from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize, wordpunct_tokenize
import re
import os
import sys
from pathlib import Path

while True:
    try:
        file_to_open = Path(input("\nYOU SELECTED OPTION 8: "
                                  "CALCULATE SENTENCE LENGTH. Please, insert your file path: "))
        with open(file_to_open, 'r', encoding="utf-8") as f:
            words = sent_tokenize(f.read())
            break
    except FileNotFoundError:
        print("\nFile not found. Better try again")
    except IsADirectoryError:
        print("\nIncorrect Directory path.Try again")

print('\n\n This file contains', len(words), 'sentences in total')

wordcounts = []
caracter_count = 0
sent_number = 1
with open(file_to_open) as f:
    text = f.read()
    sentences = sent_tokenize(text)
    for sentence in sentences:
        if sentence.isspace() != True:
            caracter_count = caracter_count + 1
            print("Sentence", sent_number, 'contains', caracter_count,
                  'characters')
            sent_number += 1
            caracter_count = caracter_count + 1
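I want to print something like:
"Sentence 1 has 35 characters" "Sentence 2 has 45 characters"
and so on...
The output I get from this program is: This file contains 4 sentences in total "Sentence 1 contains 0 characters" "Sentence 2 contains 1 characters" "Sentence 3 contains 2 characters" "Sentence 4 contains 3 characters"
Can someone help me with this?
[Discussion]: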
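The counter in the question is only ever incremented by one on each pass through the loop, so it tracks how many sentences have been seen and never looks at the sentence text itself. One possible fix is to take `len(sentence)` for each sentence instead. Below is a minimal sketch of that idea. The helper names `sentence_lengths` and `report` are hypothetical and not part of NLTK, and the optional `tokenize` parameter is an assumption added so the logic can be tried without NLTK's tokenizer data installed. Note that `len()` counts spaces and punctuation as characters too.

```python
def sentence_lengths(sentences):
    """Return (sentence_number, character_count) pairs, skipping blank sentences.

    Hypothetical helper: the count is len(sentence), which includes
    spaces and punctuation.
    """
    results = []
    for sentence in sentences:
        # Skip empty or whitespace-only entries, as the original code intended.
        if not sentence or sentence.isspace():
            continue
        results.append((len(results) + 1, len(sentence)))
    return results


def report(text, tokenize=None):
    """Build one message per sentence in the format the question asks for.

    `tokenize` is an assumed hook; when omitted, NLTK's sent_tokenize is used.
    """
    if tokenize is None:
        # Imported lazily so sentence_lengths() works without NLTK installed.
        from nltk.tokenize import sent_tokenize
        tokenize = sent_tokenize
    return [
        f"Sentence {number} contains {count} characters"
        for number, count in sentence_lengths(tokenize(text))
    ]
```

With the file from the question, something like `for line in report(file_to_open.read_text(encoding="utf-8")): print(line)` should print one line per sentence, assuming the NLTK sentence tokenizer data is available on the machine.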