【Posted】: 2020-10-14 13:12:46
【Question】:
I have a small request: I need help with this code:
import os
import re

def grepi(dico, fichier):
    line_number = 0
    nameFile = os.path.basename(fichier)
    # Load the dictionary
    with open(dico, encoding="utf-8") as dic:
        dicolist = dic.read().splitlines()
    # Search in the file
    with open(fichier, encoding="utf-8") as fic:
        ficlist = fic.read().splitlines()
    for line in ficlist:
        line_number += 1
        for patt in dicolist:
            line = line.lower()
            if re.search(r'\b' + line + r'\b', patt):
                print(line.rstrip() + ', ' + patt + ', ' + nameFile + ', '
                      + str(line_number))
This is the line I am having trouble with: if re.search(r'\b' + line + r'\b', patt):
dico is a dictionary of first names, for example:
benoît
Nicolas
Stéphane
Sébastien
Alexandre
fichier is a file containing a lot of information, for example:
Is the first name of Nicolas
Is Benoît is here
Hey 1234Alexandre1234
Stéphane found something
dfqklnflSébastiendsqjfldsjfldksj
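and so on.
From the file, I want to return every exact string (that is, every first name) that appears in it. But some names are embedded in longer tokens, such as 1234Alexandre5678, and I can't find a way to return only Alexandre. The same goes for dfqklnflSébastiendsqjfldsjfldksj, from which I want to return Sébastien.
Can someone help me? Thanks!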
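To make the goal concrete, here is a minimal sketch of the matching step on its own, assuming the names are plain strings and that a match anywhere inside a longer token is wanted. `find_names` is a hypothetical helper, not part of the code above. Two things differ from the original attempt: `re.search(pattern, string)` takes the pattern first, so the name is the pattern and the line is the text being searched; and `\b` is left out, because digits and letters are both word characters, so there is no word boundary between `1234` and `Alexandre`.

```python
import re

def find_names(names, line):
    """Return every name from `names` that occurs in `line`, ignoring case."""
    found = []
    for name in names:
        # re.escape guards against names that contain regex metacharacters.
        match = re.search(re.escape(name), line, flags=re.IGNORECASE)
        if match:
            # group(0) is the name as it is actually spelled in the line.
            found.append(match.group(0))
    return found

names = ["benoît", "Nicolas", "Stéphane", "Sébastien", "Alexandre"]
print(find_names(names, "Hey 1234Alexandre1234"))
print(find_names(names, "dfqklnflSébastiendsqjfldsjfldksj"))
```

Because the search ignores case, `benoît` in the dictionary also finds `Benoît` in the file, and the result keeps the spelling from the file.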
Here is how I corrected my code using the answers:
#!/usr/bin/env python3
import os
import re

def grepi(dico, fichier):
    line_number = 0
    nameFile = os.path.basename(fichier)
    result_final = []
    dicolist = open(dico, encoding="utf-8").read().splitlines()
    print(dicolist)
    with open(fichier, encoding="utf-8") as ficlist:
        ficstring = ficlist.read().splitlines()
    for line in ficstring:
        ptrn = re.compile(r"\w*(" + "|".join(dicolist) + r")\w*",
                          flags=re.I)
        ptrn_result = ptrn.findall(line)
        if ptrn_result:
            result_final = (nameFile, line_number, str(ptrn.findall(line)))
            print(result_final)
        line_number += 1
Here is the output:
('prénom.xml', 4, "['Benoit']")
('prénom.xml', 6, "['Stéphane']")
('prénom.xml', 9, "['Alexandre']")
('prénom.xml', 10, "['Nicolas']")
('prénom.xml', 14, "['Sébastien']")
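For comparison, a hedged variation on the same idea, working on a list of lines rather than on files so that it runs on its own. `grep_names` is a hypothetical name, and the choices it makes are assumptions about what is wanted: the pattern is compiled once outside the loop, each name is passed through `re.escape`, blank dictionary entries are skipped (an empty alternative would match everywhere), and lines are numbered from 1 with `enumerate`.

```python
import re

def grep_names(names, lines):
    """Yield (line_number, matches) for each line that contains a name."""
    # Skip blank entries and try longer names first, so a longer name
    # wins over a shorter name that happens to be its prefix.
    ordered = sorted((n for n in names if n), key=len, reverse=True)
    if not ordered:
        return
    pattern = re.compile("|".join(re.escape(n) for n in ordered),
                         flags=re.IGNORECASE)
    for line_number, line in enumerate(lines, start=1):
        matches = pattern.findall(line)
        if matches:
            yield line_number, matches

names = ["benoît", "Nicolas", "Stéphane", "Sébastien", "Alexandre"]
lines = [
    "Is the first name of Nicolas",
    "Is Benoît is here",
    "Hey 1234Alexandre1234",
    "Stéphane found something",
    "dfqklnflSébastiendsqjfldsjfldksj",
]
for line_number, matches in grep_names(names, lines):
    print(line_number, matches)
```

Since the pattern has no capturing group, `findall` returns the full matched names, so the surrounding `\w*` is not needed to pull a name out of a longer token.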
【Discussion】:
Tags: python-3.x search find python-re