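【Posted】:2019-02-18 11:29:18
【Question】:
I have 2 files: a .txt file containing country names, and a csv file containing details (text). I want to match the country names against the text in the csv file row by row, then count and print the matched words.
I tried this code: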
#NEW!
import csv
import time

#OLD! Import the keywords
f = open('country names.txt', 'r')
allKeywords = f.read().lower().split("\n")
f.close()

#CHANGED! Import the 'Details' column from the CSV file
allTexts = []
fullRow = []
with open('Detail_file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        #the full row for each entry, which will be used to recreate the improved CSV file in a moment
        fullRow.append((row['sr. no.'], row['Details'], row['LOC']))
        #the column we want to parse for our keywords
        row = row['Details'].lower()
        allTexts.append(row)

#NEW! a flag used to keep track of which row is being printed to the CSV file
counter = 0

#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:
    #NEW! define the column headers and write them to the new file
    fieldnames = ['sr. no.', 'Details', 'LOC', 'Placename']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    #NEW! define the output for each row and then print to the output csv file
    writer = csv.writer(csvfile)
    #OLD! this is the same as before, for currentRow in fullRow:
    for entry in allTexts:
        matches = 0
        storedMatches = []
        #for each entry:
        allWords = entry.split(' ')
        for words in allWords:
            #if a keyword match is found, store the result.
            if words in allKeywords:
                if words in storedMatches:
                    continue
                else:
                    storedMatches.append(words)
                    matches += 1
        #CHANGED! send any matches to a new row of the csv file.
        if matches == 0:
            newRow = fullRow[counter]
        else:
            matchTuple = tuple(storedMatches)
            newRow = fullRow[counter] + matchTuple
        #NEW! write the result of each row to the csv file
        writer.writerows([newRow])
        counter += 1
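It works fine; the output was shown in an attached image.
My problem is this: if a keyword in my dictionary (a country name) is a single word, e.g. Australia, America, etc., it works fine.
But if any keyword in my dictionary contains more than one word, e.g. New Zealand, South Africa, etc., it is not matched and not counted. The cause is that the code above matches word by word. How can I fix this for keywords that contain 2, 3, 4, ... words, and where in the code above should the fix go?
One idea I have: if a keyword contains multiple words, then during the search, when the first word of that keyword matches, the code checks the following words of the searched text against the remaining words of the keyword. If they all match, it counts as a match; otherwise it moves on to the next keyword.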
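The idea described above can be sketched roughly as follows. This is only a minimal illustration, not a tested drop-in fix: `find_keywords` is a hypothetical helper name, the sample keywords and text are made up, and stripping punctuation from each word is an added assumption that the original code does not make.

```python
import string


def find_keywords(text, keywords):
    """Return the keywords found in text, in order of first appearance.

    A keyword may consist of one or several words. Each keyword is
    reported at most once, like storedMatches in the original loop.
    """
    # Assumption: surrounding punctuation is stripped so "Zealand," matches.
    words = [w.strip(string.punctuation) for w in text.lower().split()]

    # Index every keyword by its first word, stored as a tuple of words.
    by_first = {}
    for kw in keywords:
        parts = tuple(kw.lower().split())
        if parts:  # skip empty lines from the keyword file
            by_first.setdefault(parts[0], []).append(parts)

    found = []
    for i, word in enumerate(words):
        # Only keywords starting with the current word are candidates.
        for parts in by_first.get(word, []):
            # Compare the following words of the text with the whole keyword.
            if tuple(words[i:i + len(parts)]) == parts:
                name = " ".join(parts)
                if name not in found:
                    found.append(name)
    return found


# Hypothetical sample data, for illustration only.
keywords = ["australia", "new zealand", "south africa", "usa"]
text = "Flights from New Zealand to South Africa stop in Australia"
matches = find_keywords(text, keywords)
print(matches, len(matches))
```

In the code above, this would presumably replace the inner `for words in allWords:` loop, e.g. `storedMatches = find_keywords(entry, allKeywords)` followed by `matches = len(storedMatches)`. Note that a keyword such as "south africa" and a shorter keyword such as "africa" would both be reported for the same text; whether that is wanted depends on the keyword list.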
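【Comments】:
-
Is this an Excel question?
-
@SolarMike OK, that was my mistake.
-
It's just that in Excel I would think of concatenating the words ("south"&" "&"africa") and then searching...
-
Can you give a sample of the input files and the expected output?
-
@EvensF Yes, where should I send the files and the expected output?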
Tags: python python-3.x csv dictionary text