【Question title】: Want to match and count whole text or words of dictionary with search text in python
【Posted】: 2019-02-18 11:29:18
【Question】:

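I have two files: a .txt file containing country names and a CSV file containing details (text). I want to match the country names against the text in the CSV file row by row, then count and print the matched words.

I tried this code: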

#NEW!
import csv
import time

#OLD! Import the keywords
f = open('country names.txt', 'r')
allKeywords = f.read().lower().split("\n")
f.close()


#CHANGED! Import the 'Details' column from the CSV file
allTexts = []
fullRow = []
with open('Detail_file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        #the full row for each entry, which will be used to recreate the improved CSV file in a moment
        fullRow.append((row['sr. no.'], row['Details'], row['LOC']))

        #the column we want to parse for our keywords
        row = row['Details'].lower()
        allTexts.append(row)

#NEW! a flag used to keep track of which row is being printed to the CSV file   
counter = 0

#NEW! use the current date and time to create a unique output filename
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + str(timestr) + '.csv'

#NEW! Open the new output CSV file to append ('a') rows one at a time.
with open(filename, 'a') as csvfile:

    #NEW! define the column headers and write them to the new file
    fieldnames = ['sr. no.', 'Details', 'LOC', 'Placename']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    #NEW! define the output for each row and then print to the output csv file
    writer = csv.writer(csvfile)

    #OLD! this is the same as before, for currentRow in fullRow:
    for entry in allTexts:

        matches = 0
        storedMatches = []

        #for each entry:
        allWords = entry.split(' ')
        for words in allWords:


            #if a keyword match is found, store the result.
            if words in allKeywords:
                if words in storedMatches:
                    continue
                else:
                    storedMatches.append(words)
                matches += 1

        #CHANGED! send any matches to a new row of the csv file.
        if matches == 0:
            newRow = fullRow[counter]
        else:
            matchTuple = tuple(storedMatches)
            newRow = fullRow[counter] + matchTuple

        #NEW! write the result of each row to the csv file
        writer.writerows([newRow])
        counter += 1

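It works fine. (The original post showed its output in an image here.)

The problem is this: when a dictionary keyword (country name) is a single word, e.g. Australia or America, it works fine, but

when a keyword in my dictionary contains more than one word, e.g. New Zealand or South Africa, it is not matched and not counted. This happens because the code above matches word by word. How can I solve this for keywords that contain 2, 3, 4, ... words, and where in the code above should the fix go?

One approach I have in mind: if a keyword contains several words, then during the search, when the first word of that keyword matches, the code checks the following words of the search text against the remaining words of the keyword. If they all match, it counts as a match; otherwise it moves on to the next keyword.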
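
The idea described above can be sketched roughly as follows. This is only an illustration, not the asker's code: the function name `find_keywords` and the sample sentence are hypothetical, and the sketch assumes words are separated by whitespace with no attached punctuation (a word like "africa," would not match).

```python
def find_keywords(text, keywords):
    """Return {keyword: count} for keywords found as whole-word sequences."""
    words = text.lower().split()
    counts = {}
    for keyword in keywords:
        parts = keyword.lower().split()
        if not parts:
            continue  # skip blank keyword lines
        n = len(parts)
        for i in range(len(words) - n + 1):
            # Compare the next n words of the text with the keyword's words.
            if words[i:i + n] == parts:
                counts[keyword] = counts.get(keyword, 0) + 1
    return counts


# Hypothetical sample text and keyword list, for illustration only.
sample = "flights from south africa to new zealand and back to south africa"
print(find_keywords(sample, ["south africa", "new zealand", "australia"]))
# {'south africa': 2, 'new zealand': 1}
```

Comparing slices of the word list handles keywords of any length the same way, so single-word and multi-word country names need no separate code paths.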

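【Comments on the question】:

  • Is this an Excel question?
  • @SolarMike Well, that was my mistake.
  • It's just that in Excel I would think of joining the words together ("south"&" "&"africa") and then searching...
  • Can you give a sample of the input files and the expected output?
  • @EvensF Yes, where should I send the files and the expected output?

Tags: python python-3.x csv dictionary text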


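【Solution 1】:

Well, it is not easy to understand what you are trying to do, and I'm not sure you understand what a CSV file is. Try opening it in the same editor you use to edit your Python script (not Excel).

Anyway, here is my attempt: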

import csv
import time

with open('country names.txt', 'r') as f:
    # Skip blank lines: an empty keyword would match every row below.
    all_keywords = [line.strip().lower() for line in f if line.strip()]

with open('Detail_file.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    full_rows = [(row['sr. no.'], row['Details'], row['LOC']) for row in reader]

time_string = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output-' + time_string + '.csv'

with open(filename, 'w', newline='') as csvfile:

    writer = csv.writer(csvfile)
    writer.writerow(['sr. no.', 'Details', 'LOC', 'Placename'])

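    for input_row in full_rows:
        # Substring test, so multi-word names such as "new zealand" are found too.
        details = input_row[1].lower()
        stored_matches = sorted(set(x for x in all_keywords if x in details))
        # input_row is a tuple, so convert the matches before concatenating.
        new_row = input_row + tuple(stored_matches)
        writer.writerow(new_row)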
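
One caveat with a plain substring test is that a short name can match inside a longer word (for example "oman" inside "romania"), and it does not say how often a name occurs. A possible variant, sketched below with the standard `re` module, matches whole words only and counts occurrences. The helper names `build_pattern` and `count_matches` and the sample inputs are hypothetical; the sketch assumes the words of a keyword are separated by single spaces in the text.

```python
import re


def build_pattern(keywords):
    """Compile one regex that matches any keyword as a whole word or phrase."""
    cleaned = {k.strip().lower() for k in keywords if k.strip()}
    if not cleaned:
        raise ValueError("keyword list is empty")
    # Longest first, so a longer name wins over a shorter one it contains.
    ordered = sorted(cleaned, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


def count_matches(text, pattern):
    """Return {keyword: number of occurrences} for one piece of text."""
    counts = {}
    for match in pattern.finditer(text.lower()):
        found = match.group(0)
        counts[found] = counts.get(found, 0) + 1
    return counts


# Hypothetical keywords and text, for illustration only.
pattern = build_pattern(["Oman", "South Africa", "New Zealand", "Romania"])
print(count_matches("Romania, South Africa and south africa again.", pattern))
# {'romania': 1, 'south africa': 2}
```

In the loop above, `count_matches(input_row[1], pattern)` could take the place of the set comprehension, with the dictionary keys written as the extra columns.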

【Comments】:
