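【Posted】:2017-11-22 14:49:49
【Question】:
I have built a Python dictionary that stores words as keys and, for each word, the list of files in which it appears. The code snippet is below.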
import os
import sys
import time

if len(sys.argv) < 2:
    search_query = input("Enter the search query: ")
else:
    search_query = sys.argv[1]

# Path to the directory where the files are stored; keep the file names in directory_name
directory_name = os.listdir("./test_input")
# Build list_of_files with the full path of each file so that they can be opened later
list_of_files = []
for files in directory_name:
    list_of_files.append("./test_input" + "/" + files)

# Empty dictionary: word -> list of file names in which the word occurs
search_dictionary = {}
# Iterate over the files in list_of_files one by one
for files in list_of_files:
    # Store the basename of the file as file_name
    file_name = os.path.basename(files)
    # Open the file; the with block closes it once it has been read
    with open(files, "r") as open_file:
        for line in open_file:
            for word in line.split():
                # If the word is not in the dictionary yet, add it together with file_name
                if word not in search_dictionary:
                    search_dictionary[word] = [file_name]
                else:
                    # If this file is already recorded for the word, skip it
                    if file_name in search_dictionary[word]:
                        continue
                    # If the same word is found in a different file, append that file name
                    search_dictionary[word].append(file_name)


def search(search_dictionary, search_query):
    # Print the files that contain search_query, or report that it was not found
    if search_query in search_dictionary:
        print('found ' + search_query)
        print(search_dictionary[search_query])
    else:
        print('not found ' + search_query)


search(search_dictionary, search_query)

input_word = ""
while input_word != 'quit':
    input_word = input('enter a word to search ')
    # Time a single lookup
    start1 = time.time()
    search(search_dictionary, input_word)
    end1 = time.time()
    print(end1 - start1)
But when the files in the directory add up to about 500 MB, RAM and swap space are exhausted. How can I manage the memory usage?
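【Comments】:
-
Trust me: the problem is in your algorithm... too many ifs and so on. You need to review your algorithm.
-
That's absurd; for loops and if/else conditions by themselves have nothing to do with memory usage.
-
Could you fix the indentation so that we can tell exactly what you are doing?
-
Maybe try keeping a list of numbers for each word? Then have another list that contains the file names.
-
@Shadow Edited the code.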
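
The suggestion above (a list of numbers for each word, plus a separate list of file names) could look roughly like the sketch below. It is only an illustration, not code from the question: `build_index` and the two-argument-plus-index `search` are hypothetical names, and using `array("I")` for the numbers is an assumption made here to keep each entry compact. Every distinct word is still held in memory, so this may lower the per-entry overhead but is not guaranteed to be enough for 500 MB of input.

```python
import os
from array import array


def build_index(directory):
    """Map each word to a compact array of integer file IDs."""
    file_names = []  # file ID -> file name (the list position is the ID)
    index = {}       # word -> array of file IDs, in increasing order
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and other non-files
        file_id = len(file_names)
        file_names.append(name)
        with open(path, "r", errors="ignore") as handle:
            for line in handle:
                for word in line.split():
                    postings = index.get(word)
                    if postings is None:
                        index[word] = array("I", [file_id])
                    # Files are read one at a time, so checking the last ID
                    # is enough to avoid recording the same file twice.
                    elif postings[-1] != file_id:
                        postings.append(file_id)
    return file_names, index


def search(file_names, index, query):
    """Return the names of the files that contain the query word."""
    return [file_names[i] for i in index.get(query, ())]
```

It would be used as `file_names, index = build_index("./test_input")` followed by `search(file_names, index, "word")`. Comparing only the last stored ID also replaces the `file_name in search_dictionary[word]` scan over the whole list with a constant-time check.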
Tags: python linux dictionary search-engine