【Posted】: 2015-12-20 05:53:03
【Question】:
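I'm trying to take an input file, read each line, use that line as a Google search query, and print every result from the query, but only results that come from a specific website. A simple example to illustrate: if I search for "dogs", I only want to print results from Wikipedia, whether Wikipedia accounts for one of the results or ten. My problem is that I'm getting very strange results. Below is my Python code, which contains the specific URL I want results from.
My program: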
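One way to decide whether a result "comes from a specific website" is to compare the URL's hostname with the site, instead of looking for the site string anywhere in the URL. A plain substring test also matches URLs that merely mention the site, for example in a query parameter. The sketch below uses only the standard library. `is_from_site` is a hypothetical helper, and treating subdomains (such as `en.wikipedia.org`) as part of the site is an assumption.

```python
from urllib.parse import urlparse


def is_from_site(url, site):
    """Return True if url's host is `site` or a subdomain of it (hypothetical helper)."""
    # Strings like "www.someurl.com/page" have no scheme; add one so urlparse
    # puts the host in .hostname instead of treating it as a path.
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)


results = [
    "https://en.wikipedia.org/wiki/Dog",
    "https://www.akc.org/dog-breeds/",
    "https://example.com/?ref=wikipedia.org",  # a substring test would wrongly keep this
]
print([u for u in results if is_from_site(u, "wikipedia.org")])
```

This keeps only `https://en.wikipedia.org/wiki/Dog`. The check `"wikipedia.org" in u` would also keep the `example.com` URL.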
from googlesearch import GoogleSearch  # third-party search module used in this script

inputFile = open("small.txt", "r")        # input: one query per line
outputFile = open("results1.txt", "w")    # output: each query followed by its matching URLs
dictionary = {}                           # our "hash table": query -> list of matching URLs
compare = "www.someurl.com/"              # result URLs are compared against this string

for line in inputFile.read().splitlines():
    lineToRead = line
    dictionary[lineToRead] = []           # initialized to an empty list
    gs = GoogleSearch(lineToRead)
    for url in gs.top_urls():
        print(url)                        # check that this is printing URLs
        compare2 = url
        if compare in compare2:           # the result URL contains the target site string
            dictionary[lineToRead].append(url)  # append EACH matching URL under the query key
inputFile.close()

for i in dictionary:
    print(i)                              # test print: the query sent to Google (dictionary key)
    outputFile.write(i + "\n")
    for j in dictionary[i]:
        print(j)                          # test print: the query's results, which should look like "www.medicaldepartmentstore.com/..." (dictionary values)
        outputFile.write(j + "\n")        # write the results for this query to the output file
outputFile.close()                        # flush and close so every write reaches the file
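The collection step of the script above can be sketched separately from the network call. This makes it testable offline and keeps queries in input-file order (plain `dict`s preserve insertion order only from Python 3.7 on; in Python 2 they do not). In this sketch, `collect_results`, the `search` callable and the canned results are hypothetical stand-ins for the `googlesearch` call in the question. The `keep` predicate uses the same substring test as the question, and a hostname-based check can be swapped in.

```python
import io


def collect_results(lines, search, keep):
    """Group retained result URLs under the query that produced them.

    lines:  iterable of query strings (e.g. an open input file)
    search: callable, query -> iterable of result URLs (hypothetical stand-in
            for the search library call)
    keep:   callable, url -> bool, deciding which URLs to retain
    """
    results = {}  # insertion-ordered in Python 3.7+, so queries stay in file order
    for line in lines:
        query = line.strip()  # drop the trailing newline and stray spaces
        if not query:         # skip blank lines instead of searching for ""
            continue
        results[query] = [u for u in search(query) if keep(u)]
    return results


# Hypothetical offline data so the sketch runs without network access.
CANNED = {
    "dogs": ["https://en.wikipedia.org/wiki/Dog", "https://www.akc.org/"],
    "cats": ["https://en.wikipedia.org/wiki/Cat", "https://www.cats.org.uk/"],
}
input_file = io.StringIO("dogs\n\ncats\nfish\n")
grouped = collect_results(input_file,
                          lambda q: CANNED.get(q, []),
                          lambda u: "wikipedia.org" in u)  # same substring test as the question
print(grouped)
```

A query with no matching results (`fish` here) still gets an entry with an empty list, mirroring `dictionary[lineToRead] = []` in the original.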
My output file is incorrect. It should be formatted like this:
query string
http://www.
http://www.
http://www.
query string
http://www.
query string
http://www.medical...
http://www.medical...
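The format above (a query line followed by one line per matching URL, then the next query) can be produced by a small writer function. This is a minimal sketch. `write_results` is a hypothetical name, and it writes to any text stream, so an `io.StringIO` stands in for the output file here.

```python
import io


def write_results(results, out):
    """Write each query on its own line, followed by one line per matching URL."""
    for query, urls in results.items():
        out.write(query + "\n")
        for url in urls:
            out.write(url + "\n")


results = {
    "dogs": ["https://en.wikipedia.org/wiki/Dog"],
    "cats": [],
    "wolves": ["https://en.wikipedia.org/wiki/Wolf",
               "https://en.wikipedia.org/wiki/Gray_wolf"],
}
buf = io.StringIO()
write_results(results, buf)
print(buf.getvalue(), end="")
```

With a real file, `with open("results1.txt", "w") as out: write_results(results, out)` closes the file when the block ends, which guarantees that all buffered writes are flushed.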
【Question discussion】:
Tags: python google-search