【Question title】: I think I have a memory leak in my Python script
【Posted】: 2010-08-02 23:29:44
【Question】:

Here is my code:

from xgoogle.search import GoogleSearch, SearchError
import urllib, urllib2, sys, argparse

global stringArr

stringArr = ["string 1",
             "string 2",
             "string 3",
             "string etc"]

def searchIt(url):
    try:
        if(args.verbose>='1'): print "[INFO] Opening URL: "+url
        response = urllib.urlopen(url)
    except urllib2.URLError, e:
        print "[ERROR] "+e.reason
        return False
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()
    if(checkForStr(response.read())):
        if(args.verbose=='0'): print "[INFO] String found in URL: "+url
    else:
        if(args.verbose>='1'): print "[INFO] No string found in URL: "+url

def checkForStr(html):
    global stringArr
    try:
        if any(checkStr in html for checkStr in stringArr):
            return True
        else:
            return False
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()

def main():
    try:
        i=0
        gs = GoogleSearch(args.keyword)
        gs.results_per_page = 100
        results = []
        while True:
            tmp = gs.get_results()
            i = i+1 # page number
            if not tmp: # no more results (pages) were found
                break
            results.extend(tmp)
            for r in results: # process results for page
                searchIt(r.url) # check for string
            del results[:] # clean results
        # finished
    except SearchError, e:
        print "[ERROR] Search failed: %s" % e
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()

if __name__ == '__main__':
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument('-v', dest='verbose', default='0', help='Verbosity level', choices='012')
        group = parser.add_argument_group('required arguments')
        group.add_argument('-k', dest='keyword', help='Keyword to use on google query', required=True)
        args = parser.parse_args()
        main()
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()

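I shortened it a bit for readability, but it should still work. This code will become part of a larger script.

I am using the XGOOGLE library to scrape results from Google; I then visit each result to check whether the site contains any of the strings in stringArr.

My first test ran without problems (I pressed Ctrl+C after fewer than 10 results), but the first time I let it run, I got this error after about 100 URLs had been tested: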

  File "./StringScan.py", line 99, in <module>
    main()
  File "./StringScan.py", line 83, in main
    checkForStr(r.url)
  File "./StringScan.py", line 39, in checkForStr
    response = urllib.urlopen(url)
  File "/usr/lib/python2.6/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.6/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.6/urllib.py", line 344, in open_http
    h.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 904, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 776, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 735, in send
    self.connect()
  File "/usr/lib/python2.6/httplib.py", line 716, in connect
    self.timeout)
  File "/usr/lib/python2.6/socket.py", line 500, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known

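(The line numbers differ because I modified the code before posting it here.)

After that I got my Linux terminal back, as if the script had finished. But I noticed my computer was running poorly; I checked the system monitor and found that the Python process was using 1.3 GB of memory, and I had to kill the process to get my computer back to normal.

Is something in my code causing this, or why else would it happen?

I know my code probably has some mistakes, but right now I am mainly interested in any that could cause the memory problem. Any help would be appreciated.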

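【Comments】:

  • if x: return True \ else: return False - good thing we've got those booleans, huh?
  • global stringArr doesn't do what you think it does; you don't need those lines at all
  • You don't need to handle KeyboardInterrupt everywhere; the exception will percolate back up to the top level, so just handle it there
  • Thanks gnibbler. I added so many KeyboardInterrupt handlers because when I handled it only in main() and the script was, for example, inside .urlopen, it didn't close immediately, whereas with all the KeyboardInterrupt handlers it did. Relet, I don't understand your comment.
  • I believe relet is referring to your use of any in checkForStr. See the answer I posted for how to simplify it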

Tags: python memory memory-leaks


【Answer 1】:

I refactored your code a bit to make it easier to read. I don't see anything here that would leak memory.

from itertools import count
import urllib, urllib2, sys, argparse
from xgoogle.search import GoogleSearch, SearchError

stringArr = ["string 1",
             "string 2",
             "string 3",
             "string etc"]

def searchIt(url):
    try:
        if(args.verbose>='1'):
            print "[INFO] Opening URL: "+url
        response = urllib.urlopen(url)
    except IOError, e:  # urllib.urlopen raises IOError; urllib2.URLError is a subclass of it
        print "[ERROR] %s" % e
        return False
    if checkForStr(response.read()):
        if(args.verbose=='0'):
            print "[INFO] String found in URL: "+url
    else:
        if(args.verbose>='1'):
            print "[INFO] No string found in URL: "+url

def checkForStr(html):
    return any(checkStr in html for checkStr in stringArr)

def main():
    try:
        gs = GoogleSearch(args.keyword)
        gs.results_per_page = 100
        for i in count():
            results = gs.get_results()
            if not results: # no more results (pages) were found
                break
            for r in results: # process results for page
                searchIt(r.url) # check for string
        # finished
    except SearchError, e:
        print "[ERROR] Search failed: %s" % e

if __name__ == '__main__':
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument('-v', dest='verbose', default='0', help='Verbosity level', choices='012')
        group = parser.add_argument_group('required arguments')
        group.add_argument('-k', dest='keyword', help='Keyword to use on google query', required=True)
        args = parser.parse_args()
        main()
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()

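【Comments】:

  • I'm not sure yet, but I think the leak comes from another Python application I was running, because my script shows up as scriptname.py and the high memory usage was on a process named Python. I still get the IOError in my script, but I guess that's a separate question. Do you know how I can delete this question?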
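
One way to settle which process is actually growing is to have the script report its own PID and peak memory while it runs, then compare that PID with the one shown in the system monitor. The sketch below is not from the original thread. It assumes a Linux system (where ru_maxrss is reported in kilobytes), and the helper names peak_rss_kb and log_memory are made up for illustration.

```python
import os
import resource  # Unix-only standard library module


def peak_rss_kb():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def log_memory(label):
    """Print and return one line with this process's PID and peak memory."""
    line = "[MEM] pid=%d %s: peak RSS %d KB" % (os.getpid(), label, peak_rss_kb())
    print(line)
    return line
```

Calling log_memory("after page %d" % i) once per results page would show whether the number keeps climbing. If it stays flat while the system monitor shows another Python process at 1.3 GB, the leak is probably elsewhere.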
【Answer 2】:

It might be urllib.urlopen(). See http://bugs.python.org/issue1208304
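
If the open connections are the suspect, a cautious workaround is to close every response explicitly and to catch IOError, which is what the traceback in the question shows. The following is only a sketch under assumptions: fetch_html, its opener parameter, and the 10-second timeout are illustrative choices, not part of the original script, and the import falls back to urllib2 so the same code should run on Python 2.6+ as well as Python 3.

```python
import contextlib

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.6+


def fetch_html(url, opener=urlopen, timeout=10):
    """Return the body of url, or None if the request fails.

    opener is a hypothetical hook so the function can be exercised
    without network access; by default it is the real urlopen.
    """
    try:
        # closing() calls response.close() even if read() raises.
        with contextlib.closing(opener(url, timeout=timeout)) as response:
            return response.read()
    except IOError as e:
        # URLError and socket errors (e.g. "Name or service not known")
        # are IOError subclasses on Python 2.6+ and Python 3.
        print("[ERROR] %s: %s" % (url, e))
        return None
```

In searchIt, the call to urllib.urlopen(url) followed by response.read() could then be replaced with html = fetch_html(url), skipping the URL when the result is None. Whether this changes the memory behaviour depends on the Python version in use, so it is worth measuring before and after.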
