【Question title】: Memory error when retrieving data from Songkick
【Posted】: 2017-05-30 08:52:36
【Question】:

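I have built a scraper that retrieves concert data from Songkick through their API. However, retrieving all the data for these artists takes a long time. After about 15 hours of scraping, the script was still running but the JSON file was no longer changing. I interrupted the script and checked whether I could access my data with TinyDB. Unfortunately, I got the error below. Does anyone know why this happens?

Error: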

('cannot fetch url', 'http://api.songkick.com/api/3.0/artists/8689004/gigography.json?apikey=###########&min_date=2015-04-25&max_date=2017-03-01')
8961344


Traceback (most recent call last):
  File "C:\Users\rmlj\Dropbox\Data\concerts.py", line 42, in <module>
    load_events()
  File "C:\Users\rmlj\Dropbox\Data\concerts.py", line 27, in load_events
    print(artist)
  File "C:\Python27\lib\idlelib\PyShell.py", line 1356, in write
    return self.shell.write(s, self.tags)
KeyboardInterrupt

>>> mydat = db.all()

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    mydat = db.all()
  File "C:\Python27\lib\site-packages\tinydb\database.py", line 304, in all
    return list(itervalues(self._read()))
  File "C:\Python27\lib\site-packages\tinydb\database.py", line 277, in _read
    return self._storage.read()
  File "C:\Python27\lib\site-packages\tinydb\database.py", line 31, in read
    raw_data = (self._storage.read() or {})[self._table_name]
  File "C:\Python27\lib\site-packages\tinydb\storages.py", line 105, in read
    return json.load(self._handle)
  File "C:\Python27\lib\json\__init__.py", line 287, in load
    return loads(fp.read(),
MemoryError

You can find my script below:

import urllib2
import requests
import json
import csv
import codecs


from tinydb import TinyDB, Query
db = TinyDB('events.json')


def load_events():
        MIN_DATE = "2015-04-25"
        MAX_DATE = "2017-03-01"
        API_KEY= "###############"
        with open('artistid.txt', 'r') as f:
            for a in f: 
                artist = a.strip() 
                print(artist)
                url_base = 'http://api.songkick.com/api/3.0/artists/{}/gigography.json?apikey={}&min_date={}&max_date={}'
                url = url_base.format(artist, API_KEY, MIN_DATE, MAX_DATE)
                # url = u'http://api.songkick.com/api/3.0/search/artists.json?query='+artist+'&apikey=WBmvXDarTCEfqq7h'
                try:
                    r = requests.get(url)
                    resp = r.json()
                    if resp['resultsPage']['totalEntries']:
                        results = resp['resultsPage']['results']['event']
                        for x in results:
                            print(x)
                            db.insert(x)
                except Exception:
                    # Catch Exception rather than everything, so Ctrl+C still stops the script
                    print('cannot fetch url', url)

load_events()
db.close()
print ("End of script")    

【Comments】:

  • You removed the API key from the code, but it shows up in the first line of the error.

Tags: python python-2.7 web-scraping songkick


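【Solution 1】:

MemoryError is a built-in Python exception (https://docs.python.org/3.6/library/exceptions.html#MemoryError), so it looks like the process ran out of memory; this is not really related to Songkick.

This question may have the information you need to debug it: How to debug a MemoryError in Python? Tools for tracking memory use?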
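As a first sanity check, it may help to compare the size of the JSON file with what the interpreter can address. The sketch below assumes the database file is `events.json`, as in the question. It reports whether Python is a 32-bit or 64-bit build, since a 32-bit process has only a few gigabytes of address space at most. `describe_environment` is a hypothetical helper name, not part of TinyDB or the standard library.

```python
import os
import struct
import sys


def describe_environment(path):
    """Return (pointer_bits, file_size_in_bytes_or_None).

    Hypothetical helper for a quick MemoryError sanity check.
    """
    # Size of a C pointer in bits: 32 on a 32-bit build, 64 on a 64-bit build.
    pointer_bits = struct.calcsize("P") * 8
    size = os.path.getsize(path) if os.path.exists(path) else None
    return pointer_bits, size


if __name__ == "__main__":
    bits, size = describe_environment("events.json")
    print("Python %s, %d-bit build" % (sys.version.split()[0], bits))
    if size is None:
        print("events.json not found")
    else:
        print("events.json size: %.1f MB" % (size / (1024.0 * 1024.0)))
```

If the file is large relative to available memory, `json.load` is likely to fail no matter what produced the file, because the traceback shows the whole file being read in one call (`fp.read()`).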
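One possible way to sidestep the problem, sketched here under assumptions, is to stop keeping every event in a single JSON document. The traceback shows TinyDB's JSON storage calling `json.load` on the whole file. Writing one event per line (the JSON Lines layout) instead lets the data be read back one record at a time. This replaces TinyDB with plain files. It is not what the asker did, and it is not a TinyDB feature. The helper names `append_event` and `iter_events` and the file name `events.jsonl` are hypothetical.

```python
import io
import json


def append_event(path, event):
    """Append one event to `path` as a single JSON line."""
    line = json.dumps(event, ensure_ascii=False)
    if isinstance(line, bytes):  # Python 2 can return a byte string here
        line = line.decode("utf-8")
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(line + u"\n")


def iter_events(path):
    """Yield events one by one, so only one record is held in memory at a time."""
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

In the scraper loop, `append_event('events.jsonl', x)` would take the place of `db.insert(x)`. Later analysis could then use `for event in iter_events('events.jsonl'):` without loading the full file.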

【Discussion】:
