【Question title】: python3.4 twitter data scrape error: KeyError: 'user'
【Posted】: 2018-05-19 15:28:53
【Question】:

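I extracted a 4 GB JSON txt file from Twitter. Now I am trying to go through the file and extract the user locations. When I run the script below, I get this error:

    File "filepath/test.py", line 18, in <module>
        if tweet['user']['id']:
    KeyError: 'user'

Is it possible that the user ID is missing from some of the collected tweets? I assumed it could not be null. I collected smaller samples, and in three out of four of them I got the same error; the script only worked for the one dataset in which I did not spot any differences in the JSON structure.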

import json

# Tweets are stored in the file "fname". In the file used for this script,
# each tweet was stored on one line
fname = 'test_with_sample.json'
with open(fname, 'r') as f:
    # Create dictionary to later be stored as JSON. All data will be included
    # in the list 'data'
    users_with_geodata = {
        "data": []
    }
    all_users = []
    total_tweets = 0
    geo_tweets = 0
    for line in f:
        tweet = json.loads(line)

        if tweet['user']['id']:
            total_tweets += 1
            user_id = tweet['user']['id']
            if user_id not in all_users:
                all_users.append(user_id)

                # Give users some data to find them by. User_id listed separately
                # to make iterating this data later easier
                user_data = {
                    "user_id": tweet['user']['id'],
                    "features": {
                        "name": tweet['user']['name'],
                        "id": tweet['user']['id'],
                        "screen_name": tweet['user']['screen_name'],
                        "tweets": 1,
                        "location": tweet['user']['location'],
                    }
                }

                if tweet['place']:
                    user_data["features"]["primary_geo"] = tweet['place']['full_name'] + ", " + tweet['place'][
                        'country']
                    user_data["features"]["geo_type"] = "Tweet place"
                else:
                    user_data["features"]["primary_geo"] = tweet['user']['location']
                    user_data["features"]["geo_type"] = "User location"
                # Add only tweets with some geo data to .json. Comment this if you want to include all tweets.
                if user_data["features"]["primary_geo"]:
                    users_with_geodata['data'].append(user_data)
                    geo_tweets += 1

            # If user already listed, increase their tweet count
            elif user_id in all_users:
                for user in users_with_geodata["data"]:
                    if user_id == user["user_id"]:
                        user["features"]["tweets"] += 1
    #except KeyError:
    #    pass

    # Count the total amount of tweets for those users that had geodata.
    # Recomputed from the per-user counters so the first tweet is not counted twice.
    geo_tweets = sum(user["features"]["tweets"] for user in users_with_geodata["data"])
    # Get some aggregated numbers on the data
    print("The file included " + str(len(all_users)) + " unique users who tweeted with or without geo data")
    print("The file included " + str(len(users_with_geodata['data']))
          + " unique users who tweeted with geo data, including 'location'")
    print("The users with geo data tweeted " + str(geo_tweets) + " out of the total " + str(total_tweets) + " of tweets.")
# Save data to JSON file
with open('users_geo_sample.json', 'w') as fout:
    fout.write(json.dumps(users_with_geodata, indent=4))

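【Comments】:

  • Put a simple print(tweet) right after tweet = json.loads(line) and you will see at which line the error occurs.
  • Thanks. It crashes when it reaches a line like {'delete': {'status' etc., so that is probably why no user ID can be found. I will try wrapping all of this in a try/except block to handle it. Any suggestions are welcome.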
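The diagnosis in the comments above can be checked without printing every line of a 4 GB file. The following is a minimal sketch, assuming each non-blank line holds exactly one JSON object; the function name `count_message_types` and the sample lines are hypothetical and only illustrate the idea of tallying which lines lack a 'user' key.

```python
import json
from collections import Counter


def count_message_types(lines):
    """Tally lines as 'tweet' (has a 'user' key) or by their first top-level key."""
    kinds = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)
        if 'user' in msg:
            kinds['tweet'] += 1
        else:
            # e.g. 'delete' for a deletion notice; '<empty>' for an empty object
            kinds[next(iter(msg), '<empty>')] += 1
    return kinds


# Made-up sample lines, not taken from the original data set
sample = [
    '{"id": 1, "user": {"id": 10}}',
    '{"delete": {"status": {"id": 2, "user_id": 11}}}',
    '',
    '{"limit": {"track": 5}}',
]
kinds = count_message_types(sample)
print(kinds)
```

To run it against a real file, pass the open file object instead of the list, for example `count_message_types(open(fname))`, since iterating a file yields one line at a time.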

Tags: python python-3.x twitter


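【Solution 1】:

Added exception handling for lines where `tweet['user']['id']` raises a KeyError (the line has no 'user' key), so that the loop continues with the next line: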

try:
    ...code..
except KeyError:
    continue
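As an illustration of where the try/except sits inside the loop, here is a simplified sketch rather than the asker's full script. It assumes one JSON document per line, keeps users in a dict keyed by user ID (a swap for the original list, so lookups do not scan every user), and leaves out the 'place' handling. The function name `collect_users` and the sample lines are hypothetical.

```python
import json


def collect_users(lines):
    """Count tweets per user, skipping lines that are not tweets."""
    users = {}    # user_id -> user_data
    skipped = 0   # lines without a 'user' key or without valid JSON
    for line in lines:
        try:
            tweet = json.loads(line)
            user = tweet['user']
            user_id = user['id']
        except (KeyError, ValueError):
            # KeyError: no 'user' or 'id' key (e.g. a delete notice)
            # ValueError: blank or malformed line (json.loads failed)
            skipped += 1
            continue
        if user_id in users:
            users[user_id]['features']['tweets'] += 1
        else:
            users[user_id] = {
                "user_id": user_id,
                "features": {
                    "name": user.get('name'),
                    "screen_name": user.get('screen_name'),
                    "location": user.get('location'),
                    "tweets": 1,
                },
            }
    return users, skipped


# Made-up sample lines, not taken from the original data set
sample = [
    '{"id": 1, "user": {"id": 10, "name": "A", "screen_name": "a", "location": "Helsinki"}}',
    '{"delete": {"status": {"id": 2, "user_id": 11}}}',
    '{"id": 3, "user": {"id": 10, "name": "A", "screen_name": "a", "location": "Helsinki"}}',
    '',
]
users, skipped = collect_users(sample)
print(len(users), "unique users,", skipped, "lines skipped")
```

Keeping the try block limited to the parsing and key lookups means a KeyError raised elsewhere in the loop body would still surface instead of being silently swallowed.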

【Comments】:
