【问题标题】:Parsing json files解析json文件
【发布时间】:2016-06-19 02:21:55
【问题描述】:

以下是对 .json 文件中获得的一组推文进行可视化分析的代码。在解释时,map() 函数会显示错误。有什么办法解决吗?

import json
import pandas as pd
import matplotlib.pyplot as plt


tweets_data_path = 'import_requests.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")

for line in tweets_file:
   try:
    tweet = json.loads(line)
    tweets_data.append(tweet)
   except:
      continue

print(len(tweets_data))

tweets = pd.DataFrame()

tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)

这些是导致我收到上述代码的“ValueError”消息的行:

Traceback(最近一次调用最后一次): 文件“tweet_len.py”,第 21 行,在 tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
setitem 中的文件“/usr/lib/python3/dist-packages/pandas/core/frame.py”,第 1887 行 self._set_item(key, value)
文件“/usr/lib/python3/dist-packages/pandas/core/frame.py”,第 1966 行,在 _set_item
self._ensure_valid_index(值) 文件“/usr/lib/python3/dist-packages/pandas/core/frame.py”,第 1943 行,在 _ensure_valid_index
raise ValueError('无法设置没有定义索引的框架' ValueError:无法设置没有定义索引的框架和无法转换为系列的值

我正在使用 Python3。

编辑:以下是收集的推特数据示例(.json 格式)。

{
    "created_at": "Sat Mar 05 05:47:23 +0000 2016",
    "id": 705993088574033920,
    "id_str": "705993088574033920",
    "text": "Tumi Inc. civil war: Staff manning US ceasefire hotline 'can't speak Arabic' #fakeheadlinebot #learntocode #makeatwitterbot #javascript",
    "source": "\u003ca href=\"http://javascriptiseasy.com\" rel=\"nofollow\"\u003eJavaScript is Easy\u003c/a\u003e",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
        "id": 4382400263,
        "id_str": "4382400263",
        "name": "JavaScript is Easy",
        "screen_name": "javascriptisez",
        "location": "Your Console",
        "url": "http://javascriptiseasy.com",
        "description": "Get learning!",
        "protected": false,
        "verified": false,
        "followers_count": 167,
        "friends_count": 68,
        "listed_count": 212,
        "favourites_count": 11,
        "statuses_count": 55501,
        "created_at": "Sat Dec 05 11:18:00 +0000 2015",
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": false,
        "lang": "en",
        "contributors_enabled": false,
        "is_translator": false,
        "profile_background_color": "000000",
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_tile": false,
        "profile_link_color": "FFCC4D",
        "profile_sidebar_border_color": "000000",
        "profile_sidebar_fill_color": "000000",
        "profile_text_color": "000000",
        "profile_use_background_image": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/4382400263/1449314370",
        "default_profile": false,
        "default_profile_image": false,
        "following": null,
        "follow_request_sent": null,
        "notifications": null
    },
    "geo": null,
    "coordinates": null,
    "place": null,
    "contributors": null,
    "is_quote_status": false,
    "retweet_count": 0,
    "favorite_count": 0,
    "entities": {
        "hashtags": [{
            "text": "fakeheadlinebot",
            "indices": [77, 93]
        }, {
            "text": "learntocode",
            "indices": [94, 106]
        }, {
            "text": "makeatwitterbot",
            "indices": [107, 123]
        }, {
            "text": "javascript",
            "indices": [124, 135]
        }],
        "urls": [],
        "user_mentions": [],
        "symbols": []
    },
    "favorited": false,
    "retweeted": false,
    "filter_level": "low",
    "lang": "en",
    "timestamp_ms": "1457156843690"
}

【问题讨论】:

  • 你能添加json文件的样本吗?
  • 谢谢。但是这个json 是无效的。请查看jsonlint

标签: json python-3.x pandas twitter


【解决方案1】:

我觉得你可以用read_json:

import pandas as pd

df = pd.read_json('file.json')
print df.head()
                       contributors  coordinates          created_at entities  \
contributors_enabled            NaN          NaN 2016-03-05 05:47:23      NaN   
created_at                      NaN          NaN 2016-03-05 05:47:23      NaN   
default_profile                 NaN          NaN 2016-03-05 05:47:23      NaN   
default_profile_image           NaN          NaN 2016-03-05 05:47:23      NaN   
description                     NaN          NaN 2016-03-05 05:47:23      NaN   

                       favorite_count favorited filter_level  geo  \
contributors_enabled                0     False          low  NaN   
created_at                          0     False          low  NaN   
default_profile                     0     False          low  NaN   
default_profile_image               0     False          low  NaN   
description                         0     False          low  NaN   

                                       id              id_str  \
contributors_enabled   705993088574033920  705993088574033920   
created_at             705993088574033920  705993088574033920   
default_profile        705993088574033920  705993088574033920   
default_profile_image  705993088574033920  705993088574033920   
description            705993088574033920  705993088574033920   

                                    ...                is_quote_status  lang  \
contributors_enabled                ...                          False    en   
created_at                          ...                          False    en   
default_profile                     ...                          False    en   
default_profile_image               ...                          False    en   
description                         ...                          False    en   

                       place  retweet_count  retweeted  \
contributors_enabled     NaN              0      False   
created_at               NaN              0      False   
default_profile          NaN              0      False   
default_profile_image    NaN              0      False   
description              NaN              0      False   

                                                                  source  \
contributors_enabled   <a href="http://javascriptiseasy.com" rel="nof...   
created_at             <a href="http://javascriptiseasy.com" rel="nof...   
default_profile        <a href="http://javascriptiseasy.com" rel="nof...   
default_profile_image  <a href="http://javascriptiseasy.com" rel="nof...   
description            <a href="http://javascriptiseasy.com" rel="nof...   

                                                                    text  \
contributors_enabled   Tumi Inc. civil war: Staff manning US ceasefir...   
created_at             Tumi Inc. civil war: Staff manning US ceasefir...   
default_profile        Tumi Inc. civil war: Staff manning US ceasefir...   
default_profile_image  Tumi Inc. civil war: Staff manning US ceasefir...   
description            Tumi Inc. civil war: Staff manning US ceasefir...   

                                 timestamp_ms  truncated  \
contributors_enabled  2016-03-05 05:47:23.690      False   
created_at            2016-03-05 05:47:23.690      False   
default_profile       2016-03-05 05:47:23.690      False   
default_profile_image 2016-03-05 05:47:23.690      False   
description           2016-03-05 05:47:23.690      False   

                                                 user  
contributors_enabled                            False  
created_at             Sat Dec 05 11:18:00 +0000 2015  
default_profile                                 False  
default_profile_image                           False  
description                             Get learning!  

[5 rows x 25 columns]

【讨论】:

  • 我在运行上述 sn-p 时收到“ValueError”错误消息。
  • 有趣的是我仍然收到相同的错误 - ValueError:Trailing Data 错误消息。
  • 嗯,是有效的原始json?如果是,可以分享这个file吗?
  • 跟Python的版本有关系吗?
猜你喜欢
  • 1970-01-01
  • 2018-10-10
  • 2019-06-09
  • 2020-04-12
  • 1970-01-01
  • 2017-07-16
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多