【Question Title】: ValueError: Expected object or value when reading JSON in pandas with the parameter lines=True
【Posted】: 2019-04-05 06:46:43
【Question】:

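I am working with a large, valid JSON file and trying to parse it with pandas. Reading it the normal way, data = pd.read_json(filename), works. But when I add the parameter lines=True, i.e. data = pd.read_json(filename, lines=True), it raises ValueError: Expected object or value.

I want to read this file in chunks, but I get the same error when I pass the chunksize parameter.

Can someone point out what I am doing wrong here?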

import pandas as pd

filename = 'data/tinyTwitter.json'
data = pd.read_json(filename, lines=True, chunksize=100)

Data:

{
   "total_rows":3877777,
   "offset":805584,
   "rows":[
      {
         "id":"570379215192727552",
         "key":[
            "r1r01cdn8nb4",
            2015,
            2,
            25
         ],
         "value":{
            "type":"Feature",
            "geometry":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "properties":{
               "created_at":"Wed Feb 25 00:26:16 +0000 2015",
               "text":"For the Oscars, Lady Gaga trained with a vocal coach DAILY for 6 months httmelbourne htto/ZSu8FifNUK",
               "location":"melbourne"
            }
         },
         "doc":{
            "_id":"570379215192727552",
            "_rev":"1-fa6a485cb4fe0575781b6c29286af554",
            "contributors":null,
            "truncated":false,
            "text":"For the Oscars, Lady Gaga trained with a vocal coach DAILY for 6 months htDIIS5EtsW9 #melbourne ho/ZSu8FifNUK",
            "in_reply_to_status_id":null,
            "favorite_count":0,
            "source":"",
            "retweeted":false,
            "coordinates":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "entities":{
               "symbols":[

               ],
               "user_mentions":[

               ],
               "hashtags":[
                  {
                     "indices":[
                        95,
                        105
                     ],
                     "text":"melbourne"
                  }
               ],
               "urls":[
                  {
                     "url":"",
                     "indices":[
                        72,
                        94
                     ],
                     "expanded_url":"",
                     "display_url":"j.mp/1ag2Quk"
                  }
               ],
               "media":[
                  {
                     "expanded_url":"",
                     "display_url":"pir.FifNUK",
                     "url":"http/ZSu8FifNUK",
                     "media_url_https":"",
                     "id_str":"570379215142457344",
                     "sizes":{
                        "large":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "small":{
                           "h":340,
                           "resize":"fit",
                           "w":340
                        },
                        "medium":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "thumb":{
                           "h":150,
                           "resize":"crop",
                           "w":150
                        }
                     },
                     "indices":[
                        106,
                        128
                     ],
                     "type":"photo",
                     "id":570379215142457340,
                     "media_url":""
                  }
               ]
            },
            "in_reply_to_screen_name":null,
            "in_reply_to_user_id":null,
            "retweet_count":0,
            "id_str":"570379215192727552",
            "favorited":false,
            "user":{
               "follow_request_sent":false,
               "profile_use_background_image":true,
               "profile_text_color":"333333",
               "default_profile_image":false,
               "id":2543131938,
               "profile_background_image_url_https":"",
               "verified":false,
               "profile_location":null,
               "profile_image_url_https":"",
               "profile_sidebar_fill_color":"DDEEF6",
               "entities":{
                  "url":{
                     "urls":[
                        {
                           "url":"",
                           "indices":[
                              0,
                              22
                           ],
                           "expanded_url":"",
                           "display_url":"youthsnews.com.au"
                        }
                     ]
                  },
                  "description":{
                     "urls":[

                     ]
                  }
               },
               "followers_count":68313,
               "profile_sidebar_border_color":"C0DEED",
               "id_str":"2543131938",
               "profile_background_color":"C0DEED",
               "listed_count":6,
               "is_translation_enabled":false,
               "utc_offset":36000,
               "statuses_count":1390,
               "description":"media network",
               "friends_count":788,
               "location":"pacific, oceania",
               "profile_link_color":"042A38",
               "profile_image_url":"",
               "following":false,
               "geo_enabled":true,
               "profile_banner_url":"h8",
               "profile_background_image_url":"htng",
               "name":"ynnmedia™",
               "lang":"en",
               "profile_background_tile":false,
               "favourites_count":765,
               "screen_name":"ynnmedianetwork",
               "notifications":false,
               "url":"htxq",
               "created_at":"Tue Jun 03 09:27:23 +0000 2014",
               "contributors_enabled":false,
               "time_zone":"Yakutsk",
               "protected":false,
               "default_profile":false,
               "is_translator":false
            },
            "geo":{
               "type":"Point",
               "coordinates":[
                  -37.95935781,
                  144.92340088
               ]
            },
            "in_reply_to_user_id_str":null,
            "possibly_sensitive":false,
            "lang":"en",
            "created_at":"Wed Feb 25 00:26:16 +0000 2015",
            "in_reply_to_status_id_str":null,
            "place":null,
            "metadata":{
               "iso_language_code":"en",
               "result_type":"recent"
            },
            "location":"melbourne"
         }
      },
      {
         "id":"570379220146200576",
         "key":[
            "r1r01cdn8nb4",
            2015,
            2,
            25
         ],
         "value":{
            "type":"Feature",
            "geometry":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "properties":{
               "created_at":"Wed Feb 25 00:26:17 +0000 2015",
               "text":"Abuses in AIB Roast were dubbed: Rakhi Sawant Ka",
               "location":"melbourne"
            }
         },
         "doc":{
            "_id":"570379220146200576",
            "_rev":"1-61252163c64f6f548cab2b8eb4cbd045",
            "contributors":null,
            "truncated":false,
            "text":"Abuses in AIB Roast were dubbed: Rakhi Sawant ourne htco/MbglBYEAKa",
            "in_reply_to_status_id":null,
            "favorite_count":0,
            "source":"t</a>",
            "retweeted":false,
            "coordinates":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "entities":{
               "symbols":[

               ],
               "user_mentions":[

               ],
               "hashtags":[
                  {
                     "indices":[
                        69,
                        79
                     ],
                     "text":"melbourne"
                  }
               ],
               "urls":[
                  {
                     "url":"htKiAELeMO6",
                     "indices":[
                        46,
                        68
                     ],
                     "expanded_url":"/1ag2Omb",
                     "display_url":"j.mp/1ag2Omb"
                  }
               ],
               "media":[
                  {
                     "expanded_url":"h79220146200576/photo/1",
                     "display_url":"pglBYEAKa",
                     "url":"rr",
                     "media_url":"pk4O5UIAAI0l",
                     "id_str":"570379220049731584",
                     "sizes":{
                        "large":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "small":{
                           "h":340,
                           "resize":"fit",
                           "w":340
                        },
                        "medium":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "thumb":{
                           "h":150,
                           "resize":"crop",
                           "w":150
                        }
                     },
                     "indices":[
                        80,
                        102
                     ],
                     "type":"photo",
                     "id":570379220049731600,
                     "media_urrl":"htpk4O5UIAAI0l1.jpg"
                  }
               ]
            },
            "in_reply_to_screen_name":null,
            "in_reply_to_user_id":null,
            "retweet_count":0,
            "id_str":"570379220146200576",
            "favorited":false,
            "user":{
               "follow_request_sent":false,
               "profile_use_background_image":true,
               "profile_text_color":"333333",
               "default_profile_image":false,
               "id":2543131938,
               "profile_background_image_url_https":"h/images/themes/theme1/bg.png",
               "verified":false,
               "profile_location":null,
               "profile_image_url_https":"htt/567602629937606657/ZCcCDFzr_normal.jpeg",
               "profile_sidebar_fill_color":"DDEEF6",
               "entities":{
                  "url":{
                     "urls":[
                        {
                           "url":"htAxq",
                           "indices":[
                              0,
                              22
                           ],
                           "expanded_url":"hws.com.au",
                           "display_url":"youth.au"
                        }
                     ]
                  },
                  "description":{
                     "urls":[

                     ]
                  }
               },
               "followers_count":68313,
               "profile_sidebar_border_color":"C0DEED",
               "id_str":"2543131938",
               "profile_background_color":"C0DEED",
               "listed_count":6,
               "is_translation_enabled":false,
               "utc_offset":36000,
               "statuses_count":1390,
               "description":"media network",
               "friends_count":788,
               "location":"pacific, oceania",
               "profile_link_color":"042A38",
               "profile_image_url":"htes/567602629937606657/ZCcCDFzr_normal.jpeg",
               "following":false,
               "geo_enabled":true,
               "profile_banner_url":"httpanners/2543131938/1424079798",
               "profile_background_image_url":"http/themes/theme1/bg.png",
               "name":"ynnmedia™",
               "lang":"en",
               "profile_background_tile":false,
               "favourites_count":765,
               "screen_name":"ynnmedianetwork",
               "notifications":false,
               "url":"httgeAxq",
               "created_at":"Tue Jun 03 09:27:23 +0000 2014",
               "contributors_enabled":false,
               "time_zone":"Yakutsk",
               "protected":false,
               "default_profile":false,
               "is_translator":false
            },
            "geo":{
               "type":"Point",
               "coordinates":[
                  -37.95935781,
                  144.92340088
               ]
            },
            "in_reply_to_user_id_str":null,
            "possibly_sensitive":false,
            "lang":"en",
            "created_at":"Wed Feb 25 00:26:17 +0000 2015",
            "in_reply_to_status_id_str":null,
            "place":null,
            "metadata":{
               "iso_language_code":"en",
               "result_type":"recent"
            },
            "location":"melbourne"
         }
      }
   ]
}

【Comments】:

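  • Is the data confidential? If not, could you share the first 200 lines?
  • OK, let me share it.
  • Does your input file have one valid JSON record per line?
  • pandas.read_json only accepts JSON input in a set of predefined formats. See the valid formats in the documentation (look at the examples with different orient arguments). According to the docs, with lines=True pandas.read_json expects one valid JSON document per line. You get the error because your input does not follow that format.
  • @Sina Is there a way to change the format of my JSON file so that I can use lines correctly?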
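
To illustrate the point made in the comments, here is a minimal sketch of what lines=True accepts and what it rejects. The two small records are made up for the demonstration and are not taken from the asker's file; io.StringIO stands in for a file on disk.

```python
import io
import json

import pandas as pd

# Two made-up records, written one JSON object per line (JSON Lines layout).
records = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
json_lines = "\n".join(json.dumps(r) for r in records)

# lines=True treats every line as its own JSON document, so this parses.
df = pd.read_json(io.StringIO(json_lines), lines=True)

# The same records pretty-printed as a single document span many lines,
# and no individual line is valid JSON on its own.
pretty = json.dumps({"rows": records}, indent=3)
try:
    pd.read_json(io.StringIO(pretty), lines=True)
    failed = False
except ValueError:
    # This is the failure mode the question describes.
    failed = True
```

The file in the question has the second layout: one pretty-printed object whose records sit inside the "rows" array, which is why it reads without lines=True and fails with it.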

Tags: python json pandas


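【Solution 1】:

I added the link above in the comments. I think the problem is that the Twitter response writes several JSON documents into one file without separating them. What worked for me was to read the whole file, split it into a list of pieces, and then process each piece individually.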

import json

filename = 'data/tinyTwitter.json'

data = []
with open(filename) as json_file:
    data_str = json_file.read()
    # Drop everything up to the first '[' and everything after the last ']'.
    data_str = data_str.split('[', 1)[-1]
    data_str = data_str.rsplit(']', 1)[0]
    # Back-to-back arrays ("][") are split into separate pieces.
    pieces = data_str.split('][')

for piece in pieces:
    # Re-wrap each piece so that it parses as a JSON array of records.
    records = json.loads('[' + piece + ']')
    data.extend(records)
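
If the goal is still to use lines=True with chunksize, one possible alternative is to rewrite the records as JSON Lines first. The sketch below is not part of the original answer. It assumes the file has the shape shown in the question (a single object with a "rows" array), uses a tiny made-up sample written to a temporary directory in place of data/tinyTwitter.json, and still loads the whole source document into memory once; a file too large for that would need a streaming parser instead.

```python
import json
import os
import tempfile

import pandas as pd

# Made-up stand-in for data/tinyTwitter.json with the same top-level shape.
sample = {
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "a1", "value": {"properties": {"location": "melbourne"}}},
        {"id": "a2", "value": {"properties": {"location": "melbourne"}}},
        {"id": "a3", "value": {"properties": {"location": "sydney"}}},
    ],
}

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "tinyTwitter.json")
jsonl_path = os.path.join(workdir, "tinyTwitter.jsonl")  # hypothetical output name

with open(src_path, "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=3)

# Step 1: parse the document once and write each row on its own line.
with open(src_path, encoding="utf-8") as src, \
        open(jsonl_path, "w", encoding="utf-8") as dst:
    for row in json.load(src)["rows"]:
        dst.write(json.dumps(row) + "\n")

# Step 2: every line is now a valid JSON record, so lines=True and
# chunksize can be combined; each chunk is a DataFrame.
chunk_sizes = []
for chunk in pd.read_json(jsonl_path, lines=True, chunksize=2):
    chunk_sizes.append(len(chunk))
```

Nested fields such as "value" arrive as dictionaries in a single column; they can be flattened afterwards if separate columns are needed.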

【Comments】:
