【Question title】: How to make a table from JSON? ValueError: Mixing dicts with non-Series may lead to ambiguous ordering
【Posted】: 2021-12-25 14:51:39
【Question description】:

I'm a real beginner with Python, but I'm trying to use IBM's sentiment analyzer to build a dataset. I get a JSON response that I want to put into a table. What I have so far is:

response = natural_language_understanding.analyze(
    text = df_text,
    features=Features(sentiment=SentimentOptions(targets=['Pericles']))).get_result()
print(json.dumps(response, indent=2))

respj = json.dumps(response['sentiment'])
respj

which prints

'{"targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}], "document": {"score": -0.903556, "label": "negative"}}'

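At this point I would like to build a pandas table from this data. Ideally, all of the information above would be laid out like -> Text | Text Score | Document Score

I don't really need the positive/negative labels, but having them wouldn't hurt. How would I do this? Right now, when I try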

json_df = pd.read_json(respj)
json_df.head()

I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-b06d8a1caf3f> in <module>
----> 1 json_df = pd.read_json(respj)
      2 json_df.head()

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    212                 else:
    213                     kwargs[new_arg_name] = new_arg_value
--> 214             return func(*args, **kwargs)
    215 
    216         return cast(F, wrapper)

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    606         return json_reader
    607 
--> 608     result = json_reader.read()
    609     if should_close:
    610         filepath_or_buffer.close()

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read(self)
    729             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    730         else:
--> 731             obj = self._get_object_parser(self.data)
    732         self.close()
    733         return obj

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    751         obj = None
    752         if typ == "frame":
--> 753             obj = FrameParser(json, **kwargs).parse()
    754 
    755         if typ == "series" or obj is None:

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in parse(self)
    855 
    856         else:
--> 857             self._parse_no_numpy()
    858 
    859         if self.obj is None:

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1086 
   1087         if orient == "columns":
-> 1088             self.obj = DataFrame(
   1089                 loads(json, precise_float=self.precise_float), dtype=None
   1090             )

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    433             )
    434         elif isinstance(data, dict):
--> 435             mgr = init_dict(data, index, columns, dtype=dtype)
    436         elif isinstance(data, ma.MaskedArray):
    437             import numpy.ma.mrecords as mrecords

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    252             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    253         ]
--> 254     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    255 
    256 

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     62     # figure out the index, if necessary
     63     if index is None:
---> 64         index = extract_index(arrays)
     65     else:
     66         index = ensure_index(index)

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in extract_index(data)
    366 
    367             if have_dicts:
--> 368                 raise ValueError(
    369                     "Mixing dicts with non-Series may lead to ambiguous ordering."
    370                 )

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

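If anyone could give me some pointers on how to build the table I'm after, I would greatly appreciate it. It would also be great if someone could explain the error I'm getting. I think I understand the basic premise: the JSON already contains two incompatible "tables". Thanks for your help.

【Question comments】:

    Tags: python json pandas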


    【Solution 1】:

    You don't need to dump response['sentiment'] to a JSON string if you just want to turn it into a DataFrame. Use pandas.json_normalize instead.

    It looks like response['sentiment'] is something like

    >>> response['sentiment']
    
    {
        "targets": [{"text": "Pericles", 
                     "score": -0.939436, 
                     "label": "negative"}], 
        "document": {"score": -0.903556, 
                     "label": "negative"}
    }
    

    Then all you need is

    df = pd.json_normalize(response['sentiment'], 
                           record_path='targets',
                           meta=[['document','score'], ['document','label']])
    

    Output

    >>> df
    
           text     score     label document.score document.label
    0  Pericles -0.939436  negative      -0.903556       negative
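
For anyone who wants to try this without calling the IBM service, here is a minimal self-contained sketch. It assumes the sentiment payload is exactly the dict shown in the question (hard-coded below as `sentiment`, a name chosen only for illustration) and trims the result down to the three columns the question asks for.

```python
import pandas as pd

# Assumption: this dict stands in for response['sentiment'] from the question.
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

# One row per entry in "targets"; the document-level fields are repeated
# on every row through `meta`.
df = pd.json_normalize(
    sentiment,
    record_path="targets",
    meta=[["document", "score"], ["document", "label"]],
)

# Keep only the columns requested in the question and give them readable names.
table = df[["text", "score", "document.score"]].rename(
    columns={
        "text": "Text",
        "score": "Text Score",
        "document.score": "Document Score",
    }
)
print(table)
```

Dropping the label columns is optional; they are left out here only because the question says they are not needed.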
    

    Afterwards, you can rename the columns as needed with DataFrame.rename:

    cols_mapping = {
        'text': 'Text', 
        'score': 'Text Score', 
        'label': 'Text Label', 
        'document.score': 'Document Score', 
        'document.label': 'Document Label'
    }
    
    df = df.rename(columns=cols_mapping)
    
    >>> df 
    
           Text  Text Score Text Label Document Score Document Label
    0  Pericles   -0.939436   negative      -0.903556       negative
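
As for the error itself, the traceback in the question shows that pd.read_json (with the default orient) ends up passing the parsed JSON object straight to the DataFrame constructor, so each top-level key is treated as a column. Here "targets" is a list and "document" is a dict, and pandas refuses to guess a common index for that mix. The sketch below reproduces this without read_json; it again assumes the payload from the question, hard-coded as `sentiment`.

```python
import pandas as pd

# Assumption: same payload as in the question.
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

error_message = None
try:
    # "targets" is a list (indexed by position), "document" is a dict
    # (indexed by its keys): pandas cannot align the two into one index.
    pd.DataFrame(sentiment)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So the asker's intuition is roughly right: the two top-level values describe differently shaped data, and json_normalize is how you tell pandas which one supplies the rows.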
    

    【Comments】:

      【Solution 2】:

      I believe this will work for you:

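      # Assumption: j is the sentiment dict from the question, i.e. response['sentiment']
      j = response['sentiment']
      targets = {k: [t[k] for t in j['targets']] for k in j['targets'][0].keys()}
      doc_scores = [j['document']['score']] * len(j['targets'])
      pd.DataFrame({'document_score': doc_scores, **targets})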
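
Here is a runnable sketch of the same idea, assuming `j` holds the sentiment dict. The first target is the one from the question; the second target ("Athens") is made-up sample data, added only to show that the document score is repeated once per target.

```python
import pandas as pd

# Assumption: j plays the role of response['sentiment'].
j = {
    "targets": [
        {"text": "Pericles", "score": -0.939436, "label": "negative"},
        # Hypothetical second target, not part of the original response.
        {"text": "Athens", "score": 0.25, "label": "positive"},
    ],
    "document": {"score": -0.903556, "label": "negative"},
}

# Turn the list of target dicts into a dict of columns: {"text": [...], ...}.
targets = {k: [t[k] for t in j["targets"]] for k in j["targets"][0]}

# Repeat the single document score so it lines up with every target row.
doc_scores = [j["document"]["score"]] * len(j["targets"])

result = pd.DataFrame({"document_score": doc_scores, **targets})
print(result)
```

Note that this approach assumes every target dict has the same keys as the first one; a target with a missing key would raise a KeyError.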
      

      【Comments】:
