【Question title】: malformed list error creating dataframe from list of dict
【Posted】: 2021-11-16 11:46:31
【Question description】:

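I am trying to create a dataframe from a list of dicts. The dicts were originally parsed from JSON. I am getting the malformed list error below. The list looks fine to me; it is shown below the error. Does anyone know what the problem might be, and can you suggest how to fix it?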

Code:

import ast
import pandas as pd

address_list=result_dataframe['address'].tolist()

address_df=pd.DataFrame([ast.literal_eval(x) for x in address_list])

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-da573ced8322> in <module>
      9 # address_list[:3]
     10 
---> 11 address_df=pd.DataFrame([ast.literal_eval(x) for x in address_list])

<ipython-input-58-da573ced8322> in <listcomp>(.0)
      9 # address_list[:3]
     10 
---> 11 address_df=pd.DataFrame([ast.literal_eval(x) for x in address_list])

~/anaconda3/envs/web_scrape_etl/lib/python3.7/ast.py in literal_eval(node_or_string)
     89                     return left - right
     90         return _convert_signed_num(node)
---> 91     return _convert(node_or_string)
     92 
     93 

~/anaconda3/envs/web_scrape_etl/lib/python3.7/ast.py in _convert(node)
     88                 else:
     89                     return left - right
---> 90         return _convert_signed_num(node)
     91     return _convert(node_or_string)
     92 

~/anaconda3/envs/web_scrape_etl/lib/python3.7/ast.py in _convert_signed_num(node)
     61             else:
     62                 return - operand
---> 63         return _convert_num(node)
     64     def _convert(node):
     65         if isinstance(node, Constant):

~/anaconda3/envs/web_scrape_etl/lib/python3.7/ast.py in _convert_num(node)
     53         elif isinstance(node, Num):
     54             return node.n
---> 55         raise ValueError('malformed node or string: ' + repr(node))
     56     def _convert_signed_num(node):
     57         if isinstance(node, UnaryOp) and isinstance(node.op, (UAdd, USub)):

ValueError: malformed node or string: {'city': 'Cazadero', 'line': '27951 King Ridge Rd', 'postal_code': '95421', 'state_code': 'CA', 'state': 'California', 'county': 'Sonoma', 'fips_code': '06097', 'county_needed_for_uniq': False, 'lat': 38.600149, 'lon': -123.190777}

Data:

address_list=[{'city': 'Cazadero',
  'line': '27951 King Ridge Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.600149,
  'lon': -123.190777},
 {'city': 'Cazadero',
  'line': '1460 Big Barn Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.563553,
  'lon': -123.159865},
 {'city': 'Cazadero',
  'line': '23480 Fort Ross Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.538229,
  'lon': -123.163885},
 {'city': 'Cazadero',
  'line': '85 Sunrise Mountain Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.504359,
  'lon': -123.075754},
 {'city': 'Cazadero',
  'line': '23800 Fort Ross Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.534801,
  'lon': -123.168689},
 {'city': 'Guerneville',
  'line': '19800 Old Cazadero Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.535157,
  'lon': -123.056528},
 {'city': 'Cazadero',
  'line': '2945 Austin Creek Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.499242,
  'lon': -123.067987},
 {'city': 'Gualala',
  'line': '38851 S Highway 1',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.773426,
  'lon': -123.533202},
 {'city': 'Gualala',
  'line': '37891 Old Coast Hwy',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.780578,
  'lon': -123.546428},
 {'city': 'Gualala',
  'line': 'Tbd By Co of Mendocino',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county_needed_for_uniq': False,
  'is_approximate': True,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.801474,
  'lon': -123.425175},
 {'city': 'Gualala',
  'line': '46620 Iversen Ln',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.85319,
  'lon': -123.642963},
 {'city': 'Anchor Bay',
  'line': '45741 Sunset Dr',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.816458,
  'lon': -123.582281},
 {'city': 'Gualala',
  'line': '38957 Cypress Way',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.77155,
  'lon': -123.531202},
 {'city': 'Gualala',
  'line': '39051 Cypress Way',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county_needed_for_uniq': False,
  'is_approximate': True,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.801474,
  'lon': -123.425175},
 {'city': 'Gualala',
  'line': '38954 Cypress Way',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.770058,
  'lon': -123.530914},
 {'city': 'Gualala',
  'line': '38917 Cypress Way',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.770361,
  'lon': -123.530834},
 {'city': 'Gualala',
  'line': '39001 Cypress Way',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.770559,
  'lon': -123.53066},
 {'city': 'Gualala',
  'line': '37900 Marine View Dr',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.780056,
  'lon': -123.543766},
 {'city': 'Gualala',
  'line': 'S Highway 1',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county_needed_for_uniq': False,
  'is_approximate': True,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.801474,
  'lon': -123.425175},
 {'city': 'Gualala',
  'line': '38300 Ocean Ridge Dr',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.794639,
  'lon': -123.529917},
 {'city': 'Gualala',
  'line': 'Old Stage',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county_needed_for_uniq': False,
  'is_approximate': True,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.801474,
  'lon': -123.425175},
 {'city': 'Gualala',
  'line': '35110 Meadow Ct',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county_needed_for_uniq': False,
  'is_approximate': True,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.801474,
  'lon': -123.425175},
 {'city': 'Gualala',
  'line': 'Old Stage',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county_needed_for_uniq': False,
  'is_approximate': True,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.801474,
  'lon': -123.425175},
 {'city': 'Gualala',
  'line': '30101 S Highway 1',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.847588,
  'lon': -123.643029},
 {'city': 'Gualala',
  'line': '46561 Getchell Gulch Rd',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.808628,
  'lon': -123.568694},
 {'city': 'Gualala',
  'line': '38060 Ocean Ridge Dr',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.797034,
  'lon': -123.534572},
 {'city': 'Gualala',
  'line': '46601 Gypsy Flat Rd',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.822692,
  'lon': -123.597325},
 {'city': 'Anchor Bay',
  'line': '45971 Sunset Dr',
  'postal_code': '95445',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.820473,
  'lon': -123.56952}]

【Comments】:

  • result_dataframe['address'].tolist(). How can result_dataframe be recreated?

Tags: json python-3.x pandas dataframe


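【Solution 1】:

Simple answer

I think the problem may be that using AST overcomplicates things.

A list of dicts (which is what you have in address_list) is one of the things that can be used directly to build a DataFrame. This works: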

import pandas as pd

address_list=[{'city': 'Cazadero',
  'line': '27951 King Ridge Rd',
  'postal_code': '95421',
  'state_code': 'CA',
  'state': 'California',
  'county': 'Sonoma',
  'fips_code': '06097',
  'county_needed_for_uniq': False,
  'lat': 38.600149,
  'lon': -123.190777},
 {'city': 'Cazadero',
  'line': '1460 Big Barn Rd',
  'postal_code': '95421',
# ...etc etc, I removed some...
  'state_code': 'CA',
  'state': 'California',
  'county': 'Mendocino',
  'fips_code': '06045',
  'county_needed_for_uniq': False,
  'time_zone': 'America/Los_Angeles',
  'lat': 38.820473,
  'lon': -123.56952}]

df = pd.DataFrame(address_list)

You then have a working DataFrame containing all the data:

# Print first few lines of the DataFrame...
print(df.head())

       city                    line postal_code state_code       state  \
0  Cazadero     27951 King Ridge Rd       95421         CA  California   
1  Cazadero        1460 Big Barn Rd       95421         CA  California   
2  Cazadero      23480 Fort Ross Rd       95421         CA  California   
3  Cazadero  85 Sunrise Mountain Rd       95421         CA  California   
4  Cazadero      23800 Fort Ross Rd       95421         CA  California   

   county fips_code  county_needed_for_uniq        lat         lon time_zone  \
0  Sonoma     06097                   False  38.600149 -123.190777       NaN   
1  Sonoma     06097                   False  38.563553 -123.159865       NaN   
2  Sonoma     06097                   False  38.538229 -123.163885       NaN   
3  Sonoma     06097                   False  38.504359 -123.075754       NaN   
4  Sonoma     06097                   False  38.534801 -123.168689       NaN   

  is_approximate  
0            NaN  
1            NaN  
2            NaN  
3            NaN  
4            NaN  
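
The NaN values in the time_zone and is_approximate columns above appear because the first few dicts do not have those keys. As a minimal sketch (the two-row `rows` list is made up for illustration and is not the asker's data), pandas builds the columns from the union of the keys and fills the gaps with NaN:

```python
import pandas as pd

# Two illustrative records with different key sets (not the real address data).
rows = [
    {'city': 'Cazadero', 'county': 'Sonoma'},
    {'city': 'Gualala', 'time_zone': 'America/Los_Angeles'},
]

# Columns are the union of all keys; a key missing from a row becomes NaN.
df = pd.DataFrame(rows)

print(df)
print(df.isna())
```

So no extra handling should be needed for records that carry optional fields.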

Using AST

I don't think there is any need or reason to use AST here, but as an aside: the code you have fails because ast.literal_eval(x) expects x to be a str object, whereas here x is a dict object.
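
The difference can be reproduced in isolation. This is only a small sketch with a shortened, illustrative record (the name `record` is not from the question): a string holding a dict literal parses, while an actual dict raises the same ValueError as in the traceback.

```python
import ast

# Shortened, illustrative record (not the full address dict from the question).
record = {'city': 'Cazadero', 'lat': 38.600149}

# A str containing a Python literal is what literal_eval is meant for.
parsed = ast.literal_eval("{'city': 'Cazadero', 'lat': 38.600149}")

# Passing the dict itself is rejected with ValueError.
try:
    ast.literal_eval(record)
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

print(parsed == record, raised)
```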

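For completeness: if you convert each dict (x) back to a str by calling str(x) before passing it to ast.literal_eval (which then performs the exact opposite conversion, from str back to dict), this approach also works, although I can't think of any reason to do it: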

df = pd.DataFrame([ast.literal_eval(str(x)) for x in address_list])

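So I think the gist of the malformed node or string error is that the "string" it expected was actually a dict. If you had some dicts stored in a text file or similar, then using AST might make sense, although even then a parser (a JSON parser or the like) would probably be a better choice.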
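
If the column might hold a mix of real dicts and text, one possible approach is to parse only the values that are strings. The sketch below is an assumption about such data, not something shown in the question: `to_dict` is a hypothetical helper and `mixed` is made-up input. It tries the JSON parser first and falls back to ast.literal_eval for Python-style text with single quotes.

```python
import ast
import json

import pandas as pd


def to_dict(value):
    """Hypothetical helper: return a dict from a dict or from its text form."""
    if isinstance(value, dict):
        return value  # already parsed, nothing to do
    try:
        return json.loads(value)  # JSON text (double-quoted keys)
    except json.JSONDecodeError:
        return ast.literal_eval(value)  # Python literal text (single quotes)


# Made-up input mixing a dict, JSON text, and Python-literal text.
mixed = [
    {'city': 'Cazadero', 'lat': 38.600149},
    '{"city": "Gualala", "lat": 38.773426}',
    "{'city': 'Anchor Bay', 'lat': 38.816458}",
]

address_df = pd.DataFrame([to_dict(v) for v in mixed])
print(address_df)
```

When every value is already a dict, as in the question, the helper is unnecessary and pd.DataFrame(address_list) is enough.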
