如果你只想从字典中提取一个键并且字典已经作为字典存储在列中,你可以这样做:
import numpy as np
import pandas as pd
# assuming, your dicts are stored in column 'data'
# and you want to store the url in column 'url'
df['url']= df['data'].map(lambda d: d.get('url', np.NaN) if hasattr(d, 'get') else np.NaN)
# from there you can do your transformation on the url column
测试数据和结果
df= pd.DataFrame({
'col1': [1, 5, 6],
'data': [{'url': 'http://foo.org', 'comment': 'not interesting'}, {'comment': 'great site about beer receipes, but forgot the url'}, np.NaN],
'json': ['{"url": "http://foo.org", "comment": "not interesting"}', '{"comment": "great site about beer receipes, but forgot the url"}', np.NaN]
}
)
# Result of the logic above:
col1 data url
0 1 {'url': 'http://foo.org', 'comment': 'not inte... http://foo.org
1 5 {'comment': 'great site about beer receipes, b... NaN
2 6 NaN NaN
如果您需要测试,如果您的数据已经存储在python dicts(而不是字符串)中,您可以这样做:
print(df['data'].map(type))
如果你的dicts存储为字符串,你可以先根据以下代码将它们转换为dicts:
import json
def get_url_from_json(document):
if pd.isnull(document):
url= np.NaN
else:
try:
_dict= json.loads(document)
url= _dict.get('url', np.NaN)
except:
url= np.NaN
return url
df['url2']= df['json'].map(get_url_from_json)
# output:
print(df[['col1', 'url', 'url2']])
col1 url url2
0 1 http://foo.org http://foo.org
1 5 NaN NaN
2 6 NaN NaN