Splitting a column of strings in a dataframe: answers

【Question title】: Splitting a column with strings in a dataframe
【Posted】: 2018-08-27 07:37:57
【Question description】:

I imported the following dataframe from a csv file:

ts  employee_id gps_lat gps_lng event_id    event_params    speed   status  serial_number
9/22/2016 13:53 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1100110 211
9/22/2016 13:53 1   34.97   -81.98  Left    {"type":"Left","maximumangle":-38.57,"duration":203}    0   1102110 212
9/22/2016 13:53 1   34.97   -81.98  Right   {"type":"Right","maximumangle":52.975,"duration":17}    0   1102130 250
9/22/2016 13:53 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1102130 249
9/22/2016 13:54 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1102140 280
9/22/2016 13:54 1   34.97   -81.98  Left    {"type":"Left","maximumangle":-10.866,"duration":40}    0   1102140 279

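I need to split the event_params column into separate columns with the headers type, maximumangle and duration, and I need to get rid of the curly braces. In short, I need the following output.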

ts  employee_id gps_lat gps_lng event_id    Type    maximumangle    duration    speed   status  serial_number
9/22/2016 13:53 1   34.97   -81.98  Down    Down    0   0   0   1100110 211
9/22/2016 13:53 1   34.97   -81.98  Left    Left    -38.57  203 0   1102110 212
9/22/2016 13:53 1   34.97   -81.98  Right   Right   52.975  17  0   1102130 250
9/22/2016 13:53 1   34.97   -81.98  Down    Down    0   0   0   1102130 249
9/22/2016 13:54 1   34.97   -81.98  Down    Down    0   0   0   1102140 280

#Code I am trying to use:

import re
parts = re.split('\df3|(?<!\d)[:.](?!\d)', df3)
parts

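I tried to solve this by first splitting on the : delimiter, then splitting the last column on }, and then removing the text "maximumangle" and "duration" from the resulting columns.

I have been trying to use the re.split function in the way shown above, but it returns an error: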

--expected string or bytes-like object
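
The error above is what `re.split` raises when its second argument is not a string, which is the case when a whole dataframe such as `df3` is passed in. A minimal sketch of that behaviour, assuming a single cell copied from the sample data (the variable names are illustrative, and a list stands in for the dataframe):

```python
import json
import re

# One cell of the event_params column, copied from the sample data.
cell = '{"type":"Left","maximumangle":-38.57,"duration":203}'

# re.split works on one string at a time, so a single cell is accepted.
pieces = re.split(r",", cell)

# Passing a non-string object (here a list standing in for the dataframe)
# raises the TypeError quoted in the question.
try:
    re.split(r",", [cell])
except TypeError as exc:
    error_message = str(exc)

# The cell is valid JSON, so a JSON parser avoids hand-written regexes.
params = json.loads(cell)
print(pieces)
print(error_message)
print(params)
```

Because each cell already parses as JSON, working cell by cell with a parser is likely simpler than splitting on delimiters.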

【Comments】:

Tags: python regex pandas delimiter

【Solution 1】:

Since it is hard to reproduce the exact data you are working with, this solution should give you enough of a hint:

# create minimal sample data
import pandas as pd

df1 = pd.DataFrame({'employee_id':[1,2,3,4,5,6], 'gps':[1,1,1,1,1,1], 'event_params' : 
['{"type":"Down","maximumangle":0,"duration":0}',
'{"type":"Left","maximumangle":-38.57,"duration":203}',   
'{"type":"Right","maximumangle":52.975,"duration":17}', 
'{"type":"Down","maximumangle":0,"duration":0}',
'{"type":"Down","maximumangle":0,"duration":0}',
'{"type":"Left","maximumangle":-10.866,"duration":40}']})


# save event_params column to a new value while removing from df1
df2 = df1.pop('event_params')

# convert values to dictionary format using ast library
import ast
df2 = df2.apply(ast.literal_eval)

# convert dictionary to column format and add back to df1
df2 = pd.DataFrame(list(df2))
df1 = pd.concat([df1, df2], axis=1)

print(df1)

  employee_id   gps     duration    maximumangle    type
0           1     1            0           0.000   Down
1           2     1          203         -38.570    Left
2           3     1           17          52.975    Right
3           4     1            0           0.000    Down
4           5     1            0           0.000    Down
5           6     1           40         -10.866    Left
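
The output above appends the new columns at the end, while the question asks for them in the position of event_params. A possible variation, sketched under the assumption that every cell is a valid JSON string (the sample frame and the names `parsed`, `pos` and `result` are illustrative, not taken from the original data):

```python
import json

import pandas as pd

# Hypothetical sample frame with a subset of the question's columns.
df = pd.DataFrame({
    "employee_id": [1, 1, 1],
    "event_id": ["Down", "Left", "Right"],
    "event_params": [
        '{"type":"Down","maximumangle":0,"duration":0}',
        '{"type":"Left","maximumangle":-38.57,"duration":203}',
        '{"type":"Right","maximumangle":52.975,"duration":17}',
    ],
    "speed": [0, 0, 0],
})

# Parse each JSON string into a dict, then expand the dicts into columns.
parsed = pd.DataFrame(df["event_params"].map(json.loads).tolist(), index=df.index)
# Fix the column order explicitly so it does not depend on the pandas version.
parsed = parsed[["type", "maximumangle", "duration"]]

# Rebuild the frame with the new columns where event_params used to be.
pos = df.columns.get_loc("event_params")
result = pd.concat([df.iloc[:, :pos], parsed, df.iloc[:, pos + 1:]], axis=1)
print(result)
```

`json.loads` is used here instead of `ast.literal_eval` because the sample strings are JSON; either works for these particular values.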

Edit 1: To convert all event_params values to dictionary format:

# parse only the values that are still strings; leave already-parsed dicts untouched
df2 = df2.apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

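【Comments】:

Also, a few rows in the dataset have a different syntax. For example: "{"maximumangle":31.495,"type":"Right","duration":16}". So when it reaches such a row, Python returns a syntax error because the value is not in the usual dictionary format.
By different syntax, do you mean the order of the keys in the dictionary?
No. Some rows start and end with ", while others are in plain dictionary format: {"type":"Left","maximumangle":-38.57,"duration":203} versus "{"type":"Left","maximumangle":-38.57,"duration":203}"
@ashlock I have updated my answer to handle your case.
Dear Manish, it is not working. The contents of df2 still have the same format.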
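
One way the mixed rows described in these comments might be handled is to normalise each cell before parsing it. This is only a sketch: `parse_event_params` is a hypothetical helper, and it assumes that the "different" rows are the same JSON text wrapped in one extra pair of double quotes, as in the examples quoted above.

```python
import json


def parse_event_params(value):
    """Return a dict for one event_params cell (hypothetical helper)."""
    if isinstance(value, dict):
        return value  # already parsed, nothing to do
    text = str(value).strip()
    # Assumption: some rows are wrapped in one extra pair of double quotes.
    if text.startswith('"{') and text.endswith('}"'):
        text = text[1:-1]
    return json.loads(text)


# The three shapes mentioned in the thread: plain, quoted, already a dict.
cells = [
    '{"type":"Left","maximumangle":-38.57,"duration":203}',
    '"{"maximumangle":31.495,"type":"Right","duration":16}"',
    {"type": "Down", "maximumangle": 0, "duration": 0},
]
rows = [parse_event_params(c) for c in cells]
print(rows)
```

With the answer's frame, this could be applied as `df1.pop('event_params').apply(parse_event_params)` before building the new columns. Rows that follow some other convention would still raise an error and would need their own rule.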