【Question title】: Splitting a column with strings in a dataframe
【Posted】: 2018-08-27 07:37:57
【Question】:

I imported the following dataframe from a csv file:

ts  employee_id gps_lat gps_lng event_id    event_params    speed   status  serial_number
9/22/2016 13:53 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1100110 211
9/22/2016 13:53 1   34.97   -81.98  Left    {"type":"Left","maximumangle":-38.57,"duration":203}    0   1102110 212
9/22/2016 13:53 1   34.97   -81.98  Right   {"type":"Right","maximumangle":52.975,"duration":17}    0   1102130 250
9/22/2016 13:53 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1102130 249
9/22/2016 13:54 1   34.97   -81.98  Down    {"type":"Down","maximumangle":0,"duration":0}   0   1102140 280
9/22/2016 13:54 1   34.97   -81.98  Left    {"type":"Left","maximumangle":-10.866,"duration":40}    0   1102140 279

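I need to split the event_params column into separate columns with the headers Type, maximumangle and duration, and I need to get rid of the curly braces. In short, I need the following output.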

ts  employee_id gps_lat gps_lng event_id    Type    maximumangle    duration    speed   status  serial_number
9/22/2016 13:53 1   34.97   -81.98  Down    Down    0   0   0   1100110 211
9/22/2016 13:53 1   34.97   -81.98  Left    Left    -38.57  203 0   1102110 212
9/22/2016 13:53 1   34.97   -81.98  Right   Right   52.975  17  0   1102130 250
9/22/2016 13:53 1   34.97   -81.98  Down    Down    0   0   0   1102130 249
9/22/2016 13:54 1   34.97   -81.98  Down    Down    0   0   0   1102140 280

#Code I am trying to use:

import re
parts = re.split('\df3|(?<!\d)[:.](?!\d)', df3)
parts

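My plan was to first split on the : delimiter, then split the last column on }, and then remove the maximumangle and duration labels from the resulting columns.

I have been trying to use the re.split function as shown above, but it returns this error: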

--expected string or bytes-like object
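
The error can be reproduced in isolation. `re.split` operates on a single string, so passing a whole DataFrame (here `df3`) raises a `TypeError`. A minimal sketch, using a made-up one-row frame and a simplified pattern rather than the original data:

```python
import re

import pandas as pd

# Hypothetical one-row stand-in for the imported data.
df3 = pd.DataFrame({"event_params": ['{"type":"Down","maximumangle":0,"duration":0}']})

try:
    re.split(r"[:,]", df3)  # a DataFrame is not a string
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The same call works on a single cell, because a cell holds a plain string.
parts = re.split(r"[:,]", df3.loc[0, "event_params"])
print(parts)
```

Even on a single cell, a regex split leaves quotes and braces attached to the pieces, which is why parsing the value as a dictionary tends to be simpler.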

【Comments】:

    Tags: python regex pandas delimiter


    【Solution 1】:

    Since it is hard to reproduce the exact data you are working with, this solution should give you enough of a hint:

    import pandas as pd

    # create minimal sample data
    df1 = pd.DataFrame({'employee_id':[1,2,3,4,5,6], 'gps':[1,1,1,1,1,1], 'event_params' : 
    ['{"type":"Down","maximumangle":0,"duration":0}',
    '{"type":"Left","maximumangle":-38.57,"duration":203}',   
    '{"type":"Right","maximumangle":52.975,"duration":17}', 
    '{"type":"Down","maximumangle":0,"duration":0}',
    '{"type":"Down","maximumangle":0,"duration":0}',
    '{"type":"Left","maximumangle":-10.866,"duration":40}']})
    
    
    # save event_params column to a new value while removing from df1
    df2 = df1.pop('event_params')
    
    # convert values to dictionary format using ast library
    import ast
    df2 = df2.apply(ast.literal_eval)
    
    # convert dictionary to column format and add back to df1
    df2 = pd.DataFrame(list(df2))
    df1 = pd.concat([df1, df2], axis=1)
    
    print(df1)
    
       employee_id  gps  duration  maximumangle   type
    0            1    1         0         0.000   Down
    1            2    1       203       -38.570   Left
    2            3    1        17        52.975  Right
    3            4    1         0         0.000   Down
    4            5    1         0         0.000   Down
    5            6    1        40       -10.866   Left
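
Because the `event_params` values are JSON, `json.loads` from the standard library is another way to parse them, and the parsed columns can be placed where `event_params` used to be, as the question asks. A minimal sketch on a made-up two-row frame (the column subset and the name `result` are assumptions for illustration):

```python
import json

import pandas as pd

# Hypothetical cut-down version of the imported data.
df = pd.DataFrame({
    "event_id": ["Down", "Left"],
    "event_params": [
        '{"type":"Down","maximumangle":0,"duration":0}',
        '{"type":"Left","maximumangle":-38.57,"duration":203}',
    ],
    "speed": [0, 0],
})

# Remember where event_params sits so the new columns can take its place.
position = df.columns.get_loc("event_params")

# Parse each JSON string into a dict, then expand the dicts into columns.
params = pd.DataFrame(df.pop("event_params").map(json.loads).tolist(), index=df.index)

# Fix the column order explicitly and rename "type" to the requested "Type".
params = params[["type", "maximumangle", "duration"]].rename(columns={"type": "Type"})

# Reassemble: columns before event_params, the parsed columns, then the rest.
result = pd.concat([df.iloc[:, :position], params, df.iloc[:, position:]], axis=1)
print(result)
```

Selecting the three keys by name keeps the column order stable regardless of how a given pandas version orders dictionary keys.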
    

    Edit 1: To convert all event_params values to dictionary format, parsing only the values that are still strings:

    df2 = df2.apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x) 
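
For rows that are wrapped in an extra pair of double quotes, as described in the comments below, one possible approach is to strip the outer quotes before parsing. A minimal sketch, assuming the wrapped rows look exactly like the example quoted in the comments; `parse_params` is a hypothetical helper, not part of pandas:

```python
import json

import pandas as pd


def parse_params(value):
    """Hypothetical helper: return a dict for a JSON string, tolerating outer quotes."""
    if isinstance(value, dict):
        return value  # already parsed, nothing to do
    text = str(value).strip()
    # Drop one pair of wrapping double quotes, e.g. '"{...}"' -> '{...}'.
    if text.startswith('"{') and text.endswith('}"'):
        text = text[1:-1]
    return json.loads(text)


# Made-up sample with one plain row and one quote-wrapped row.
raw = pd.Series([
    '{"type":"Left","maximumangle":-38.57,"duration":203}',
    '"{"maximumangle":31.495,"type":"Right","duration":16}"',
])

parsed = pd.DataFrame(raw.map(parse_params).tolist())
print(parsed)
```

If the real file contains other variations, such as doubled inner quotes, the stripping step would need to be adjusted accordingly.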
    

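    【Discussion】:

    • Also, a few rows in the dataset have a different syntax, for example: "{"maximumangle":31.495,"type":"Right","duration":16}". When it reaches such a row, Python returns a syntax error because the value is not in the usual dictionary format.
    • By different syntax, do you mean the order of the keys in the dictionary?
    • No. Some rows start and end with ", while others are in plain dictionary format: {"type":"Left","maximumangle":-38.57,"duration":203} versus "{"type":"Left","maximumangle":-38.57,"duration":203}"
    • @ashlock I have updated my answer to handle your case.
    • @Dear Manish, it is not working. The contents of df2 still remain in the same format.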