[Question Title]: How to extract certain parts of a string from a column to create other columns in Pandas
[Posted]: 2021-04-16 04:09:02
[Question Description]:

I have a DataFrame that looks like this:

Title                            Ratings
Do schools kill creativity?      [{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]
Simple designs to save a life    [{'id': 9, 'name': 'Ingenious', 'count': 269}, {'id': 3, 'name': 'Courageous', 'count': 92}, {'id': 7, 'name': 'Funny', 'count': 131}, {'id': 2, 'name': 'Confusing', 'count': 42}, {'id': 1, 'name': 'Beautiful', 'count': 91}, {'id': 8, 'name': 'Informative', 'count': 446}, {'id': 10, 'name': 'Inspiring', 'count': 397}, {'id': 22, 'name': 'Fascinating', 'count': 515}, {'id': 11, 'name': 'Longwinded', 'count': 45}, {'id': 21, 'name': 'Unconvincing', 'count': 49}, {'id': 24, 'name': 'Persuasive', 'count': 1234}, {'id': 25, 'name': 'OK', 'count': 73}, {'id': 23, 'name': 'Jaw-dropping', 'count': 139}, {'id': 26, 'name': 'Obnoxious', 'count': 21}]

I want to parse the data in Ratings so that the result looks like this:

Title                          Rating      Count
Do schools kill creativity?    Funny       19645
Do schools kill creativity?    Beautiful    4573

I tried to break up the data using } as the delimiter:

#explode ratings by title
# split each Ratings string on '}' so every cell becomes a list of string fragments
df['ratings'] = df['ratings'].str.split('}')
# give each fragment its own row
df_explode_ratings = df.explode('ratings').reset_index(drop=True)
# move the 'title' column to the end
cols = list(df_explode_ratings.columns)
cols.append(cols.pop(cols.index('title')))
df_explode_ratings = df_explode_ratings[cols]
# keep only the 'title' and 'ratings' columns
df_explode_cols = ['title', 'ratings']
df_explode_ratings = df_explode_ratings.drop(columns=[col for col in df_explode_ratings if col not in df_explode_cols])

This works, but I still need to parse the result further. I planned to split again on ",", but I got NaN values in the Ratings column.
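A small illustration, using a shortened, hypothetical two-item Ratings string, of why the string-splitting route gets messy. Splitting on } leaves every fragment carrying separator debris (a leading ", {" and a trailing "]"). Separately, pandas `.str` methods return NaN for any cell that holds a list instead of a string; that is one plausible (not confirmed) source of the NaN values, for example if `.str.split(',')` was run on the column after it had already been turned into lists by the first split. The mixed Series below is made up purely to show that behaviour.

```python
import pandas as pd

# Hypothetical, shortened Ratings string in the same format as the question.
raw = "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]"

# Splitting on '}' leaves separator debris on every fragment.
fragments = raw.split("}")
print(fragments)

# .str methods only operate on strings: the cell holding a list becomes NaN,
# while the plain string cell is split normally.
s = pd.Series([fragments, "a,b"])
split_again = s.str.split(",")
print(split_again)
```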

[Question Comments]:

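  • What happens before you get this DataFrame? It looks like the process that produces this data structure could be redesigned to give you a more useful file. If not, and if you don't have a huge number of rows, you might even be better off looping over the rows and using the json module to load the strings in Ratings.
  • Hey, thanks. This is a .csv from Kaggle, and it looks like it was dumped from JSON, so I have no control over how the data in the file is structured.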
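One caveat on the json suggestion, assuming the file contains the text exactly as displayed in the question: the Ratings cells use single-quoted keys and strings, which is Python literal syntax rather than valid JSON, so `json.loads` rejects them. `ast.literal_eval` from the standard library accepts that syntax and only evaluates literals, not arbitrary code. The sample cell below is a shortened, hypothetical excerpt.

```python
import ast
import json

# Shortened Ratings cell in the question's format (single-quoted keys/strings).
cell = "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]"

# JSON requires double-quoted strings, so json.loads raises on this text.
try:
    json.loads(cell)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval parses Python literal syntax (lists, dicts, strings, numbers).
parsed = ast.literal_eval(cell)
print(json_ok)            # False
print(parsed[0]["name"])  # Funny
```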

Tags: python pandas dataframe


[Solution 1]:

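Is your Ratings column a string or a list of dicts? If it is a string, you can apply ast.literal_eval and then explode the column (if it is already a list of dicts, you can skip the literal_eval step):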

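from ast import literal_eval

# Each Ratings cell is a string holding a Python-style list of dicts; parse it
df["Ratings"] = df["Ratings"].apply(literal_eval)
# One row per rating dict (the original row index is repeated)
df = df.explode("Ratings")
# Pull the fields out of each dict into their own columns
df["Rating"] = df["Ratings"].apply(lambda r: r["name"])
df["Count"] = df["Ratings"].apply(lambda r: r["count"])
df = df.drop(columns="Ratings")
print(df)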

Prints:

                           Title        Rating  Count
0    Do schools kill creativity?         Funny  19645
0    Do schools kill creativity?     Beautiful   4573
0    Do schools kill creativity?     Ingenious   6073
0    Do schools kill creativity?    Courageous   3253
0    Do schools kill creativity?    Longwinded    387
0    Do schools kill creativity?     Confusing    242
0    Do schools kill creativity?   Informative   7346
0    Do schools kill creativity?   Fascinating  10581
0    Do schools kill creativity?  Unconvincing    300
0    Do schools kill creativity?    Persuasive  10704
0    Do schools kill creativity?  Jaw-dropping   4439
0    Do schools kill creativity?            OK   1174
0    Do schools kill creativity?     Obnoxious    209
0    Do schools kill creativity?     Inspiring  24924
1  Simple designs to save a life     Ingenious    269
1  Simple designs to save a life    Courageous     92
1  Simple designs to save a life         Funny    131
1  Simple designs to save a life     Confusing     42
1  Simple designs to save a life     Beautiful     91
1  Simple designs to save a life   Informative    446
1  Simple designs to save a life     Inspiring    397
1  Simple designs to save a life   Fascinating    515
1  Simple designs to save a life    Longwinded     45
1  Simple designs to save a life  Unconvincing     49
1  Simple designs to save a life    Persuasive   1234
1  Simple designs to save a life            OK     73
1  Simple designs to save a life  Jaw-dropping    139
1  Simple designs to save a life     Obnoxious     21
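A possible variation on the same idea, sketched on a hypothetical two-row sample: instead of extracting each field with a separate `apply`, pass the exploded dicts straight to the `DataFrame` constructor, which turns the dict keys (id, name, count) into columns in one step. `explode(..., ignore_index=True)` gives the exploded rows a fresh 0..n-1 index (this parameter needs pandas 1.1 or later), so the two pieces line up when concatenated. This assumes every Ratings list is non-empty; an empty list would explode to a NaN cell that the constructor cannot expand.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical two-row sample in the question's format (Ratings stored as strings).
df = pd.DataFrame({
    "Title": ["Do schools kill creativity?", "Simple designs to save a life"],
    "Ratings": [
        "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]",
        "[{'id': 9, 'name': 'Ingenious', 'count': 269}, {'id': 3, 'name': 'Courageous', 'count': 92}]",
    ],
})

# Parse the strings, then give each dict its own row with a fresh 0..n-1 index.
exploded = (
    df.assign(Ratings=df["Ratings"].apply(literal_eval))
      .explode("Ratings", ignore_index=True)
)

# A list of dicts becomes a frame with one column per key: id, name, count.
expanded = pd.DataFrame(exploded["Ratings"].tolist())

# Both pieces share the same RangeIndex, so they align row by row.
result = pd.concat(
    [exploded["Title"],
     expanded[["name", "count"]].rename(columns={"name": "Rating", "count": "Count"})],
    axis=1,
)
print(result)
```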

But, as suggested in the comments, it would be even better to process/parse the data before creating the DataFrame.
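A minimal sketch of parsing before the DataFrame exists, assuming a CSV whose header is Title,Ratings and whose Ratings field is double-quoted (it contains commas). The CSV text and column names here are hypothetical; the real Kaggle file may name its columns differently. Each row is turned into one record per rating while reading, and the DataFrame is built once from those records.

```python
import csv
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV text; the Ratings field is quoted because it contains commas.
csv_text = '''Title,Ratings
"Do schools kill creativity?","[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]"
"Simple designs to save a life","[{'id': 9, 'name': 'Ingenious', 'count': 269}]"
'''

# Build one record per (title, rating) pair while reading the file.
records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    for rating in literal_eval(row["Ratings"]):
        records.append({"Title": row["Title"], "Rating": rating["name"], "Count": rating["count"]})

df = pd.DataFrame(records)
print(df)
```

If you would rather stay in pandas, `pd.read_csv(path, converters={"Ratings": literal_eval})` applies the parser to each Ratings cell during loading, after which the explode step from the answer applies directly.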

[Discussion]:

  • Great answer, I hadn't thought of applying literal_eval.