Splitting strings in one column into two columns: answers

【Question title】: Split string in a column into two columns
【Posted】: 2021-11-21 01:58:35
【Question description】:

In my data frame, I have a column of strings, for example:

**type**
game/design_game
game/art_game
design/product design
fashion/accessories
games/tabletop games
art/digital art
art/public art

I want to split it at the / into two parts:

main_cat       subcat        
game           design_game        
game           art_game

I am applying the split function:

   df.column.str.split('/',n=1, expand = True)

But I only get the main_cat column, not the subcat column.

I also tried:

 # new data frame with the split values as columns
 new = df["column"].str.split("/", n = 1, expand = True)

 # subcategory: the part after the first "/"
 df["subcat"]= new[1]

 # main category: the part before the first "/"
 df["main_cat"]= new[0]

 # display the first rows
 df.head(2)

But I get a KeyError for new[1].

Can anyone help me? Thanks!

【Question discussion】:

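It works fine for me.
Yes, it should, but it doesn't work for me. Is there a way to check why?
Like any pythonic technique for string manipulation?
One idea: check whether the / really appears in the output of df["column"].head().tolist(). If it is some other character, copy that character from the output into df["column"].str.split("copied char", n = 1, expand = True)
Did you try splitting while reading the DF, using the parameter sep="/"?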
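
The check suggested in the comments can be sketched as follows. The sample values and the fullwidth slash (U+FF0F) are assumptions, chosen only to reproduce the symptom; the thread does not say what the asker's data actually contained.

```python
import pandas as pd

# Hypothetical data: the separator is a fullwidth slash (U+FF0F), not ASCII "/".
df = pd.DataFrame({"column": ["game／design_game", "game／art_game"]})

# tolist() shows the raw strings, so the real separator can be copied verbatim.
print(df["column"].head().tolist())

# Count the rows that actually contain an ASCII slash.
has_slash = df["column"].str.contains("/", regex=False)
print(has_slash.sum())

# No row contains the separator, so expand=True yields a single column
# and new[1] would raise KeyError.
new = df["column"].str.split("/", n=1, expand=True)
print(new.shape)

# Splitting on the character copied from the data gives two columns.
fixed = df["column"].str.split("／", n=1, expand=True)
print(fixed.shape)
```

If the count of rows containing "/" is zero, the KeyError comes from the data rather than from the split call itself.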

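Tags: pandas string dataframe split

【Solution 1】:

You can use a regular expression with named groups, so you don't need to worry about missing data.

The first part, [^/]+, matches one or more characters that are not /; after the literal /, .* matches the rest of the string: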

df['type'].str.extract('(?P<main_cat>[^/]+)/(?P<subcat>.*)')

Output:

  main_cat          subcat
0     game     design_game
1     game        art_game
2   design  product design
3  fashion     accessories
4    games  tabletop games
5      art     digital art
6      art      public art
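
The question asks for the two parts as columns of the data frame, while str.extract returns a separate frame. One possible way to attach the result, assuming the column is named type as in the sample data, is a join on the index. This is a sketch, not part of the original answer.

```python
import pandas as pd

# A few rows from the question's sample data.
df = pd.DataFrame(
    {"type": ["game/design_game", "design/product design", "art/public art"]}
)

# The named groups become the column names of the extracted frame.
parts = df["type"].str.extract(r"(?P<main_cat>[^/]+)/(?P<subcat>.*)")

# join aligns on the index, so each row keeps its own categories.
df = df.join(parts)
print(df)
```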

Note: if you expect rows that have only a main_cat and want to capture them:

df['type'].str.extract('(?P<main_cat>[^/]+)/?(?P<subcat>.*)')

Example where the last row is just "art":

  main_cat          subcat
6      art
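
To make the difference between the two patterns concrete, here is a small comparison on a hypothetical two-row series, one row with a slash and one without. The variable names strict and lenient are illustrative only.

```python
import pandas as pd

# Hypothetical input: the second row has no "/" at all.
s = pd.Series(["art/public art", "art"], name="type")

# Pattern with a required "/": rows without it do not match.
strict = s.str.extract(r"(?P<main_cat>[^/]+)/(?P<subcat>.*)")

# Pattern with an optional "/": rows without it still match.
lenient = s.str.extract(r"(?P<main_cat>[^/]+)/?(?P<subcat>.*)")

print(strict)   # the second row does not match, so both columns are NaN
print(lenient)  # the second row keeps main_cat; subcat is an empty string
```

With the optional slash, a missing subcategory shows up as an empty string rather than NaN, which matters if later code filters with isna().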

【Discussion】:

Thank you very much. This worked perfectly for me.