【Question title】:How do I merge more than one column from CSV files in pandas without picking _x or _y, but instead picking whichever one has the information?
【Posted】:2018-06-14 13:18:07
【Question】:

I am trying to merge two CSV files without having to pick values from the _x or _y columns.

MetaData1
Sample_name   TITLE
Cody        Chicken Pox
Claudia     Chicken Pox
Alex        Chicken Pox
Steven      Chicken Pox
Mom         Chicken Pox
Dad     

MetaData2
Sample_name    TITLE       Geo_Loc    DESCRIPTION
Dad         Chicken Pox     Earth       people
Me          Chicken Pox     Earth       people
Roger       Chicken Pox     Earth       people
Ben         Chicken Pox     Earth       people

I want them merged into this:

Merged Metadata 
Sample_name    TITLE             Geo_Loc                 DESCRIPTION
Cody        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Claudia     Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Alex        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Steven      Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Mom         Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Dad         Chicken Pox     Earth                   people
Me          Chicken Pox     Earth                   people
Roger       Chicken Pox     Earth                   people
Ben         Chicken Pox     Earth                   people

My current code is as follows:

# Merge two or more CSV files using pandas
# Duplicate the read_csv line for each additional CSV file
import pandas as panda  # the snippet uses the alias "panda"
File_one = panda.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = panda.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
Merge_File = panda.merge(File_one, File_two, how='outer', on='Sample_name')

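However, if I have 100 columns, 50 of them end up duplicated. How can I merge them without losing data, and without having to type out each header individually as shown below?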

# Cleanup to merge duplicate non-index column
mm['TITLE'] = mm[['TITLE_x', 'TITLE_y']].fillna('').sum(axis=1)
mm.drop(['TITLE_x','TITLE_y'], axis=1, inplace=True)

【Question comments】:

  • If you find that it solves your problem, please consider accepting my answer. Thanks!

Tags: python csv merge multiple-columns


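【Solution 1】:

Before merging, you can trim the second data frame so that, apart from the merge key, it has no columns in common with the first.

# Keep the merge key; drop every other column that df1 already has
df2_to_merge = df2[[col for col in df2.columns if col == 'Sample_name' or col not in df1.columns]]

Then merge df1 with df2_to_merge the way you specified.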
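
Dropping the shared columns from the second frame avoids the _x/_y suffixes, but the result then carries only the first frame's values for those columns, so a TITLE that is empty in the first file (Dad) or a row that exists only in the second file (Me, Roger, Ben) would still end up without a TITLE. A minimal sketch of one alternative, assuming Sample_name is unique within each file: index both frames by the key and use DataFrame.combine_first, which fills gaps in the first frame from the second for every shared column at once. The helper name coalesce_merge and the in-memory sample data are illustrative assumptions, not part of the original post.

```python
import io
import pandas as pd

# Illustrative stand-ins for the two tab-separated files in the question.
meta1 = io.StringIO(
    "Sample_name\tTITLE\n"
    "Cody\tChicken Pox\n"
    "Claudia\tChicken Pox\n"
    "Alex\tChicken Pox\n"
    "Steven\tChicken Pox\n"
    "Mom\tChicken Pox\n"
    "Dad\t\n"
)
meta2 = io.StringIO(
    "Sample_name\tTITLE\tGeo_Loc\tDESCRIPTION\n"
    "Dad\tChicken Pox\tEarth\tpeople\n"
    "Me\tChicken Pox\tEarth\tpeople\n"
    "Roger\tChicken Pox\tEarth\tpeople\n"
    "Ben\tChicken Pox\tEarth\tpeople\n"
)


def coalesce_merge(df1, df2, key, missing="Missing:Not Applicable"):
    """Outer-merge two frames on `key`, preferring df1's values and
    falling back to df2's wherever df1 has no value (hypothetical helper)."""
    left = df1.set_index(key)
    right = df2.set_index(key)
    # combine_first may sort rows and columns, so record the original order:
    # df1's labels first, then any labels that appear only in df2.
    rows = left.index.union(right.index, sort=False)
    cols = left.columns.union(right.columns, sort=False)
    merged = left.combine_first(right).reindex(index=rows, columns=cols)
    return merged.fillna(missing).rename_axis(key).reset_index()


df1 = pd.read_csv(meta1, sep="\t", header=0, dtype=str)
df2 = pd.read_csv(meta2, sep="\t", header=0, dtype=str)
merged = coalesce_merge(df1, df2, key="Sample_name")
print(merged)
```

No column names other than the key are typed out, so the same call should work for 100 columns as well as for two. If the key can repeat within a file, this sketch does not apply as written and the duplicates would need to be resolved first.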

【Comments】:
