[Posted]: 2018-06-14 13:18:07
[Question]:
I am trying to merge two CSV files without having to choose between the values in the _x and _y columns.
MetaData1
Sample_name TITLE
Cody Chicken Pox
Claudia Chicken Pox
Alex Chicken Pox
Steven Chicken Pox
Mom Chicken Pox
Dad
MetaData2
Sample_name TITLE Geo_Loc DESCRIPTION
Dad Chicken Pox Earth people
Me Chicken Pox Earth people
Roger Chicken Pox Earth people
Ben Chicken Pox Earth people
I want them merged like this:
Merged Metadata
Sample_name TITLE Geo_Loc DESCRIPTION
Cody Chicken Pox Missing:Not Applicable Missing:Not Applicable
Claudia Chicken Pox Missing:Not Applicable Missing:Not Applicable
Alex Chicken Pox Missing:Not Applicable Missing:Not Applicable
Steven Chicken Pox Missing:Not Applicable Missing:Not Applicable
Mom Chicken Pox Missing:Not Applicable Missing:Not Applicable
Dad Chicken Pox Earth people
Me Chicken Pox Earth people
Roger Chicken Pox Earth people
Ben Chicken Pox Earth people
My current code is as follows:
# Merging two or more csv files using pandas
# Duplicate the read_csv line for each additional csv file
import pandas as panda  # assumed import; the post uses the alias "panda"
File_one = panda.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = panda.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
Merge_File = panda.merge(File_one, File_two, how='outer', on='Sample_name')
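However, if I have 100 columns, 50 of them end up duplicated. How can I merge them without losing data, and without having to type in each header individually as shown below?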
# Cleanup to merge duplicate non-index column
mm['TITLE'] = mm[['TITLE_x', 'TITLE_y']].fillna('').sum(axis=1)
mm.drop(['TITLE_x','TITLE_y'], axis=1, inplace=True)
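One possible way to avoid the per-column cleanup is to index both frames by Sample_name and use DataFrame.combine_first, which keeps the values from the first frame and fills its gaps from the second, so no _x/_y columns are created. The following is only a sketch under assumptions: merge_metadata is a hypothetical helper name, Sample_name values are assumed unique within each file, and the small frames stand in for the real files.

```python
import pandas as pd

def merge_metadata(left, right, key="Sample_name",
                   missing="Missing:Not Applicable"):
    """Outer-join two frames on `key`, coalescing the columns they share.

    Hypothetical helper; assumes `key` values are unique within each frame.
    """
    left_idx = left.set_index(key)
    right_idx = right.set_index(key)
    # combine_first takes the union of rows and columns, preferring values
    # from `left` and filling its missing cells from `right`.
    combined = left_idx.combine_first(right_idx)
    # combine_first sorts the labels, so restore first-seen order.
    rows = left_idx.index.append(
        right_idx.index[~right_idx.index.isin(left_idx.index)])
    cols = left_idx.columns.append(
        right_idx.columns[~right_idx.columns.isin(left_idx.columns)])
    combined = combined.reindex(index=rows, columns=cols)
    return combined.fillna(missing).reset_index()

# Small stand-in data modelled on the tables in the question.
left = pd.DataFrame({"Sample_name": ["Cody", "Dad"],
                     "TITLE": ["Chicken Pox", None]})
right = pd.DataFrame({"Sample_name": ["Dad", "Me"],
                      "TITLE": ["Chicken Pox", "Chicken Pox"],
                      "Geo_Loc": ["Earth", "Earth"],
                      "DESCRIPTION": ["people", "people"]})
merged = merge_metadata(left, right)
print(merged)
```

If both files hold different non-missing values for the same cell, this sketch keeps the value from the first file, which may or may not be what is wanted.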
[Comments]:
- If you find that it solves your problem, please consider accepting my answer. Thanks!
Tags: python csv merge multiple-columns