Duplicate values in inserted columns in pandas: answers

【Question title】: Duplicate Values in inserted columns pandas
【Posted】: 2021-09-02 08:56:37
【Question description】:

I am adding some columns to a df. But in the csv, the inserted columns hold the same value in every row. When I use drop_duplicates, only 1 row is left in the csv.

name = driver.find_element_by_xpath('//h1').text
date = driver.find_element_by_xpath('//div[@id="col-content"]//p[1]').text
result = driver.find_element_by_xpath('//div[@id="event-status"]').text

table = driver.find_element_by_xpath('//table[@class="table-main detail-odds sortable"]').get_attribute('outerHTML')
df = pd.read_html(table)[0]
row_drop = df[df['Bookmakers'].str.contains("Highest", na=False)].index.tolist()
df = df.iloc[:row_drop[0]+1]
df.insert(0,'Name', f'{name}')
df.insert(1,'Date', f'{date}')
df.insert(2,'Result',f'{result}')
df.to_csv(f'gambling/{name}.csv', index=False)

Current output

Name  Date   Result
John  14th   computer
John  14th   arts
John  14th   commerce

Expected output

Name  Date  Result
John  14th  computer
            arts
            commerce

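【Comments on the question】:

Is this your csv output? The current output looks correct to me.
It looks like there isn't a simple solution. See this answer.
@AnuragDabas Would you like to add this as an answer? That works. Perfect.
@AbhishekRai Sure, added :)
@AbhishekRai Won't that be hard to read back in, for the reasons given in the post not_speshal linked to?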

Tags: python pandas

【Solution 1】:

Your df:

df=pd.DataFrame({'Name': {0: 'John', 1: 'John', 2: 'John'},
 'Date': {0: '14th', 1: '14th', 2: '14th'},
 'Result': {0: 'computer', 1: 'arts', 2: 'commerce'}})

Try the duplicated() and mask() methods:

df[['Name','Date']]=df[['Name','Date']].mask(df.duplicated(subset=['Name','Date']),'')

or

the duplicated() and where() methods:

df[['Name','Date']]=df[['Name','Date']].where(~df.duplicated(subset=['Name','Date']),'')

Output of df:

    Name    Date    Result
0   John    14th    computer
1                   arts
2                   commerce
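
To see what this does to the CSV, and to address the read-back concern raised in the comments, here is a minimal sketch. It assumes the small sample frame from this answer rather than the scraped table, writes to an in-memory buffer instead of `gambling/{name}.csv`, and uses illustrative names (`key_cols`, `repeated`, `restored`) that are not from the original post.

```python
import io

import pandas as pd

# Sample frame from the answer; the real data would come from pd.read_html.
df = pd.DataFrame({
    "Name": ["John", "John", "John"],
    "Date": ["14th", "14th", "14th"],
    "Result": ["computer", "arts", "commerce"],
})

key_cols = ["Name", "Date"]  # illustrative name: the columns to blank out

# True for every row whose (Name, Date) pair already appeared in an earlier row.
repeated = df.duplicated(subset=key_cols)

# Work on a copy so the original frame keeps its full values.
out = df.copy()
out[key_cols] = out[key_cols].mask(repeated, "")

# Write to a string buffer so the CSV text can be inspected directly.
buf = io.StringIO()
out.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

# Reading back: the empty fields are parsed as NaN, so a forward fill
# restores the repeated values.
restored = pd.read_csv(io.StringIO(csv_text))
restored[key_cols] = restored[key_cols].ffill()
print(restored)
```

Note that `duplicated(subset=...)` flags any later row with the same Name and Date, not only adjacent ones, so the forward fill recovers the original values only while rows of the same group stay together.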

【Comments】:

To whoever downvoted... why the downvote? Please tell!
I was checking the same thing... who voted on my post?