notepad++ or excel - delete duplicate and original rows - answers

【Question title】: notepad++ or excel - delete duplicate and original rows
【Posted】: 2022-11-18 14:49:51
【Question】:

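I have a .txt file with 200k+ rows. In this file I want to delete the duplicate rows and the original rows as well.

What I have now:

text a
text a
text b
text c
text d
text d
text e

But I need this result:

text b
text c
text e

Any suggestions?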


I have tried the normal "remove duplicates" procedure in Excel and Notepad++, but I get this:

text a
text b
text c
text d
text e

and that does not work for me.
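The difference between what the standard tools do and what is wanted can be sketched in plain Python, using the sample data from the question: a normal "remove duplicates" keeps the first copy of every value, whereas the required result drops every value that appears more than once. Counting with `collections.Counter` first means the duplicate lines do not need to be adjacent.

```python
from collections import Counter

lines = ["text a", "text a", "text b", "text c", "text d", "text d", "text e"]

# Standard "remove duplicates": keep the first copy of each value (order preserved).
dedup_keep_first = list(dict.fromkeys(lines))

# What the question asks for: drop every value that has any duplicate at all.
counts = Counter(lines)
only_unique = [line for line in lines if counts[line] == 1]

print(dedup_keep_first)  # ['text a', 'text b', 'text c', 'text d', 'text e']
print(only_unique)       # ['text b', 'text c', 'text e']
```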

Looking through discussions, I found something similar, but applied to Access.

【Question discussion】:

Tags: duplicates row notepad++

【Solution 1】:

Here is a way to do it using Python. Hope it helps :)

# Importing pandas to create a DataFrame
import pandas as pd

# Creating an empty DataFrame and storing it in variable df
df = pd.DataFrame()

# For testing, you can use a hard-coded list instead of the file:
# lt = ["text a", "text a", "text b", "text c", "text d", "text d", "text e"]

# Read the file and strip the line endings, so the last line (which may lack "\n")
# still matches identical lines elsewhere in the file
with open(r"C:\Users\ntch\Desktop\txt.txt", "r") as text_file:
    lt = [line.rstrip("\n") for line in text_file]

df['cols'] = lt
# Creating a helper column with every value set to 1
df['dummy_count'] = 1

# Grouping identical lines and counting how often each one occurs
df = df.groupby('cols').count()

df.reset_index(inplace=True)

# Keeping only the lines that occur exactly once
df = df.loc[df["dummy_count"] == 1]
print(df["cols"])

Output of the above code: (image: expected output)
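A shorter pandas variant, offered as an alternative sketch rather than the answer's own method: `Series.duplicated(keep=False)` flags every occurrence of a repeated value, so negating it keeps only the lines that occur once. Unlike `groupby`, which sorts the keys by default, this keeps the lines in their original file order. The hard-coded list stands in for the lines read from the file.

```python
import pandas as pd

# Sample data; in practice this would be the list of lines read from the file.
lines = ["text a", "text a", "text b", "text c", "text d", "text d", "text e"]
df = pd.DataFrame({"cols": lines})

# duplicated(keep=False) marks *every* copy of a repeated value as True,
# so the negated mask selects only values that occur exactly once.
result = df.loc[~df["cols"].duplicated(keep=False), "cols"]

print(result.tolist())  # ['text b', 'text c', 'text e']
```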

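【Discussion】:

Thank you for your reply. But I have 200k+ rows, not 5 words... So how do I replace your example inside `lt = ["text a", "text a", "text b", "text c", "text d", "text d", "text e"]`?
I have now edited the code according to your requirements. Hope it helps :)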
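For a 200k-line file, one option is to read the file directly and write the result to a new file instead of printing it; a file of that size should fit comfortably in memory. This is a minimal sketch in plain Python, assuming a UTF-8 text file; the function name `remove_all_duplicated_lines` and the file names in the usage comment are hypothetical.

```python
from collections import Counter
from pathlib import Path


def remove_all_duplicated_lines(src, dst, encoding="utf-8"):
    """Write to dst only the lines of src that occur exactly once, in original order.

    Returns the number of lines written.
    """
    # splitlines() drops "\n" / "\r\n", so the last line (which may lack a
    # line ending) compares equal to identical lines elsewhere in the file.
    lines = Path(src).read_text(encoding=encoding).splitlines()
    counts = Counter(lines)
    unique = [line for line in lines if counts[line] == 1]
    # Terminate the output with a newline only if there is something to write.
    Path(dst).write_text("\n".join(unique) + ("\n" if unique else ""), encoding=encoding)
    return len(unique)


# Usage (hypothetical file names):
# remove_all_duplicated_lines("input.txt", "output.txt")
```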