【Posted】: 2021-09-17 23:30:52
【Question】:
I have seen a lot of code like this:
mergedStuff = pd.merge(df1, df2, how='inner')
or
mask = df1.reindex(df2.index).values == df2.values
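But both of these only report a match when the same value sits in the corresponding row; they do not compare a value against every row.
For example: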
df1 contains:
0
hello
how
are
you
guys
system
df2 contains:
0 1 ........ n
how hello you
hello guys hello
you system how
are you you
guys hello hello
system how how
hello are system
Update: the final output after comparing with df2.isin(df1):
# NOTE: the output below was typed by hand, not copied from a real run,
# but I know this is what it returns.
False True False
False False False
False False False
False False False
False False False
False False False
False False False   # only the second column and second row are
                    # True, because it matches on the same row
But what I want is to cross-check every df1 row against every df2 row.
Expected output:
True True True
True True True
True True True
True True True
True True True
True True True
True True True   # I want True for all of them, because every word
                 # in these rows also appears in df1.
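One possible way to get this kind of output, sketched on a small stand-in frame (the three-row df2 below is illustrative data, not the full input): as far as I know, DataFrame.isin compares label by label when it is given another DataFrame, but tests plain membership when it is given a list.

```python
import pandas as pd

# Stand-in data shaped like the example above (df2 is shortened for illustration).
df1 = pd.DataFrame({0: ['hello', 'how', 'are', 'you', 'guys', 'system']})
df2 = pd.DataFrame({
    0: ['how', 'hello', 'you'],
    1: ['hello', 'guys', 'system'],
    2: ['you', 'hello', 'how'],
})

# Passing a DataFrame: a cell matches only if the value at the SAME
# row label and column label in df1 is equal.
aligned = df2.isin(df1)

# Passing a plain list: every cell is tested for membership in the list.
membership = df2.isin(df1[0].tolist())

print(aligned)
print(membership)
```

With this stand-in data, aligned contains no True at all, while membership is True in every cell.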
Update 2:
However, if I run it like this, it gives the expected output:
df2[2].isin(df1[0])
True
True
True
True
True
True
True   # column 2 of df2 compared against df1 gives the right output,
       # but if I don't select a column, the result is wrong.
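The per-column call works because Series.isin treats its argument as a collection of values and does no index alignment. A minimal sketch of applying the same call to every column at once (the shortened df2 and the value 'typo' are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({0: ['hello', 'how', 'are', 'you', 'guys', 'system']})
df2 = pd.DataFrame({
    0: ['how', 'hello', 'you'],
    1: ['hello', 'guys', 'system'],
    2: ['you', 'hello', 'typo'],  # 'typo' is a made-up value that is not in df1
})

# Run Series.isin on each column of df2, always against the values of df1[0].
result = df2.apply(lambda col: col.isin(df1[0]))

print(result)
```

Here every cell comes out True except the one holding 'typo'.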
If you want to help, here is the test input:
import pandas as pd

df1 = pd.DataFrame({0: ['hello', 'how', 'are', 'you', 'guys', 'system']})
df2 = pd.DataFrame({
    0: ['how', 'hello', 'you', 'hello', 'guys', 'hello',
        'you', 'system', 'how', 'are', 'you', 'you',
        'guys', 'hello', 'hello', 'system', 'how',
        'how', 'hello', 'are', 'system'],
    1: ['how', 'you', 'you', 'hello', 'guys',
        'hello', 'you', 'system', 'how', 'are', 'you', 'you',
        'guys', 'hello', 'hello', 'system',
        'how', 'hello', 'hello', 'are', 'system'],
    2: ['how', 'you', 'you', 'are', 'guys',
        'hello', 'you', 'system', 'you', 'are', 'guys', 'you',
        'guys', 'hello', 'hello', 'system',
        'how', 'hello', 'hello', 'are', 'system'],
})
This finally worked:
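import numpy as np

# Boolean array: True where a df2 value appears anywhere in df1
new = np.isin(df2, df1)
# Positions (row, column) of the values that do NOT appear in df1
rows, cols = np.nonzero(~new)
# or, equivalently:
new = np.isin(df2, df1, invert=True)
rows, cols = np.nonzero(new)
# Collect the non-matching values themselves
x2 = [df2.iloc[r, c] for r, c in zip(rows, cols)]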
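The same result can probably be collected without an explicit loop. The sketch below uses a shortened df2 with a made-up value 'typo', and a hypothetical `report` frame that lists where each non-matching value sits, which might be handy if the cells are to be highlighted later.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({0: ['hello', 'how', 'are', 'you', 'guys', 'system']})
df2 = pd.DataFrame({
    0: ['how', 'hello', 'you'],
    1: ['hello', 'guys', 'system'],
    2: ['you', 'hello', 'typo'],  # 'typo' is a made-up value that is not in df1
})

# True where a df2 cell is NOT one of the df1 values.
missing = np.isin(df2.to_numpy(), df1[0].to_numpy(), invert=True)

# Row and column positions of the non-matching cells.
rows, cols = np.nonzero(missing)

# Integer-array indexing pulls out all non-matching values at once.
values = df2.to_numpy()[rows, cols]

# Hypothetical summary table: one line per non-matching cell.
report = pd.DataFrame({
    'row': df2.index[rows].tolist(),
    'column': df2.columns[cols].tolist(),
    'value': values.tolist(),
})
print(report)
```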
【Comments】:
-
How about sorting them first?
-
What exactly do you want? A list of the words that do not appear?
-
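Yes. If you run that code, an error occurs; I understand the error but don't know how to fix it. Once I have all the False values, I can color those values in Excel, which is why I need this.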
-
@Corralien Hi, if possible, could you take a look at this post and answer it? It is my humble request. stackoverflow.com/questions/68314626/…
Tags: python regex string dataframe sorting