Well, there's a way to make any type comparable: simply wrap it in a class that compares the way you need it to:
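class DataFrameWrapper():
    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        # Only compare against other wrappers; defer for anything else.
        if not isinstance(other, DataFrameWrapper):
            return NotImplemented
        return self.df.equals(other.df)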
So when you wrap your "uncomparable" values, you can simply use ==:
>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})
>>> a = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> b = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> c = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df2)}}
>>> a == b
True
>>> a == c
False
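Of course, wrapping your values has its disadvantages, but if you only need to compare them, it is a very easy approach. All that may be needed is a recursive wrapping before the comparison and a recursive unwrapping afterwards: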
def recursivewrap(dict_):
    for key, value in dict_.items():
        wrapper = wrappers.get(type(value), lambda x: x)  # for other types don't wrap
        dict_[key] = wrapper(value)
    return dict_  # return dict_ so this function can be used for recursion

def recursiveunwrap(dict_):
    for key, value in dict_.items():
        unwrapper = unwrappers.get(type(value), lambda x: x)
        dict_[key] = unwrapper(value)
    return dict_

wrappers = {pd.DataFrame: DataFrameWrapper,
            dict: recursivewrap}
unwrappers = {DataFrameWrapper: lambda x: x.df,
              dict: recursiveunwrap}
Example:
>>> recursivewrap(a)
{'x': 1,
'y': {'w': <__main__.DataFrameWrapper at 0x2affddcc048>, 'z': 'George'}}
>>> recursiveunwrap(recursivewrap(a))
{'x': 1, 'y': {'w': a
0 1
1 2
2 3, 'z': 'George'}}
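Note that the wrap/unwrap functions shown here modify the dictionaries in place. If that is a concern, a non-mutating variant could look roughly like the sketch below. The names `wrapped_copy` and `ApproxFloat` are hypothetical and not part of the code above; the float wrapper is only a stand-in so the example runs without pandas. For DataFrames, the mapping would presumably be `{pd.DataFrame: DataFrameWrapper}`.

```python
import math

class ApproxFloat:
    """Stand-in wrapper (hypothetical): compares floats with a tolerance."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, ApproxFloat):
            return NotImplemented
        return math.isclose(self.value, other.value)

def wrapped_copy(value, wrappers):
    """Return a wrapped copy of `value`; nested dicts are rebuilt, not mutated."""
    if isinstance(value, dict):
        return {key: wrapped_copy(val, wrappers) for key, val in value.items()}
    wrapper = wrappers.get(type(value))
    return wrapper(value) if wrapper is not None else value

wrappers = {float: ApproxFloat}
a = {'x': 1, 'y': {'z': "George", 'w': 0.1 + 0.2}}
b = {'x': 1, 'y': {'z': "George", 'w': 0.3}}

plain_equal = (a == b)  # False: 0.1 + 0.2 is not exactly 0.3
wrapped_equal = wrapped_copy(a, wrappers) == wrapped_copy(b, wrappers)  # True
still_float = isinstance(a['y']['w'], float)  # True: `a` itself was not modified
```

Because the originals are left untouched, no unwrapping step is needed afterwards, at the cost of building temporary copies of the nested dictionaries.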
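If you feel really adventurous, you could use wrapper classes that, depending on the comparison result, modify some variable that holds the information about what wasn't equal.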
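One possible shape for such a wrapper is sketched below. `RecordingWrapper`, its `label` argument, and the shared `mismatches` list are all assumptions made for illustration. The demo uses plain lists with `operator.eq` so it runs without pandas; for DataFrames you would pass `cmp=pd.DataFrame.equals` instead.

```python
import operator

class RecordingWrapper:
    """Hypothetical wrapper that logs its label whenever a comparison fails."""
    def __init__(self, value, label, mismatches, cmp=operator.eq):
        self.value = value
        self.label = label
        self.mismatches = mismatches  # shared list that collects failed labels
        self.cmp = cmp

    def __eq__(self, other):
        if not isinstance(other, RecordingWrapper):
            return NotImplemented
        equal = bool(self.cmp(self.value, other.value))
        if not equal:
            self.mismatches.append(self.label)
        return equal

mismatches = []
a = {'x': 1, 'w': RecordingWrapper([1, 2, 3], 'w', mismatches)}
c = {'x': 1, 'w': RecordingWrapper([3, 2, 1], 'w', mismatches)}

result = (a == c)
# result is False, and mismatches now holds the label of the failing entry
```

Keep in mind that dictionary comparison stops at the first unequal value, so a single `==` records at most one mismatch this way.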
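This part of the answer was based on the original question, which didn't include nesting:
You can separate the unhashable values from the hashable ones, then do a set comparison for the hashable values and an "order-independent" list comparison for the unhashable ones: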
def split_hashable_unhashable(vals):
    """Separate hashable values from unhashable ones and return a set
    (hashables) and a list (unhashables)."""
    set_ = set()
    list_ = []
    for val in vals:
        try:
            set_.add(val)
        except TypeError:  # unhashable
            list_.append(val)
    return set_, list_

def compare_lists_arbitary_order(l1, l2, cmp=pd.DataFrame.equals):
    """Compare two lists using a custom comparison function, the order of the
    elements is ignored."""
    # need to have equal lengths otherwise they can't be equal
    if len(l1) != len(l2):
        return False
    remaining_indices = set(range(len(l2)))
    for item in l1:
        for cmpidx in remaining_indices:
            if cmp(item, l2[cmpidx]):
                remaining_indices.remove(cmpidx)
                break
        else:
            # Run through the loop without finding a match
            return False
    return True

def dict_compare(d1, d2):
    if set(d1) != set(d2):  # compare the dictionary keys
        return False
    set1, list1 = split_hashable_unhashable(d1.values())
    set2, list2 = split_hashable_unhashable(d2.values())
    if set1 != set2:  # set comparison is easy
        return False
    return compare_lists_arbitary_order(list1, list2)
It got a bit longer than expected. For your test cases it works:
>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})
>>> a = {'x': 1, 'y': df1}
>>> b = {'y': 1, 'x': df1}
>>> c = {'y': 1, 'x': df2}
>>> dict_compare(a, b)
True
>>> dict_compare(a, c)
False
>>> dict_compare(b, c)
False
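Set operations could also be used to find the differences (see set.difference). It's a bit more complicated with the lists, but not impossible: instead of immediately returning False, one could add the items for which no match was found to a separate list.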
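A rough sketch of that idea follows. `diff_values` is a hypothetical helper, not part of the code above, and it defaults to `operator.eq` so that the demo runs on plain lists without pandas; for DataFrames you would pass `cmp=pd.DataFrame.equals`.

```python
import operator

def diff_values(vals1, vals2, cmp=operator.eq):
    """Return (hashables only in vals1, hashables only in vals2,
    unmatched unhashables from vals1, unmatched unhashables from vals2)."""
    def split(vals):
        hashables, unhashables = set(), []
        for val in vals:
            try:
                hashables.add(val)
            except TypeError:  # unhashable
                unhashables.append(val)
        return hashables, unhashables

    set1, list1 = split(vals1)
    set2, list2 = split(vals2)

    remaining = list(range(len(list2)))  # indices in list2 not matched yet
    unmatched1 = []
    for item in list1:
        for idx in remaining:
            if cmp(item, list2[idx]):
                remaining.remove(idx)
                break
        else:
            # No match found: collect the item instead of returning False
            unmatched1.append(item)
    unmatched2 = [list2[idx] for idx in remaining]
    return set1 - set2, set2 - set1, unmatched1, unmatched2

only1, only2, unmatched1, unmatched2 = diff_values(
    [1, 2, [1, 2], [3]],
    [2, 5, [3], [4]],
)
# only1 == {1}, only2 == {5}, unmatched1 == [[1, 2]], unmatched2 == [[4]]
```

The two collections are equal in the sense used above exactly when all four results are empty.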