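In the solution by firelynx here on StackOverflow, it is suggested that polymorphism does not work. After extensive testing, I have to agree with firelynx. However, if you combine the idea of polymorphism with the numpy broadcasting solution of piRSquared, it can work!
The only problem is that, under the hood, numpy broadcasting effectively performs a cross join and then filters the pairs that compare equal, which costs O(n1*n2) memory and O(n1*n2) time. Someone can probably make this more efficient in the general case.
The reason I post here is that the question behind firelynx's solution was closed as a duplicate of this one, and I tend to disagree. This question and its answers only cover one point belonging to multiple intervals; they give no solution when multiple points belong to multiple intervals. The solution I propose below does handle these n-m relations.
Basically, create the following two classes, PointInTime and Timespan, for the polymorphism.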
from datetime import datetime


class PointInTime(object):
    doPrint = True  # set to False to silence the comparison trace

    def __init__(self, year, month, day):
        self.dt = datetime(year, month, day)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            # Point vs. point: plain equality of the timestamps.
            r = (self.dt == other.dt)
            if self.doPrint:
                print(f'{self.__class__}: comparing {self} to {other} (equals) gives {r}')
            return r
        elif isinstance(other, Timespan):
            # Point vs. interval: "equal" means strictly inside the interval.
            r = (other.start_date < self.dt < other.end_date)
            if self.doPrint:
                print(f'{self.__class__}: comparing {self} to {other} (Timespan in PointInTime) gives {r}')
            return r
        else:
            if self.doPrint:
                print('Not implemented... (PointInTime)')
            return NotImplemented

    def __repr__(self):
        return "{}-{}-{}".format(self.dt.year, self.dt.month, self.dt.day)


class Timespan(object):
    doPrint = True  # set to False to silence the comparison trace

    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            # Interval vs. interval: both endpoints must match.
            r = ((self.start_date == other.start_date) and (self.end_date == other.end_date))
            if self.doPrint:
                print(f'{self.__class__}: comparing {self} to {other} (equals) gives {r}')
            return r
        elif isinstance(other, PointInTime):
            # Interval vs. point: "equal" means the point is strictly inside.
            r = self.start_date < other.dt < self.end_date
            if self.doPrint:
                print(f'{self.__class__}: comparing {self} to {other} (PointInTime in Timespan) gives {r}')
            return r
        else:
            if self.doPrint:
                print('Not implemented... (Timespan)')
            return NotImplemented

    def __repr__(self):
        return "{}-{}-{} -> {}-{}-{}".format(
            self.start_date.year, self.start_date.month, self.start_date.day,
            self.end_date.year, self.end_date.month, self.end_date.day)
By the way, if you do not want to use == but some other operator (such as !=, <, >, <=, >=), you can implement the corresponding methods (__ne__, __lt__, __gt__, __le__, __ge__).
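As a rough illustration of that remark, here is a minimal sketch of two of those methods. The meaning I give to < and > (point strictly before the interval starts, point strictly after it ends) is my own assumption for the example, and the classes are stripped-down stand-ins, not the full ones from above.

```python
from datetime import datetime


class Timespan:
    """Stripped-down stand-in for the Timespan class (illustration only)."""

    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date


class PointInTime:
    """Stripped-down stand-in that adds ordering against a Timespan."""

    def __init__(self, year, month, day):
        self.dt = datetime(year, month, day)

    def __lt__(self, other):
        # Assumed meaning: the point lies strictly before the interval starts.
        if isinstance(other, Timespan):
            return self.dt < other.start_date
        return NotImplemented

    def __gt__(self, other):
        # Assumed meaning: the point lies strictly after the interval ends.
        if isinstance(other, Timespan):
            return self.dt > other.end_date
        return NotImplemented


span = Timespan(datetime(2015, 2, 1), datetime(2015, 2, 5))
before = PointInTime(2015, 1, 1) < span
after = PointInTime(2015, 3, 3) > span
point = PointInTime(2015, 2, 2)
inside = not (point < span or point > span)
```

With these definitions, a point that is neither before nor after the interval is inside it (endpoints included, unlike the strict comparison used in __eq__ above).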
You can combine this with broadcasting as follows.
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    "pit": [PointInTime(2015, 1, 1), PointInTime(2015, 2, 2),
            PointInTime(2015, 3, 3), PointInTime(2015, 4, 4)],
    "vals1": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "ts": [Timespan(datetime(2015, 2, 1), datetime(2015, 2, 5)),
           Timespan(datetime(2015, 2, 1), datetime(2015, 4, 1)),
           Timespan(datetime(2015, 2, 1), datetime(2015, 2, 5))],
    "vals2": ['a', 'b', 'c'],
})

a = df1['pit'].values
b = df2['ts'].values

# Broadcast (n1, 1) against (n2,): every point is compared with every interval
# through PointInTime.__eq__; i and j are the row positions of the matches.
i, j = np.where(a[:, None] == b)

res = pd.DataFrame(
    np.column_stack([df1.values[i], df2.values[j]]),
    columns=df1.columns.append(df2.columns)
)

print(df1)
print(df2)
print(res)
This gives the expected output.
<class '__main__.PointInTime'>: comparing 2015-1-1 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-1-1 to 2015-2-1 -> 2015-4-1 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-1-1 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-2-2 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives True
<class '__main__.PointInTime'>: comparing 2015-2-2 to 2015-2-1 -> 2015-4-1 (Timespan in PointInTime) gives True
<class '__main__.PointInTime'>: comparing 2015-2-2 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives True
<class '__main__.PointInTime'>: comparing 2015-3-3 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-3-3 to 2015-2-1 -> 2015-4-1 (Timespan in PointInTime) gives True
<class '__main__.PointInTime'>: comparing 2015-3-3 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-4-4 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-4-4 to 2015-2-1 -> 2015-4-1 (Timespan in PointInTime) gives False
<class '__main__.PointInTime'>: comparing 2015-4-4 to 2015-2-1 -> 2015-2-5 (Timespan in PointInTime) gives False
        pit  vals1
0  2015-1-1      1
1  2015-2-2      2
2  2015-3-3      3
3  2015-4-4      4
                     ts  vals2
0  2015-2-1 -> 2015-2-5      a
1  2015-2-1 -> 2015-4-1      b
2  2015-2-1 -> 2015-2-5      c
        pit  vals1                    ts  vals2
0  2015-2-2      2  2015-2-1 -> 2015-2-5      a
1  2015-2-2      2  2015-2-1 -> 2015-4-1      b
2  2015-2-2      2  2015-2-1 -> 2015-2-5      c
3  2015-3-3      3  2015-2-1 -> 2015-4-1      b
Wrapping the values in classes probably adds some performance overhead compared to basic Python types, but I have not looked into that.
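For comparison, here is a sketch of the same inner join done on plain datetime64 columns instead of the classes. This is not something I have benchmarked; the column names start and end and the name res_native are made up for the example. The mask is still O(n1*n2), but the comparisons run inside numpy rather than through Python-level __eq__ calls.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "pit": pd.to_datetime(["2015-01-01", "2015-02-02", "2015-03-03", "2015-04-04"]),
    "vals1": [1, 2, 3, 4],
})
# Hypothetical layout: one column per interval endpoint instead of a Timespan object.
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2015-02-01", "2015-02-01", "2015-02-01"]),
    "end": pd.to_datetime(["2015-02-05", "2015-04-01", "2015-02-05"]),
    "vals2": ["a", "b", "c"],
})

p = df1["pit"].to_numpy()[:, None]   # shape (n1, 1)
start = df2["start"].to_numpy()      # shape (n2,)
end = df2["end"].to_numpy()          # shape (n2,)

# Same strict "point inside interval" test as in the classes, as one boolean mask.
mask = (start < p) & (p < end)
i, j = np.nonzero(mask)

res_native = pd.concat(
    [df1.iloc[i].reset_index(drop=True), df2.iloc[j].reset_index(drop=True)],
    axis=1,
)
```

The matching row pairs are the same as in the class-based version; only the representation of the interval differs.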
The above shows how to create an "inner" join. Creating "(outer) left", "(outer) right" and "(full) outer" joins should be straightforward.
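One possible way to get a left join out of the index pairs, as a sketch only: left_join_from_pairs is a hypothetical helper name, and the small frames below just stand in for df1 and df2, with i and j hard-coded to the pairs that np.where found in the example above.

```python
import numpy as np
import pandas as pd


def left_join_from_pairs(left, right, i, j):
    """Hypothetical helper: left outer join from matching row positions (i, j)."""
    matched = pd.concat(
        [left.iloc[i].reset_index(drop=True), right.iloc[j].reset_index(drop=True)],
        axis=1,
    )
    # Rows of `left` without any match are appended; their right-hand columns are NaN.
    unmatched = left.iloc[np.setdiff1d(np.arange(len(left)), i)]
    return pd.concat([matched, unmatched], ignore_index=True)


left = pd.DataFrame({"vals1": [1, 2, 3, 4]})
right = pd.DataFrame({"vals2": ["a", "b", "c"]})
# Row positions of the matches, as returned by np.where in the inner join.
i = np.array([1, 1, 1, 2])
j = np.array([0, 1, 2, 1])

left_res = left_join_from_pairs(left, right, i, j)
```

Note that the unmatched rows end up at the bottom rather than in their original position. A right join would be the same idea with the roles swapped, and a full outer join would append the unmatched rows from both sides.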