NumPy way -
In [15]: a = np.array(list_1)
In [16]: c = np.diff(np.flatnonzero(np.r_[True,a[:-1] != a[1:],True]))
In [17]: np.repeat(c,c)
Out[17]: array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 1, 1])
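To see why the one-liner works, it can be unpacked step by step. The following is only a sketch: the helper name `run_lengths_repeated` and the empty-input guard are additions for illustration and are not part of the original session. The sample list is the one used in this answer.

```python
import numpy as np

def run_lengths_repeated(values):
    """For each element, return the length of the run it belongs to.

    Hypothetical helper name; it wraps the same three NumPy calls.
    """
    a = np.asarray(values)
    if a.size == 0:
        # Added guard: without it the two sentinel Trues would yield [1].
        return np.array([], dtype=int)
    # True wherever an element differs from its predecessor (a new run starts).
    # The leading True marks the start of the first run; the trailing True
    # is a sentinel one position past the end of the last run.
    boundaries = np.r_[True, a[:-1] != a[1:], True]
    # Start index of every run, followed by len(a) from the sentinel.
    starts = np.flatnonzero(boundaries)
    # Gaps between consecutive boundaries are the run lengths.
    lengths = np.diff(starts)
    # Repeat each run length once per element of that run.
    return np.repeat(lengths, lengths)

list_1 = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(run_lengths_repeated(list_1))
```

For the sample list, the run starts are at indices 0, 1, 3, 6, 10 and 11, with 12 as the sentinel, so the run lengths are 1, 2, 3, 4, 1, 1.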
Timings on a 10,000x tiled version of the given sample:
In [45]: list_1
Out[45]: [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
In [46]: list_1 = np.tile(list_1,10000).tolist()
# Itertools groupby way (assumes `from itertools import groupby`) :
In [47]: %%timeit
...: result = []
...: for k, v in groupby(list_1):
...:     length = len(list(v))
...:     result.extend([length] * length)
28.7 ms ± 435 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Pandas way :
In [48]: %%timeit
...: s = pd.Series(list_1)
...: s.groupby(s.diff().ne(0).cumsum()).transform('count')
28.3 ms ± 324 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# NumPy way :
In [49]: %%timeit
...: a = np.array(list_1)
...: c = np.diff(np.flatnonzero(np.r_[True,a[:-1] != a[1:],True]))
...: np.repeat(c,c)
8.16 ms ± 76.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
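Outside IPython, a comparison along these lines could be reproduced with the standard `timeit` module. This is a rough sketch, not the original benchmark: the function names `groupby_way` and `numpy_way` are made up here, the pandas variant is left out to keep the script dependency-light, and absolute numbers will differ by machine and library version.

```python
from itertools import groupby
import timeit

import numpy as np

def groupby_way(values):
    # Pure-Python version: one pass over the runs found by groupby.
    result = []
    for _, group in groupby(values):
        length = len(list(group))
        result.extend([length] * length)
    return result

def numpy_way(values):
    # Vectorized version, including the list-to-array conversion cost.
    # Assumes a non-empty input, as in the timings above.
    a = np.array(values)
    c = np.diff(np.flatnonzero(np.r_[True, a[:-1] != a[1:], True]))
    return np.repeat(c, c)

sample = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
tiled = np.tile(sample, 10000).tolist()

# Both approaches should agree before their timings are compared.
assert groupby_way(tiled) == numpy_way(tiled).tolist()

for fn in (groupby_way, numpy_way):
    seconds = timeit.timeit(lambda: fn(tiled), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

Keeping `np.array(values)` inside `numpy_way` matches the timed cell above, so the conversion from a Python list is counted against the NumPy approach.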