Big O 复杂度(截至 Pandas 0.24)为 m*n,其中 m 是列数,n 是行数。请注意,这是在使用 DataFrame.__getitem__ 方法(又名 [])和 Index(see relevant code, with other types that would trigger a copy)时。
这是一个有用的堆栈跟踪:
<ipython-input-4-3162cae03863>(2)<module>()
1 columns = df.columns[::-1]
----> 2 df_reversed = df[columns]
pandas/core/frame.py(2682)__getitem__()
2681 # either boolean or fancy integer index
-> 2682 return self._getitem_array(key)
2683 elif isinstance(key, DataFrame):
pandas/core/frame.py(2727)_getitem_array()
2726 indexer = self.loc._convert_to_indexer(key, axis=1)
-> 2727 return self._take(indexer, axis=1)
2728
pandas/core/generic.py(2789)_take()
2788 axis=self._get_block_manager_axis(axis),
-> 2789 verify=True)
2790 result = self._constructor(new_data).__finalize__(self)
pandas/core/internals.py(4539)take()
4538 return self.reindex_indexer(new_axis=new_labels, indexer=indexer,
-> 4539 axis=axis, allow_dups=True)
4540
pandas/core/internals.py(4421)reindex_indexer()
4420 new_blocks = self._slice_take_blocks_ax0(indexer,
-> 4421 fill_tuple=(fill_value,))
4422 else:
pandas/core/internals.py(1254)take_nd()
1253 new_values = algos.take_nd(values, indexer, axis=axis,
-> 1254 allow_fill=False)
1255 else:
> pandas/core/algorithms.py(1658)take_nd()
1657 import ipdb; ipdb.set_trace()
-> 1658 func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,
1659 mask_info=mask_info)
1660 func(arr, indexer, out, fill_value)
pandas/core/algorithms 中 L1660 上的 func 调用最终调用了具有O(m * n) 复杂性的 cython 函数。这是将原始数据中的数据复制到out 的位置。 out 包含以相反顺序排列的原始数据的副本。
inner_take_2d_axis0_template = """\
cdef:
Py_ssize_t i, j, k, n, idx
%(c_type_out)s fv
n = len(indexer)
k = values.shape[1]
fv = fill_value
IF %(can_copy)s:
cdef:
%(c_type_out)s *v
%(c_type_out)s *o
#GH3130
if (values.strides[1] == out.strides[1] and
values.strides[1] == sizeof(%(c_type_out)s) and
sizeof(%(c_type_out)s) * n >= 256):
for i from 0 <= i < n:
idx = indexer[i]
if idx == -1:
for j from 0 <= j < k:
out[i, j] = fv
else:
v = &values[idx, 0]
o = &out[i, 0]
memmove(o, v, <size_t>(sizeof(%(c_type_out)s) * k))
return
for i from 0 <= i < n:
idx = indexer[i]
if idx == -1:
for j from 0 <= j < k:
out[i, j] = fv
else:
for j from 0 <= j < k:
out[i, j] = %(preval)svalues[idx, j]%(postval)s
"""
请注意,在上面的模板函数中,有一个使用memmove的路径(这是本例中采用的路径,因为我们正在从int64映射到int64,并且输出的维度与我们只是在切换索引)。请注意,memmove is still O(n) 与它必须复制的字节数成正比,尽管可能比直接写入索引要快。