【Question title】: pandas replace only part of a column with datetime index
【Posted】: 2017-07-14 01:39:46
【Question description】:

This is a follow-up to this question: pandas replace only part of a column

Here is my current input:

import pandas as pd
from pandas_datareader import data, wb
import numpy as np
from datetime import date

pd.set_option('expand_frame_repr', False)

df = data.DataReader('GE', 'yahoo', date (2000, 1, 1), date (2000, 2, 1))
df['x'] = np.where (df['Open'] > df['High'].shift(-2), 1, np.nan)
print (df.round(2))

# this section of code works perfectly for an integer based index.......
ii = df[pd.notnull(df['x'])].index
dd = np.diff(ii)
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
jj = [ii[0]] + jj

for ci in jj:
    df.loc[ci:ci+2,'x'] = 1.0
# end of section that works perfectly for an integer based index......

print (df.round(2))

Here is my current output:

              Open    High     Low   Close    Volume  Adj Close    x
Date                                                                
2000-01-03  153.00  153.69  149.19  150.00  22069800      29.68  1.0 
2000-01-04  147.25  148.00  144.00  144.00  22121400      28.49  1.0
2000-01-05  143.75  147.00  142.56  143.75  27292800      28.44  NaN
2000-01-06  143.12  146.94  142.63  145.67  19873200      28.82  NaN
2000-01-07  148.00  151.88  147.00  151.31  20141400      29.94  NaN
2000-01-10  152.69  154.06  151.12  151.25  15226500      29.93  NaN
2000-01-11  151.00  152.69  150.62  151.50  15123000      29.98  NaN
2000-01-12  151.06  153.25  150.56  152.00  18342300      30.08  NaN 
2000-01-13  153.13  154.94  153.00  153.75  14953500      30.42  1.0
2000-01-14  153.38  154.63  149.56  151.00  18480300      29.88  1.0
2000-01-18  149.62  149.62  146.75  148.00  18296700      29.29  NaN
2000-01-19  146.50  150.94  146.25  148.72  14849700      29.43  NaN
2000-01-20  149.06  149.75  142.63  145.94  30759000      28.88  1.0
2000-01-21  147.94  148.25  143.94  144.13  24005400      28.52  1.0
2000-01-24  145.31  145.94  136.44  138.13  27116100      27.33  1.0
2000-01-25  138.06  140.38  137.00  138.50  25387500      27.41  NaN
2000-01-26  140.50  142.19  138.88  141.44  15856800      27.99  NaN
2000-01-27  141.56  141.75  137.06  141.75  19243500      28.05  1.0
2000-01-28  140.31  140.50  133.63  134.00  29846700      26.52  1.0
2000-01-31  134.00  135.94  133.06  134.00  21782700      26.52  NaN
2000-02-01  134.25  137.00  134.00  136.00  27339000      26.91  NaN
Traceback (most recent call last):
  File "C:\stocks\question4 for stack overflow.py", line 15, in <module>
    jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
  File "C:\stocks\question4 for stack overflow.py", line 15, in <listcomp>
    jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
TypeError: Cannot cast ufunc greater input from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'

What I want to do is change column 'x' into non-overlapping sets of three consecutive 1s. The desired output is:

              Open    High     Low   Close    Volume  Adj Close    x
Date                                                                
2000-01-03  153.00  153.69  149.19  150.00  22069800      29.68  1.0
2000-01-04  147.25  148.00  144.00  144.00  22121400      28.49  1.0
2000-01-05  143.75  147.00  142.56  143.75  27292800      28.44  1.0
2000-01-06  143.12  146.94  142.63  145.67  19873200      28.82  NaN
2000-01-07  148.00  151.88  147.00  151.31  20141400      29.94  NaN
2000-01-10  152.69  154.06  151.12  151.25  15226500      29.93  NaN
2000-01-11  151.00  152.69  150.62  151.50  15123000      29.98  NaN
2000-01-12  151.06  153.25  150.56  152.00  18342300      30.08  NaN
2000-01-13  153.13  154.94  153.00  153.75  14953500      30.42  1.0
2000-01-14  153.38  154.63  149.56  151.00  18480300      29.88  1.0
2000-01-18  149.62  149.62  146.75  148.00  18296700      29.29  1.0
2000-01-19  146.50  150.94  146.25  148.72  14849700      29.43  NaN
2000-01-20  149.06  149.75  142.63  145.94  30759000      28.88  1.0
2000-01-21  147.94  148.25  143.94  144.13  24005400      28.52  1.0
2000-01-24  145.31  145.94  136.44  138.13  27116100      27.33  1.0
2000-01-25  138.06  140.38  137.00  138.50  25387500      27.41  NaN
2000-01-26  140.50  142.19  138.88  141.44  15856800      27.99  NaN
2000-01-27  141.56  141.75  137.06  141.75  19243500      28.05  1.0
2000-01-28  140.31  140.50  133.63  134.00  29846700      26.52  1.0
2000-01-31  134.00  135.94  133.06  134.00  21782700      26.52  1.0
2000-02-01  134.25  137.00  134.00  136.00  27339000      26.91  NaN

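So January 5, 18, and 31 change from NaN to 1.0.

As the comments in the code say, the second section of the code works perfectly for an integer-based index. However, it does not work with a datetime index of dtype datetime64[ns]. I think only a minor tweak to the second section of the code is needed to make it work (hopefully).

Thanks in advance, David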
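
For reference, the failure described above comes from np.diff returning timedeltas when it is applied to datetime labels, so the comparison with the integer 2 no longer makes sense. A minimal sketch of one possible tweak is to run the same logic on integer row positions and write back with iloc. The frame below is a synthetic stand-in (the question's 21 trading dates and only the 'x' column), and the names pos, starts and col are illustrative, not from the original code:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's frame: same 21 trading dates, 'x' only.
dates = pd.bdate_range("2000-01-03", "2000-02-01").drop(pd.Timestamp("2000-01-17"))
nan = np.nan
df = pd.DataFrame(
    {"x": [1, 1, nan, nan, nan, nan, nan, nan, 1, 1, nan, nan,
           1, 1, 1, nan, nan, 1, 1, nan, nan]},
    index=dates,
)

# Differences between datetime labels are timedeltas, not row counts.
label_gaps = np.diff(df.index[df["x"].notna()].to_numpy())

# Integer positions of the non-null rows; their differences are plain integers.
pos = np.flatnonzero(df["x"].notna().to_numpy())
dd = np.diff(pos)
starts = [pos[0]] + [pos[i] for i in range(1, len(pos)) if dd[i - 1] > 2]

col = df.columns.get_loc("x")
for ci in starts:
    # iloc slices exclude the end point, so ci:ci+3 covers three rows,
    # matching the label-inclusive loc[ci:ci+2] of the integer-index version.
    df.iloc[ci:ci + 3, col] = 1.0

print(df)
```

On this sample the run starts come out at positions 0, 8, 12 and 17, which fills January 5, 18 and 31 as in the desired output. Comparing the timedelta gaps against a number of calendar days would not be equivalent, because weekends and holidays make calendar gaps differ from row gaps.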

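-------------- Follow-up section ------------------------------------

Thanks for sticking with me, b2002. I'm really trying to keep the top solution because of its simplicity. When I run your code as-is, out of the box, the output is as follows:

Original output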

...jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]...

... a[ci:ci+2] = 1.0...

              Open    High     Low   Close    Volume  Adj Close    x  ii  dd  jj  jj  desired
Date                                                                
2000-01-03  153.00  153.69  149.19  150.00  22069800      29.68  1.0  1
2000-01-04  147.25  148.00  144.00  144.00  22121400      28.49  1.0  1
2000-01-05  143.75  147.00  142.56  143.75  27292800      28.44  1.0  2          x    x
2000-01-06  143.12  146.94  142.63  145.67  19873200      28.82  1.0  3   1  
2000-01-07  148.00  151.88  147.00  151.31  20141400      29.94  NaN  4   1
2000-01-10  152.69  154.06  151.12  151.25  15226500      29.93  NaN  5   1
2000-01-11  151.00  152.69  150.62  151.50  15123000      29.98  NaN  6   1
2000-01-12  151.06  153.25  150.56  152.00  18342300      30.08  NaN  7   1
2000-01-13  153.13  154.94  153.00  153.75  14953500      30.42  1.0  1
2000-01-14  153.38  154.63  149.56  151.00  18480300      29.88  1.0  1
2000-01-18  149.62  149.62  146.75  148.00  18296700      29.29  1.0  10  3   x  x    x
2000-01-19  146.50  150.94  146.25  148.72  14849700      29.43  1.0  11  1
2000-01-20  149.06  149.75  142.63  145.94  30759000      28.88  1.0  1
2000-01-21  147.94  148.25  143.94  144.13  24005400      28.52  1.0  1
2000-01-24  145.31  145.94  136.44  138.13  27116100      27.33  1.0  1
2000-01-25  138.06  140.38  137.00  138.50  25387500      27.41  1.0  15  4   z  z
2000-01-26  140.50  142.19  138.88  141.44  15856800      27.99  1.0  16  1
2000-01-27  141.56  141.75  137.06  141.75  19243500      28.05  1.0  1
2000-01-28  140.31  140.50  133.63  134.00  29846700      26.52  1.0  1
2000-01-31  134.00  135.94  133.06  134.00  21782700      26.52  1.0  19  3   x  x    x
2000-02-01  134.25  137.00  134.00  136.00  27339000      26.91  1.0  20  1              

I really wanted to understand what was going on, so I set up the columns ii, dd, jj before, jj after, and desired. When I adjusted the input to:

...jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]...

... a[ci:ci+1] = 1.0...

this is the output:

              Open    High     Low   Close    Volume  Adj Close    x
Date                                                                
2000-01-03  153.00  153.69  149.19  150.00  22069800      29.45  1.0
2000-01-04  147.25  148.00  144.00  144.00  22121400      28.27  1.0
2000-01-05  143.75  147.00  142.56  143.75  27292800      28.22  1.0
2000-01-06  143.12  146.94  142.63  145.67  19873200      28.60  NaN
2000-01-07  148.00  151.88  147.00  151.31  20141400      29.70  NaN
2000-01-10  152.69  154.06  151.12  151.25  15226500      29.69  NaN
2000-01-11  151.00  152.69  150.62  151.50  15123000      29.74  NaN
2000-01-12  151.06  153.25  150.56  152.00  18342300      29.84  NaN
2000-01-13  153.13  154.94  153.00  153.75  14953500      30.18  1.0
2000-01-14  153.38  154.63  149.56  151.00  18480300      29.64  1.0
2000-01-18  149.62  149.62  146.75  148.00  18296700      29.05  1.0
2000-01-19  146.50  150.94  146.25  148.72  14849700      29.19  NaN
2000-01-20  149.06  149.75  142.63  145.94  30759000      28.65  1.0
2000-01-21  147.94  148.25  143.94  144.13  24005400      28.29  1.0
2000-01-24  145.31  145.94  136.44  138.13  27116100      27.12  1.0
2000-01-25  138.06  140.38  137.00  138.50  25387500      27.19  1.0
2000-01-26  140.50  142.19  138.88  141.44  15856800      27.77  NaN
2000-01-27  141.56  141.75  137.06  141.75  19243500      27.83  1.0
2000-01-28  140.31  140.50  133.63  134.00  29846700      26.31  1.0
2000-01-31  134.00  135.94  133.06  134.00  21782700      26.31  1.0
2000-02-01  134.25  137.00  134.00  136.00  27339000      26.70  NaN

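The only problem is January 25, where np.diff gives a value of 4. I just need the code to skip the value of 4 so that the existing set of three 1s is left alone. I tried to modify dd before it goes into jj, and neither of these two attempts worked: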

dd[dd == 4] = 1

dd = [3 if x==4 else x for x in dd]

I also tried modifying the jj line like this:

jj = [ii[i] for i in range(1,len(ii)) if ((dd == 4) or (dd[i-1] > 2))]

It gives this error message:

Traceback (most recent call last):
  File "C:\stocks\question4 for stack overflow.py", line 109, in <module>
    jj = [ii[i] for i in range(1,len(ii)) if ((dd == 4) or (dd[i-1] > 2))]
  File "C:\stocks\question4 for stack overflow.py", line 109, in <listcomp>
    jj = [ii[i] for i in range(1,len(ii)) if ((dd == 4) or (dd[i-1] > 2))]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
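
The ValueError above arises because dd == 4 compares the whole array at once and yields a boolean array, which Python cannot reduce to a single True/False inside the `or`. A minimal sketch, assuming the intent is to keep gaps greater than 2 but skip a gap of exactly 4, is to test the single element dd[i-1] instead. The array a is a hand-typed copy of the sample 'x' column, and the "skip 4" rule is specific to this sample rather than a general fix:

```python
import numpy as np

nan = np.nan
# Hand-typed copy of the sample 'x' column (21 rows).
a = np.array([1, 1, nan, nan, nan, nan, nan, nan, 1, 1, nan, nan,
              1, 1, 1, nan, nan, 1, 1, nan, nan])

ii = np.where(np.isnan(a))[0]   # positions of the NaNs, as in the mod version
dd = np.diff(ii)

# (dd == 4) is an array of booleans, one per gap; using it directly in
# `if ... or ...` is what triggers "truth value ... is ambiguous".
mask_array = dd == 4

# Element-wise test on the one gap that belongs to position i.
jj = [ii[0]] + [ii[i] for i in range(1, len(ii))
                if dd[i - 1] > 2 and dd[i - 1] != 4]

for ci in jj:
    a[ci:ci + 1] = 1.0          # fill only the first NaN after each run

print(jj)
print(a)
```

With this sample, the positions selected are 2, 10 and 19, so January 25 (position 15) is left as NaN.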

Does anyone have any ideas?

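【Question discussion】:

  • You could try using ix for mixed label/integer-based access instead of loc, or reset_index, do the transformation, and then set_index back to Date
  • Can you explain the logic of your code? What are you trying to do? Why do those rows need three consecutive 1s?
  • Parfait - it's just an example. No specific reason.
  • If your data isn't too large and/or you don't care too much about blazing speed, the function I wrote can be imported from a separate file and executed in one line. Regarding the shorter code, I should have said the code will run, rather than that the code will work, in the first line of my answer. If you like, I can help you set up the separate file and the import. Unfortunately, I don't have time to work on the other code right now.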
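
The reset_index / set_index round trip suggested in the first comment can be sketched roughly as follows. This is only an illustration on a synthetic frame (the question's 21 trading dates and the 'x' column); the name flat is an assumption, and the middle section is the asker's integer-index code unchanged apart from the frame name:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's frame, indexed by Date.
dates = pd.bdate_range("2000-01-03", "2000-02-01").drop(pd.Timestamp("2000-01-17"))
nan = np.nan
df = pd.DataFrame(
    {"x": [1, 1, nan, nan, nan, nan, nan, nan, 1, 1, nan, nan,
           1, 1, 1, nan, nan, 1, 1, nan, nan]},
    index=dates,
)
df.index.name = "Date"

# Move Date into a column so the frame has a plain 0..n-1 integer index.
flat = df.reset_index()

# The section that works for an integer-based index, applied to `flat`.
ii = flat[pd.notnull(flat["x"])].index
dd = np.diff(ii)
jj = [ii[i] for i in range(1, len(ii)) if dd[i - 1] > 2]
jj = [ii[0]] + jj

for ci in jj:
    flat.loc[ci:ci + 2, "x"] = 1.0   # loc label slices include the end point

# Restore the datetime index.
df = flat.set_index("Date")
print(df)
```

The cost is a temporary copy of the frame, which is unlikely to matter for a few weeks of daily data.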

Tags: python pandas datetime indexing


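【Solution 1】:

--------- Final answer / finally solved ----------- Well, this took a few weeks of part-time work and dozens of hours, but I finally got it! I know this code is a blunt instrument, but it works. If anyone has any suggestions for shortening or speeding up the code, please let me know!

Here is the final input: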

import pandas as pd
from pandas_datareader import data, wb
import numpy as np
from datetime import date 

df = data.DataReader('GE', 'yahoo', date (2000, 1, 1), date (2000, 6, 1))
df['x'] = np.where (df['Open'] < df['High'].shift(-2), 1, np.nan)
df['x2'] = df['x']

test = 0

for i in np.nditer(df['x2'], op_flags=['readwrite']):

    if test == 4:
        test = 0

    if test == 3:
        i[...] = 3
        test = 4

    if test == 2:
        i[...] = 2
        test = 3

    if (test == 1) & (i[...] == 1):
        i[...] = 1
        test = 2

    if (test == 0) & (i[...] == 1):
        i[...] = 1
        test = 2

    # NaN never compares equal to anything, so test for it with np.isnan
    if (test == 0) & np.isnan(i[...]):
        i[...] = np.nan
        test = 1

print (df.round(2))

Here is part of the final output:

              Open    High     Low   Close    Volume  Adj Close    x   x2
Date                                                                     
2000-01-03  153.00  153.69  149.19  150.00  22069800      29.45  NaN  NaN
2000-01-04  147.25  148.00  144.00  144.00  22121400      28.27  NaN  NaN
2000-01-05  143.75  147.00  142.56  143.75  27292800      28.22  1.0  1.0
2000-01-06  143.12  146.94  142.63  145.67  19873200      28.60  1.0  2.0
2000-01-07  148.00  151.88  147.00  151.31  20141400      29.70  1.0  3.0
2000-01-10  152.69  154.06  151.12  151.25  15226500      29.69  1.0  1.0
2000-01-11  151.00  152.69  150.62  151.50  15123000      29.74  1.0  2.0
2000-01-12  151.06  153.25  150.56  152.00  18342300      29.84  1.0  3.0
2000-01-13  153.13  154.94  153.00  153.75  14953500      30.18  NaN  NaN
2000-01-14  153.38  154.63  149.56  151.00  18480300      29.64  NaN  NaN
2000-01-18  149.62  149.62  146.75  148.00  18296700      29.05  1.0  1.0
2000-01-19  146.50  150.94  146.25  148.72  14849700      29.19  1.0  2.0
2000-01-20  149.06  149.75  142.63  145.94  30759000      28.65  NaN  3.0
2000-01-21  147.94  148.25  143.94  144.13  24005400      28.29  NaN  NaN
2000-01-24  145.31  145.94  136.44  138.13  27116100      27.12  NaN  NaN
2000-01-25  138.06  140.38  137.00  138.50  25387500      27.19  1.0  1.0
2000-01-26  140.50  142.19  138.88  141.44  15856800      27.77  NaN  2.0
2000-01-27  141.56  141.75  137.06  141.75  19243500      27.83  NaN  3.0
2000-01-28  140.31  140.50  133.63  134.00  29846700      26.31  NaN  NaN
2000-01-31  134.00  135.94  133.06  134.00  21782700      26.31  1.0  1.0
2000-02-01  134.25  137.00  134.00  136.00  27339000      26.70  1.0  2.0
2000-02-02  137.12  137.62  134.06  134.06  21820200      26.32  1.0  3.0
2000-02-03  135.94  139.81  135.25  139.25  20232000      27.34  1.0  1.0
2000-02-04  141.00  143.12  140.50  141.56  18167100      27.79  NaN  2.0
2000-02-07  141.69  141.75  135.88  136.50  18285000      26.80  NaN  3.0

I changed the values in column x2 to show 1 - 3 instead of just 1, to see where a new series starts right at the end of an old one.
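
On the request for shorter code: the state machine above appears to amount to a single forward scan that, at each 1 not already covered, fills three rows and then jumps past them. A minimal sketch of that reading follows. The function name fill_runs and the parameter run are hypothetical, the dates are synthetic business days, and the 'x' values are copied by hand from the partial output above:

```python
import numpy as np
import pandas as pd

def fill_runs(x, run=3):
    """Return a copy of x with non-overlapping runs of `run` ones,
    each run starting at the first 1 found after the previous run."""
    values = x.to_numpy(dtype=float, copy=True)
    out = np.full(values.shape, np.nan)
    i = 0
    while i < len(values):
        if values[i] == 1:
            out[i:i + run] = 1.0   # a slice past the end is simply truncated
            i += run               # skip the rows just filled
        else:
            i += 1
    return pd.Series(out, index=x.index, name=x.name)

nan = np.nan
x = pd.Series(
    [nan, nan, 1, 1, 1, 1, 1, 1, nan, nan, 1, 1, nan, nan, nan,
     1, nan, nan, nan, 1, 1, 1, 1, nan, nan],
    index=pd.date_range("2000-01-03", periods=25, freq="B"),  # synthetic dates
    name="x",
)
x2 = fill_runs(x)
print(pd.DataFrame({"x": x, "x2": x2}))
```

Because the scan works on positions and only reattaches the index at the end, it does not depend on whether the index is integer or datetime. It writes 1.0 throughout rather than the 1 - 3 counters used above for debugging.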

【Discussion】:

  • I don't understand the -1... can you explain?

【Solution 2】:

If the code doesn't depend on the index, it will work:

#mod version
a = np.array(df.x)
ii = np.where(np.isnan(a))[0]

dd = np.diff(ii)
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
jj = [ii[0]] + jj

for ci in jj:
    a[ci:ci+2] = 1.0
df.x = a

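I'm not sure whether the result is exactly what you are looking for...

The code below lets you search for specific patterns and then replace those patterns with other defined patterns. The drawback is that it loops through the entire array several times, depending on the number of search patterns, which may or may not matter depending on the size of your data.

"Found" patterns are marked so that they are not included in subsequent search loops, which avoids overlapping results. So the search is done in a prioritized fashion. Adjust the elements in the pattern and fill lists to change the rules.

I think the pattern rules below will produce the desired output based on your previous question, but it is only lightly tested...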

# search patterns in original data (zeros represent nans)
p1 = [1., 1., 1.]
p2 = [1., 0., 1.]
p3 = [1., 1., 0.]
p4 = [1., 0., 0.]

# markers to 'set aside' found patterns (can be any list of floats > 1.0 
# for each, the same float for each fill makes it easy to see which
# replacements were done where for testing...)
f1 = [5., 5., 5.]
f2 = [4., 4., 4.]
f3 = [3., 3., 3.]
f4 = [2., 2., 2.]

patterns = [p1, p2, p3, p4]
fills = [f1, f2, f3, f4]

def fill_segments(a, test_patterns, fill_patterns):
    # replace nans with zeros so fast numpy array_equal will work
    nan_idx = np.where(np.isnan(a))[0]
    np.put(a, nan_idx, 0.)
    col_index = list(np.arange(a.size))
    # loop forward through sequence comparing segment patterns
    for j in np.arange(len(test_patterns)):
        this_pattern = test_patterns[j]
        snip = len(this_pattern)
        rng = col_index[:-snip + 1]
        for i in rng:
            seg = a[col_index[i: i + snip]]
            if np.array_equal(seg, this_pattern):
                # when a match is found, replace values in array segment
                # with fill pattern
                pattern_indexes = col_index[i: i + snip]
                np.put(a, pattern_indexes, fill_patterns[j])
    # convert all fillers to ones
    np.put(a, np.where(a > 1.)[0], 1.)
    # convert zeros back to nans
    np.put(a, np.where(a == 0.)[0], np.nan)

    return a

Run the function and assign the result to the df.x column:

df.x = fill_segments(np.array(df.x), patterns, fills)

Input:

              Open    High     Low   Close    Volume  Adj Close    x
Date                                                                
2000-01-03  153.00  153.69  149.19  150.00  22069800  29.68      1.0
2000-01-04  147.25  148.00  144.00  144.00  22121400  28.49      1.0
2000-01-05  143.75  147.00  142.56  143.75  27292800  28.44     NaN 
2000-01-06  143.12  146.94  142.63  145.67  19873200  28.82     NaN 
2000-01-07  148.00  151.88  147.00  151.31  20141400  29.94     NaN 
2000-01-10  152.69  154.06  151.12  151.25  15226500  29.93     NaN 
2000-01-11  151.00  152.69  150.62  151.50  15123000  29.98     NaN 
2000-01-12  151.06  153.25  150.56  152.00  18342300  30.08     NaN 
2000-01-13  153.13  154.94  153.00  153.75  14953500  30.42      1.0
2000-01-14  153.38  154.63  149.56  151.00  18480300  29.88      1.0
2000-01-18  149.62  149.62  146.75  148.00  18296700  29.29     NaN 
2000-01-19  146.50  150.94  146.25  148.72  14849700  29.43     NaN 
2000-01-20  149.06  149.75  142.63  145.94  30759000  28.88      1.0
2000-01-21  147.94  148.25  143.94  144.13  24005400  28.52      1.0
2000-01-24  145.31  145.94  136.44  138.13  27116100  27.33      1.0
2000-01-25  138.06  140.38  137.00  138.50  25387500  27.41     NaN 
2000-01-26  140.50  142.19  138.88  141.44  15856800  27.99     NaN 
2000-01-27  141.56  141.75  137.06  141.75  19243500  28.05      1.0
2000-01-28  140.31  140.50  133.63  134.00  29846700  26.52      1.0
2000-01-31  134.00  135.94  133.06  134.00  21782700  26.52     NaN 
2000-02-01  134.25  137.00  134.00  136.00  27339000  26.91     NaN 

Output:

              Open    High     Low   Close    Volume  Adj Close    x
Date                                                                
2000-01-03  153.00  153.69  149.19  150.00  22069800  29.68      1.0
2000-01-04  147.25  148.00  144.00  144.00  22121400  28.49      1.0
2000-01-05  143.75  147.00  142.56  143.75  27292800  28.44      1.0
2000-01-06  143.12  146.94  142.63  145.67  19873200  28.82     NaN 
2000-01-07  148.00  151.88  147.00  151.31  20141400  29.94     NaN 
2000-01-10  152.69  154.06  151.12  151.25  15226500  29.93     NaN 
2000-01-11  151.00  152.69  150.62  151.50  15123000  29.98     NaN 
2000-01-12  151.06  153.25  150.56  152.00  18342300  30.08     NaN 
2000-01-13  153.13  154.94  153.00  153.75  14953500  30.42      1.0
2000-01-14  153.38  154.63  149.56  151.00  18480300  29.88      1.0
2000-01-18  149.62  149.62  146.75  148.00  18296700  29.29      1.0
2000-01-19  146.50  150.94  146.25  148.72  14849700  29.43     NaN 
2000-01-20  149.06  149.75  142.63  145.94  30759000  28.88      1.0
2000-01-21  147.94  148.25  143.94  144.13  24005400  28.52      1.0
2000-01-24  145.31  145.94  136.44  138.13  27116100  27.33      1.0
2000-01-25  138.06  140.38  137.00  138.50  25387500  27.41     NaN 
2000-01-26  140.50  142.19  138.88  141.44  15856800  27.99     NaN 
2000-01-27  141.56  141.75  137.06  141.75  19243500  28.05      1.0
2000-01-28  140.31  140.50  133.63  134.00  29846700  26.52      1.0
2000-01-31  134.00  135.94  133.06  134.00  21782700  26.52      1.0
2000-02-01  134.25  137.00  134.00  136.00  27339000  26.91     NaN 

【Discussion】:
