【Question title】: How to write the given loop efficiently?
【Posted】: 2021-02-05 09:54:58
【Question description】:

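Is there an efficient way to write the following loop? dataPLprocessed is a time series, and I want to compute a score from the rolling 7-day percentile rank of each value (see the loop below for details).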

for i in range(len(dataPLprocessed)):
    if (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .05) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .95):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 10
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .1) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .9):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 9
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .15) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .85):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 8
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .2) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .8):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 7
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .25) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .75):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 6
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .3) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .7):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 5
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .35) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .65):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 4
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .4) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .6):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 3
    elif (dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] < .45) or (
            dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).iloc[i] > .55):
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 2
    else:
        dataPLprocessed.loc[dataPLprocessed.index[i], 'score'] = 1

【Question comments】:

    Tags: python pandas time-series


    【Solution 1】:

    This may help avoid repeating the data-access code that fetches the rank value:

    # Compute the rolling percentile ranks once, instead of once per comparison
    ranks = dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)
    scores = []
    for rank in ranks:
        if   rank < 0.05 or rank > 0.95: score = 10
        elif rank < 0.1  or rank > 0.9:  score = 9
        elif rank < 0.15 or rank > 0.85: score = 8
        elif rank < 0.2  or rank > 0.8:  score = 7
        elif rank < 0.25 or rank > 0.75: score = 6
        elif rank < 0.3  or rank > 0.7:  score = 5
        elif rank < 0.35 or rank > 0.65: score = 4
        elif rank < 0.4  or rank > 0.6:  score = 3
        elif rank < 0.45 or rank > 0.55: score = 2
        else:                            score = 1
        scores.append(score)
    # Assign one score per row rather than overwriting the whole column
    dataPLprocessed['score'] = scores
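
To make the rank value used above concrete, here is a minimal sketch on synthetic data. The frame `demo`, its dates, and its random values are assumptions standing in for dataPLprocessed, and `Rolling.rank` requires pandas 1.4 or later. It computes the rolling rank once and cross-checks the last row by ranking its trailing 7-day window by hand.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for dataPLprocessed (assumption: a DatetimeIndex and a
# numeric 'lineardifference' column; the values here are random).
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=120, freq=pd.Timedelta(hours=6))
demo = pd.DataFrame({"lineardifference": rng.normal(size=len(index))}, index=index)

# One pass over the data: percentile rank of each row within its trailing 7 days.
ranks = demo.rolling("7D")["lineardifference"].rank(pct=True)

# Cross-check the last row by ranking its 7-day window by hand.
# A time-based window is open on the left and closed on the right by default.
last = demo.index[-1]
in_window = (demo.index > last - pd.Timedelta(days=7)) & (demo.index <= last)
manual = demo.loc[in_window, "lineardifference"].rank(pct=True).iloc[-1]
```

If the two values agree, each rank is simply the position of the current row's value among the rows of the preceding 7 days, expressed as a fraction.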
    

    If that is not enough of an improvement, you could probably shave off a few more milliseconds by using a binary search to compute the score:

    from bisect import bisect_left, bisect_right
    loRanks  = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
    hiRanks  = [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
    def getScore(rank):
        if rank<0.45: return 10-bisect_right(loRanks,rank)
        else:         return 1+bisect_left(hiRanks,rank)
    
    
    # Rank every row once, then map each rank to its score
    ranks = dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True)
    dataPLprocessed['score'] = [getScore(rank) for rank in ranks]
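
The same binary search can probably be applied to all ranks at once with NumPy, which removes the Python-level loop entirely. The sketch below is not part of the original answer: `get_scores`, `LO_RANKS`, and `HI_RANKS` are hypothetical names, and it swaps `bisect` for `numpy.searchsorted`, which has the same left/right semantics. NaN ranks are mapped to 1 explicitly, because that is what the if/elif chain returns for NaN.

```python
import numpy as np

# Same thresholds as loRanks / hiRanks in the answer above.
LO_RANKS = np.array([0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45])
HI_RANKS = np.array([0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95])

def get_scores(ranks):
    """Vectorized counterpart of getScore for an array of percentile ranks."""
    ranks = np.asarray(ranks, dtype=float)
    # side="right" matches bisect_right, side="left" matches bisect_left.
    low = 10 - np.searchsorted(LO_RANKS, ranks, side="right")
    high = 1 + np.searchsorted(HI_RANKS, ranks, side="left")
    scores = np.where(ranks < 0.45, low, high)
    # searchsorted places NaN after every threshold, so handle it separately.
    return np.where(np.isnan(ranks), 1, scores)
```

To use it on the frame, pass `dataPLprocessed.rolling('7D')['lineardifference'].rank(pct=True).to_numpy()` to `get_scores` and assign the result to `dataPLprocessed['score']`.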
    

    【Comments】:

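    • Couldn't you also just write a while loop that adds/subtracts 0.05 from the previous value and subtracts 1 from the score after each iteration, until the given threshold is reached?
    • You could, but that is less efficient than a binary search. Most of the performance gain comes from fetching the rank once up front rather than for every comparison. A while loop makes little difference compared with if/elif/else.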
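
For reference, the while loop discussed in these comments might look like the sketch below. `score_by_stepping`, `LO_STEPS`, and `HI_STEPS` are hypothetical names. Instead of repeatedly adding 0.05, it walks a list of literal thresholds, an assumption made here so that accumulated floating-point error cannot shift a boundary such as 0.15.

```python
# Thresholds visited from the outermost band (score 10) inward (score 2).
LO_STEPS = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
HI_STEPS = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55]

def score_by_stepping(rank):
    """Linear scan equivalent to the if/elif chain (hypothetical helper)."""
    score = 10
    i = 0
    # Move one band inward until the rank falls outside the current pair of
    # thresholds or every band has been tried.
    while i < len(LO_STEPS) and not (rank < LO_STEPS[i] or rank > HI_STEPS[i]):
        i += 1
        score -= 1
    return score
```

This needs up to nine iterations per rank, whereas the binary search needs about four comparisons, which is consistent with the reply above.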