
【Question title】: Get all items from df's column that are within a range, using ranges iterable
【Posted】: 2019-07-29 22:03:01
【Question description】:

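I want to create a function that can be applied to a df column and that identifies all entries in that column ('C2017Value') that fall within any of the ranges in a list of ranges (ranges), then outputs each matching entry together with its c value to a results dictionary {'c' : C2017Value}, like this: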

results = {'c3': 268} #268 is within one of the ranges

I'm stuck on the code and would appreciate any insight and feedback.

df #dataframe with two columns, 'c' and C2017Value
 'c1', 137674167
 'c2',  2166178
 'c3',  268

ranges = [
 (261, 4760),
 (12273391, 11104571063),
 (45695385, 4134339925),
 (15266178, 1376748162),
 (10106104, 97810284),
 (6492248, 588025190)
 ]

Here are my attempts at this function:

between_range = [c2017 for c2017
               in sorted(ranges)
               if ranges[0] <= value <= ranges[1]
               ][0]

def get_output_list(c2017value):
  output_list = []
  index = 0
  for c in df:
    if ranges[0][0] <= c2017value <= ranges[0][1]:
      output_list.append(c)
    else:
      index += 1
  return output_list

def get_output_list0(df, ranges):
  output_list = []
  index = 0
  for c in df:
    if c.column_value('C2017Value') == xrange() ranges[index]:
      output_list.append(c)
    else:
      index += 1
  return output_list

def get_output_list1(C2017Value):
    for x, y in sorted(ranges):
        if any(x <= C2017Value < y):
            for c in ms_df:
                output.append(c)

def get_output_list2(CValue):
    output = []
    ranges = get_ranges()
    for c in ms_df:
        ##if MINvalue<= CValue <=MAXvalue:
        if C2017Value in ranges(MINvalue, MAXvalue):
            return c
            output.append(c)
            break

def get_output_list3(C2017Value):
    ##ranges = get_ranges()
    for c in df:
        ##if MINvalue<= CValue <=MAXvalue:
        if CValue in ranges:
            return c

def get_output_list4(df, C2017Value, ranges[0:1]):
    ##ranges = get_ranges()
    for c in df_countries:
    ##if MINvalue<= CValue <=MAXvalue:
        if C2017Value in ranges:
        #if C2017Value in range(ranges):    
        #return c
            output.append(c)
            return output

def get_output_list5(C2017Value:
    for c in df_countries:
        for x in sorted(ranges):
            range_list = ranges[range_name]
            if any(start <= number < end for start,end):
                results.setdefault(range_name, 0) += 1

def get_output_list6(C2017Value):
    for c in ms_df:
        for x, y in sorted(ranges):
            if any(x <= C2017Value < y):
                output.append(c)

These two are probably the most promising attempts:

between_range = [c2017 for c2017
               in sorted(ranges)
               if ranges[0] <= value <= ranges[1]
               ][0]


def get_output_list(c2017value):
  output_list = []
  index = 0
  for c in df:
    if ranges[0][0] <= c2017value <= ranges[0][1]:
      output_list.append(c)
    else:
      index += 1
  return output_list

between_range produces the following error message: "

【Question comments】:

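If you get "<= not supported between instances of 'int' and 'str'", then you should check what you are comparing: maybe you compare 261 <= 'c3' instead of 261 <= 268
What about df[ df['C2017Value'].between(261, 4760) ]? It gives all rows that have 'C2017Value' in the range 261, 4760
This may be your solution: Python pandas slice dataframe by multiple index ranges
Yes, that is a great solution for a single range... I want to get all ['C2017Value'] column values that fall within any of the ranges in the ranges list.
You can run between(a, b) inside for a,b in ranges, but it checks all rows in the DF multiple times, so it can be a problem for a big DF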
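
Two of the comments above can be sketched together. This is only an illustration that assumes the sample `df` and `ranges` from the question: it first reproduces the int/str comparison error (iterating over a DataFrame yields its column labels, which are strings), then combines one `between` check per range into a single boolean mask. The names `labels`, `error_message` and `mask` are introduced here for the example.

```python
import pandas as pd

df = pd.DataFrame(
    [['c1', 137674167], ['c2', 2166178], ['c3', 268]],
    columns=['c', 'C2017Value'],
)

ranges = [
    (261, 4760),
    (12273391, 11104571063),
    (45695385, 4134339925),
    (15266178, 1376748162),
    (10106104, 97810284),
    (6492248, 588025190),
]

# Iterating over a DataFrame yields its column labels (strings), not rows,
# so a loop such as `for c in df` ends up comparing ints with strs.
labels = list(df)
error_message = None
try:
    261 <= labels[0] <= 4760
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# One `between` check per range (inclusive on both ends by default),
# OR-ed into a single mask. Every range scans the whole column once.
mask = pd.Series(False, index=df.index)
for a, b in ranges:
    mask |= df['C2017Value'].between(a, b)

results = df[mask]
print(results)
```

As the last comment notes, the loop scans the column once per range, so the cost grows with the number of ranges.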

Tags: python pandas iterator range

【Solution 1】:

Using apply() with a function that checks whether a value falls within any of the ranges, I can create a new DF with the results

def check_ranges(value):
    for a, b in ranges:
        if a <= value <= b:
            return True
    return False

results = df[ df['C2017Value'].apply(check_ranges) ]

Working code:

import pandas as pd

df = pd.DataFrame([
        ['c1', 137674167],
        ['c2', 2166178],
        ['c3', 268],
     ], columns=['c', 'C2017Value'])

ranges = [
    (261, 4760),
    (12273391, 11104571063),
    (45695385, 4134339925),
    (15266178, 1376748162),
    (10106104, 97810284),
    (6492248, 588025190)
]

def check_ranges(value):
    for a, b in ranges:
        if a <= value <= b:
            return True
    return False

results = df[ df['C2017Value'].apply(check_ranges) ]

print(results)

Result:

    c  C2017Value
0  c1   137674167
2  c3         268
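
The question asks for a dictionary of the form {'c' : C2017Value} rather than a DataFrame. A minimal sketch of that last step, assuming the same sample data and the `check_ranges` function shown above; `results_dict` is a name introduced here for illustration. Note that with this sample data 'c1' also falls within one of the ranges, so it appears alongside 'c3'.

```python
import pandas as pd

df = pd.DataFrame(
    [['c1', 137674167], ['c2', 2166178], ['c3', 268]],
    columns=['c', 'C2017Value'],
)

ranges = [
    (261, 4760),
    (12273391, 11104571063),
    (45695385, 4134339925),
    (15266178, 1376748162),
    (10106104, 97810284),
    (6492248, 588025190),
]

def check_ranges(value):
    # True if the value lies inside at least one (inclusive) range.
    for a, b in ranges:
        if a <= value <= b:
            return True
    return False

results = df[df['C2017Value'].apply(check_ranges)]

# Pair each 'c' label with its value; tolist() converts to plain Python ints.
results_dict = dict(zip(results['c'], results['C2017Value'].tolist()))
print(results_dict)  # {'c1': 137674167, 'c3': 268}
```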

It can also take the ranges as an argument, in which case a lambda can be used to pass them in

def check_ranges(value, ranges):
    for a, b in ranges:
        if a <= value <= b:
            return True
    return False

results = df[ df['C2017Value'].apply(lambda x, r=ranges:check_ranges(x,r)) ]
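
As an alternative to the lambda, `Series.apply` accepts an `args` tuple of extra positional arguments that are passed to the function after each value. A short sketch under the same sample data; this is a variation for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['c1', 137674167], ['c2', 2166178], ['c3', 268]],
    columns=['c', 'C2017Value'],
)

ranges = [
    (261, 4760),
    (12273391, 11104571063),
    (45695385, 4134339925),
    (15266178, 1376748162),
    (10106104, 97810284),
    (6492248, 588025190),
]

def check_ranges(value, ranges):
    # True if the value lies inside at least one (inclusive) range.
    for a, b in ranges:
        if a <= value <= b:
            return True
    return False

# `args` supplies the second positional argument of check_ranges.
results = df[df['C2017Value'].apply(check_ranges, args=(ranges,))]
print(results)
```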

EDIT: similar code can give

    c  C2017Value                    range
0  c1   137674167  (12273391, 11104571063)
1  c2     2166178                     None
2  c3         268              (261, 4760)

It returns (a, b) instead of True, and None instead of False (but it could return False or NaN instead)

def get_range(value, ranges):
    for a, b in ranges:
        if a <= value <= b:
            return (a, b)
    return None

df['range'] = df['C2017Value'].apply(lambda x, r=ranges:get_range(x,r))

print(df)

Working code:

import pandas as pd

df = pd.DataFrame([
        ['c1', 137674167],
        ['c2', 2166178],
        ['c3', 268],
     ], columns=['c', 'C2017Value'])

ranges = [
    (261, 4760),
    (12273391, 11104571063),
    (45695385, 4134339925),
    (15266178, 1376748162),
    (10106104, 97810284),
    (6492248, 588025190)
]

def get_range(value, ranges):
    for a, b in ranges:
        if a <= value <= b:
            return (a, b)
    return None

df['range'] = df['C2017Value'].apply(lambda x, r=ranges:get_range(x,r))

print(df)

results = df[ df['range'].notnull() ]

print(results)

Result:

    c  C2017Value                    range
0  c1   137674167  (12273391, 11104571063)
1  c2     2166178                     None
2  c3         268              (261, 4760)

    c  C2017Value                    range
0  c1   137674167  (12273391, 11104571063)
2  c3         268              (261, 4760)
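
For a large DataFrame, calling a Python function once per row can become slow. One possible vectorized variant, sketched here as an assumption rather than as part of the original answer, merges overlapping ranges first and then uses `numpy.searchsorted` to test all values at once. It only answers whether a value is inside some range; because ranges are merged, it does not report which original range matched. The helper `merge_ranges` is hypothetical and written for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['c1', 137674167], ['c2', 2166178], ['c3', 268]],
    columns=['c', 'C2017Value'],
)

ranges = [
    (261, 4760),
    (12273391, 11104571063),
    (45695385, 4134339925),
    (15266178, 1376748162),
    (10106104, 97810284),
    (6492248, 588025190),
]

def merge_ranges(ranges):
    """Merge overlapping inclusive ranges into sorted, disjoint ones."""
    merged = []
    for a, b in sorted(ranges):
        if merged and a <= merged[-1][1]:
            # Overlaps the previous range: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return merged

merged = merge_ranges(ranges)
# int64 is set explicitly because some bounds exceed the 32-bit range.
starts = np.array([a for a, _ in merged], dtype=np.int64)
ends = np.array([b for _, b in merged], dtype=np.int64)

values = df['C2017Value'].to_numpy()
# Index of the last range whose start is <= value (-1 if there is none).
idx = np.searchsorted(starts, values, side='right') - 1
# clip avoids indexing with -1; such rows are excluded by `idx >= 0` anyway.
mask = (idx >= 0) & (values <= ends[np.clip(idx, 0, None)])

results = df[mask]
print(results)
```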

【Comments】:

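This solution looks great. I tried the working code and got your results. Unfortunately, when I try the function on my own df I get an empty DataFrame as the result. I'm reviewing it again now to see if I can find the problem.
Maybe add print() in check_ranges to see the values in the variables, and test it with a small part of your data.
It works! The ranges were inverted, so I updated the ranges function and now check_ranges is working.
You can add this to check_ranges: if a>b: a,b = b,a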
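
The swap suggested in the last comment could look roughly like this. It is a sketch only; the `inverted` list is made-up sample input with each bound pair reversed.

```python
def check_ranges(value, ranges):
    for a, b in ranges:
        # Tolerate ranges given as (max, min) by swapping the bounds.
        if a > b:
            a, b = b, a
        if a <= value <= b:
            return True
    return False

# Hypothetical input where every range is written as (max, min).
inverted = [(4760, 261), (11104571063, 12273391)]

print(check_ranges(268, inverted))      # True
print(check_ranges(2166178, inverted))  # False
```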