【Question title】: Solving ValueError: cannot convert float NaN to integer
【Posted】: 2020-04-10 22:21:44
【Question】:

I am writing a function that returns a dictionary whose keys are the creation years of all citations in the dataset. Each value is the two-item tuple returned by the function do_get_citations_per_year.

def do_get_citations_per_year(data, year):
    result = tuple()
    my_ocan['creation'] = pd.DatetimeIndex(my_ocan['creation']).year

    len_citations = len(my_ocan.loc[my_ocan["creation"] == year, "creation"])
    timespan = my_ocan.loc[my_ocan["creation"] == year, "timespan"].fillna(0).mean()

    result = (len_citations, round(timespan))

    return result

def do_get_citations_all_years(data):
    mydict = {}
    s = set(my_ocan.creation)
    print(s)
    for year in s:
        mydict[year] = do_get_citations_per_year(data, year)
    #print(mydict)
    return mydict

I keep getting this error message:

(32, 240)
{2016, 2017, 2018, 2013, 2015}
  File "/Users/lisa/Desktop/yopy/execution_example.py", line 28, in <module>
    print(my_ocan.get_citations_all_years())
  File "/Users/lisa/Desktop/yopy/ocan.py", line 35, in get_citations_all_years
    return do_get_citations_all_years(self.data)
  File "/Users/lisa/Desktop/yopy/lisa.py", line 113, in do_get_citations_all_years
    mydict[year] = do_get_citations_per_year(data, year)
  File "/Users/lisa/Desktop/yopy/lisa.py", line 103, in do_get_citations_per_year
    result = (len_citations, round(timespan))
ValueError: cannot convert float NaN to integer

Process finished with exit code 1

Update: to provide a working example, I am posting my other functions here, in particular do_process_citation_data(f_path), which builds my DataFrame (my_ocan), and my parsing function parse_timespan:

def do_process_citation_data(f_path):
    global my_ocan

    my_ocan = pd.read_csv(f_path, names=['oci', 'citing', 'cited', 'creation', 'timespan', 'journal_sc', 'author_sc'],
                          parse_dates=['creation', 'timespan'])
    my_ocan = my_ocan.iloc[1:]  # to remove the first row
    my_ocan['creation'] = pd.to_datetime(my_ocan['creation'], format="%Y-%m-%d", yearfirst=True)
    my_ocan['timespan'] = my_ocan['timespan'].map(parse_timespan)

    print(my_ocan['timespan'])

    return my_ocan

    #print(my_ocan['timespan'])

timespan_regex = re.compile(r'P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?')
def parse_timespan(timespan):
    # check if the input is a valid timespan
    if not timespan or 'P' not in timespan:
        return None

    # check if timespan is negative and skip initial 'P' literal
    curr_idx = 0
    is_negative = timespan.startswith('-')
    if is_negative:
        curr_idx = 1

    # extract years, months and days with the regex
    match = timespan_regex.match(timespan[curr_idx:])

    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    days = int(match.group(3) or 0)

    timespan_days = years * 365 + months * 30 + days

    return timespan_days if not is_negative else -timespan_days

When I print my_ocan['timespan'], I get:

1        486.0
2       1080.0
3        730.0
4        824.0
5        365.0
6          0.0
...

I think the problem is the 0.0.

How can I fix this float NaN to integer problem?

Thank you in advance!

【Question comments】:

Tags: python pandas


【Solution 1】:

I tried this with Python 2.7:

>>> round(float('NaN'))
nan
>>> round(float(0.0))
0.0

And this with Python 3.6:

>>> round(float('NaN'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer
>>> round(float(0.0))
0

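So it looks like a NaN value is being passed to round. In Python 3, round with a single argument returns an int, and NaN has no integer representation, hence the ValueError. You can work around this with a try/except statement: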

try:
    result = (len_citations, round(timespan))
except ValueError:
    result = (len_citations, 0)

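【Comments】:

  • Hey, thanks, but the error still persists :( File "/Users/lisa/Desktop/yopy/lisa.py", line 115, in do_get_citations_per_year result = (len_citations, round(timespan)) ValueError: cannot convert float NaN to integer
  • I have also updated my question to provide more data. Thanks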
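
One possible source of the NaN, judging only from the code posted in the question, is that do_get_citations_per_year converts the creation column to years on every call. Once the column already holds integer years, pd.DatetimeIndex reads those integers as nanoseconds since the epoch, so no row matches the requested year and the mean of the empty selection is NaN, even after fillna(0). The sketch below illustrates this on a small made-up DataFrame; df, its values, and citations_per_year are hypothetical stand-ins, not the asker's data or code.

```python
import pandas as pd

# Hypothetical stand-in for my_ocan; the values are made up for illustration.
df = pd.DataFrame({
    "creation": pd.to_datetime(["2016-03-01", "2016-07-15", "2017-01-10"]),
    "timespan": [486.0, 1080.0, None],
})

# Convert the dates to years once, up front.
df["creation"] = df["creation"].dt.year

# Converting a second time treats the integers as nanoseconds since the epoch.
reconverted = pd.DatetimeIndex(df["creation"]).year
print(list(reconverted))  # every entry is 1970

# No row matches 2016 any more, so the selection is empty,
# and the mean of an empty selection is NaN even after fillna(0).
empty_mean = df.loc[reconverted == 2016, "timespan"].fillna(0).mean()
print(empty_mean)  # nan


def citations_per_year(data, year):
    """Return (count, rounded mean timespan).

    Assumes data['creation'] already holds integer years.
    """
    spans = data.loc[data["creation"] == year, "timespan"].fillna(0)
    mean_days = round(spans.mean()) if len(spans) else 0
    return len(spans), mean_days


result = {int(year): citations_per_year(df, year)
          for year in sorted(set(df["creation"]))}
print(result)
```

Doing the year conversion once (for example in do_process_citation_data) and using the data argument instead of the global keeps the per-year function free of side effects, so calling it repeatedly gives the same answer.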