[Question title]: How to perform a histogram on a NumPy array whose dtype is object using histogramdd?
[Posted]: 2014-12-08 06:59:33
[Question description]:

I want to compute a histogram of an (N, 3) NumPy array whose three columns are longitude, latitude, and timestamp, like this:

array([[116.45565032958984, 39.889976501464844,
        datetime.datetime(2012, 10, 1, 6, 32, 39)],
       [116.45565032958984, 39.889984130859375,
        datetime.datetime(2012, 10, 1, 6, 33, 31)],
       [116.45565032958984, 39.889984130859375,
        datetime.datetime(2012, 10, 1, 6, 33, 33)],
       [116.45565032958984, 39.889984130859375,
        datetime.datetime(2012, 10, 1, 6, 33, 37)],
       [116.45561981201172, 39.89040756225586,
        datetime.datetime(2012, 10, 1, 6, 34, 42)],
       [116.45561981201172, 39.890411376953125,
        datetime.datetime(2012, 10, 1, 6, 36, 40)],
       [116.45549774169922, 39.8941650390625,
        datetime.datetime(2012, 10, 1, 6, 37, 54)],
       [116.45556640625, 39.92431640625,
        datetime.datetime(2012, 10, 1, 6, 38, 57)],
       [116.45578002929688, 39.93780517578125,
        datetime.datetime(2012, 10, 1, 6, 42, 10)],
       [116.44468688964844, 39.93989944458008,
        datetime.datetime(2012, 10, 1, 6, 43, 21)]], dtype=object)

I tried using np.histogramdd like this:

import numpy as np
np.histogramdd(my_data, bins = (lon_bin_num, lat_bin_num, time_bin_num), 
                range = [[lon_min, lon_max], [lat_min, lat_max], 
                [start_datetime, end_datetime]])

and got a TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-271-58c94eecf21d> in <module>()
      1 np.histogramdd(tmp2, bins = (lon_bin_num, lat_bin_num, time_bin_num),
----> 2                range = [[lon_min, lon_max], [lat_min, lat_max], [start_datetime, end_datetime]])

/*/*/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
    318         smax = zeros(D)
    319         for i in arange(D):
--> 320             smin[i], smax[i] = range[i]
    321 
    322     # Make sure the bins have a finite width.

TypeError: float() argument must be a string or a number

I know the datetime objects are causing the error, but how do I fix it? More generally, how do I compute a histogram of a NumPy ndarray whose dtype is object?

[Question discussion]:

    Tags: python numpy histogram python-datetime multidimensional-array


    [Solution 1]:

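    Many NumPy functions do not work on arrays of dtype object. np.histogramdd needs an array of shape (N, D), so a structured array won't help here either: a structured array with D fields has shape (N,), which means the D fields are not an array dimension. You need an array with a single, homogeneous, non-object dtype. Since the first two columns are already floats, let's represent the third column as floats too: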
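To see the shape argument concretely, here is a minimal sketch comparing the three candidate layouts. The three sample rows are copied from the question's data; the field names `lon`, `lat`, and `t` are hypothetical labels chosen for illustration.

```python
import datetime
import numpy as np

# Three rows in the same layout as the question's data: lon, lat, timestamp
rows = [
    (116.45565032958984, 39.889976501464844, datetime.datetime(2012, 10, 1, 6, 32, 39)),
    (116.45565032958984, 39.889984130859375, datetime.datetime(2012, 10, 1, 6, 33, 31)),
    (116.45561981201172, 39.89040756225586, datetime.datetime(2012, 10, 1, 6, 34, 42)),
]

# Object array: shape (N, 3), but histogramdd cannot do float arithmetic on datetimes
obj = np.array(rows, dtype=object)
print(obj.shape)

# Structured array (hypothetical field names): one record per row, so shape is (N,)
rec = np.array(rows, dtype=[('lon', 'f8'), ('lat', 'f8'), ('t', 'M8[s]')])
print(rec.shape)

# Homogeneous float array: the (N, D) layout histogramdd expects,
# with the timestamp stored as integer seconds since the epoch, then cast to float
flt = np.column_stack([rec['lon'], rec['lat'], rec['t'].astype('int64').astype('f8')])
print(flt.shape, flt.dtype)
```

The structured array is convenient for bookkeeping, but only the final `column_stack` result can be passed to histogramdd as a sample.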

    You can convert the dates to NumPy's native datetime64[s] dtype:

    In [102]: dates = np.array(my_data[:, 2],dtype='<M8[s]')
    
    In [103]: dates
    Out[103]: 
    array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
           '2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
           '2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
           '2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
           '2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
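For reference, '<M8[s]' is the little-endian type-code spelling of datetime64[s]. Below is a minimal sketch of the same conversion, assuming a hypothetical two-element object column `col` standing in for my_data[:, 2]. Newer NumPy versions print the naive times unchanged; the -0400 offset in the session above comes from older NumPy versions, which displayed datetime64 values in the local time zone.

```python
import datetime
import numpy as np

# Hypothetical object column of Python datetimes, standing in for my_data[:, 2]
col = np.array([datetime.datetime(2012, 10, 1, 6, 32, 39),
                datetime.datetime(2012, 10, 1, 6, 33, 31)], dtype=object)

# Convert to NumPy's native datetime type with one-second resolution
dates = np.array(col, dtype='datetime64[s]')
print(dates)  # ['2012-10-01T06:32:39' '2012-10-01T06:33:31']
```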
    

    Then use astype to convert those datetime64[s] values to floats (seconds since the Unix epoch):

    In [104]: float_dates = dates.astype('float')
    
    In [105]: float_dates
    Out[105]: 
    array([  1.34907316e+09,   1.34907321e+09,   1.34907321e+09,
             1.34907322e+09,   1.34907328e+09,   1.34907340e+09,
             1.34907347e+09,   1.34907354e+09,   1.34907373e+09,
             1.34907380e+09])
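If you'd rather not depend on a direct datetime64-to-float cast, an alternative is to subtract the epoch and divide by a one-second timedelta, which yields float64 seconds and makes the unit explicit. This is a sketch of that swapped-in technique, not the answer's method; the sample dates are two of the question's timestamps.

```python
import numpy as np

dates = np.array(['2012-10-01T06:32:39', '2012-10-01T06:33:31'], dtype='datetime64[s]')

# Seconds since the Unix epoch as float64: (datetime - epoch) / one second
epoch = np.datetime64('1970-01-01T00:00:00', 's')
float_dates = (dates - epoch) / np.timedelta64(1, 's')

# Round trip back to datetime64[s] via integer seconds
back = epoch + np.round(float_dates).astype('int64').astype('timedelta64[s]')
```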
    

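    Now build a new array of dtype float:

    # Float copy of the data: lon, lat, and seconds since the epoch
    arr = np.empty_like(my_data, dtype='float')
    arr[:, 0:2] = my_data[:, 0:2]
    arr[:, 2] = float_dates
    
    # xedges, yedges, zedges are the bin edges for each column (zedges in epoch seconds)
    hist, edges = np.histogramdd(arr, bins=(xedges, yedges, zedges))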
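Putting the steps together with the question's original call shape (bin counts plus a `range` that includes datetime endpoints), a minimal end-to-end sketch might look like this. The four rows come from the question's data; the helper `to_seconds`, the bin counts, and the numeric and datetime ranges are hypothetical values chosen for illustration. The key point is that the datetime endpoints of `range` must be converted to epoch seconds exactly like the data.

```python
import datetime
import numpy as np

# Four rows from the question's data: lon, lat, timestamp (dtype object)
my_data = np.array([
    [116.45565032958984, 39.889976501464844, datetime.datetime(2012, 10, 1, 6, 32, 39)],
    [116.45565032958984, 39.889984130859375, datetime.datetime(2012, 10, 1, 6, 33, 31)],
    [116.45561981201172, 39.89040756225586, datetime.datetime(2012, 10, 1, 6, 34, 42)],
    [116.44468688964844, 39.93989944458008, datetime.datetime(2012, 10, 1, 6, 43, 21)],
], dtype=object)

EPOCH = np.datetime64('1970-01-01T00:00:00', 's')

def to_seconds(values):
    """Hypothetical helper: datetime-like values -> float seconds since the Unix epoch."""
    return (np.array(values, dtype='datetime64[s]') - EPOCH) / np.timedelta64(1, 's')

# Homogeneous float (N, 3) sample: lon, lat, epoch seconds
arr = np.column_stack([my_data[:, 0].astype(float),
                       my_data[:, 1].astype(float),
                       to_seconds(my_data[:, 2])])

# Hypothetical time window, converted the same way as the data
start_datetime = datetime.datetime(2012, 10, 1, 6, 30)
end_datetime = datetime.datetime(2012, 10, 1, 6, 45)
t_min, t_max = to_seconds([start_datetime, end_datetime])

# Hypothetical bin counts (lon, lat, time) and numeric ranges
hist, edges = np.histogramdd(
    arr,
    bins=(2, 2, 3),
    range=[[116.44, 116.46], [39.88, 39.95], [t_min, t_max]],
)
```

With three five-minute time bins, `hist.sum(axis=(0, 1))` gives the number of points falling in each time bin.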
    

    This gives you a histogram, but you may also want to reinterpret the floats as dates, which astype can also do. To get datetime64[s] values:

    In [99]: float_dates.astype('<M8[s]')
    Out[99]: 
    array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400',
           '2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400',
           '2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400',
           '2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400',
           '2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]')
    

    To get Python datetime.datetime objects:

    In [116]: float_dates.astype('<M8[s]').tolist()
    Out[116]: 
    [datetime.datetime(2012, 10, 1, 6, 32, 39),
     datetime.datetime(2012, 10, 1, 6, 33, 31),
     datetime.datetime(2012, 10, 1, 6, 33, 33),
     datetime.datetime(2012, 10, 1, 6, 33, 37),
     datetime.datetime(2012, 10, 1, 6, 34, 42),
     datetime.datetime(2012, 10, 1, 6, 36, 40),
     datetime.datetime(2012, 10, 1, 6, 37, 54),
     datetime.datetime(2012, 10, 1, 6, 38, 57),
     datetime.datetime(2012, 10, 1, 6, 42, 10),
     datetime.datetime(2012, 10, 1, 6, 43, 21)]
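The same reinterpretation is useful for the histogram's time-axis bin edges (edges[2]), which are generally not whole seconds. To avoid relying on how a float-to-datetime64 cast handles fractional seconds, this sketch rounds explicitly before converting; the `time_edges` values are hypothetical stand-ins for edges[2].

```python
import datetime
import numpy as np

EPOCH = np.datetime64('1970-01-01T00:00:00', 's')

# Hypothetical time-axis bin edges in float epoch seconds (e.g. edges[2])
time_edges = np.linspace(1349073000.0, 1349073900.0, 4)

# Round to whole seconds, then offset from the epoch to get datetime64[s]
edge_dates = EPOCH + np.round(time_edges).astype('int64').astype('timedelta64[s]')

# tolist() on a datetime64[s] array returns datetime.datetime objects
edge_datetimes = edge_dates.tolist()
```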
    
