Answers: numpy.genfromtxt with datetime.strptime converter

【Question title】: numpy.genfromtxt with datetime.strptime converter
【Posted】: 2012-12-13 22:54:36
【Question】:

I have data similar to that seen in the gist, and I am trying to extract it using numpy. I am fairly new to Python, so I tried to do it with the following code:

import numpy as np
from datetime import datetime

convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f')
col_headers = ["Mass", "Thermocouple", "T O2 Sensor",\
               "Igniter", "Lamps", "O2", "Time"]
data = np.genfromtxt(files[1], skip_header=22,\
                     names=col_headers,\
                     converters={"Time": convertfunc})

As the gist shows, there are 22 lines of header material. In IPython, when I run the code above, I get an error that ends with:

TypeError: float() argument must be a string or a number

The full IPython traceback can be seen here.

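I can extract the six numeric columns just fine using genfromtxt arguments such as usecols=range(0,6), but I get stuck when I try to use a converter to handle the last column. Any and all comments would be appreciated!

【Comments】:

Try using read_table; it takes care of detecting the types automatically.

Tags: python numpy ipython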

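【Solution 1】:

This happens because np.genfromtxt is trying to create a float array, which fails because convertfunc returns a datetime object that cannot be cast to float. The simplest solution is to pass dtype='object' to np.genfromtxt, which ensures that an object array is created and prevents the conversion to float. However, this means the other columns will be stored as strings. To store them properly as floats, you need to specify the dtype of each column to get a structured array. Here I set them all to double, except the last column, which gets an object dtype: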

dd = [(a, 'd') for a in col_headers[:-1]] + [(col_headers[-1], 'object')]
data = np.genfromtxt(files[1], skip_header=22, dtype=dd, 
                     names=col_headers, converters={'Time': convertfunc})

This gives you a structured array that you can access using the names you provided:

In [74]: data['Mass']
Out[74]: array([ 0.262 ,  0.2618,  0.2616,  0.2614])
In [75]: data['Time']
Out[75]: array([1900-01-01 15:49:24.546000, 1900-01-01 15:49:25.171000,
                1900-01-01 15:49:25.405000, 1900-01-01 15:49:25.624000], 
                dtype=object)
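
For anyone who wants to try this without the original files, here is a minimal self-contained sketch of the same structured-dtype approach. The three-column layout, the Temp column, the two-line header, and the inline sample rows are assumptions made up for illustration (only the Mass and Time values echo the output above). encoding='utf-8' is passed so that the converter receives str rather than bytes on Python 3, and the converter is keyed by column index here instead of by name.

```python
import io
from datetime import datetime

import numpy as np

# Hypothetical sample standing in for the real file: two header lines
# followed by whitespace-separated rows (mass, temperature, time of day).
sample = io.StringIO(
    "Run log (made-up header)\n"
    "Mass Temp Time\n"
    "0.262 21.5 15:49:24.546\n"
    "0.2618 21.6 15:49:25.171\n"
)

def parse_time(value):
    # Some NumPy versions hand converters bytes instead of str.
    if isinstance(value, bytes):
        value = value.decode()
    return datetime.strptime(value, "%H:%M:%S.%f")

# Doubles for the numeric columns, a Python object for the time column.
dd = [("Mass", "d"), ("Temp", "d"), ("Time", "object")]

data = np.genfromtxt(
    sample,
    skip_header=2,
    dtype=dd,
    converters={2: parse_time},  # column index 2 is "Time"
    encoding="utf-8",
)

# Seconds elapsed since the first sample, computed from the datetime objects.
start = data["Time"][0]
elapsed = np.array([(t - start).total_seconds() for t in data["Time"]])
print(data["Mass"], elapsed)
```

Because the Time field holds plain datetime objects, ordinary datetime arithmetic works on it, which is what the elapsed-seconds line relies on.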

【Comments】:

Thanks. This seems to work very well. Thanks for the explanation!

【Solution 2】:

You can use pandas read_table:

    import pandas as pd
    frame = pd.read_table('/tmp/gist', header=None, skiprows=22, delimiter=r'\s+')

This works for me. You need to handle the headers separately, because they are separated by a variable number of spaces.
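
A rough sketch of one way the headers could be handled separately: skip the header lines, supply the column names by hand, and convert the time column afterwards with pd.to_datetime. The sample text, the three column names, and the two-line header are assumptions for illustration; the real gist has 22 header lines and seven columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: header lines with irregular spacing,
# then whitespace-separated data rows.
sample = io.StringIO(
    "Run log (made-up header)\n"
    "Mass   Temp      Time\n"
    "0.262 21.5 15:49:24.546\n"
    "0.2618 21.6 15:49:25.171\n"
)

frame = pd.read_table(
    sample,
    sep=r"\s+",      # any run of whitespace separates columns
    header=None,     # do not take column names from the file
    skiprows=2,      # skip the header lines entirely
    names=["Mass", "Temp", "Time"],
)

# Numeric columns are detected automatically; the time column is read as
# text and converted explicitly. The data has no date, so 1900-01-01 is used.
frame["Time"] = pd.to_datetime(frame["Time"], format="%H:%M:%S.%f")
print(frame.dtypes)
```

Supplying names by hand avoids having to parse a header row whose fields themselves contain spaces.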
