InfluxDB result set to Pandas DataFrame using Python: answers

【Question title】: InfluxDB result set to Pandas DataFrame using Python
【Posted】: 2019-04-25 02:16:06
【Question】:

I would really appreciate your help.

I get this result set from InfluxDB. It is actually a dictionary:

    {u'current': [[0.03341725795376516, u'2018-10-10T12:41:27Z']],  u'voltage': [[12.95246814679179, u'2018-10-10T12:41:27Z']], u'temperature': [[0.035324635690852216, u'2018-10-10T12:41:27Z']], u'tags': {u'product': u'00000000000000'}}

Another example:

    {
        u'data': {
        u'measurement': u'telemetry'},
        u'tags': {u'product_imei': u'000000000000000'},
        u'current': [
            [1.234, u'2016-01-01T00:00:00Z'], [2.234, u'2016-01-01T04:00:00Z'], [3.234, u'2016-01-01T08:00:00Z'], [1.234, u'2016-01-01T12:00:00Z'], [2.345, u'2016-01-01T16:00:00Z'], [2.678, u'2016-01-01T20:00:00Z'], [2.91, u'2016-01-02T00:00:00Z'], [2.345, u'2016-01-02T04:00:00Z'], [2.678, u'2016-01-02T08:00:00Z'], [2.91, u'2016-01-02T12:00:00Z'], [2.345, u'2016-01-02T16:00:00Z'], [2.678, u'2016-01-02T20:00:00Z'], [2.91, u'2016-01-03T00:00:00Z']
        ],
        u'voltage': [
            [14.243, u'2016-01-01T00:00:00Z'], [14.723, u'2016-01-01T04:00:00Z'], [14.826, u'2016-01-01T08:00:00Z'], [13.284, u'2016-01-01T12:00:00Z'], [12.345, u'2016-01-01T16:00:00Z'], [12.678, u'2016-01-01T20:00:00Z'], [12.91, u'2016-01-02T00:00:00Z'], [12.345, u'2016-01-02T04:00:00Z'], [12.678, u'2016-01-02T08:00:00Z'], [12.91, u'2016-01-02T12:00:00Z'], [12.345, u'2016-01-02T16:00:00Z'], [12.678, u'2016-01-02T20:00:00Z'], [12.91, u'2016-01-03T00:00:00Z']
        ],
        u'temperature': [
            [21.345, u'2016-01-01T00:00:00Z'], [None, u'2016-01-01T04:00:00Z'], [21.345, u'2016-01-01T08:00:00Z'], [None, u'2016-01-01T12:00:00Z'], [21.345, u'2016-01-01T16:00:00Z'], [None, u'2016-01-01T20:00:00Z'], [21.91, u'2016-01-02T00:00:00Z'], [None, u'2016-01-02T04:00:00Z'], [21.678, u'2016-01-02T08:00:00Z'], [None, u'2016-01-02T12:00:00Z'], [21.345, u'2016-01-02T16:00:00Z'], [None, u'2016-01-02T20:00:00Z'], [21.91, u'2016-01-03T00:00:00Z']
        ]
        }

Using Python, I would like to get a pandas DataFrame similar to this one:

    time                 current  product          voltage  temperature
    -------------------------------------------------------------------
    2016-01-01 00:00:00    1.234  000000000000000   14.243       21.345
    2016-01-01 04:00:00    2.234  000000000000000   14.723
    2016-01-01 08:00:00    3.234  000000000000000   14.826       21.345
    2016-01-01 12:00:00    1.234  000000000000000   13.284
    2016-01-01 16:00:00    2.345  000000000000000   12.345       21.345
    2016-01-01 20:00:00    2.678  000000000000000   12.678
    2016-01-02 00:00:00    2.910  000000000000000   12.910       21.910
    2016-01-02 04:00:00    2.345  000000000000000   12.345
    2016-01-02 08:00:00    2.678  000000000000000   12.678       21.678
    2016-01-02 12:00:00    2.910  000000000000000   12.910
    2016-01-02 16:00:00    2.345  000000000000000   12.345       21.345
    2016-01-02 20:00:00    2.678  000000000000000   12.678
    2016-01-03 00:00:00    2.910  000000000000000   12.910       21.910

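I have already tried a very inefficient approach that essentially builds the data row by row. It takes far too long when I have to do this for thousands of units.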

    import pandas as pd

    output = []
    for i, line in enumerate(results['voltage']):
        # One row per voltage timestamp
        aux_dict = {'time': pd.to_datetime(line[1])}
        for key in results.keys():
            try:
                aux_dict[key] = results[key][i][0]
            except (IndexError, KeyError, TypeError):
                # Column `key` does not have data for this row (e.g. 'tags' is a dict)
                continue
        output.append(aux_dict)

    df = pd.DataFrame(output)

Thanks in advance for your help.

【Question comments】:

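You can convert your dictionary to a DataFrame, but it should be a literal dictionary. Can you show your dictionary?
Hi, thanks for your answer. The dictionary is in the first code block. The problem is that, for example, 'current' is a key whose value looks like this: [[current1, datetime1], [current2, datetime2], ...]. 'voltage' is the same: 'voltage': [[voltage1, datetime1], [voltage2, datetime2], ...], and so on. Thanks for your answer.
The question is how to get a single DataFrame in which all fields share the datetime column. Note that temperature is only recorded every 8 hours.
Possible duplicate of InfluxDB and pandas errors in Python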

Tags: python pandas dataframe influxdb
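Regarding the shared-datetime problem raised in the comments, here is a minimal sketch of one vectorized approach, assuming the result is shaped like the second example above (each field maps to a list of `[value, time]` pairs, plus a `tags` dict). Each field becomes a Series indexed by its own timestamps, and `pd.concat(..., axis=1)` aligns them as an outer join, so missing temperature readings end up as `NaN`. The variable name `result` and the trimmed sample values are illustrative only.

```python
import pandas as pd

# Trimmed copy of the question's second example (illustrative data).
result = {
    'measurement': 'telemetry',
    'tags': {'product_imei': '000000000000000'},
    'current': [[1.234, '2016-01-01T00:00:00Z'], [2.234, '2016-01-01T04:00:00Z'],
                [3.234, '2016-01-01T08:00:00Z']],
    'voltage': [[14.243, '2016-01-01T00:00:00Z'], [14.723, '2016-01-01T04:00:00Z'],
                [14.826, '2016-01-01T08:00:00Z']],
    'temperature': [[21.345, '2016-01-01T00:00:00Z'], [None, '2016-01-01T04:00:00Z'],
                    [21.345, '2016-01-01T08:00:00Z']],
}

fields = ['current', 'voltage', 'temperature']

# Build one Series per field, indexed by that field's own timestamps.
series = []
for field in fields:
    values, times = zip(*result[field])  # split [[value, time], ...] into two tuples
    series.append(pd.Series(list(values), index=pd.to_datetime(list(times)),
                            name=field, dtype='float64'))  # None becomes NaN

# Concatenating along columns outer-joins the series on the time index.
df = pd.concat(series, axis=1)
df.index.name = 'time'

# Tags are constant for the whole result set, so broadcast them as columns.
for tag, value in result['tags'].items():
    df[tag] = value

print(df)
```

Because the join happens on the index, a field sampled less often (like temperature) does not need to be padded by hand.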

【Solution 1】:

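I meant to answer this earlier. In the end, I simply wrote a function that handles the different data inputs and builds a DataFrame with named columns. Here I will only post the part that answers the question.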

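Background: a request is made to an endpoint, and the result is in r.json()['data'], a dictionary keyed by field name. Each field, e.g. 'voltage' or 'current', holds a list (one entry per measurement) of [value, time] lists. Example: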

    import pandas as pd

    d = {
        'current': [[-1.8795300221255817, '2018-09-14T13:36:00Z']],
        'voltage': [[12.0, '2018-09-14T13:36:00Z']]
    }

    fields = ['current', 'voltage']

    df = pd.DataFrame()
    for field in fields:
        df_aux = pd.DataFrame(d[field], columns=[field, 'time'])  # each row is [value, time]
        df_aux.set_index('time', inplace=True)
        # Aligns on the time index; later fields are matched to the first field's timestamps
        df[field] = df_aux[field]

    df.index = pd.to_datetime(df.index, errors='coerce')  # convert the index to datetime

    print(df.head())

    # When converting to datetime, remember to check that the format was read correctly.
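On checking that the datetime format was read correctly: a small sketch, assuming the timestamps are InfluxDB's RFC3339 strings with a trailing `Z` as in the example above. Passing an explicit `format` with `utc=True` makes pandas parse them deterministically instead of inferring the format, and `errors='coerce'` turns anything that does not match into `NaT`, so counting missing values reveals misread timestamps. The sample list is made up for illustration.

```python
import pandas as pd

# Illustrative timestamps: two valid RFC3339 strings and one malformed value.
raw_times = ['2018-09-14T13:36:00Z', '2018-09-14T13:40:00Z', 'not-a-date']

# Explicit format (literal trailing Z) avoids inference; utc=True gives tz-aware UTC values.
times = pd.to_datetime(raw_times, format='%Y-%m-%dT%H:%M:%SZ', utc=True, errors='coerce')

# Entries that did not match the format became NaT; count them.
bad = times.isna().sum()
print(times)
print('unparsed timestamps:', bad)
```

If `bad` is non-zero for real data, the format string (or the data) needs a second look before building the index.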

Thanks!

【Discussion】:

【Solution 2】:

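I suggest using the Pinform library, an ORM for InfluxDB that makes it easy to define measurement classes and to read from and write to the database.

For your use case: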

    import datetime

    from pinform import Measurement, MeasurementUtils
    from pinform.fields import FloatField
    from pinform.tags import Tag

    class CurrentAndVoltage(Measurement):
        class Meta:
            measurement_name = 'current_voltage'

        current = FloatField(null=False)
        voltage = FloatField(null=False)


    item = CurrentAndVoltage(time_point=datetime.datetime.now(), current=-1.87, voltage=12.0)

    df = MeasurementUtils.to_dataframe([item])

【Discussion】:

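【Solution 3】:

Using the influxdb Python module, here is a slim solution that parses the ResultSet object returned by the InfluxDBClient.query method, for a query without a GROUP BY clause.

Assuming that in Influx you have: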

    > SELECT P FROM device WHERE time > now()-24h
    name: device
    time                P
    ----                -
    1612958108000000000 238
    1612958108000000000 0
    1612958108000000000 357
    1612958108000000000 0
    1612958108000000000 0

    from os import environ

    import pandas as pd
    from influxdb import InfluxDBClient


    def client(database=None):
        return InfluxDBClient(
            username=environ['INFLUXDB_USER'],
            password=environ['INFLUXDB_PASS'],
            host=environ['INFLUXDB_HOST'],
            port=int(environ['INFLUXDB_PORT']),  # environment variables are strings
            database=database
        )

    r = client(database='test').query('SELECT P FROM device WHERE time > now()-24h')

    # Collect plain dicts and build the DataFrame once at the end:
    # DataFrame.append was removed in pandas 2.0 and is slow inside a loop anyway.
    rows = []
    for k, v in r.items():
        # Without GROUP BY, k is (measurement_name, None)
        for p in v:
            rows.append({'measurement': k[0], 'time': p['time'], 'P': p['P']})

    df = pd.DataFrame(rows, columns=['measurement', 'time', 'P'])
    df.head()

      measurement                  time      P
    0      device  2021-02-10T11:55:08Z  238.0
    1      device  2021-02-10T11:55:08Z    0.0
    2      device  2021-02-10T11:55:08Z  357.0
    3      device  2021-02-10T11:55:08Z    0.0
    4      device  2021-02-10T11:55:08Z    0.0

If you query with a GROUP BY clause, assuming that in Influx you have:

    > SELECT P FROM device WHERE time > now()-24h GROUP BY "device_id", "asset_id"
    name: device
    tags: asset_id=57, device_id=44
    time                P
    ----                -
    1612958108000000000 0
    1612958108000000000 327
    1612958108000000000 0
    1612958108000000000 238
    1612958108000000000 357

make sure to parse the tags from the ResultSet keys:

    r = client(database='test').query('SELECT P FROM device WHERE time > now()-24h GROUP BY "device_id", "asset_id"')

    rows = []
    for k, v in r.items():
        # With GROUP BY, k is (measurement_name, tags_dict)
        tags = {'device_id': k[1]['device_id'], 'asset_id': k[1]['asset_id']}
        for p in v:
            rows.append({'measurement': k[0], 'time': p['time'], 'P': p['P'], **tags})

    df = pd.DataFrame(rows, columns=['measurement', 'time', 'P', 'device_id', 'asset_id'])
    df.head()

      measurement                  time      P device_id asset_id
    0      device  2021-02-10T11:55:08Z    0.0        44       57
    1      device  2021-02-10T11:55:08Z  327.0        44       57
    2      device  2021-02-10T11:55:08Z    0.0        44       57
    3      device  2021-02-10T11:55:08Z  238.0        44       57
    4      device  2021-02-10T11:55:08Z  357.0        44       57
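The two loops in this answer differ only in whether the ResultSet keys carry tags. As a sketch, a hypothetical helper `items_to_dataframe` (not part of the `influxdb` package) could cover both cases by merging whatever tags a key has into every row. It assumes its input has the shape that `r.items()` iterates over, i.e. pairs of `((measurement, tags_or_None), points)` where each point is a dict with `'time'` and the selected fields; the stand-in data below replaces a live query so the example runs without a server.

```python
import pandas as pd


def items_to_dataframe(items):
    """Hypothetical helper: flatten ((measurement, tags), points) pairs into one DataFrame.

    `tags` is None for queries without GROUP BY and a dict of tag values otherwise.
    """
    rows = []
    for (measurement, tags), points in items:
        base = {'measurement': measurement, **(tags or {})}
        for point in points:
            rows.append({**base, **point})  # point holds 'time' plus the selected fields
    df = pd.DataFrame(rows)
    if 'time' in df:
        df['time'] = pd.to_datetime(df['time'], utc=True)  # RFC3339 strings -> UTC datetimes
    return df


# Stand-in for r.items() from a GROUP BY query (illustrative values).
fake_items = [
    (('device', {'device_id': '44', 'asset_id': '57'}),
     [{'time': '2021-02-10T11:55:08Z', 'P': 0.0},
      {'time': '2021-02-10T11:55:08Z', 'P': 327.0}]),
    (('device', {'device_id': '45', 'asset_id': '57'}),
     [{'time': '2021-02-10T11:55:08Z', 'P': 238.0}]),
]

df = items_to_dataframe(fake_items)
print(df)
```

Building a list of dicts and calling the DataFrame constructor once keeps the cost linear in the number of points, which matters when querying thousands of units.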

【Discussion】: