Title: Downsampling in Python
Posted: 2018-10-23 05:08:51
Question:

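I am trying to downsample my minute-level data, and my index is a datetime. But when I call pandas resample, it returns only one column, while my data contains six columns.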

import pandas as pd
from matplotlib import pyplot

# Date and Time columns are combined into a single 'datetime' index
dataset = pd.read_csv('household_power_consumption.txt', sep=';', header=0,
                      low_memory=False, infer_datetime_format=True,
                      parse_dates={'datetime': [0, 1]}, index_col=['datetime'])
dataset.head()
dataset = dataset.resample('H', how='mean', label='left')
a = dataset.head()
print(a)
dataset.to_csv('Downsampled_House_data.csv')

dataset.resample returns only one column.

Comments:

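  • Welcome to SO. Please use only relevant tags. Thanks!
  • What does print (dataset.info()) return?
  • And print (dataset.head())?
  • Hi, could you explain what your input looks like and what the expected output looks like?
  • @jezrael dataset.head returns only one column. It says all the others are of object type, except the returned column, which is float64.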
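The comments narrow the problem down to dtypes: every column except one is reported as object. A small helper can pinpoint which strings keep a column from being parsed as numbers. The sketch below is a hypothetical helper (`non_numeric_values` is not a pandas API), run on a made-up three-row frame rather than the real file; it only uses `pd.to_numeric(..., errors='coerce')` and `is_numeric_dtype`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_values(df):
    """Hypothetical helper: map each non-numeric column to the values that fail numeric parsing."""
    report = {}
    for col in df.columns:
        if is_numeric_dtype(df[col]):
            continue  # already parsed as numbers, nothing to report
        parsed = pd.to_numeric(df[col], errors='coerce')
        # Present in the raw data but NaN after coercion -> not a valid number.
        mask = parsed.isna() & df[col].notna()
        report[col] = sorted(set(df.loc[mask, col]))
    return report


# Made-up frame mimicking the symptom: one column holds a '?' placeholder.
frame = pd.DataFrame({
    'Global_active_power': ['4.216', '?', '5.374'],
    'Voltage': [234.84, 233.63, 233.29],
})
print(frame.dtypes)
print(non_numeric_values(frame))  # {'Global_active_power': ['?']}
```

On the real data this kind of check would point straight at the placeholder string that needs to be declared as missing.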

Tags: python pandas downsampling


Answer 1:

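If the data file is the one from the link, the problem is that some missing values are written as ?. Those strings stop pandas from parsing the affected columns as numbers, so they are read with object dtype and the mean aggregation keeps only the numeric (float64) column.

So the necessary parameter is na_values='?', which tells read_csv to treat ? as NaN: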
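To see the effect of `na_values` without downloading the full file, here is a minimal sketch on a hypothetical three-row sample in the same `Date;Time;...` layout (`SAMPLE` and `load()` are illustrative names, and the values are made up). It assumes pandas 2.x, where `numeric_only=True` is used to imitate the older behaviour of leaving non-numeric columns out of the mean, and it merges Date and Time with `pd.to_datetime` instead of the dictionary form of `parse_dates`.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample in the ';'-separated layout of household_power_consumption.txt;
# '?' marks a missing reading.
SAMPLE = """Date;Time;Global_active_power;Voltage
16/12/2006;17:24:00;4.216;234.84
16/12/2006;17:25:00;?;233.63
16/12/2006;18:24:00;5.374;233.29
"""


def load(na_values=None):
    """Read SAMPLE and build a DatetimeIndex from the Date and Time columns."""
    df = pd.read_csv(io.StringIO(SAMPLE), sep=';', na_values=na_values)
    stamps = pd.to_datetime(df.pop('Date') + ' ' + df.pop('Time'),
                            format='%d/%m/%Y %H:%M:%S')
    df.index = pd.DatetimeIndex(stamps, name='datetime')
    return df


# Without na_values the '?' keeps Global_active_power as text, so a
# numeric-only hourly mean drops it and only Voltage survives.
raw = load()
print(raw.dtypes)
print(raw.resample('60min').mean(numeric_only=True))

# With na_values='?' both measurement columns are numeric; the '?' becomes
# NaN, which mean() skips by default.
fixed = load(na_values='?')
print(fixed.dtypes)
print(fixed.resample('60min').mean())
```

Only the `na_values` argument differs between the two loads; the resampling call is the same.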

dataset = pd.read_csv('household_power_consumption.txt', 
                      sep=';', 
                      header=0, 
                      low_memory=False, 
                      infer_datetime_format=True, 
                      parse_dates={'datetime': [0,1]},  #Date and time has been combined
                      index_col=['datetime'],
                      na_values='?')  # treat '?' placeholders as missing (NaN)
print(dataset.head())
                     Global_active_power  Global_reactive_power  Voltage  \
datetime                                                                   
2006-12-16 17:24:00                4.216                  0.418   234.84   
2006-12-16 17:25:00                5.360                  0.436   233.63   
2006-12-16 17:26:00                5.374                  0.498   233.29   
2006-12-16 17:27:00                5.388                  0.502   233.74   
2006-12-16 17:28:00                3.666                  0.528   235.68   

                     Global_intensity  Sub_metering_1  Sub_metering_2  \
datetime                                                                
2006-12-16 17:24:00              18.4             0.0             1.0   
2006-12-16 17:25:00              23.0             0.0             1.0   
2006-12-16 17:26:00              23.0             0.0             2.0   
2006-12-16 17:27:00              23.0             0.0             1.0   
2006-12-16 17:28:00              15.8             0.0             1.0   

                     Sub_metering_3  
datetime                             
2006-12-16 17:24:00            17.0  
2006-12-16 17:25:00            16.0  
2006-12-16 17:26:00            17.0  
2006-12-16 17:27:00            17.0  
2006-12-16 17:28:00            17.0  

print (dataset.info())
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2075259 entries, 2006-12-16 17:24:00 to 2010-11-26 21:02:00
Data columns (total 7 columns):
Global_active_power      float64
Global_reactive_power    float64
Voltage                  float64
Global_intensity         float64
Sub_metering_1           float64
Sub_metering_2           float64
Sub_metering_3           float64
dtypes: float64(7)
memory usage: 126.7 MB
None

dataset=dataset.resample('H', label='left').mean()
print(dataset.head())
                     Global_active_power  Global_reactive_power     Voltage  \
datetime                                                                      
2006-12-16 17:00:00             4.222889               0.229000  234.643889   
2006-12-16 18:00:00             3.632200               0.080033  234.580167   
2006-12-16 19:00:00             3.400233               0.085233  233.232500   
2006-12-16 20:00:00             3.268567               0.075100  234.071500   
2006-12-16 21:00:00             3.056467               0.076667  237.158667   

                     Global_intensity  Sub_metering_1  Sub_metering_2  \
datetime                                                                
2006-12-16 17:00:00         18.100000             0.0        0.527778   
2006-12-16 18:00:00         15.600000             0.0        6.716667   
2006-12-16 19:00:00         14.503333             0.0        1.433333   
2006-12-16 20:00:00         13.916667             0.0        0.000000   
2006-12-16 21:00:00         13.046667             0.0        0.416667   

                     Sub_metering_3  
datetime                             
2006-12-16 17:00:00       16.861111  
2006-12-16 18:00:00       16.866667  
2006-12-16 19:00:00       16.683333  
2006-12-16 20:00:00       16.783333  
2006-12-16 21:00:00       17.216667  
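In the answer's call `dataset.resample('H', label='left').mean()`, `label='left'` names each hourly bucket by its start, which is why the first row is 17:00 even though the data begins at 17:24. A small sketch on a made-up series (illustrative values only) shows both labelling choices; it uses `'60min'` as an equivalent hourly frequency string, since the upper-case `'H'` alias is deprecated in recent pandas releases.

```python
import pandas as pd

# Made-up half-hourly readings starting at 17:24, like the first row of the dataset.
idx = pd.date_range('2006-12-16 17:24', periods=4, freq='30min')
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Hourly bins are closed on the left by default: [17:00, 18:00) and [18:00, 19:00).
left = s.resample('60min', label='left').mean()    # bins named by their start
right = s.resample('60min', label='right').mean()  # same bins, named by their end

print(left)
print(right)
```

Both results contain the same means (1.5 and 3.5); only the timestamps used as labels differ.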
