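【Posted】: 2014-12-13 00:44:44
【Question】:
I am interpolating arrival times for some public transit data I have. I have a working script, but it seems to run in quadratic time. Here is the script: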
import pandas as pd

# read the csv file
st = pd.read_csv('interpolated_test.csv')
# sort first by trip_id, then by stop_sequence
# (DataFrame.sort was the API at the time; newer pandas uses sort_values)
sorted_st = st.sort(['trip_id','stop_sequence'], ascending=[False,True])
# reset the index values in preparation for iteration
reindexed = sorted_st.reset_index(drop=True)
# for each value in 'arrival_time' that is an hh:mm:ss string
for i in reindexed['arrival_time']:
# for i in range(len(reindexed['arrival_time'])):
    if not pd.isnull(i):
        # slice the hours and minutes out of hh:mm:ss
        hour = int(i[:2])
        minute = int(i[3:5])
        # convert hh:mm to minutes since midnight
        minute_value = (hour * 60) + minute
        # replace the current string with the int value
        # takes ~655s to execute on a MacBook Pro with the entire stop_times.txt
        # runs in quadratic time
        reindexed = reindexed.replace(i,minute_value)
# interpolate and write out
new = reindexed.apply(pd.Series.interpolate)
print(new)
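Here is a link to the csv: https://gist.github.com/adampitchie/0192933ed0eba122ba7e
I shortened the csv so you can run the file without waiting for it to finish.
This should be low-hanging fruit for anyone familiar with pandas, but I'm stuck, and any help would be appreciated.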
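The loop is slow because each `reindexed.replace(i, minute_value)` call scans the whole DataFrame, and it is called once per non-null row. A minimal sketch of a vectorized alternative follows, which converts the entire column in one pass. The sample data, the helper name `to_minutes`, and the `arrival_minutes` column are assumptions made for illustration, and `sort_values` is used in place of `sort` so the sketch runs on current pandas.

```python
import io

import pandas as pd

# Hypothetical sample standing in for interpolated_test.csv
csv_text = """trip_id,stop_sequence,arrival_time
A,1,08:00:00
A,2,
A,3,08:10:00
B,1,09:30:00
B,2,
B,3,
B,4,10:00:00
"""


def to_minutes(times: pd.Series) -> pd.Series:
    """Convert 'hh:mm:ss' strings to minutes since midnight; missing values stay NaN."""
    hours = pd.to_numeric(times.str[:2], errors="coerce")
    minutes = pd.to_numeric(times.str[3:5], errors="coerce")
    return hours * 60 + minutes


st = pd.read_csv(io.StringIO(csv_text))
# Same ordering as the script above: trip_id descending, stop_sequence ascending
st = st.sort_values(["trip_id", "stop_sequence"], ascending=[False, True])
st = st.reset_index(drop=True)

# One pass over the column instead of one DataFrame.replace call per row
st["arrival_minutes"] = to_minutes(st["arrival_time"]).interpolate()
print(st)
```

Like the original script, this interpolates straight down the sorted column, so a gap at the end of one trip would be filled using the first stop of the next trip.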
[UPDATE] So I tried running the same code with the FULL CSV FILE, and I got this error:
Traceback (most recent call last):
  File "/Users/tester/Desktop/ETL/interpolate.py", line 49, in <module>
    reindexed[col].dt.hour * 60
  File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:40664)
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2513, in dt
    raise TypeError("Can only use .dt accessor with datetimelike values")
TypeError: Can only use .dt accessor with datetimelike values
It looks like pd.to_datetime(reindexed[col]) is not working.
For completeness, here is the code:
import pandas as pd

st = pd.read_csv('csv/stop_times.csv')
sorted_st = st.sort(['trip_id','stop_sequence'], ascending=[False,True])
reindexed = sorted_st.reset_index(drop=True)
for col in ('arrival_time', 'departure_time'):
    reindexed[col] = pd.to_datetime(reindexed[col])
    reindexed[col] = (
        reindexed[col].dt.hour * 60
        + reindexed[col].dt.minute)
    reindexed[col] = reindexed[col].interpolate()
print(reindexed.iloc[:, :3])
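One possible cause, offered as an assumption rather than something the traceback confirms: GTFS `stop_times` files may contain times past `24:00:00` for trips that run after midnight. Such values are not valid clock times, so the conversion to datetime may not take effect, leaving the column as strings and making `.dt` fail. A hedged sketch that treats the values as durations with `pd.to_timedelta` instead is shown below. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample; 24:30:00 and 25:10:00 mimic a trip that runs past midnight
times = pd.Series(["23:50:00", None, "24:30:00", None, "25:10:00"])

# to_timedelta reads the strings as durations, so hours >= 24 are accepted
elapsed = pd.to_timedelta(times)

# Minutes since midnight of the service day; missing values become NaN
minutes = elapsed.dt.total_seconds() / 60

# Fill the gaps linearly, as in the script above
interpolated = minutes.interpolate()
print(interpolated)
```

Working with durations also keeps the values increasing across midnight, which matters for linear interpolation within a trip.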
【Comments】: