【发布时间】:2018-03-19 15:59:22
【问题描述】:
我知道已经发布了一些关于如何在不重新排序数据的情况下输出数据帧的唯一值的帖子。
我已经多次尝试实现这些方法,但是,我认为问题与相关数据框的定义方式有关。
基本上,我想查看名为“C”的数据帧,并将唯一值输出到一个名为“C1”的新数据帧中,同时不更改它们目前的存储顺序。
我目前使用的线路是:
C1 = pd.DataFrame(np.unique(C))
但是,这会返回一个升序列表(而我只是希望仅保留列表顺序并删除重复项)。
再次向那些看到我的代码并摇头的高级用户道歉——我还在学习!而且,是的,我已经尝试了很多方法来解决这个问题(重新定义 C 数据帧,将输出转换为列表等),不幸的是无济于事,所以这是我向 Python 众神求助的呼声。我将 C 和 C1 都定义为数据框,因为我知道这些几乎是存放数据的最佳数据结构,以便以后可以调用和使用它们,另外,在不影响包含在其中的数据的情况下命名列非常有用数据框)。
再次感谢您的帮助。
F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
F_mat = [[F0,F1,F2,F3,F4]]
F_test = pd.DataFrame(np.array(F_mat).reshape(5,3),columns=('startdate','enddate','price'))
#convert string dates into DateTime data type
F_test['startdate'] = pd.to_datetime(F_test['startdate'])
F_test['enddate'] = pd.to_datetime(F_test['enddate'])
#convert datetype to be datetime type for columns startdate and enddate
F['startdate'] = pd.to_datetime(F['startdate'])
F['enddate'] = pd.to_datetime(F['enddate'])
#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1
#re-order the F matrix by column 'duration', ensure that the bootstrapping
#prioritises the shorter term contracts
F.sort_values(by=['duration'], ascending=[True])
# create prices P
P = pd.DataFrame()
for index, row in F.iterrows():
new_P_row = pd.Series()
for date in pd.date_range(row['startdate'], row['enddate']):
new_P_row[date] = row['price']
P = P.append(new_P_row, ignore_index=True)
P.fillna(0, inplace=True)
#create C matrix, which records the unique day prices across the observation interval
C = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
C.columns = tempDateRange
#create the Repatriation matrix, which records the order in which contracts will be
#stored in the A matrix, which means that once results are generated
#from the linear solver, we know exactly which CalendarDays map to
#which columns in the results array
#this array contains numbers from 1 to NbContracts
R = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
R.columns = tempDateRange
#define a zero filled matrix, P1, which will house the dominant daily prices
P1 = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))
#rename columns of P1 to be the dates contained in matrix array D
P1.columns = tempDateRange
#create prices in correct rows in P
for i in list(range(0, intNbContracts)):
for j in list(range(0, intNbCalendarDays)):
if (P.iloc[i, j] != 0 and C.iloc[0,j] == 0) :
flUniqueCalendarMarker = P.iloc[i, j]
C.iloc[0,j] = flUniqueCalendarMarker
P1.iloc[i,j] = flUniqueCalendarMarker
R.iloc[0,j] = i
for k in list(range(j+1,intNbCalendarDays)):
if (C.iloc[0,k] == 0 and P.iloc[i,k] != 0):
C.iloc[0,k] = flUniqueCalendarMarker
P1.iloc[i,k] = flUniqueCalendarMarker
R.iloc[0,k] = i
elif (C.iloc[0,j] != 0 and P.iloc[i,j] != 0):
P1.iloc[i,j] = C.iloc[0,j]
#convert C dataframe into C_list, in prepataion for converting C_list
#into a unique, order preserved list
C_list = C.values.tolist()
#create C1 matrix, which records the unique day prices across unique days in the observation period
C1 = pd.DataFrame(np.unique(C))
【问题讨论】:
-
你看过
drop_duplicates吗?
标签: python pandas dataframe unique