【Question title】: Why does pd.as_matrix() change the values and the number of decimal places from the original data frame?
【Posted】: 2018-11-07 10:14:32
【Question】:

I have a data frame containing two decimal values and an ID:

When I apply the as_matrix function to the x and y values, it produces an array like this:

coords = df.as_matrix(columns=['x', 'y'])
coords

Output:

array([[ 0.0703843 ,  0.170845  ],
       [ 0.07022078,  0.17150128],
       [ 0.07208886,  0.17159163],
       ..., 
       [ 0.07162819,  0.17044404],
       [ 0.06951432,  0.17096308],
       [ 0.07104143,  0.17040137]])

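This immediately looked odd, because the number of decimal places is inconsistent, but I assumed pandas was just shortening the values for display.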
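The shortened numbers are most likely a display artifact of NumPy's array printing rather than a change to the data: by default NumPy prints at most 8 digits after the decimal point and drops trailing zeros, so 0.07038429591623253 is shown as `0.0703843`. A minimal check, assuming the first two rows of the CSV subset posted in the question, and using `to_numpy()` as the current replacement for `as_matrix()` (deprecated in pandas 0.23 and removed in 1.0):

```python
import numpy as np
import pandas as pd

# First two rows of the CSV subset posted in the question
df = pd.DataFrame({
    "id": ["07379a26-2447-4fce-83ac-4784abf07389",
           "f5cc3adb-0588-4705-b1a3-fe1b669b776f"],
    "x": [0.07038429591623253, 0.07022078416348305],
    "y": [0.17084500318384327, 0.17150127781674332],
})

# Modern equivalent of df.as_matrix(columns=['x', 'y'])
coords = df[["x", "y"]].to_numpy()

# Default printing rounds to at most 8 fractional digits
print(coords)

# The stored value is untouched: it equals the float held in the DataFrame
print(float(coords[0, 0]))
print(coords[0, 0] == df.loc[0, "x"])

# Raising the print precision shows every stored digit
with np.printoptions(precision=17):
    print(coords)
```

Only the printed representation changes; the array holds the same 64-bit floats as the data frame.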

But when I try to retrieve the IDs, I only get one or zero matches, even though they should all match:

ids = []
for coord in coords:
    try:
        _id = df.loc[df['x'] == coord[0]]['id'][1]
        ids.append(_id)
    except:
        pass
len(ids)
1
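One thing worth checking in the loop itself (this was not raised in the thread, so treat it as a suggestion rather than the confirmed cause): `df.loc[mask]['id']` keeps the original row labels, and on a Series with an integer index `[1]` is a *label* lookup, not "the first element". It only succeeds when the matching row happens to carry label 1, which would be consistent with getting exactly one match; every other row raises the KeyError mentioned in the comments. A minimal sketch using three rows from the CSV; `lookup_ids` is a hypothetical helper name:

```python
import pandas as pd

# Three rows from the CSV subset in the question (default RangeIndex 0, 1, 2)
df = pd.DataFrame({
    "id": ["07379a26-2447-4fce-83ac-4784abf07389",
           "f5cc3adb-0588-4705-b1a3-fe1b669b776f",
           "b5a57ffe-8565-4443-9685-11675ce25dc4"],
    "x": [0.07038429591623253, 0.07022078416348305, 0.07208886125821728],
    "y": [0.17084500318384327, 0.17150127781674332, 0.17159163002146055],
})
coords = df[["x", "y"]].to_numpy()

# The filtered Series keeps label 2, so [1] asks for a label that is not there
matched = df.loc[df["x"] == coords[2, 0]]["id"]
try:
    _ = matched[1]
except KeyError:
    print("KeyError: label 1 is not in", list(matched.index))


def lookup_ids(df, coords):
    """Hypothetical helper: first id whose x equals each row's x in coords."""
    ids = []
    for x in coords[:, 0]:
        match = df.loc[df["x"] == x, "id"]
        if not match.empty:
            ids.append(match.iloc[0])  # .iloc[0] = first match by position
    return ids


ids = lookup_ids(df, coords)
print(len(ids))
```

Using `.iloc[0]` (position) instead of `[1]` (label) makes the lookup independent of where the matching row sits in the frame.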

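What I want to understand is why pd.as_matrix extracts values from the data frame that can no longer be matched against it and, if that is really the case, how I can retrieve the IDs from the data frame.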

Any help would be greatly appreciated.

Thanks

EDIT

Below is a subset of the data frame as CSV:

,id,x,y
0,07379a26-2447-4fce-83ac-4784abf07389,0.07038429591623253,0.17084500318384327
1,f5cc3adb-0588-4705-b1a3-fe1b669b776f,0.07022078416348305,0.17150127781674332
2,b5a57ffe-8565-4443-9685-11675ce25dc4,0.07208886125821728,0.17159163002146055
3,940efcaa-6d9d-4b10-a0fe-d8ec8c1d9c7e,0.07057468050347501,0.1700482708522834
4,616d7794-565a-4d2d-98cb-334beb5b91ef,0.07057895306948389,0.170054305037284
5,e2d1819d-1f58-407d-9950-be0a0c00374b,0.07161607658023798,0.17013089473907284
6,6a739687-f9ad-47bd-8a4b-c47bc4b2aec6,0.070163429153604,0.16889764101717875
7,dd2df646-9a66-4baa-8815-d24f1858eda7,0.07035099968831582,0.16995622800529742
8,6a224d76-efea-4313-803d-c25b619dae0a,0.07066777462044714,0.17021849979554743
9,321147fa-ee51-4bab-9634-199c92a42d2f,0.06984869509314469,0.17098101436534555
10,e52d6289-01ba-4e7d-8054-bb9a349c0505,0.07068704829137691,0.17029718331066224
11,517f256b-6171-4d93-9b4b-0f81aac828fb,0.0713283119291569,0.16983952831019206
12,e339c742-9784-49fc-a435-790db0364229,0.07131341496221469,0.1698513011377732
13,6f20ad5a-22fb-43a2-8885-838e5161df14,0.06942397329210678,0.1716572235671854
14,f6e1008f-2b22-4d88-8c84-c0dc4f2d822e,0.06942427697939664,0.17165098925109726
15,8a2d35e5-10a2-4188-b98f-54200d2db8da,0.07048162129308791,0.16896051533992895
16,adab8fd8-4348-412d-85d2-01491886967b,0.07076495746208027,0.16966622176968035
17,df79523b-848b-45a9-8dab-fe53c2a5b62d,0.06988926585338372,0.17028143287771583
18,db05d97c-3b16-4da8-9659-820fc7e3f858,0.0713167479593096,0.1685149810693375
19,d43963d1-b803-473c-85dc-2ed2e9f77f4e,0.07045583812582461,0.1706502407290604
20,9d99c9a6-2de3-4e6a-9bd7-9d7ece358a2f,0.07044174575566758,0.17066067488910522
21,3eec44be-b9e2-45a2-b919-05028f5a0ba9,0.07079585677115756,0.16920818686920963
22,9f836847-2b67-4b33-930a-1f84452628ba,0.07078522829778934,0.16919781903167638
23,fbaa8958-a5d5-4dfb-91f7-8c11afe226a8,0.07128542860765898,0.16834798505762455
24,a84b59c4-4145-472d-a26a-4c930648c16c,0.07196635776157265,0.17047633495883885
25,29cf8ad3-7068-4207-b0a2-4f0cff337c9f,0.0719701195278871,0.17051442269732875
26,d0f512c8-5c4f-427a-99e1-ebb4c5b363e5,0.0718787509597688,0.17054903897593635
27,74b1db2d-002b-4f89-8d02-ac084e9a3cd5,0.07089130417373782,0.16981103290127117
28,89210a0c-8144-491d-9e98-19e7f4c3085e,0.07076060461092577,0.1707011426749184
29,aebb377e-7c26-4bb5-8563-c3055a027844,0.07103977816965212,0.17113978347674103
30,00b527a0-d40a-44b4-90f9-750fd447d2d7,0.07097785505134419,0.16963542019904118
31,8c186559-f50d-40ca-a821-11596e1e5261,0.06992637446216321,0.17110063865050085
32,0e64cf14-6ccd-4ad0-9715-ab410f6baf6a,0.0718311255786932,0.1705675237580442
33,f5479823-1efe-47b8-9977-73dc41d1d69e,0.07016981880399553,0.1703708437681898
34,385cfa13-2476-4e3d-b755-3063a7f802b9,0.07016550435008462,0.17037054473511137
35,a40bf573-b701-46f0-9a06-5857cf3ab199,0.0701443567773146,0.17035314147536326
36,0c5a9751-2c1b-4003-834d-9584d2f907a2,0.07016050805421256,0.17038992836178396
37,65b09067-9cf0-492d-8a70-13d4f92f8a10,0.07137336818557355,0.1684713798357405
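When reproducing the problem from this CSV, two `read_csv` options are relevant: `index_col=0` turns the unnamed first column back into the index, and `float_precision="round_trip"` asks the C parser to use its round-trip converter, so each value should compare equal to the same literal typed into Python. A minimal sketch, with the first three rows inlined through `io.StringIO` instead of a file:

```python
import io

import pandas as pd

# First three rows of the CSV subset, inlined instead of reading a file
csv_text = """,id,x,y
0,07379a26-2447-4fce-83ac-4784abf07389,0.07038429591623253,0.17084500318384327
1,f5cc3adb-0588-4705-b1a3-fe1b669b776f,0.07022078416348305,0.17150127781674332
2,b5a57ffe-8565-4443-9685-11675ce25dc4,0.07208886125821728,0.17159163002146055
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,                   # unnamed first column becomes the index
    float_precision="round_trip",  # use the round-trip float converter
)

print(df.dtypes)
print(df.loc[0, "x"] == 0.07038429591623253)
```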

【Comments】:

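  • Also, why are you using a bare except clause? What errors are you ignoring?
  • I just ran similar code (with a manually copied data frame and without the try/except) and had no problems; every value matched. This sounds like a floating-point rounding error. Try np.isclose instead of ==.
  • @FHTMitchell Hi. Generating the data frame is non-trivial, so I have included a subset as CSV. The error is a KeyError, because loc returns an empty frame.
  • @FHTMitchell The try/except is there to count the number of actual matches.
  • @FHTMitchell You can't use np.isclose() inside the df.loc function, so there is no way to find the value that way.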
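Regarding the last comment: `np.isclose` accepts a Series and returns a boolean array, and `.loc` accepts a boolean array of matching length, so a tolerance-based lookup does appear to be possible. A minimal sketch, where the `1e-15` drift and the tolerances `rtol=0, atol=1e-12` are arbitrary values chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Three rows from the CSV subset in the question
df = pd.DataFrame({
    "id": ["07379a26-2447-4fce-83ac-4784abf07389",
           "f5cc3adb-0588-4705-b1a3-fe1b669b776f",
           "b5a57ffe-8565-4443-9685-11675ce25dc4"],
    "x": [0.07038429591623253, 0.07022078416348305, 0.07208886125821728],
    "y": [0.17084500318384327, 0.17150127781674332, 0.17159163002146055],
})

# Simulate a coordinate that drifted by a tiny rounding error
target = df.loc[1, "x"] + 1e-15

# Exact comparison finds nothing
exact = df.loc[df["x"] == target, "id"].tolist()
print(exact)

# np.isclose yields a boolean mask that .loc accepts directly
mask = np.isclose(df["x"], target, rtol=0, atol=1e-12)
close = df.loc[mask, "id"].tolist()
print(close)
```

The tolerance has to be smaller than the spacing between genuinely different coordinates, otherwise neighbouring points will match too.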

Tags: python pandas numpy dataframe


【Solution 1】:

The problem was with the df.loc function on a GeoDataFrame.

Once I exported it to CSV and re-read it as a plain pandas DataFrame, it seemed to work fine.

Just letting anyone who finds this know.
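A more general alternative, which sidesteps float equality altogether, is to never look the IDs up by value: `to_numpy()` preserves row order, so row `i` of the coordinate array belongs to `df["id"].iloc[i]`. A minimal sketch assuming a plain pandas DataFrame; `kept_positions` is a hypothetical list of row positions standing in for whatever subset of `coords` is kept later:

```python
import pandas as pd

# Three rows from the CSV subset in the question
df = pd.DataFrame({
    "id": ["07379a26-2447-4fce-83ac-4784abf07389",
           "f5cc3adb-0588-4705-b1a3-fe1b669b776f",
           "b5a57ffe-8565-4443-9685-11675ce25dc4"],
    "x": [0.07038429591623253, 0.07022078416348305, 0.07208886125821728],
    "y": [0.17084500318384327, 0.17150127781674332, 0.17159163002146055],
})

coords = df[["x", "y"]].to_numpy()
ids = df["id"].to_numpy()

# Row i of coords corresponds to ids[i]; no lookup by float value is needed
pairs = list(zip(ids, coords))
for _id, (x, y) in pairs:
    print(_id, x, y)

# If only some rows are kept later (hypothetical positions), map them back
# to IDs by position with .iloc instead of comparing floats
kept_positions = [0, 2]
print(df["id"].iloc[kept_positions].tolist())
```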

【Discussion】:
