Remove all non-numeric symbols from dictionaries in a column with a specific condition

【Question title】: Remove all non-numeric symbols from dictionaries in a column with a specific condition
【Posted】: 2020-07-21 07:33:17
【Question description】:

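I have a column in a dataframe.

I want to keep the numeric values only for rows containing "amount":, where the numbers are payment amounts.

My ideal column output: numbers are kept only in rows where "amount": precedes them. Everything else is NaN.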

value
500
NaN
4
3
NaN

etc.

I tried

test_df['value'] = test_df['value'].str.extract('(\d+)', expand = False)

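But it converts every value in the column to NaN. Also, it does not single out the values with "amount":, so it would not help anyway.

I also tried the solution from this question, but have not figured it out so far. Thanks!

Update: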

{"person": "78afa995795e4d85b5d9ceeca43f5fef", "event": "offer received", 
"value": {"offer id": "9b98b8c7a33c4b65b9aebfe6a799e6d9"}, "time": 0}
{"person": "a03223e636434f42ac4c3df47e8bac43", "event": "offer received", 
"value": {"offer id": "0b1e1539f2cc45b7b9fa7c272da2e1d7"}, "time": 0}
{"person": "e2127556f4f64592b11af22de27a7932", "event": "offer received", 
"value": {"offer id": "2906b810c7d4411798c6938adc9daaa5"}, "time": 0}
{"person": "8ec6ce2a7e7949b1bf142def7d0e0586", "event": "offer received", 
"value": {"offer id": "fafdcd668e3743c1bb461111dcafc2a4"}, "time": 0}

【Question discussion】:

Tags: python pandas digits

【Solution 1】:

I think the column holds dictionaries, so use Series.str.get:

import pandas as pd

# col.pkl is the pickled 'value' column shared by the asker
test_df = pd.read_pickle('col.pkl').to_frame()

test_df['value'] = test_df['value'].str.get('amount')
print (test_df)
        value
0         NaN
1         NaN
2         NaN
3         NaN
4         NaN
      ...
306529   1.59
306530   9.53
306531   3.61
306532   3.53
306533   4.05

[306534 rows x 1 columns]
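
For readers without access to col.pkl, here is a minimal sketch of the same call on a small hand-made Series. The sample values are assumptions modeled on the rows shown in the question and in the output above; they are not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two "offer id" dicts, two "amount" dicts, one missing value.
sample = pd.Series(
    [
        {"offer id": "9b98b8c7a33c4b65b9aebfe6a799e6d9"},
        {"amount": 1.59},
        {"offer id": "0b1e1539f2cc45b7b9fa7c272da2e1d7"},
        {"amount": 9.53},
        np.nan,
    ],
    name="value",
)

# For dict elements, str.get looks up the key; rows without it end up missing.
amounts = sample.str.get("amount")
print(amounts)
```

Rows whose dictionary has no "amount" key, and rows that are already missing, come back as missing values, which matches the NaN rows in the output above.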

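【Discussion】:

Series.str.get takes an int as its argument
@AnakinSkywalker - What does print (test_df['value'].head(3).to_dict()) show?
@AnakinSkywalker - This looks like a data-related problem. Could you create a file from this column and share it? Use test_df['value'].to_pickle('col.pkl'), because it works perfectly with the sample data. I edited the answer.
@AnakinSkywalker - Unfortunately I need the real data saved via pickle rather than text data, so the text cannot be used (it works perfectly for me). Please run test_df['value'].to_pickle('col.pkl') and share the file via Dropbox, gdocs, or similar so I can test with the real data.
@HichamZouarhi str.get is defined like this in the source code: if isinstance(x, dict): return x.get(i) elif len(x) > i >= -len(x): return x[i] return np.nan. If the Series contains dictionaries, i can be any hashable item. Link to str.get in the GitHub repo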
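
The per-element logic quoted in the last comment can be paraphrased in plain Python. This is only a sketch of the quoted snippet, not the pandas source itself; the function name get_item and the extra guards for non-integer keys and unsized values are assumptions added so it runs on its own.

```python
import math


def get_item(x, i):
    """Hypothetical paraphrase of the per-element logic quoted in the comment above."""
    if isinstance(x, dict):
        # Any hashable key works here; a missing key gives None.
        return x.get(i)
    # Assumed guards: only index sized objects, and only with integer positions.
    if isinstance(i, int) and hasattr(x, "__len__") and len(x) > i >= -len(x):
        return x[i]
    return math.nan


print(get_item({"amount": 1.59}, "amount"))
print(get_item({"offer id": "abc"}, "amount"))
print(get_item("abc", 1))
```

This is why passing the string "amount" works when the elements are dictionaries, even though an integer position is the usual argument for strings and lists.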

【Solution 2】:

If value is a dictionary, you should try to get the amount key (if it exists)

test_df['value'] = test_df['value'].apply(lambda x: x.get("amount") if "amount" in x.keys() else None)

Edit

If they are not all dicts, convert the value to a string and strip {"amount" : and }

test_df['value'] = test_df['value'].apply(lambda x: float(str(x).strip("{'amount' :").strip('}')) if "amount" in str(x) else None)
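
As an alternative to string stripping, a type check avoids calling dict methods on non-dict values such as float NaN. This is a sketch under assumptions: the helper name extract_amount, the new column name amount, and the sample rows are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def extract_amount(x):
    """Return the 'amount' entry for dict values, NaN for anything else."""
    if isinstance(x, dict):
        return x.get("amount", np.nan)
    return np.nan


# Hypothetical sample mixing dicts with non-dict values.
test_df = pd.DataFrame(
    {
        "value": [
            {"offer id": "9b98b8c7a33c4b65b9aebfe6a799e6d9"},
            {"amount": 4.05},
            np.nan,
            3.5,
        ]
    }
)

test_df["amount"] = test_df["value"].apply(extract_amount)
print(test_df["amount"])
```

Only the row holding {"amount": 4.05} keeps a number; every other row becomes NaN.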

【Discussion】:

SyntaxError: EOL while scanning string literal :(
@AnakinSkywalker My bad, I forgot the double quotes
Nope :( AttributeError: 'float' object has no attribute 'keys'
@AnakinSkywalker Not the best approach, but I think the edit should work
@AnakinSkywalker I switched from Java to Python too quickly and mixed up None with null; I've edited it