【Posted】:2020-03-20 11:40:28
【Question】:
df
    Food_Id     Month_yr  Qty  Sales
 0        1  November_18    5   1920
 1        2  November_18    6   2850
 2        2  November_18    8   3852
 3        1  November_18    6   1920
 4        2  November_18    7   2650
 5        1  November_18    2   3952
 6        1  November_18    3   1320
 7        2  November_18    8   2650
 8        1  November_18    9   3152
 9        1  December_18    5   1920
10        2  December_18    6   2150
11        2  December_18    8   3852
13        1  December_18    6   4920
14        2  December_18    6   3690
15        2  December_18    2   8952
16        1  December_18    7   7340
17        1  December_18    4   3650
18        2  December_18    9   8152
19        1   January_19    5   1920
20        2   January_19    6   8150
21        2   January_19    8   3852
22        1   January_19    1   3920
23        2   January_19    3   2690
24        2   January_19    2   8952
25        1   January_19    2   7340
26        1   January_19    4   5630
27        2   January_19    7   6152
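I have a large dataset of about 2 GB. For each Food_Id, I need to compare total sales from one month to the next. If the total sales for a given Food_Id differ by more than 1000 between a month and the month before it, that month should be flagged.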
Expected output
   Food_Id     Month_yr  Sales  diff_frm_lst_month            Flag
0        1  November_18  12264                Null            Null
1        2  November_18  12002                Null            Null
2        1  December_18  17830                5566  more than 1000
3        2  December_18  26796               14794  more than 1000
4        1   January_19  18810                 980  less than 1000
5        2   January_19  29796                3000  more than 1000
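For reference, one way the table above could be produced with pandas is sketched below. This is only a sketch under a few assumptions that are not stated in the question: the function name `monthly_sales_flags` and its `threshold` parameter are made up for illustration, `Month_yr` is assumed to always look like `November_18` (English month name, two-digit year), and a drop in sales of more than 1000 is flagged the same way as a rise.

```python
import pandas as pd


def monthly_sales_flags(df: pd.DataFrame, threshold: int = 1000) -> pd.DataFrame:
    """Hypothetical helper: total Sales per Food_Id and month, then compare
    each month's total with the previous month's total for the same Food_Id."""
    monthly = df.groupby(["Food_Id", "Month_yr"], as_index=False)["Sales"].sum()

    # Month_yr is text, so parse it into a date to sort months chronologically
    # (assumes English month names such as "November_18").
    monthly["_month"] = pd.to_datetime(monthly["Month_yr"], format="%B_%y")
    monthly = monthly.sort_values(["Food_Id", "_month"])

    # Within each Food_Id: this month's total minus the previous month's total.
    # The first month of each Food_Id has nothing to compare with and stays NaN.
    monthly["diff_frm_lst_month"] = monthly.groupby("Food_Id")["Sales"].diff()

    # Assumption: the absolute difference is what matters, so drops count too.
    has_prev = monthly["diff_frm_lst_month"].notna()
    big = monthly["diff_frm_lst_month"].abs() > threshold
    monthly["Flag"] = None
    monthly.loc[has_prev & big, "Flag"] = f"more than {threshold}"
    monthly.loc[has_prev & ~big, "Flag"] = f"less than {threshold}"

    # Present the result month by month, as in the expected output.
    return (
        monthly.sort_values(["_month", "Food_Id"])
        .drop(columns="_month")
        .reset_index(drop=True)
    )
```

Calling `monthly_sales_flags(df)` on the sample frame returns one row per Food_Id and month. A difference of exactly 1000 falls into the "less than 1000" bucket here, so adjust the comparison if that case should be treated differently.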
Since the dataset is large, please also mention how to handle data of this size.
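On the size question, a common approach is to avoid loading the whole file at once: read it in chunks, reduce each chunk to per-month totals, and combine the small partial results at the end. The sketch below assumes the data sits in a CSV file with the column names shown above, which the question does not actually say. The function name `monthly_totals_chunked`, the `path` argument, and the default `chunksize` are all hypothetical.

```python
import pandas as pd


def monthly_totals_chunked(path: str, chunksize: int = 1_000_000) -> pd.DataFrame:
    """Hypothetical helper: sum Sales per (Food_Id, Month_yr) without
    holding the whole CSV file in memory."""
    partials = []
    # usecols skips columns that are not needed (Qty), which also saves memory.
    reader = pd.read_csv(
        path,
        usecols=["Food_Id", "Month_yr", "Sales"],
        chunksize=chunksize,
    )
    for chunk in reader:
        # Each chunk shrinks to at most one row per (Food_Id, Month_yr) pair.
        partials.append(chunk.groupby(["Food_Id", "Month_yr"])["Sales"].sum())

    # The same pair can show up in several chunks, so sum the partial sums.
    totals = pd.concat(partials).groupby(level=["Food_Id", "Month_yr"]).sum()
    return totals.reset_index()
```

The result has one row per Food_Id and month, so it is tiny compared with the raw file, and the month-over-month difference and flag can then be computed on it in memory as usual.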
【Comments】:
- Have you tried anything? Please share it.
Tags: python python-3.x pandas python-2.7