【Posted】:2017-10-16 02:00:12
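【Question】:
I have a pandas dataframe named `adjusted`. I want to add a 'stage' column whose values depend on conditions computed per company ('conm') from 'fyear' and 'indadjsg'.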
fyear conm indadjsg
1 1999 1-800-FLOWERS.COM 26.646086
2 2000 1-800-FLOWERS.COM 22.727175
3 2001 1-800-FLOWERS.COM 7.312014
4 2002 1-800-FLOWERS.COM 4.948308
5 2003 1-800-FLOWERS.COM 6.278798
23 1996 ABERCROMBIE & FITCH -CL A 34.831691
24 1997 ABERCROMBIE & FITCH -CL A 48.053137
25 1998 ABERCROMBIE & FITCH -CL A 48.918326
26 1999 ABERCROMBIE & FITCH -CL A 46.956456
27 2000 ABERCROMBIE & FITCH -CL A 33.91436
28 2001 ABERCROMBIE & FITCH -CL A 67.23423
29 2002 ABERCROMBIE & FITCH -CL A 99.09342
11929 2006 CLIFTON BANCORP INC 0.236418
11930 2007 CLIFTON BANCORP INC -1.366626
11931 2008 CLIFTON BANCORP INC 8.564019
11932 2009 CLIFTON BANCORP INC -4.966110
11933 2010 CLIFTON BANCORP INC -4.359552
11934 2011 CLIFTON BANCORP INC -16.313852
11935 2012 CLIFTON BANCORP INC -18.193550
11936 2013 CLIFTON BANCORP INC -10.126603
11937 2014 CLIFTON BANCORP INC 4.718584
11938 2015 CLIFTON BANCORP INC -11.889065
11940 2015 CLIPPER REALTY INC 70.945767
11941 2016 CLIPPER REALTY INC 3.776001
11980 2014 CM FINANCE INC 205.894048
11981 2015 CM FINANCE INC 68.518555
121247 2009 VCA INC -5.552030
121248 2010 VCA INC -3.357275
121249 2011 VCA INC -0.930798
121250 2012 VCA INC 5.974914
121256 2007 VIASPACE INC -50.966869
121257 2008 VIASPACE INC 149.957403
121258 2009 VIASPACE INC 197.776855
121259 2010 VIASPACE INC -25.201733
121260 2011 VIASPACE INC 77.082624
121261 2012 VIASPACE INC 78.034233
121266 2005 YASHENG GROUP -3.728098
121267 2006 YASHENG GROUP -2.233927
121268 2007 YASHENG GROUP 0.349349
121279 2009 YUHE INTERNATIONAL INC 27.995324
121280 2010 YUHE INTERNATIONAL INC 34.375630
1) If a company has 5 or fewer fyear rows, I want to fill in 'start'.
# Count the fyear rows for each company.
byyr = adjusted.groupby(by=['conm'])['fyear']
dfbyyr = byyr.count().to_frame()
# Keep only the companies with 5 or fewer years.
start = dfbyyr[dfbyyr['fyear'] <= 5]
fyear
conm
1-800-FLOWERS.COM 5
ABERCROMBIE & FITCH -CL A 7
CLIFTON BANCORP INC 10
CLIPPER REALTY INC 2
CM FINANCE INC 2
VCA INC 4
VIASPACE INC 6
YASHENG GROUP 3
YUHE INTERNATIONAL INC 2
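The counts above have one row per company, so they still need to be mapped back onto the rows of `adjusted`. A minimal sketch of one way to skip that step, assuming the column names shown in the question: `transform("count")` returns a value for every original row. The small frame below is a subset of the sample data, and `n_years` and `is_start` are names made up for illustration.

```python
import pandas as pd

# Subset of the sample data from the question (two companies).
adjusted = pd.DataFrame({
    "fyear": [2014, 2015, 2007, 2008, 2009, 2010, 2011, 2012],
    "conm": ["CM FINANCE INC"] * 2 + ["VIASPACE INC"] * 6,
    "indadjsg": [205.894048, 68.518555, -50.966869, 149.957403,
                 197.776855, -25.201733, 77.082624, 78.034233],
})

# transform("count") broadcasts each company's row count back to its rows,
# so the result is aligned with adjusted's index and no merge is needed.
n_years = adjusted.groupby("conm")["fyear"].transform("count")

# Boolean mask: True for every row of a company with 5 or fewer years.
is_start = n_years <= 5
```

Here `is_start` is True for the two CM FINANCE INC rows and False for the six VIASPACE INC rows.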
2) After filling in 'start', I want to fill the remaining rows with another value. I computed the mean indadjsg for each company.
# Mean indadjsg for each company.
mask2 = adjusted.groupby(by=['conm'])['indadjsg']
countsg = mask2.mean().to_frame().reset_index()
# Drop companies whose mean is NaN.
c = countsg.dropna()
Dataframe `c`:
conm indadjsg
0 1-800-FLOWERS.COM 3.291539
1 ABERCROMBIE & FITCH -CL A 105.335324
2 CLIFTON BANCORP INC 22.920683
3 CLIPPER REALTY INC 36.784677
4 CM FINANCE INC 1.605919
5 VCA INC 3.116871
6 VIASPACE INC -106.153789
7 YASHENG GROUP -2.676296
8 YUHE INTERNATIONAL INC 12.306557
The conditions I want to apply are:
indadjsg < 0, 'decline'
0 <= indadjsg <= 15, 'revival'
15 < indadjsg <= 100, 'mature'
100 < indadjsg, 'growth'
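These four ranges can be written out directly with `numpy.select`, which takes the label of the first condition that is True. This is only a sketch: the frame below copies a few rows of `c` from the question, and the "unknown" fallback for values that match no condition (for example NaN) is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# A few per-company means copied from dataframe `c` in the question.
c = pd.DataFrame({
    "conm": ["1-800-FLOWERS.COM", "ABERCROMBIE & FITCH -CL A",
             "CLIFTON BANCORP INC", "VIASPACE INC"],
    "indadjsg": [3.291539, 105.335324, 22.920683, -106.153789],
})

m = c["indadjsg"]
# Boundaries are spelled out so that 0 and 15 fall in 'revival'
# and 100 falls in 'mature', as listed in the question.
conditions = [m < 0, (m >= 0) & (m <= 15), (m > 15) & (m <= 100), m > 100]
labels = ["decline", "revival", "mature", "growth"]

# "unknown" is a hypothetical fallback label for rows matching no condition.
c["stage"] = np.select(conditions, labels, default="unknown")
```

`pd.cut` would be a shorter alternative, but its intervals are closed on one side only, so it cannot put both 0 and 15 in 'revival' without adjusting the bin edges.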
The final dataframe I want looks like this:
fyear conm indadjsg stage
1 1999 1-800-FLOWERS.COM 26.646086 start
2 2000 1-800-FLOWERS.COM 22.727175 start
3 2001 1-800-FLOWERS.COM 7.312014 start
4 2002 1-800-FLOWERS.COM 4.948308 start
5 2003 1-800-FLOWERS.COM 6.278798 start
23 1996 ABERCROMBIE & FITCH -CL A 34.831691 growth
24 1997 ABERCROMBIE & FITCH -CL A 48.053137 growth
25 1998 ABERCROMBIE & FITCH -CL A 48.918326 growth
26 1999 ABERCROMBIE & FITCH -CL A 46.956456 growth
27 2000 ABERCROMBIE & FITCH -CL A 33.91436 growth
28 2001 ABERCROMBIE & FITCH -CL A 67.23423 growth
29 2002 ABERCROMBIE & FITCH -CL A 99.09342 growth
11929 2006 CLIFTON BANCORP INC 0.236418 mature
11930 2007 CLIFTON BANCORP INC -1.366626 mature
11931 2008 CLIFTON BANCORP INC 8.564019 mature
11932 2009 CLIFTON BANCORP INC -4.966110 mature
11933 2010 CLIFTON BANCORP INC -4.359552 mature
11934 2011 CLIFTON BANCORP INC -16.313852 mature
11935 2012 CLIFTON BANCORP INC -18.193550 mature
11936 2013 CLIFTON BANCORP INC -10.126603 mature
11937 2014 CLIFTON BANCORP INC 4.718584 mature
11938 2015 CLIFTON BANCORP INC -11.889065 mature
11940 2015 CLIPPER REALTY INC 70.945767 start
11941 2016 CLIPPER REALTY INC 3.776001 start
11980 2014 CM FINANCE INC 205.894048 start
11981 2015 CM FINANCE INC 68.518555 start
121247 2009 VCA INC -5.552030 start
121248 2010 VCA INC -3.357275 start
121249 2011 VCA INC -0.930798 start
121250 2012 VCA INC 5.974914 start
121256 2007 VIASPACE INC -50.966869 decline
121257 2008 VIASPACE INC 149.957403 decline
121258 2009 VIASPACE INC 197.776855 decline
121259 2010 VIASPACE INC -25.201733 decline
121260 2011 VIASPACE INC 77.082624 decline
121261 2012 VIASPACE INC 78.034233 decline
121266 2005 YASHENG GROUP -3.728098 start
121267 2006 YASHENG GROUP -2.233927 start
121268 2007 YASHENG GROUP 0.349349 start
121279 2009 YUHE INTERNATIONAL INC 27.995324 start
121280 2010 YUHE INTERNATIONAL INC 34.375630 start
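Is there a way to do this in one go? I can only think of building separate columns and merging them. Can you help me find a more efficient approach? Thanks in advance.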
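One possible single-pass approach, offered as a sketch rather than a definitive answer: compute both per-company statistics in one `groupby`, pick the label with `numpy.select` (with the 'start' test first so it takes priority), and map the result back by company name. The names `stats`, `n` and `mean_sg` and the "unknown" fallback are made up for illustration. The frame is a subset of the sample rows, so its means differ from those in `c`, which come from the full data.

```python
import numpy as np
import pandas as pd

# Subset of the sample data from the question (two companies).
adjusted = pd.DataFrame({
    "fyear": [2014, 2015, 2007, 2008, 2009, 2010, 2011, 2012],
    "conm": ["CM FINANCE INC"] * 2 + ["VIASPACE INC"] * 6,
    "indadjsg": [205.894048, 68.518555, -50.966869, 149.957403,
                 197.776855, -25.201733, 77.082624, 78.034233],
})

# One groupby gives the row count and the mean for each company.
stats = adjusted.groupby("conm").agg(
    n=("fyear", "count"),
    mean_sg=("indadjsg", "mean"),
)

# np.select uses the first True condition, so the tests cascade:
# 'start' wins over everything, then the mean ranges in ascending order.
conditions = [
    stats["n"] <= 5,
    stats["mean_sg"] < 0,
    stats["mean_sg"] <= 15,
    stats["mean_sg"] <= 100,
    stats["mean_sg"] > 100,
]
labels = ["start", "decline", "revival", "mature", "growth"]
stats["stage"] = np.select(conditions, labels, default="unknown")

# Look up each row's company in the per-company result.
adjusted["stage"] = adjusted["conm"].map(stats["stage"])
```

On this subset the CM FINANCE INC rows get 'start', and the VIASPACE INC rows get 'mature' because the mean of these six rows is about 71.1. With the full data, where that company's mean is negative, the same code would label them 'decline'.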
【Comments】:
Tags: python pandas dataframe group-by conditional-statements