【Posted】: 2021-05-19 18:31:47
【Problem description】:
Currently, I have a dataframe like this:
| index | domain | type | upstream | downstream | flag |
|---|---|---|---|---|---|
| 1 | bing | search engine | 1 | 0 | NaN |
| 2 | bbcnews | public broadcaster | 1 | 1 | centre |
| 3 | bbcnews | public broadcaster | 1 | 1 | centre |
| 4 | | social media | 1 | 0 | NaN |
| 5 | foxnews | commercial broadcaster | 1 | 1 | centre |
I would like to get a dataframe like this:
| index | domain | type | upst | downst | flag | refer_fb | refer_soc_med | ref_bing | refer_search_eng |
|---|---|---|---|---|---|---|---|---|---|
| 1 | bing | search engine | 1 | 0 | NaN | NaN | NaN | NaN | NaN |
| 2 | bbcnews | public broadcaster | 1 | 1 | centre | 0 | 0 | 1 | 1 |
| 3 | bbcnews | public broadcaster | 1 | 1 | centre | 0 | 0 | 1 | 1 |
| 4 | | social media | 1 | 0 | NaN | NaN | NaN | NaN | NaN |
| 5 | foxnews | commercial broadcaster | 1 | 1 | centre | 1 | 1 | 0 | 0 |
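What my script needs to do is:
Create new columns that classify each news row (always flagged as centre) according to the previous row, whenever that previous row satisfies upstream = 1 and downstream = 0. There are 6 news types (commercial broadcaster and public broadcaster are just examples). I want binary values in the new columns, as in the example above.
Importantly, if the row following a 'news' row is also 'news' (as shown by the 'centre' flag), it should receive the same classification as the preceding news row.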
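One possible way to express these two rules in pandas is sketched below. It is not a confirmed solution from the thread. It assumes that a "referrer" row is any row with upstream = 1 and downstream = 0, that a news row is any row with flag == "centre", and that the missing domain in row 4 is `facebook` (a placeholder, since the source table does not show it). The `REFERRERS` mapping and the `add_referrer_columns` helper are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The domain of row 4 is missing in the
# source table, so "facebook" is a placeholder assumption.
df = pd.DataFrame(
    {
        "domain": ["bing", "bbcnews", "bbcnews", "facebook", "foxnews"],
        "type": [
            "search engine",
            "public broadcaster",
            "public broadcaster",
            "social media",
            "commercial broadcaster",
        ],
        "upstream": [1, 1, 1, 1, 1],
        "downstream": [0, 1, 1, 0, 1],
        "flag": [np.nan, "centre", "centre", np.nan, "centre"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="index"),
)

# Hypothetical mapping: new column name -> (source column, value to match).
REFERRERS = {
    "refer_fb": ("domain", "facebook"),
    "refer_soc_med": ("type", "social media"),
    "ref_bing": ("domain", "bing"),
    "refer_search_eng": ("type", "search engine"),
}


def add_referrer_columns(df, referrers=REFERRERS):
    """Add one binary column per referrer, filled only on news rows."""
    out = df.copy()
    is_referrer = out["upstream"].eq(1) & out["downstream"].eq(0)
    is_news = out["flag"].eq("centre")
    for col, (source, value) in referrers.items():
        # 1.0/0.0 on referrer rows, NaN everywhere else.
        hit = out[source].eq(value).astype(float).where(is_referrer)
        # Carry the last referrer's value forward so consecutive news rows
        # share it, then blank out every row that is not news.
        out[col] = hit.ffill().where(is_news)
    return out


result = add_referrer_columns(df)
print(result)
```

Because the values are forward-filled, a news row inherits the classification of the most recent referrer row even when other rows sit in between. If that is too permissive for the real data, the fill would need to be restricted to runs of rows that directly follow a referrer.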
【Discussion】:
- Look at `shift()` on the columns, then apply your criteria.
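As a rough illustration of this suggestion, the sketch below uses a small toy frame with hypothetical values (not the asker's full data) to show what `DataFrame.shift(1)` exposes: each row gains access to the previous row's values, which can then be tested against the upstream/downstream condition.

```python
import pandas as pd

# Toy frame with hypothetical values, only to show what shift() exposes.
df = pd.DataFrame(
    {
        "domain": ["bing", "bbcnews", "foxnews"],
        "upstream": [1, 1, 1],
        "downstream": [0, 1, 1],
    }
)

# Each row of `prev` holds the values of the row above it (NaN for row 0).
prev = df.shift(1)
prev_is_referrer = prev["upstream"].eq(1) & prev["downstream"].eq(0)

# 1 where the previous row is a referrer whose domain is "bing", else 0.
df["ref_bing"] = (prev_is_referrer & prev["domain"].eq("bing")).astype(int)
print(df)
```

Note that `shift(1)` looks back exactly one row, so in this toy frame only `bbcnews` is marked. Consecutive news rows that should share a classification would need an extra step, such as forward-filling the result.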
Tags: python pandas dataframe data-wrangling