【Question title】: Check whether names are duplicated in the email column
【Posted】: 2020-09-07 19:39:46
【Question description】:

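I have a data frame like the one below. I want to check whether the name before the @ is duplicated and, if it is, mutate a new column holding 1 or 0 to stand for TRUE and FALSE.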

df <- data.frame(ID =c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                 city=c("del","mum","nav","pun","bang","chen","triv","vish","del","mum","bang","vish","bhop","kol","noi","gurg"),
                 email = c("akash.dev@gmail.com","rahul.singh@gmail.com","salman.abbas@gmail.com","ram.lal@gmail.com","ram.lal@gmail.com","prabal.garg@gmail.com","sanu.ali@gmail.com","kunal.singh@gmail.com","lakhan.tomar@gmail.com","praveen.thakur@gmail.com","sarman.ali@gmail.com","zuber.khan@gmail.com","giriraj.singh@gmail.com","lokesh.sharma@gmail.com","pooja.pawar@gmail.com","nikita.sharma@gmail.com"),
                 name= c("dev,akash","singh,rahul","abbas,salman","lal,ram","singh,nkunj","garg,prabal","ali,sanu","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","pawar,pooja","sharma,nikita"))

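I also have an old data frame with the same structure. I want to check whether each email ID exists in the old data frame and, if it does, whether the rest of the record (name, city, ID) is identical.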
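The comparison described here could be sketched roughly as follows. The two small data frames `cur` and `old` are hypothetical stand-ins (the real `old` was never posted), and the sketch assumes each email appears at most once in `old`, because `match()` returns only the first hit.

```r
# Hypothetical stand-ins for the current and the old data frame
cur <- data.frame(
  ID    = c("DEV2962", "KTN2252", "ANA2719"),
  city  = c("del", "mum", "nav"),
  email = c("akash.dev@gmail.com", "rahul.singh@gmail.com", "salman.abbas@gmail.com"),
  name  = c("dev,akash", "singh,rahul", "abbas,salman")
)
old <- data.frame(
  ID    = c("DEV2962", "KTN2252"),
  city  = c("del", "pun"),  # city differs for the second record
  email = c("akash.dev@gmail.com", "rahul.singh@gmail.com"),
  name  = c("dev,akash", "singh,rahul")
)

# Row position of each email in old (NA when the email is absent)
idx <- match(cur$email, old$email)
cur$in_old <- as.numeric(!is.na(idx))

# TRUE only when ID, city and name all agree with the matched old row
same <- cur$ID == old$ID[idx] &
  cur$city == old$city[idx] &
  cur$name == old$name[idx]

# Emails missing from old produce NA above; count them as "not the same"
cur$same_record <- as.numeric(!is.na(same) & same)

cur
```

Here `in_old` answers "does the email exist in the old data?" and `same_record` answers "is the whole record unchanged?", so a row with `in_old == 1` and `same_record == 0` is one whose details were modified.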

I tried using string_detect, but it did not work.

The output should look like this

【Comments】:

  • Can you post old in dput format? Please edit the question with the output of dput(old) or, if that is too large, the output of dput(head(old, 20)).

Tags: r


【Solution 1】:

This should solve the first part of the question:

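library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_extract()

df %>%
  mutate(first = str_extract(email, "[^@]+"),        # text before the "@"
         duplicate = as.numeric(duplicated(first)))  # 1 for a repeat, 0 otherwise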

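The first line of the mutate() call extracts everything up to the @, and the second flags repeated values of first. Note that duplicated() marks only the second and later occurrences, so the first appearance of a name still gets 0.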
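If every row of a repeated name should get a 1, including its first appearance, one possible variation is to combine a forward and a backward scan. This is a base R sketch on a hypothetical four-email subset of the question's data, not part of the original answer; `dup_later` and `dup_any` are illustrative names.

```r
# Hypothetical subset of the emails from the question
email <- c("akash.dev@gmail.com", "ram.lal@gmail.com",
           "ram.lal@gmail.com", "sanu.ali@gmail.com")

# Drop the "@" and everything after it, keeping only the part before it
first <- sub("@.*$", "", email)

# duplicated() alone marks only the second and later occurrences
dup_later <- as.numeric(duplicated(first))

# OR-ing with a scan from the end marks every member of a repeated group
dup_any <- as.numeric(duplicated(first) | duplicated(first, fromLast = TRUE))

data.frame(email, first, dup_later, dup_any)
```

With this input, `dup_later` flags only the second `ram.lal` row, while `dup_any` flags both of them.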

【Comments】:

  • Upvoted, but I would end the pipe with select(-first), since the helper column is not part of the expected output.
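The suggestion in this comment might look like the sketch below. It assumes dplyr and stringr are installed and uses a hypothetical three-row stand-in, `df_small`, instead of the full data frame from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row stand-in for the question's df
df_small <- data.frame(
  ID    = c("ITI2624", "DEV2698", "HRT2921"),
  email = c("ram.lal@gmail.com", "ram.lal@gmail.com", "prabal.garg@gmail.com")
)

result <- df_small %>%
  mutate(first = str_extract(email, "[^@]+"),        # text before the "@"
         duplicate = as.numeric(duplicated(first))) %>%
  select(-first)                                     # drop the helper column

result
```

The helper column exists only inside the pipe, so `result` keeps the original columns plus `duplicate`.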