【Question title】: populate column in r based on regex in another column
【Posted】: 2018-08-06 09:20:26
【Question description】:

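I am doing some data wrangling on wine reviews in R and cannot find an elegant way to do what I want.
My goal is to look at the title column of each wine review, which usually contains the wine's vintage year, and put that year into a separate column. Kernel: https://www.kaggle.com/kieroneil/data-wrangling-wine-reviews-in-r

Here is code that does what I want, but I am hoping someone can show me a better way: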

# Create the year columns and assign an arbitrary value.
library(tidyverse)
wine_04$year <- 1900
year_2000 <- unlist(str_detect(wine_04$title, "2000"))
year_2001 <- unlist(str_detect(wine_04$title, "2001"))
year_2002 <- unlist(str_detect(wine_04$title, "2002"))
year_2003 <- unlist(str_detect(wine_04$title, "2003"))
year_2004 <- unlist(str_detect(wine_04$title, "2004"))
year_2005 <- unlist(str_detect(wine_04$title, "2005"))
year_2006 <- unlist(str_detect(wine_04$title, "2006"))
year_2007 <- unlist(str_detect(wine_04$title, "2007"))
year_2008 <- unlist(str_detect(wine_04$title, "2008"))
year_2009 <- unlist(str_detect(wine_04$title, "2009"))
year_2010 <- unlist(str_detect(wine_04$title, "2010"))
year_2011 <- unlist(str_detect(wine_04$title, "2011"))
year_2012 <- unlist(str_detect(wine_04$title, "2012"))
year_2013 <- unlist(str_detect(wine_04$title, "2013"))
year_2014 <- unlist(str_detect(wine_04$title, "2014"))
year_2015 <- unlist(str_detect(wine_04$title, "2015"))
year_2016 <- unlist(str_detect(wine_04$title, "2016"))
year_2017 <- unlist(str_detect(wine_04$title, "2017"))

wine_04[year_2000 == TRUE, 15] <- 2000
wine_04[year_2001 == TRUE, 15] <- 2001
wine_04[year_2002 == TRUE, 15] <- 2002
wine_04[year_2003 == TRUE, 15] <- 2003
wine_04[year_2004 == TRUE, 15] <- 2004
wine_04[year_2005 == TRUE, 15] <- 2005
wine_04[year_2006 == TRUE, 15] <- 2006
wine_04[year_2007 == TRUE, 15] <- 2007
wine_04[year_2008 == TRUE, 15] <- 2008
wine_04[year_2009 == TRUE, 15] <- 2009
wine_04[year_2010 == TRUE, 15] <- 2010
wine_04[year_2011 == TRUE, 15] <- 2011
wine_04[year_2012 == TRUE, 15] <- 2012
wine_04[year_2013 == TRUE, 15] <- 2013
wine_04[year_2014 == TRUE, 15] <- 2014
wine_04[year_2015 == TRUE, 15] <- 2015
wine_04[year_2016 == TRUE, 15] <- 2016
wine_04[year_2017 == TRUE, 15] <- 2017

Thanks for your help.

【Comments】:

  • You want wine_04 to have a column named year that contains the wine's vintage? Would this work? wine_04$year <- sub('.*(\\d{4}).*', '\\1', wine_04$title)
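
For anyone trying the sub() one-liner from the comment above, here is a small sketch of how it behaves. The titles are made up for illustration and are not taken from the dataset. Two things are worth knowing: the greedy leading .* means the last four-digit run in the title is the one captured, and a title with no four-digit run is returned unchanged rather than as NA. The regexpr()/regmatches() variant is one possible base R alternative, not something proposed in the thread; it picks the first four-digit run and leaves non-matches as NA.

```r
# Made-up titles for illustration; the second has no four-digit run.
titles <- c("Example Estate 2011 Red (Douro)",
            "Example Cellars NV Brut (Sicilia)")

# The one-liner from the comment. A title without a match comes back
# unchanged, so the "year" for the second title is the whole title.
year_sub <- sub('.*(\\d{4}).*', '\\1', titles)

# Base R alternative (an assumption, not from the thread): start with NA
# and fill in only the titles where a four-digit run was found.
m <- regexpr("[0-9]{4}", titles)
year_na <- rep(NA_character_, length(titles))
year_na[m != -1] <- regmatches(titles, m)
```

Both results are character vectors, so wrap them in as.integer() if a numeric year column is wanted.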

Tags: r regex tidyverse stringr


【Solution 1】:

This works.

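library(data.table)  # needed for data.table() and the := operator
library(stringr)
df <- data.table(text = c('the wine is from 1898','the wine is since 2008'))
# Extract the first run of four digits from each string into a new column.
df[, year := str_extract(string = text, pattern = '\\d{4}')]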

                     text year
1:  the wine is from 1898 1898
2: the wine is since 2008 2008
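
Since the question uses the tidyverse rather than data.table, the same idea can be sketched with dplyr::mutate(). This is only a sketch under assumptions: the data frame below is a made-up stand-in for wine_04, and the pattern restricts matches to standalone years from 1900 to 2099, which is my assumption about what counts as a vintage. Titles without such a year get NA instead of the placeholder 1900 used in the question.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for wine_04; only the title column matters here.
wine_04 <- data.frame(
  title = c("Example Estate 2011 Red (Douro)",
            "Example Cellars NV Brut (Sicilia)",
            "Example Vineyards 2013 Pinot Gris (Willamette Valley)"),
  stringsAsFactors = FALSE
)

# One vectorised call replaces the per-year str_detect() blocks.
# \\b keeps the match from landing inside a longer digit run.
wine_04 <- wine_04 %>%
  mutate(year = as.integer(str_extract(title, "\\b(19|20)\\d{2}\\b")))
```

Adding the column by name also avoids the hard-coded column index 15 from the question, which would silently write to the wrong column if the data frame's layout changed.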

【Discussion】:

  • Works perfectly. Thanks, Manish.