【Question title】: R - match recoding advice
【Posted】: 2015-10-07 16:32:55
【Question description】:

I'm struggling to do something that looks simple.

I have a table of codes and what each code should be recoded to.

> head(codesTv)

  X5000 TV.Diary.Event
1  5001           Play
2  5002   Drama Series
3  5003    Other Drama
4  5004           Film
5  5005      Pop Music
6  5006         Comedy

Then I have a vector called ttest that needs recoding:

> head(as.data.frame(ttest))
                ttest
1        SPITTING IMA
2                5999
3        KRAMERVSKRAM
4                NEWS
5           BROOKSIDE
6             NOTHING

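All I need is to recode, using codesTv, the values of ttest that are codes, and leave everything else untouched.

But the only way I've found to do it is this cumbersome line of code: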

# recode only the elements of ttest that appear as codes in codesTv$X5000
ttest[ttest %in% codesTv$X5000] <- codesTv$TV.Diary.Event[match(ttest[ttest %in% codesTv$X5000], codesTv$X5000)]
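To make clear what that one-liner does, here is a sketch that splits it into its three steps. It assumes a small excerpt of `codesTv` and `ttest` (a few rows taken from the data below) so that it runs on its own; the intermediate names `hit`, `idx` and `recoded` are illustrative only.

```r
# Small excerpt of the question's data (assumption: just enough rows to show the idea)
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)
ttest <- c("SPITTING IMA", "5999", "NEWS", "5004", "5021", "YANKS")

# Step 1: which elements of ttest are codes that appear in the lookup table
hit <- ttest %in% codesTv$X5000

# Step 2: for those elements only, the row of codesTv holding the matching code
idx <- match(ttest[hit], codesTv$X5000)

# Step 3: overwrite just those elements with their labels; everything else is untouched
recoded <- ttest
recoded[hit] <- codesTv$TV.Diary.Event[idx]
print(recoded)
```

Because `match()` is only called on `ttest[hit]`, it never returns `NA` here, so no missing-value handling is needed.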

Does anyone have a simpler idea?

Data

ttest = c("SPITTING IMA", "5999", "KRAMERVSKRAM", "NEWS", "BROOKSIDE", 
"NOTHING", "NOTHING", "BROOKSIDE", "5004", "5004", "5999", "YANKS", 
"5999", "5999", "5999", "5999", "\"V\"", "GET FRESH", "5999", 
"5999", "HEIDI", "FAME", "SAT  SHOW", "5021", "BLUE PETER", "V", 
"EASTENDERS", "WORLD  CUP", "GRANDSTAND", "SPORT", "WORLD CUP", 
"BLUE PETER", "WORLD CUP", "HORIZON", "REGGIEPERRIN", "5999", 
"BROOKSIDE", "HNKYTNK MAN", "5999", "5999")

 codesTv = structure(list(X5000 = c("5001", "5002", "5003", "5004", "5005", 
"5006", "5007", "5008", "5009", "5010", "5011", "5012", "5013", 
"5014", "5015", "5016", "5017", "5019", "5020", "5021", "5022", 
"5023", "5888", "5999"), TV.Diary.Event = c("Play", "Drama Series", 
"Other Drama", "Film", "Pop Music", "Comedy", "Chat Show", "Quiz/Panel Game", 
"Cartoon", "Special L/E Event", "Classical Music", "Contemporary Music", 
"Arts", "News", "Politics", "Consumer Affairs", "Spec Current Affairs", 
"Documentary", "Religious Affairs", "Sport", "Childrens TV", 
"Party Political", "Continuation Event", "Non-event (Missing)"
)), .Names = c("X5000", "TV.Diary.Event"), row.names = c(NA, 
-24L), class = "data.frame")

【Question discussion】:

  • @PierreLafortune There are three overlapping values: intersect(ttest, codesTv$X5000)
  • Hmm, then how do I use that?
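One way to use the overlap, sketched here on a small assumed excerpt of the data: turn `codesTv` into a named vector (code as name, label as value) and look the overlapping elements up by name. The names `overlap`, `lookup` and `hit` are illustrative, not from the thread.

```r
# Small excerpt of the question's data (assumption)
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)
ttest <- c("5999", "NEWS", "5004", "5999", "V")

# The codes that occur in both vectors, in order of first appearance in ttest
overlap <- intersect(ttest, codesTv$X5000)

# A named vector turns the two-column table into a code -> label lookup
lookup <- setNames(codesTv$TV.Diary.Event, codesTv$X5000)

# Elements of ttest whose value is one of the overlapping codes
hit <- ttest %in% overlap

# Index the lookup by name; unname() drops the names that indexing carries along
ttest[hit] <- unname(lookup[ttest[hit]])
print(ttest)
```

Here `overlap` is `c("5999", "5004")`, and the two `"5999"` entries and the `"5004"` entry become their labels while `"NEWS"` and `"V"` are kept.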

Tags: r match recode


【Solution 1】:

The OP's solution should work fine. Here's another way:

library(data.table)

# confirm that there is overlap
intersect(ttest, codesTv$X5000) # "5999" "5004" "5021"  

# replace values in ttest
setDT(list(X5000=ttest))[codesTv, X5000 := i.TV.Diary.Event, on="X5000"]

# confirm that the values were overwritten
intersect(ttest, codesTv$X5000) # character(0)

Stole this idea from @eddi. This should be memory-efficient, since we modify ttest by reference instead of copying it.

【Discussion】:
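For comparison, a base-R sketch without data.table (again on a small assumed excerpt of the data): call `match()` once over the whole vector and replace only the positions where it found a code. This uses R's usual copy-on-modify assignment rather than data.table's by-reference update, and the names `idx` and `found` are illustrative.

```r
# Small excerpt of the question's data (assumption)
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)
ttest <- c("SPITTING IMA", "5999", "NEWS", "5004", "5021", "YANKS")

# One call to match() over the whole vector: row of codesTv for each element, NA if none
idx <- match(ttest, codesTv$X5000)
found <- !is.na(idx)

# Replace only the elements that found a code; the NA positions keep their original value
ttest[found] <- codesTv$TV.Diary.Event[idx[found]]
print(ttest)
```

Compared with the OP's line, this avoids computing `ttest %in% codesTv$X5000` twice, since `!is.na(idx)` already says which elements matched.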

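  • Nice solution - just one question though: does my cumbersome way work?
  • @giacomoV I think so. It looks fine, and the intersection is empty afterwards too.
  • By the way, @Frank - I need to cite your help on a few things. What do you think is the best way to do that?
  • @giacomoV I suspect citing online resources like SO isn't required by most journals' and graduate schools' style guides. Personally, I just put citation comments in the code, with the username and a link to the post. SO doesn't seem to have a built-in citation tool, but you can see what one looks like on math.SE: meta.stackexchange.com/questions/49760/…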
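The check mentioned in the comments (an empty intersection after recoding) can be combined with a comparison against a second method. The sketch below assumes a small excerpt of the data, applies the OP's one-liner to one copy and a named-vector lookup to another, and compares them; `a`, `b` and `lookup` are illustrative names.

```r
# Small excerpt of the question's data (assumption)
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)
ttest <- c("SPITTING IMA", "5999", "NEWS", "5004", "5021", "5999")

# The OP's one-liner, applied to a copy
a <- ttest
a[a %in% codesTv$X5000] <- codesTv$TV.Diary.Event[match(a[a %in% codesTv$X5000], codesTv$X5000)]

# A named-vector lookup, applied to another copy
lookup <- setNames(codesTv$TV.Diary.Event, codesTv$X5000)
b <- ttest
b[b %in% names(lookup)] <- unname(lookup[b[b %in% names(lookup)]])

# No codes left over after recoding, and both methods agree
print(intersect(a, codesTv$X5000))
print(identical(a, b))
```

The empty-intersection check alone only shows that no codes remain; comparing against an independent method also confirms each code received the right label.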