[Posted]: 2020-09-22 13:33:21
[Problem description]:
Question:
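First of all, I'm just getting started. I'm proud of my code, but coming back to it and trying to use it on different variables has made me realize how inefficient and non-reproducible it is. In particular, step 3) has a manual component: I exclude the columns (downpour, precipitation, rainwater) by hand, which is not very reproducible. Can anyone advise? (Believe it or not, it used to look even worse.)
Code: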
# 1) filter for dictionaries containing 1,000 noun counts or more
f1_raincount <- raincount %>% filter(total_ncount >= 1000)
# 2) filter for dictionaries which contain 3 or more tokens from our set of rain-related tokens
f2_raincount <- f1_raincount
# compute rain-set count: number of non-zero rain tokens per dictionary (row);
# na.rm = TRUE makes sum() ignore missing values
f2_raincount$set_count <- f2_raincount %>% select(cloud:thunderstorm) %>% apply(1, function(x) sum(x != 0, na.rm = TRUE))
f2_raincount <- f2_raincount %>% filter(set_count >= 3)
# 3) Select for rain-related noun tokens with frequencies greater than 10 across dictionaries
# First, compute dictionary counts: number of dictionaries (rows) in which each token is non-zero
f3_raincount <- f2_raincount
f3_dict_long <- f3_raincount %>% select(cloud:thunderstorm) %>% apply(2, function(x) sum(x != 0))
# Second, exclude those under 10: downpour, precipitation, rainwater
f3_raincount <- f3_raincount %>% select(-c(downpour, precipitation, rainwater))
# 4) given exclusion #3, compute rain set count and filter again
f4_raincount <- f3_raincount
f4_raincount$set_count2 <- f4_raincount %>% select(cloud:thunderstorm) %>% apply(1, function(x) sum(x != 0))
f4_raincount <- f4_raincount %>% filter(set_count2 >= 3) %>%
  select(id:dictsize) # select final rain-set
[Discussion]:
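One possible way to remove the manual part of step 3) is to compute the rare tokens from the data rather than typing their names, and to wrap all four steps in a function so they can be reused on another token set. The sketch below is only an illustration under stated assumptions: `filter_token_set()`, its arguments, and the toy `raincount` data (including the `storm` column) are hypothetical and not from the question. It also assumes that "under 10" means a token is dropped when it appears in fewer than `min_dicts` dictionaries, and the toy call uses `min_dicts = 2` only because the example data is tiny.

```r
library(dplyr)

# Hypothetical toy data standing in for the real `raincount`.
raincount <- data.frame(
  id           = 1:5,
  total_ncount = c(1500, 2000, 800, 1200, 3000),
  cloud        = c(3, 0, 1, 2, 1),
  downpour     = c(0, 0, 1, 0, 1),
  rain         = c(5, 2, 1, 4, 1),
  storm        = c(1, 1, 1, 1, 0),
  thunderstorm = c(2, 1, 1, 0, 0),
  dictsize     = c(10, 20, 30, 40, 50)
)

# Hypothetical helper: runs steps 1) to 4) for any set of token columns.
filter_token_set <- function(df, token_cols, min_nouns = 1000,
                             min_tokens = 3, min_dicts = 10) {
  # 1) keep dictionaries with at least `min_nouns` nouns
  df <- filter(df, total_ncount >= min_nouns)

  # 2) keep dictionaries containing at least `min_tokens` distinct tokens
  df <- df[rowSums(df[token_cols] != 0, na.rm = TRUE) >= min_tokens, ]

  # 3) find tokens present in fewer than `min_dicts` dictionaries and drop
  #    them by computed name instead of by hand
  dict_counts <- colSums(df[token_cols] != 0, na.rm = TRUE)
  rare <- names(dict_counts)[dict_counts < min_dicts]
  kept <- setdiff(token_cols, rare)
  df <- df[setdiff(names(df), rare)]

  # 4) recount over the remaining tokens and filter again
  df[rowSums(df[kept] != 0, na.rm = TRUE) >= min_tokens, ]
}

# Column names covered by the cloud:thunderstorm range used in the question.
token_cols <- names(select(raincount, cloud:thunderstorm))

result <- filter_token_set(raincount, token_cols, min_dicts = 2)
print(result)
```

With the toy data, `downpour` is the only token that falls below the threshold, so it is removed automatically, and the dictionary with `id` 5 is then dropped in step 4) because only two of its remaining tokens are non-zero. Passing a different `token_cols` vector would reuse the same steps on another set of variables.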
Tags: r dplyr apply text-mining