Posted: 2018-05-16 16:25:21
Question:
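There is a lovely piece of code in TidyText Mining, Section 3.3, that I am trying to reproduce on my own dataset. With my data, however, I cannot get ggplot to "remember" that I want the data sorted in descending order, or to keep a specific top_n.
I can run the code from TidyText Mining and get the same chart that is shown in the book. But when I run it on my own dataset, the facet panels do not show the top_n (they seem to show a random number of categories), and the data within each facet is not sorted in descending order.
I can reproduce the problem with some random text data and the full code, but I can also reproduce it with mtcars, which really confuses me.
I expect the chart below to show mpg in descending order in each facet and to give me only the top 1 category per facet. It doesn't work for me.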
```r
library(tidyverse)

mtcars %>%
  arrange(desc(mpg)) %>%
  # fix the factor levels of gear in descending-mpg order
  mutate(gear = factor(gear, levels = rev(unique(gear)))) %>%
  group_by(am) %>%
  # note: with no wt argument, top_n() ranks by the last column (carb), not mpg
  top_n(1) %>%
  ungroup() %>%
  ggplot(aes(gear, mpg, fill = am)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "mpg") +
  facet_wrap(~am, ncol = 2, scales = "free") +
  coord_flip()
```
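One thing worth checking in the mtcars code: when `top_n()` is called without a `wt` argument, dplyr ranks by the last column of the data, which for `mtcars` is `carb` rather than `mpg`, and it keeps every row tied for the top value. The sketch below (assuming only dplyr is installed; the names `by_default` and `by_mpg` are purely illustrative) compares the two calls without any plotting.

```r
library(dplyr)

# No wt: top_n() ranks by the LAST column of mtcars, i.e. carb,
# and keeps every car tied on the highest carb value in each am group.
by_default <- mtcars %>%
  group_by(am) %>%
  top_n(1) %>%        # dplyr reports which column it picked (carb)
  ungroup()

# Naming the ranking column explicitly ranks by mpg instead.
by_mpg <- mtcars %>%
  group_by(am) %>%
  top_n(1, mpg) %>%
  ungroup()

nrow(by_default)  # more than 2: several am == 0 cars share the top carb value
nrow(by_mpg)      # 2: the single highest-mpg car in each am group
```

With `wt` named, each facet gets exactly one bar here because neither group has a tie on its maximum `mpg`.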
What I actually want is a chart sorted like the one in the TidyText book (example data only):
```r
library(tidyverse)
library(tidytext)

# one row per film, holding its opening crawl as text
starwars <- tibble(film = c("ANH", "ESB", "ROJ"),
                   text = c("It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire. During the battle, Rebel spies managed to steal secret plans to the Empire's ultimate weapon, the DEATH STAR, an armored space station with enough power to destroy an entire planet. Pursued by the Empire's sinister agents, Princess Leia races home aboard her starship, custodian of the stolen plans that can save her people and restore freedom to the galaxy.....",
                            "It is a dark time for the Rebellion. Although the Death Star has been destroyed, Imperial troops have driven the Rebel forces from their hidden base and pursued them across the galaxy. Evading the dreaded Imperial Starfleet, a group of freedom fighters led by Luke Skywalker has established a new secret base on the remote ice world of Hoth. The evil lord Darth Vader, obsessed with finding young Skywalker, has dispatched thousands of remote probes into the far reaches of space....",
                            "Luke Skywalker has returned to his home planet of Tatooine in an attempt to rescue his friend Han Solo from the clutches of the vile gangster Jabba the Hutt. Little does Luke know that the GALACTIC EMPIRE has secretly begun construction on a new armored space station even more powerful than the first dreaded Death Star. When completed, this ultimate weapon will spell certain doom for the small band of rebels struggling to restore freedom to the galaxy...")) %>%
  unnest_tokens(word, text) %>%
  mutate(film = as.factor(film)) %>%
  count(film, word, sort = TRUE) %>%
  ungroup()

# total number of words per film
total_wars <- starwars %>%
  group_by(film) %>%
  summarize(total = sum(n))

starwars <- left_join(starwars, total_wars, by = "film")

# adds the tf, idf and tf_idf columns
starwars <- starwars %>%
  bind_tf_idf(word, film, n)

starwars %>%
  arrange(desc(tf_idf)) %>%
  mutate(word = factor(word, levels = rev(unique(word)))) %>%
  group_by(film) %>%
  top_n(10) %>%   # no wt given: ranks by the last column, which is tf_idf here
  ungroup() %>%
  ggplot(aes(word, tf_idf, fill = film)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~film, ncol = 2, scales = "free") +
  coord_flip()
```
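A sketch of a commonly used pattern for ordering bars separately inside each facet, shown on `mtcars` (top 3 cars by `mpg` per `am` value) so that it runs on its own. It assumes dplyr >= 1.0.0 for `slice_max()` (which supersedes `top_n()`) and a tidytext version recent enough to include `reorder_within()` and `scale_x_reordered()`. `with_ties = FALSE` keeps exactly `n` bars per facet even when values tie; in `mtcars`, Honda Civic and Lotus Europa both have 30.4 mpg, so `top_n(3, mpg)` would return 4 rows for `am == 1`. Adapting it to the `starwars` tf-idf table should only require swapping in `word`, `tf_idf` and `film`, though that is untested here.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

plot_data <- tibble::rownames_to_column(mtcars, var = "car") %>%
  mutate(am = factor(am)) %>%
  group_by(am) %>%
  # rank explicitly by mpg and drop ties: exactly 3 rows per am group
  slice_max(mpg, n = 3, with_ties = FALSE) %>%
  ungroup() %>%
  # pastes "___<am>" onto each label and orders the levels by mpg,
  # so a category could sit at a different position in each facet
  mutate(car = reorder_within(car, mpg, am))

p <- ggplot(plot_data, aes(car, mpg, fill = am)) +
  geom_col(show.legend = FALSE) +
  scale_x_reordered() +   # strips the "___<am>" suffix from the axis labels
  labs(x = NULL, y = "mpg") +
  facet_wrap(~am, ncol = 2, scales = "free") +
  coord_flip()             # highest mpg ends up at the top of each panel

p
```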
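Discussion:
- What do you expect from the first few lines of the `mtcars` code? If you group by `am` and take the highest `mpg`, you get a 2-row data frame, because `am` has only 2 values. Is that your intention?
- Hi Camillle 14, yes, that is the intention. The data frame should be (and is) sorted by mpg however many rows you ask for, but that ordering doesn't seem to carry through to ggplot in any of my datasets (although it works for the larger example data in the TidyText book).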
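The last comment touches the underlying behaviour: ggplot2 positions a discrete axis by the factor levels of the column, not by the row order produced by `arrange()`. This base-R sketch (no packages needed; the toy data frame `df` is made up purely for illustration) shows both halves of that: levels ignore row order, and a single level vector is shared by every facet, so a category that appears in two facets cannot take a different position in each.

```r
# Character vectors become factors with alphabetically sorted levels,
# regardless of the order of the rows.
x <- c("c", "a", "b")
levels(factor(x))                      # "a" "b" "c"

# Fixing the levels in row order (what the book's levels = rev(unique(word)) does)
levels(factor(x, levels = unique(x)))  # "c" "a" "b"

# Hypothetical two-facet data: "x" is larger in facet B, "y" is larger in facet A.
df <- data.frame(facet = c("A", "A", "B", "B"),
                 cat   = c("x", "y", "x", "y"),
                 value = c(1, 2, 2, 1))
df <- df[order(-df$value), ]                            # sort by value, descending
df$cat <- factor(df$cat, levels = rev(unique(df$cat)))  # one global level order

# Only one ordering exists for the whole column, so it is right for
# facet A (y above x after coord_flip) but wrong for facet B.
levels(df$cat)                         # "x" "y"
```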