[Posted]: 2020-10-02 15:28:13
[Question]:
I have a medium-sized dataset of 113K rows × 14 columns:
Month District Age Gender Education Disability Religion Occupation JobSeekers
1 2020-01 Dan U17 Male None None Jewish Unprofessional workers 2
2 2020-01 Dan U17 Male None None Muslims Sales and costumer service 1
3 2020-01 Dan U17 Female None None Other Undefined 1
4 2020-01 Dan 18-24 Male None None Jewish Production and construction 1
5 2020-01 Dan 18-24 Male None None Jewish Academic degree 1
6 2020-01 Dan 18-24 Male None None Jewish Practical engineers and technicians 1
GMI ACU NACU NewSeekers NewFiredSeekers
1 0 0 2 0 0
2 0 0 1 0 0
3 0 0 1 0 0
4 0 0 1 0 0
5 0 0 1 0 0
6 0 0 1 1 1
I grouped it into a smaller table that contains only the data I need to work with:
Sorta <- datac %>%
  group_by(District, Month, Gender, Occupation) %>%
  summarise(JobSeekers = sum(JobSeekers))
Result:
District Month Gender Occupation JobSeekers GMI ACU NACU NewSeekers NewFiredSeekers
<chr> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int>
1 Dan 2020-01 Female Academic degree 4560 120 2622 1818 863 597
2 Dan 2020-01 Female Agriculture, forestry and fi~ 14 9 2 3 1 0
3 Dan 2020-01 Female Machine Operators and drivers 57 6 10 41 9 6
4 Dan 2020-01 Female Managers 1913 36 969 908 390 310
5 Dan 2020-01 Female Officials and clerks 1702 120 263 1319 344 243
6 Dan 2020-01 Female Practical engineers and tech~ 2847 66 1125 1656 671 504
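Then I tried to plot data from this table that should show trends, such as the number of unemployed by district, a timeline showing how unemployment grows over time, and so on. Every time I try, I get various errors about the character columns. So I am asking for your help with plotting character and numeric values together.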
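For reference, a minimal sketch of the kind of trend plot described above, assuming the errors come from `Month` being a character column. The toy values and the second district name ("North") are made up for illustration; only the column names come from the table. The idea is to ungroup after summarising and turn "2020-01" into a real `Date` by appending a day before parsing.

```r
library(dplyr)
library(ggplot2)

# Toy data with the same column types as the summarised table.
# Values and the "North" district are hypothetical.
toy <- tibble(
  District   = rep(c("Dan", "North"), each = 3),
  Month      = rep(c("2020-01", "2020-02", "2020-03"), times = 2),
  JobSeekers = c(100L, 150L, 400L, 80L, 90L, 300L)
)

trend <- toy %>%
  group_by(District, Month) %>%
  summarise(JobSeekers = sum(JobSeekers), .groups = "drop") %>%
  # "2020-01" is only a string; append a day so as.Date() can parse it.
  mutate(MonthDate = as.Date(paste0(Month, "-01")))

# One line per district, with a proper date axis.
p_trend <- ggplot(trend, aes(x = MonthDate, y = JobSeekers, colour = District)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
  labs(x = "Month", y = "JobSeekers", colour = "District")

p_trend
```

The character columns (`District`, `Occupation`, `Gender`) can stay as they are; ggplot2 treats them as discrete variables when they are mapped to `colour`, `fill`, or a discrete axis.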
The structure is as follows:
structure(
list(
District = c(
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan",
"Dan"
),
Month = c(
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01"
),
Gender = c(
"Female",
"Female",
"Female",
"Female",
"Female",
"Female",
"Female",
"Female",
"Female",
"Female",
"Male",
"Male",
"Male",
"Male",
"Male",
"Male",
"Male",
"Male",
"Male",
"Male"
),
Occupation = c(
"Academic degree",
"Agriculture, forestry and fishing",
"Machine Operators and drivers",
"Managers",
"Officials and clerks",
"Practical engineers and technicians",
"Production and construction",
"Sales and costumer service",
"Undefined",
"Unprofessional workers",
"Academic degree",
"Agriculture, forestry and fishing",
"Machine Operators and drivers",
"Managers",
"Officials and clerks",
"Practical engineers and technicians",
"Production and construction",
"Sales and costumer service",
"Undefined",
"Unprofessional workers"
),
JobSeekers = c(
4560L,
14L,
57L,
1913L,
1702L,
2847L,
480L,
3086L,
893L,
1985L,
2605L,
44L,
1276L,
2236L,
247L,
2249L,
1258L,
2233L,
924L,
2462L
),
GMI = c(
120L,
9L,
6L,
36L,
120L,
66L,
47L,
396L,
155L,
998L,
119L,
26L,
240L,
101L,
30L,
111L,
322L,
359L,
309L,
1124L
),
ACU = c(
2622L,
2L,
10L,
969L,
263L,
1125L,
99L,
392L,
259L,
52L,
1549L,
1L,
49L,
797L,
44L,
829L,
102L,
202L,
124L,
58L
),
NACU = c(
1818L,
3L,
41L,
908L,
1319L,
1656L,
334L,
2298L,
479L,
935L,
937L,
17L,
987L,
1338L,
173L,
1309L,
834L,
1672L,
491L,
1280L
),
NewSeekers = c(
863L,
1L,
9L,
390L,
344L,
671L,
83L,
622L,
201L,
325L,
550L,
5L,
239L,
469L,
53L,
525L,
233L,
432L,
212L,
324L
),
NewFiredSeekers = c(
597L,
0L,
6L,
310L,
243L,
504L,
60L,
375L,
123L,
150L,
447L,
4L,
196L,
405L,
41L,
429L,
162L,
316L,
124L,
190L
)
),
row.names = c(NA,-20L),
class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
groups = structure(
list(
District = c("Dan", "Dan"),
Month = c("2020-01", "2020-01"),
Gender = c("Female", "Male"),
.rows = list(1:10, 11:20)
),
row.names = c(NA,-2L),
class = c("tbl_df", "tbl", "data.frame"),
.drop = TRUE
)
)
My second question is how to make a map of "hot spot" areas by unemployed people / occupation / age.
Please help!
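The table contains no district boundaries, so a true geographic map cannot be drawn from it alone. A map-free approximation of the "hot spot" idea is a tile heat map of District × Occupation, sketched below under that assumption. The "Dan" values are taken from the female rows above; the "North" district and its values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy data: "Dan" values come from the table above, "North" is made up.
toy <- tibble(
  District   = rep(c("Dan", "North"), each = 3),
  Occupation = rep(c("Managers", "Undefined", "Academic degree"), times = 2),
  JobSeekers = c(1913L, 893L, 4560L, 700L, 1200L, 300L)
)

# One row per District / Occupation cell.
heat <- toy %>%
  group_by(District, Occupation) %>%
  summarise(JobSeekers = sum(JobSeekers), .groups = "drop")

# Darker tiles mark combinations with more job seekers.
p_heat <- ggplot(heat, aes(x = District, y = Occupation, fill = JobSeekers)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  labs(x = "District", y = "Occupation", fill = "JobSeekers")

p_heat
```

If district polygons are available from another source, the same aggregated table could be joined to them and drawn with `geom_sf()` instead; that part is not shown here because the geometry is not part of the question.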
Update:
library(dplyr)
library(ggplot2)
library(scales)  # provides wrap_format()

# Sum every count column per District / Gender / Occupation / Month.
dist.oc.mo <- Cdata %>%
  group_by(District, Gender, Occupation, Month) %>%
  summarise(JobSeekers = sum(JobSeekers), GMI = sum(GMI), ACU = sum(ACU),
            NACU = sum(NACU), NewSeekers = sum(NewSeekers),
            NewFiredSeekers = sum(NewFiredSeekers))

# Dodged bar chart of job seekers per occupation, coloured by district.
p <- ggplot(data = dist.oc.mo) +
  geom_bar(mapping = aes(x = Occupation, y = JobSeekers, fill = factor(District)),
           stat = "identity", position = "dodge", alpha = 0.7) +
  labs(title = "March-April Jobseekers",
       subtitle = "This barchart describes unemployment trend for March and April sorted by jobseekers number and occupation type",
       fill = "District", x = "Occupation", y = "JobSeekers") +
  scale_x_discrete(labels = wrap_format(10)) +
  scale_fill_brewer(palette = "Set1") +
  theme(legend.position = "bottom")
p
[https://i.stack.imgur.com/v0R0V.jpg][1]
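One possible issue with the chart above, as far as I can tell: the summarised table still has one row per Gender and Month, so each Occupation/District pair maps to several rows, and with `position = "dodge"` those bars are overplotted rather than added up. A minimal sketch of collapsing to one row per bar first, using four rows from the table above as toy data (the name `totals` is only illustrative):

```r
library(dplyr)
library(ggplot2)

# Four rows taken from the table above (Female and Male, two occupations).
toy <- tibble(
  District   = "Dan",
  Gender     = rep(c("Female", "Male"), each = 2),
  Occupation = rep(c("Managers", "Undefined"), times = 2),
  JobSeekers = c(1913L, 893L, 2236L, 924L)
)

# Collapse to exactly one row per bar before plotting.
totals <- toy %>%
  group_by(District, Occupation) %>%
  summarise(JobSeekers = sum(JobSeekers), .groups = "drop")

# geom_col() is the same as geom_bar(stat = "identity").
p_totals <- ggplot(totals, aes(x = Occupation, y = JobSeekers, fill = District)) +
  geom_col(position = "dodge", alpha = 0.7) +
  labs(x = "Occupation", y = "JobSeekers", fill = "District")

p_totals
```

Here the "Managers" bar shows 1913 + 2236 = 4149 job seekers instead of two overlapping bars.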
Regards, Moshe
[Comments]:
-
It would be helpful if you could provide a reprex to reproduce your problem. Please see the link: stackoverflow.com/questions/5963269/…
-
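Hi @YBS, thanks for your attention. I pasted rows 1:20 from the table; by the way, in case it makes any difference, it shows up as a tibble. I hope you can help me, thanks again!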
-
What type of plot are you looking for? Is a bar chart enough, or do you have something specific in mind?
-
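I need a bar chart and a "hot spot map" example showing the districts with a large unemployed population. Thank you very much for your help!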
Tags: r database ggplot2 plot statistics