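【Posted】: 2021-01-14 19:49:48
【Question】:
I have a dataset in which each row is an individual crime. One column is the LSOA code and another is the place name. Naturally, there are multiple rows with the same name and LSOA code. I want to end up with a data frame in which each area name appears once, alongside its corresponding LSOA code. I've been going round in circles trying to find a way with subsetting, counts, frequencies and so on, but I either lose one of the columns or it simply doesn't work.
Here is a sample of the dataset.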
Code      | Name           | Crime_Type | Outcome
----------------------------------------------------------------------
E01000852 | Camden 026C    | Vehicle    | Under investigation
E01000982 | Croydon 017C   | Other      | Unable to prosecute
E01000982 | Croydon 017C   | Other      | Under investigation
E01003950 | Southwark 032B | Assault    | Status update unavailable
E01003950 | Southwark 032B | Violence   | Under investigation
E01003950 | Southwark 032B | Other      | Under investigation
This is the output I want:
Code      | Name
--------------------------
E01000852 | Camden 026C
E01000982 | Croydon 017C
E01003950 | Southwark 032B
I tried the following, but it loses the name column.
name <- as.data.frame(table(data$Code))
Any help is appreciated.
dput(head(data, 10))
structure(list(code = c("E01000013", "E01000852", "E01000982",
"E01000982", "E01000996", "E01001227", "E01001591", "E01001751",
"E01002848", "E01003171"), name = c("Barking and Dagenham 013A",
"Camden 026C", "Croydon 017C", "Croydon 017C", "Croydon 009C",
"Ealing 019D", "Greenwich 012C", "Hackney 021D", "Kensington and Chelsea 015C",
"Lambeth 020B"), crime_type = c("Public order", "Vehicle crime",
"Other crime", "Violence and sexual offences", "Violence and sexual offences",
"Violence and sexual offences", "Violence and sexual offences",
"Violence and sexual offences", "Other crime", "Violence and sexual offences"
), outcome_category = c("Unable to prosecute suspect", "Further investigation is not in the public interest",
"Under investigation", "Under investigation", "Under investigation",
"Status update unavailable", "Status update unavailable", "Under investigation",
"Under investigation", "Unable to prosecute suspect"), outcome_recode = c("0",
"1", NA, NA, NA, NA, NA, NA, NA, "0"), density = c(8927, 16348,
11760, 11760, 11302, 8537, 10382, 11269, 17929, 16309), population = c(1855,
2037, 1610, 1610, 1189, 1476, 2095, 1732, 1472, 1701), IMD_value = c(2,
6, 5, 5, 5, 8, 3, 5, 3, 4), urban_rural_class = c("Urban major conurbation",
"Urban major conurbation", "Urban major conurbation", "Urban major conurbation",
"Urban major conurbation", "Urban major conurbation", "Urban major conurbation",
"Urban major conurbation", "Urban major conurbation", "Urban major conurbation"
)), row.names = c(NA, 10L), class = "data.frame")
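For reference, `table(data$Code)` tabulates a single vector, so the result can only carry the codes and their counts. That is why the name column disappears. A minimal base R sketch of one way to keep both columns follows. It assumes the lowercase column names `code` and `name` from the `dput` output, and it uses a small stand-in data frame rather than the full dataset; `lookup` is a name chosen here for illustration.

```r
# Small stand-in for the data frame in the question
# (column names follow the dput output; rows are abbreviated).
data <- data.frame(
  code = c("E01000852", "E01000982", "E01000982", "E01003950", "E01003950"),
  name = c("Camden 026C", "Croydon 017C", "Croydon 017C",
           "Southwark 032B", "Southwark 032B"),
  crime_type = c("Vehicle crime", "Other crime", "Other crime",
                 "Violence and sexual offences", "Other crime"),
  stringsAsFactors = FALSE
)

# Keep only the two columns of interest, then drop repeated code/name pairs.
lookup <- unique(data[, c("code", "name")])

# unique() keeps the original row names; reset them for a clean 1..n index.
rownames(lookup) <- NULL

print(lookup)
```

On this sample the result has one row per code/name pair, three rows in total. Because `unique()` compares whole rows, an area that appeared under two different codes would be kept twice, which may or may not be what is wanted.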
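【Comments】:
-
Do you only want the unique names? What happens to the rest of the information?
-
I don't need the other two columns, only the first two.
-
Could you provide your data via dput(head(df, n))? Have you tried unique(df$code)?
-
It seems to differ from the example, but try df %>% filter(!duplicated(name)) %>% select(1:2) with dplyr.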
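The dplyr suggestion in the last comment can be sketched as below, next to `distinct()` as a possible alternative. This assumes dplyr is installed and that `code` and `name` are the first two columns, as in the `dput` output. The data frame is a small stand-in, and `by_name` and `by_pair` are names chosen here for illustration.

```r
library(dplyr)  # assumption: dplyr is installed

# Small stand-in for the question's data frame (code and name come first).
df <- data.frame(
  code = c("E01000852", "E01000982", "E01000982", "E01003950", "E01003950"),
  name = c("Camden 026C", "Croydon 017C", "Croydon 017C",
           "Southwark 032B", "Southwark 032B"),
  crime_type = c("Vehicle crime", "Other crime", "Other crime",
                 "Violence and sexual offences", "Other crime"),
  stringsAsFactors = FALSE
)

# Approach from the comment: keep the first row seen for each name,
# then keep the first two columns by position.
by_name <- df %>%
  filter(!duplicated(name)) %>%
  select(1:2)

# Alternative: deduplicate on the code/name pair.
# distinct() returns only the listed columns unless .keep_all = TRUE.
by_pair <- df %>%
  distinct(code, name)

print(by_pair)
```

The two versions agree on this sample. They would differ if one name were paired with more than one code: `filter(!duplicated(name))` keeps only the first code for that name, while `distinct(code, name)` keeps every distinct pair.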
Tags: r