Add a column to a dataframe that references corresponding numbers/characters

【Question title】: Add a column to a dataframe that references corresponding numbers/characters
【Posted】: 2023-03-16 23:54:02
【Question】:

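I'm hoping one of you can help me. I have tried a lot of different approaches but can't seem to find the right answer. I'm fairly new to R, but I have been writing a script to format some data I have. Eventually I would like to run this script weekly as new data comes in.

I have a list of breed codes (1 - 80), many of which (but not all) have a corresponding 3-character country code (e.g. GBR or NLD). What I would like to do is create a new column in my data that contains the country code corresponding to the breed code.

One problem I have is that not all of the numbers (1 - 80) have a corresponding country code, so I can't create a vector containing them, as they are not all the same type.

If there is no associated country code, I would like the country code to be the number of the breed code. For example, breed code 6 has no associated country, so I would like "6" to populate the relevant field in the new sire_country column.

In case it helps, I have added the script I have been trying to use, to no avail!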

#denoting country codes for breed codes 1-80
breed_country <- c(
  "GBR", "GBR", "GBR", "GBR", "GBR", "6", "GBR", "8", "9", "10",
  "11", "GBR", "NZL", "GBR", "GBR", "16", "DNK", "18", "19", "GBR",
  "21", "GBR", "23", "24", "25", "26", "CHE", "28", "29", "30",
  "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
  "41", "42", "CZE", "44", "45", "IRL", "AUS", "POL", "DEU", "50",
  "51", "SWE", "DEU", "ESP", "55", "56", "57", "58", "SWE", "DEU",
  "DNK", "NZL", "NLD", "CAN", "USA", "66", "67", "68", "USA", "70",
  "FRA", "ITA", "FIN", "JEY", "GGY", "76", "NOR", "78", "79", "80"
)

breed_id<-c("Sire.Breed")

sire_country<-breed_country[breed_id]

sire_country[is.na("Sire.ID")]<-""


#the output looks like
    sire_country
 [1] NA


#when I add sire_country to my data frame, I get


sire_country
1                       <NA>
2                       <NA>
3                       <NA>
4                       <NA>  
5                       <NA>
6                       <NA> 
7                       <NA>
8                       <NA>
9                       <NA>
10                      <NA>
11                      <NA>
12                      <NA>
13                      <NA>
14                      <NA>
15                      <NA>

# "Sire.Breed" is a column containing numerical breed codes in the data frame: df
# sire_country is what I want the new column with the country codes in to be called
# if there is no "Sire.ID" present, I want the field to remain blank - I have used this function elsewhere and it works fine

My data is read in from a .csv file. Unfortunately I can't post it, as it is confidential. A made-up example would be:

animal  name    breed   Mother  Father  ID              Company DOB
1       Alice   2       Vera    Tom     123456789012    Heinz   12/05/2017
2       Kate    63      Lucy    Jack    123456987147    Google  03/06/2017

(I couldn't format the table any better, sorry)

I would then like the country code associated with the breed (2 or 63 in this case) added at the end, like so:

animal  name    breed   Mother  Father  ID              Company DOB   Country
1       Alice   2       Vera    Tom     123456789012    Heinz   12/05/2017   GBR
2       Kate    63      Lucy    Jack    123456987147    Google  03/06/2017   NLD

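Apologies if I have used the wrong terminology throughout, I am still learning! Any help you can give me would be greatly appreciated.

Thanks!

【Comments】:

Could you please provide your data as well, rather than just the country codes?
Start with a data structure that actually has columns, i.e. data.frame(code = 1:80, country = breed_country). Could you give us a (manually created) visual representation of the expected output? 80 rows is overkill; 10 would be enough to illustrate the problem.
I'm having trouble understanding the difference between the breed column and the new column you want to create. It would help if your example showed the different scenarios for the breed column and the desired column (e.g. which kinds of breed values map to which kinds of values in the result column).

Tags: r function dataframe reference data-management

【Solution 1】:

You should study the different ways of indexing vectors, matrices and data frames, for example http://www.cookbook-r.com/Basics/Indexing_into_a_data_structure/

As an exercise, look at the output of the following: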

breed_country[2]
breed_country[c(2, 65, 10, 80)]

Notice that the order of the elements of breed_country corresponds to the breed codes 1:80, so you can simply index breed_country by breed code, as in the exercise.
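The positional layout can also be built programmatically rather than typed out by hand. The following is only a sketch: it starts from the breed codes themselves as character fallbacks and overwrites the positions that have a country. The `known` vector is a hypothetical helper that repeats just four of the question's mappings, not the full list.

```r
# Start with every breed code as its own (character) fallback value,
# so position i holds "i" unless a country is known.
breed_country <- as.character(1:80)

# Hypothetical partial mapping: names are breed codes, values are countries.
known <- c("2" = "GBR", "13" = "NZL", "63" = "NLD", "65" = "USA")

# Overwrite the positions that have a known country.
breed_country[as.integer(names(known))] <- unname(known)

breed_country[2]                 # "GBR"
breed_country[c(2, 65, 10, 80)]  # "GBR" "USA" "10" "80"
```

Because everything is stored as character, the "not the same type" concern from the question does not arise: the numeric fallbacks are simply strings such as "10".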

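Now use df$breed, the column of your data frame that holds the breed codes, to index your breed_country vector.

As you can see, df$breed gives you a vector of the breed codes in the order they appear in the data frame: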

df$breed # breed codes of df
breed_country[df$breed] # index breed_country by breed codes in df
df$Country <- breed_country[df$breed] # assign to new column "Country"
head(df) # print first 6 rows of df
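To make this concrete, here is a minimal sketch on the fictional two-row example from the question. The data frame holds only the relevant columns, and the lookup vector is a shortened stand-in for the real 80-element breed_country (only codes 2 and 63 are filled in). The `as.integer()` call is a precaution I am assuming may be needed if the CSV column was read as character; it is not part of the original answer.

```r
# Toy version of the question's example data (relevant columns only).
df <- data.frame(
  animal = 1:2,
  name   = c("Alice", "Kate"),
  breed  = c(2L, 63L),
  stringsAsFactors = FALSE
)

# Stand-in lookup: position i holds the country for breed code i,
# or the code itself when no country is known.
breed_country <- as.character(1:80)
breed_country[c(2, 63)] <- c("GBR", "NLD")

# Index by position; as.integer() guards against a character column,
# which would otherwise be treated as a lookup by name.
df$Country <- breed_country[as.integer(df$breed)]

print(df)
```

With these inputs, the Country column comes out as "GBR" for Alice and "NLD" for Kate, matching the desired output in the question.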

This is where you went wrong:

breed_id<-c("Sire.Breed")
breed_country[breed_id]

This is equivalent to:

breed_country["Sire.Breed"]

But none of the elements of breed_country has the name "Sire.Breed", so your output sire_country is NA.
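A small sketch of the difference between indexing by position and indexing by name may help here. The vectors `unnamed` and `named` are made up for illustration and are not taken from the question.

```r
# An unnamed character vector: a character index finds no matching name.
unnamed <- c("GBR", "GBR", "NZL")
unnamed["Sire.Breed"]   # NA
unnamed[3]              # "NZL" (lookup by position)

# A named vector: a character index is matched against names(named).
named <- c("2" = "GBR", "63" = "NLD")
named["63"]             # "NLD", returned with its name "63"
named["Sire.Breed"]     # NA, still no element with that name
```

So a character index is always a lookup by name, which is why passing the column's name instead of the column's values returns a single NA.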

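You then go on to use is.na("Sire.ID"), which asks whether the character string "Sire.ID" is NA. It is not, so the output is FALSE. You should step through your code and look at the output of each call.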
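As a hedged sketch of what the blanking step could look like when it tests the column rather than the column's name: the three-row data frame below is invented, the lookup vector is a shortened stand-in, and it assumes missing sire IDs arrive as NA (if the CSV stores them as empty strings, the condition would need to be adjusted).

```r
# Invented example data; the second row has no sire ID.
df <- data.frame(
  Sire.ID    = c("A1", NA, "C3"),
  Sire.Breed = c(2L, 63L, 6L),
  stringsAsFactors = FALSE
)

# Stand-in lookup vector (only codes 2 and 63 carry a country here).
breed_country <- as.character(1:80)
breed_country[c(2, 63)] <- c("GBR", "NLD")

is.na("Sire.ID")     # FALSE: a literal string is never NA
is.na(df$Sire.ID)    # FALSE TRUE FALSE: one result per row

# Look up the country per row, then blank the rows without a sire ID.
df$sire_country <- breed_country[df$Sire.Breed]
df$sire_country[is.na(df$Sire.ID)] <- ""

df$sire_country      # "GBR" "" "6"
```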

【Comments】: