R: If a substring of 8 characters in colA is equal to a substring of 8 characters in colB, add the value of colA to a new colC

【Question Title】: R: If a substring of 8 characters in colA is equal to a substring of 8 characters in colB, add the value of colA to a new colC
【Posted】: 2018-11-29 22:56:50
【Question】:

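In R, I need to compare the first 8 characters of one column, colA (Longitude.x), with the first 8 characters of a second column, colB (X.x). If the 8 characters are the same, I want to write the value of colA (Longitude.x) to a new colC (XCoord). In other words, if colA contains a longitude value of -122.23538 and colB contains an X value of -122.235873, I want colC to take colA's value of -122.23538, because the first 8 characters (-122.235) match.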

colA (Longitude.x) and colB (X.x) were both of type double when first read into R, so I converted them to character with the following code:

schools_merge$Longitude.x[] <- lapply(schools_merge$Longitude.x[], as.character)
schools_merge$X.x[] <- lapply(schools_merge$X.x[], as.character)

The class and type of both colA and colB became "list".

I tried the following code to write the new colC (XCoord):

schools_merge$XCoord <- if(substr(schools_merge$X.x,1,8) == substr(schools_merge$Longitude.x,1,8)) "yes" else "no"

When this code runs, it returns a warning:

Warning message:
In if (substr(schools_merge$X.x, 1, 8) == substr(schools_merge$Longitude.x,  
: the condition has length > 1 and only the first element will be used

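This is not the desired result. For example, the second element of each list should produce "yes" in colC (XCoord), because characters 1-8 of -122.23538 are equal to characters 1-8 of -122.235873.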

head(schools_merge$XCoord)
head(schools_merge$Longitude.x)
head(schools_merge$X.x)

> head(schools_merge$XCoord)
[1] "no" "no" "no" "no" "no" "no"
> head(schools_merge$Longitude.x)
[[1]]
[1] "-120.76288"

[[2]]
[1] "-122.23538"

[[3]]
[1] "-122.19604"

[[4]]
[1] "-122.09222"

[[5]]
[1] "-121.77057"

[[6]]
[1] "-122.21629"

> head(schools_merge$X.x)
[[1]]
[1] "-120.763628"

[[2]]
[1] "-122.235873"

[[3]]
[1] "-122.197942"

[[4]]
[1] "-122.092998"

[[5]]
[1] "-121.770702"

[[6]]
[1] "-122.216899"

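The possibilities I can think of are: 1) the number of characters I am assuming (counting the '-' and '.' as well as all the digits) is incorrect, but I have tried several different character counts for the comparison and still get the same result, with head() showing either all "yes" or all "no"; or 2) I may need to convert the columns to vectors rather than to characters. Any help is greatly appreciated!

Thank you, Anna

In response to the comments below, here is a link to a subset of the data and the script: https://sfsu.box.com/s/043n3mxrj4i4mwaefykugjc16yr8mchp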

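【Comments】:

Can you post a reproducible dataset? That would make it much easier for us to help!
In your code you have substr(schools_merge$X.x,1,7) instead of substr(schools_merge$X.x,1,8)
I will figure out how to post the dataset. Thanks for catching the 7 vs. 8 characters. In my code I switched between 7 and 8 to see whether it made any difference to the result, but it did not. I think the core of the problem has more to do with the warning message "the condition has length > 1 and only the first element will be used".
@Giovana Stein I just added a link to a Box folder containing a subset of the data and the script (note that the names have been replaced with "filter"). Any additional help is greatly appreciated. Thanks again.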

Tags: r if-statement substring

【Solution 1】:

Maybe you can try the following code:

if(substr(schools_merge$X.x,1,8) == substr(schools_merge$Longitude.x,1,8)){
schools_merge$XCoord = "yes"}else{
schools_merge$XCoord = "no"}
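
Note that if() expects a single TRUE or FALSE, whereas comparing two whole columns with == produces one logical value per row. That mismatch is what the warning in the question describes, and in R 4.2.0 and later it is an error rather than a warning. A minimal sketch of an element-wise alternative using ifelse() follows. It assumes a small stand-in data frame built from the first three values printed in the question, and plain character columns rather than list columns:

```r
# Stand-in for schools_merge, using the first three rows printed in the question.
schools_merge <- data.frame(
  Longitude.x = c("-120.76288", "-122.23538", "-122.19604"),
  X.x = c("-120.763628", "-122.235873", "-122.197942"),
  stringsAsFactors = FALSE
)

# Compare the first 8 characters row by row; the result has one value per row.
same_prefix <- substr(schools_merge$X.x, 1, 8) == substr(schools_merge$Longitude.x, 1, 8)

# ifelse() is vectorized, so every row gets its own "yes" or "no".
schools_merge$XCoord <- ifelse(same_prefix, "yes", "no")
print(schools_merge$XCoord)
```

Only the second row shares the prefix -122.235, so it should be the only row marked "yes".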

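【Comments】:

I tried this code as well, but I get the same warning message: Warning message: In if (substr(schools_merge$X.x, 1, 8) == substr(schools_merge$Longitude.x, : the condition has length > 1 and only the first element will be used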
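
The question ultimately asks for the value of Longitude.x itself to be written to XCoord, and its columns were turned into lists by lapply(). A possible sketch covering both points is below. The stand-in data frame and its id column are hypothetical, and using NA for rows that do not match is an assumption, since the question does not say what those rows should contain:

```r
# Stand-in for schools_merge with list columns, mimicking the lapply() conversion.
# The id column is hypothetical and only there to give the data frame three rows.
schools_merge <- data.frame(id = 1:3)
schools_merge$Longitude.x <- list("-120.76288", "-122.23538", "-122.19604")
schools_merge$X.x <- list("-120.763628", "-122.235873", "-122.197942")

# Flatten the list columns back into plain character vectors.
lon <- unlist(schools_merge$Longitude.x)
x <- unlist(schools_merge$X.x)

# Keep the longitude where the first 8 characters agree; NA elsewhere (assumed).
schools_merge$XCoord <- ifelse(substr(lon, 1, 8) == substr(x, 1, 8), lon, NA_character_)
print(schools_merge$XCoord)
```

Converting the original double columns with as.character() directly, instead of through lapply(), would keep them as character vectors and make the unlist() step unnecessary.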