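【Posted】: 2022-01-19 14:42:36
【Question】:
I have a data frame in which one column is an id, and some of its values got messed up while the data was being recorded.
Here is a sample of the data: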
dput(df)
structure(list(Id = c("'110171786'", "'1103fbfd5'", "'0700edf6dc'",
"'1103fad09'", "'01103fc9bb'", "''", "''", "0000fba2b'", "'01103fb169'",
"'01103fd723'", "'01103f9c34'", "''", "''", "''", "'01103fc088'",
"'01103fa6d8'", "'01103fb374'", "'01103fce8c'", "'01103f955d'",
"'011016e633'", "'01103fa0da'", "''", "''", "''", "'01103fa4bd'",
"'01103fb5c4'", "'01103fd0d7'", "'01103f9e2e'", "'01103fc657'",
"'01103fd4d1'", "'011016e78e'", "'01103fbda2'", "'01103fbae7'",
"'011016ee23'", "'01103fc847'", "'01103fbfbb'", "''", "'01103fb8bb'",
"'01103fc853'", "''", "'01103fbcd5'", "'011016e690'", "'01103fb253'",
"'01103fcb19'", "'01103fb446'", "'01103fa4fa'", "'011016cfbd'",
"'01103fd250'", "'01103fac7d'", "'011016a86e'"), Weight = c(11.5,
11.3, 11.3, 10.6, 10.6, 8.9, 18.7, 10.9, 11.3, 18.9, 18.9, 8.6,
8.8, 8.4, 11, 10.4, 10.4, 10.8, 11.2, 11, 10.3, 9.5, 8.1, 9.3,
10.2, 10.5, 11.2, 21.9, 18, 17.8, 11.3, 11.5, 10.8, 10.5, 12.8,
10.9, 8.9, 10.3, 10.8, 8.9, 10.9, 9.9, 19, 11.6, 11.3, 11.7,
10.9, 12.1, 11.3, 10.6)), class = "data.frame", row.names = c(NA,
-50L))
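What I want to do is search the id column and fix the following errors:
- Some values are missing a zero at the front. All of these now start with 1, which makes them easy to find. So basically anything with a character length of 9 that starts with 1 needs a 0 added as the first character.
- Some values are shorter than 10 characters and need to be removed.
- Some have multiple leading 0s and need to be removed.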
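One way to read the three rules above, as a minimal base R sketch rather than a definitive answer. The helper name `clean_ids` is hypothetical. It assumes that the embedded single quotes are not part of the ID and do not count toward its length, that "removed" means the value becomes `NA`, and that an ID with multiple leading zeros is invalidated as a whole rather than trimmed.

```r
# Hypothetical helper (not from the original post): applies the three rules
# to a character vector of ids and returns the cleaned vector.
clean_ids <- function(id) {
  # Assumption: the embedded single quotes are not part of the id.
  id <- gsub("'", "", id, fixed = TRUE)

  # Rule 1: length 9 and starting with "1" -> prepend the missing "0".
  needs_zero <- nchar(id) == 9 & startsWith(id, "1")
  id[needs_zero] <- paste0("0", id[needs_zero])

  # Rules 2 and 3 (assumed meaning of "removed"): anything still shorter
  # than 10 characters, or starting with more than one "0", becomes NA.
  invalid <- nchar(id) < 10 | startsWith(id, "00")
  id[invalid] <- NA_character_

  id
}

# A few values taken from the sample data above.
sample_ids <- c("'110171786'", "'1103fbfd5'", "'0700edf6dc'",
                "''", "0000fba2b'", "'01103fb169'")
cleaned <- clean_ids(sample_ids)
print(cleaned)
```

On the full data this would be used as `df$Id <- clean_ids(df$Id)`. If the intent is to drop the affected rows instead of keeping `NA` placeholders, `df <- df[!is.na(df$Id), ]` afterwards would do that.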
【Comments】:
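- Why are your IDs double-quoted, e.g. "'110171786'" instead of "110171786"? Just curious.
- I think it was originally just to stop Excel from treating them as numbers and dropping the zeros (which didn't work). Some IDs also have an "E" in the middle, which Excel turned into scientific notation. A relic of an old database system.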