[Posted]: 2020-04-23 15:28:00
[Question]:
I am trying to split my data into new columns on the | character. For example, I have an observation like this:
fdic : Federal Deposit Insurance Corp | unbco : United Bancorp Inc Ohio
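I would like to split this into two columns on |. However, some observations have no delimiter and some have more than two, so I could not get separate from tidyr to work. I have the following line: as.data.frame(do.call(rbind, strsplit(xx$CO, "\\|"))). It almost does what I want, but it repeats values when it splits rows that have fewer pieces than the longest row.
That is, for the first observation: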
evgnl : Evogene Limited | monsan : Monsanto Company
Columns 1 and 2 are split correctly, but the value from column 1 is repeated in column 3:
evgnl : Evogene Limited monsan : Monsanto Company evgnl : Evogene Limited
I would like these observations to have NA values instead:
evgnl : Evogene Limited monsan : Monsanto Company NA
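The repetition described above can be reproduced with two of the posted strings. This is a minimal sketch, assuming the cause is that rbind() recycles shorter vectors to the width of the longest one; the names co, parts and recycled are made up for illustration.

```r
# Two of the posted values: one with two pieces, one with three.
co <- c(
  "evgnl : Evogene Limited | monsan : Monsanto Company",
  "jpmsi : JP Morgan Securities LLC | rganus : Reinsurance Group of America Inc | cnyc : JPMorgan Chase & Co."
)

# strsplit() returns a list whose elements have different lengths.
parts <- strsplit(co, "\\|")
piece_counts <- lengths(parts)

# rbind() fills the short row by recycling its values (and warns about it),
# so the first piece shows up again in the third column.
recycled <- suppressWarnings(do.call(rbind, parts))

print(piece_counts)
print(recycled[1, ])
```

Because the row lengths differ, the pieces would need to be padded to a common length before binding if NA is wanted in place of the repeated value.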
Data:
structure(list(grp = c("10163", "8518", "2533", "6604", "7984",
"10689", "1911", "8092", "3091", "10878", "2193", "102", "214",
"4486", "8789", "8352", "10769", "10366", "6406", "8634"), WC = c(" 2,685 words ",
" 632 words ", " 139 words ", " 359 words ", " 3,610 words ",
" 448 words ", " 185 words ", " 2,321 words ", " 192 words ",
" 830 words ", " 803 words ", " 4,697 words ", " 4,649 words ",
" 748 words ", " 1,029 words ", " 3,125 words ", " 44 words ",
" 3,212 words ", " 1,150 words ", " 774 words "), CO = c(" evgnl : Evogene Limited | monsan : Monsanto Company ",
" codvbc : Codorus Valley Bancorp Inc ", " blycon : Blyth Inc ",
" icfcns : ICF International Inc. ", " fossil : Fossil Group Inc ",
" jpmsi : JP Morgan Securities LLC | rganus : Reinsurance Group of America Inc | cnyc : JPMorgan Chase & Co. ",
" usxmar : US Steel Corp ", "NULL", " toro : The Toro Company ",
" casms : CAS Medical Systems Inc ", " fdic : Federal Deposit Insurance Corp | unbco : United Bancorp Inc Ohio ",
" crane : Crane Co ", " pplres : PPL Corp ", " unnatf : United Natural Foods Inc ",
" intgxc : IntelGenx Technologies Corp. ", " gordmi : Gordmans Stores, Inc. | scp : Sun Capital Partners Inc ",
"NULL", " crginc : Cargill, Inc. ", "NULL", " cytmxt : CytomX Therapeutics, Inc. "
)), class = "data.frame", row.names = c(NA, -20L))
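One possible way to get NA-padded columns for data shaped like this is to pad every split result to the same length before binding. The sketch below is base R only and makes several assumptions: it uses a three-row stand-in for xx rather than the full data, it treats the literal string "NULL" as missing, and the output column names CO_1, CO_2, ... are hypothetical.

```r
# Small stand-in for the posted data (same kind of CO values, fewer rows).
xx <- data.frame(
  grp = c("10163", "10689", "8092"),
  CO = c(
    " evgnl : Evogene Limited | monsan : Monsanto Company ",
    " jpmsi : JP Morgan Securities LLC | rganus : Reinsurance Group of America Inc | cnyc : JPMorgan Chase & Co. ",
    "NULL"
  ),
  stringsAsFactors = FALSE
)

# Split on the literal pipe, then trim the spaces around each piece.
parts <- lapply(strsplit(xx$CO, "|", fixed = TRUE), trimws)

# Assumption: the string "NULL" means "no company" and should become NA.
parts <- lapply(parts, function(p) {
  p[p == "NULL"] <- NA_character_
  p
})

# Extend every vector to the longest length; `length<-` pads with NA,
# so rbind() no longer has anything to recycle.
n <- max(lengths(parts))
wide <- as.data.frame(
  do.call(rbind, lapply(parts, `length<-`, n)),
  stringsAsFactors = FALSE
)
names(wide) <- paste0("CO_", seq_len(n))  # hypothetical column names

out <- cbind(xx["grp"], wide)
print(out)
```

The number of new columns follows the row with the most pieces, so rows with fewer pieces end with NA rather than repeated values.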
[Comments]:
Tags: r