【Posted】: 2018-11-09 22:11:39
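【Problem description】:
I have a data frame with a column named ProjectSubject. The data frame is about 1,000,000 rows long.
The ProjectSubject column contains many different strings. Here is an example: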
> unique(unlist(projectdf$ProjectSubject))
 [1] "Applied Learning"
 [2] "Applied Learning, Literacy & Language"
 [3] "Literacy & Language"
 [4] "Special Needs"
 [5] "Literacy & Language, History & Civics"
 [6] "Math & Science"
 [7] "History & Civics, Math & Science"
 [8] "Literacy & Language, Special Needs"
 [9] "Applied Learning, Special Needs"
[10] "Health & Sports, Special Needs"
[11] "Math & Science, Literacy & Language"
[12] "Literacy & Language, Math & Science"
[13] "Literacy & Language, Music & The Arts"
[14] "Math & Science, Special Needs"
[15] "Health & Sports"
[16] "Music & The Arts"
[17] "Math & Science, Applied Learning"
[18] "Literacy & Language, Applied Learning"
[19] "Applied Learning, Music & The Arts"
[20] "History & Civics, Literacy & Language"
[21] "Applied Learning, Math & Science"
[22] "Health & Sports, Math & Science"
[23] "Applied Learning, Health & Sports"
[24] "History & Civics"
[25] "History & Civics, Music & The Arts"
[26] "Math & Science, History & Civics"
[27] "Math & Science, Music & The Arts"
[28] "Special Needs, Music & The Arts"
[29] "History & Civics, Applied Learning"
[30] "History & Civics, Special Needs"
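I need a concise, non-manual way to go through the entire column in the data frame and replace these strings with different ones. For example, I want to replace "Applied Learning, Special Needs" with "Special Needs", and similarly replace "Applied Learning, Math & Science" with "Math".
I have about 50 unique strings, much like the sample output above, and I want to reduce them to about 10 unique strings. Ideally there is a way to do this without manually typing a line of code for each of the 50 strings.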
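To make the goal concrete, here is a minimal sketch of one way such a replacement could be expressed in base R. It is not a definitive solution. It assumes ProjectSubject is a plain character (or factor) column, and the `priority` vector, the `recode_subject` helper, the new `SubjectGroup` column, and the toy data are all hypothetical names and values made up for illustration. The idea is to list the roughly 10 target labels once, in priority order, and let the first pattern found in each string decide its new label.

```r
# Toy stand-in for the real data frame; only the names projectdf and
# ProjectSubject come from the question, the rows are illustrative.
projectdf <- data.frame(
  ProjectSubject = c("Applied Learning, Special Needs",
                     "Applied Learning, Math & Science",
                     "Math & Science",
                     "Literacy & Language, Special Needs",
                     "Applied Learning"),
  stringsAsFactors = FALSE
)

# Hypothetical priority list: names are the substrings to look for,
# values are the replacement labels. Earlier entries win when a string
# contains more than one subject.
priority <- c("Special Needs"       = "Special Needs",
              "Math & Science"      = "Math",
              "Literacy & Language" = "Literacy",
              "Applied Learning"    = "Applied Learning")

# Return the label of the first pattern found in x (a single string),
# or x unchanged if no pattern matches.
recode_subject <- function(x, priority) {
  hits <- vapply(names(priority),
                 function(p) grepl(p, x, fixed = TRUE),
                 logical(1))
  if (any(hits)) unname(priority[which(hits)[1]]) else x
}

# Recode only the unique values, then map the results back to every row,
# so the per-string work is done about 50 times rather than 1,000,000 times.
subject_chr <- as.character(projectdf$ProjectSubject)
old <- unique(subject_chr)
new <- vapply(old, recode_subject, character(1),
              priority = priority, USE.NAMES = FALSE)
projectdf$SubjectGroup <- new[match(subject_chr, old)]
```

With this layout, only the `priority` vector has to be written by hand (one entry per target label), and changing which subject wins in a combined string is a matter of reordering its entries.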
【Discussion】:
Tags: r dataframe character rename