【Question title】: Connect multiple hashtags from one line with each other
【Posted】: 2015-04-20 15:15:17
【Question】:

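I have a list with more than 50,000 lines of tweets. I have already extracted the hashtags from that list, but now I am stuck with several thousand lines of hashtags that look like this: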

hashtag1;hashtag2;hashtag3;hashtag4

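Since I want to do a co-hashtag analysis, I am looking for a way to connect these multiple hashtags with each other without having to convert each line into undirected edges by hand. Example: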

hashtag1;hashtag2
hashtag1;hashtag3
hashtag1;hashtag4
hashtag2;hashtag3
hashtag2;hashtag4
hashtag3;hashtag4

So, do you know how to accomplish this task (e.g. with R)? I am an R newbie and even less "proficient" in other languages, but I am eager to learn.

structure(list(V1 = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 7L, 8L, 8L, 9L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 13L, 
13L, 13L, 13L, 14L, 14L), .Label = c("profitkapital", "resupply", 
"robotik", "rudidutschke", "russland", "sanktionen", "sanktionieren", 
"schiller", "siegertyp", "snowden", "sockeleinkommen", "solidarity", 
"sozialismus", "sozialphilosoph"), class = "factor"), V2 = structure(c(4L, 
3L, 2L, 7L, 7L, 7L, 7L, 17L, 6L, 8L, 9L, 10L, 10L, 11L, 12L, 
13L, 18L, 18L, 1L, 15L, 15L, 14L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 5L, 5L, 4L, 4L, 4L, 4L, 16L, 16L), .Label = c("alltag", 
"arbeit", "bbq", "bge", "blockupy", "deutschland", "digitalisierung", 
"griechenland", "grundeinkommen", "hartziv", "kenfm", "kirche", 
"kopf", "kraft", "marx", "negt", "piraten", "sanktion"), class = "factor"), 
    V3 = structure(c(1L, 3L, 2L, 4L, 4L, 4L, 4L, 4L, 5L, 4L, 
    4L, 4L, 13L, 10L, 13L, 4L, 14L, 14L, 7L, 6L, 6L, 15L, 8L, 
    8L, 8L, 8L, 8L, 8L, 8L, 1L, 1L, 1L, 1L, 12L, 12L, 11L, 11L, 
    11L, 11L, 9L, 9L), .Label = c("", "abitur", "bbqrub", "bge", 
    "brd", "brecht", "deutschen", "fsa", "grundeinkommen", "hartziv", 
    "linkezukunft", "ows", "vatikan", "widerspruch", "würde"
    ), class = "factor"), V4 = structure(c(1L, 3L, 6L, 1L, 1L, 
    1L, 1L, 1L, 8L, 1L, 2L, 1L, 9L, 5L, 9L, 10L, 4L, 4L, 7L, 
    3L, 3L, 11L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    12L, 12L, 1L, 1L, 1L, 1L, 3L, 3L), .Label = c("", "bank", 
    "bge", "eilantrag", "haarp", "job", "jobcentern", "merkel", 
    "pastor", "probleme", "super", "unibrennt"), class = "factor"), 
    V5 = structure(c(1L, 3L, 5L, 1L, 1L, 1L, 1L, 1L, 7L, 1L, 
    10L, 1L, 2L, 9L, 2L, 4L, 8L, 8L, 6L, 1L, 1L, 6L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("", "bge", "bgenation", "fliegen", "geld", 
    "hartziv", "hitler", "sg", "ttip", "vorbild"), class = "factor"), 
    V6 = structure(c(1L, 5L, 2L, 1L, 1L, 1L, 1L, 1L, 6L, 1L, 
    1L, 1L, 8L, 4L, 8L, 7L, 4L, 4L, 4L, 1L, 1L, 4L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("", "altersarmut", "antifa", "bge", "deeznuts", 
    "holocaust", "klatsch", "sex"), class = "factor"), V7 = structure(c(1L, 
    1L, 2L, 1L, 1L, 1L, 1L, 1L, 6L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 
    4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "bge", 
    "cia", "hartz", "spanishrevolution", "wahre"), class = "factor"), 
    V8 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("", "cityoflondon", "grund", "peace"), class = "factor"), 
    V9 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
    1L, 1L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("", "bge", "occupy", "rothschild"), class = "factor"), 
    V10 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("", "ard", "gezi"), class = "factor"), V11 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "refugeeswelcome", 
    "zdf"), class = "factor"), V12 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "nolegida", 
    "wdr"), class = "factor"), V13 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "nopegida", 
    "swr"), class = "factor"), V14 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "nocastor", 
    "zukunft"), class = "factor")), .Names = c("V1", "V2", "V3", 
"V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", 
"V14"), class = "data.frame", row.names = c(NA, -41L))

【Comments on the question】:

    Tags: r networking twitter hashtag


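    【Solution 1】:

    You can try combn from the combinat package (base R ships a combn with the same interface in utils). It generates all combinations of a given size; with 2 it returns every unordered pair of hashtags, one pair per column: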

    library(combinat)
    combn(c("hashtag1", "hashtag2", "hashtag3", "hashtag4"), 2)
         [,1]       [,2]       [,3]       [,4]       [,5]       [,6]      
    [1,] "hashtag1" "hashtag1" "hashtag1" "hashtag2" "hashtag2" "hashtag3"
    [2,] "hashtag2" "hashtag3" "hashtag4" "hashtag3" "hashtag4" "hashtag4"
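
To get from the matrix above to the edge list asked for in the question, the same call can be applied to each line and the pairs stacked row by row. The following is only a minimal sketch: it uses base R's utils::combn, and the input vector `lines` and the helper `line_to_edges` are made-up names for illustration, assuming the file has been read as one semicolon-separated string per tweet.

```r
# Hypothetical input: one semicolon-separated string of hashtags per tweet
lines <- c("hashtag1;hashtag2;hashtag3;hashtag4",
           "hashtag2;hashtag5",
           "hashtag6")

# Hypothetical helper: turn one line into a two-column matrix of pairs
line_to_edges <- function(line) {
  tags <- unique(strsplit(line, ";", fixed = TRUE)[[1]])
  tags <- sort(tags[tags != ""])         # sorting makes a;b and b;a the same edge
  if (length(tags) < 2) return(NULL)     # a lone hashtag has no partner
  t(utils::combn(tags, 2))               # transpose: one row per pair
}

# rbind() silently skips the NULL entries
edges <- do.call(rbind, lapply(lines, line_to_edges))
colnames(edges) <- c("from", "to")
edges
```

Lines with a single hashtag contribute no edge. If no line contains at least two hashtags, `edges` is NULL and the `colnames` assignment fails, so a real script would need to check for that case.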
    

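    【Comments】:

    • Hi, thanks for your reply! :) I tried your suggestion and think it is the right approach (somehow). However, when I apply it to my data file, I get a table with a single column in which everything is "NA". I then tried playing around with the ", 2)" part and replacing it with "min" or "fun", but none of that worked because the hashtags are not numeric. Do you have any further suggestions? :)
    • Can you edit your post and dput your data, or at least part of it?
    • Yes, that's it, but I don't see any hashtags in your data? So I'm not sure what exactly you did with your data?
    • Oh, those words are the hashtags used on Twitter; I just removed the hash signs :) What I want to do is a co-hashtag analysis, which is a network analysis of hashtags (e.g. wiki.digitalmethods.net/Dmi/CoWordLifeline)
    • OK, assuming the data is df, try this: hashtag = unlist(apply(df, 1, unique)); hashtag2 <- unique(hashtag[hashtag != ""]); combn(hashtag2, 2), and see whether it is what you are looking for.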
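
For a co-hashtag network, the pairs usually need to be built per tweet, so that only hashtags that actually appear together get linked. The sketch below does that for a data frame shaped like the dput output in the question. It is an illustration under assumptions, not the answerer's method: the small `df` reuses four rows of the question's data cut down to three columns, and `row_to_edges`, `edges` and `weights` are hypothetical names.

```r
# Four rows from the question's data, cut down to three columns;
# unused cells are empty strings, as in the dput output
df <- data.frame(V1 = c("profitkapital", "robotik", "robotik", "snowden"),
                 V2 = c("bge", "digitalisierung", "digitalisierung", "bge"),
                 V3 = c("", "bge", "bge", "fsa"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: all unordered hashtag pairs within one row
row_to_edges <- function(row) {
  tags <- sort(unique(unname(row[row != ""])))   # drop empty cells and repeats
  if (length(tags) < 2) return(NULL)
  t(utils::combn(tags, 2))
}

m <- as.matrix(df)                               # factor columns become character
pairs <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) row_to_edges(m[i, ])))
edges <- data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE)

# Count how often each pair co-occurs across tweets
weights <- aggregate(weight ~ from + to, data = cbind(edges, weight = 1), FUN = sum)
weights
```

Identical rows, such as the two robotik rows here, each add to the weight. Whether repeated tweets should count more than once is a modelling choice; calling `unique(df)` first would count each distinct hashtag set only once.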