【Question title】: Homophily in a social network using Python
【Posted】: 2021-11-27 15:09:25
【Question】:

I am trying to determine the homophily of a dataset that has nodes as keys and colors as values, and then the chance homophily.

Example:

Node  Target   Colors 
A       N        1
N       A        0 
A       D        1
D       A        1
C       X        1
X       C        0
S       D        0
D       S        1
B                0
R       N        2
N       R        2

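Colors are associated with the Node column and range from 0 to 2 (int). The steps to compute the chance homophily on a characteristic z (color, in my case) are shown below: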

c_list=df[['Node','Colors']].set_index('Node').T.to_dict('list')
print("\nChance of same color:", round(chance_homophily(c_list),2))

where chance_homophily is defined as follows:

from collections import Counter

# The function below takes a dictionary that maps each node to its characteristic
# (here a one-element list holding the color) and computes the chance homophily:
# the probability that two nodes drawn at random share the same characteristic.

def chance_homophily(dataset):
    # Count how many nodes carry each characteristic value
    freq_dict = Counter(tuple(x) for x in dataset.values())
    total = sum(freq_dict.values())

    chance = 0
    for class_count in freq_dict.values():
        chance += (class_count / total) ** 2
    return chance
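As a sanity check, the same quantity can be worked out by hand: it is the sum of the squared class proportions. The sketch below is a minimal illustration, not part of the original code. The function name `chance_homophily_from_labels` and the hand-written `colors` mapping are assumptions (where a node appears with more than one color in the example table, the last one listed is used).

```python
from collections import Counter

def chance_homophily_from_labels(labels):
    """Sum of squared class proportions for a node -> label mapping."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

# Hypothetical node -> color mapping, written by hand for illustration
colors = {"A": 1, "D": 1, "C": 1, "N": 2, "R": 2, "X": 0, "S": 0, "B": 0}

# Three nodes with color 1, two with color 2, three with color 0:
# (3/8)**2 + (2/8)**2 + (3/8)**2 = 22/64
print(chance_homophily_from_labels(colors))  # 0.34375
```
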

Homophily is then computed as follows:

def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):
                num_ties+=1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties+=1
    return (num_same_ties / num_ties) 

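G should be built from my dataset above (so taking both the Node and Target columns into account). I am not entirely familiar with this network property, but I think I am missing something in my implementation (for example, does it correctly count the relationships between nodes in the network?). In another example found online (using a different dataset)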

https://campus.datacamp.com/courses/using-python-for-research/case-study-6-social-network-analysis?ex=1

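the characteristic is also color (although there it is a string, whereas I have a numeric variable). I don't know whether they take the relationships between nodes into account when determining it, possibly using an adjacency matrix. This part is not yet implemented in my code, where I am using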

G = nx.from_pandas_edgelist(df, source='Node', target='Target')
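One way to see how the relationships end up being counted is to rebuild the graph by hand. `nx.from_pandas_edgelist` returns an undirected `nx.Graph` by default, so reciprocal rows such as A-N and N-A collapse into a single tie. The sketch below is only an illustration under one assumption: the blank Target for B is read as an empty string, which then becomes a node of its own.

```python
import networkx as nx

# (Node, Target) pairs copied from the example table;
# the blank Target for B is assumed to be an empty string
rows = [("A", "N"), ("N", "A"), ("A", "D"), ("D", "A"),
        ("C", "X"), ("X", "C"), ("S", "D"), ("D", "S"),
        ("B", ""), ("R", "N"), ("N", "R")]

G = nx.Graph()            # undirected: (u, v) and (v, u) are the same edge
G.add_edges_from(rows)

print(G.number_of_edges())  # 6 ties from 11 rows
print(G.number_of_nodes())  # 9 nodes, including the empty-string node
```

A homophily function that skips ties whose endpoints have no known color will ignore the B tie, since the empty-string node has no color.
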

【Comments】:

    Tags: python networkx social-networking


    【Answer 1】:

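    Your code works fine. The only thing you are missing is the IDs dictionary, which maps the node names in graph G to the node names in your data. Because you create the graph from a pandas edge list, your nodes are already named exactly as they appear in the data.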

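    This makes the IDs dictionary unnecessary. See the example below, once without the IDs dictionary and once with a trivial dictionary so that the original function can be used: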

    import networkx as nx
    import pandas as pd
    from collections import Counter
    
    df = pd.DataFrame({"Node":["A","N","A","D","C","X","S","D","B","R","N"],
                      "Target":["N","A","D","A","X","C","D","S","","N","R"],
                      "Colors":[1,0,1,1,1,0,0,1,0,2,2]})
    
    c_list=df[['Node','Colors']].set_index('Node').T.to_dict('list')
    
    G = nx.from_pandas_edgelist(df, source='Node', target='Target')
    
    def homophily_without_ids(G, chars):
        """
        Given a network G and a dict chars that maps each node
        to its characteristic, find the homophily of the network.
        """
        num_same_ties = 0
        num_ties = 0
        for n1, n2 in G.edges():
            if n1 in chars and n2 in chars:
                if G.has_edge(n1, n2):
                    num_ties+=1
                    if chars[n1] == chars[n2]:
                        num_same_ties+=1
        return (num_same_ties / num_ties)
    
    print(homophily_without_ids(G, c_list))
    
    
    #create node ids map - trivial in this case
    nodes_ids = {i:i for i in G.nodes()}
    
    def homophily(G, chars, IDs):
        """
        Given a network G, a dict of characteristics chars for node IDs,
        and dict of node IDs for each node in the network,
        find the homophily of the network.
        """
        num_same_ties = 0
        num_ties = 0
        for n1, n2 in G.edges():
            if IDs[n1] in chars and IDs[n2] in chars:
                if G.has_edge(n1, n2):
                    num_ties+=1
                    if chars[IDs[n1]] == chars[IDs[n2]]:
                        num_same_ties+=1
        return (num_same_ties / num_ties) 
    
    print(homophily(G, c_list, nodes_ids))
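Since the loop already iterates over `G.edges()`, the `G.has_edge` check always passes, and the division fails if no tie has both endpoints labeled. The sketch below is one possible simplification, not the original function. The name `homophily_from_edges`, the hand-written `edges` list and the `colors` mapping are assumptions for illustration.

```python
def homophily_from_edges(edges, chars):
    """Fraction of ties whose two endpoints share the same characteristic.

    edges: iterable of (u, v) pairs, for example G.edges() from networkx.
    chars: dict mapping node -> characteristic; ties with an unlabeled
           endpoint are skipped.
    Returns None when no tie has both endpoints labeled.
    """
    num_same_ties = 0
    num_ties = 0
    for u, v in edges:
        if u in chars and v in chars:
            num_ties += 1
            if chars[u] == chars[v]:
                num_same_ties += 1
    return num_same_ties / num_ties if num_ties else None

# Hypothetical undirected ties and node colors, written by hand
edges = [("A", "N"), ("A", "D"), ("C", "X"), ("S", "D"), ("B", ""), ("R", "N")]
colors = {"A": 1, "D": 1, "C": 1, "N": 2, "R": 2, "X": 0, "S": 0, "B": 0}

observed = homophily_from_edges(edges, colors)
print(observed)  # 0.4 (A-D and R-N match, out of 5 labeled ties)
```

With a networkx graph, `G.edges()` can be passed as `edges`. The observed value is usually read against the chance homophily: a value clearly above chance suggests that same-colored nodes are tied more often than random mixing would produce.
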
    

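    【Discussion】:

    • Thank you very much, math_noob. That makes sense. I was actually wondering whether anything was missing from the code, and did not think about the fact that the mapping is already in the network. May I ask whether you also tested the chance_homophily function? Do you think it is fine too, or am I missing something? I just want to check, since this is my first time implementing and applying this metric. Thanks
    • Yes, chance homophily is fine as well.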