Python column and row interaction in a dataframe: answers

【Question Title】: Python column and row interaction in dataframe
【Posted】: 2017-05-16 02:51:22
【Question Description】:

Suppose I have a dataframe:

    question  user  level
        1      a      1
        1      b      2
        1      a      3
        2      a      1
        2      b      2
        2      a      3
        2      b      4
        3      c      1
        3      b      2
        3      c      3
        3      a      4
        3      b      5

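The level column identifies who started a topic and who replied. If a user's level is 1, he asked the question. If a user's level is 2, he replied to the user who asked the question. If a user's level is 3, he replied to the user at level 2, and so on.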
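The level rule above can be expressed as a self-join: the message at level k of a question is a reply to the message at level k-1 of the same question. A minimal pandas sketch, assuming (as in the sample) that each (question, level) pair identifies exactly one message and that the columns are named `question`, `user` and `level`; `parents`, `replied_to` and `replies` are illustrative names, not part of the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "question": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "user": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level": [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# Shift every message up one level: the author of level k-1 becomes the
# "replied_to" user for the message at level k in the same question.
parents = df.assign(level=df["level"] + 1).rename(columns={"user": "replied_to"})

# The inner join keeps only replies; level-1 messages have no parent row.
replies = df.merge(parents, on=["question", "level"], how="inner")
print(replies)
```

Because this applies the general k/k-1 rule, it also pairs later levels (for example, a at level 3 of question 1 replying to b); filtering `replies` on `level == 2` restricts it to the level-2-to-level-1 replies that the expected output counts.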

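I want to extract a new dataframe that represents the communication between users across questions. It should contain three columns: "us_source" (source user), "us_dest" (destination user) and "reply_count". The reply count is the number of times the destination user "directly" replied to the source user.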

    us_source us_dest reply_count
        a        b       2
        a        c       0
        b        a       0
        b        c       0
        c        a       0
        c        b       1

I tried this code to find the first two columns:

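import numpy as np
import pandas as pd

idx_cols = ['question']
std_cols = ['user_x', 'user_y']
df1 = df.merge(df, on=idx_cols)  # self-merge: every pair of messages in the same question
df2 = df1.loc[df1.user_x != df1.user_y, idx_cols + std_cols].copy()

# np.sort sorts each row, so (b, a) and (a, b) become the same pair
df2.loc[:, std_cols] = np.sort(df2[std_cols].to_numpy(), axis=1)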
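The self-merge above can be extended toward the third column by keeping the `level_x`/`level_y` columns that `merge` produces (pandas' default `_x`/`_y` suffixes) and retaining only rows where the `_y` message is at level 2 and the `_x` message is at level 1. A sketch on the sample data; it deliberately skips the `np.sort` step, because sorting each pair discards the direction that `us_source`/`us_dest` need, and it returns only the pairs with a nonzero count.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "question": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "user": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level": [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# Self-merge on question: every ordered pair of messages within a question
df1 = df.merge(df, on="question")

# Direct reply in the question's sense: the _y message (level 2)
# answers the _x message (level 1) of the same question
direct = df1[(df1["level_x"] == 1) & (df1["level_y"] == 2)]

counts = (
    direct.groupby(["user_x", "user_y"])
    .size()
    .reset_index(name="reply_count")
    .rename(columns={"user_x": "us_source", "user_y": "us_dest"})
)
print(counts)
```

Replacing the filter with `df1["level_y"] == df1["level_x"] + 1` would instead count every reply to the immediately preceding level (the general k/k-1 rule).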

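Does anyone have a suggestion for the third column? B's reply counts as a "direct" reply to A if and only if B, at level k, replies to A's message at level k-1 in the same topic. For example, if a topic is started by student A (a level-1 message) and B replies to A (a level-2 message), then B replied directly to A. Only replies from level-2 students to level-1 students should be counted.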
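For the complete table, including the pairs with a count of 0, one option is to count the level-2-to-level-1 replies and then reindex those counts over every ordered pair of distinct users, which is what `itertools.permutations(users, 2)` enumerates. A minimal sketch on the sample data; `askers`, `repliers` and `all_pairs` are illustrative names. Joining only the level-1 and level-2 rows avoids building the full self-merge.

```python
from itertools import permutations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "question": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "user": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level": [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# Level-1 authors are the sources, level-2 authors reply to them directly
askers = df.loc[df["level"] == 1, ["question", "user"]].rename(columns={"user": "us_source"})
repliers = df.loc[df["level"] == 2, ["question", "user"]].rename(columns={"user": "us_dest"})

# One row per direct reply, then count per ordered (source, destination) pair
pairs = askers.merge(repliers, on="question")
counts = pairs.groupby(["us_source", "us_dest"]).size()

# Every ordered pair of distinct users, so pairs that never interacted get 0
users = sorted(df["user"].unique())
all_pairs = pd.MultiIndex.from_tuples(list(permutations(users, 2)), names=["us_source", "us_dest"])

result = counts.reindex(all_pairs, fill_value=0).reset_index(name="reply_count")
print(result)
```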

【Question Comments】:

Possible duplicate of Python interaction between columns and rows
You keep asking the same question. stackoverflow.com/q/43900180 stackoverflow.com/q/43865536 stackoverflow.com/q/43825697 stackoverflow.com/q/43742173 Stop reposting.

Tags: python pandas

【Solution 1】:

My suggestion:

I would use a dictionary with "source-destination" strings as keys and the reply counts as values.
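A variant of the same dictionary idea, swapped in here for illustration and not the answerer's code: `collections.Counter` with `(source, destination)` tuples as keys instead of joined strings, so no `"-"` separator has to be parsed later and user names containing `-` cannot collide. It also remembers the asker per question, so it only requires that a question's level-1 row comes before its level-2 row. `rows` holds the sample data as plain tuples, and the keys follow the answer's convention (source = replier, destination = asker).

```python
from collections import Counter
from itertools import permutations

# Sample data from the question as (question, user, level) tuples
rows = [
    (1, "a", 1), (1, "b", 2), (1, "a", 3),
    (2, "a", 1), (2, "b", 2), (2, "a", 3), (2, "b", 4),
    (3, "c", 1), (3, "b", 2), (3, "c", 3), (3, "a", 4), (3, "b", 5),
]

askers = {}  # question -> user who posted the level-1 message
reply_counts = Counter()  # (replier, asker) -> number of direct replies

for question, user, level in rows:
    if level == 1:
        askers[question] = user
    elif level == 2 and question in askers:
        reply_counts[(user, askers[question])] += 1

# A Counter returns 0 for missing keys, so pairs that never interacted
# can be listed explicitly without a separate fill step
users = sorted({user for _, user, _ in rows})
all_counts = {pair: reply_counts[pair] for pair in permutations(users, 2)}
print(all_counts)
```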

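Iterate over the first dataframe. For each question, store the person who posted the first message as the destination and the person who posted the second message as the source, then increment the counter stored under the key "source-destination" in the dictionary. For example (I'm not familiar with pandas, so I'll leave the formatting to you):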

from itertools import permutations

reply_counts = {}  # the dictionary where the results are stored
users = set()
destination = False  # a simple flag to make sure message 2 follows message 1

# itertuples() yields one row at a time; positions are (question, user, level).
# Iterating over the DataFrame directly would yield column names, not rows.
for row in dataframe.itertuples(index=False):
    users.add(row[1])  # collect the users' names
    if row[2] == 1:  # if it is an initial message
        destination = row[1]  # store the user as the destination
    elif row[2] == 2 and destination:  # if this is a second message
        source = row[1]  # store the user as the source
        key = source + "-" + destination  # build a key from source/destination
        if key not in reply_counts:  # if the key is new to the dictionary
            reply_counts[key] = 1  # create the new entry
        else:  # otherwise
            reply_counts[key] += 1  # increment the existing entry
        destination = False  # reset destination

    else:
        destination = False  # reset destination

# add the source-destination pairs that didn't interact to the dictionary
for pair in permutations(users, 2):
    if "-".join(pair) not in reply_counts:
        reply_counts["-".join(pair)] = 0

Then you can convert the dictionary back into a dataframe.
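A minimal sketch of that conversion, using the dictionary that the loop above yields for the sample data. The answer's keys are "replier-asker", the reverse of the question's us_source/us_dest orientation, so the two parts are swapped here to match the expected output; it also assumes user names contain no "-".

```python
import pandas as pd

# Values the answer's loop produces for the sample data; keys are "replier-asker"
reply_counts = {"b-a": 2, "b-c": 1, "a-b": 0, "a-c": 0, "c-a": 0, "c-b": 0}

records = []
for key, count in reply_counts.items():
    replier, asker = key.split("-")  # assumes no "-" inside user names
    # The question's us_source is the user being replied to (the asker)
    records.append({"us_source": asker, "us_dest": replier, "reply_count": count})

result = (
    pd.DataFrame(records)
    .sort_values(["us_source", "us_dest"])
    .reset_index(drop=True)
)
print(result)
```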

【Comments】:

Thanks!!! Very useful for learning a new approach :) Do you have any ideas about this one: stackoverflow.com/questions/43865536/… @Trolldejo