You can build either a list of DataFrames or a single DataFrame. One approach is itertools.combinations + DataFrame.groupby. In the first case, you can call .to_numpy() on each table to get it as a matrix.
from itertools import combinations
import pandas as pd

# One count table per pair of columns; df is the existing DataFrame.
l = [df[[*comb]].groupby([*comb]).size().unstack(fill_value=0)
     for comb in combinations(df, 2)]
print(l[0])
B  0  1
A
0  0  1
1  1  1
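To make the first case concrete, here is a minimal self-contained sketch. The sample df is made-up data (the real df is not shown in this answer), and the names pairs, tables and matrices are only illustrative, so the counts differ from the output above.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample data; the real df is not shown in this answer.
df = pd.DataFrame({"A": [0, 0, 1, 1],
                   "B": [0, 1, 0, 1],
                   "C": [1, 1, 0, 1]})

pairs = list(combinations(df, 2))
tables = [df[[*comb]].groupby([*comb]).size().unstack(fill_value=0)
          for comb in pairs]

# One plain NumPy matrix per column pair, keyed by the pair.
matrices = {comb: table.to_numpy() for comb, table in zip(pairs, tables)}
print(matrices[("A", "C")])
```

Note that a matrix only has rows and columns for values that actually occur in the data, so it is not guaranteed to be square.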
# Single DataFrame: one column per column pair, one row per value pair.
new_df = pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                       for comb in combinations(df, 2)}).fillna(0)
print(new_df)
       A                             B                 ...    C           \
       B    C    D    E    F    G    C    D    E    F  ...    D    E    F
0 0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  2.0  0.0  1.0
  1  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  ...  0.0  2.0  1.0
1 0  1.0  1.0  2.0  0.0  1.0  1.0  2.0  2.0  0.0  1.0  ...  1.0  0.0  0.0
  1  1.0  1.0  0.0  2.0  1.0  1.0  0.0  0.0  2.0  1.0  ...  0.0  1.0  1.0

            D              E         F
       G    E    F    G    F    G    G
0 0  0.0  0.0  1.0  1.0  0.0  0.0  0.0
  1  2.0  3.0  2.0  2.0  0.0  0.0  1.0
1 0  1.0  0.0  0.0  0.0  1.0  1.0  1.0
  1  0.0  0.0  0.0  0.0  2.0  2.0  1.0
For each column pair such as (A, B), the rows give the group size for every value combination: (0, 0), (0, 1), (1, 0), and so on.
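As a minimal sketch of how to read one count out of the single DataFrame: the row key is the value pair and the column key is the column-name pair. The sample df is made-up data, not the df used for the output above, and count and check are illustrative names.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample data; the real df is not shown in this answer.
df = pd.DataFrame({"A": [0, 0, 1, 1],
                   "B": [0, 1, 0, 1],
                   "C": [1, 1, 0, 1]})

new_df = pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                       for comb in combinations(df, 2)}).fillna(0)

# Number of rows where A == 0 and C == 1.
count = new_df.loc[(0, 1), ("A", "C")]

# Cross-check with a plain boolean mask on the original data.
check = ((df["A"] == 0) & (df["C"] == 1)).sum()
print(count, check)
```

Value pairs that never occur for a given column pair show up as 0 because of the fillna(0) call.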
Details
list(combinations(df, 2))
[('A', 'B'),
('A', 'C'),
('A', 'D'),
('A', 'E'),
('A', 'F'),
('A', 'G'),
('B', 'C'),
('B', 'D'),
('B', 'E'),
('B', 'F'),
('B', 'G'),
('C', 'D'),
('C', 'E'),
('C', 'F'),
('C', 'G'),
('D', 'E'),
('D', 'F'),
('D', 'G'),
('E', 'F'),
('E', 'G'),
('F', 'G')]
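Iterating over a DataFrame yields its column labels, which is why combinations(df, 2) returns pairs of column names. A minimal sketch, using an empty frame that carries only the seven column names A to G from the list above:

```python
from itertools import combinations, permutations
from math import comb

import pandas as pd

# Empty frame with only the column labels; the data itself is not needed here.
df = pd.DataFrame(columns=list("ABCDEFG"))

pairs = list(combinations(df, 2))          # each unordered pair once
ordered_pairs = list(permutations(df, 2))  # both (A, B) and (B, A)

print(len(pairs), comb(7, 2))  # n * (n - 1) / 2 unordered pairs
print(len(ordered_pairs))      # n * (n - 1) ordered pairs
```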
A more elegant layout
from itertools import permutations

# permutations yields both (A, B) and (B, A), so every ordered pair is counted.
new_df = (pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                        for comb in permutations(df, 2)})
          .stack(dropna=False).unstack(level=0).fillna(0).swaplevel().sort_index())
print(new_df)
       A         B         C         D         E         F         G
       0    1    0    1    0    1    0    1    0    1    0    1    0    1
A 0  0.0  0.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0
  1  0.0  0.0  1.0  1.0  1.0  1.0  2.0  0.0  0.0  2.0  1.0  1.0  1.0  1.0
B 0  0.0  1.0  0.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0
  1  1.0  1.0  0.0  0.0  2.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  2.0
C 0  1.0  1.0  0.0  2.0  0.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  2.0
  1  0.0  1.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0
D 0  1.0  2.0  1.0  2.0  2.0  1.0  0.0  0.0  0.0  3.0  1.0  2.0  1.0  2.0
  1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
E 0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  1  1.0  2.0  1.0  2.0  2.0  1.0  3.0  0.0  0.0  0.0  1.0  2.0  1.0  2.0
F 0  0.0  1.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0
  1  1.0  1.0  1.0  1.0  1.0  1.0  2.0  0.0  0.0  2.0  0.0  0.0  1.0  1.0
G 0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0
  1  1.0  1.0  0.0  2.0  2.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  0.0
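Any single pair can be cross-checked with pd.crosstab. This is a different pandas function from the groupby approach used in this answer, shown only as a sanity check. The sketch below uses made-up sample data, and via_groupby and via_crosstab are illustrative names.

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in this answer.
df = pd.DataFrame({"A": [0, 0, 1, 1],
                   "B": [0, 1, 0, 1],
                   "C": [1, 1, 0, 1]})

# Count table for the pair (A, C) via the groupby approach from this answer.
via_groupby = df[["A", "C"]].groupby(["A", "C"]).size().unstack(fill_value=0)

# The same counts computed independently with crosstab.
via_crosstab = pd.crosstab(df["A"], df["C"])

print(via_crosstab)
print((via_groupby.to_numpy() == via_crosstab.to_numpy()).all())
```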