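Correlation only applies to numeric variables. When you look at correlation, you are essentially asking, "as x increases/decreases, does y increase/decrease?"
Your question is valid in the sense of "as LeBron James's points go up/down, do Player B's points go up/down?" But your data isn't set up to do that.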
playerdf.T
Out[66]:
2 4 ... 409 423
Name Dennis Schroder LeBron James ... Markieff Morris Marc Gasol
Date 2020-12-22 2020-12-22 ... 2020-12-25 2020-12-25
Points 43 35.25 ... 24.25 12.75
Team LAL LAL ... LAL LAL
[4 rows x 26 columns]
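I'm curious how those points are being scored???
We need to pivot so that each instance/row is a date/game, the columns are the player names, and the values are the points. Once you have done that, you can throw it into the .corr() method.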
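To make the reshaping concrete, here is a minimal sketch on made-up numbers. The player names, dates and point values are invented for illustration and are not taken from the spreadsheet; only `pivot` and `corr` are the calls the answer relies on.

```python
import pandas as pd

# Made-up long-format data: one row per (player, game), like the original frame.
long_df = pd.DataFrame({
    "Name":   ["Player A", "Player B", "Player A", "Player B", "Player A", "Player B"],
    "Date":   ["2020-12-22", "2020-12-22", "2020-12-25", "2020-12-25",
               "2020-12-27", "2020-12-27"],
    "Points": [20, 10, 30, 14, 25, 13],
})

# Wide format: one row per game, one column per player.
wide_df = long_df.pivot(index="Date", columns="Name", values="Points")
print(wide_df)

# Pairwise Pearson correlation between the player columns.
print(wide_df.corr())
```

In the wide table each column is a numeric series aligned on the same games, which is the shape .corr() expects.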
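That said, you won't see much with only 2 games/dates of data. Any two points lie on a straight line, so every coefficient comes out as exactly 1.0 or -1.0, or as NaN when a player's points don't change between the two games: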
import pandas as pd

file = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRlZiz12o4zOCRrjuTgBFlUwRjWKz2v2o4-B8dZ6C-kHwkmI5wRWMO4vS9u2bRVtCy9UJkwPXp-BKCw/pub?gid=0&single=true&output=csv'
df = pd.read_csv(file)

playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team'] == 'LAL']

# One row per date, one column per player; games a player missed count as 0 points
playerdf = playerdf.pivot(index='Date',
                          columns='Name',
                          values='Points').fillna(0)

corr = playerdf.corr()
Output:
print(corr.to_string())
Name Alex Caruso Alfonzo McKinnie Anthony Davis Dennis Schroder Jared Dudley Kentavious Caldwell-Pope Kyle Kuzma LeBron James Marc Gasol Markieff Morris Montrezl Harrell Quinn Cook Talen Horton-Tucker Wes Matthews
Name
Alex Caruso 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Alfonzo McKinnie 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Anthony Davis 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Dennis Schroder -1.0 -1.0 -1.0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 -1.0 1.0 NaN -1.0 -1.0
Jared Dudley 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Kentavious Caldwell-Pope -1.0 -1.0 -1.0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 -1.0 1.0 NaN -1.0 -1.0
Kyle Kuzma 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
LeBron James 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Marc Gasol 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Markieff Morris 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Montrezl Harrell -1.0 -1.0 -1.0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 -1.0 1.0 NaN -1.0 -1.0
Quinn Cook NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Talen Horton-Tucker 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Wes Matthews 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
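The pattern in the table above (nothing but 1.0, -1.0 and NaN) can be reproduced with any two-row table. This is a small sketch with made-up numbers; the column names `Up`, `Down` and `Flat` are hypothetical stand-ins for players.

```python
import pandas as pd

# Hypothetical points for three "players" over just two games.
two_games = pd.DataFrame(
    {"Up": [10, 20], "Down": [30, 5], "Flat": [0, 0]},
    index=["game 1", "game 2"],
)

corr_two = two_games.corr()
print(corr_two)
```

`Up` and `Down` move in opposite directions, so they correlate at -1.0. `Flat` has zero variance, so its correlation with everything (including itself) is undefined and shows up as NaN, just like the Quinn Cook row.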
If I go back and get a full season's worth:
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/teams/LAL/2019_games.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the box-score link for every game in the schedule table
table = soup.find('table')
links = table.find_all('a', href=True)
boxscore_links = []
for link in links:
    if 'boxscores' in link['href'] and '.html' in link['href']:
        boxscore_links.append('https://www.basketball-reference.com' + link['href'])

frames = []
for link in boxscore_links:
    print(link)
    temp_df = pd.read_html(link, header=1, attrs={'id': 'box-LAL-game-basic'})[0]
    temp_df = temp_df[['Starters', 'PTS']]
    # Drop the rows that are not players
    temp_df = temp_df[temp_df['Starters'] != 'Team Totals']
    temp_df = temp_df[temp_df['Starters'] != 'Reserves'].copy()
    # Players who did not appear get 0 points
    temp_df['PTS'] = temp_df['PTS'].replace(['Did Not Play', 'Did Not Dress', 'Not With Team'], 0)
    temp_df['PTS'] = temp_df['PTS'].astype(int)
    # The game date is the leading run of digits in the file name
    temp_df['Date'] = re.findall(r"\d+", link.split('/')[-1].split('.html')[0])[0]
    temp_df = temp_df.rename(columns={'Starters': 'Name', 'PTS': 'Points'})
    frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0; concatenate once instead
playerdf = pd.concat(frames, sort=False).reset_index(drop=True)

playerdf = playerdf.pivot(index='Date',
                          columns='Name',
                          values='Points').fillna(0)

corr = playerdf.corr()
Then you might find some correlation:
Output:
print(corr.to_string())
Name Alex Caruso Andre Ingram Brandon Ingram Isaac Bonga Ivica Zubac JaVale McGee Jemerrio Jones Johnathan Williams Josh Hart Kentavious Caldwell-Pope Kyle Kuzma Lance Stephenson LeBron James Lonzo Ball Michael Beasley Mike Muscala Moritz Wagner Rajon Rondo Reggie Bullock Scott Machado Sviatoslav Mykhailiuk Tyson Chandler
Name
Alex Caruso 1.000000 NaN -0.502772 0.356931 -0.223081 0.360708 0.520267 0.635980 -0.377755 0.331362 -0.427086 -0.279960 -0.258477 -0.395673 -0.190208 0.614652 0.462480 0.282011 0.295477 0.180002 -0.240216 -0.272816
Andre Ingram NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Brandon Ingram -0.502772 NaN 1.000000 -0.311075 0.280328 -0.212760 -0.252852 -0.502750 0.064457 -0.330685 0.015547 -0.034681 -0.116722 0.068030 0.256519 -0.273952 -0.423331 -0.075037 -0.010224 -0.167714 -0.029635 0.142737
Isaac Bonga 0.356931 NaN -0.311075 1.000000 -0.014284 0.052887 0.212814 0.317496 -0.170178 0.018247 -0.210940 0.033076 -0.215860 -0.107862 -0.046352 0.249809 0.506899 0.069940 -0.003765 0.237553 0.191829 -0.104224
Ivica Zubac -0.223081 NaN 0.280328 -0.014284 1.000000 -0.348919 -0.125094 -0.255467 0.097697 0.003421 0.032512 0.154095 -0.462171 0.142622 0.449249 -0.204575 -0.046258 -0.060691 -0.268645 -0.082973 0.308421 0.115336
JaVale McGee 0.360708 NaN -0.212760 0.052887 -0.348919 1.000000 0.131512 0.203464 -0.195306 0.088362 -0.161654 0.007220 0.071916 -0.250259 -0.189589 0.220799 0.025695 0.074450 0.051457 0.142273 -0.038746 -0.271256
Jemerrio Jones 0.520267 NaN -0.252852 0.212814 -0.125094 0.131512 1.000000 0.544439 -0.246812 0.401716 -0.362906 -0.201776 -0.287865 -0.191340 -0.111905 0.805160 0.250571 0.039685 -0.040080 -0.032381 -0.126897 -0.151910
Johnathan Williams 0.635980 NaN -0.502750 0.317496 -0.255467 0.203464 0.544439 1.000000 -0.223735 0.216588 -0.335991 -0.076575 -0.112725 -0.280153 -0.212707 0.530976 0.638914 0.057808 0.074619 0.179093 -0.220783 -0.310233
Josh Hart -0.377755 NaN 0.064457 -0.170178 0.097697 -0.195306 -0.246812 -0.223735 1.000000 -0.202327 0.112090 0.106432 0.062429 0.359006 0.053293 -0.312218 -0.323296 -0.165224 -0.300856 -0.163708 0.190857 0.196536
Kentavious Caldwell-Pope 0.331362 NaN -0.330685 0.018247 0.003421 0.088362 0.401716 0.216588 -0.202327 1.000000 -0.254029 -0.053019 -0.329252 -0.151266 -0.087638 0.381221 0.187377 0.011464 0.038160 0.039444 0.037875 0.050367
Kyle Kuzma -0.427086 NaN 0.015547 -0.210940 0.032512 -0.161654 -0.362906 -0.335991 0.112090 -0.254029 1.000000 0.039111 0.187677 0.355282 0.081492 -0.370250 -0.338748 -0.254589 -0.105824 0.049026 0.018252 0.141192
Lance Stephenson -0.279960 NaN -0.034681 0.033076 0.154095 0.007220 -0.201776 -0.076575 0.106432 -0.053019 0.039111 1.000000 -0.048462 0.085465 0.009354 -0.265252 -0.066810 -0.071756 -0.357791 0.079382 0.264893 0.044603
LeBron James -0.258477 NaN -0.116722 -0.215860 -0.462171 0.071916 -0.287865 -0.112725 0.062429 -0.329252 0.187677 -0.048462 1.000000 -0.021212 -0.417934 -0.336107 -0.227264 0.032238 0.098842 -0.119156 -0.177819 -0.099600
Lonzo Ball -0.395673 NaN 0.068030 -0.107862 0.142622 -0.250259 -0.191340 -0.280153 0.359006 -0.151266 0.355282 0.085465 -0.021212 1.000000 0.078883 -0.312913 -0.298580 -0.442047 -0.410911 -0.126914 0.211892 0.520982
Michael Beasley -0.190208 NaN 0.256519 -0.046352 0.449249 -0.189589 -0.111905 -0.212707 0.053293 -0.087638 0.081492 0.009354 -0.417934 0.078883 1.000000 -0.183008 0.025792 -0.254584 -0.240322 -0.074226 0.167759 0.073540
Mike Muscala 0.614652 NaN -0.273952 0.249809 -0.204575 0.220799 0.805160 0.530976 -0.312218 0.381221 -0.370250 -0.265252 -0.336107 -0.312913 -0.183008 1.000000 0.306389 0.203155 0.207427 -0.052954 -0.207525 -0.248431
Moritz Wagner 0.462480 NaN -0.423331 0.506899 -0.046258 0.025695 0.250571 0.638914 -0.323296 0.187377 -0.338748 -0.066810 -0.227264 -0.298580 0.025792 0.306389 1.000000 0.016732 0.147417 0.341310 -0.074224 -0.206353
Rajon Rondo 0.282011 NaN -0.075037 0.069940 -0.060691 0.074450 0.039685 0.057808 -0.165224 0.011464 -0.254589 -0.071756 0.032238 -0.442047 -0.254584 0.203155 0.016732 1.000000 0.378034 -0.021978 -0.267364 -0.450237
Reggie Bullock 0.295477 NaN -0.010224 -0.003765 -0.268645 0.051457 -0.040080 0.074619 -0.300856 0.038160 -0.105824 -0.357791 0.098842 -0.410911 -0.240322 0.207427 0.147417 0.378034 1.000000 -0.069539 -0.272518 -0.296419
Scott Machado 0.180002 NaN -0.167714 0.237553 -0.082973 0.142273 -0.032381 0.179093 -0.163708 0.039444 0.049026 0.079382 -0.119156 -0.126914 -0.074226 -0.052954 0.341310 -0.021978 -0.069539 1.000000 -0.084170 -0.100761
Sviatoslav Mykhailiuk -0.240216 NaN -0.029635 0.191829 0.308421 -0.038746 -0.126897 -0.220783 0.190857 0.037875 0.018252 0.264893 -0.177819 0.211892 0.167759 -0.207525 -0.074224 -0.267364 -0.272518 -0.084170 1.000000 0.255530
Tyson Chandler -0.272816 NaN 0.142737 -0.104224 0.115336 -0.271256 -0.151910 -0.310233 0.196536 0.050367 0.141192 0.044603 -0.099600 0.520982 0.073540 -0.248431 -0.206353 -0.450237 -0.296419 -0.100761 0.255530 1.000000
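A 22 x 22 matrix like the one above is hard to scan by eye. One possible way to pull out the strongest relationships is to flatten the upper triangle and sort by absolute value. The sketch below uses random stand-in data and hypothetical names (`demo`, `P1` to `P4`) rather than the scraped season, so the numbers it prints mean nothing in themselves.

```python
import numpy as np
import pandas as pd

# Stand-in for the pivoted season table: random points for four made-up players.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 30, size=(40, 4)),
                    columns=["P1", "P2", "P3", "P4"])
demo_corr = demo.corr()

# Keep only the cells strictly above the diagonal so each pair appears once,
# then flatten to a Series indexed by (player, player).
upper = np.triu(np.ones(demo_corr.shape, dtype=bool), k=1)
pairs = demo_corr.where(upper).stack().dropna()

# Order the pairs by absolute strength, strongest first.
ranked = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(ranked)
```

Applied to the real `corr`, the same steps would list pairs such as Jemerrio Jones / Mike Muscala (0.805160 in the table above) near the top.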
Heatmap:
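import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

f, ax = plt.subplots(figsize=(10, 8))
# The np.bool alias is unavailable in some NumPy releases; the builtin bool always works.
# An all-False mask hides nothing, so the full matrix is drawn.
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool),
            cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)
plt.show()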
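Because a correlation matrix is symmetric, half of the heatmap repeats the other half. A common variation is to hide the diagonal and the upper triangle through the `mask` argument. The sketch below only builds and prints such a mask for a small made-up matrix (`demo_corr` is a hypothetical stand-in for `corr`); the plotting call is left as a comment.

```python
import numpy as np
import pandas as pd

# Small made-up correlation matrix standing in for `corr`.
demo_corr = pd.DataFrame(
    [[1.0, 0.4, -0.2],
     [0.4, 1.0, 0.1],
     [-0.2, 0.1, 1.0]],
    index=["P1", "P2", "P3"], columns=["P1", "P2", "P3"],
)

# True marks the cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones(demo_corr.shape, dtype=bool))
print(mask)

# The same mask could then be passed to the heatmap call, for example:
# sns.heatmap(demo_corr, mask=mask, square=True)
```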