Default parameter on a Python function not always working

【Question title】: Default parameter on python function not always working
【Posted】: 2018-01-21 02:08:52
【Question description】:

I'm reading Programming Collective Intelligence and writing some of the code in a more Pythonic way than the book does, just to learn.

The first chapter is about recommender systems. It presents several similarity measures based on the following dictionary.

critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
        'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
        'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
        'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'The Night Listener': 4.5, 'Superman Returns': 4.0,
        'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
        'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
        'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
        'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

Given that unique_pairs is a list of tuples containing every distinct pair of people,

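import itertools

# people is not defined in the post; it is assumed to be the critic names, e.g. list(critics)
unique_pairs = list(itertools.combinations(people, 2))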

unique_pairs
[('Michael Phillips', 'Mick LaSalle'),
 ('Michael Phillips', 'Lisa Rose'),
 ('Michael Phillips', 'Toby'),
 ('Michael Phillips', 'Jack Matthews'),
 ('Michael Phillips', 'Gene Seymour'),
 ('Michael Phillips', 'Claudia Puig'),
 ('Mick LaSalle', 'Lisa Rose'),
 ('Mick LaSalle', 'Toby'),
 ('Mick LaSalle', 'Jack Matthews'),
 ('Mick LaSalle', 'Gene Seymour'),
 ('Mick LaSalle', 'Claudia Puig'),
 ('Lisa Rose', 'Toby'),
 ('Lisa Rose', 'Jack Matthews'),
 ('Lisa Rose', 'Gene Seymour'),
 ('Lisa Rose', 'Claudia Puig'),
 ('Toby', 'Jack Matthews'),
 ('Toby', 'Gene Seymour'),
 ('Toby', 'Claudia Puig'),
 ('Jack Matthews', 'Gene Seymour'),
 ('Jack Matthews', 'Claudia Puig'),
 ('Gene Seymour', 'Claudia Puig')]
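
As a minimal sketch of what itertools.combinations does here: it yields every unordered pair exactly once, in the order the input is given. The three-name list below is a hypothetical subset chosen only to keep the output short; it is not the `people` list from the post.

```python
import itertools

# Hypothetical short list of names, used only for illustration.
people = ['Lisa Rose', 'Gene Seymour', 'Toby']

pairs = list(itertools.combinations(people, 2))
print(pairs)
# [('Lisa Rose', 'Gene Seymour'), ('Lisa Rose', 'Toby'), ('Gene Seymour', 'Toby')]

# n items give n * (n - 1) / 2 unordered pairs, so 7 critics give 21.
print(len(list(itertools.combinations(range(7), 2))))  # 21
```

That count matches the 21 tuples listed above for the seven critics in the dictionary.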

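I tried to improve the Pearson correlation similarity function suggested in the book by adding a p-value to its result, returned only when the function's p_value argument is True. The function is defined like this: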

import scipy.stats


def sim_pearson(prefs, p1, p2, p_value=False):
    """Returns the pearson correlation coefficient and the p-value (optional)
    of the ratings of the movies that both p1 and p2 have rated"""

    # Creates a list with the movies that both p1 and p2 have rated
    movies = [movie for movie in prefs[p1] if movie in prefs[p2]]

    # List of the scores that both p1 and p2 have given to the movies in common
    scores_p1 = [prefs[p1][movie] for movie in movies]
    scores_p2 = [prefs[p2][movie] for movie in movies]

    corr, p_value = scipy.stats.pearsonr(scores_p1, scores_p2)

    if p_value:
        return (corr, p_value)
    else:
        return corr

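My problem is that the function doesn't work as expected: it doesn't always return the (correlation coefficient, p-value) tuple when p_value is True, and it produces the same results whether p_value is True or False. Why is this happening, and how can I fix it?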

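Here is a list with the results of applying the function to every possible pair of people, to show what I mean. The results are the same for p_value=True and p_value=False, so I'm only pasting the former.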

pearson_results = [(pair[0][:5], 
                    pair[1][:5], 
                    sim_pearson(critics, pair[0], pair[1], p_value=True)) 
                    for pair in unique_pairs]

pearson_results
[('Micha', 'Mick ', (-0.2581988897471611, 0.74180111025283857)),
 ('Micha', 'Lisa ', (0.40451991747794525, 0.59548008252205464)),
 ('Micha', 'Toby', -1.0),
 ('Micha', 'Jack ', (0.13483997249264842, 0.8651600275073511)),
 ('Micha', 'Gene ', (0.20459830184114206, 0.79540169815885797)),
 ('Micha', 'Claud', 1.0),
 ('Mick ', 'Lisa ', (0.59408852578600457, 0.21370636293028805)),
 ('Mick ', 'Toby', (0.92447345164190498, 0.24901011701138964)),
 ('Mick ', 'Jack ', (0.21128856368212914, 0.73299431171284912)),
 ('Mick ', 'Gene ', (0.41176470588235292, 0.41726032973743138)),
 ('Mick ', 'Claud', (0.56694670951384085, 0.3189317919127756)),
 ('Lisa ', 'Toby', (0.99124070716193036, 0.084323216321943714)),
 ('Lisa ', 'Jack ', (0.74701788083399601, 0.14681146067336839)),
 ('Lisa ', 'Gene ', (0.39605901719066977, 0.43697492654267506)),
 ('Lisa ', 'Claud', (0.56694670951384085, 0.3189317919127756)),
 ('Toby', 'Jack ', (0.66284898035987017, 0.53869426797895403)),
 ('Toby', 'Gene ', (0.38124642583151169, 0.75098988298861025)),
 ('Toby', 'Claud', (0.89340514744156441, 0.29661883133160016)),
 ('Jack ', 'Gene ', (0.96379568187563314, 0.0082243534847899202)),
 ('Jack ', 'Claud', (0.028571428571428571, 0.9714285714285712)),
 ('Gene ', 'Claud', (0.31497039417435602, 0.60570041941160946))]

【Comments】:

You are redefining p_value with the line corr, p_value = scipy.stats.pearsonr(scores_p1, scores_p2), so the argument passed to the function no longer matters.
Ah, so obvious. Thanks @PRMoureu!

Tags: python python-3.x data-analysis pearson

【Solution 1】:

Change the bottom of the function to:

corr, p_value2 = scipy.stats.pearsonr(scores_p1, scores_p2)

if p_value:
    return (corr, p_value2)
else:
    return corr
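
The name clash and the rename can be reproduced without SciPy. In the sketch below, `fake_pearsonr`, `buggy`, `fixed` and the `stats` parameter are hypothetical names made up for illustration; `fake_pearsonr` just hands back the pair it is given, standing in for scipy.stats.pearsonr. The sketch also suggests why the question's output has bare floats exactly where the correlation is -1.0 or 1.0: presumably the returned p-value was 0.0 in those cases, and 0.0 is falsy.

```python
def fake_pearsonr(corr, p):
    """Hypothetical stand-in for scipy.stats.pearsonr: returns (corr, p) as given."""
    return corr, p


def buggy(p_value=False, stats=(0.4, 0.6)):
    # The unpacking rebinds p_value, so the caller's flag is lost.
    corr, p_value = fake_pearsonr(*stats)
    return (corr, p_value) if p_value else corr


def fixed(p_value=False, stats=(0.4, 0.6)):
    # A distinct name for the statistic keeps the flag intact.
    corr, p = fake_pearsonr(*stats)
    return (corr, p) if p_value else corr


print(buggy(p_value=False))                   # (0.4, 0.6): the flag is ignored
print(buggy(p_value=True, stats=(1.0, 0.0)))  # 1.0: a p-value of 0.0 is falsy
print(fixed(p_value=False))                   # 0.4
print(fixed(p_value=True, stats=(1.0, 0.0)))  # (1.0, 0.0)
```

With the rename, the branch depends only on the caller's argument, so a p-value of 0.0 no longer changes the return type.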

【Discussion】: