【Question title】: Python Mann-Whitney confidence interval
【Posted】: 2018-08-14 16:07:06
【Question】:

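I have two datasets (pandas Series), ds1 and ds2, and I want to compute a 95% confidence interval for the difference in means (if the data are normally distributed) or in medians (if they are not).

For the difference in means, I compute the t-test statistic and the CI: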

import statsmodels.api as sm
tstat, p_value, dof = sm.stats.ttest_ind(ds1, ds2)
CI = sm.stats.CompareMeans.from_data(ds1, ds2).tconfint_diff()

For the medians, I do:

from scipy.stats import mannwhitneyu
U_stat, p_value = mannwhitneyu(ds1, ds2, use_continuity=True, alternative="two-sided")

How can I compute a CI for the difference in medians?

【Question comments】:

    Tags: python statistics


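    【Solution 1】:

    I came across a paper ("Calculating confidence intervals for some non-parametric analyses", Michael J Campbell and Martin J Gardner) that gives a CI formula.

    Based on that: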

    from scipy.stats import norm
    
    ct1 = ds1.count()  # number of non-missing items in dataset 1
    ct2 = ds2.count()  # number of non-missing items in dataset 2
    alpha = 0.05       # 95% confidence interval
    N = norm.ppf(1 - alpha/2)  # percent point function (inverse of the CDF)
    
    # The confidence interval for the difference between the two population
    # medians is derived from the ct1 x ct2 pairwise differences.
    # dropna() keeps the differences consistent with count(), which skips NaN.
    diffs = sorted([i - j for i in ds1.dropna() for j in ds2.dropna()])
    
    # For an approximate 100(1-alpha)% confidence interval, first calculate K:
    k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))
    
    # The CI runs from the Kth smallest to the Kth largest of the differences.
    # diffs is 0-indexed, so the Kth smallest is diffs[k-1].
    # ct1 and ct2 should each be > ~20 for the approximation to hold.
    CI = (diffs[k-1], diffs[len(diffs)-k])
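
The same calculation can be wrapped in a reusable function. The following is only a minimal sketch under stated assumptions: the function name `median_diff_ci` is hypothetical, NumPy is used in place of the list comprehension, and K is rounded to the nearest integer as in the snippet above. It also returns the median of the pairwise differences (the Hodges-Lehmann estimate of the shift) as a point estimate, and refuses to run when K falls below 1, which happens when the samples are too small for the normal approximation.

```python
import numpy as np
from scipy.stats import norm


def median_diff_ci(x, y, alpha=0.05):
    """Approximate CI for the shift between two samples.

    Hypothetical helper: uses the normal approximation for K, so both
    samples should be reasonably large (roughly more than 20 values each).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop missing values so the counts match the differences.
    x = x[~np.isnan(x)]
    y = y[~np.isnan(y)]
    n, m = len(x), len(y)

    # All n*m pairwise differences x_i - y_j, flattened and sorted ascending.
    diffs = np.sort(np.subtract.outer(x, y), axis=None)

    z = norm.ppf(1 - alpha / 2)
    k = int(round(n * m / 2 - z * np.sqrt(n * m * (n + m + 1) / 12)))
    if k < 1:
        raise ValueError("samples too small for the normal approximation")

    # Kth smallest and Kth largest difference (K is 1-based, indexing is 0-based).
    lower = diffs[k - 1]
    upper = diffs[n * m - k]
    # Median of the pairwise differences as the point estimate.
    estimate = np.median(diffs)
    return estimate, (lower, upper)


# Example call on synthetic data (illustrative values only).
rng = np.random.default_rng(0)
a = rng.normal(loc=1.0, size=30)
b = rng.normal(loc=0.0, size=25)
est, (lo, hi) = median_diff_ci(a, b)
```

With pandas Series the call would be `median_diff_ci(ds1, ds2)`. Building every pairwise difference needs memory proportional to n*m, so this sketch is only practical for moderately sized samples.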
    

    【Comments】:
