【Question title】: Write a function that performs multiple Student's t-tests for a list of DataFrames
【Posted】: 2021-06-09 09:47:37
【Question description】:

I have this DataFrame:

print(TempvsDType)
              CurrentThermostatTemp
DwellingType                       
Bungalow                        0.0
Bungalow                       22.0
Bungalow                       22.0
Bungalow                       25.0
Bungalow                       18.0
Bungalow                       21.0
Bungalow                       22.0
Bungalow                       10.0
Bungalow                       18.0
Bungalow                       20.0
Bungalow                       20.0
Bungalow                       22.0
Bungalow                       20.0
Bungalow                       10.0
Bungalow                       30.0
Bungalow                       22.0
Bungalow                       20.0
Bungalow                       20.0
Bungalow                       19.0
Bungalow                       20.0
Bungalow                       22.0
Bungalow                       20.0
Bungalow                       21.0
Bungalow                       22.0
Bungalow                       15.0
Bungalow                       22.0
Bungalow                        0.0
Bungalow                       24.0
Bungalow                       30.0
Bungalow                       20.0
...                             ...
Park Home                      20.0
Park Home                      23.0
Park Home                      20.0
Park Home                      20.0
Park Home                      20.0
Park Home                      18.0
Park Home                      20.0
Park Home                      15.0
Park Home                      12.0
Park Home                      20.0
Park Home                      20.0
Park Home                      23.0
Park Home                      21.0
Park Home                      20.0
Park Home                      20.0
Park Home                      20.0
Park Home                      23.0
Park Home                      18.0
Park Home                      20.0
Park Home                      18.0
Park Home                      16.0
Park Home                      17.0
Park Home                      20.0
Park Home                      20.0
Park Home                      18.0
Park Home                      18.0
Park Home                      20.0
Park Home                      20.0
Park Home                      15.0
Park Home                      21.0

[6247 rows x 1 columns]

I have separated each variable with the .truncate() method:


Flat = TempvsDType.truncate(before="Flat",after="Flat")
House = TempvsDType.truncate(before="House",after="House")
Bungalow = TempvsDType.truncate(before="Bungalow",after="Bungalow")
Maisonette = TempvsDType.truncate(before="Maisonette",after="Maisonette")
ParkHome = TempvsDType.truncate(before="Park Home",after="Park Home")

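My goal is to perform a Student's t-test on every possible combination of the variables, excluding repeated pairs and pairs of a variable with itself. At the moment I have to do this by hand, which is very time-consuming, especially in other scripts that have more than 5 variables, where the number of combinations grows considerably. This is my manual approach: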

from scipy.stats import ttest_ind
#All possible combinations:
Flat_House = ttest_ind(Flat,House)
Flat_Bungalow = ttest_ind(Flat,Bungalow)
Flat_Maisonette = ttest_ind(Flat,Maisonette)
Flat_ParkHome = ttest_ind(Flat,ParkHome)
House_Bungalow = ttest_ind(House,Bungalow)
House_Maisonette = ttest_ind(House,Maisonette)
House_ParkHome = ttest_ind(House,ParkHome)
Bungalow_Maisonette = ttest_ind(Bungalow,Maisonette)
Bungalow_ParkHome = ttest_ind(Bungalow,ParkHome)
Maisonette_ParkHome = ttest_ind(Maisonette, ParkHome)
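#t-test between each combination
#u is not defined in the post; presumably it holds the dwelling-type names in this order
u = ["Flat", "House", "Bungalow", "Maisonette", "Park Home"]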
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[1],Flat_House[0],Flat_House[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[2],Flat_Bungalow[0],Flat_Bungalow[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[3],Flat_Maisonette[0],Flat_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[4],Flat_ParkHome[0],Flat_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[2],House_Bungalow[0],House_Bungalow[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[3],House_Maisonette[0],House_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[4],House_ParkHome[0],House_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[2],u[3],Bungalow_Maisonette[0],Bungalow_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[2],u[4],Bungalow_ParkHome[0],Bungalow_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[3],u[4],Maisonette_ParkHome[0],Maisonette_ParkHome[1]))

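So I would like to know how to write a function that does this automatically: one that runs the Student's t-test for every possible combination, excluding duplicates and pairs already covered, and reports the results the same way I printed them manually. I have tried many times without success. I would be very grateful if someone could help me. Thank you.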

【Question discussion】:

    Tags: python pandas function dataframe scipy.stats


    【Solution 1】:
    from itertools import combinations
    from scipy.stats import ttest_ind
    
    # Build a dict mapping each DwellingType to its sub-DataFrame.
    # Caution: drop_duplicates() compares column values only (the index is ignored),
    # so repeated temperature readings are discarded; omit the call to keep every observation.
    dfs = dict(tuple(TempvsDType.drop_duplicates(inplace=False).groupby('DwellingType')))
    
    def ttest(pair):
        """Run an independent two-sample t-test for one pair of dwelling types and print the result."""
        results = ttest_ind(dfs[pair[0]]['CurrentThermostatTemp'], dfs[pair[1]]['CurrentThermostatTemp'])
        print(f"t-test between {pair[0]} and {pair[1]} is {results[0]} and p-value: {results[1]}")
    
    # Every unordered pair of distinct keys, so no pair is repeated or matched with itself
    all_combinations = list(combinations(dfs.keys(), 2))
    for pair in all_combinations:
        ttest(pair)
    

    Output: t-test between Bungalow and Park Home is 0.2594309721800956 and p-value: 0.7984182890048678
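The same idea can also be wrapped in a function that returns the results instead of only printing them. The following is a minimal sketch, not the answerer's code: the name `pairwise_ttests`, its parameters, the result column names, and the random demo data are all assumptions made for illustration. It assumes the group labels are in the first index level, as in the question's DataFrame.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def pairwise_ttests(df, value_col, equal_var=True):
    """Run ttest_ind on every unordered pair of groups found in df's index.

    Hypothetical helper: returns one row per pair with the t statistic and p-value.
    """
    # Split the column into one Series per label in index level 0
    groups = {name: g[value_col] for name, g in df.groupby(level=0)}
    rows = []
    # combinations() yields each unordered pair once and never pairs a group with itself
    for a, b in combinations(groups, 2):
        stat, pvalue = ttest_ind(groups[a], groups[b], equal_var=equal_var)
        rows.append({"group_a": a, "group_b": b, "t_stat": stat, "p_value": pvalue})
    return pd.DataFrame(rows, columns=["group_a", "group_b", "t_stat", "p_value"])


# Made-up data for illustration only; these are not the asker's measurements
rng = np.random.default_rng(0)
names = ["Bungalow", "Flat", "House", "Maisonette", "Park Home"]
demo = pd.DataFrame(
    {"CurrentThermostatTemp": rng.normal(20, 3, size=50)},
    index=pd.Index(np.repeat(names, 10), name="DwellingType"),
)

results = pairwise_ttests(demo, "CurrentThermostatTemp")
for row in results.itertuples(index=False):
    print(f"t-test between {row.group_a} and {row.group_b} is {row.t_stat} and p-value: {row.p_value}")
```

Returning a DataFrame keeps the printed format from the question available while also letting the results be sorted or filtered by p-value afterwards.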

    【Discussion】:

    • Thank you very much, sir. After removing drop_duplicates(inplace=False), it works exactly as I wanted. Thanks for your help.
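A likely reason the drop_duplicates() call gets in the way is that pandas compares only column values when looking for duplicate rows and ignores the index, where the dwelling-type labels are stored. The toy frame below is made up for illustration (it is not the asker's data) and shows one group losing a valid reading:

```python
import pandas as pd

# Made-up readings; the labels sit in the index, as in the question's DataFrame
toy = pd.DataFrame(
    {"CurrentThermostatTemp": [20.0, 20.0, 20.0, 21.0]},
    index=pd.Index(["Flat", "Flat", "House", "House"], name="DwellingType"),
)

# Only the column values are compared, so House's 20.0 reading is dropped
# as a "duplicate" of Flat's first 20.0 reading
deduped = toy.drop_duplicates()
print(len(toy), len(deduped))  # prints: 4 2

# Moving the label into a column makes it part of the comparison,
# but repeated readings within one group are still removed
deduped_with_label = toy.reset_index().drop_duplicates()
print(len(deduped_with_label))  # prints: 3
```

Either way, repeated temperature readings are real observations, so removing them changes the samples that go into the t-test.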