【Posted】: 2020-10-12 09:17:10
【Question】:
I have the following DataFrame, and for each ID I am running a t-test between all of its weekday days and all of its weekend days.
> +-----+------------+-----------+---------+-----------+
> | id  | usage_day  | dow       | tow     | daily_avg |
> +-----+------------+-----------+---------+-----------+
> | c96 | 01/09/2020 | Tuesday   | week    | 393.07    |
> | c96 | 02/09/2020 | Wednesday | week    | 10.38     |
> | c96 | 03/09/2020 | Thursday  | week    | 429.35    |
> | c96 | 04/09/2020 | Friday    | week    | 156.20    |
> | c96 | 05/09/2020 | Saturday  | weekend | 346.22    |
> | c96 | 06/09/2020 | Sunday    | weekend | 106.53    |
> | c96 | 08/09/2020 | Tuesday   | week    | 194.74    |
> | c96 | 10/09/2020 | Thursday  | week    | 66.30     |
> | c96 | 17/09/2020 | Thursday  | week    | 163.84    |
> | c96 | 18/09/2020 | Friday    | week    | 261.81    |
> | c96 | 19/09/2020 | Saturday  | weekend | 410.30    |
> | c96 | 20/09/2020 | Sunday    | weekend | 266.28    |
> | c96 | 23/09/2020 | Wednesday | week    | 346.18    |
> | c96 | 24/09/2020 | Thursday  | week    | 20.67     |
> | c96 | 25/09/2020 | Friday    | week    | 222.23    |
> | c96 | 26/09/2020 | Saturday  | weekend | 449.84    |
> | c96 | 27/09/2020 | Sunday    | weekend | 438.47    |
> | c96 | 28/09/2020 | Monday    | week    | 10.44     |
> | c96 | 29/09/2020 | Tuesday   | week    | 293.59    |
> | c96 | 30/09/2020 | Wednesday | week    | 194.49    |
> +-----+------------+-----------+---------+-----------+
My script is below. Unfortunately it is too slow, and it is not the pandas way of doing things. How can I do this more efficiently?
from scipy.stats import ttest_ind

p_val = []
stat_flag = []
all_ids = df.id.unique()
alpha = 0.05
print(len(all_ids))
for cur_id in all_ids:
    # Filter the full frame once per id (this repeated scan is the slow part).
    sub = df[df.id == cur_id]
    group1 = sub[sub.tow == 'week']
    group2 = sub[sub.tow == 'weekend']
    # equal_var=False selects Welch's t-test (unequal variances).
    t_stat, p_value_ttest = ttest_ind(group1.daily_avg, group2.daily_avg, equal_var=False)
    p_val.append(p_value_ttest)
    # Flag the id as significant when p < alpha.
    stat_flag.append(1 if p_value_ttest < alpha else 0)
p_val holds the p-value for each id, and stat_flag marks whether it falls below alpha.
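One possible direction, offered as a sketch rather than a definitive answer: compute the mean, sample standard deviation, and count for every (id, tow) pair in a single groupby pass, then hand those summary arrays to scipy's ttest_ind_from_stats, which evaluates the Welch test for all ids at once. The function name per_id_welch and the output column layout are assumptions made for illustration; the sample rows are the first six rows of the table above.

```python
import pandas as pd
from scipy.stats import ttest_ind, ttest_ind_from_stats


def per_id_welch(df, alpha=0.05):
    """Welch t-test (week vs. weekend) for every id, without a Python loop.

    Hypothetical helper: assumes columns 'id', 'tow' and 'daily_avg',
    and that 'tow' takes the values 'week' and 'weekend'.
    """
    # One pass over the data: mean, sample std (ddof=1) and count per (id, tow).
    stats = (
        df.groupby(["id", "tow"])["daily_avg"]
        .agg(["mean", "std", "count"])
        .unstack("tow")
    )
    # Vectorized Welch test over all ids; inputs are plain NumPy arrays.
    _, p = ttest_ind_from_stats(
        mean1=stats[("mean", "week")].to_numpy(),
        std1=stats[("std", "week")].to_numpy(),
        nobs1=stats[("count", "week")].to_numpy(),
        mean2=stats[("mean", "weekend")].to_numpy(),
        std2=stats[("std", "weekend")].to_numpy(),
        nobs2=stats[("count", "weekend")].to_numpy(),
        equal_var=False,
    )
    out = pd.DataFrame({"p_val": p}, index=stats.index)
    # A NaN p-value (e.g. a group with fewer than two rows) is flagged 0.
    out["stat_flag"] = (out["p_val"] < alpha).astype(int)
    return out


# First six rows of the table in the question.
sample = pd.DataFrame({
    "id": ["c96"] * 6,
    "tow": ["week", "week", "week", "week", "weekend", "weekend"],
    "daily_avg": [393.07, 10.38, 429.35, 156.20, 346.22, 106.53],
})
result = per_id_welch(sample)

# Cross-check against the direct per-group call used in the original loop.
week = sample.loc[sample.tow == "week", "daily_avg"]
weekend = sample.loc[sample.tow == "weekend", "daily_avg"]
_, p_direct = ttest_ind(week, weekend, equal_var=False)
print(result)
print(p_direct)
```

The groupby pass replaces the repeated df[df.id == id] filter, which rescans the whole frame once per id. Because pandas' std defaults to ddof=1, the summary statistics match what ttest_ind computes internally, so the p-values should agree with the loop version up to floating-point error.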
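【Comments】:
-
Don't post screenshots; copy and paste the data instead. Images are a very poor medium for numeric data. Please also read minimal reproducible example to better understand why a working sample of the input data matters.
-
Thanks for the information! I have replaced the image with the data.
Tags: python pandas dataframe t-test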