【Question title】: Error when Identifying Effects of Causal Model
【Posted】: 2022-01-14 22:57:23
【Question】:

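I am trying to use DoWhy's CausalModel together with the EconML library to determine the effect of a variable on the different scenarios shown in the following dataset:

First, I import the following libraries: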

import pandas as pd
import econml
import dowhy
from dowhy import CausalModel

Then I load the dataset with pandas read_csv and name it "df".

After that, I define the causal model as follows:

model = CausalModel(data=df.fillna(0),
                    treatment='ai_host.disk.write.bytes',
                    outcome='scenario',
                    common_causes='col'
                    )

model.view_model()

This is the output:

Then I identify the estimand:

identified_estimand= model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

The output is as follows:

Estimand type: nonparametric-ate

### Estimand : 1
Estimand name: backdoor
Estimand expression:
             d                                        
───────────────────────────(Expectation(scenario|col))
d[ai_host.disk.write.bytes]                           
Estimand assumption 1, Unconfoundedness: If U→{ai_host.disk.write.bytes} and U→scenario then P(scenario|ai_host.disk.write.bytes,col,U) = P(scenario|ai_host.disk.write.bytes,col)

### Estimand : 2
Estimand name: iv
No such variable found!

### Estimand : 3
Estimand name: frontdoor
No such variable found!

After this, I finally try to estimate the causal effect:

identified_estimand_experiment = model.identify_effect(proceed_when_unidentifiable=True)

from sklearn.ensemble import RandomForestRegressor
metalearner_estimate = model.estimate_effect(identified_estimand_experiment,
method_name="backdoor.econml.metalearners.TLearner",
confidence_intervals=False,
method_params={
     "init_params":{'models': RandomForestRegressor()},
     "fit_params":{}
              })
print(metalearner_estimate)

But I get the following error every time:

ValueError                                Traceback (most recent call last)
<ipython-input-15-6f34377dbe77> in <module>()
      8 method_params={
      9      "init_params":{'models': RandomForestRegressor()},
---> 10      "fit_params":{}
     11               })
     12 print(metalearner_estimate)

7 frames
/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    140                         " during transform".format(diff, i)
    141                     )
--> 142                     raise ValueError(msg)
    143                 else:
    144                     if warn_on_unknown:

ValueError: Found unknown categories [0] in column 0 during transform

Could someone please help me understand and fix this error? Also note that to use EconML you need Python 3.8 or lower.
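
The last frame of the traceback is in scikit-learn's `_encoders.py`, so the message itself can be reproduced without DoWhy or EconML. The sketch below is a minimal illustration, assuming an encoder that was fitted on values that do not include 0 and is then asked to encode 0. The sample values are made up, the helper `try_transform` is hypothetical, and whether this is exactly what happens inside the TLearner estimator is an assumption rather than something the traceback confirms.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit the encoder on made-up, treatment-like values that do NOT include 0.
encoder = OneHotEncoder(handle_unknown="error")
encoder.fit(np.array([[1.5], [2.0], [3.7]]))


def try_transform(value):
    """Return the error message raised when encoding `value`, or None on success."""
    try:
        encoder.transform(np.array([[value]]))
        return None
    except ValueError as exc:
        return str(exc)


# 0 was never seen during fit, so the encoder rejects it.
message = try_transform(0)
print(message)

# A value that was seen during fit is encoded without error.
print(try_transform(2.0))
```

If the same mechanism is at play in the question, it may be worth checking which distinct values the treatment column actually contains after `fillna(0)`.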

【Comments】:

    Tags: python pandas causality causalml


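    【Solution 1】:

    I ran into this problem too, but when I used a linear regression estimator instead of the random forest meta-learner, I had no issues.

    This means replacing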

    identified_estimand_experiment = model.identify_effect(proceed_when_unidentifiable=True)
    
    from sklearn.ensemble import RandomForestRegressor
    metalearner_estimate = model.estimate_effect(identified_estimand_experiment,
    method_name="backdoor.econml.metalearners.TLearner",
    confidence_intervals=False,
    method_params={
                   "init_params":{'models': RandomForestRegressor()},
                   "fit_params":{}
                   })
    print(metalearner_estimate)

    with

    identified_estimand_experiment = model.identify_effect(proceed_when_unidentifiable=True)
    
    linreg_estimate = model.estimate_effect(identified_estimand_experiment,
                                method_name="backdoor.linear_regression",
                                confidence_intervals=False)
    print(linreg_estimate)
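
For intuition about what a backdoor linear regression estimate represents: the outcome is regressed on the treatment together with the common causes, and the coefficient on the treatment is read as the effect. The sketch below illustrates that idea with NumPy only, on synthetic data. It is not DoWhy's implementation, and the variable names and coefficients (a true effect of 3.0) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: the confounder drives both treatment and outcome.
confounder = rng.normal(size=n)
treatment = 2.0 * confounder + rng.normal(size=n)
outcome = 3.0 * treatment + 1.5 * confounder + rng.normal(size=n)

# Backdoor adjustment: regress outcome on intercept, treatment and confounder.
X_adjusted = np.column_stack([np.ones(n), treatment, confounder])
coef_adjusted, *_ = np.linalg.lstsq(X_adjusted, outcome, rcond=None)
adjusted_effect = coef_adjusted[1]

# Naive regression that ignores the confounder, for comparison.
X_naive = np.column_stack([np.ones(n), treatment])
coef_naive, *_ = np.linalg.lstsq(X_naive, outcome, rcond=None)
naive_effect = coef_naive[1]

print("adjusted estimate:", adjusted_effect)
print("naive estimate:", naive_effect)
```

On this synthetic data the adjusted coefficient should land close to the simulated effect, while the naive one is biased upward because the confounder is left out.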
    

    Other methods, such as

    method_name = "backdoor.propensity_score_stratification"

    or

    method_name = "backdoor.propensity_score_matching"

    may also be of interest.
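
One caveat, as far as I know: DoWhy's propensity-score estimators expect a binary treatment, whereas the treatment in the question looks continuous. The sketch below shows one possible preprocessing step, a median split with pandas. The threshold choice, the new column name `high_write`, and the toy values are all assumptions made for illustration and do not come from the original post.

```python
import pandas as pd

# Toy values standing in for the real treatment column (made up).
df = pd.DataFrame(
    {"ai_host.disk.write.bytes": [0.0, 120.0, 4096.0, 80.0, 9000.0, 300.0]}
)

# Median split: rows above the median count as "treated".
threshold = df["ai_host.disk.write.bytes"].median()
df["high_write"] = df["ai_host.disk.write.bytes"] > threshold

print(threshold)
print(df)
```

The boolean column could then be passed as `treatment='high_write'` when building the CausalModel. A median split discards information about the size of the treatment, so the resulting estimate answers a coarser question than the original one.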

    【Comments】:
