Answers to: Probit model with panel data

【Question title】: Probit model with panel data
【Posted】: 2021-02-08 18:18:32
【Question description】:

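I am a new R user. I am using version 1.3.1093 on Windows.

I am working with a panel data set (the time variable is the year) of all activist interventions by hedge funds in Europe from 2005 to 2019, so my data are at the firm-year level. I created an ID variable that gives each company a number. For each firm-year I have computed several financial ratios: ROA, EBITDA margin, sales growth, leverage, etc. I also have data on the book-to-market ratio and the log of market value. I want to run a binary probit model that explains hedge fund targeting (targeted = 1, non-targeted = 0) with the variables above, lagged by one year. Here is part of the data set: [enter image description here][1]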

   ï..Company.code Company Targeted T.of.intervation  TRBC  Year Book.to.market Capex.to.sales EBITDA.MARGIN Leverage Ln.of.Mv
             <int> <chr>      <int> <chr>            <int> <int>          <dbl>          <dbl>         <dbl>    <dbl>    <dbl>
 1               1 BALDA ~        0 2006              5110  2005          0.387         0.0816       0.185      0.219     5.65
 2               1 BALDA ~        1 2006              5110  2006          0.554         0.0935      -0.0548     0.426     5.46
 3               1 BALDA ~        1 2006              5110  2007          0.292         0.137       -0.0993     0.337     5.69
 4               1 BALDA ~        1 2006              5110  2008          3.55          0.144       -0.00861    0.263     4.44
 5               2 SUEZ SA        0 2006              5910  2005          0.733         0.0925       0.180      0.445     6.65
 6               2 SUEZ SA        1 2006              5910  2006          1.11          0.0877       0.175      0.417     6.51
 7               2 SUEZ SA        1 2006              5910  2007          0.949         0.0941       0.168      0.526     6.58
 8               2 SUEZ SA        1 2006              5910  2008          0.600         0.0925       0.150      0.551     6.77
 9               3 ASM IN~        0 2007              5710  2006          0.321         0.0449       0.193      0.340     5.93
10               3 ASM IN~        1 2007              5710  2007          0.354         0.0494       0.185      0.260     5.95
# ... with 3,357 more rows, and 7 more variables: Nwc.to.sales <dbl>, ROA <dbl>, Sales.Growth <dbl>, Industrial <int>,
#   NR <int>, Tmt <int>, Consumer <int>

  [1]: https://i.stack.imgur.com/a3nJj.png

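【Comments on the question】:

Could you provide sample data in your question (for example, as a tribble: df <- tribble(~col1, ~col2, ... ~colN, val1-1, val2-1, ..., valN-1, val1-2, val2-2, ..., valN-2, ...))? That would make it much easier to suggest an answer. Be sure to use the "insert code" option (i.e., start each line with four spaces).
Not sure whether my edit is what you meant; please let me know.
Please see the answer below. If you add sample data that can simply be copied and pasted into an IDE (R in this case), you have a much better chance of getting replies with working code. I find the tribble command the most practical, because it is very readable for humans and makes it easy to format data from whatever original source you are using.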

Tags: r

【Solution 1】:

The mechanics of building a logistic regression (LR) model are quite simple. The R stats package supports LR as follows:

library(tidyverse) #this helps import data from example


df <- tribble(
~id, ~Company.code, ~Company, ~Targeted, ~T.of.intervation,  ~TRBC,  ~Year, ~Book.to.market, ~Capex.to.sales,  ~EBITDA.MARGIN, ~Leverage, ~Ln.of.Mv,
1,               1, "BALDA",     0,        2006,              5110,  2005,          0.387,         0.0816,       0.185,      0.219,     5.65,
2,               1, "BALDA",     1,        2006,              5110,  2006,          0.554,         0.0935,      -0.0548,     0.426,     5.46,
3,               1, "BALDA",     1,        2006,              5110,  2007,          0.292,         0.137,       -0.0993,     0.337,     5.69,
4,               1, "BALDA",     1,        2006,              5110,  2008,          3.55,          0.144,       -0.00861,    0.263,     4.44,
5,               2, "SUEZ",      0,        2006,              5910,  2005,          0.733,         0.0925,       0.180,      0.445,     6.65,
6,               2, "SUEZ",      1,        2006,              5910,  2006,          1.11,          0.0877,       0.175,      0.417,     6.51,
7,               2, "SUEZ",      1,        2006,              5910,  2007,          0.949,         0.0941,       0.168,      0.526,     6.58,
8,               2, "SUEZ",      1,        2006,              5910,  2008,          0.600,         0.0925,       0.150,      0.551,     6.77,
9,               3, "ASM",       0,        2007,              5710,  2006,          0.321,         0.0449,       0.193,      0.340,     5.93,
10,              3, "ASM",       1,        2007,              5710,  2007,          0.354,         0.0494,       0.185,      0.260,     5.95
) 

lr <- glm(Targeted ~ Book.to.market + Capex.to.sales + EBITDA.MARGIN + Leverage + Ln.of.Mv,
          family = "binomial", data = df) # this fits (trains) the model (fails for this dataset)

prediction <- predict(lr, df, type = "response") # this applies the model to the data
# note: this example doesn't make sense - I didn't have enough data to make both training and validation
#       datasets - you should keep part of your data and use it instead of `df` in this step
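
The note above about keeping part of the data for validation can be sketched as follows. This is only an illustration under stated assumptions: the data frame `sim` is simulated (the column names mirror the question, the values are made up), and the 70/30 split and the 0.5 cutoff are arbitrary choices, not something the answer prescribes.

```r
# Simulated data: column names mirror the question, values are made up.
set.seed(42)
n <- 500
sim <- data.frame(Leverage = runif(n, 0.1, 0.6),
                  Ln.of.Mv = rnorm(n, mean = 6, sd = 1))
sim$Targeted <- rbinom(n, 1, plogis(-1 + 3 * sim$Leverage - 0.1 * (sim$Ln.of.Mv - 6)))

# Random 70/30 split into training and validation rows.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
valid <- sim[-train_idx, ]

# Fit on the training rows only.
lr_train <- glm(Targeted ~ Leverage + Ln.of.Mv, family = "binomial", data = train)

# Predicted probabilities for rows the model has not seen.
valid$prob <- predict(lr_train, newdata = valid, type = "response")

# Share of validation rows classified correctly at a 0.5 cutoff.
accuracy <- mean((valid$prob > 0.5) == (valid$Targeted == 1))
accuracy
```

With a firm-year panel, a row-wise split puts the same company in both parts, so splitting by company (or by year) may be the more honest check.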

More details can be found in many tutorials, for example here.
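
Since the question asks for a probit model with predictors lagged by one year, here is a minimal sketch of both steps in base R. `family = "binomial"` uses the logit link by default, while `binomial(link = "probit")` requests a probit fit. Everything else in the block is an assumption made for illustration: the `panel` data frame is simulated, and the helper `lag1` and the `.lag1` column names are hypothetical.

```r
# Simulated firm-year panel; all values are made up for illustration only.
set.seed(1)
panel <- expand.grid(Year = 2005:2014, Company.code = 1:40)
n <- nrow(panel)
panel$Leverage       <- runif(n, 0.1, 0.6)
panel$Book.to.market <- rlnorm(n, meanlog = -0.5, sdlog = 0.5)

# Hypothetical helper: shift a vector by one position (first element becomes NA).
lag1 <- function(x) c(NA, head(x, -1))

# Sort by firm and year, then lag within each firm so that values never
# leak from one company into the next.
panel <- panel[order(panel$Company.code, panel$Year), ]
panel$Leverage.lag1       <- ave(panel$Leverage,       panel$Company.code, FUN = lag1)
panel$Book.to.market.lag1 <- ave(panel$Book.to.market, panel$Company.code, FUN = lag1)

# Simulated 0/1 outcome driven by the lagged predictors; the first year of
# each firm has no lag and therefore stays NA.
latent <- -0.5 + 1.5 * panel$Leverage.lag1 - 0.4 * panel$Book.to.market.lag1
ok <- !is.na(latent)
panel$Targeted <- NA_integer_
panel$Targeted[ok] <- rbinom(sum(ok), 1, pnorm(latent[ok]))

# Pooled probit: same glm() call as for logistic regression, with a probit link.
# Rows with NA lags are dropped by glm()'s default NA handling.
probit_fit <- glm(Targeted ~ Leverage.lag1 + Book.to.market.lag1,
                  family = binomial(link = "probit"), data = panel)
summary(probit_fit)
```

This is a pooled probit that treats all firm-years as independent observations. If firm-level effects matter, a mixed model such as `lme4::glmer()` with a `(1 | Company.code)` term and the same probit family is one possible extension.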

【Discussion】: