Fixed effects model with country-industry-time variables: answers

【Question title】: Fixed effects model with country-industry-time variables
【Posted】: 2021-12-24 01:28:20
【Question】:

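I am studying how country and industry factors (e.g., GDP, imports, exports) affect wage differentials. I have collected industry-level data for 75 countries and 19 industries over 5 years, and I am trying to analyze it with a fixed effects model.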

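How can I make R treat the dataset as panel data indexed by industry (within country) and time? I have learned that the following code does this in Stata. Is there equivalent code in R?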

egen country_industry = group(country industry)
xtset country_industry time

I tried the following code in R, but it did not work:

library(plm)
panel8 =pdata.frame(sampledata7_industry, index=c("id","industry","year"))

It produced the following warning:

> library(plm)
> panel8 =pdata.frame(sampledata7_industry, index=c("id","industry","year"))
Warning message:
In pdata.frame(sampledata7_industry, index = c("id", "industry",  :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
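The warning means some (individual, time) pairs occur more than once. A minimal base-R sketch for spotting them before calling pdata.frame(), assuming one row per country-industry-year with columns named id, industry and year as in the question; the toy data below is hypothetical and only mimics that layout.

```r
# Hypothetical toy data: 2 countries x 2 industries x 2 years,
# one row per country-industry-year, like sampledata7_industry
toy <- expand.grid(id = 1:2, industry = c("A", "B"), year = 2015:2016,
                   stringsAsFactors = FALSE)
toy$wage_gap <- seq_len(nrow(toy))

# If "id" alone is the individual index, every (id, year) pair
# appears once per industry, i.e. it is duplicated
dup_counts <- table(toy$id, toy$year)
print(dup_counts)  # each cell counts the rows sharing that (id, year)

# Rows whose (id, year) pair already appeared earlier in the data
dups <- toy[duplicated(toy[c("id", "year")]), ]
print(nrow(dups))

# Adding industry to the key makes every row unique
print(anyDuplicated(toy[c("id", "industry", "year")]))  # 0 = no duplicates
```

If the last check returns 0 on the real data, a single identifier built from id and industry is enough to make each (individual, year) pair unique.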

The first few rows of my data are shown here: sampledata7_industry

Should I renumber the IDs so that each one identifies a country-industry pair? One idea is as follows:

Re-numbering of sampledata7_industry

【Discussion】:

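Please provide data, at least a few rows, so we can reproduce the problem. You can post a minimal dataset using the function dput().
@Bloxx Thanks for your comment. As you suggested, I have added the first few rows of my dataset "sampledata7_industry" above.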
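A small sketch of what the comment asks for: dput() prints R code that recreates an object exactly, so its output can be pasted into a post. The data frame below is a hypothetical stand-in, since sampledata7_industry itself is not available here.

```r
# Hypothetical stand-in for the first rows of sampledata7_industry
df <- data.frame(id = c(1L, 1L, 2L), industry = c("A", "B", "A"),
                 year = c(2015L, 2015L, 2015L), gdp = c(1.5, 1.5, 2.3),
                 stringsAsFactors = FALSE)

# dput() writes R code that rebuilds the object; paste this text into the post.
# For the real data, dput(head(sampledata7_industry, 10)) keeps it short.
dput(df)

# Readers can evaluate that text to get an identical copy
code <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = code))
```

Posting text this way (rather than a screenshot) lets others run the pdata.frame() call on exactly the same rows.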

Tags: r panel-data plm

【Solution 1】:

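I think the problem is that in Stata your individual (grouping) variable is the combined country-industry identifier, whereas in R you are trying to use two separate variables, country and industry. According to the pdata.frame documentation: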

The index argument indicates the dimensions of the panel. It can be:

• a vector of two character strings which contains the names of the individual and of the time indexes

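So if you combine 'id' and 'industry' into a single variable, as you did in Stata, and use that variable together with 'year' as the index, it should work.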
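A minimal sketch of this suggestion, assuming the columns are named id (country), industry and year as in the question; the toy data and the new column name country_industry are hypothetical. interaction(..., drop = TRUE) creates one factor level per observed country-industry pair, which plays the role of Stata's egen ... group(). As far as I can tell from the plm documentation, a three-element index is read as (individual, time, group), so the original call probably used "industry" as the time index, which would explain the "duplicate couples (id-time)" warning. The plm step only runs if plm is installed.

```r
# Hypothetical toy panel: 3 countries x 2 industries x 3 years
set.seed(1)
dat <- expand.grid(id = 1:3, industry = c("manuf", "mining"), year = 2015:2017,
                   stringsAsFactors = FALSE)
dat$wage_gap <- rnorm(nrow(dat))

# One identifier per country-industry pair, analogous to Stata's
#   egen country_industry = group(country industry)
dat$country_industry <- interaction(dat$id, dat$industry, drop = TRUE)

# Each (country_industry, year) pair should now occur exactly once
stopifnot(anyDuplicated(dat[c("country_industry", "year")]) == 0)

# Two-element index: individual = country_industry, time = year
if (requireNamespace("plm", quietly = TRUE)) {
  panel8 <- plm::pdata.frame(dat, index = c("country_industry", "year"))
  print(plm::pdim(panel8))  # panel dimensions: 6 individuals, 3 periods
}
```

With the panel declared this way, country-industry fixed effects can be requested through plm()'s model = "within" argument; which regressors to include is up to your specification.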

【Discussion】:

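Thank you for your answer. I tried to combine "id" and "industry" with the following code, but I don't know what to do next. I would appreciate it if you could show me the code for the following steps. library(dplyr); panel = group_by(sampledata7_industry, id, industry)
Or should I renumber the IDs to combine country and industry? I posted the first few rows of this updated dataset above as "Re-numbering of sampledata7_industry".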
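A possible continuation of the dplyr attempt, offered as a sketch: group_by() only attaches grouping information and does not create a new column, so one option is to number the groups with cur_group_id() (available in dplyr 1.0.0 and later). The toy data and the column name country_industry are hypothetical; the result is effectively the renumbered ID asked about.

```r
library(dplyr)

# Hypothetical toy data with the question's column names
toy <- expand.grid(id = 1:2, industry = c("A", "B"), year = 2015:2016,
                   stringsAsFactors = FALSE)

# group_by() alone adds no column; cur_group_id() numbers each
# id-industry group 1, 2, ..., like Stata's egen ... = group(country industry)
panel <- toy %>%
  group_by(id, industry) %>%
  mutate(country_industry = cur_group_id()) %>%
  ungroup()

print(panel)
```

The new column can then be passed to pdata.frame() as the individual index, e.g. index = c("country_industry", "year").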