Matching observations based on a single variable or multiple variables on a single data set - Stata answers

【Question title】: Matching observations based on a single variable or multiple variables on a single data set - stata
【Posted】: 2022-06-17 06:18:56
【Question description】:

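For my thesis, I need to match observations on an index variable that measures household conditions, on individual variables such as age, sex, and education, and on year. The household index is numeric (from 0 to 103), and the individual characteristics are dummy or categorical variables. For my analysis, I need to pair each observation with the most similar one on these variables. It is a kind of nearest-neighbor matching, but without a control or treatment group.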

The data set looks like this.

indice_hogar anio mes directorio orden mujer nivel__educativo_cat trabaja
0 2018 08 4700731 1 1 4 1
0 2018 08 4700731 2 0 5 1
0 2018 11 4777752 1 0 5 1
37 2018 04 4605803 1 0 3 1
42 2011 07 2735691 1 1 4 1
42 2018 02 4545459 1 0 3 1
43 2018 12 4803694 1 0 5 1
44 2018 10 4747974 1 0 5 1
46 2018 05 4610096 1 0 3 1
47 2018 04 4598828 1 1 1 0
47 2018 08 4687722 1 0 1 0
48 2018 04 4592941 1 0 5 0
48 2018 06 4636177 1 0 3 1
50 2018 06 4645892 1 0 1 1
50 2018 06 4645892 2 1 4 1

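To clarify: the IV I am using is the ability of the person who is most similar in terms of the index and the individual characteristics. That means I need to find the observation most similar to, say, person A, take the ability of that match, and use it in a regression. If anyone knows how to do this, it would be a great help.

I was not able to write the code myself.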

【Comments on the question】:

Cross-posted at statalist.org/forums/forum/general-stata-discussion/general/… , which is a more appropriate place.

Tags: compare stata matching

【Solution 1】:

Duplicate your data set, and use teffects nnmatch to match the first copy against the second.

* Duplicate the data set
gen byte treat = 1
gen nobs = _N
save temp, replace
replace treat = 0
append using temp

* Make a fake outcome variable to keep nnmatch happy
gen byte outcome = runiform()<.5

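* teffects nnmatch performs a nearest-neighbor match; gen(nnid) stores the
* observation numbers of the matched cases in nnid1, nnid2, ...
* (the data listing spells the education variable nivel__educativo_cat;
* use the name that exists in your data set)
teffects nnmatch (outcome indice_hogar nivel_educativo_cat trabaja) (treat), gen(nnid)

* Unduplicate the data set
keep if treat == 0

* Change nnid1 (and any further nnid# variables) to point to the 1st copy
* of the data set, not the 2nd.
* Caution: the two copies are identical, so each row's closest match can be
* its own copy (distance 0); check for nnid1 == _n and, if needed, request
* more neighbors with nneighbor() and skip the self-match.
replace nnid1 = nnid1 - nobs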
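
Once each row holds the observation number of its match, the matched person's value can be pulled in with explicit subscripting. The sketch below is only an illustration under stated assumptions: `ability` is a hypothetical name for the variable the question wants to borrow from the match, `nnid1` is assumed to be the match index after it has been shifted back to the first copy, and the three rows are made-up toy data rather than the asker's data set.

```stata
* Toy data standing in for the matched data set (values are made up).
* ability : hypothetical variable to borrow from the matched observation
* nnid1   : assumed to hold the row number of each observation's match
clear
input ability nnid1
10 3
20 1
30 2
end

* Explicit subscripting: for each row, read ability from row nnid1
gen matched_ability = ability[nnid1]

list ability nnid1 matched_ability
```

This only works while the rows are still in the order they had when the match index was created, so it should be run before any sort. The resulting `matched_ability` could then be used as the instrument in the regression the question describes.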

【Comments】: