Question title: How to Simply Create New Variable Based on Ranges of Another
Posted: 2015-09-24 16:09:45
Question:

Suppose I have a continuous variable var1:

clear
set obs 1000
gen var1 = runiform()
sum var1

Now I want to create var2 based on ranges of var1. I can do this:

gen var2 = "Lowest" if var1<.25
replace var2 = "Low" if var1>=.25 & var1<.5
replace var2 = "High" if var1>=.5 & var1<.75
replace var2 = "Highest" if var1>=.75

I would like to be able to do this in one line. Pseudocode:

gen var2 = (ranges(0 .25 .5 .75 1) values("Lowest" "Low" "High" "Highest"))

A way to do something very similar in R with cut can be found at Create categorical variable in R based on range.

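Is there any command in Stata that does something similar to the R version? Imagine there were 10,000 ranges that had to go into var2; a better approach would then help a lot.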

Another way to do this on one line in Stata, which is clunky, can be found at http://www.stata.com/support/faqs/data-management/multiple-operations/:

generate var2 = cond(var1<=.25, "Lowest", cond(var1<=.50, "Low", cond(var1<=.75, "High", cond(var1<=1.00, "Highest", ""))))

Is there a better way?

Tags: variables stata


    Answer 1:

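    cond() is the clunky function alluded to. See var3 below for an example. It has the signal advantage that you spell out the inequalities explicitly in your code and get exactly what you want, which is not true of egen, cut().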

    In this particular example at least one more trick is possible. See var4 below for what it is.

    . clear
    
    . set obs 15
    number of observations (_N) was 0, now 15
    
    . set seed 2803 
    
    . gen var1 = runiform()
    
    . sort var1 
    
    . gen var2 = "Lowest" if var1<.25
    (9 missing values generated)
    
    . replace var2 = "Low" if var1>=.25 & var1<.5
    (4 real changes made)
    
    . replace var2 = "High" if var1>=.5 & var1<.75
    (2 real changes made)
    
    . replace var2 = "Highest" if var1>=.75
    variable var2 was str6 now str7
    (3 real changes made)
    
    . gen var3 = cond(var1 < .25, "Lowest", cond(var1 <.5, "Low", cond(var1 <.75, "
    > High", "Highest"))) 
    
    . gen var4 = word("Lowest Low High Highest", ceil(4 * var1)) 
    
    . list 
    
         +----------------------------------------+
         |     var1      var2      var3      var4 |
         |----------------------------------------|
      1. | .0200225    Lowest    Lowest    Lowest |
      2. | .0360774    Lowest    Lowest    Lowest |
      3. | .0934085    Lowest    Lowest    Lowest |
      4. | .0950848    Lowest    Lowest    Lowest |
      5. | .1040797    Lowest    Lowest    Lowest |
         |----------------------------------------|
      6. | .1795591    Lowest    Lowest    Lowest |
      7. | .3326341       Low       Low       Low |
      8. | .3383934       Low       Low       Low |
      9. | .3870576       Low       Low       Low |
     10. | .3980427       Low       Low       Low |
         |----------------------------------------|
     11. | .6264514      High      High      High |
     12. | .6305373      High      High      High |
     13. | .7739685   Highest   Highest   Highest |
     14. | .7935746   Highest   Highest   Highest |
     15. | .9243789   Highest   Highest   Highest |
         +----------------------------------------+
    
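One detail of the var4 trick: ceil(4 * var1) puts a value of exactly .25 into "Lowest", whereas var2 puts it into "Low". With runiform() data such ties are unlikely, but other data may hit the cut points exactly. The sketch below (var4b and var5 are hypothetical names, not from the answer) shows two variations: floor(4 * var1) + 1, which reproduces var2's left-closed intervals [a, b), and irecode(), which gives right-closed intervals (a, b] like the cond() line in the question and accepts unequally spaced cut points.

```stata
* Hypothetical illustration: var4b and var5 are new names, not from the answer
clear
set obs 1000
set seed 2803
gen var1 = runiform()

* Reference coding with left-closed intervals [a, b), as in var2 above
gen var2 = cond(var1 < .25, "Lowest", cond(var1 < .5, "Low", cond(var1 < .75, "High", "Highest")))

* floor() + 1 also yields left-closed intervals; exactly .25 maps to 2, i.e. "Low"
gen var4b = word("Lowest Low High Highest", floor(4 * var1) + 1)

* irecode() yields right-closed intervals (a, b]: 0 if var1 <= .25,
* 1 if .25 < var1 <= .5, and so on; the cut points need not be equally spaced
gen var5 = word("Lowest Low High Highest", irecode(var1, .25, .5, .75) + 1)

list in 1/5
```

Since irecode() takes its cut points as a typed argument list, it suits a handful of ranges, not thousands.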

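    However, if you really do have 10,000 ranges to specify, and they don't boil down to some simple rule, then naturally you would use neither approach. You should put the ranges in a file and use some merge-based code.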
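A closely related technique for many irregular ranges, sketched here under assumptions, stacks the lower bound of each range on top of the data with append, sorts, and carries each label forward. This is an append-and-sort approach rather than merge proper. The variable names lower, cutlabel, is_data and key are hypothetical, and the four-row input table stands in for a file that would hold all the ranges.

```stata
* Hypothetical lookup table: lower bound of each range and its label.
* In practice this would be read from a file holding all the ranges.
clear
input double lower str7 cutlabel
0   "Lowest"
.25 "Low"
.5  "High"
.75 "Highest"
end
tempfile cuts
save `cuts'

* Data to classify
clear
set obs 1000
set seed 2803
gen double var1 = runiform()
gen byte is_data = 1

* Stack the breakpoints with the data; breakpoint rows get is_data = 0
append using `cuts'
replace is_data = 0 if missing(is_data)
gen double key = cond(is_data, var1, lower)

* At ties a breakpoint sorts before data, so each range is [lower, next lower)
sort key is_data

* Carry the most recent label forward; replace works row by row,
* so var2[_n-1] already holds the filled-in value
gen str7 var2 = cutlabel
replace var2 = var2[_n-1] if is_data == 1

keep if is_data == 1
drop lower cutlabel is_data key
```

If the ranges lie on a regular grid, computing an integer bin with floor() and using merge m:1 on that bin against a lookup file is an alternative.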


      Answer 2:

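      Stata does have a cut function, as part of the egen command. Using its options, then defining and attaching value labels, gets the desired result: var3 is a numeric variable whose value labels display as the wanted text. That takes three lines rather than one, but they are fairly concise lines. For example,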

      clear
      set obs 15
      gen var1 = runiform()
      sum var1
      
      gen var2 = "Lowest" if var1<.25
      replace var2 = "Low" if var1>=.25 & var1<.5
      replace var2 = "High" if var1>=.5 & var1<.75
      replace var2 = "Highest" if var1>=.75
      
      // =======================================================
      // Using egen , cut()
      // =======================================================
      label define rank 0 "Lowest" 1 "Low" 2 "High" 3 "Highest"
      egen var3 = cut(var1) , at(0(.25)1) icodes
      label values var3 rank
      
      li
      

      Result:

           +------------------------------+
           |     var1      var2      var3 |
           |------------------------------|
        1. | .6658295      High      High |
        2. | .3690664       Low       Low |
        3. | .5983131      High      High |
        4. | .2658775       Low       Low |
        5. | .1211114    Lowest    Lowest |
           |------------------------------|
        6. | .2296222    Lowest    Lowest |
        7. | .7229139      High      High |
        8. | .2501513       Low       Low |
        9. | .7775574   Highest   Highest |
       10. | .2839603       Low       Low |
           |------------------------------|
       11. | .8396428   Highest   Highest |
       12. | .4838379       Low       Low |
       13. | .2610629       Low       Low |
       14. | .3855471       Low       Low |
       15. | .3447088       Low       Low |
           +------------------------------+
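As a check on interval conventions: egen, cut() puts a value equal to a cut point into the interval that starts there, [a, b), which matches var2. Since var3 is a labelled numeric variable rather than a string, decode can make a string copy if one is needed. A minimal sketch with a fixed seed (var3str is a hypothetical name):

```stata
clear
set obs 15
set seed 2803
gen var1 = runiform()

* String coding with left-closed intervals [a, b), as in var2 above
gen var2 = cond(var1 < .25, "Lowest", cond(var1 < .5, "Low", cond(var1 < .75, "High", "Highest")))

label define rank 0 "Lowest" 1 "Low" 2 "High" 3 "Highest"
egen var3 = cut(var1), at(0(.25)1) icodes
label values var3 rank

* decode turns the labelled integer codes into a string variable
decode var3, generate(var3str)
list var1 var2 var3 var3str
```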
      
