【Question Title】: Find frequencies of all factor combinations of all column combinations
【Posted】: 2019-10-07 03:48:58
【Question】:

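I have a data frame with n variables, all of which hold factor values. Now I want to select m columns from this data frame (m

I have searched, but I only found how to get the frequencies of factor combinations when specific columns are selected. In my case there can be many column combinations, because m

Here is our data; every variable has factor values.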

company <- data.frame("country" = c("USA", "China", "France", "Germany"),
                    "category" = c("C-corp", "S-corp", "C-corp", "LLC"),
                    "Type" = c("Public", "Private", "Private", "Private"),
                    "Profit" = c("High", "High", "High", "Low"),
                    stringsAsFactors = TRUE)  # needed on R >= 4.0 so the columns are factors

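Now I want to select 2 columns (m = 2) and find the frequencies of the factor combinations for every possible choice of variables.

In this case I could have "country = USA & category = S-corp", "country = USA & category = C-corp", "country = China & category = LLC". But I could also choose other columns and get "country = USA & Profit = Low", "country = China & Type = Public". I want to know the frequencies of all of these combinations.

Edit: my expected output looks something like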

country = USA, category = C-corp  freq 1
country = USA, category = S-corp  freq 0
country = USA, category = LLC  freq 0
country = China, category = LLC  freq 0
country = France, category = C-corp  freq 1
country = USA, Type = Public    freq 1
country = China, Type = Public    freq 0
Type = Private, Profit = High   freq 2
Type = Public, category = LLC  freq 0
Type = Private, Profit = Low freq 1

If I need to select 2 columns, I need every possible combination of columns; the order does not matter.

【Discussion】:

    Tags: r dataframe


    【Solution 1】:

    The combination part sounds like expand.grid():

    expand.grid(company[, 1:2])
    
       country category
    1      USA   C-corp
    2    China   C-corp
    3   France   C-corp
    4  Germany   C-corp
    5      USA   S-corp
    6    China   S-corp
    7   France   S-corp
    8  Germany   S-corp
    9      USA   C-corp
    10   China   C-corp
    11  France   C-corp
    12 Germany   C-corp
    13     USA      LLC
    14   China      LLC
    15  France      LLC
    16 Germany      LLC
    
    # or if you want 4 columns with all countries, do a cross join:
    
    merge(company[, 1, drop = FALSE], company[, -1], by = NULL)
    
    # or if you want 4 columns with all possible results, do expand.grid without subsetting:
    
    expand.grid(company)
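
The expand.grid() output above repeats the C-corp rows because the column values are passed in as they are, duplicates included. A minimal sketch, assuming the columns are factors (hence `stringsAsFactors = TRUE`, which R 4.0 and later no longer apply by default), that expands the distinct levels instead. The name `pairs` is only illustrative.

```r
company <- data.frame(country = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type = c("Public", "Private", "Private", "Private"),
                      Profit = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

# Expand over the distinct levels of each column rather than the raw values,
# so every country/category pair appears exactly once.
pairs <- expand.grid(lapply(company[, 1:2], levels))
nrow(pairs)  # 4 countries x 3 categories
```

This only lists the possible combinations; it does not count how often each one occurs in the data.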
    

    The second part sounds like table(). It can be run directly on the company data.frame:

    table(company)
    
    , , Type = Private, Profit = High
    
             category
    country   C-corp LLC S-corp
      China        0   0      1
      France       1   0      0
      Germany      0   0      0
      USA          0   0      0
    
    , , Type = Public, Profit = High
    
             category
    country   C-corp LLC S-corp
      China        0   0      0
      France       0   0      0
      Germany      0   0      0
      USA          1   0      0
    
    , , Type = Private, Profit = Low
    
             category
    country   C-corp LLC S-corp
      China        0   0      0
      France       0   0      0
      Germany      0   1      0
      USA          0   0      0
    
    , , Type = Public, Profit = Low
    
             category
    country   C-corp LLC S-corp
      China        0   0      0
      France       0   0      0
      Germany      0   0      0
      USA          0   0      0
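
The four-way table above crosses all columns at once, whereas the question asks about every choice of m columns. One possible way to combine the two ideas, sketched here under assumptions: `combn()` enumerates the unordered column choices and `table()` counts each one, zero counts included. The helper name `combo_freqs` and the " x " label separator are hypothetical, not part of the original answer.

```r
company <- data.frame(country = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type = c("Public", "Private", "Private", "Private"),
                      Profit = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

# Hypothetical helper: for every unordered choice of m columns, tabulate all
# level combinations (zero counts included), one data frame per column choice.
combo_freqs <- function(df, m) {
  col_sets <- combn(names(df), m, simplify = FALSE)
  out <- lapply(col_sets, function(cols) as.data.frame(table(df[cols])))
  names(out) <- vapply(col_sets, paste, character(1), collapse = " x ")
  out
}

freqs <- combo_freqs(company, 2)
names(freqs)              # one entry per pair of columns
freqs[["Type x Profit"]]  # long format: Type, Profit, Freq
```

Each element is a long-format data frame, which is close to the "combination, freq" layout the question asks for.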
    

    【Discussion】:

      【Solution 2】:

      You can do this with nested loops over the table function:

      for (j in 1:ncol(company)) {
        for (i in 1:ncol(company)) {
          print(table(company[[j]],
                      company[[i]]))
        }
      }
      

      It's ugly and has a lot of repetition, but it's quick and easy for your purposes.

      【Discussion】:

      • Maybe the inner loop only needs to go up to j? for (i in 1:j) ... ?
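
A sketch of what the comment suggests, with one adjustment that is an assumption on my part: the inner index stops at j - 1 rather than j, so a column is never tabulated against itself and each unordered pair is visited once. The list name `pair_tables` and the " x " labels are illustrative only.

```r
company <- data.frame(country = c("USA", "China", "France", "Germany"),
                      category = c("C-corp", "S-corp", "C-corp", "LLC"),
                      Type = c("Public", "Private", "Private", "Private"),
                      Profit = c("High", "High", "High", "Low"),
                      stringsAsFactors = TRUE)

pair_tables <- list()
for (j in seq_along(company)) {
  # seq_len(j - 1) is empty when j is 1, so the first pass does nothing
  for (i in seq_len(j - 1)) {
    nm <- names(company)[c(i, j)]
    # dnn labels the two table dimensions with the column names
    pair_tables[[paste(nm, collapse = " x ")]] <-
      table(company[[i]], company[[j]], dnn = nm)
  }
}

length(pair_tables)  # number of unordered column pairs
pair_tables[["Type x Profit"]]
```

Storing the tables in a named list instead of printing them makes it possible to look up a specific pair afterwards.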