【Question Title】: Finding the length of each string within a column of a data-frame in R
【Posted】: 2015-05-11 13:25:23
【Question Description】:

I want to count the number of characters in each string of the name column. My data frame, sample, looks like this:

date        name           expenditure      type
23MAR2013   KOSH ENTRP     4000             COMPANY
23MAR2013   JOHN DOE       800              INDIVIDUAL
24MAR2013   S KHAN         300              INDIVIDUAL
24MAR2013   JASINT PVT LTD 8000             COMPANY
25MAR2013   KOSH ENTRPRISE 2000             COMPANY
25MAR2013   JOHN S DOE     220              INDIVIDUAL
25MAR2013   S KHAN         300              INDIVIDUAL
26MAR2013   S KHAN         300              INDIVIDUAL

Why does nchar give me a list of seemingly random numbers? str_length() from the stringr package does the same.

Length <- aggregate(nchar(sample$name), by=list(sample$name), FUN=nchar)

Output:

         Group.1       x
1 JASINT PVT LTD       2
2       JOHN DOE       1
3     JOHN S DOE       2
4     KOSH ENTRP       2
5 KOSH ENTRPRISE       2
6         S KHAN 1, 1, 1

Desired output:

     Group.1       x
1 JASINT PVT LTD       14
2       JOHN DOE       8
3     JOHN S DOE       10
4     KOSH ENTRP       10
5 KOSH ENTRPRISE       14
6         S KHAN       6

CSV of the table above:

"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"

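【Comments】:

  • Do you need to include the spaces in the count as well? The character counts in the expected output are inconsistent. For example, in the first row the spaces are counted, but in the last row, with 5, the space is omitted, unless that is a typo.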

Tags: r dataframe string-length


【Solution 1】:

You can also apply nchar to your data frame and take the result from the corresponding column:

data.frame(names=temp$name,chr=apply(temp,2,nchar)[,2])
      names chr
1     KOSH ENTRP  10
2       JOHN DOE   8
3         S KHAN   6
4 JASINT PVT LTD  14
5 KOSH ENTRPRISE  14
6     JOHN S DOE  10
7         S KHAN   6
8         S KHAN   6
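
Because nchar is vectorised, it can also be called on the column directly, without going through apply. The following is a minimal sketch, not the answer's own code: the construction of temp is an assumption (the answer does not show how it was created), and as.character() is added only as a guard in case name was read in as a factor.

```r
# Rebuild part of the question's data (assumed; the answer does not show
# how temp was created)
temp <- data.frame(
  name = c("KOSH ENTRP", "JOHN DOE", "S KHAN", "JASINT PVT LTD",
           "KOSH ENTRPRISE", "JOHN S DOE", "S KHAN", "S KHAN"),
  expenditure = c(4000, 800, 300, 8000, 2000, 220, 300, 300),
  stringsAsFactors = FALSE
)

# nchar() works element-wise on a character vector;
# as.character() guards against a factor column
temp$chr <- nchar(as.character(temp$name))

temp[, c("name", "chr")]
```

Spaces are counted as characters here, which matches the counts in the output above.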

【Discussion】:

    【Solution 2】:

    If the last row in "Desired Output" is a typo,

     aggregate(name~name1, transform(sample, name1=name),
                             FUN=function(x) nchar(unique(x)))
     #            name1 name
     #1 JASINT PVT LTD   14
     #2       JOHN DOE    8
     #3     JOHN S DOE   10
     #4     KOSH ENTRP   10
     #5 KOSH ENTRPRISE   14
     #6         S KHAN    6
    

    Or

     Un1 <- unique(sample$name)
     data.frame(Group=Un1, x=nchar(Un1))
    

    【Discussion】:

      【Solution 3】:

      Or, using data.table:

      dtx[,PepSeqLen := nchar(PepSeq)]
      

      【Discussion】:

      • Error message: 'nchar()' requires a character vector
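
One likely cause of the error in the comment above is that the column is a factor, which nchar refuses to accept. The sketch below uses base R only, and pep is a hypothetical factor standing in for the column; converting with as.character() first avoids the error, and the same conversion can be placed inside the data.table expression.

```r
# Hypothetical factor column (assumed for illustration)
pep <- factor(c("KOSH ENTRP", "S KHAN"))

# nchar() raises an error for factors; capture the message instead of stopping
msg <- tryCatch(nchar(pep), error = function(e) conditionMessage(e))
msg

# Converting to character first gives the string lengths
lens <- nchar(as.character(pep))
lens
```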