【Question title】: How to count the occurrences of "c(\" in a string in a data frame in R?
【Posted】: 2021-12-29 23:34:31
【Question】:

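I have a data frame in which some columns contain error and warning messages from Mplus. The text was saved in an odd format, so rather than parsing each message I want to count the messages simply by counting the occurrences of c(\ in each cell, since that is the only character combination that appears before every warning or error.

For example, one cell contains the messages: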

[[1]]
[1] "c(\"All variables are uncorrelated with all other variables within class.\""
[2] " \"Check that this is what is intended.\""                                  
[3] " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"                         
[4] " c(\"WARNING:  THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED.  THE\""     
[5] " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA.  INCREASE THE\""    
[6] " \"NUMBER OF RANDOM STARTS.\")" 

while another contains a shorter message like this:

[[1]]
[1] "c(\"All variables are uncorrelated with all other variables within class.\""
[2] " \"Check that this is what is intended.\""                                  
[3] " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" 

I have tried str_count in several different ways, including my most recent attempt:

    str_count(test#, '//c(\//')

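But I get the error: Error: '\/' is an unrecognized escape in character string starting "'//c(\/". Ideally, the first example would return 2 and the second would return 1.

How do I count the occurrences of this unique string when it contains characters that I cannot quote or escape?

Here is some easy-to-use test code to try it out!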

test1 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"'

test2 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" " c(\"WARNING:  THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED.  THE\"" " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA.  INCREASE THE\"" " \"NUMBER OF RANDOM STARTS.\")"'

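【Comments】:

  • Not a solution to your problem, but have you considered using lavaan to do SEM directly in R?
  • It seems to me it may be easier to reduce the problem to just finding c(, which you can do with: str_count(test1, "c\\(")
  • This looks like a poorly constructed data.frame; it would be better to keep the original "list of character vectors" format (or is it more complicated than that?) and use, e.g., lengths(), along the lines of df = data.frame(x = 1:2); df$y = list(c("a", "b"), "d"); lengths(df$y)
  • We looked at lavaan, but something about the estimators or the overall input options convinced my advisor that Mplus was the best choice, so that is out of my hands at this point. @deschen
  • @D.J That actually works well. I guess I did not fully understand how escaping works; both ( and \ gave me a lot of trouble.

Tags: r string count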


【Solution 1】:

As in my comment, you can shorten the pattern you count to just c(:

str_count(test1, "c\\(")
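The double backslash is what confused the asker, so here is a minimal base R sketch of the two escaping layers involved. The string `s` is a made-up example, not the asker's data. The sketch assumes the stored text contains a plain double quote after `c(`, and that the backslash shown in the printed output is only R's display escaping.

```r
# Base R only. `s` is a hypothetical example string.

# Inside an R string literal, \" is one escaped double quote,
# not a backslash followed by a quote.
s <- "c(\"WARNING\") c(\"ERROR\")"
writeLines(s)        # shows the raw text without display escaping
nchar("c(\"")        # counts the characters c, (, "

# "\\(" is a two-character string (backslash, parenthesis);
# the regex engine then reads it as a literal "(".
nchar("\\(")

# Count with a regex pattern. length() is safe here only because
# `s` is known to contain at least one match.
regex_count <- length(gregexpr("c\\(", s)[[1]])

# Count with literal matching, where no regex escaping is needed.
fixed_count <- length(gregexpr("c(\"", s, fixed = TRUE)[[1]])
```

Both counts agree because the only difference is who interprets the parenthesis: the regex engine (which needs it escaped) or a literal search (which does not).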

Alternatively, you can lengthen the pattern to c(\" and wrap it in fixed() so that it is matched literally:

str_count(test1, fixed('c(\"'))

As you can see, both approaches give the correct answer:

string1 <- 'c(\"All variables are uncorrelated with all other variables within class.\"" 
             " \"Check that this is what is intended.\"" 
             " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" 
             " c(\"WARNING:  THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. 
             THE\"" " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA.  INCREASE THE\""
             " \"NUMBER OF RANDOM STARTS.\")'

> str_count(string1, fixed('c(\"'))
[1] 2
> str_count(string1, "c\\(")
[1] 2
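Since the question is about a data frame column rather than a single string, the same call can be applied to a whole column, because str_count() is vectorised. The sketch below assumes stringr is installed; the data frame `df`, its column names, and its contents are hypothetical, and treating a missing cell as zero messages is an assumption that may not suit every dataset.

```r
library(stringr)  # assumed to be installed

# Hypothetical data frame; column names and contents are made up.
df <- data.frame(
  model = c("m1", "m2", "m3"),
  warnings = c(
    'c("All variables are uncorrelated.", "1 WARNING(S) FOUND")',
    'c("All variables are uncorrelated.") c("WARNING: NOT REPLICATED.")',
    NA
  ),
  stringsAsFactors = FALSE
)

# str_count() works element-wise on the column;
# fixed() switches off regex interpretation of the pattern.
df$n_messages <- str_count(df$warnings, fixed('c("'))

# str_count() returns NA for missing input; here a missing cell is
# treated as zero messages (an assumption, adjust as needed).
df$n_messages[is.na(df$n_messages)] <- 0L

df$n_messages
```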


    【Solution 2】:

    You can try gregexpr():

    test1 <- '"c(\" foo bar baz'
    test2 <- '"c(\" foo bar baz "c(\" baz bar foo'
    
    length(unlist(gregexpr('c\\(', test1)))
    # [1] 1
    length(unlist(gregexpr('c\\(', test2)))
    # [1] 2
    length(unlist(gregexpr('c\\(', list(test1, test2))))
    # [1] 3
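    One caveat with the length() approach: when a string has no match, gregexpr() returns the single value -1, so length() reports 1 rather than 0. The base R sketch below shows the pitfall and two ways around it; the vector `x` is a made-up example.

```r
# Hypothetical input: the last element contains no "c(" at all.
x <- c('c(" foo', 'c(" foo c(" bar', 'no messages here')

# Naive count: a failed search yields -1, which still has length 1.
naive <- vapply(gregexpr('c\\(', x), length, integer(1))

# Counting only positive match positions gives 0 when nothing matches.
robust <- vapply(gregexpr('c\\(', x), function(m) sum(m > 0), integer(1))

# Equivalent: extract the matched substrings and count them per element.
robust2 <- lengths(regmatches(x, gregexpr('c\\(', x)))
```

    Either robust variant is vectorised, so it can be applied directly to a data frame column that may contain cells without any message.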
    
