【Question title】: Remove double quote \" symbol from string
【Posted】: 2018-08-15 12:45:02
【Question description】:

I need to remove \" from the elements of a vector. Here is my data:

data <- c("\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1803224&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Flinux-linux-security-masterclass-3-in-1%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1848638&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Fmastering-kali-linux%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1426684&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Finformation-gathering-with-kali-linux%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1628300&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Flinux-switchblade%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1615700&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Fadministrador-de-sistemas-junior-en-windows-server-y-linux%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.809770&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Flearn-bash-shell-in-linux-for-beginners-lite%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.574388&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Fhow-to-install-linux-ubuntu-server%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1436610&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Fcentos-and-ubuntu-managing-packages%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1771266&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Flinux-foundation-certified-system-administrator-exam%2F", 
"\"https://click.linksynergy.com/link?id=RUxZriH*PWc&offerid=323058.1734052&type=2&murl=https%3A%2F%2Fwww.udemy.com%2Flinux-server-security%2F"
)

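As you can see, every element starts with \". How can I remove just these characters and keep the links?

【Question discussion】:

    Tags: r regex gsub stringr


    【Solution 1】:

    You can try this. Note that what you actually want to remove is \", not "\ (as the unedited version of your question suggested). The leading " is simply the delimiter that opens each string element of the character vector; \" is the escaped double quote stored inside it.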

    gsub('[\"]', '', data)
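
To make the escaping point concrete, here is a minimal sketch on a two-element stand-in for `data` (the `example.com` URLs and the names `x`, `n_chars`, and `cleaned` are made up for illustration). It shows that `\"` is a single character in the stored string, and that the backslash only appears when R prints the value.

```r
# Stand-in for `data`; the URLs are hypothetical.
x <- c("\"https://example.com/a", "\"https://example.com/b")

# The escape sequence \" is one character: the double quote itself.
n_chars <- nchar("\"")

print(x[1])       # print() shows the escaped form, with the backslash
writeLines(x[1])  # writeLines() shows the raw characters, no backslash

# Same call as in the answer: remove every double quote.
cleaned <- gsub('[\"]', '', x)
```

So there is no backslash in the data to remove, only the quote character.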
    

    【Discussion】:

      【Solution 2】:

      Alternatively, pass '"' as the pattern. Inside single quotes the double quote needs no escaping:

      gsub('"', "", data)
      

      【Discussion】:

        【Solution 3】:

        If it is always the first character, just use substring:

        substring(data, 2)
        

        This avoids the regex engine entirely, and in the benchmark below it is the fastest of the approaches compared:

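        # Repeat the vector 1000 times so the timings are large enough to compare
        data <- rep(data, 1000)
        
        microbenchmark::microbenchmark(
          a = substring(data, 2),                        # drop the first character
          b = gsub("\"", "", data, fixed = TRUE),        # literal match, no regex
          c = gsub('"', "", data),                       # regex, single character
          d = gsub('[\"]', '', data),                    # regex, character class
          e = stringr::str_replace(data, '[\"]', ''),    # stringr, first match only
          f = gsub("^.","",data)                         # regex, anchored first character
          )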
        # Unit: milliseconds
        # expr       min        lq      mean    median        uq       max neval
        #    a  2.835013  2.849838  2.933796  2.857393  2.900301  4.446956   100
        #    b  4.728632  4.739751  4.788882  4.754861  4.795203  5.200185   100
        #    c  7.388025  7.413684  7.503427  7.458444  7.555520  8.160925   100
        #    d  7.390876  7.412686  7.530044  7.454453  7.533568  8.535544   100
        #    e 12.019154 12.205608 12.430870 12.316084 12.581081 13.917336   100
        #    f 15.712882 15.735975 15.875353 15.770043 15.861275 18.906262   100
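
A timing comparison only means something if the candidates return the same result. As a rough sanity check, the sketch below runs the base R variants on a small stand-in vector and compares their outputs (the `example.com` URLs and the names `x`, `results`, and `all_same` are assumptions for illustration, not part of the original benchmark).

```r
# Stand-in for `data`; the URLs are hypothetical.
x <- c("\"https://example.com/a", "\"https://example.com/b")

# One entry per approach from the benchmark (base R only).
results <- list(
  substring = substring(x, 2),
  fixed     = gsub("\"", "", x, fixed = TRUE),
  regex     = gsub('"', "", x),
  class     = gsub('[\"]', '', x),
  first_chr = gsub("^.", "", x)
)

# TRUE only if every approach matches the first one exactly.
all_same <- all(vapply(results, identical, logical(1), results[[1]]))
```

The approaches agree here only because each element contains exactly one double quote and it is the first character. On other inputs they can diverge.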
        

        【Discussion】:

          【Solution 4】:

          Use fixed = TRUE to match the pattern as a literal string rather than as a regular expression:

          gsub("\"", "", data, fixed = TRUE)
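
For a double quote, the regex and literal forms behave the same, because `"` has no special meaning in a regular expression. The difference shows up with a metacharacter such as `.`, as in this small sketch (the input `"a.b"` and the variable names are made up for illustration):

```r
# As a regex, "." matches any character, so everything is removed.
regex_result <- gsub(".", "", "a.b")

# With fixed = TRUE, "." is matched literally, so only the dot is removed.
fixed_result <- gsub(".", "", "a.b", fixed = TRUE)

# For the double quote, both forms give the same answer.
same_for_quote <- identical(
  gsub("\"", "", "\"abc"),
  gsub("\"", "", "\"abc", fixed = TRUE)
)
```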
          

          【Discussion】:

            【Solution 5】:

            This also works, with the double quote escaped inside a double-quoted pattern:

            gsub("\"", "", data)
            

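            【Discussion】:

              【Solution 6】:

              @milan was faster :)

              The stringr way is shown below. Note that str_replace() replaces only the first match in each element, which is enough here; str_replace_all() would remove every double quote.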

              library(stringr)
              str_replace(data, '[\"]', '')
              

              【Discussion】:

                【Solution 7】:

                You can also remove the first character, whatever it is, and skip the backslash headache:

                gsub("^.","",data)
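
One caveat: `^.` removes the first character whatever it is. If some elements might not start with a double quote, anchoring the quote itself is safer. A minimal sketch, using a made-up vector `x` in which the second element has no leading quote:

```r
# Hypothetical input: only the first element starts with a double quote.
x <- c("\"https://example.com/a", "https://example.com/b")

# Drops the first character of every element, damaging the second URL.
any_first <- sub("^.", "", x)

# Drops the first character only when it is a double quote.
quote_only <- sub('^"', "", x)
```

`sub()` is used instead of `gsub()` because an anchored pattern can match at most once per element.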
                

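                【Discussion】:

                  【Solution 8】:

                  I use a combination of gsub() and noquote(). gsub() removes the character; noquote() only changes how the result is printed: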

                  for (i in data){
                     print(gsub('"','',(noquote(i))))
                  }
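
The loop above only prints. To keep the cleaned values, the vectorised call can be assigned directly, since `gsub()` already works on whole vectors. A short sketch on a stand-in vector (the `example.com` URLs and variable names are assumptions for illustration):

```r
# Stand-in for `data`; the URLs are hypothetical.
x <- c("\"https://example.com/a", "\"https://example.com/b")

# gsub() is vectorised, so no loop is needed to clean every element.
cleaned <- gsub('"', '', x)

# noquote() does not alter the strings; it only tags them so that
# print() omits the surrounding display quotes.
shown <- noquote(cleaned)
print(cleaned)  # printed with display quotes
print(shown)    # printed without display quotes
```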
                  

                  【Discussion】:
