【Question title】: Cleanup A String from Most Simply
【Posted】: 2014-02-21 22:33:55
【Question】:

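What is a short way to clean up a user-input string? Below is the code I rely on for this kind of cleanup. It would be great if a shorter, smarter version were possible.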

invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
for c in invalid: 
    if len(line)>0: line=line.replace(c,'')

PS: How can I put this for loop (with its nested if) on a single line?

【Discussion】:

    Tags: python string if-statement for-loop strip


    【Solution 1】:

    The fastest way is to use str.translate (the two-argument form shown here is Python 2 only):

    >>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
    >>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'
    >>> s.translate(None, ''.join(invalid))
    'fdsfsFGHGJ'
    

    Timing comparison (invalid_re is the pattern built in Solution 4):

    >>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'*100
    
    >>> %timeit re.sub('[#@$%^&*()-+!]', '', s)
    1000 loops, best of 3: 766 µs per loop
    
    >>> %timeit re.sub('[#@$%^&*()-+!]+', '', s)
    1000 loops, best of 3: 215 µs per loop
    
    >>> %timeit "".join(c for c in s if c not in invalid)
    100 loops, best of 3: 1.29 ms per loop
    
    >>> %timeit re.sub(invalid_re, '', s)
    1000 loops, best of 3: 718 µs per loop
    
    >>> %timeit s.translate(None, ''.join(invalid))         #Winner
    10000 loops, best of 3: 17 µs per loop
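The figures above come from IPython's %timeit on Python 2. A minimal sketch for repeating the comparison with the standard timeit module on Python 3 might look like the following. The names char_class, table and candidates are illustrative and not part of the original answer, the duplicate '$' is dropped from the list, and the absolute numbers will differ from machine to machine.

```python
import re
import timeit

invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']
s = '@#$%^&*fdsfs#$%^&*FGHGJ' * 100

# Character class built with re.escape so '-' and '^' are taken literally.
char_class = re.compile('[' + re.escape(''.join(invalid)) + ']')
# Python 3 translation table: mapping a code point to None deletes it.
table = {ord(c): None for c in invalid}

# Hypothetical registry of the approaches being compared.
candidates = {
    're.sub': lambda: char_class.sub('', s),
    'join': lambda: ''.join(c for c in s if c not in invalid),
    'translate': lambda: s.translate(table),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=200)
    print('{:<10} {:.1f} us per call'.format(name, seconds / 200 * 1e6))
```

All three candidates return the same cleaned string, so only the timings differ.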
    

    On Python 3, str.translate takes a single mapping table, so you need to do the following:

    >>> trans_tab = {ord(x):None for x in invalid}
    >>> s.translate(trans_tab)
    'fdsfsFGHGJ'
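As an alternative sketch for Python 3, the table can also be built with str.maketrans, whose third argument lists the characters to delete. The variable names delete_table and cleaned are illustrative only.

```python
invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']

# The third argument of str.maketrans lists characters to delete.
delete_table = str.maketrans('', '', ''.join(invalid))

s = '@#$%^&*fdsfs#$%^&*FGHGJ'
cleaned = s.translate(delete_table)
print(cleaned)  # fdsfsFGHGJ
```

The table only needs to be built once and can then be reused for every input line.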
    

    【Discussion】:

      【Solution 2】:
      import re
      # The hyphen is escaped; an unescaped ')-+' would be read as a range
      re.sub(r'[#@$%^&*()\-+!]', '', line)
      

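      re is the regular-expression module. Square brackets mean "match any one of the characters inside the brackets". So the call says: "find any of the bracketed characters in line and replace it with nothing ('')".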
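One subtlety of character classes is worth a small demonstration: a hyphen between two characters denotes a range, so a class such as '[()-+]' covers ')', '*' and '+' but not '-' itself. The sample string and pattern names below are made up for illustration.

```python
import re

line = 'a-b+c*d(e)f'

# ')-+' is read as the range ')' .. '+', i.e. the characters ) * +
range_pattern = re.compile(r'[()-+]')
# Escaping the hyphen makes it a literal member of the class.
literal_pattern = re.compile(r'[()\-+]')

print(range_pattern.sub('', line))    # a-bcdef
print(literal_pattern.sub('', line))  # abc*def
```

Placing the hyphen first or last in the class has the same effect as escaping it.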

      【Discussion】:

      • By the way, if you also want to strip square brackets, you have to escape the closing one: re.sub(r'[#@$%^&*()\-+![\]]', '', line)
      【Solution 3】:

      You can do this:

      from string import punctuation # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
      
      line = "".join(c for c in line if c not in punctuation)
      

      For example:

      'hello, I @m pleased to meet you! How *about (you) try something > new?'
      

      becomes

      'hello I m pleased to meet you How about you try something  new'
      

      【Discussion】:

        【Solution 4】:

        This is one case where regular expressions are actually useful.

        >>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
        >>> import re
        >>> invalid_re = '|'.join(map(re.escape, invalid))
        >>> re.sub(invalid_re, '', 'foo * bar')
        'foobar'
        

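        【Discussion】:

          【Solution 5】:

          This is a snippet I use in my own code. You basically use a regular expression to specify the allowed characters, match runs of those characters, and then join the matches together.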

          import re
          
          def clean(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
              """Remove unwanted characters from a string.

              Args:
                  string_to_clean: (str) The string from which to remove
                      unwanted characters.
                  valid: (str) The characters that are valid and should be
                      included in the returned sequence. Default character
                      set is: 'ACDEFGHIKLMNPQRSTVWY'.

              Returns: (str) A sequence without the invalid characters, as a string.
              """
              # Note: valid is placed inside [...] unescaped, so characters
              # such as '-', '^' or ']' in it would change the pattern.
              valid_string = r'([{}]+)'.format(valid)
              valid_regex = re.compile(valid_string, re.IGNORECASE)

              # Find every run of valid characters and concatenate the runs
              # with join().
              return ''.join(valid_regex.findall(string_to_clean))
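For illustration, here is a hedged variant of the whitelist idea that passes the allowed characters through re.escape first, so that a whitelist containing '-', '^' or ']' still behaves as a plain list of characters. The name clean_escaped and the sample inputs are hypothetical and not part of the original snippet.

```python
import re

def clean_escaped(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Keep only the characters listed in `valid` (case-insensitive)."""
    # re.escape keeps characters such as '-', '^' or ']' literal inside [...].
    valid_regex = re.compile('[{}]+'.format(re.escape(valid)), re.IGNORECASE)
    return ''.join(valid_regex.findall(string_to_clean))

print(clean_escaped('MKT-LLx*A'))              # MKTLLA
print(clean_escaped('a-b^c]d', valid='a-^]'))  # a-^]
```

With the default whitelist, 'x' is dropped because 'X' is not among the allowed letters, while lowercase input in general is kept thanks to re.IGNORECASE.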
          

          【Discussion】:

            【Solution 6】:

            Using a simple generator expression:

            >>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
            >>> x = 'foo * bar'
            >>> "".join(i for i in x if i not in invalid)
            'foobar'
            

            Using string.punctuation plus a space, again with a generator expression:

            >>> import string
            >>> x = 'foo * bar'
            >>> "".join(i for i in x if i not in string.punctuation)
            'foo  bar'
            >>> "".join(i for i in x if i not in string.punctuation+" ")
            'foobar'
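The example above adds only the space character. If tabs and newlines should go as well, one possible sketch combines string.punctuation with string.whitespace. The set name unwanted and the sample input are illustrative.

```python
import string

# Characters to drop: ASCII punctuation plus ASCII whitespace.
unwanted = set(string.punctuation + string.whitespace)

x = 'foo *\tbar\n'
cleaned = ''.join(ch for ch in x if ch not in unwanted)
print(cleaned)  # foobar
```

Both constants cover ASCII only, so non-ASCII punctuation or spacing characters pass through unchanged.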
            

            Using str.translate (Python 2):

            >>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
            >>> x = 'foo * bar'
            >>> x.translate(None,"".join(invalid))
            'foobar'
            

            Using re.sub:

            >>> import re
            >>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
            >>> x = 'foo * bar'
            >>> y = "[" + re.escape("".join(invalid)) + "]"
            >>> re.sub(y,'',x)
            'foobar'
            >>> re.sub(y+'+','',x)
            'foobar'
            

            【Discussion】:

              【Solution 7】:

              This works:

              invalid = '#@$%^_ '
              line = "#master_Of^Puppets#@$%Yeah"
              line = "".join([l for l in line if l not in invalid])
              # line will be 'masterOfPuppetsYeah'
              

              【Discussion】:
