【Question title】: Python - How do I separate punctuation from words by white space leaving only one space between the punctuation and the word?
【Posted】: 2015-03-04 20:11:56
【Question】:

I have the following string:

input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

All punctuation except "/", "'", "-", "+" and "$" should be separated from the words.

So the output should be:

"I love programming with Python-3 . 3 ! Do you ? It's great . . . I give it a 10/10. It's free-to-use , no $$$ involved !"

I used the following code:

import string

for x in string.punctuation:
    # Leave these five characters attached to the words.
    if x in "/'-+$":
        continue
    input = input.replace(x, " %s " % x)

I get the following output:

I love programming with Python-3 . 3 !  Do you ?  It's great .  .  .  I give it a 10/10 .  It's free-to-use ,  no $$$ involved ! 

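It works, but sometimes it leaves two spaces between a punctuation mark and a word, for example between the first exclamation mark in the sentence and the word "Do". This happens because there was already a space between them.

The same problem occurs with input = "Hello. (hi)". The output would be: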

" Hello .  ( hi ) "

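Note the two spaces before the opening parenthesis.

I need output with exactly one space between any punctuation mark and a word, except for the five characters mentioned above, which are not separated from the words. How can I fix this? Or is there a better way to do it with a regular expression?

Thanks in advance.

【Comments】:

    Tags: python regex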


    【Solution 1】:
    # Approach 1
    
    import re
    
    sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
    
    sample_input = re.sub(r"([^\s])([^\w\/'+$\s-])", r'\1 \2', sample_input)
    print(re.sub(r"([^\w\/'+$\s-])([^\s])", r'\1 \2', sample_input))
    
    # Approach 2
    
    import string
    
    sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
    
    punctuation = string.punctuation.replace('/', '').replace("'", '') \
            .replace('-', '').replace('+', '').replace('$', '')
    
    i = 0
    
    while i < len(sample_input):
        if sample_input[i] not in punctuation:
            i += 1
            continue
    
        if i > 0 and sample_input[i-1] != ' ':
            sample_input = sample_input[:i] + ' ' + sample_input[i:]
            i += 1
    
        if i + 1 < len(sample_input) and sample_input[i+1] != ' ':
            sample_input = sample_input[:i+1] + ' ' + sample_input[i+1:]
            i += 1
    
        i += 1
    
    print(sample_input)
    

    【Comments】:

      【Solution 2】:

      In my opinion, a negated character class is simpler:

      import re
      
      input_string = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
      
      print(re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", input_string))
      

      Output:

      I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! 
      

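      【Comments】:

        【Solution 3】:

        I can't comment because I lack the reputation, but in this case

        between the first exclamation mark in the sentence and the word "Do"

        there appear to be two spaces because there is already a space between ! and Do

        ! Do

        So, if there is already a space after the punctuation mark, don't add another one.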
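A minimal sketch of one way to act on this while keeping the loop from the question: pad every punctuation mark as before, then collapse each run of whitespace into a single space. The function name `pad_and_collapse` and the constant `KEEP_ATTACHED` are made up for this illustration. It assumes that turning tabs and newlines into single spaces, and dropping leading and trailing whitespace, is acceptable.

```python
import string

# The five characters the question wants to keep attached to words.
KEEP_ATTACHED = "/'-+$"

def pad_and_collapse(text):
    """Pad punctuation with spaces, then collapse every whitespace run."""
    for ch in string.punctuation:
        if ch in KEEP_ATTACHED:
            continue
        text = text.replace(ch, " %s " % ch)
    # split() with no arguments splits on any run of whitespace and drops
    # leading/trailing whitespace, so join() leaves single spaces only.
    return " ".join(text.split())

print(pad_and_collapse("Hello. (hi)"))
```

Because the cleanup happens after the padding, it does not matter whether a space was already present next to a punctuation mark.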

        Also, there is a similar question here: python regex inserting a space between punctuation and letters

        So maybe consider using re.
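As a rough sketch of what an re-based version of "only add a space where none exists" might look like, the pattern below uses lookarounds to match the empty position between a punctuation mark and a non-space neighbour and inserts one space there. The names `PUNCT`, `GAP` and `insert_gaps` are invented for this example, and the character class is the one used in the other answers. Note that `\w` includes the underscore, so `_` is treated as part of a word here.

```python
import re

# Punctuation to separate: anything that is not a word character,
# whitespace, or one of / ' + $ -
PUNCT = r"[^\w\s/'+$-]"

# Zero-width positions that still need a space:
#   a non-space character directly before a punctuation mark, or
#   a punctuation mark directly before a non-space character.
GAP = re.compile(r"(?<=\S)(?=%s)|(?<=%s)(?=\S)" % (PUNCT, PUNCT))

def insert_gaps(text):
    """Insert a single space at every gap; existing spaces are untouched."""
    return GAP.sub(" ", text)

print(insert_gaps("Hello. (hi)"))
```

Since the matches are zero-width, nothing is consumed, so consecutive marks such as `...` are each separated and no trailing space is added at the end of the string.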

        【Comments】:

          【Solution 4】:

          It looks like re can do this for you...

          >>> import re
          >>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
          "I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "
          

          >>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", "Hello. (hi)")
          'Hello . ( hi ) '
          

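          If the trailing space is a problem, theresult.rstrip(' ') should take care of it for you :-)

          【Comments】:

          • Excellent. However, there should be a space between consecutive punctuation marks, e.g. !!!. The output should be ! ! !, not !!!. But by doing it this way I can now account for emoticons like :) or :( during parsing, so this is a big advantage for me. Thank you @Alex Martelli +1
          • @modarwish, to separate the punctuation characters from one another, change the [^\w/'+$\s-]+ above by removing the trailing +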
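The change suggested in the last comment could be sketched as follows. The pattern is the one from this answer with the trailing `+` removed from the punctuation branch, and `rstrip(' ')` drops the trailing space. The names `TOKEN` and `separate_each` are made up for the illustration, and it has only been checked against inputs shaped like the ones in the question.

```python
import re

# Same character classes as in the answer, but the punctuation branch has
# no trailing "+", so every punctuation character becomes its own token.
TOKEN = re.compile(r"([\w/'+$\s-]+|[^\w/'+$\s-])\s*")

def separate_each(text):
    """Follow each token with one space, then drop the trailing space."""
    return TOKEN.sub(r"\1 ", text).rstrip(" ")

print(separate_each("Wow!!! It's great..."))
```

The trade-off mentioned in the first comment applies: with the `+` removed, emoticons such as `:)` are split into separate characters as well.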
          【Solution 5】:

          May I suggest this approach:

          >>> import string
          >>> input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
          >>> ls = []
          >>> for x in input:
          ...     if x in string.punctuation:
          ...         ls.append(' %s' % x)
          ...     else:
          ...         ls.append(x)
          ...
          >>> ''.join(ls)
          "I love programming with Python -3 .3 ! Do you ? It 's great . . . I give it a 10 /10 . It 's free -to -use , no  $ $ $ involved !"
          >>>
          

          【Comments】:
