Identify alternating uppercase and lowercase characters in Python [closed]: answers

【Question title】: Identify alternating uppercase and lowercase characters in Python [closed]
【Posted】: 2017-01-02 07:00:34
【Question description】:

My data looks like this:

data['word']

1  Word1
2  WoRdqwertf point
3  lengthy word
4  AbCdEasc
5  Not to be filtered
6  GiBeRrIsH
7  zSxDcFvGnnn

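I want to find strings that contain alternating uppercase and lowercase letters and delete the rows containing such words. For example, WoRdqwertf, AbCdEasc, GiBeRrIsH and zSxDcFvGnnn here have alternating characters, so those rows need to be removed.

The important point is that the first row, Word1, should not be removed, because it has only one uppercase letter followed by lowercase letters. I only want to remove rows that contain an upper, lower, upper sequence or a lower, upper, lower sequence. My output should be: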

data['word']

1  Word1
3  lengthy word
5  Not to be filtered

Can anybody help me or suggest some ideas for solving this?

【Question comments】:

Have you tried anything?
Try re.search(r'[a-z][A-Z][a-z]|[A-Z][a-z][A-Z]', x)
@depperm I'm not quite sure how to try it.
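
The regex from the comment above can be applied row by row to drop any row that contains an upper, lower, upper or lower, upper, lower run. A minimal sketch, assuming the rows are held in a plain Python list (the names `rows`, `ALTERNATING` and `kept` are illustrative and do not come from the question):

```python
import re

# Pattern taken from the comment: lower-UPPER-lower or UPPER-lower-UPPER
ALTERNATING = re.compile(r'[a-z][A-Z][a-z]|[A-Z][a-z][A-Z]')

# Sample rows copied from the question
rows = ['Word1', 'WoRdqwertf point', 'lengthy word', 'AbCdEasc',
        'Not to be filtered', 'GiBeRrIsH', 'zSxDcFvGnnn']

# Keep only the rows in which the pattern is not found anywhere
kept = [row for row in rows if not ALTERNATING.search(row)]
print(kept)
# ['Word1', 'lengthy word', 'Not to be filtered']
```

If data is a pandas DataFrame (an assumption; the question does not say so), the same pattern could probably be used as data[~data['word'].str.contains(ALTERNATING.pattern)].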

Tags: python regex string python-2.7 python-3.x

【Solution 1】:

You can use string methods. Verbose version ->

l = ['Word1','WoRdqwertf point','lengthy word','AbCdEasc', 'Not to be filtered','GiBeRrIsH', 'zSxDcFvGnnn']

n = []
for section in l:
    new_section = []
    for w in section.split():
        # keep a word only if it is title-cased or entirely lowercase
        if w == w.title() or w == w.lower():
            new_section.append(w)
    s = ' '.join(new_section)
    # skip rows that end up with no words left
    if s:
        n.append(s)
print(n)  # print() call works on both Python 2.7 and 3.x

One-liner ->

print(list(filter(len, [' '.join(w for w in s.split() if w[1:].islower()) for s in l])))

Output:

['Word1', 'point', 'lengthy word', 'Not to be filtered']

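【Comments】:

Thanks. Could I ask for just that word to be removed rather than the whole row? For example, WoRdqwertf point should become just point, and AbCdEasc should become empty.
Well, that requirement wasn't in your original post; I'll update the answer.
Sorry, I only thought of it just now.
Perhaps re.sub(r'\s*\b(?=\w*(?:[a-z][A-Z]+[a-z]|[A-Z][a-z]+[A-Z]))\w+', '', section)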
@haimen ideone.com/PxIFZx
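
The re.sub suggestion in the comment above removes only the offending word and leaves the rest of the row. A minimal sketch of how it might be applied, assuming a plain list of rows (the names `ALTERNATING_WORD`, `rows` and `cleaned` are illustrative); the extra .strip() is an addition here, because the substitution can leave a leading space behind:

```python
import re

# Pattern from the comment: optional leading whitespace, then a whole word
# that contains lower-UPPER(s)-lower or UPPER-lower(s)-UPPER somewhere inside
ALTERNATING_WORD = re.compile(
    r'\s*\b(?=\w*(?:[a-z][A-Z]+[a-z]|[A-Z][a-z]+[A-Z]))\w+')

# Sample rows copied from the question
rows = ['Word1', 'WoRdqwertf point', 'lengthy word', 'AbCdEasc',
        'Not to be filtered', 'GiBeRrIsH', 'zSxDcFvGnnn']

# Delete matching words, then trim whitespace left at the edges
cleaned = [ALTERNATING_WORD.sub('', row).strip() for row in rows]
print(cleaned)
# ['Word1', 'point', 'lengthy word', '', 'Not to be filtered', '', '']
```

Rows that consisted only of an alternating word become empty strings, which matches the behaviour the asker describes for AbCdEasc.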

【Solution 2】:

You can also use filter

data=['Word1','WoRdqwertf point','lengthy word','AbCdEasc','Not to be filtered','GiBeRrIsH','zSxDcFvGnnn']
str_list = filter(lambda item: (item[0].isupper() and item[1:].lower()==item[1:]) or item.islower(), data)
print(list(str_list))
#['Word1', 'lengthy word', 'Not to be filtered']

The filter keeps only items that are entirely lowercase (item.islower()) and items whose only uppercase letter is the first character (item[0].isupper() and item[1:].lower()==item[1:]).
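
Note that this predicate inspects the whole row at once, so a row such as 'Not To Be Filtered' (a hypothetical input, not part of the question's data) would be rejected because of the capitals after the first character, and an empty string would raise IndexError on item[0]. A minimal sketch of a per-word variant, with helper names `is_plain_word` and `is_plain_row` made up for illustration:

```python
def is_plain_word(word):
    # Hypothetical helper: True when nothing after the first character is uppercase
    return word[1:] == word[1:].lower()

def is_plain_row(row):
    # A row passes only if every whitespace-separated word passes;
    # an empty row has no words and therefore passes as well
    return all(is_plain_word(w) for w in row.split())

# Sample rows copied from the question
data = ['Word1', 'WoRdqwertf point', 'lengthy word', 'AbCdEasc',
        'Not to be filtered', 'GiBeRrIsH', 'zSxDcFvGnnn']

kept = [row for row in data if is_plain_row(row)]
print(kept)
# ['Word1', 'lengthy word', 'Not to be filtered']
```

Like the original predicate, this is stricter than the rule in the question: a word such as aB is rejected even though it has no three-character alternating run.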

【Comments】:

【Solution 3】:

You can use the regular expression ^(?:\w[a-z0-9]*(?: |$))*$:

import re

data = ['Word1','WoRdqwertf point','lengthy word','AbCdEasc', 'Not to be filtered','GiBeRrIsH', 'zSxDcFvGnnn']
for line in data:
    # a row matches only if every word has no uppercase letter after its first character
    if re.search(r'^(?:\w[a-z0-9]*(?: |$))*$', line):
        print(line)

See the live demo.
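
To make the pattern easier to read, it can be spelled out piece by piece with re.VERBOSE. This is only an annotated restatement of the same regex, not a different method; the names `ROW_PATTERN`, `data` and `kept` are illustrative, and the literal space is written as [ ] because verbose mode ignores bare whitespace:

```python
import re

ROW_PATTERN = re.compile(r'''
    ^                 # start of the row
    (?:               # one word at a time:
        \w            #   any first character (letter, digit or underscore)
        [a-z0-9]*     #   then only lowercase letters or digits
        (?:[ ]|$)     #   then a single space or the end of the row
    )*                # repeated for every word
    $                 # end of the row
''', re.VERBOSE)

# Sample rows copied from the question
data = ['Word1', 'WoRdqwertf point', 'lengthy word', 'AbCdEasc',
        'Not to be filtered', 'GiBeRrIsH', 'zSxDcFvGnnn']

kept = [line for line in data if ROW_PATTERN.search(line)]
print(kept)
# ['Word1', 'lengthy word', 'Not to be filtered']
```

Because the whole group is optional, an empty string also matches this pattern.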

【Comments】:

【Solution 4】:

Another regex solution:

import re

rx = re.compile(r'(?=.*(?:\b[A-Z][a-z\d]+\b)|^[a-z ]+$).+')

lst = ['Word1', 'WoRdqwertf point', 'lengthy word', 'AbCdEasc', 'Not to be filtered', 'GiBeRrIsH', 'zSxDcFvGnnn']
new_list = [item for item in lst if rx.match(item)]

print(new_list)
# ['Word1', 'lengthy word', 'Not to be filtered']
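
One caveat worth checking: the lookahead is satisfied as soon as any single word in the row looks like a capitalized word, so a row that mixes a clean word with an alternating one still passes. A small sketch with a hypothetical row, 'Word1 AbCdEasc', that is not part of the question's data:

```python
import re

# Same pattern as in this answer
rx = re.compile(r'(?=.*(?:\b[A-Z][a-z\d]+\b)|^[a-z ]+$).+')

# Three-character alternation check from the question's comments
alternating = re.compile(r'[a-z][A-Z][a-z]|[A-Z][a-z][A-Z]')

mixed_row = 'Word1 AbCdEasc'  # hypothetical input for illustration

# The row is accepted because 'Word1' alone satisfies the lookahead ...
accepted = bool(rx.match(mixed_row))
# ... even though it still contains an alternating run
has_alternation = bool(alternating.search(mixed_row))
print(accepted, has_alternation)
# True True
```

For the seven rows in the question this makes no difference, but it may matter for other inputs.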

【Comments】: