【Question title】: If-Else rules to hard code the parsing of an address
【Posted】: 2019-04-01 10:32:09
【Question】:

I want to start with a hard-coded, rule-based structure in Python, preferably using IF-ELSE, to solve the following problem:

For example, I have a correctly formatted UK postal address:
Flat 8, The Apartment, King Philip Street, SE1 3WX

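These are the different variants that can be derived from the actual address above.

These focus on variants of the first line of the address: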

Flat 8 - Actual
8
F8
f8
flat 8
flat8
FLAT8
FLAT 8

These focus on variants of the second line of the address:

The Apartment - Actual
Apartment
TheApartment
theapartment
the apartment

These focus on variants of the third line of the address:

King Philip Street - Actual
King Philip St
King Philip st
King Philip street
King Philip STREET
king philip St
king philip st
king philip street
king philip STREET
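The street variants above follow one pattern: the name part is written in title case or lower case, and the suffix is either the full word or its abbreviation in several casings. A minimal sketch of that rule, assuming the last word of the street is the suffix and using a small, hypothetical abbreviation table (`SUFFIX_ABBREVIATIONS` and `street_variants` are illustrative names, not from any library):

```python
from itertools import product

# Assumed abbreviation table; extend it with whatever suffixes your data uses.
SUFFIX_ABBREVIATIONS = {'street': 'St', 'road': 'Rd', 'avenue': 'Ave'}


def street_variants(street):
    # Hypothetical helper: treats the last word as the suffix ("Street").
    *name_words, suffix = street.split()
    name = ' '.join(name_words)
    suffix_forms = [suffix.lower(), suffix.upper()]
    abbreviation = SUFFIX_ABBREVIATIONS.get(suffix.lower())
    if abbreviation:
        # Abbreviated forms come first: "St", "st"
        suffix_forms = [abbreviation, abbreviation.lower()] + suffix_forms
    variants = [street]  # the street as written comes first
    # Every casing of the name combined with every suffix form
    for name_form, suffix_form in product([name, name.lower()], suffix_forms):
        candidate = '{} {}'.format(name_form, suffix_form)
        if candidate not in variants:
            variants.append(candidate)
    return variants


for variant in street_variants("King Philip Street"):
    print(variant)
```

For "King Philip Street" this yields the nine forms listed above, in the same order; suffixes missing from the table only get their lower- and upper-case full forms.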

These focus on variants of the fourth line of the address:

SE1 3WX - Actual
SE13WX
SE1 3WX
se1 3wx
se13wx
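The postcode variants only differ in case and in the space between the two halves. Since the inward code of a UK postcode is always its last three characters, one way to sketch this, assuming the input is a valid UK postcode (`postcode_variants` is a hypothetical name), is to normalise first and then rebuild each form:

```python
def postcode_variants(postcode):
    # Normalise first so "se13wx", "SE1 3WX" and "SE13WX" all give the same result.
    compact = postcode.replace(' ', '').upper()
    # The inward code of a UK postcode is always its last three characters,
    # so the standard space goes right before them.
    spaced = compact[:-3] + ' ' + compact[-3:]
    # dict.fromkeys drops duplicates while keeping this order.
    return list(dict.fromkeys([spaced, compact, spaced.lower(), compact.lower()]))


print(postcode_variants("SE1 3WX"))
```

Normalising before generating means a postcode that was already typed without the space still produces the correctly spaced form.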

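So, once an address is passed to it, the Python function should be able to parse it and output the segmented results above.

I have several thousand more addresses like this that need to be parsed.

Has anyone done something similar before, and could someone help me work out how to implement it?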

Example function usage:

Python_Function("Flat 8, The Apartment, King Philip Street, SE1 3WX, England")

The output should be:

First address line:

Flat 8
8
F8
f8
flat 8
flat8
FLAT8
FLAT 8

Second address line:

The Apartment
Apartment
TheApartment
theapartment
the apartment

Third address line:

King Philip Street
King Philip St
King Philip st
King Philip street
King Philip STREET
king philip St
king philip st
king philip street
king philip STREET

Fourth address line:

SE1 3WX
SE13WX
SE1 3WX
se1 3wx
se13wx

Fifth address line:

England
england
eng

【Question discussion】:

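  • Could you post a sample output?
  • Is there any reason you don't just do split(',')?
  • Yes, what @GBrandt said, or even better: .split(', ') (this avoids a superfluous space at the start of some values)
  • Split will split the text and produce output, but I also need the variants: I need to generate lowercase and uppercase forms, and forms with and without the spaces in between (mimicking how people actually write addresses on paper).
  • Have you actually tried anything?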
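The difference the commenters point out between split(',') and split(', ') is easy to see on the example address. A small sketch; stripping each part, as `split_address` does here (a hypothetical helper), is a common alternative that also tolerates inconsistent spacing after the commas:

```python
address = "Flat 8, The Apartment, King Philip Street, SE1 3WX, England"

# split(',') keeps the space that follows each comma: ' The Apartment'
with_spaces = address.split(',')
# split(', ') only works if every comma is followed by exactly one space
exact = address.split(', ')


def split_address(text):
    # Split on commas and strip each part, so "Flat 8,The Apartment"
    # or "Flat 8,  The Apartment" are handled too; empty parts are dropped.
    return [part.strip() for part in text.split(',') if part.strip()]


print(with_spaces)
print(exact)
print(split_address("Flat 8,The Apartment,  King Philip Street"))
```

With several thousand hand-typed addresses, the stripping version is the safer starting point before any variant generation.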

Tags: python regex if-statement


【Solution 1】:

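Drop the hard-coded approach in favour of a more generic one.

I have provided a start (the logic for the flat and the apartment). Hopefully you can finish the rest yourself.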

import re
from itertools import product

# Matches one or more digits, e.g. the "8" in "Flat 8" (raw string avoids an invalid-escape warning)
digits_regex = re.compile(r'\d+')

address = "Flat 8, The Apartment, King Philip Street, SE1 3WX, England"

def generate(full_address):
    def generate_flat(flat_number, prefixes=('f', 'flat')):
        # Bare number first, then each prefix with and without a space, in lower and upper case
        flat_options = [str(flat_number)]
        for prefix in prefixes:
            flat_options.append('{}{}'.format(prefix, flat_number))
            flat_options.append('{} {}'.format(prefix, flat_number))
            flat_options.append('{}{}'.format(prefix.upper(), flat_number))
            flat_options.append('{} {}'.format(prefix.upper(), flat_number))
        return flat_options

    def generate_apartment(apartment):
        # "The Apartment" -> prefix "The", rest ["Apartment"]
        prefix, *rest = apartment.split()
        joined = ''.join((prefix, *rest))
        # As written, first word after the prefix, joined, joined lower case, spaced lower case
        return [apartment, rest[0], joined, joined.lower(), ' '.join((prefix.lower(), *map(str.lower, rest)))]

    # Assumes exactly five parts separated by ", "
    flat, apartment, street, area, country = full_address.split(', ')

    # Combine every flat variant with every apartment variant
    return [', '.join(variation) for variation in product(generate_flat(digits_regex.findall(flat)[0]), generate_apartment(apartment))]


for variation in generate(address):
    print(variation)

Output

8, The Apartment
8, Apartment
8, TheApartment
8, theapartment
8, the apartment
f8, The Apartment
f8, Apartment
f8, TheApartment
f8, theapartment
f8, the apartment
f 8, The Apartment
f 8, Apartment
f 8, TheApartment
f 8, theapartment
f 8, the apartment
F8, The Apartment
F8, Apartment
F8, TheApartment
F8, theapartment
F8, the apartment
F 8, The Apartment
F 8, Apartment
F 8, TheApartment
F 8, theapartment
F 8, the apartment
flat8, The Apartment
flat8, Apartment
flat8, TheApartment
flat8, theapartment
flat8, the apartment
flat 8, The Apartment
flat 8, Apartment
flat 8, TheApartment
flat 8, theapartment
flat 8, the apartment
FLAT8, The Apartment
FLAT8, Apartment
FLAT8, TheApartment
FLAT8, theapartment
FLAT8, the apartment
FLAT 8, The Apartment
FLAT 8, Apartment
FLAT 8, TheApartment
FLAT 8, theapartment
FLAT 8, the apartment

【Discussion】:

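  • Hi, the code you provided is great, but could you help me generate output like the one I showed in the question above? That is, one input printed as five separate segments (first address line, second address line, third address line, fourth address line, etc.). I'm really sorry, I'm quite new to coding in general. I will definitely try to improve and learn from what you have all shared here.
  • @Dinesh Then all you need to do is remove the call to product at the end of generate and simply print the output of the generate_flat and generate_apartment functions.
  • Really sorry, I tried different approaches and it doesn't work; I must be doing something wrong. I commented out the return statement and replaced the last line with the line you mentioned, and I got no output. Ultimately, it would be great if the function could output all the segments at once, without several separate function calls.
  • Is it possible to remove the line address = "Flat 8, The Apartment, King Philip Street, SE1 3WX, England" and let the function accept any similar address as input and output the required segments for it?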
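For the follow-up requests in these comments (all five segments from a single call, for any address passed in), a sketch along these lines may help. It is not the answerer's code: `segment_variants`, `basic_variants`, `unique` and `print_segments` are hypothetical names, and it assumes comma-separated parts in the order flat, building, street, postcode, country. It uses an if/elif chain on the segment position, which is close to the rule-based structure the question asked for:

```python
import re

# Labels for the first five comma-separated parts; later parts get a generic label.
LABELS = ['First address line', 'Second address line', 'Third address line',
          'Fourth address line', 'Fifth address line']


def unique(items):
    # Drop empty strings and duplicates while keeping first-seen order.
    return list(dict.fromkeys(item for item in items if item))


def basic_variants(field):
    # Case and spacing variants that apply to every part of the address.
    compact = field.replace(' ', '')
    return [field, field.lower(), field.upper(),
            compact, compact.lower(), compact.upper()]


def segment_variants(address):
    """Return {label: [variants]} for every comma-separated part of `address`."""
    segments = {}
    parts = [part.strip() for part in address.split(',') if part.strip()]
    for i, field in enumerate(parts):
        variants = basic_variants(field)
        if i == 0:
            # Flat line: also offer the bare number and the "F8"/"f8" forms.
            match = re.search(r'\d+', field)
            if match:
                number = match.group()
                variants += [number, 'F' + number, 'f' + number]
        elif i == 1:
            # Building name: also offer it without a leading "The".
            words = field.split()
            if len(words) > 1 and words[0].lower() == 'the':
                variants.append(' '.join(words[1:]))
        elif i == 4:
            # Country: also offer a lower-case three-letter short form, e.g. "eng".
            variants.append(field[:3].lower())
        # Street (i == 2) and postcode (i == 3) only get the basic variants here;
        # extra rules such as "Street" -> "St" could go in their own elif branches.
        label = LABELS[i] if i < len(LABELS) else 'Address line {}'.format(i + 1)
        segments[label] = unique(variants)
    return segments


def print_segments(address):
    # Print each segment as "Label:" followed by one variant per line.
    for label, variants in segment_variants(address).items():
        print(label + ':')
        for variant in variants:
            print(variant)
        print()


if __name__ == '__main__':
    addresses = [
        "Flat 8, The Apartment, King Philip Street, SE1 3WX, England",
    ]
    for address in addresses:
        print_segments(address)
```

Because the address is a parameter rather than a module-level constant, the same call works in a loop over a list of several thousand addresses.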
【Solution 2】:

Here is a quick start with some useful tools (without doing everything for you):

address = "Flat 8, The Apartment, King Philip Street, SE1 3WX, England"

for i, field in enumerate(address.split(', ')):
    lower_case = field.lower()
    upper_case = field.upper()
    capital = field.title()  # str has no .proper(); .title() capitalises each word
    without_spaces = field.replace(' ', '')  # you can remove spaces for the variants above as well

    # then you can add specifics
    if i == 1:
        only_first_three = field[:3]

    # now if someone really writes weirdly:
    split_field = field.split(' ')
    random_capitals = [word.title() for word in split_field if word]  # put your own condition in place of `if word`

【Discussion】:
