【Question title】: How can I extract multiple email IDs and phone numbers from a single text file with Python?
【Posted】: 2021-11-20 08:34:14
【Question description】:

Hello, I have a large text file that contains several kinds of information. I want to extract only the email IDs and phone numbers using a Python program or tool.

HTTP/1.1 200 OK

{"id":"269","first_name":"N S","last_name":"","balance":"0","phonecode":null,"mobile":null,"email":"wand412@gmail.com","verified":"0","password":""}


HTTP/1.1 200 OK


{"id":"303","first_name":"Devi","last_name":"Baruah","balance":"0","phonecode":null,"mobile":null,"email":"dxxxxxx@yahoo.com","verified":"0","password":""}


HTTP/1.1 200 OK


{"id":"306","first_name":"Rashmi","last_name":"Kumari","balance":"24","phonecode":"91","mobile":"9xxxxxxx","email":"xxxxxxx7@gmail.com","verified":"1","password":"xxxx"}


HTTP/1.1 200 OK

{"id":"308","first_name":"ashwini","last_name":"gokhale","balance":"7","phonecode":"1","mobile":"61xxxx","email":"axxxx@gmail.com","verified":"1","password":"xxxxxxx"}


HTTP/1.1 200 OK

{"id":"307","first_name":"Rama","last_name":"De","balance":"0","phonecode":"91","mobile":"73xxxxxx","email":"dexxxx@gmail.com","verified":"1","password":"xxxx"}

【Question comments】:

    Tags: python json text


    【Solution 1】:

    If your file is named test.txt, you can use the snippet below to parse the JSON parts of the file, one line at a time:

    import json
    
    
    items = []
    with open("test.txt") as file_handle:
        for line in file_handle:
            try:
                if item := json.loads(line):
                    items.append(item)
            except json.decoder.JSONDecodeError:
                pass
    
    # 'items' is a list of dictionaries that contain each user's details.
    # If you want to extract the IDs, email addresses and phone numbers into separate lists, one way to do it is:
    ids = [item.get("id") for item in items]
    email = [item.get("email") for item in items]
    mobile = [item.get("mobile") for item in items]
    

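    【Comments】:

    • I seem to be getting the following error: Traceback (most recent call last): File "C:\Users\iamun\PycharmProjects\pythonProject2\main.py", line 15, in <module> ids = [int(item.get("id")) for item in items] File "C:\Users\iamun\PycharmProjects\pythonProject2\main.py", line 15, in <listcomp> ids = [int(item.get("id")) for item in items] AttributeError: 'NoneType' object has no attribute 'get' Process finished with exit code 1
    • I get several lines of None in the output
    • OK, so at least one line was parsed as None. I updated the code so that it skips any Nones. I am using the walrus operator, so if your Python version is lower than 3.8 you need to replace if item := json.loads(line): with the older syntax
    • Wait, I think I got the error because some of the data is null rather than JSON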
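For readers on a Python version older than 3.8, the "older syntax" mentioned in the comments could look roughly like the sketch below. The function name parse_json_lines and the sample list are made up for illustration; the idea is only that a plain assignment followed by a truthiness test does the same job as the walrus operator.

```python
import json


def parse_json_lines(lines):
    """Return the non-empty JSON values found in an iterable of text lines."""
    items = []
    for line in lines:
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            # HTTP status lines and blank lines are not JSON; skip them.
            continue
        # Plain assignment plus a truthiness test replaces `if item := ...`.
        if item:
            items.append(item)
    return items


# Hypothetical sample that mimics the shape of the file in the question.
sample = [
    "HTTP/1.1 200 OK\n",
    "\n",
    '{"id":"269","mobile":null,"email":"wand412@gmail.com"}\n',
    "null\n",
]
print(parse_json_lines(sample))
```

A line that contains only null parses to None and is dropped by the truthiness test, which is what avoids the AttributeError reported above.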
    【Solution 2】:

    This looks like a log from a web server. If possible, try to start with a cleaner file.
    Either way:

    import json
    
    mandatory_keys = ['email', 'mobile']
    file_str = []
    out = []
    with open('test') as fd:
        file_str = [x.rstrip('\n') for x in fd.readlines() if x.startswith('{')]
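    for j_str in file_str:
        try:
            j = json.loads(j_str)
        except json.JSONDecodeError as err:
            # Only malformed JSON is reported here; a bare except would also hide the assert below.
            raise ValueError('Something wrong with the json') from err
        # Every mandatory key must be present (a null value still counts as present).
        assert all(k in j for k in mandatory_keys), 'missing mandatory_keys'
        out.append({k: v for k, v in j.items() if k in mandatory_keys})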
        
    print(out)
    

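    You may also want to use a JSON schema validator such as "jsonschema" to replace the assert line there and get explicit error messages.
    By changing the mandatory_keys list you can easily update your output.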
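As a rough illustration of both points, here is a standard-library-only sketch. It does not use jsonschema; it is a hand-written stand-in that reports which keys are missing and on which line. The function name extract_fields and the sample data are assumptions made for this example.

```python
import json


def extract_fields(lines, keys):
    """Keep only `keys` from each JSON object line; report problems by line number."""
    out = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith("{"):
            continue  # not a JSON object line (status line, blank line, ...)
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno}: invalid JSON ({err.msg})") from err
        missing = [k for k in keys if k not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        out.append({k: record[k] for k in keys})
    return out


# Hypothetical sample based on one record from the question.
sample = [
    "HTTP/1.1 200 OK",
    '{"id":"306","phonecode":"91","mobile":"9xxxxxxx","email":"xxxxxxx7@gmail.com"}',
]
print(extract_fields(sample, ["email", "mobile"]))
print(extract_fields(sample, ["id", "email"]))
```

Passing a different key list changes the output columns without touching the parsing logic, which is the same effect as editing mandatory_keys.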

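    【Comments】:

    • The code is well written, but I only want the data from email and mobile; how can I do that? Apart from that, I must say the code works perfectly
    • Each dict in out already holds only the filtered data, so you can print it however you need. For each entry d in out: CSV: f"{d['mobile']};{d['email']}", plain print: f"{d['mobile']} {d['email']}", SQL: f"insert into tab select '{d['mobile']}', '{d['email']}'" ..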
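For the CSV case, a minimal sketch using the standard csv module might look like this. The helper name to_csv is hypothetical, and the out list here is a small hand-written sample in the same shape as the one built by Solution 2.

```python
import csv
import io

# Hypothetical sample shaped like the `out` list from Solution 2.
out = [
    {"mobile": None, "email": "wand412@gmail.com"},
    {"mobile": "9xxxxxxx", "email": "xxxxxxx7@gmail.com"},
]


def to_csv(rows, fieldnames=("mobile", "email")):
    """Render a list of dicts as CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(out))
```

Letting csv.DictWriter handle quoting and missing values tends to be safer than joining fields by hand in an f-string, since an email or phone field could in principle contain the delimiter.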