【Question title】: Print single word from parsed JSON text
【Posted】: 2013-12-31 18:30:35
【Question】:

This is the raw data I was working with before writing my code. (My code gets this data from the Twitter API after making a call.)

{"metadata":{"result_type":"recent","iso_language_code":"et"}
"created_at":"Tue Dec 03 01:41:53 +0000 2013","id":407686093790662656,"id_str":"407686093790662656","text":"@emblems123 justinbieberfan12599@gamil.com","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":407677310821613569,"in_reply_to_status_id_str":"407677310821613569","in_reply_to_user_id":2201997043,"in_reply_to_user_id_str":"2201997043","in_reply_to_screen_name":"emblems123","user":{"id":1220098345,"id_str":"1220098345","name":"PYD","screen_name":"bieberfan12599","location":

I run the following code:

import csv
import json
import oauth2 as oauth
import urllib
import sys
import requests
import time

CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_KEY = ""
ACCESS_SECRET = ""

class TwitterSearch:
    def __init__(self,
        ckey    = CONSUMER_KEY,
        csecret = CONSUMER_SECRET,
        akey    = ACCESS_KEY,
        asecret = ACCESS_SECRET,
        query   = 'https://api.twitter.com/1.1/search/tweets.{mode}?{query}'
    ):
        consumer     = oauth.Consumer(key=ckey, secret=csecret)
        access_token = oauth.Token(key=akey, secret=asecret)
        self.client  = oauth.Client(consumer, access_token)
        self.query   = query

    def search(self, q, mode='json', **queryargs):
        queryargs['q'] = q
        query = urllib.urlencode(queryargs)
        return self.client.request(self.query.format(query=query, mode=mode))

def write_csv(fname, rows, header=None, append=False, **kwargs):
    filemode = 'ab' if append else 'wb'
    with open(fname, filemode) as outf:
        out_csv = csv.writer(outf, **kwargs)
        if header:
            out_csv.writerow(header)
        out_csv.writerows(rows)

def main():
    ts = TwitterSearch()
    response, data = ts.search('@gmail.com', result_type='recent')
    js = json.loads(data)
    search_terms = ['@gmail.com']
    matches = []
    for term in search_terms:
        match = [word for word in js if term in word]
        matches.append(match)
    messages = ([msg['created_at'], msg['text'], msg['user']['id'], matches[0]] for msg in js.get('statuses', []))
    write_csv('twitter_gmail.csv', messages, append=True)

if __name__ == '__main__':
    main()

This is the output in the .csv:

Fri Dec 13 03:49:06 +0000 2013,I need some HARD TRAP beats producers help me out here...louiethefifthonline@gmail.com,490060971,[]

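My problem is that I want it to print only the email address from the parsed JSON text. I tried split(), but I couldn't get it to work with an expression. No matter what I do, the last column always comes out as an empty "[]".

I would really like to figure out how to get it to print only the email address from "text" as part of the row.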

【Comments】:

    Tags: python json python-2.7 csv twitter


    【Solution 1】:

    Assuming you have the data in a string, you can use a regex to extract the email:

    import re
    string = "Fri Dec 13 03:49:06 +0000 2013,I need some HARD TRAP beats producers help me out here...louiethefifthonline@gmail.com,490060971,[]"
    # Raw string so the backslashes reach the regex engine unchanged
    regex = r"\w+@\w+\.com"
    match = re.findall(regex, string)
    print(match)
    

    The output contains all matches, in this case one:

    ['louiethefifthonline@gmail.com']
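
The pattern above only matches addresses ending in .com whose local part has no dots or hyphens. As a minimal sketch, assuming a slightly broader pattern is wanted, the following could be used instead. The names EMAIL_RE and find_emails are hypothetical, and the pattern is a simplification for illustration, not a full address validator:

```python
import re

# Simplified e-mail pattern (an assumption for illustration, not a full validator):
# word characters optionally joined by single '.', '+' or '-' before the '@',
# then a domain made of labels joined by '.' or '-' and ending in '.<label>'.
EMAIL_RE = re.compile(r"\w+(?:[.+-]\w+)*@\w+(?:[.-]\w+)*\.\w+")

def find_emails(text):
    """Return every substring of text that looks like an e-mail address."""
    return EMAIL_RE.findall(text)

sample = "I need some HARD TRAP beats producers help me out here...louiethefifthonline@gmail.com"
print(find_emails(sample))
```

Requiring a word character at the start of the local part keeps the "here..." that precedes the address in the sample tweet out of the match.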
    

    Even if you replace string with the raw data, converted from a dict to a string with the str() function:

    string = str({"metadata":{"result_type":"recent","iso_language_code":"et"},
                "created_at":"Tue Dec 03 01:41:53 +0000 2013","id":407686093790662656,"id_str":"407686093790662656","text":"@emblems123 justinbieberfan12599@gamil.com","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":False,"in_reply_to_status_id":407677310821613569,"in_reply_to_status_id_str":"407677310821613569","in_reply_to_user_id":2201997043,"in_reply_to_user_id_str":"2201997043","in_reply_to_screen_name":"emblems123","user":{"id":1220098345,"id_str":"1220098345","name":"PYD","screen_name":"bieberfan12599","location":"NY"}})
    

    you still get the expected output:

    ['justinbieberfan12599@gamil.com']
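
In the question's code, the last CSV column is always [] because `for word in js` iterates over the top-level keys of the parsed response dict, and none of those keys contains '@gmail.com'. A minimal sketch of one way to get the address into each row is to run the regex on every tweet's "text" field instead. The helper build_rows and the trimmed sample_js below are hypothetical and only mirror the fields the question's code reads:

```python
import re

EMAIL_RE = re.compile(r"\w+@\w+\.com")  # same pattern as in the answer above

def build_rows(js):
    """Yield one CSV row per tweet: created_at, text, user id, e-mails found in text."""
    for msg in js.get('statuses', []):
        emails = EMAIL_RE.findall(msg['text'])
        # Join with a space so the column holds plain addresses rather than a list repr
        yield [msg['created_at'], msg['text'], msg['user']['id'], ' '.join(emails)]

# Hypothetical, heavily trimmed response used only to exercise the function
sample_js = {
    "statuses": [
        {
            "created_at": "Tue Dec 03 01:41:53 +0000 2013",
            "text": "@emblems123 justinbieberfan12599@gamil.com",
            "user": {"id": 1220098345},
        }
    ]
}

rows = list(build_rows(sample_js))
print(rows)
```

In main(), the generator expression assigned to messages could then be replaced with build_rows(js) before the call to write_csv.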
    

    【Comments】:
