【Question title】: How to restrict tweepy to give only geotagged tweets
【Posted】: 2017-06-28 01:30:28
【Question】:

I am trying to get tweets from a specific country, using the tweepy API to fetch them. This is the code I have so far:

import tweepy

# `auth` is assumed to be an already configured tweepy auth handler
api = tweepy.API(auth)
places = api.geo_search(query="India", granularity="country")
place_id = places[0].id
public_tweets = api.search(q="place:%s" % place_id)
for one in public_tweets:
    print(one.place)

This is the output I get from the code snippet above:

None
None
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/243cc16f6417a167.json', country=u'India', place_type=u'city', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[78.3897718, 17.3013989], [78.5404168, 17.3013989], [78.5404168, 17.4759], [78.3897718, 17.4759]]]), contained_within=[], full_name=u'Hyderabad, Andhra Pradesh', attributes={}, id=u'243cc16f6417a167', name=u'Hyderabad')
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1b8680cd52a711cb.json', country=u'India', place_type=u'city', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[77.3734736, 12.9190365], [77.7393706, 12.9190365], [77.7393706, 13.2313813], [77.3734736, 13.2313813]]]), contained_within=[], full_name=u'Bengaluru, Karnataka', attributes={}, id=u'1b8680cd52a711cb', name=u'Bengaluru')
None
None
None
None
None
None
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1dc2b546652c55dd.json', country=u'India', place_type=u'admin', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[73.8853747, 29.5438816], [76.9441213, 29.5438816], [76.9441213, 32.5763957], [73.8853747, 32.5763957]]]), contained_within=[], full_name=u'Punjab, India', attributes={}, id=u'1dc2b546652c55dd', name=u'Punjab')
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1dc2b546652c55dd.json', country=u'India', place_type=u'admin', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[73.8853747, 29.5438816], [76.9441213, 29.5438816], [76.9441213, 32.5763957], [73.8853747, 32.5763957]]]), contained_within=[], full_name=u'Punjab, India', attributes={}, id=u'1dc2b546652c55dd', name=u'Punjab')
None
None
Place(_api=<tweepy.api.API object at 0x1033f7690>, country_code=u'IN', url=u'https://api.twitter.com/1.1/geo/id/1b8680cd52a711cb.json', country=u'India', place_type=u'city', bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x1033f7690>, type=u'Polygon', coordinates=[[[77.3734736, 12.9190365], [77.7393706, 12.9190365], [77.7393706, 13.2313813], [77.3734736, 13.2313813]]]), contained_within=[], full_name=u'Bengaluru, Karnataka', attributes={}, id=u'1b8680cd52a711cb', name=u'Bengaluru')

Most of the tweets are not geotagged. How can I make sure that only geotagged tweets appear in the results?

【Comments】:

    Tags: python twitter tweepy


    【Answer 1】:

    You are approaching this the wrong way. These two functions do not work like that.

    Take a look at the Twitter documentation first:

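    1. GET geo/search: you are looking up the information correctly, but as the documentation states, it is not intended to be used with GET search/tweets:

       "This is the recommended method to use find places that can be attached to statuses/update."

    2. GET search/tweets: it is only for finding tweets that contain the specific word (or list of words) you are looking for. You cannot include geo_ids in the query unless you are looking for a tweet whose text contains them:

       "Returns a collection of relevant Tweets matching a specified query."

    3. geo_ids are used here. Scrolling down to the example provided there should give you the idea; they are also used in statuses/update, as mentioned in the documentation under (1).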

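    If you want geocoded tweets, you can use the geocode parameter of GET search/tweets to limit the location that tweets are fetched from. This gives you the tweets from that location; once you have them, you can keep only the ones that are actually geotagged.

    The filtering has to be done by you, not by Twitter.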
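
The client-side filtering described above could look roughly like the sketch below. It assumes each status object exposes `place` and `coordinates` attributes that are `None` when the tweet is not geotagged, as in the output shown in the question. The helper name `only_geotagged` and its optional `country_code` argument are hypothetical and not part of tweepy; the sample data uses stand-in objects rather than real API results.

```python
from types import SimpleNamespace


def only_geotagged(tweets, country_code=None):
    """Keep tweets that carry a place or exact coordinates.

    `only_geotagged` and `country_code` are hypothetical names for this sketch.
    """
    kept = []
    for tweet in tweets:
        place = getattr(tweet, "place", None)
        coordinates = getattr(tweet, "coordinates", None)
        if place is None and coordinates is None:
            continue  # not geotagged at all
        if country_code is not None:
            # The country can only be checked when a place is attached
            if place is None or getattr(place, "country_code", None) != country_code:
                continue
        kept.append(tweet)
    return kept


# Stand-in objects instead of real tweepy statuses, so the sketch runs offline
sample = [
    SimpleNamespace(text="a", place=None, coordinates=None),
    SimpleNamespace(text="b", place=SimpleNamespace(country_code="IN", name="Hyderabad"), coordinates=None),
    SimpleNamespace(text="c", place=SimpleNamespace(country_code="US", name="Austin"), coordinates=None),
]
print([t.text for t in only_geotagged(sample)])
print([t.text for t in only_geotagged(sample, country_code="IN")])
```

With real results, the same function would be applied to the list returned by the search call, for example `only_geotagged(public_tweets, country_code="IN")`.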

    【Comments】:

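      【Answer 2】:

      I ran into this problem too: the actual geocode of the tweets was always missing. However, you should not need the actual geocode of every tweet to do what you want. Instead, you can search for tweets within a particular geographic area by specifying coordinates and a radius, like this: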

      import csv
      import tweepy

      # `api` is assumed to be an authenticated tweepy.API instance defined elsewhere
      def wordsearch(word, max_tweets, lang, geocode, since, out):
          # Query for up to max_tweets tweets that contain `word` within the geocode area;
          # count is the page size per request (at most 100)
          cursor = tweepy.Cursor(api.search, q=word, lang=lang, geocode=geocode, since=since, count=100)
          searched_tweets = list(cursor.items(max_tweets))
          print("Number of Matches: %d\n" % len(searched_tweets))
          # Append one row per tweet; the last column is True for tweets that are not retweets
          with open(out, 'a', newline='', encoding='utf-8') as csvfile:
              csv_writer = csv.writer(csvfile)
              for t in searched_tweets:
                  csv_writer.writerow([t.created_at, t.text, t.author.screen_name, t.place, t.retweeted, t.retweet_count, (not t.retweeted and 'RT @' not in t.text)])

      wordsearch('dead', 100, "en", "37.9,91.8,1000mi", "2017-01-01", "result.csv")
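
The `geocode` argument in the call above is a single string of the form `latitude,longitude,radius`, where the radius carries a `mi` or `km` unit. A small helper can build and sanity-check that string before it is sent. This is only a sketch: `make_geocode` is a hypothetical name, not a tweepy function.

```python
def make_geocode(lat, lon, radius, unit="mi"):
    """Build a 'latitude,longitude,radius' string for the search geocode parameter.

    `make_geocode` is a hypothetical helper used only for this sketch.
    """
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if radius <= 0:
        raise ValueError("radius must be positive")
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    return "%s,%s,%s%s" % (lat, lon, radius, unit)


# Same area as in the call above
print(make_geocode(37.9, 91.8, 1000))
```

The returned string can be passed directly as the `geocode` argument of `wordsearch`.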
      

      【Comments】:
