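【Question title】: Error when there are zero occurrences of a word
【Posted】: 2016-11-19 05:31:54
【Question description】:

First of all, sorry for my bad English.

I'm using this code to count how many times the word "LeBron" or "Curry" appears in tweets. The problem is that if not a single tweet contains "LeBron" or "Curry", the program crashes. If both words are present, the program runs perfectly.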

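import json
import re

import pandas as pd

tweets_data_path = '/Users/HCruz/NetBeansProjects/elections3/data/fetched_tweets.txt'

# Parse one JSON tweet per line, skipping lines that are not valid JSON.
tweets_data = []
with open(tweets_data_path, "r") as tweets_file:
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except ValueError:
            continue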

tweets = pd.DataFrame()

# A list works on both Python 2 and 3; map() returns a lazy iterator on Python 3.
tweets['text'] = [tweet['text'] for tweet in tweets_data]

def word_in_text(word, text):
    # Case-insensitive check: does `word` (treated as a regex) occur in `text`?
    word = word.lower()
    text = text.lower()
    match = re.search(word, text)
    if match:
        return True
    return False

tweets['LeBron'] = tweets['text'].apply(lambda tweet: word_in_text('LeBron', tweet))
tweets['Curry'] = tweets['text'].apply(lambda tweet: word_in_text('Curry', tweet))

LeBron = tweets['LeBron'].value_counts()[True]
Curry = tweets['Curry'].value_counts()[True]

print("LeBron %s" % LeBron)
print("Curry %s" % Curry)

When each of "Curry" and "LeBron" appears at least once, I get:

Processing...
LeBron 1
Curry 34

That's perfect.

But if I remove "LeBron", so that there are no occurrences of LeBron, the program crashes.

Hectors-iMac:src HCruz$ python process_tweets.py
Processing...
Traceback (most recent call last):
  File "process_tweets.py", line 80, in <module>
    s.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sched.py", line 117, in run
    action(*argument)
  File "process_tweets.py", line 54, in processing
    process_tweets()
  File "process_tweets.py", line 44, in process_tweets
    LeBron = tweets['LeBron'].value_counts()[True]
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.py", line 491, in __getitem__
    result = self.index.get_value(self, key)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 1038, in get_value
    return tslib.get_value_box(s, key)
  File "tslib.pyx", line 454, in pandas.tslib.get_value_box (pandas/tslib.c:9561)
  File "tslib.pyx", line 469, in pandas.tslib.get_value_box (pandas/tslib.c:9408)
IndexError: index out of bounds

【Question discussion】:

    Tags: python pandas twitter tweepy


    【Solution 1】:

    Use exception handling by wrapping the code on line 44 in a try/except:

    try:
        LeBron = tweets['LeBron'].value_counts()[True]
    except IndexError:
        LeBron = 0
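
As an alternative to catching the exception, the count can be computed in a way that never looks up a missing key. The sketch below assumes a small, made-up DataFrame in place of the real tweet file, and it swaps the question's word_in_text helper for pandas' own str.contains; the variable names are illustrative only. Summing a boolean column counts its True values and gives 0 when there are none. Note also that newer pandas releases may raise KeyError rather than IndexError for the original lookup, so an `except IndexError` on its own might not catch it there.

```python
import pandas as pd

# Hypothetical sample data (not from the original post): no tweet mentions LeBron.
tweets = pd.DataFrame({
    "text": ["Curry hits another three", "What a game tonight"],
})

# Case-insensitive substring match, standing in for the question's word_in_text.
tweets["LeBron"] = tweets["text"].str.contains("lebron", case=False)
tweets["Curry"] = tweets["text"].str.contains("curry", case=False)

# Summing a boolean column counts the True values; an all-False column sums to 0.
lebron_count = int(tweets["LeBron"].sum())
curry_count = int(tweets["Curry"].sum())

# Alternative: keep value_counts() but look up True with a default of 0.
lebron_via_get = int(tweets["LeBron"].value_counts().get(True, 0))

print("LeBron %s" % lebron_count)
print("Curry %s" % curry_count)
```

Either form removes the need for a try/except around the lookup, since a word that never appears simply yields a count of 0.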
    

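    【Discussion】:

    • It gave an error, but I changed LeBron = None to LeBron = 0 and it works fine. Thanks!
    • Great, glad you figured it out, cheers!
    • If my answer helped you, please consider accepting it. Thanks!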