【Question Title】: How to find if a sentence contains a particular word
【Posted】: 2020-01-16 18:04:01
【Question】:

How do I check in Python whether a sentence contains a particular word?

I have two files:

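Player [File 1]

Joe doesn't like to play football
Kumar favourite game is hockey
Mohit likes soccer game
Naveen doesn't like cricket
Sachin was a cricket player
Savan likes cricket
Vinod likes basket
Andy likes volleyball

Games [File 2]

hockey
soccer
footba
crick
cricket
basketball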

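Expected output:

Player                               Game         Score [%]
Sachin is a Crick player             Crick        100
Joe doesn't like to play football    footba       75
Naveen doesn't like cricket          cricket      100
Savan likes cricket                  cricket      100
Vinod likes Basketb                  basketball   160
Kumar favourite game is hockey       hockey       100
Andy likes volleyball                null         No match
Mohit likes soccer game              soccer       100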

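The score is defined as len(game)/len(matching word), expressed as a percentage (for example, footba against football gives 6/8 = 75).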
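A minimal sketch of that score, assuming "matching word" means a word in the sentence that contains the game name as a substring (which reproduces the footba/football = 75 row above). Case-insensitivity, stripping punctuation from word ends, and keeping the best score when several words match are all assumptions; `match_score` is a hypothetical name.

```python
import string


def match_score(game, sentence):
    """Score a game against a sentence as len(game) / len(word) * 100.

    Assumptions (not stated in the question): matching is case-insensitive,
    punctuation at the ends of words is ignored, and a "matching word" is any
    word that contains the game name as a substring. If several words match,
    the highest score is kept. Returns None when nothing matches.
    """
    target = game.lower()
    if not target:
        return None  # an empty game name would match every word
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    scores = [len(target) / len(w) * 100 for w in words if target in w]
    return max(scores) if scores else None


print(match_score("footba", "Joe doesn't like to play football"))  # 75.0
print(match_score("Crick", "Sachin is a Crick player"))             # 100.0
print(match_score("hockey", "Andy likes volleyball"))               # None
```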

If the same player matches two games, the highest score should be used.
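A sketch of the "highest score wins" rule, using the question's len(game)/len(matching word) score under the assumptions listed in the docstring (substring match inside a word, case-insensitive). `best_match` and `match_score` are hypothetical names; a sentence with no match returns `("null", "No match")` to mirror the expected output.

```python
import string


def match_score(game, sentence):
    """Assumed score: len(game) / len(word) * 100 for the best word containing game."""
    target = game.lower()
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    scores = [len(target) / len(w) * 100 for w in words if target and target in w]
    return max(scores) if scores else None


def best_match(sentence, games):
    """Return (game, score) with the highest score, or ("null", "No match")."""
    scored = [(game, match_score(game, sentence)) for game in games]
    scored = [(game, score) for game, score in scored if score is not None]
    if not scored:
        return ("null", "No match")
    # max() keeps the first game on ties, so list order breaks ties.
    return max(scored, key=lambda pair: pair[1])


games = ["hockey", "soccer", "footba", "crick", "cricket", "basketball"]
print(best_match("Naveen doesn't like cricket", games))  # ('cricket', 100.0)
print(best_match("Andy likes volleyball", games))        # ('null', 'No match')
```

Here "crick" also matches "cricket" (5/7, about 71%), but "cricket" scores 100 and wins. The loop is linear in sentences × games, which should be fine for around 10,000 records.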

I have more than 10,000 records like this.

【Discussion】:

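  • How do you separate the sentences? Can you fix your file format? Also look into using "word in sentence".
  • How is the score calculated?
  • The score is based on the matched word. If cricket = cricket, then it is 100%.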
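The "word in sentence" check suggested above is a substring test on a string, which is why a short game name such as crick matches cricket. Splitting the sentence first turns it into a whole-word test. A minimal illustration:

```python
sentence = "Sachin was a cricket player"

# Substring test: True because "crick" occurs inside "cricket".
print("crick" in sentence)             # True

# Whole-word test: split on whitespace first, so only complete words match.
print("crick" in sentence.split())     # False
print("cricket" in sentence.split())   # True
```

Which one you want depends on the scoring: partial-word scores like footba vs. football need the substring form.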

Tags: python-3.x fuzzy-logic fuzzywuzzy


【Solution 1】:

First, read in the player file and split it into sentences:

>>> with open('testfiles/player.txt') as f:
...    sentences = []
...    for line in f:
...        sentences.append(line.strip())
...
>>> sentences
['Sachin was a cricket player', 'Mohit likes soccer game', 'Kumar favourite game is hockey', "Joe doesn't like to play football"]

Do the same for the games file, but convert it into a set for uniqueness and efficient lookups:

>>> with open('testfiles/games.txt') as f:
...    games = {line.strip() for line in f}
...
>>> games
{'hockey', 'crick', 'soccer', 'volleyball', 'badminton'}

Now we just need to look for each game name in each sentence, which gives the output below.

>>> game_score = {}
>>> game_found = set()
>>> for sentence in sentences:
...    for game in games:
...        if game in sentence:
...            # Use the game name as the key; the value is [sentence, match percentage]
...            game_score.setdefault(game, [sentence, '100%'])
...            # Record the matched sentence so unmatched sentences can be found later
...            game_found.add(sentence)
...
>>> game_score
{'hockey': ['Kumar favourite game is hockey', '100%'], 'crick': ['Sachin was a cricket player', '100%'], 'soccer': ['Mohit likes soccer game', '100%']}
>>> game_found
{'Mohit likes soccer game', 'Kumar favourite game is hockey', 'Sachin was a cricket player'}

Compare game_found against the player sentences and add every sentence without a matching game to game_score:

>>> for i, sentence in enumerate(sentences):
...    if sentence not in game_found:
...        game_name = 'null-%d' % i  # Dictionary keys must be unique, so append the index
...        game_score.setdefault(game_name, [sentence, 'No match'])
...
>>> game_score
{'hockey': ['Kumar favourite game is hockey', '100%'], 'crick': ['Sachin was a cricket player', '100%'], 'soccer': ['Mohit likes soccer game', '100%'], 'null-3': ["Joe doesn't like to play football", 'No match']}

Finally, print the results:

>>> print(f"{'Output':<41}{'Game':<14}Matching Score")
Output                                   Game          Matching Score
>>> for k in game_score:
...    sentence, score = game_score[k]
...    print(f"{sentence:<41}{k:<14}{score}")  # pad sentence to 41 chars, game name to 14
...
Kumar favourite game is hockey           hockey        100%
Sachin was a cricket player              crick         100%
Mohit likes soccer game                  soccer        100%
Joe doesn't like to play football        null-3        No match

You will still need logic to handle sentences that mention more than one sport, e.g. "Jane plays both hockey and soccer".
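One possible approach, sketched here rather than taken from the answer: key the results by sentence instead of by game and collect every game found. This handles a sentence that mentions several sports, and it also sidesteps a limitation of the code above: because game_score is keyed by game and filled with setdefault, a second sentence mentioning the same game (the question has two cricket sentences) is silently dropped. `games_per_sentence` is a hypothetical name, and matching is the same plain substring test used in the answer.

```python
def games_per_sentence(sentences, games):
    """Map each sentence to a sorted list of the games it mentions.

    Uses the same substring test as the answer (game in sentence), so a
    sentence mentioning several sports gets several entries, and sentences
    that share a game no longer overwrite each other.
    """
    result = {}
    for sentence in sentences:
        found = sorted(game for game in games if game in sentence)
        result[sentence] = found if found else ["No match"]
    return result


sentences = [
    "Jane plays both hockey and soccer",
    "Naveen doesn't like cricket",
    "Savan likes cricket",
    "Andy likes volleyball",
]
games = {"hockey", "soccer", "crick"}

for sentence, found in games_per_sentence(sentences, games).items():
    print(f"{sentence:<41}{', '.join(found)}")
```

To pick a single game per sentence instead, replace the list with the entry that has the highest score.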

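【Discussion】:

  • Note that after looking at the question's markdown, it's clear that each sentence and each game is on a separate line. So try with open(file1) as f: sentences = f.read().splitlines(). You are also missing the output format and the score (no scoring criteria were specified).
  • Thanks for your reply. But how do I calculate the score in the function? Also, if a sentence has no match, it should be shown as "No match". Thanks in advance.
  • @dreamzboy I have edited the question with more details. Thanks in advance.
  • @NAVEENN.C, you shouldn't edit the question two years later; instead, ask a new question with a minimal example of your problem rather than the full context.
  • Not only that, it's not nice to change the original question twice and then unaccept the answer; I had to update my answer.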