【Posted】:2019-02-02 11:45:19
【Question】:
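I need to count how many times each word occurs across all elements of a list created by re.findall.
For example: jobs = ["Java Developer", "Data Scientist", "Business Architect Process Mining", "JavaScript Developer"]
jobs_split = ["Java", "Developer", "Data", "Scientist", "Business", "Architect", "Process", "Mining", "JavaScript", "Developer"]
Then I want to count the occurrences of each word and output them, e.g. to a file, as Word: number of occurrences.
I know I can use a Counter in Python, but I don't know how to split all the elements of the list into words.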
import re
from collections import Counter
from urllib.request import urlopen, Request

jobs = []
jobs_split = []

for i in range(10):
    # Request the page for pn=i with a browser-like User-Agent header.
    html = Request("https://mysite?pn={}".format(i), headers={'User-Agent': 'Mozilla/5.0'})
    page = urlopen(html).read().decode('utf-8')
    # Collect every job title captured from the JobPosting data in the page.
    jobs += re.findall(r'"@type":"JobPosting","title":"([A-Za-z0-9 -/]+)","description"', page)

my_set = set(jobs)
# print(Counter(my_set))
print(my_set)
【Discussion】:
-
Can you add the expected output?
-
Developer: 2, Java: 1, Data: 1, Scientist: 1, Business: 1, Architect: 1, Process: 1, Mining: 1, JavaScript: 1
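
One possible way to get from the list of titles to per-word counts, as a minimal sketch rather than a definitive answer: split each title on whitespace, flatten the pieces into one list, and pass that list to Counter. The sample titles are the ones from the question. The assumption here is that words are separated by plain whitespace, and the name `lines` is introduced only for illustration.

```python
from collections import Counter

# Sample titles from the question; in the real script this list
# would be the result of re.findall.
jobs = ["Java Developer", "Data Scientist",
        "Business Architect Process Mining", "JavaScript Developer"]

# Split every title on whitespace and flatten into a single list of words.
jobs_split = [word for title in jobs for word in title.split()]

# Count how often each word occurs.
counts = Counter(jobs_split)

# Format each entry as "Word: count", most frequent first.
lines = ["{}: {}".format(word, n) for word, n in counts.most_common()]
print(", ".join(lines))
```

Note that the script in the question converts jobs to a set before counting, which collapses identical titles into one; counting on the list itself keeps duplicates. To write the result to a file instead of printing it, the same `lines` list can be joined with newlines and written through an ordinary `open(..., "w")` handle.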