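Here are a few different possible approaches; use whichever suits you. All of my code examples below use requests for the HTTP requests to the API; if you have pip, you can install requests with pip install requests. They all also use the MediaWiki API, and two of them use the query endpoint; consult the documentation for each if you need it.
1. Use the extracts prop to get a plain-text representation of either the whole page or a page "extract" straight from the API
Note that this approach only works on MediaWiki sites that have the TextExtracts extension installed. That notably includes Wikipedia, but not some smaller MediaWiki sites such as http://www.wikia.com/
You want to hit a URL like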
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Bla_Bla_Bla&prop=extracts&exintro&explaintext
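Breaking that down, we have the following parameters (documented at https://www.mediawiki.org/wiki/Extension:TextExtracts#query+extracts):
- action=query, format=json, and titles=Bla_Bla_Bla are all standard MediaWiki API parameters
- prop=extracts makes us use the TextExtracts extension
- exintro limits the response to the content before the first section heading
- explaintext makes the extract in the response plain text instead of HTML
Then parse the JSON response and pull out the extract: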
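To make the parameter breakdown above concrete, here is a minimal sketch that assembles the same kind of URL with the standard library. The helper name build_extract_url is hypothetical, not part of any library. It assumes the MediaWiki convention that a boolean parameter counts as set whenever it is present, whatever its value, which is why exintro and explaintext are sent with empty values.

```python
from urllib.parse import urlencode

API_URL = 'https://en.wikipedia.org/w/api.php'


def build_extract_url(title, api_url=API_URL):
    """Build a query URL asking TextExtracts for a plain-text intro.

    build_extract_url is a hypothetical helper name used for illustration.
    """
    params = {
        'action': 'query',      # standard MediaWiki API parameter
        'format': 'json',       # standard MediaWiki API parameter
        'titles': title,        # standard MediaWiki API parameter
        'prop': 'extracts',     # use the TextExtracts extension
        'exintro': '',          # boolean flag: present means "on"
        'explaintext': '',      # boolean flag: present means "on"
    }
    # urlencode escapes the title; spaces become '+'.
    return api_url + '?' + urlencode(params)


url = build_extract_url('Bla Bla Bla')
print(url)
```

In practice you do not need to build the URL by hand, because requests does the same encoding when you pass a params dict.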
>>> import requests
>>> response = requests.get(
...     'https://en.wikipedia.org/w/api.php',
...     params={
...         'action': 'query',
...         'format': 'json',
...         'titles': 'Bla Bla Bla',
...         'prop': 'extracts',
...         'exintro': True,
...         'explaintext': True,
...     }
... ).json()
>>> page = next(iter(response['query']['pages'].values()))
>>> print(page['extract'])
"Bla Bla Bla" is the title of a song written and recorded by Italian DJ Gigi D'Agostino. It was released in May 1999 as the third single from the album, L'Amour Toujours. It reached number 3 in Austria and number 15 in France. This song can also be heard in an added remixed mashup with L'Amour Toujours (I'll Fly With You) in its US radio version.
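2. Use the parse endpoint to get the full HTML of the page, parse it, and extract the first paragraph
MediaWiki has a parse endpoint that you can hit with a URL like https://en.wikipedia.org/w/api.php?action=parse&page=Bla_Bla_Bla to get the HTML of a page. You can then parse it with an HTML parser such as lxml (install it first with pip install lxml) to extract the first paragraph.
For example: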
>>> import requests
>>> from lxml import html
>>> response = requests.get(
...     'https://en.wikipedia.org/w/api.php',
...     params={
...         'action': 'parse',
...         'page': 'Bla Bla Bla',
...         'format': 'json',
...     }
... ).json()
>>> raw_html = response['parse']['text']['*']
>>> document = html.document_fromstring(raw_html)
>>> first_p = document.xpath('//p')[0]
>>> intro_text = first_p.text_content()
>>> print(intro_text)
"Bla Bla Bla" is the title of a song written and recorded by Italian DJ Gigi D'Agostino. It was released in May 1999 as the third single from the album, L'Amour Toujours. It reached number 3 in Austria and number 15 in France. This song can also be heard in an added remixed mashup with L'Amour Toujours (I'll Fly With You) in its US radio version.
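3. Parse the wikitext yourself
You can use the query API to get the page's wikitext, parse it using mwparserfromhell (install it first using pip install mwparserfromhell), then reduce it to human-readable text using strip_code. strip_code doesn't work perfectly at the time of writing (as the example below shows) but will hopefully improve.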
>>> import requests
>>> import mwparserfromhell
>>> response = requests.get(
...     'https://en.wikipedia.org/w/api.php',
...     params={
...         'action': 'query',
...         'format': 'json',
...         'titles': 'Bla Bla Bla',
...         'prop': 'revisions',
...         'rvprop': 'content',
...     }
... ).json()
>>> page = next(iter(response['query']['pages'].values()))
>>> wikicode = page['revisions'][0]['*']
>>> parsed_wikicode = mwparserfromhell.parse(wikicode)
>>> print(parsed_wikicode.strip_code())
{{dablink|For Ke$ha's song, see Blah Blah Blah (song). For other uses, see Blah (disambiguation)}}
"Bla Bla Bla" is the title of a song written and recorded by Italian DJ Gigi D'Agostino. It was released in May 1999 as the third single from the album, L'Amour Toujours. It reached number 3 in Austria and number 15 in France. This song can also be heard in an added remixed mashup with L'Amour Toujours (I'll Fly With You) in its US radio version.
Background and writing
He described this song as "a piece I wrote thinking of all the people who talk and talk without saying anything". The prominent but nonsensical vocal samples are taken from UK band Stretch's song "Why Did You Do It"''.
Music video
The song also featured a popular music video in the style of La Linea. The music video shows a man with a floating head and no arms walking toward what appears to be a shark that multiplies itself and can change direction. This style was also used in "The Riddle", another song by Gigi D'Agostino, originally from British singer Nik Kershaw.
Chart performance
Chart (1999-00)PeakpositionIreland (IRMA)Search for Irish peaks23
References
External links
Category:1999 singles
Category:Gigi D'Agostino songs
Category:1999 songs
Category:ZYX Music singles
Category:Songs written by Gigi D'Agostino
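The output above covers the whole page, whereas the other two approaches return only the intro. One rough way to narrow it down, sketched here with a hypothetical helper named lead_section, is to cut the wikitext at the first section heading before handing it to mwparserfromhell. It assumes headings sit on their own line in the usual ==Heading== form; it does nothing about leading templates such as the dablink line, which strip_code left in place above.

```python
import re

# A heading line such as "==Background and writing==" or "== Title ==".
HEADING_RE = re.compile(r'^=+[^=\n].*?=+[ \t]*$', flags=re.MULTILINE)


def lead_section(wikitext):
    """Return the wikitext that precedes the first section heading.

    lead_section is a hypothetical helper used for illustration.
    """
    match = HEADING_RE.search(wikitext)
    if match is None:
        return wikitext  # no headings: the whole page is the lead
    return wikitext[:match.start()]


# Hand-written sample wikitext, not the real page source.
sample_wikitext = (
    "{{dablink|x}}\n"
    "'''Bla Bla Bla''' is a song.\n"
    "\n"
    "==Background and writing==\n"
    "He described this song as a piece about people who talk.\n"
)
print(lead_section(sample_wikitext))
```

You would then pass the result to mwparserfromhell.parse and call strip_code on it exactly as in the example above.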