【发布时间】:2021-12-09 13:24:10
【问题描述】:
谁能帮我解决这个问题,我不知道为什么会出现这个错误。
我正在尝试使用某人制作的 python 程序,我试图弄乱它,但我无法找出问题所在。
错误:
PS D:\Python> python .\quizlet.py
Traceback (most recent call last):
File "D:\Python\quizlet.py", line 69, in <module>
q = QuizletParser(website)
File "D:\Python\quizlet.py", line 17, in QuizletParser
data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 14 (char 13)
我正在尝试使用我之前在这里找到的代码:https://github.com/daijro/python-quizlet
来源:
from requests_html import HTMLSession
from box import Box
import box
import json
from bs4 import BeautifulSoup
from difflib import SequenceMatcher
def FindFlashcard(flashcards: box.box_list.BoxList, match: str):
similar = lambda a, b: SequenceMatcher(None, a, b).ratio()
data = max(list(zip([similar(match, x.term) for x in flashcards], [x for x in range(len(flashcards))])))
flashcard = flashcards[data[1]]
flashcard.update({'similarity': data[0]})
return flashcard
def QuizletParser(link: str):
session = HTMLSession()
data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
flashcards = []
for i in list(data['termIdToTermsMap'].values()):
i = {
'index': i['rank'],
'id': i['id'],
'term': i['word'],
'definition': i['definition'],
'setId': i['setId'],
'image': i['_imageUrl'],
'termTts': 'https://quizlet.com'+i['_wordTtsUrl'],
'termTtsSlow': 'https://quizlet.com'+i['_wordSlowTtsUrl'],
'definitionTts': 'https://quizlet.com'+i['_definitionTtsUrl'],
'definitionTtsSlow': 'https://quizlet.com'+i['_definitionSlowTtsUrl'],
'lastModified': i['lastModified'],
}
flashcards.append(i)
output = {
'title': data['set']['title'],
'flashcards': flashcards,
'author': {
'name': data['creator']['username'],
'id': data['creator']['id'],
'timestamp': data['creator']['timestamp'],
'lastModified': data['creator']['lastModified'],
'image': data['creator']['_imageUrl'],
'timezone': data['creator']['timeZone'],
'isAdmin': data['creator']['isAdmin'],
},
'id': data['set']['id'],
'link': data['set']['_webUrl'],
'thumbnail': data['set']['_thumbnailUrl'],
'timestamp': data['set']['timestamp'],
'lastModified': data['set']['lastModified'],
'publishedTimestamp': data['set']['publishedTimestamp'],
'authorsId': data['set']['creatorId'],
'termLanguage': data['set']['wordLang'],
'definitionLanguage': data['set']['defLang'],
'description': data['set']['description'],
'numTerms': data['set']['numTerms'],
'hasImages': data['set']['hasImages'],
'hasUploadedImage': data['hasUploadedImage'],
'hasDiagrams': data['set']['hasDiagrams'],
'hasImages': data['set']['hasImages'],
}
return Box(output)
website = 'https://quizlet.com/475389316/python-web-scraping-flash-cards/'
text = 'Two popular parsers'
q = QuizletParser(website)
flashcard = FindFlashcard(q.flashcards, match=text) # finds the flashcard most similar to the input
print(flashcard.term + " " + flashcard.definition) # calculates how similar the identified flashcard is to the input
【问题讨论】:
-
错误表示字符串不是json字符串。它指向破坏字符串的字符的位置。
标签: python json beautifulsoup python-requests