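【Posted】: 2020-05-21 13:24:05
【Question】:
I am trying to write my output (i.e. og = OpenGraph(i, ["og:title", "og:description", "og:image", "og:url"])) to a JSON file. But when I run the output through a validator, it says the file is not in valid JSON format. Can anyone help me see what I am doing wrong?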
# -*- coding: utf-8 -*-
import scrapy
from ..items import news18Item
import re
from webpreview import web_preview
from webpreview import OpenGraph
import json


class News18SSpider(scrapy.Spider):
    name = 'news18_story'
    page_number = 2
    start_urls = ['https://www.news18.com/movies/page-1/']

    def parse(self, response):
        items = news18Item()
        page_id = response.xpath('/html/body/div[2]/div[5]/div[2]/div[1]/div[*]/div[*]/p/a/@href').getall()
        items['page_id'] = page_id
        story_url = page_id
        for i in story_url:
            og = OpenGraph(i, ["og:title", "og:description", "og:image", "og:url"])
            dictionary = [{"page_title": og.title}, {"description": og.description}, {"image_url": og.image}, {"post_url": og.url}]
            with open("news18_new.json", "a") as outfile:
                json.dump(dictionary, outfile)
                outfile.write("\n")
                # json.dump("\n",outfile)
        next_page = 'https://www.news18.com/movies/page-' + str(News18SSpider.page_number) + '/'
        if News18SSpider.page_number <= 20:
            News18SSpider.page_number += 1
            yield response.follow(next_page, callback=self.parse)
        pass
【Discussion】:
-
Could you provide a sample of the output written to news18_new.json?
-
og:title o/p: Mammootty, Kamal Haasan And More Celebs Wish Mohanlal On His Birthday
og:description o/p: On Malayalam superstar Mohanlal’s birthday, several members from the world of entertainment including Mammootty, Kamal Haasan, Nivin Pauly extended their best wishes to him.
og:image o/p: https://images.news18.com/ibnlive/uploads/2020/05/1590065340_1590065211213_copy_875x583.jpg
og:url o/p: https://www.news18.com/news/movies/mammootty-kamal-haasan-and-more-celebs-wish-mohanlal-on-his-birthday-2630693.html
This is the sample output @喜满洲
-
{"page_title": "Sonakshi Sinha To Auction Sketch Of Buddha To Help Migrant Labourers", "description": "Sonakshi Sinha took to Instagram to share a timelapse video of a sketch of Buddha that she made to auction to raise funds for migrant workers affected by Covid-19 crisis. ", "image_url": "https://images.news18.com/ibnlive/uploads/2020/05/1589815261_1589815196489_copy_875x583.jpg", "post_url": "https://www.news18.com/news/movies/sonakshi-sinha-to-auction-sketch-of-buddha-to-help-migrant-labourers-2626123.html"}
Output of news18_new.json
-
Put errors, data, and other information in the question rather than in comments; it will be more readable.
-
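In the current version you create a multi-JSON file: a file containing many separate JSON objects, one appended per write. A normal JSON file holds a single top-level value, so you must first build a list with all the data and then save that list as one object.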
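The approach described in this comment can be sketched as follows. This is a minimal illustration, not the asker's spider: the helper names build_record and save_records, the file name news18_example.json, and the two sample stories are hypothetical stand-ins for the values OpenGraph would return. Each story becomes one flat dict, all dicts are collected in a list, and the list is written once in "w" mode so the file holds a single JSON array.

```python
import json
import os
import tempfile


def build_record(title, description, image, url):
    """Return one flat dict per story instead of a list of one-key dicts."""
    return {
        "page_title": title,
        "description": description,
        "image_url": image,
        "post_url": url,
    }


def save_records(records, path):
    """Write every record at once as a single JSON array (mode "w", not "a")."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(records, outfile, ensure_ascii=False, indent=2)


# Hypothetical sample data standing in for og.title, og.description, etc.
records = [
    build_record("Title A", "Description A",
                 "https://example.com/a.jpg", "https://example.com/a.html"),
    build_record("Title B", "Description B",
                 "https://example.com/b.jpg", "https://example.com/b.html"),
]

# Hypothetical output path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "news18_example.json")
save_records(records, path)

# The whole file now parses as one JSON value.
with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)
```

In the spider itself, parse runs once per page, so the list would have to be accumulated across calls before it is written. One alternative, assuming the standard Scrapy feed export is acceptable here, is to yield each dict from parse and let Scrapy write the file, for example with scrapy crawl news18_story -o news18_new.json.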
Tags: python json python-3.x web-scraping scrapy