【发布时间】:2021-10-20 08:16:43
【问题描述】:
我正在尝试通过云功能->PubSub--> BigQuery 进行网络抓取
我编写了一个 python 代码并将我的代码部署到云函数。此代码的文本结果变为“正常”,我可以在日志中看到爬取的数据。但是当我试图从主题中提取消息时,我无法获得任何数据。当我检查 PubSub Api 指标时,我看到 404 响应。我应该如何编写将消息发布到 PubSub 主题的代码?
这是我目前写的代码:
import base64
from bs4 import BeautifulSoup
import requests
from google.cloud import pubsub_v1
def hello_pubsub(event, context):
publisher = pubsub_v1.PublisherClient()
# The `topic_path` method creates a fully qualified identifier
# in the form `projects/{project_id}/topics/{topic_id}`
topic_path = publisher.topic_path("tokyo-ring-<secret>", "webscraping")
html_text = requests.get('https://www.arabam.com/ikinci-el?take=50').text
#print(html_text)
soup = BeautifulSoup(html_text,'lxml')
models = soup.find_all('tr', class_='listing-list-item pr should-hover bg-white')
for model in models:
model_name = model.find('td', class_='listing-modelname pr').text
title = model.find('td', class_='horizontal-half-padder-minus pr').text
model_year = model.find('td', class_='listing-text pl8 pr8 tac pr').text
price = model.find('td', 'pl8 pr8 tac pr').text.replace('TL','').replace(' ','').replace('.','')
publish_date = model.find('td', class_='listing-text tac pr').text
location = model.find('div', style='display:flex;justify-content:center;align-items:center;height:81px').text.split(' ', 1)[0]
data= "{"+"\"model_name\":\""+model_name+"\""+","+"\"title\":"+"\""+title+"\",\""+"model_year\""+":\""+model_year+"\""+",\"price\":\""+price+"\""+",\"publish_date\":\""+publish_date+"\","+"\"location\":\""+location+"\"}"
#pubsub_message = base64.b64decode(event['data']).decode('utf-8')
print(data)
【问题讨论】:
标签: python google-cloud-functions google-cloud-pubsub