【发布时间】:2021-02-25 14:36:24
【问题描述】:
下午的人
接上一个问题,
我创建了一个小型 python 应用程序,它可以为我从网站上抓取招聘信息。
我将它们保存在一个小的 sql 数据库中
列是 Job_id / job_title / 工作公司 / job_salary / job_location / job_post_date
我计划每天运行一次我的脚本并希望忽略重复的条目。
什么样的查询可以检查 2 列 I. E. Title 和 company 以确保它不会再次被插入,发布的日期将始终每天增加 1
Python 代码
import mysql.connector
import requests
from bs4 import BeautifulSoup
my_db = mysql.connector.connect(
host="192.168.1.1",
user='job_user',
password='job1',
database='job_db'
)
my_cursor = my_db.cursor()
radius = "10"
val1 = input("Enter Job: ")
val2 = input("Enter Location: ")
val3 = input("Enter Radius default:(10): ")
url = "https://www.indeed.co.uk/jobs?q={}&l={}&radius={}".format(val1, val2, val3)
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
result1 = soup.find_all(class_="jobsearch-SerpJobCard")
for results in result1:
job = results.find('a', attrs={'data-tn-element': 'jobTitle'})
company = results.find('span', attrs={'class': 'company'})
location = results.find('span', attrs={'class': 'location accessible-contrast-color-location'})
salary = results.find('span', attrs={'class': 'salaryText'})
date_pos = results.find('span', attrs={'class': 'date'})
i1 = job.text.strip()
i2 = company.text.strip()
if location is not None:
i3 = location.text.strip()
else:
i3 = "N/A"
if salary is not None:
i4 = salary.text.strip()
else:
i4 = "N/A"
i5 = date_pos.text.strip()[:1]
print(i1)
print(i2)
print(i3)
print(i4)
print("\n")
sql = "INSERT INTO job_tbl (job_title, job_company, job_salary, job_location, job_posted) \
VALUES (%s, %s, %s, %s, %s)"
val = (i1, i2, i3, i4, i5)
my_cursor.execute(sql, val)
my_db.commit()
SQL 查询
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
| job_id | job_title | job_company | job_salary | job_location | job_posted |
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
| 1 | IT Technician | Strathallan School | N/A | £19,000 - £23,000 a year | 3 |
| 2 | English into Romanian IT/Technical Translator (relocation to... | Alpha CRC Ltd. | N/A | £22,000 - £25,000 a year | 7 |
| 3 | IT/Trainee IT Support Analyst | CJ Lang & Son Limited | Dundee DD4 8JU | N/A | 3 |
| 4 | IT Technical Support Apprentice | GP Strategies Training Ltd | Dundee | £10,000 - £12,000 a year | 1 |
| 5 | IT Operations Manager - IRC84524 | Scottish Government | Dundee DD1 | £48,930 - £61,006 a year | 3 |
| 6 | Temporary IT Buyer | brightsolid | Dundee | N/A | 7 |
| 7 | IT Site Support Analyst | Thermo Fisher Scientific | Perth | N/A | 6 |
| 8 | Network and System Administrator | Solutions Driven | Forfar | £30,000 - £35,000 a year | 3 |
| 9 | IT Service Desk Team Leader | Cross Resourcing | Dundee | N/A | 3 |
| 10 | Senior Network Engineer | Raytheon Intelligence & Space | Glenrothes | N/A | 3 |
| 11 | Solutions Architect | NCR | Dundee | N/A | 3 |
| 12 | Technical Support Specialist | The Army | N/A | £15,985 - £20,400 a year | 3 |
| 13 | Pre-Sales Solutions Architect – 12 Month Graduate Internship... | DELL | N/A | N/A | 3 |
+--------+-------------------------------------------------------------------+-------------------------------+----------------+----------------------------+------------+
13 rows in set (0.002 sec)
如果我再次运行相同的应用程序,它将添加相同的结果,我想要的是匹配标题和公司并检查它是否已经添加
【问题讨论】:
-
发布示例数据和预期结果以阐明您想要什么。
-
您可以使两列(标题和公司)唯一并使用
ON DUPLICATE:dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html运行查询
标签: mysql sql sql-update sql-insert unique-constraint