【发布时间】:2021-02-03 10:45:37
【问题描述】:
我是使用 Python 提取数据的新手。我想做excel文件作为从网站拉表。
The website url : "https://seffaflik.epias.com.tr/transparency/piyasalar/gop/arz-talep.xhtml"
在这个网页中,分页有表格,用于小时数据。由于一小时包含大约500个数据,因此页面被划分。
我想提取每小时的所有数据。
但我的错误是即使页面发生变化也拉同一张桌子。
我正在使用美丽的汤、熊猫、硒库。我将向您展示我解释自己的代码。
import requests
r = requests.get('https://seffaflik.epias.com.tr/transparency/piyasalar/gop/arz-talep.xhtml')
from bs4 import BeautifulSoup
source = BeautifulSoup(r.content,"lxml")
metin =source.title.get_text()
source.find("input",attrs={"id":"j_idt206:txt1"})
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
tarih = source.find("input",attrs={"id":"j_idt206:date1_input"})["value"]
import datetime
import time
x = datetime.datetime.now()
today = datetime.date.today()
# print(today)
tomorrow = today + datetime.timedelta(days = 1)
tomorrow = str(tomorrow)
words = tarih.split('.')
yeni_tarih = '.'.join(reversed(words))
yeni_tarih =yeni_tarih.replace(".","-")
def tablo_cek():
tablo = source.find_all("table")#sayfadaki tablo
dfs = pd.read_html(str(tablo))#tabloyu dataframe e çekmek
dfs.append(dfs)#tabloya yeni çekilen tabloyu ekle
print(dfs)
return tablo
if tomorrow == yeni_tarih :
print(yeni_tarih == tomorrow)
driver = webdriver.Chrome("C:/Users/tugba.ozkan/AppData/Local/SeleniumBasic/chromedriver.exe")
driver.get("https://seffaflik.epias.com.tr/transparency/piyasalar/gop/arz-talep.xhtml")
time.sleep(1)
driver.find_element_by_xpath("//select/option[@value='96']").click()
time.sleep(1)
user = driver.find_element_by_name("j_idt206:txt1")
nextpage = driver.find_element_by_xpath("//a/span[@class ='ui-icon ui-icon-seek-next']")
num=0
while num < 24 :
user.send_keys(num) #saate veri gönder
driver.find_element_by_id('j_idt206:goster').click() #saati uygula
nextpage = driver.find_element_by_xpath("//a/span[@class ='ui-icon ui-icon-seek-next']")#o saatteki next page
nextpage.click() #next page e geç
user = driver.find_element_by_name("j_idt206:txt1") #tekrar getiriyor saat yerini
time.sleep(1)
tablo_cek()
num = num + 1 #saati bir arttır
user.clear() #saati sıfırla
else:
print("Güncelleme gelmedi")
在这种情况下:
nextpage = driver.find_element_by_xpath("//a/span[@class ='ui-icon ui-icon-seek-next']")#o saatteki next page
nextpage.click()
当python单击按钮进入下一页时,下一页显示然后它需要拉下一个表,如图表。但它不起作用。 在输出中,我看到附加的值相同的表。像这样: 这是我的输出:
[ Fiyat (TL/MWh) Talep (MWh) Arz (MWh)
0 0 25.0101 19.15990
1 1 24.9741 19.16390
2 2 24.9741 19.18510
3 85 24.9741 19.18512
4 86 24.9736 19.20762
5 99 24.9736 19.20763
6 100 24.6197 19.20763
7 101 24.5697 19.20763
8 300 24.5697 19.20768
9 301 24.5697 19.20768
10 363 24.5697 19.20770
11 364 24.5497 19.20770
12 400 24.5497 19.20771
13 401 24.5297 19.20771
14 498 24.5297 19.20773
15 499 24.5297 19.36473
16 500 24.5297 19.36473
17 501 24.4097 19.36473
18 563 24.4097 19.36475
19 564 24.3897 19.36475
20 999 24.3897 19.36487
21 1000 24.3097 19.36487
22 1001 24.1897 19.36487
23 1449 24.1897 19.36499, [...]]
[ Fiyat (TL/MWh) Talep (MWh) Arz (MWh)
0 0 25.0101 19.15990
1 1 24.9741 19.16390
2 2 24.9741 19.18510
3 85 24.9741 19.18512
4 86 24.9736 19.20762
5 99 24.9736 19.20763
6 100 24.6197 19.20763
7 101 24.5697 19.20763
8 300 24.5697 19.20768
9 301 24.5697 19.20768
10 363 24.5697 19.20770
11 364 24.5497 19.20770
12 400 24.5497 19.20771
13 401 24.5297 19.20771
14 498 24.5297 19.20773
15 499 24.5297 19.36473
16 500 24.5297 19.36473
17 501 24.4097 19.36473
18 563 24.4097 19.36475
19 564 24.3897 19.36475
20 999 24.3897 19.36487
21 1000 24.3097 19.36487
22 1001 24.1897 19.36487
23 1449 24.1897 19.36499, [...]]
[ Fiyat (TL/MWh) Talep (MWh) Arz (MWh)
0 0 25.0101 19.15990
1 1 24.9741 19.16390
2 2 24.9741 19.18510
3 85 24.9741 19.18512
4 86 24.9736 19.20762
5 99 24.9736 19.20763
6 100 24.6197 19.20763
7 101 24.5697 19.20763
8 300 24.5697 19.20768
9 301 24.5697 19.20768
10 363 24.5697 19.20770
11 364 24.5497 19.20770
12 400 24.5497 19.20771
13 401 24.5297 19.20771
14 498 24.5297 19.20773
15 499 24.5297 19.36473
16 500 24.5297 19.36473
17 501 24.4097 19.36473
18 563 24.4097 19.36475
19 564 24.3897 19.36475
20 999 24.3897 19.36487
21 1000 24.3097 19.36487
22 1001 24.1897 19.36487
23 1449 24.1897 19.36499, [...]]
[ Fiyat (TL/MWh) Talep (MWh) Arz (MWh)
0 0 25.0101 19.15990
1 1 24.9741 19.16390
2 2 24.9741 19.18510
3 85 24.9741 19.18512
4 86 24.9736 19.20762
5 99 24.9736 19.20763
6 100 24.6197 19.20763
7 101 24.5697 19.20763
8 300 24.5697 19.20768
9 301 24.5697 19.20768
10 363 24.5697 19.20770
11 364 24.5497 19.20770
12 400 24.5497 19.20771
13 401 24.5297 19.20771
14 498 24.5297 19.20773
15 499 24.5297 19.36473
16 500 24.5297 19.36473
17 501 24.4097 19.36473
18 563 24.4097 19.36475
19 564 24.3897 19.36475
20 999 24.3897 19.36487
21 1000 24.3097 19.36487
22 1001 24.1897 19.36487
23 1449 24.1897 19.36499, [...]]
【问题讨论】:
标签: python pandas selenium beautifulsoup datatable