【Posted】: 2021-07-28 07:37:52
【Question】:
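I'm looking for some help with scraping using Selenium in Python. You need a paid account to view this page, so I can't provide a reproducible example.
I'm trying to extract the data behind the blue dots and black arrows. The data is in this piece of HTML: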
<svg viewBox="0 0 105 68" class="video-summaries__field-arrows" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg">
<defs>
<marker fill="#000" id="default_arrow" markerWidth="5" markerHeight="4" orient="auto" refX="5" refY="2" stroke="none">
<polygon points="0 0, 5 2, 0 4"></polygon>
</marker>
<marker fill="#0033ff" id="hover_arrow" markerWidth="2.9" markerHeight="2.4" orient="auto" refX="2.5" refY="1.2" stroke="none">
<polygon points="0 0, 2.9 1.2, 0 2.4"></polygon>
</marker>
</defs>
<path class="videosummaries-arrows" d="M52.5 35.1 37.6 33.3" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_0)" style="stroke-width: 0.25;"></path>
<linearGradient gradientUnits="userSpaceOnUse" id="gradient_0" x1="52.5" x2="37.6" y1="35.1" y2="33.3">
<stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
<stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
</linearGradient>
<path class="videosummaries-arrows" d="M38.2 34.7 76.6 62" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_1)" style="stroke-width: 0.25;"></path>
<linearGradient gradientUnits="userSpaceOnUse" id="gradient_1" x1="38.2" x2="76.6" y1="34.7" y2="62">
<stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
<stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
</linearGradient>
<path class="videosummaries-arrows" d="M61.6 67.8 36.3 63.9" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_2)" style="stroke-width: 0.25;"></path>
<linearGradient gradientUnits="userSpaceOnUse" id="gradient_2" x1="61.6" x2="36.3" y1="67.8" y2="63.9">
<stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
<stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
</linearGradient>
<path class="videosummaries-arrows" d="M36.3 63.9 36.5 26.700000000000003" fill="none" marker-end="url(#default_arrow)" stroke="url(#gradient_3)" style="stroke-width: 0.25;"></path>
<linearGradient gradientUnits="userSpaceOnUse" id="gradient_3" x1="36.3" x2="36.5" y1="63.9" y2="26.700000000000003">
<stop offset="5%" stop-color="#000" stop-opacity="0.1"></stop>
<stop offset="100%" stop-color="#000" stop-opacity="1"></stop>
</linearGradient>
Specifically, I'm trying to scrape the x1, x2, y1, y2 values from the linearGradient tags.
I get the page source by running
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Users\James\OneDrive\Desktop\webdriver\chromedriver.exe')
driver.get('https://football.instatscout.com/teams/9487/video')
print("Page Title is : %s" %driver.title)
driver.find_element_by_name('email').send_keys('')
driver.find_element_by_name('pass').send_keys('')
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "hRAqIl", " " ))]').click()
driver.implicitly_wait(10)
#driver.find_element_by_css_selector('.dropdown-btn:nth-child(12) .video-summaries__checkbox_red ').click()
driver.find_element_by_css_selector('.dropdown-btn:nth-child(12) > .video-summaries__checkbox').click()
driver.implicitly_wait(10)
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "ixmoFk", " " ))]').click()
driver.implicitly_wait(10)
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox-column-inner", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox-column-row", " " )) and (((count(preceding-sibling::*) + 1) = 10) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox", " " ))]').click()
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "dropdown-btn", " " )) and (((count(preceding-sibling::*) + 1) = 12) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "video-summaries__checkbox_red", " " ))]').click()
html = driver.page_source
in Selenium, but I don't know where to go from there.
In the end I'd like to scrape this into a data frame with the columns 'Name', 'X1', 'Y1', 'X2', 'Y2'.
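For reference, here is a minimal sketch of the kind of extraction I have in mind, using only the standard library's `html.parser` on a string such as `driver.page_source`. The names `GradientParser` and `extract_gradients` are made up for illustration, and using the gradient `id` as the 'Name' column is purely an assumption, since the snippet above does not show where the real names live.

```python
from html.parser import HTMLParser


class GradientParser(HTMLParser):
    """Collect x1/y1/x2/y2 from every <linearGradient> start tag."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names,
        # so <linearGradient> arrives here as "lineargradient".
        if tag != "lineargradient":
            return
        a = dict(attrs)
        self.rows.append({
            "Name": a.get("id"),  # assumption: gradient id as a placeholder name
            "X1": float(a["x1"]),
            "Y1": float(a["y1"]),
            "X2": float(a["x2"]),
            "Y2": float(a["y2"]),
        })


def extract_gradients(html):
    """Return one dict per linearGradient found in the HTML string."""
    parser = GradientParser()
    parser.feed(html)
    return parser.rows


# Two gradients copied from the HTML in the question.
sample = """<svg viewBox="0 0 105 68">
<linearGradient gradientUnits="userSpaceOnUse" id="gradient_0" x1="52.5" x2="37.6" y1="35.1" y2="33.3"></linearGradient>
<linearGradient gradientUnits="userSpaceOnUse" id="gradient_3" x1="36.3" x2="36.5" y1="63.9" y2="26.700000000000003"></linearGradient>
</svg>"""

rows = extract_gradients(sample)
for row in rows:
    print(row)
```

With pandas installed, `pandas.DataFrame(extract_gradients(html))` should then give the table I'm after. The sketch raises a `KeyError` if a gradient lacks one of the four coordinates, which seems acceptable here because every gradient in the snippet has all of them.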
【Comments】:
Tags: python selenium beautifulsoup