【Question Title】: Extract all the elements within a class in Python
【Posted】: 2021-08-09 14:18:12
【Question】:

To extract the first element with the class, I did the following:

if var_source == "Image":
    outcsvfile = 'Image_Ids' + file + '_' + timestamp +'.csv'
    with open(outcsvfile, 'w', encoding='utf-8', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(['ax','physical_id'])
    for i in range(len(var_ax)):    
        browser.get('https://test.com' + str(mpid) + '&ax=' + var_ax[i])
        self.master.update()
        self.status.config(text = str(i+1) + "/" + str(len(var_ax)) + " Extracting AX: " + var_ax[i])
        try:
            ph_id = browser.find_element_by_xpath("//div[contains(@class, 'a-image-wrapper')]").get_attribute("alt")
            print(i+1,': extract AX:',var_ax[i])
            with open(outcsvfile, 'a+', encoding='utf-8', newline='') as csvfile:
                csv_writer = csv.writer(csvfile) 
                csv_writer.writerow([var_ax[i],ph_id])
        except:
            print(i+1,': extract AX:',var_ax[i])
            with open(outcsvfile, 'a+', encoding='utf-8', newline='') as csvfile:
                csv_writer = csv.writer(csvfile) 
                csv_writer.writerow([var_ax[i],'[missing AX]'])

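I have two questions:

  1. How do I extract all the physical IDs into the same cell, separated by commas (cell B2 = "physical_id1,physical_id2,physical_id3")?
  2. How do I put the number of exported physical IDs in column C (for example, C2 would be 3, because 3 physical IDs were exported to B2)?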
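
As a minimal sketch of the row layout the two questions describe, independent of Selenium: the IDs are joined into one string for column B and their count goes into column C. The helper name `build_row` and the sample values are hypothetical, used only for illustration.

```python
import csv
import io


def build_row(ax, physical_ids):
    """Return [ax, comma-joined IDs, number of IDs] for one CSV row."""
    return [ax, ",".join(physical_ids), len(physical_ids)]


# Hypothetical sample values, only for illustration.
row = build_row("AX1", ["physical_id1", "physical_id2", "physical_id3"])

# Write to an in-memory buffer instead of a file to inspect the result.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue()
print(csv_line)
```

Because the joined string contains commas, `csv.writer` wraps it in quotes, so a spreadsheet should still read it as a single cell.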

Source HTML:

<div alt="51d5gBEzhjL" style="width:220px;float:left;margin-left:34px;margin-bottom:10px;border:1px solid #D0D0D0" class="a-image-wrapper a-lazy-loaded MAIN GLOBAL 51d5gBEzhjL"><h1 class="a-size-medium a-spacing-mini a-spacing-top-mini a-color-information a-text-center a-text-bold">MAIN</h1><h1 class="a-size-base a-spacing-mini a-spacing-top-mini a-color-information a-text-center a-text-bold"> ou GLOBAL / Merch 1</h1></div>
<h1 class="a-size-medium a-spacing-mini a-spacing-top-mini a-color-information a-text-center a-text-bold">FACT</h1>
<h1 class="a-size-base a-spacing-mini a-spacing-top-mini a-color-information a-text-center a-text-bold"> ou GLOBAL / Merch 1</h1>
<span class="a-declarative" data-action="a-modal"><center><img class="ecx" id="51S+wTs36zL" src="https://test.com/images/I/51S+wTs36zL._AA200_.jpg" alt="51S+wTs36zL"></center></span>
<center>
<img class="ecx" id="51S+wTs36zL" src="https://test.com/images/I/51S+wTs36zL._AA200_.jpg" alt="51S+wTs36zL">
</center>
</span>
<h5 class="physical-id">51S+wTs36zL</h5>
<h1 class="a-size-medium a-spacing-mini a-spacing-top-mini a-color-information a-text-center a-text-bold" style="background:#D0D0D0">UPLOADED</h1>
<h1 class="a-size-base a-spacing-mini a-spacing-top-mini a-color-information a-text-center a-text-bold">19/Apr/2016:17:45:40</h1>
</div>

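【Comments】:

  • How about sharing the URL or a representative HTML sample?
  • Added. Not sure what the -1 is for, but anyway.
  • Here's a pro tip: never share images of code (here or anywhere else you have fellow programmers), because it is absolutely useless and very infuriating. That said, remove that image and paste the HTML as plain text. P.S. That downvote was probably for the image of code. See idownvotedbecau.se/imageofcode
  • The -1 was given before I uploaded the image. I have edited the post and uploaded the source to pastebin (it was too large for my post).
  • If your code is too large to post in the question, it is not a minimal representative example. Please see Minimal Reproducible Example so we can help you better.

Tags: python-3.x selenium csv web-scraping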


【Solution 1】:

This worked for me and solved both of my problems:

    if var_source == "Image":
        outcsvfile = 'Image_Ids-' + file + '_' + timestamp + '.csv'
        # Create the CSV file and write the header row once.
        with open(outcsvfile, 'w', encoding='utf-8', newline='') as csvfile:
            csv_writer = csv.writer(csvfile)
            csv_writer.writerow(['ax', 'physical_id', 'image_count'])
        for i in range(len(var_ax)):
            browser.get('https://test.com' + str(mpid) + '&ax=' + var_ax[i])
            self.master.update()
            self.status.config(text=str(i+1) + "/" + str(len(var_ax)) + " Extracting AX: " + var_ax[i])
            try:
                # Note: the find_element_by_* helpers were removed in Selenium 4.3;
                # newer versions use find_element(By.XPATH, ...) instead.
                # This lookup raises if the page has no image wrapper.
                ph_id = browser.find_element_by_xpath("//div[contains(@class, 'a-image-wrapper')]").get_attribute("alt")
                # Collect the text of every element with the class "physical-id".
                ids1 = browser.find_elements_by_class_name("physical-id")
                ids1Text = [a.text for a in ids1]
                nr = str(len(ids1))        # number of physical IDs found
                ax = ', '.join(ids1Text)   # all IDs in a single cell
                print(i+1, ': extract AX:', var_ax[i])
                with open(outcsvfile, 'a+', encoding='utf-8', newline='') as csvfile:
                    csv_writer = csv.writer(csvfile)
                    csv_writer.writerow([var_ax[i], ax, nr])
            except Exception:
                print(i+1, ': extract AX:', var_ax[i])
                with open(outcsvfile, 'a+', encoding='utf-8', newline='') as csvfile:
                    csv_writer = csv.writer(csvfile)
                    # Write a count of 0 so every row has the same three columns.
                    csv_writer.writerow([var_ax[i], '[missing AX]', '0'])
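
The row-building step above can also be pulled out into a small function and checked without a browser. The following is only a sketch under assumptions: `FakeElement`, `ids_row`, and `write_rows` are hypothetical names, `FakeElement` merely imitates the `.text` attribute of a Selenium element, and dropping empty texts is an assumption about how blank elements should be handled.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Stand-in for a Selenium WebElement; only the .text attribute is used."""
    text: str


def ids_row(ax, elements):
    """Build [ax, joined IDs, count], or a placeholder row if nothing was found."""
    # Assumption: elements whose text is empty are skipped.
    texts = [el.text.strip() for el in elements if el.text.strip()]
    if not texts:
        return [ax, "[missing AX]", 0]
    return [ax, ", ".join(texts), len(texts)]


def write_rows(stream, rows):
    """Write the header and all rows to an already opened text stream."""
    writer = csv.writer(stream)
    writer.writerow(["ax", "physical_id", "image_count"])
    writer.writerows(rows)


# Hypothetical scraped results keyed by AX value.
pages = {
    "AX1": [FakeElement("51d5gBEzhjL"), FakeElement("51S+wTs36zL")],
    "AX2": [],
}
rows = [ids_row(ax, elements) for ax, elements in pages.items()]

out = io.StringIO()
write_rows(out, rows)
print(out.getvalue())
```

In the real script, the list returned by `browser.find_elements_by_class_name("physical-id")` would take the place of the `FakeElement` lists, and the file could be opened once rather than reopened for every row.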

【Comments】:
