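【Question title】: Check if value exists, if not add to list
【Posted】: 2020-08-24 00:57:54
【Question】:

My goal: check whether a value exists. If it does not, append it to the end of column A. If it does, skip the ID and check the next value.

Actual result: the ID is appended to the end of column A whether or not the value already exists, so I end up with duplicates.

I tried using "If", but I get an error.

My code: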

Option Explicit


Sub ExposeID()

Dim browser As Object   'Holds the browser instance in use (Internet Explorer)
Dim knotenAst As Object 'Holds an HTML structure from the browser document
Dim n As Integer
Dim url As String       'Holds the address to read from

Dim ExposeID As String
Dim letztezeile As Integer
Dim nodeList As Object, i As Long

Set browser = CreateObject("internetexplorer.application")
    browser.Visible = False

For n = 0 To 1
    
    url = "https://www.
    browser.navigate url
    Do Until browser.readyState = 4: DoEvents: Loop

    letztezeile = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row

Set nodeList = browser.document.querySelectorAll(".result-list__listing[data-id]")

For i = 0 To nodeList.Length - 1

    ''' HERE IS THE PROBLEM '''
    If nodeList.Item(i).getAttribute("data-id") <> Cells.Range("A:A") Then
       Cells(letztezeile + i + 1, 1) = nodeList.Item(i).getAttribute("data-id")
       
    Else
    
    End If
    
Next i

Next n

Set nodeList = Nothing
browser.Quit
  
End Sub

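【Comments】:

  • Try this: If IsError(Application.Match(nodeList.Item(i).getAttribute("data-id"), Cells.Range("A:A"), 0)) Then.
  • Thanks for your comment. I added the code and tried it. Now I get no error, but the duplicates appear again.
  • Could this be happening across two worksheets? I mean, you used ActiveSheet once, but nothing in the other two places (Cells.Range..., Cells(letztezeile ...).
  • I have corrected that. Everything now happens in the same worksheet, but it still does not work. Dim ws As Worksheet Set ws = Tabelle1 ws.Name = "IDs"

Tags: excel vba for-loop if-statement web-scraping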


【Solution 1】:

I found a way, but there is one problem: it leaves a blank row for every entry that already exists.

Option Explicit

'Version of 08.05.2020
' Code works, but leaves blanks


Sub ExposeID()

Dim browser As Object   'Holds the browser instance in use (Internet Explorer)
Dim knotenAst As Object 'Holds an HTML structure from the browser document
Dim n As Integer
Dim url As String       'Holds the address to read from

Dim ExposeID As String
Dim letztezeile As Integer
Dim nodeList As Object, i As Long

Set browser = CreateObject("internetexplorer.application")
    browser.Visible = False

For n = 0 To 1

    url = "https://www.immobilienscout24.de/Suche/de/niedersachsen/oldenburg-oldenburg/haus-kaufen?pagenumber=" & n + 1
    browser.navigate url
    Do Until browser.readyState = 4: DoEvents: Loop

    letztezeile = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row

Set nodeList = browser.document.querySelectorAll(".result-list__listing[data-id]")

For i = 0 To nodeList.Length - 1
Dim x As Integer

Dim FindString As String
Dim Rng As Range
FindString = nodeList.Item(i).getAttribute("data-id")
If Trim(FindString) <> "" Then
    With Sheets("IDs").Range("A:A") 'searches all of column A
        Set Rng = .Find(What:=FindString, _
                        After:=.Cells(.Cells.Count), _
                        LookIn:=xlValues, _
                        LookAt:=xlWhole, _
                        SearchOrder:=xlByRows, _
                        SearchDirection:=xlNext, _
                        MatchCase:=False)
        If Not Rng Is Nothing Then

        Else
          Cells(letztezeile + i + 1, 1) = nodeList.Item(i).getAttribute("data-id")
        End If
    End With
End If

Next i

Next n

Set nodeList = Nothing
browser.Quit

End Sub
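
The blank rows most likely come from the target row `letztezeile + i + 1`: `i` advances for every node, including the ones that are skipped. A minimal sketch of one way around this, assuming the IDs are purely numeric and live in column A. The helper name `AppendIfMissing` and its parameters are hypothetical and not part of the original code; it recomputes the first free row on every write.

```vba
Option Explicit

' Hypothetical helper (not in the original code): appends idText to column A
' of ws only if it is not already there. The target row is recomputed on every
' call, so skipped IDs leave no blank rows behind.
Sub AppendIfMissing(ByVal ws As Worksheet, ByVal idText As String)
    Dim nextRow As Long

    If Len(Trim$(idText)) = 0 Then Exit Sub

    ' CountIf with a numeric-looking criterion counts cells holding that
    ' number as well as cells holding the same digits as text.
    If Application.WorksheetFunction.CountIf(ws.Columns(1), idText) = 0 Then
        ' On an empty column End(xlUp) lands on row 1, so the first ID goes
        ' into A2, which matches the behaviour of the code above.
        nextRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row + 1
        ws.Cells(nextRow, 1).Value = idText
    End If
End Sub
```

Inside the loop, the whole Find block could then be replaced by a single call such as `AppendIfMissing Sheets("IDs"), nodeList.Item(i).getAttribute("data-id")`, which also keeps every read and write on the same worksheet.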

【Comments】:

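    【Solution 2】:

    Values recognized as text

    You can use the Match function to compare the values, as suggested in the comments:

    If IsError(Application.Match(nodeList.Item(i).getAttribute("data-id"), Cells.Range("A:A"), 0)) Then
    

    The problem is that the scraped data is read as text, but when it is written to the worksheet it is converted to a number, so the text lookup value never matches the stored numbers. You can use the Val function to convert the text to a number. See the key line in the code: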

    Option Explicit
    
    Sub ExposeID()
    
        Dim browser As Object   'Holds the browser instance in use (Internet Explorer)
        Dim knotenAst As Object 'Holds an HTML structure from the browser document
        Dim n As Integer
        Dim url As String       'Holds the address to read from
    
        Dim ExposeID As String
        Dim letztezeile As Integer
        Dim nodeList As Object, i As Long
    
        Set browser = CreateObject("internetexplorer.application")
            browser.Visible = False
    
        For n = 0 To 1
    
            url = "https://www.immobilienscout24.de/Suche/de/niedersachsen/" _
              & "oldenburg-oldenburg/haus-kaufen?pagenumber=" & n + 1
            browser.navigate url
            Do Until browser.readyState = 4: DoEvents: Loop
    
            letztezeile = Cells(Rows.Count, 1).End(xlUp).Row
    
            Set nodeList = browser.document.querySelectorAll( _
              ".result-list__listing[data-id]")
    
            For i = 0 To nodeList.Length - 1
    
                ' Val converts the text ID to a number so it can match the numeric cells
                If IsError(Application.Match(Val(nodeList.Item(i) _
                  .getAttribute("data-id")), Cells.Range("A:A"), 0)) Then
                    Cells(letztezeile + i + 1, 1).Value = nodeList.Item(i) _
                      .getAttribute("data-id")
                Else
    
                End If
    
            Next i
    
        Next n
    
        Set nodeList = Nothing
        browser.Quit
    
    End Sub
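
The text-versus-number mismatch can be reproduced in isolation. The sketch below is only an illustration: the ID `"123456"` is a made-up value, and a scratch workbook is created (an assumption, so that no real data is touched) and closed again without saving.

```vba
Option Explicit

' Illustration only: shows why a text lookup value misses a numeric cell.
Sub MatchTextVersusNumber()
    Dim wb As Workbook
    Dim ws As Worksheet
    Dim idText As String

    Set wb = Workbooks.Add              ' scratch workbook, General cell format
    Set ws = wb.Worksheets(1)
    idText = "123456"                   ' hypothetical ID, text as getAttribute returns it

    ws.Range("A1").Value = idText       ' Excel stores this as the number 123456

    ' Text lookup value against a numeric cell: Match returns an error value.
    Debug.Print IsError(Application.Match(idText, ws.Range("A:A"), 0))       ' True
    ' Numeric lookup value: Match finds row 1.
    Debug.Print IsError(Application.Match(Val(idText), ws.Range("A:A"), 0))  ' False

    wb.Close SaveChanges:=False
End Sub
```

Run from the VBA editor, the two results appear in the Immediate window.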
    

    I think this Internet Explorer approach is too slow, so you could ask another question about how to solve this with XHR (XML HTTP request).
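
For orientation only, a rough sketch of what an XHR-based variant might look like. Everything here is an assumption rather than a tested solution for this site: the function names `FetchHtml` and `ExtractDataIds` are hypothetical, the page may not contain the listings in its static HTML or may reject scripted requests, and the regular expression picks up every `data-id="..."` attribute instead of only those on `.result-list__listing` elements.

```vba
Option Explicit

' Hypothetical helper: downloads the raw HTML of pageUrl, or "" on failure.
Function FetchHtml(ByVal pageUrl As String) As String
    Dim http As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", pageUrl, False     ' False = synchronous request
    http.send

    If http.Status = 200 Then FetchHtml = http.responseText
End Function

' Hypothetical helper: collects the digits of every data-id="..." attribute.
Function ExtractDataIds(ByVal html As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim ids As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                    ' find all matches, not just the first
    re.Pattern = "data-id=""(\d+)"""

    For Each m In re.Execute(html)
        ids.Add m.SubMatches(0)         ' captured digits, still as text
    Next m

    Set ExtractDataIds = ids
End Function
```

A call such as `ExtractDataIds(FetchHtml(url))` would then stand in for the `querySelectorAll` step, with the existence check on column A left unchanged.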

    【Comments】:
