【Question title】: Check valid URLs for Wistia
【Posted】: 2019-09-22 04:14:17
【Question】:

I found some code and converted it into a UDF that checks whether a Wistia URL is valid.

Sub Test()
MsgBox CheckValidURL("https://fast.wistia.net/embed/iframe/vud7ff4i6w")
End Sub

Function CheckValidURL(sURL As String) As Boolean
Dim oXMLHTTP        As Object
Dim sResponseText   As String
Dim aScriptParts    As Variant

Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.Send

sResponseText = oXMLHTTP.responseText
aScriptParts = Split(sResponseText, "<script", , vbTextCompare)
If UBound(aScriptParts) > 0 Then CheckValidURL = True
End Function

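I tested the UDF with several links and got the correct results, but I'm not sure the UDF itself is correct. Can you give me advice or improve the UDF? Thanks in advance.

【Comments】:

  • Since there are no problems or errors in your code, try posting this on the Code Review site.
  • Thanks for the suggestion. Actually, I'm looking for alternative solutions.
  • What determines whether a URL is valid? A 200 response code, as shown below? Please provide a valid and an invalid URL.
  • This one is valid: 'fast.wistia.net/embed/iframe/vud7ff4i6w'. You can append a few letters to the end of the URL to get an invalid link.
  • Provided the page implementation stays the same, there is nothing wrong with your approach.

Tags: excel vba web-scraping xmlhttprequest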


【Solution 1】:

Instead of

oXMLHTTP.responseText

you can use

oXMLHTTP.Status = 200

Here is the list of xmlHttp statuses:

https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms767625(v%3Dvs.85)

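【Comments】:

  • Thanks a lot. I tried two links, one valid and one invalid, and both return True!
  • In this case the "invalid URL" still returns 200, just with a "not found" message.
  • OK, if "not found" means invalid, you can use
  • Can you show me how to implement this 'not found' check?
  • If oXMLHTTP.Status = 200 And sResponseText <> "not found" Then CheckValidURL = True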
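
The check suggested in the comments above could be sketched as follows. This is a minimal sketch, assuming (as the comments report) that an invalid Wistia URL still answers with status 200 and a body containing a "not found" message. The exact wording of that message is an assumption, and the helper name BodySaysNotFound is hypothetical.

```vba
Option Explicit

' Hypothetical helper: True when the body contains the "not found" marker.
' The marker text is an assumption; adjust it to what the server actually returns.
Public Function BodySaysNotFound(ByVal body As String) As Boolean
    BodySaysNotFound = InStr(1, body, "not found", vbTextCompare) > 0
End Function

Public Function CheckValidURL(ByVal sURL As String) As Boolean
    Dim oXMLHTTP As Object

    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    oXMLHTTP.Open "GET", sURL, False
    oXMLHTTP.send

    ' Valid only if the request succeeded and the body is not the "not found" message.
    If oXMLHTTP.Status = 200 Then
        CheckValidURL = Not BodySaysNotFound(oXMLHTTP.responseText)
    End If
End Function
```

A substring match is used rather than comparing the whole body, since the response may carry extra whitespace or markup around the message. The trade-off is that a valid page which happens to contain the same phrase would be rejected.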
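【Solution 2】:

You can make this more efficient by creating the xhr object in the sub and passing it to the function, then looking only at the link response header to tell valid and invalid URLs apart.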

Option Explicit
Public Sub Test()
    Dim urls(), i As Long, xhr As Object
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    urls = Array("https://fast.wistia.net/embed/iframe/vud7ff4i6wyh", "https://fast.wistia.net/embed/iframe/vud7ff4i6w")
    For i = LBound(urls) To UBound(urls)
        MsgBox CheckValidURL(urls(i), xhr)
    Next
End Sub

Public Function CheckValidURL(ByVal url As String, ByVal xhr As Object) As Boolean
    With xhr
        .Open "GET", url, False
        .send
        CheckValidURL = Not .getResponseHeader("link") = vbNullString
    End With
End Function

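Alternatives:

Inside the function, test for an id or string that is present only for valid links (as in your approach). This version uses HTMLDocument, which requires a reference to Microsoft HTML Object Library (VBE > Tools > References).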

Public Sub Test()
    Dim urls(), i As Long, html As HTMLDocument, xhr As Object
    Set xhr = CreateObject("MSXML2.XMLHTTP"): Set html = New HTMLDocument
    urls = Array("https://fast.wistia.net/embed/iframe/vud7ff4i6wyh", "https://fast.wistia.net/embed/iframe/vud7ff4i6w")
    For i = LBound(urls) To UBound(urls)
        MsgBox CheckValidURL(urls(i), xhr, html)
    Next
End Sub

Public Function CheckValidURL(ByVal sURL As String, ByVal xhr As Object, ByVal html As HTMLDocument) As Boolean
    With xhr
        .Open "GET", sURL, False
        .send
        html.body.innerHTML = .responseText
    End With
    CheckValidURL = html.querySelectorAll("#wistia_video").Length > 0
End Function

Using InStr also works:

Option Explicit
Public Sub Test()
    Dim urls(), i As Long, xhr As Object
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    urls = Array("https://fast.wistia.net/embed/iframe/vud7ff4i6wyh", "https://fast.wistia.net/embed/iframe/vud7ff4i6w")
    For i = LBound(urls) To UBound(urls)
        MsgBox CheckValidURL(urls(i), xhr)
    Next
End Sub

Public Function CheckValidURL(ByVal sURL As String, ByVal xhr As Object) As Boolean
    With xhr
        .Open "GET", sURL, False
        .send
        CheckValidURL = InStr(.responseText, "html") > 0
    End With     
End Function

A rewrite of yours:

Option Explicit
Public Sub Test()
    Dim urls(), i As Long, xhr As Object
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    urls = Array("https://fast.wistia.net/embed/iframe/vud7ff4i6wyh", "https://fast.wistia.net/embed/iframe/vud7ff4i6w")
    For i = LBound(urls) To UBound(urls)
        MsgBox CheckValidURL(urls(i), xhr)
    Next
End Sub

Public Function CheckValidURL(ByVal sURL As String, ByVal xhr As Object) As Boolean
    With xhr
        .Open "GET", sURL, False
        .send
        CheckValidURL = UBound(Split(.responseText, "<script", , vbTextCompare)) > 0
    End With
End Function

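【Comments】:

  • As for the first xhr solution, I got False for both, and when testing ?.getResponseHeader("Content-Encoding") I got br rather than identity.
  • I would expect the valid one to be identity. Could you paste the results of .getAllResponseHeaders, indicating which should pass and which should fail?
  • Also, try: CheckValidURL = Not .getResponseHeader("link") = vbNullString
  • Yes, it works now with "link". Thanks a lot for your help.
  • This way is more efficient, since you are only dealing with the headers.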
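
To compare what the two URLs actually return, as asked for in the comments above, the raw headers can be printed and checked for a given name. This is a rough sketch: the helper HasHeader and the sub DumpHeaders are hypothetical names, and it assumes the header block returned by getAllResponseHeaders uses CRLF-separated "Name: value" lines.

```vba
Option Explicit

' Hypothetical helper: True when a raw header block contains a header
' whose name matches headerName (case-insensitive).
Public Function HasHeader(ByVal allHeaders As String, ByVal headerName As String) As Boolean
    Dim headerLines As Variant, i As Long

    headerLines = Split(allHeaders, vbCrLf)
    For i = LBound(headerLines) To UBound(headerLines)
        ' Compare the start of each line against "name:".
        If StrComp(Left$(headerLines(i), Len(headerName) + 1), headerName & ":", vbTextCompare) = 0 Then
            HasHeader = True
            Exit Function
        End If
    Next i
End Function

Public Sub DumpHeaders()
    Dim urls As Variant, i As Long, xhr As Object

    Set xhr = CreateObject("MSXML2.XMLHTTP")
    urls = Array("https://fast.wistia.net/embed/iframe/vud7ff4i6wyh", _
                 "https://fast.wistia.net/embed/iframe/vud7ff4i6w")

    For i = LBound(urls) To UBound(urls)
        With xhr
            .Open "GET", urls(i), False
            .send
            ' Print status, the full header block, and whether "link" is present.
            Debug.Print urls(i); " -> status "; .Status
            Debug.Print .getAllResponseHeaders
            Debug.Print "has link header: "; HasHeader(.getAllResponseHeaders, "link")
        End With
    Next i
End Sub
```

The output appears in the Immediate window (Ctrl+G in the VBE), where the header sets of the valid and invalid URL can be compared side by side.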