【Question Title】: Parse ATOM rss feed and remove html tags
【Posted】: 2020-07-24 07:44:04
【Question】:

I am developing this code in PowerShell. I need to be able to strip out the HTML tags.

    # Download the feed and load it as XML
    Invoke-WebRequest -Uri 'https://psu.box.com/shared/static/jf36ohodxnw7oemghsau1t7qb0w4y708.rss' -OutFile C:\users\anr2809\Documents\alerts.txt
    [xml]$Content = Get-Content C:\users\anr2809\Documents\alerts.txt -Raw
    $Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'

    # Match against the serialized XML text rather than the [xml] object
    If ($Content.OuterXml -match $Regex) {
        "Description is '$($Matches['Description'])'"
        # do something here with $Matches['Description']
    }
    Else {
        "No match."
    }

    $Feed = $Content.rss.channel
    ForEach ($msg in $Feed.Item) {
        $ParseData = $msg.description
        ForEach ($Datum in $ParseData) {
            # Note: -like without wildcards only matches the exact string
            If ($Datum -like "Title") { [int]$Upvote = ($Datum).split(' ') | Select-Object -First 1 }
            If ($Datum -like "comments") { [int]$Downvote = ($Datum).split(' ') | Select-Object -First 1 }
        }#EndForEach
        [PSCustomObject]@{
            'LastUpdated' = [datetime]$msg.pubDate
            'Title'       = $msg.title
            'Category'    = $msg.category
            'Author'      = $msg.author
            'Link'        = $msg.link
            'UpVotes'     = $Upvote
            'DownVotes'   = $Downvote
            'Validations' = $Validation
            'WorkArounds' = $Workaround
            'Comments'    = $msg.description.InnerText   # still contains the HTML tags
            'FeedbackID'  = $FeedBackID
        }#EndPSCustomObject
    }

This is the result; I want to remove the HTML tags.

LastUpdated : 3/30/2020 9:45:52 AM
Title       : Enterprise Network Planned Outage
Category    : 
Author      : 
Link        : link
UpVotes     : 
DownVotes   : 
Validations : 
WorkArounds : 
Comments    : 
                    <p><strong>People and Locations Impacted:</strong><br />All    students, faculty, and staff at all State locations<br /><br />
FeedbackID  : 

【Comments】:

    Tags: powershell powershell-4.0


    【Solution 1】:

    You can replace each <br /> with an actual line break and then strip the remaining tags entirely:

    $commentsPlain = $msg.description.InnerText -replace '<br ?/?>',[System.Environment]::NewLine -replace '<[^>]+>'
    
    [PSCustomObject]@{
        'LastUpdated' = [datetime]$msg.pubDate
        'Title' = $msg.title
        'Category' = $msg.category
        'Author' = $msg.author
        'Link' = $msg.link
        'UpVotes' = $Upvote
        'DownVotes' = $Downvote
        'Validations' = $Validation
        'WorkArounds' = $Workaround
        'Comments' = $commentsPlain
        'FeedbackID' = $FeedBackID
    }
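
    The same idea can be tried in isolation on a sample string. The following is only a sketch: the function name ConvertTo-PlainText and the sample markup are made up for illustration, and the extra call to [System.Net.WebUtility]::HtmlDecode is an addition (not part of the answer above) for the case where the feed text also contains entities such as &amp;.

```powershell
# Hypothetical helper name; not part of the original answer.
function ConvertTo-PlainText {
    param([string]$Html)

    # Turn <br>, <br/> and <br /> into line breaks, then drop every remaining tag.
    $text = $Html -replace '<br\s*/?>', [System.Environment]::NewLine -replace '<[^>]+>'

    # Decode entities (for example &amp;) left behind by the tag stripping,
    # and trim surrounding whitespace.
    [System.Net.WebUtility]::HtmlDecode($text).Trim()
}

# Sample markup modelled on the output shown in the question.
$sample = '<p><strong>People and Locations Impacted:</strong><br />All students, faculty, and staff</p>'
ConvertTo-PlainText -Html $sample
```

    Regex stripping like this is fine for simple feed descriptions, but it is not a real HTML parser and may misbehave on markup that contains a literal > inside an attribute value.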
    


      【Solution 2】:

      You should be able to use the following script. It makes use of the HTMLFile COM object.

        Invoke-WebRequest -Uri 'https://*.rss' -OutFile C:\*.rss
        [xml]$Content = Get-Content C:\*.rss -Raw
        $Regex = '(?s)SE1046.*?Description := "(?<Description>.*?)"'

        # Match against the serialized XML text rather than the [xml] object
        If ($Content.OuterXml -match $Regex) {
            "Description is '$($Matches['Description'])'"
            # do something here with $Matches['Description']
        }
        Else {
            "No match."
        }

        $Feed = $Content.rss.channel
        ForEach ($msg in $Feed.Item) {
            $ParseData = $msg.description
            ForEach ($Datum in $ParseData) {
                If ($Datum -like "Title") { [int]$Upvote = ($Datum).split(' ') | Select-Object -First 1 }
                If ($Datum -like "comments") { [int]$Downvote = ($Datum).split(' ') | Select-Object -First 1 }
            }#EndForEach

            # Load the description markup into an HTML document so its text can be read
            $HTML = New-Object -ComObject "HTMLFile"
            $HTML.IHTMLDocument2_write($ParseData.InnerText)

            [PSCustomObject]@{
                'LastUpdated' = [datetime]$msg.pubDate
                'Title'       = $msg.title
                'Category'    = $msg.category
                'Author'      = $msg.author
                'Link'        = $msg.link
                'UpVotes'     = $Upvote
                'DownVotes'   = $Downvote
                'Validations' = $Validation
                'WorkArounds' = $Workaround
                # Text of every <p> element, without the tags
                'Comments'    = ($HTML.all.tags("p") | ForEach-Object InnerText)
                'FeedbackID'  = $FeedBackID
            }#EndPSCustomObject
        }
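
      The COM-based step can also be wrapped in a small function and tried on its own. This is a sketch under assumptions, not a definitive implementation: the name Get-HtmlInnerText is hypothetical, the approach is Windows-only, and the fallback to the plain write method reflects a commonly reported situation in which IHTMLDocument2_write is not exposed on a given machine. Reading body.innerText instead of only the <p> elements is a design choice made here so that text outside paragraphs is kept as well.

```powershell
# Hypothetical helper; Windows-only because it relies on the HTMLFile COM object.
function Get-HtmlInnerText {
    param([string]$Html)

    $doc = New-Object -ComObject 'HTMLFile'
    try {
        # Assumption: this method is exposed on some machines only.
        $doc.IHTMLDocument2_write($Html)
    }
    catch {
        # Fallback: the plain write method, fed the markup as Unicode bytes.
        $doc.write([System.Text.Encoding]::Unicode.GetBytes($Html))
    }

    # Text content of the whole document body, without tags.
    $doc.body.innerText
}

# Only attempt the call where COM is available.
if ($env:OS -eq 'Windows_NT') {
    Get-HtmlInnerText -Html '<p><strong>People and Locations Impacted:</strong><br />All students</p>'
}
```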
      
