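【Posted】: 2017-09-21 19:37:35
【Question】:
I have a problem that I think calls for a regular expression, but if there is another way to solve it, I'd love to hear it.
My problem is that I am scraping a product's description from a website. To do that, I am currently using something like: description = $('.description').html();. This gets me everything I need from the site, in this case Destiny 2 at Best Buy. The result is what is in the snippet.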
<div id="synopsis">From the makers of the acclaimed hit game Destiny, comes the much-anticipated sequel. An action shooter that takes you on an epic journey across the solar system.<br><br>Humanity’s last safe city has fallen to an overwhelming invasion force, led by Ghaul, the imposing commander of the brutal Red Legion. He has stripped the city’s Guardians of their power, and forced the survivors to flee. You will venture to mysterious, unexplored worlds of our solar system to discover an arsenal of weapons and devastating new combat abilities. To defeat the Red Legion and confront Ghaul, you must reunite humanity’s scattered heroes, stand together, and fight back to reclaim our home.</div>
<div id="features"><div class="icon-feature-list"></div><div class="feature"><span class="type-paragraph-title">Includes: Destiny 2 Base Game</span><p></p></div><div class="feature"><span class="type-paragraph-title">Gameplay Features:</span><p>- Rich cinematic story campaign.</p></div><div class="feature"><p>- Multiple cooperative game modes for epic, social fun.</p></div><div class="feature"><p>- Intense 4v4 competitive multiplayer matches, including 5 different PVP modes.</p></div><div class="feature"><p>- Expansive, never-before-seen worlds and spaces to explore.</p></div><div class="feature"><p>- Customize your character’s weapons and armor with an all-new array of gear.</p></div><div class="feature"><p>- Discover Lost Sectors, complete new Adventure missions, or rally to Public Events with other Guardians.</p></div><div class="feature"><p>- Introducing a brand new Guided Games system that helps players find like-minded groups to experience Destiny 2’s most challenging activities, like the Raid.</p></div></div>
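Before displaying the result, I need to strip all tags and selectors and replace them with <p> elements, except for <li> and <ul> elements, so that nothing interferes with my page when I redisplay the content, but the content is still there and each piece starts on a new line. So in this case <div id="synopsis">This is text inside</div> would become <p>This is text inside</p>.
If possible, I would also like to remove all attributes from the <ul> and <li> tags while keeping the tags themselves.
Hopefully this makes sense. I appreciate any help anyone can give me, and if there is another solution I haven't thought of, I'd love to hear it.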
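A minimal sketch of the transformation described above, assuming the markup is simple enough that no attribute value contains a literal ">". The function name normalizeDescription is hypothetical and not from the original post. It splits the string into tags and text, wraps each non-empty text chunk in <p>, keeps <ul> and <li> without their attributes, and drops every other tag:

```javascript
// Hypothetical helper (the name is an assumption for illustration).
// Assumes no attribute value contains a literal ">".
function normalizeDescription(html) {
  const keep = new Set(["ul", "li"]);
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes hold text, odd indexes hold tags.
  const tokens = html.split(/(<[^>]+>)/);
  let out = "";
  tokens.forEach((token, i) => {
    if (i % 2 === 1) {
      const m = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/.exec(token);
      if (m && keep.has(m[2].toLowerCase())) {
        // Re-emit <ul>/<li> (or the closing form) with no attributes.
        out += `<${m[1]}${m[2].toLowerCase()}>`;
      }
      // Every other tag is dropped.
    } else if (token.trim() !== "") {
      out += `<p>${token.trim()}</p>`;
    }
  });
  return out;
}

console.log(normalizeDescription('<div id="synopsis">This is text inside</div>'));
// → <p>This is text inside</p>
console.log(normalizeDescription('<ul class="a"><li data-recommendations="1">One</li></ul>'));
// → <ul><li><p>One</p></li></ul>
```

One consequence of this design is that inline tags such as <span> in the middle of a sentence would split that sentence into two paragraphs. Since the page already loads jQuery, walking the parsed DOM instead of the string is likely more robust for messier markup.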
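【Comments】:
-
stringOfHTML.replace(/<\/?[^>]+(>|$)/g, "<p>") -
Only works for
.
-
Try this
str.replace(/((id=)".+")|(class=".*")/,""); -
For some reason, Sayam, this took out most of the data except for the last li element. I also need to remove more than just class and id. Many sites have attributes such as "aria-label" or "data-recommendations" that I don't want showing up on my site. Basically anything in  needs to be replaced with a p tag.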
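A likely reason the suggested pattern removed so much: ".+" and ".*" are greedy, so on a single line the match runs from the first id= to the last double quote, and without the g flag only that one match is replaced. The sketch below reproduces the effect on a made-up sample string and contrasts it with a pattern that strips every attribute from <ul> and <li> opening tags only. The sample markup and the name stripListAttributes are assumptions for illustration, and the approach again assumes no attribute value contains ">":

```javascript
// Made-up sample markup, not taken from the scraped page.
const sample =
  '<ul id="list" class="x"><li aria-label="first">One</li><li class="y">Two</li></ul>';

// The pattern from the comment above: greedy, and no "g" flag.
const greedy = sample.replace(/((id=)".+")|(class=".*")/, "");
console.log(greedy);
// → <ul >Two</li></ul>

// Hypothetical helper: drop all attributes from <ul>/<li> opening tags.
// [^>]* stops at the end of the tag, so the match cannot spill over
// into the following content; \b keeps <link> from matching as <li>.
function stripListAttributes(html) {
  return html.replace(/<(ul|li)\b[^>]*>/gi, (match, name) => `<${name.toLowerCase()}>`);
}

console.log(stripListAttributes(sample));
// → <ul><li>One</li><li>Two</li></ul>
```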
Tags: javascript jquery html regex parsing