【Posted】: 2022-01-18 05:02:20
【Question】:
I'm scraping a website and need to remove all \n and \t characters from my strings.
I have tried the following code:
item.post_category = [];
Array.from($doc.find('h6.link')).forEach(function(link){
    console.log(link.textContent.replace(/\t+\n+/gm, ""));
    item.post_category.push(link.textContent);
});
//this removes the linebreaks but not the tabs
Here are several example arrays that I have to iterate over:
["\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]
["\n\t\t\t\t\tJune 13, 2020 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]
["\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t"]
Ideally, I'd like my arrays to look like this, with the date and all \n and \t characters removed:
["Family,Gender Equality,In the News"]
["In the News"]
["News"]
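One way to get from the sample strings to the desired output is sketched below. It is only a sketch under assumptions: the helper name cleanCategories is hypothetical, and the rule "everything up to the first • is the date" is inferred from the three samples above, not from the site's markup.

```javascript
// Hypothetical helper. Assumes the date, when present, is followed by a "•"
// and that categories are separated by commas, as in the sample strings.
function cleanCategories(text) {
  // Drop everything up to and including the first "•" (the date part).
  const bullet = text.indexOf("•");
  const rest = bullet === -1 ? text : text.slice(bullet + 1);

  return rest
    .split(",")                                       // one piece per category
    .map((part) => part.replace(/\s+/g, " ").trim())  // collapse \n, \t and spaces
    .filter((part) => part.length > 0)                // skip empty pieces
    .join(",");
}

const withCategories =
  "\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n" +
  "\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t";
const dateOnly = "\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t";

console.log(cleanCategories(withCategories)); // Family,Gender Equality,In the News
console.log(cleanCategories(dateOnly));       // News
```

Splitting on commas only after the date has been cut off matters here, because the date itself ("June 15, 2021") contains a comma.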
【Comments】:
- replace() returns a new string; it does not modify the original. Also, you can use .map() instead of .forEach() with a push into a new array.
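The point made in this comment can be illustrated with a small sketch. The links array below is a stand-in for the elements returned by $doc.find('h6.link'), and postCategory is a hypothetical variable name. The character class [\t\n] is also a substitution for the question's pattern: \t+\n+ only matches a run of tabs that is immediately followed by a run of newlines, whereas [\t\n]+ matches any mix of the two.

```javascript
// Stand-in for the scraped elements; only textContent is used here.
const links = [
  { textContent: "\n\t\t\t\t\tNews\n\t\t\t\t" },
  { textContent: "\n\t\tFamily,\n\t\t\tIn the News\n" },
];

const raw = links[0].textContent;
raw.replace(/[\t\n]+/g, "");                  // return value discarded, raw is unchanged
const cleaned = raw.replace(/[\t\n]+/g, "");  // keep the returned string instead

// map() builds the new array directly, so no push() is needed.
const postCategory = links.map((link) =>
  link.textContent.replace(/[\t\n]+/g, "")
);

console.log(raw === cleaned); // false
console.log(postCategory);    // contains "News" and "Family,In the News"
```

In the question's loop, the equivalent fix would be to push the value returned by replace() rather than link.textContent itself.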
Tags: javascript arrays regex web-scraping