【Question Title】: How to remove all instances of line breaks and tabs using Javascript
【Posted】: 2022-01-18 05:02:20
【Question Description】:

I'm scraping a website and need to remove every \n and \t from my strings.

I have tried the following code:

item.post_category = [];
Array.from($doc.find('h6.link')).forEach(function (link) {
  console.log(link.textContent.replace(/\t+\n+/gm, ""));
  item.post_category.push(link.textContent);
});
// this removes the line breaks but not the tabs
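The pattern /\t+\n+/ only matches one or more tabs that are immediately followed by one or more newlines. Tabs sitting right before text (such as the ones before "June" or "Family,") are left alone, and so is a newline that is not preceded by a tab. A character class such as [\t\n] matches either character wherever it appears. A minimal sketch comparing the two, using a shortened sample string that is only assumed to have the same shape as the scraped text:

```javascript
// Shortened sample in the same shape as the scraped text (illustrative only).
const sample = "\n\t\tJune 15, 2021 • \n\t\t\n\t\tFamily,\n\t\tIn the News\n\t";

// Original attempt: only removes runs of tabs that are directly followed by newlines,
// so tabs in front of text and newlines after a non-tab character survive.
const attempt = sample.replace(/\t+\n+/gm, "");

// Character class: removes every tab and every newline, wherever it appears.
const cleaned = sample.replace(/[\t\n]+/g, "");

console.log(JSON.stringify(attempt));
console.log(JSON.stringify(cleaned));
```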

Here are several of the example arrays I have to iterate over:

["\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]

["\n\t\t\t\t\tJune 13, 2020 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]

["\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t"]

Ideally, I'd like my arrays to look like this, with the date and every \n and \t removed:

["Family,Gender Equality,In the News"]
["In the News"]
["News"]

【Question Discussion】:

  • replace() returns a new string; it does not modify the original. Also, you could use .map() instead of .forEach() plus pushing into a new array.
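Applying that comment means storing the value that replace() returns, and .map() builds the new array directly. A minimal sketch, assuming the textContent strings have already been collected into a plain array; the cleanTexts name and the shortened sample input are hypothetical. This only removes tabs and newlines; the date is still present and is handled separately in the answer below.

```javascript
// Hypothetical helper: returns a NEW array of cleaned strings.
// String.prototype.replace never mutates its input, so its return value must be used.
function cleanTexts(texts) {
  return texts.map((text) => text.replace(/[\t\n]+/g, "").trim());
}

// Shortened sample of one scraped value (illustrative only).
const raw = ["\n\t\t\t\t\tJune 13, 2020 • \n\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"];
const postCategory = cleanTexts(raw);

console.log(postCategory);
```

In the scraper this could be fed with something like Array.from($doc.find('h6.link'), (link) => link.textContent), assuming $doc is the same cheerio/jQuery-style object used in the question.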

Tags: javascript arrays regex web-scraping


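【Solution 1】:

There are many ways to do this; depending on your needs, you can use a regex, split(), or a combination of both.

Here is one possible solution: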

let str = "\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"

// Remove all newlines and tabs with a regex. Add \r to the character class
// if the text may also contain Windows-style (\r\n) line endings.
str = str.replace(/[\n\t]/g, '');

// Here we assume that your string always contains
// the date followed by this character: •.
// So we split on this character and take the second item
// of the resulting array, which is the text without the date.
let result = str.split('•')[1].trim();

console.log(result); // prints 'Family,Gender Equality,In the News'
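One caveat: when a string contains no •, str.split('•')[1] is undefined and calling .trim() on it throws a TypeError. A minimal sketch of a more defensive variant, still assuming the date is everything before the first •; the extractCategories name and the shortened samples are hypothetical. Instead of deleting the line breaks, it splits on them, trims each piece (trim() also strips tabs), drops the empty pieces, and joins the rest:

```javascript
// Hypothetical helper: strips the leading date and all line breaks/tabs.
function extractCategories(text) {
  // Keep only what follows the first "•"; if there is none, keep the whole text.
  const bullet = text.indexOf("•");
  const afterDate = bullet === -1 ? text : text.slice(bullet + 1);

  return afterDate
    .split(/\r?\n/)                // one piece per line
    .map((line) => line.trim())    // trim() also removes surrounding tabs
    .filter((line) => line !== "") // drop the whitespace-only lines
    .join("");
}

// Shortened samples in the same shape as the scraped text (illustrative only).
const samples = [
  "\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t",
  "\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t",
  "\n\t\tNo date here\n\t",
];

// Wrap each result in its own array to match the desired shape, e.g. ["News"].
console.log(samples.map((s) => [extractCategories(s)]));
```

Trimming line by line keeps the spaces inside a category (as in "Gender Equality") while removing the indentation around it.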

【Discussion】:

  • Wow, I don't know why I didn't think of replaceAll instead of trying a regex. Silly me. Thanks for the quick reply.