【Question title】: How to remove elements of a list that contain ASCII characters?
【Posted】: 2019-07-13 22:03:39
【Question description】:

I am looking for a way to remove elements that start with or contain ASCII characters.

I have a list in which some elements, when converted to int, have the following codes:

>   utf8ToInt(splitted[[1]][171]) ### "     1" 
 [1] 57412    32 57412    32 57412    32 57412    32 57412    32    49
>   utf8ToInt(splitted[[1]][181]) ### "     3"  
 [1] 57412    32 57412    32 57412    32 57412    32 57412    32    51

I see a pattern: each of these elements starts with the integer 57412, and I want to remove these elements from the list.

This is the actual list:

splitted <- c("Tv Samsung 49\" Full Hd Wifi Un49j5290 sm...", "S/ 1,699.00 - 35%", 
"S/ 1,099.00", "TV SAMSUNG LED 23.5 INCH CURVED.", "S/ 729.00", 
"Televisor Samsung UN50MU6103 Uhd 4k 50\"...", "S/ 1,799.00 - 27%", 
"S/ 1,299.00", "SAMSUNG SMART TV UHD 4K 50\" 50RU7100 mod...", 
"S/ 1,999.00 - 36%", "S/ 1,274.90", "Samsung SMART TV QLED UHD 55'' QN55Q7FAM...", 
"S/ 4,999.00 - 6%", "S/ 4,679.00", "Televisor Samsung Led Smart Super UHD 4K...", 
"S/ 3,499.00 - 28%", "S/ 2,497.00", "Televisor LED 50<U+2033> UHD 4K Smart TV Samsun...", 
"S/ 1,999.00 - 35%", "S/ 1,299.00", "Televisor Samsung LED Smart TV UHD 4K 55...", 
"S/ 1,899.00 - 16%", "S/ 1,582.00", "Samsung Smart Tv led Ultra HD 4K 65\" UN6...", 
"S/ 2,499.00 - 4%", "S/ 2,398.99", "SAMSUNG SMART TV UHD 43'' 43NU7090 +Sou...", 
"S/ 1,299.00 - 23%", "S/ 999.00", "Televisor 55' 4K UHD SMART TV UN55NU8500...", 
"S/ 4,999.00", "SAMSUNG SMART TV UHD 43'' 43NU7090-Negro", "S/ 1,549.00 - 32%", 
"S/ 1,049.00", "Televisor Samsung 49\" UHD 4K Curvo Smart...", 
"S/ 2,499.00 - 32%", "S/ 1,699.00", "Televisor Samsung Led Smart UHD 4K Curvo...", 
"S/ 5,999.00", "Smart Tv Samsung 40\" UHD 4K UN40MU6103GX...", 
"S/ 2,199.00 - 45%", "S/ 1,199.00", "Televisor Led Samsung Smart TV UHD 4K Cu...", 
"S/ 2,199.00 - 18%", "S/ 1,799.00", "55\" NU7090 UHD Plano Smart TV 4K 2018", 
"S/ 1,999.00 - 12%", "S/ 1,749.00", "NUEVO. Samsung Smart TV UHD 50\" 50RU7100", 
"S/ 1,999.00 - 30%", "S/ 1,398.90", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1", 
"Tv Led Samsung 49 Full Hd Wifi Un49j5290...", "S/ 1,599.00 - 22%", 
"S/ 1,239.00", "Samsung Smart Tv led Ultra HD 4K 65\" UN6...", 
"S/ 3,499.00 - 34%", "S/ 2,299.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1", 
"Samsung - Televisor LED Smart TV UHD 4K...", "S/ 2,499.00 - 28%", 
"S/ 1,799.00", "Samsung - Televisor LED Smart Tv Ultra H...", 
"S/ 2,499.00 - 24%", "S/ 1,899.00", "Televisor Samsung 50\" Smart FHD 4K 50NU7...", 
"S/ 1,999.00 - 25%", "S/ 1,497.00", "Televisor Curvo Smart TV Samsung 65MU650...", 
"S/ 8,599.00", "Tv Samsung 50” Smart UHD 50NU7090 – Negr...", 
"S/ 1,999.00 - 35%", "S/ 1,297.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1", 
"Tv Led Samsung 55'' Curvo 4K 55NU7300 +...", "S/ 2,999.00 - 23%", 
"S/ 2,299.00", "Samsung - Televisor Smart Tv FHD 40\" 40J...", 
"S/ 1,599.00 - 38%", "S/ 979.00", "Televisor Samsung 4K Smart TV - UN55NU7...", 
"S/ 2,499.00 - 36%", "S/ 1,599.00", "Tv Led Samsung 50 4k Smart Tv 50nu7100 U...", 
"S/ 2,499.00 - 46%", "S/ 1,349.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 3", 
"Samsung - Televisor Smart Tv Ultra HD 4k...", "S/ 3,299.00 - 15%", 
"S/ 2,789.00", "SAMSUNG SMART TV UHD 43'' 43NU7090 +Sou...", 
"S/ 1,600.00 - 26%", "S/ 1,179.00", "Tv Led Samsung 40'' 4k Smart Tv 40MU6103...", 
"S/ 1,299.00 - 3%", "S/ 1,249.00", "SMART TV SAMSUNG 55 UHD 4K UN55MU6105GXP...", 
"S/ 2,599.00 - 24%", "S/ 1,969.00", "Tv Led Smart Samsung UN49J5290AG 49\" Ful...", 
"S/ 1,499.00 - 31%", "S/ 1,029.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 3", 
"Samsung Smart Tv led Ultra HD 4K 65\" UN6...", "S/ 3,499.00 - 30%", 
"S/ 2,439.00", "SAMSUNG - SMART TV UN50MU6103G UHD 4K 50...", 
"S/ 1,890.00 - 10%", "S/ 1,690.00", "Televisor Samsung 49\" 4k UHD Curvo Smart...", 
"S/ 1,899.00 - 21%", "S/ 1,489.00", "SAMSUNG SMART TV UHD 43'' 43NU7090-Negro...", 
"S/ 1,399.00 - 28%", "S/ 999.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1", 
"SAMSUNG TELEVISOR HOTELERO DE 32\" HG32NE...", "S/ 899.00 - 26%", 
"S/ 659.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1", 
"Samsung - Televisor LED Smart Tv FHD 43\"...", "S/ 1,299.00 - 19%", 
"S/ 1,049.00", "Smart TV Curved Samsung 55\" UHD 4K 55MU6...", 
"S/ 2,999.00 - 20%", "S/ 2,389.00", "Tv Samsung 65\" 4k Smart Tv 65NU7090 Ultr...", 
"S/ 3,499.00 - 31%", "S/ 2,399.00", "Samsung - Smart Tv UHD 4K 50\" 50NU7090 +...", 
"S/ 1,799.00 - 5%", "S/ 1,699.00", "Led Samsung Smart Tv 50\" Ultra HD 4K 50R...", 
"S/ 1,599.00 - 14%", "S/ 1,369.00", "Televisor Samsung Smart 49\" 49J5290", 
"S/ 1,799.00 - 31%", "S/ 1,229.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 2", 
"Samsung - Televisor Smart Tv UHD 4K 58\"...", "S/ 1,999.00 - 15%", 
"S/ 1,699.00", "TV Samsung 32\" HD Smart Plano J4290 Negr...", 
"S/ 999.00 - 32%", "S/ 679.00", "TV LED SMART Samsung - 43NU7100 - 4K UHD...", 
"S/ 1,899.00 - 32%", "S/ 1,289.00", "Samsung - Televisor 40\" Smart Tv Full HD...", 
"S/ 4,499.00 - 55%", "S/ 2,019.00", "Televisor 43\" FHD SMART TV UN43J5202AGXP...", 
"S/ 1,399.00", "Televisor 50<U+2033> UHD 4K Smart TV Samsung 50...", 
"S/ 1,999.00 - 32%", "S/ 1,359.00", "Televisor Curvo Smart TV Samsung 55RU730...", 
"S/ 2,699.00", "TELEVISOR SAMSUNG 40 FULL HD,SMART UN40J...", 
"S/ 1,200.00 - 22%", "S/ 929.00", "Smart Tv Samsung 40\" FULL HD UN40J5290", 
"S/ 1,399.00 - 34%", "S/ 919.00", "Smart TV Samsung 32\" HD UN32J4300D Flat...", 
"S/ 1,299.00 - 34%", "S/ 849.00", "TV Smart Samsung UN49J5290AG 49\" Full HD...", 
"S/ 1,899.00 - 44%", "S/ 1,049.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1", 
"Samsung - Televisor Smart Tv UHD 4K 55\"...", "S/ 2,499.00 - 36%", 
"S/ 1,599.00", "Tv led Smart Samsung 58\" UN58NU7100G Ult...", 
"S/ 2,499.00 - 26%", "S/ 1,840.00", "Tv Samsung 50” Smart UHD 50NU7090 – Negr...", 
"S/ 1,450.00 - 10%", "S/ 1,298.00", "<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 3", 
"Televisor LED UHD 4K Smart 55\"Samsung UN...", "S/ 1,699.00 - 5%", 
"S/ 1,599.00")

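Update 1:

The suggested answer only works when the list is constructed from the output of dput, because there the symbol (code point 57412, U+E044) has been converted to the literal text <U+E044>.

What I mean is that the provided regular expression does not capture the symbol itself:

splitted[[1]] is the original list.

splitted_x is the list constructed from the output of dput.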

idx <- grepl("\\<U\\+E044\\>", splitted[[1]])
sum(idx) #returns 0
idx <- grepl("\\<U\\+E044\\>", splitted_x)
sum(idx) #returns 10
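
The two counts above are consistent with the original vector holding the real character U+E044 (code point 57412) while the dput copy holds the printed escape text. A minimal sketch of the difference, using made-up vectors (`real_char` and `escaped` are assumptions for illustration, not the actual data):

```r
# Hypothetical sample data, not the asker's actual vector.
pua <- intToUtf8(57412)   # the single character U+E044

real_char <- c(paste(pua, pua, "1"), "S/ 1,099.00")   # holds the real character
escaped   <- c("<U+E044> <U+E044> 1", "S/ 1,099.00")  # holds the printed escape text

# fixed = TRUE treats the pattern as plain text, so no regex escaping is needed.
has_char    <- grepl(pua, real_char, fixed = TRUE)
has_escape  <- grepl("<U+E044>", escaped, fixed = TRUE)
cross_check <- grepl("<U+E044>", real_char, fixed = TRUE)  # escape text vs. real character

# Drop the elements that contain the real character.
cleaned <- real_char[!has_char]
```

Here `cross_check` comes back all FALSE, which mirrors the `sum(idx)` of 0 on the original list: the escape text never occurs in strings that hold the character itself.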

【Question discussion】:

    Tags: r


    【Solution 1】:

    If you just want to eliminate the elements that contain "<U+E044>", this will do it:

    idx <- grepl("\\<U\\+E044\\>", splitted)
    sum(idx)
    # [1] 10
    splitted2 <- splitted[!idx]
    length(splitted)
    # [1] 184
    length(splitted2)
    # [1] 174
    

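    Now splitted2 contains the 174 elements that do not contain "<U+E044>". If you only need to remove the elements that contain that string five times in a row (each copy optionally followed by a space), use

    idx <- grepl("(<U\\+E044> ?){5}", splitted)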
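
One caveat worth checking: R's regex documentation describes `\<` and `\>` as matching the empty string at the start and end of a word, so the pattern "\\<U\\+E044\\>" does not test for literal angle brackets. The sketch below, on made-up strings, shows two ways to match the escape text literally. The vector `x` is an assumption for illustration.

```r
# Hypothetical strings for illustration only.
x <- c("<U+E044> <U+E044> <U+E044> <U+E044> <U+E044> 1",
       "<U+E044> 1",
       "S/ 1,099.00")

# Literal match: fixed = TRUE disables regex syntax entirely.
any_escape <- grepl("<U+E044>", x, fixed = TRUE)

# Five consecutive copies, each optionally followed by one space.
# Unescaped < and > are ordinary characters here; only + needs escaping.
five_times <- grepl("(<U\\+E044> ?){5}", x)

# Keep everything except the elements with five copies in a row.
x[!five_times]
```

With `fixed = TRUE` there is nothing to escape, which is usually the simpler choice when the target is a plain substring.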
    

    【Discussion】:

    • Without a sample of the actual data, we can't do much. Maybe something like sapply(splitted[[1]], function(x) utf8ToInt(x)[1] != 57412).
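
The code-point check suggested in the comment can be sketched as a small helper. The function name `starts_with_code` and the sample vector are hypothetical. The extra guards are there because `utf8ToInt("")` returns an empty vector, so the bare `[1] != 57412` comparison would yield NA for empty strings.

```r
# Hypothetical helper; the name and default value are assumptions for this sketch.
starts_with_code <- function(x, code = 57412L) {
  vapply(x, function(s) {
    cp <- utf8ToInt(s)   # integer(0) for "", NA for NA or invalid UTF-8
    length(cp) > 0 && !is.na(cp[1]) && cp[1] == code
  }, logical(1), USE.NAMES = FALSE)
}

# Made-up sample: one element starting with U+E044, one price, one empty string.
x <- c(paste0(intToUtf8(57412), " 1"), "S/ 729.00", "")

flag <- starts_with_code(x)   # TRUE where the first code point is 57412
kept <- x[!flag]              # elements to keep
```

For the asker's data this would presumably be called as `splitted[[1]][!starts_with_code(splitted[[1]])]`. Base R's `startsWith(x, intToUtf8(57412))` is a shorter alternative when the input has no NA values.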