Rust: Most efficient way to iterate over chars of an ASCII string (answers)

【Question title】: Rust: Most efficient way to iterate over chars of an ASCII string
【Posted】: 2021-03-11 21:25:32
【Question】:

My original approach:

pub fn find_the_difference(s: String, t: String) -> char {
        let mut c:u8 = 0;
        for i in 0..s.chars().count() {
            c ^= t.chars().nth(i).unwrap() as u8 ^ s.chars().nth(i).unwrap() as u8;
        }
        return (c ^ t.chars().nth(s.chars().count()).unwrap() as u8) as char;
        
    }

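But it was too slow, and it takes a lot of code just to replace t[i] ^ s[i] (see the original C++ function below). So I looked for something else and found this method, where the string is converted into an array of chars, which gave good results (from 8 ms down to 0 ms).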

pub fn find_the_difference(s1: String, t1: String) -> char {
        let mut c:u8 = 0;
        let s: Vec<char> = s1.chars().collect();
        let t: Vec<char> = t1.chars().collect();

        for i in 0..s1.chars().count() {
            c ^= t[i] as u8 ^ s[i] as u8;
        }
        return (c ^ t[s1.chars().count()] as u8) as char;

    }

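But maybe the collect is not needed, and I don't care about the index either; I just want to iterate over the chars one after another. My current attempt: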

pub fn find_the_difference(s1: String, t1: String) -> char {
        let mut c:u8 = 0;
        let mut s = s1.chars();
        let mut t = t1.chars();
        let n = s.count();
        
        for i in 0..n {
            c ^= t.next().unwrap() as u8 ^ s.next().unwrap() as u8; // c ^= *t++ ^ *s++ translated in C++
        }
        return (c ^ t.next().unwrap() as u8) as char;
        
    }

I get the following error message:

Line 9, Char 44: borrow of moved value: `s` (solution.rs)
   |
4  |         let mut s = s1.chars();
   |             ----- move occurs because `s` has type `std::str::Chars<'_>`, which does not implement the `Copy` trait
5  |         let mut t = t1.chars();
6  |         let n = s.count();
   |                 - value moved here
...
9 |             c ^= t.next().unwrap() as u8 ^ s.next().unwrap() as u8;
   |                                            ^ value borrowed here after move
error: aborting due to previous error

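Is it possible to write something like c = *t++ in this kind of code?

Note: s1.chars().count() = t1.chars().count() - 1. The goal is to find the extra letter in t1.

NB2: the original C++ function: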

char findTheDifference(string s, string t) {
        char c = 0;
        for (int i = 0; t[i]; i++)
            c ^= t[i] ^ s[i];
        return c;
    }

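【Comments on the question】:

How about for char in str.chars() {...}? Do you want to iterate over both strings at the same time?
Note that a Rust String is UTF-8, and a char is a Unicode scalar value (1-4 bytes when encoded as UTF-8). If you want the same behavior and performance as the C++ code, you probably want to use Vec<u8>.
I tried Vec at first, but I had trouble with the conversion.
Try using bytes() instead of chars().
Once you consider multi-byte characters, it is not clear what the behavior of that C++ function should be. (This is a common problem when translating APIs that don't clearly distinguish between characters and bytes.) You could translate it as fn find_the_difference(s: &[u8], t: &[u8]) -> u8, but that may not give "interesting" results when applied to UTF-8 strings. Or you could write fn find_the_difference(s: &str, t: &str) -> u32 (treating the characters as UTF-32 encoded), but that differs even more for non-ASCII text.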

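Tags: rust

【Solution 1】:

I think you are confused about the differences between C and Rust string handling, and about the distinctions between Rust's str, String, &[u8], char, and u8 types.

That said, here is how I would implement your function: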

fn find_the_difference(s: &[u8], t: &[u8]) -> u8 {
    assert!(t.len() > s.len());
    let mut c: u8 = 0;
    for i in 0..s.len() {
        c ^= s[i] ^ t[i];
    }
    c ^ t[s.len()]
}

If your data is currently a String, you can use the as_bytes() method to get a &[u8] view of it. Like this:

let s: String = ...some string...;
let t: String = ...some string...;

let diff = find_the_difference(s.as_bytes(), t.as_bytes());
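On the c = *t++ part of the question: the error in the question arises because Iterator::count takes the iterator by value, so s.count() consumes s before the loop can use it. A minimal sketch of an iterator-only version follows, assuming ASCII input and that t is exactly one byte longer than s. The &str parameters and the char return type are illustrative choices here, not part of the answer above.

```rust
/// XORs the bytes of `s` and `t` pairwise; matching bytes cancel out,
/// so the result is the one extra byte in `t`.
/// Assumption: both inputs are ASCII and `t` has exactly one more byte than `s`.
fn find_the_difference(s: &str, t: &str) -> char {
    let mut t_bytes = t.bytes();
    let mut c: u8 = 0;
    // Iterating over `s` drives the loop, so no separate count() is needed.
    for sb in s.bytes() {
        // `t_bytes.next()` plays the role of `*t++` in the C++ version.
        let tb = t_bytes.next().expect("t must be longer than s");
        c ^= sb ^ tb;
    }
    // Exactly one byte of `t` is left after the loop.
    c ^= t_bytes.next().expect("t must have one extra byte");
    c as char
}
```

Called as find_the_difference("abcd", "abcde"), this returns 'e'. With String values, pass &s and &t; no copy or collect is involved.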

【Comments】:

【Solution 2】:

zip the two iterators together.

Also, as Peter Hall comments, it's safer. You can't assume characters are 1 byte. Just use !=.

fn main() {
    let a = "☃ Thiñgs";
    let b = "☃ Thiñks";

    let both = a.chars().zip(b.chars());
    
    for pair in both {
        if pair.0 != pair.1 {
            println!("Different {} {}", pair.0, pair.1);
        }
    }
}

This stops as soon as either iterator is exhausted.
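Because zip stops at the shorter input, the trailing element of a longer second string is never visited. For the XOR problem in the question, one way around that is to drop zip altogether: XOR is commutative and associative, so the pairing order does not matter and the two byte iterators can simply be chained. This is a sketch that swaps in chain and fold in place of zip; the name extra_byte is hypothetical and ASCII input is assumed.

```rust
/// XORs every byte of `s` followed by every byte of `t`.
/// Bytes that occur in both inputs cancel out, leaving the extra byte.
/// Assumption: `t` contains the bytes of `s` (in any order) plus one more.
fn extra_byte(s: &str, t: &str) -> u8 {
    s.bytes().chain(t.bytes()).fold(0, |acc, b| acc ^ b)
}
```

For example, extra_byte("abc", "cxba") yields b'x', regardless of the order of the letters in either string.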

If you also want the index, use char_indices.
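A short sketch of that variant, with a hypothetical helper named first_mismatch. Note that the index produced by char_indices is the byte offset of each char within the string, not a running character count, so it jumps by more than one after a multi-byte character.

```rust
/// Returns the byte offset in `a` and the two differing chars at the
/// first position where `a` and `b` disagree, or None if no compared
/// pair differs. Comparison stops at the end of the shorter string.
fn first_mismatch(a: &str, b: &str) -> Option<(usize, char, char)> {
    a.char_indices()
        .zip(b.chars())
        .find(|((_, ca), cb)| ca != cb)
        .map(|((i, ca), cb)| (i, ca, cb))
}
```

For the two strings in the example above (assuming a precomposed ñ), this gives Some((9, 'g', 'k')): the snowman takes three bytes and ñ takes two, so 'g' sits at byte offset 9 even though it is the seventh character.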

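Because they are a key feature of Rust, iterators are a "zero-cost abstraction", meaning the compiler optimizes them for you. Iterators are generally as fast as or faster than hand-written loops.

【Comments】:

Sorry, I forgot to add: in this particular case, the input strings consist of 1-byte characters. I really like your solution, but the goal is to go through all the letters, and in your example you would stop after processing "HiWo".
@AntoninGAVREL If you are uneasy about stopping early when one string is shorter than the other, you should clarify your C++ code, because it exhibits an out-of-bounds access if the second string is longer.
You are wrong @kmdreko, the second string is one character longer, so nothing is out of bounds, because we XOR the \0 of the first string with the last non-null character of the second string. This is stated in my question.
@AntoninGAVREL Could you explain in more detail the whole problem you are trying to solve? If they are 1-byte values, your function should not pretend it works on strings and chars; it should take u8 slices and return a u8. Rust is not C++.
leetcode.com/problems/find-the-difference/description It is not "my function" ;)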