UTF-8 conversion for characters: answers

【Question title】: UTF-8 conversion for characters
【Posted】: 2018-04-05 00:15:37
【Question description】:

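I currently have a std::string that contains this:

"\xa9 2006 FooWorld"

Basically, it contains the © symbol. This string is passed to a method of an external API that expects UTF-8. How can I make this string UTF-8 compatible? I read here that I can use std::wstring_convert, but I'm not sure how to apply it in my case. Any suggestions would be appreciated.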

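【Question discussion】:

For that one character, it's probably not worth anything complicated. Just hard-code the UTF-8 equivalent. utf8-chartable.de
The problem is that there may be more than one such character.
You should probably have put that in the question. :) Personally, I would use this: utfcpp.sourceforge.net
std::string stores bytes, not characters. So if you don't know the original encoding, nothing is guaranteed to work. If you know the original encoding is UTF-8, you don't need anything extra, because, again, std::string stores the encoded bytes.
Maybe you want to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)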
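
To make the point about bytes versus characters concrete, here is a minimal sketch of a manual conversion. It assumes the original string is encoded as ISO-8859-1 (Latin-1), which the question does not confirm; the byte 0xA9 is © in that encoding. The function name latin1_to_utf8 is hypothetical and not part of any library. Latin-1 bytes map one-to-one onto code points U+0000 to U+00FF, so every byte at or above 0x80 becomes a two-byte UTF-8 sequence.

```cpp
#include <string>

// Hypothetical helper: converts a string ASSUMED to be ISO-8859-1 (Latin-1)
// into UTF-8. It gives wrong results for any other source encoding.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte expands to two
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

With this, latin1_to_utf8("\xa9 2006 FooWorld") yields the bytes C2 A9 followed by the unchanged ASCII text. If the source is actually Windows-1252 or something else, a table-driven or library conversion would be needed instead.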

Tags: c++ c++11 unicode

【Solution 1】:

This is simple: use a UTF-8 string literal:

u8"\u00A9 2006 FooWorld"

This produces a const char[] that is a correctly encoded UTF-8 string.

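【Discussion】:

For example, if I have std::basic_string<char> str = "\xa9 2006 FooWorld", how do I attach u8 to it?
@MistyD: You change the code to: std::string str = u8"\u00A9 2006 FooWorld". If you're not allowed to change the literal itself, then this is the duplicate mentioned earlier.
When using any Unicode-aware literal prefix, you can write the actual Unicode character instead of spelling out its code point/code units by hand, e.g.: u8"© 2006 FooWorld". Let the compiler do the work for you.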
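
One caveat worth sketching: the const char[] type holds for C++11 through C++17. Since C++20 a u8 literal has element type char8_t, so std::string str = u8"..." no longer compiles there. The helper below is a hypothetical workaround (the name from_u8 is made up for illustration) that accepts either element type and copies the bytes into a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies a u8 string literal into a std::string.
// CharT is char in C++11..17 and char8_t in C++20 and later.
template <typename CharT, std::size_t N>
std::string from_u8(const CharT (&literal)[N]) {
    // N counts the terminating null, so copy N - 1 bytes.
    return std::string(reinterpret_cast<const char*>(literal), N - 1);
}
```

Usage would look like std::string str = from_u8(u8"\u00A9 2006 FooWorld"); and the resulting string starts with the bytes C2 A9 regardless of the language standard in use.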

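【Solution 2】:

In C++11 and later, the best way to get a UTF-8 encoded string literal is to use the u8 prefix:

std::string str = u8"\u00A9 2006 FooWorld";

or:

std::string str = u8"© 2006 FooWorld";

However, you can also use std::wstring_convert (especially if your input data is not a string literal):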

#include <codecvt>
#include <locale>
#include <string>

std::wstring wstr = L"© 2006 FooWorld"; // or whatever...

std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;

std::string str = convert.to_bytes(wstr);
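
To apply this to the std::string from the question, the bytes first have to be widened to code points. The sketch below assumes the input is ISO-8859-1 (Latin-1), which the question does not state; under that assumption each byte value equals its Unicode code point. The function name is hypothetical. Note that std::wstring_convert and <codecvt> are deprecated as of C++17, so compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: widens bytes ASSUMED to be Latin-1 into wchar_t
// code points, then encodes them as UTF-8 with std::wstring_convert.
std::string latin1_to_utf8_via_wstring(const std::string& latin1) {
    std::wstring wide;
    wide.reserve(latin1.size());
    for (unsigned char c : latin1) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        wide.push_back(static_cast<wchar_t>(c));
    }
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
    return convert.to_bytes(wide);  // throws std::range_error on failure
}
```

Calling latin1_to_utf8_via_wstring("\xa9 2006 FooWorld") gives a string whose first two bytes are C2 A9, which is the UTF-8 encoding of ©.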

【Discussion】: