【Question title】: CodePage ID to CodePage name: GetEncoding equivalent in Delphi?
【Posted】: 2010-04-04 11:51:41
【Question】:

I am looking for a Win32 equivalent of the .Net Encoding.GetEncoding method that I can use from Delphi 7.

What I want to achieve is to convert a code page ID (e.g. 28592) into a code page name (in this case iso-8859-2).

I found a Win32 function called GetCPInfoEx, but it returns a long code page name, and I need the short one, as listed on this page (see the Name column): http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx

Thanks!

【Comments】:

  • For those using a recent version of Delphi (XE), there is System.SysUtils.TEncoding.GetEncoding(CodePageId).EncodingName

Tags: .net delphi encoding winapi codepages


【Solution 1】:

Here is my code page lookup table; feel free to use it.

type
  TCPData = record
    CPID: Integer;
    CPName: String;
  end;

const
  MaxEncodings = 140;

  Encodings: Array[0..MaxEncodings - 1] of TCPData =
  (
    (CPID: 37; CPName: 'IBM037'),
    (CPID: 437; CPName: 'IBM437'),
    (CPID: 500; CPName: 'IBM500'),
    (CPID: 708; CPName: 'ASMO-708'),
    (CPID: 720; CPName: 'DOS-720'),
    (CPID: 737; CPName: 'ibm737'),
    (CPID: 775; CPName: 'ibm775'),
    (CPID: 850; CPName: 'ibm850'),
    (CPID: 852; CPName: 'ibm852'),
    (CPID: 855; CPName: 'IBM855'),
    (CPID: 857; CPName: 'ibm857'),
    (CPID: 858; CPName: 'IBM00858'),
    (CPID: 860; CPName: 'IBM860'),
    (CPID: 861; CPName: 'ibm861'),
    (CPID: 862; CPName: 'DOS-862'),
    (CPID: 863; CPName: 'IBM863'),
    (CPID: 864; CPName: 'IBM864'),
    (CPID: 865; CPName: 'IBM865'),
    (CPID: 866; CPName: 'cp866'),
    (CPID: 869; CPName: 'ibm869'),
    (CPID: 870; CPName: 'IBM870'),
    (CPID: 874; CPName: 'windows-874'),
    (CPID: 875; CPName: 'cp875'),
    (CPID: 932; CPName: 'shift_jis'),
    (CPID: 936; CPName: 'gb2312'),
    (CPID: 949; CPName: 'ks_c_5601-1987'),
    (CPID: 950; CPName: 'big5'),
    (CPID: 1026; CPName: 'IBM1026'),
    (CPID: 1047; CPName: 'IBM01047'),
    (CPID: 1140; CPName: 'IBM01140'),
    (CPID: 1141; CPName: 'IBM01141'),
    (CPID: 1142; CPName: 'IBM01142'),
    (CPID: 1143; CPName: 'IBM01143'),
    (CPID: 1144; CPName: 'IBM01144'),
    (CPID: 1145; CPName: 'IBM01145'),
    (CPID: 1146; CPName: 'IBM01146'),
    (CPID: 1147; CPName: 'IBM01147'),
    (CPID: 1148; CPName: 'IBM01148'),
    (CPID: 1149; CPName: 'IBM01149'),
    (CPID: 1200; CPName: 'utf-16'),
    (CPID: 1201; CPName: 'unicodeFFFE'),
    (CPID: 1250; CPName: 'windows-1250'),
    (CPID: 1251; CPName: 'windows-1251'),
    (CPID: 1252; CPName: 'Windows-1252'),
    (CPID: 1253; CPName: 'windows-1253'),
    (CPID: 1254; CPName: 'windows-1254'),
    (CPID: 1255; CPName: 'windows-1255'),
    (CPID: 1256; CPName: 'windows-1256'),
    (CPID: 1257; CPName: 'windows-1257'),
    (CPID: 1258; CPName: 'windows-1258'),
    (CPID: 1361; CPName: 'Johab'),
    (CPID: 10000; CPName: 'macintosh'),
    (CPID: 10001; CPName: 'x-mac-japanese'),
    (CPID: 10002; CPName: 'x-mac-chinesetrad'),
    (CPID: 10003; CPName: 'x-mac-korean'),
    (CPID: 10004; CPName: 'x-mac-arabic'),
    (CPID: 10005; CPName: 'x-mac-hebrew'),
    (CPID: 10006; CPName: 'x-mac-greek'),
    (CPID: 10007; CPName: 'x-mac-cyrillic'),
    (CPID: 10008; CPName: 'x-mac-chinesesimp'),
    (CPID: 10010; CPName: 'x-mac-romanian'),
    (CPID: 10017; CPName: 'x-mac-ukrainian'),
    (CPID: 10021; CPName: 'x-mac-thai'),
    (CPID: 10029; CPName: 'x-mac-ce'),
    (CPID: 10079; CPName: 'x-mac-icelandic'),
    (CPID: 10081; CPName: 'x-mac-turkish'),
    (CPID: 10082; CPName: 'x-mac-croatian'),
    (CPID: 12000; CPName: 'utf-32'),
    (CPID: 12001; CPName: 'utf-32BE'),
    (CPID: 20000; CPName: 'x-Chinese-CNS'),
    (CPID: 20001; CPName: 'x-cp20001'),
    (CPID: 20002; CPName: 'x-Chinese-Eten'),
    (CPID: 20003; CPName: 'x-cp20003'),
    (CPID: 20004; CPName: 'x-cp20004'),
    (CPID: 20005; CPName: 'x-cp20005'),
    (CPID: 20105; CPName: 'x-IA5'),
    (CPID: 20106; CPName: 'x-IA5-German'),
    (CPID: 20107; CPName: 'x-IA5-Swedish'),
    (CPID: 20108; CPName: 'x-IA5-Norwegian'),
    (CPID: 20127; CPName: 'us-ascii'),
    (CPID: 20261; CPName: 'x-cp20261'),
    (CPID: 20269; CPName: 'x-cp20269'),
    (CPID: 20273; CPName: 'IBM273'),
    (CPID: 20277; CPName: 'IBM277'),
    (CPID: 20278; CPName: 'IBM278'),
    (CPID: 20280; CPName: 'IBM280'),
    (CPID: 20284; CPName: 'IBM284'),
    (CPID: 20285; CPName: 'IBM285'),
    (CPID: 20290; CPName: 'IBM290'),
    (CPID: 20297; CPName: 'IBM297'),
    (CPID: 20420; CPName: 'IBM420'),
    (CPID: 20423; CPName: 'IBM423'),
    (CPID: 20424; CPName: 'IBM424'),
    (CPID: 20833; CPName: 'x-EBCDIC-KoreanExtended'),
    (CPID: 20838; CPName: 'IBM-Thai'),
    (CPID: 20866; CPName: 'koi8-r'),
    (CPID: 20871; CPName: 'IBM871'),
    (CPID: 20880; CPName: 'IBM880'),
    (CPID: 20905; CPName: 'IBM905'),
    (CPID: 20924; CPName: 'IBM00924'),
    (CPID: 20932; CPName: 'EUC-JP'),
    (CPID: 20936; CPName: 'x-cp20936'),
    (CPID: 20949; CPName: 'x-cp20949'),
    (CPID: 21025; CPName: 'cp1025'),
    (CPID: 21866; CPName: 'koi8-u'),
    (CPID: 28591; CPName: 'iso-8859-1'),
    (CPID: 28592; CPName: 'iso-8859-2'),
    (CPID: 28593; CPName: 'iso-8859-3'),
    (CPID: 28594; CPName: 'iso-8859-4'),
    (CPID: 28595; CPName: 'iso-8859-5'),
    (CPID: 28596; CPName: 'iso-8859-6'),
    (CPID: 28597; CPName: 'iso-8859-7'),
    (CPID: 28598; CPName: 'iso-8859-8'),
    (CPID: 28599; CPName: 'iso-8859-9'),
    (CPID: 28603; CPName: 'iso-8859-13'),
    (CPID: 28605; CPName: 'iso-8859-15'),
    (CPID: 29001; CPName: 'x-Europa'),
    (CPID: 38598; CPName: 'iso-8859-8-i'),
    (CPID: 50220; CPName: 'iso-2022-jp'),
    (CPID: 50221; CPName: 'csISO2022JP'),
    (CPID: 50222; CPName: 'iso-2022-jp'),
    (CPID: 50225; CPName: 'iso-2022-kr'),
    (CPID: 50227; CPName: 'x-cp50227'),
    (CPID: 51932; CPName: 'euc-jp'),
    (CPID: 51936; CPName: 'EUC-CN'),
    (CPID: 51949; CPName: 'euc-kr'),
    (CPID: 52936; CPName: 'hz-gb-2312'),
    (CPID: 54936; CPName: 'GB18030'),
    (CPID: 57002; CPName: 'x-iscii-de'),
    (CPID: 57003; CPName: 'x-iscii-be'),
    (CPID: 57004; CPName: 'x-iscii-ta'),
    (CPID: 57005; CPName: 'x-iscii-te'),
    (CPID: 57006; CPName: 'x-iscii-as'),
    (CPID: 57007; CPName: 'x-iscii-or'),
    (CPID: 57008; CPName: 'x-iscii-ka'),
    (CPID: 57009; CPName: 'x-iscii-ma'),
    (CPID: 57010; CPName: 'x-iscii-gu'),
    (CPID: 57011; CPName: 'x-iscii-pa'),
    (CPID: 65000; CPName: 'utf-7'),
    (CPID: 65001; CPName: 'utf-8')
  );

implementation

function GetEncoding(CPID: Integer): String;
var
  I: Integer;
begin
  Result := 'iso-8859-2'; //put the default encoding here

  for I := 0 to MaxEncodings - 1 do
    if Encodings[I].CPID = CPID then
    begin
      Result := Encodings[I].CPName;
      break;
    end;
end;

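Thanks to everyone for the answers, but it turned out that this was the only workable solution in this case...

【Comments】:

  • In Indy 10 (indyproject.org), have a look at the lookup table in its IdCharsets.pas unit.
  • If performance matters, using a THashedStringList instead of the array of records may speed up the lookup.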
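A note on the lookup-speed comment above: the table in Solution 1 is already sorted by CPID, so a binary search is one possible alternative to the linear loop. This is a different technique from the THashedStringList the comment suggests, and it only works while the table stays sorted. The following is a minimal sketch under that assumption; `SampleEncodings` and `TryGetEncodingName` are hypothetical names used for illustration, and the six-entry table is just an excerpt.

```pascal
program CodePageLookupDemo;
{$APPTYPE CONSOLE}

type
  TCPData = record
    CPID: Integer;
    CPName: string;
  end;

const
  // Excerpt of the Solution 1 table, for illustration only.
  // Assumption: entries are kept in ascending CPID order.
  SampleEncodings: array[0..5] of TCPData =
  (
    (CPID: 437; CPName: 'IBM437'),
    (CPID: 1250; CPName: 'windows-1250'),
    (CPID: 1252; CPName: 'Windows-1252'),
    (CPID: 28591; CPName: 'iso-8859-1'),
    (CPID: 28592; CPName: 'iso-8859-2'),
    (CPID: 65001; CPName: 'utf-8')
  );

// Hypothetical helper: binary search over the sorted table.
// Returns True and sets AName when CPID is found, False otherwise,
// so the caller decides what the default should be.
function TryGetEncodingName(CPID: Integer; out AName: string): Boolean;
var
  First, Last, Middle: Integer;
begin
  Result := False;
  AName := '';
  First := Low(SampleEncodings);
  Last := High(SampleEncodings);
  while First <= Last do
  begin
    Middle := (First + Last) div 2;
    if SampleEncodings[Middle].CPID = CPID then
    begin
      AName := SampleEncodings[Middle].CPName;
      Result := True;
      Exit;
    end
    else if SampleEncodings[Middle].CPID < CPID then
      First := Middle + 1   // target is in the upper half
    else
      Last := Middle - 1;   // target is in the lower half
  end;
end;

var
  S: string;
begin
  if TryGetEncodingName(28592, S) then
    Writeln(S)
  else
    Writeln('not found');
end.
```

With only 140 entries the difference from the linear scan is likely negligible; returning a Boolean instead of a hard-coded default is the more practical change here, because it lets the caller tell an unknown code page from a real iso-8859-2.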
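【Solution 2】:

Are you looking for the IANA official names? Since you want them associated with Windows code page identifiers, I don't think you can do better than this table.

【Comments】:

  • So basically, I have to build the lookup table myself?
  • @Ben - many code pages/names used in the wild do not appear in MS's list. @Steve - yes.
【Solution 3】:

You need to use IMultiLanguage::GetCodePageInfo, which is exposed by MLang.dll. It ships with IE4 and later. You can create the import library yourself with Delphi's Import Component... command, or get MultiLanguage_TLB.pas from Colin Wilson's low-level utilities package.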

function CodePageToCharSet(ACodePage: Cardinal): string;
var
  MimeCPInfo: TMimeCPInfo;
  MultiLanguage: IMultiLanguage;
begin
  if Succeeded(CoCreateInstance(CLASS_CMultiLanguage, nil, CLSCTX_INPROC_SERVER,
     IID_IMultiLanguage, MultiLanguage)) and
     Succeeded(MultiLanguage.GetCodePageInfo(ACodePage, MimeCPInfo)) then
    Result := string(MimeCPInfo.wszWebCharset)
  else
    Result := 'US-ASCII';
end;

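【Comments】:

  • Thanks for the code. It compiles, but in practice it does not work in some cases (for example, it fails to recognize 28591 - iso-8859-1, which is a no-go). Please see my "reusable" answer below :-)
  • @Steve: I can confirm that. If you like, though, try the IMultiLanguage2 interface. It seems to return the correct string.
  • @Sertac et al. - that is because WebCharset is only set when it needs to override the "family" BodyCharset - just print out the registry and think it over. Developers should be curious, really :-)
【Solution 4】:

Use the search, Luke. Just good old text search. In the Registry Editor. For whatever cp short name you need :-) Keep in mind that these IDs are really Internet MIME ids

HKEY_CLASSES_ROOT\MIME\DataBase

http://msdn.microsoft.com/en-us/library/ms775147.aspx This page hints that maybe - just maybe - the key has existed since MSIE4, in other words since Windows 98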

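uses
{$IfDef MSWINDOWS}Registry, Windows, {$EndIf}
SysUtils, // IntToStr
AnsiStrings;

/// todo - add implementation for non-Windows platforms
/// if anyone would need it :-)
/// probably via http://sourceforge.net/projects/natspec/
/// note: EZXSaveException is not declared in this snippet; it presumably
/// comes from the author's own code, so substitute your own exception class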
function CharSetByCodePage(const cp:Word): AnsiString;
{$IfDef MSWINDOWS }
var reg: TRegistry;
begin
// HKEY_CLASSES_ROOT\MIME\DataBase\Codepage
  reg := TRegistry.Create(KEY_READ);
  try
    reg.RootKey := HKEY_CLASSES_ROOT;
    if reg.OpenKeyReadOnly('MIME\DataBase\Codepage\' + IntToStr(cp)) then begin
       Result := Trim(AnsiString(reg.ReadString('WebCharset'))); // This key prevails, see #1251 for example
       if Result = '' then
          Result := Trim(AnsiString(reg.ReadString('BodyCharset')));
       if Result > '' then exit;
    end;
  finally
    reg.Free;
  end;
  Raise EZXSaveException.Create('No charset (MIME id) found for codepage '+IntToStr(cp));
end;
{$Else}
begin
  Raise EZXSaveException.Create('Cannot get charset by numeric codepage on this platform.');
  //not implemented, perhaps http://sourceforge.net/projects/natspec/ ?
end;
{$EndIf}

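【Comments】:

    【Solution 5】:

    I think you mean the LCIDToLocaleName function.

    【Comments】:

    • I believe a locale ID is not the same thing as a code page ID. A locale ID carries locale information, whereas a code page ID is just a unique ID, one per code page, e.g. iso-8859-2.
    【Solution 6】: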
    type
      TCPInfoEx = record
        MaxCharSize: UINT;
        DefaultChar: array[0..MAX_DEFAULTCHAR - 1] of Byte;
        LeadByte: array[0..MAX_LEADBYTES - 1] of Byte;
        UnicodeDefaultChar: WideChar;
        CodePage: UINT;
        CodePageName: array[0..MAX_PATH - 1] of AnsiChar; // GetCPInfoExA fills ANSI chars; plain Char is 2 bytes in Delphi 2009+
      end;
    
    function GetCPInfoEx(CodePage: UINT; Flags: DWORD; var lpCPInfoEx: TCPInfoEx): BOOL;
        stdcall; external 'kernel32.dll' name 'GetCPInfoExA';
    
    procedure TForm1.Button1Click(Sender: TObject);
    var
      CPInfoEx: TCPInfoEx;
    begin
      if GetCPInfoEx(28592, 0, CPInfoEx) then
        ShowMessage(CPInfoEx.CodePageName);
    end;
    

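    【Comments】:

    • Thanks, but have you tested this code? It is the same code I used originally, and for me it returns not just the "iso-8859-2" string but also some other garbage-like information, and I cannot reliably extract the actual code page name from it...
    • Steve, I tested it with the CPs my system supports; "CodePageName" consists of the CP identifier followed by an informational text in parentheses. E.g. for 28592 it returns "28592 (ISO 8859-2 Central Europe)". If parsing the text in parentheses does not suit you, I guess you will have to build a lookup table yourself.
    • Unfortunately the text cannot be parsed, because there is no way to produce "iso-8859-2" from "28592 (ISO 8859-2 Central Europe)"; thanks anyway.
    • @Steve - sure there is. Strip everything up to and including the opening parenthesis, lowercase ISO, replace the space that follows with a hyphen, and drop the rest of the text.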
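The parsing steps from the last comment could be sketched roughly as follows. This is an illustration only: `IsoNameFromCPInfoName` is a hypothetical helper, and it assumes the input has exactly the shape quoted in the comments ("<id> (ISO 8859-N ...)"). Descriptions of non-ISO code pages follow other patterns and would still need a lookup table, which is what the asker was getting at.

```pascal
program ParseCodePageNameDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: turns '28592 (ISO 8859-2 Central Europe)' into
// 'iso-8859-2'. Returns '' when the text does not match the assumed shape.
function IsoNameFromCPInfoName(const CodePageName: string): string;
var
  P, I: Integer;
  Rest: string;
begin
  Result := '';

  // Step 1: strip everything up to and including the opening parenthesis.
  P := Pos('(', CodePageName);
  if P = 0 then
    Exit;
  Rest := Copy(CodePageName, P + 1, MaxInt);

  // Step 2: require the 'ISO ' prefix; it becomes 'iso-' in the result.
  if Copy(Rest, 1, 4) <> 'ISO ' then
    Exit;
  Delete(Rest, 1, 4);

  // Step 3: keep the part number and drop the rest of the text,
  // i.e. stop at the next space or at the closing parenthesis.
  I := 1;
  while (I <= Length(Rest)) and (Rest[I] <> ' ') and (Rest[I] <> ')') do
    Inc(I);
  if I = 1 then
    Exit; // nothing after 'ISO '

  Result := 'iso-' + Copy(Rest, 1, I - 1);
end;

begin
  Writeln(IsoNameFromCPInfoName('28592 (ISO 8859-2 Central Europe)'));
end.
```

The helper returns an empty string rather than guessing when the text does not start with "ISO ", so a caller can fall back to a lookup table such as the one in Solution 1.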