【Question title】: Are there standardized translations of Unicode character names?
【Posted】: 2017-12-05 14:44:10
【Question】:

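Every code point in the Unicode Standard has a unique English name. I need to translate these names (for a small subset of the code points) into languages such as German, French, and Japanese, but the results would not necessarily reflect the intent of the Unicode Standard well. Has the Unicode committee already made an effort to standardize code point names for non-English languages, so that I could simply refer to their translations? I could not find anything other than English on unicode.org, but I still hope I missed something. Thanks in advance!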

【Comments on the question】:

    Tags: unicode internationalization translation


    【Answer 1】:

    .NET / PowerShell example: [Microsofts.CharMap.UName]::Get('č')

    Windows OS: localized Unicode properties (the name, at least) are stored in the localized getuname.dll. Use the following script directly, or take inspiration from it:

    <#
    Origin   by: http://poshcode.org/5234
    Improved by: https://stackoverflow.com/users/3439404/josefz
    
    Use this like this: "ábč",([char]'x'),0xBF | Get-CharInfo
    
    Activate dot-sourced like this (apply a real path instead of .\):
    
    . .\_get-CharInfo_1.1.ps1
    
    #>
    
    Set-StrictMode -Version latest
    
    Add-Type -Name UName -Namespace Microsofts.CharMap -MemberDefinition $(
        switch ("$([System.Environment]::SystemDirectory -replace 
                    '\\', '\\')\\getuname.dll") {
        {Test-Path -LiteralPath $_ -PathType Leaf} {@"
    [DllImport("${_}", ExactSpelling=true, SetLastError=true)]
    private static extern int GetUName(ushort wCharCode, 
        [MarshalAs(UnmanagedType.LPWStr)] System.Text.StringBuilder buf);
    
    public static string Get(char ch) {
        var sb = new System.Text.StringBuilder(300);
        UName.GetUName(ch, sb);
        return sb.ToString();
    }
    "@
        }
        default {'public static string Get(char ch) { return "???"; }'}
        })
    
    function Get-CharInfo {
        [CmdletBinding()]
        [OutputType([System.Management.Automation.PSCustomObject],[System.Array])]
        param(
            [Parameter(Position=0, Mandatory=$true, ValueFromPipeline=$true)]
            $InputObject
        )
        begin {
            function out {
                param(
                    [Parameter(Position=0, Mandatory=$true )] $ch,
                    [Parameter(Position=1, Mandatory=$false)]$nil=''
                     )
                if (0 -le $ch -and 0xFFFF -ge $ch) {
                    [pscustomobject]@{
                        Char = [char]$ch
                        CodePoint = 'U+{0:X4}' -f $ch
                        Category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
                        Description = [Microsofts.CharMap.UName]::Get($ch)
                    }
                } elseif (0 -le $ch -and 0x10FFFF -ge $ch) {
                    $s = [char]::ConvertFromUtf32($ch)
                    [pscustomobject]@{
                        Char = $s
                        CodePoint = 'U+{0:X}' -f $ch
                        Category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($s, 0)
                        Description = '???' + $nil
                    }
                } else {
                    Write-Warning ('Character U+{0:X} is out of range' -f $ch)
                }
            }
        }
        process {
            if ($PSBoundParameters['Verbose']) {
                Write-Warning "InputObject type = $($InputObject.GetType().Name)"}
            if ($null -cne ($InputObject -as [char])) {
                #Write-Verbose "A $([char]$InputObject) InputObject character"
                out $([int][char]$InputObject) ''
            } elseif ($InputObject -isnot [string] -and $null -cne ($InputObject -as [int])) {
                #Write-Verbose "B $InputObject InputObject"
                out $([int]$InputObject) ''
            } else {
                $InputObject = [string]$InputObject
                #Write-Verbose "C $InputObject InputObject.Length $($InputObject.Length)"
                for ($i = 0; $i -lt $InputObject.Length; ++$i) {
                    if (  [char]::IsHighSurrogate($InputObject[$i]) -and 
                          (1+$i) -lt $InputObject.Length -and 
                          [char]::IsLowSurrogate($InputObject[$i+1])) {
                        $aux = ' 0x{0:x4},0x{1:x4}' -f [int]$InputObject[$i], 
                                                       [int]$InputObject[$i+1]
                        Write-Verbose "surrogate pair $aux at position $i" 
                        out $([char]::ConvertToUtf32($InputObject[$i], $InputObject[1+$i])) $aux
                        $i++
                    } else {
                        out $([int][char]$InputObject[$i]) ''
                    }
                }
            }
        }
    }
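    The script reports '???' for characters outside the Basic Multilingual Plane: GetUName is declared with a single 16-bit ushort argument, while such characters occupy two UTF-16 code units. The following is a minimal sketch of the surrogate-pair handling that the loop in Get-CharInfo performs, using only .NET methods the script already calls. The sample code point U+1F600 and all variable names are arbitrary choices for illustration.

```powershell
# U+1F600 is an arbitrary supplementary-plane code point chosen for illustration.
$text = [char]::ConvertFromUtf32(0x1F600)

# In UTF-16 the character occupies two [char] values: a high and a low surrogate.
$units  = $text.Length
$isPair = [char]::IsHighSurrogate($text[0]) -and [char]::IsLowSurrogate($text[1])

# Recombine the two code units into one code point, as Get-CharInfo does
# before passing the value to its inner 'out' function.
$codePoint = [char]::ConvertToUtf32($text[0], $text[1])
$label     = 'U+{0:X}' -f $codePoint
$pair      = '0x{0:x4},0x{1:x4}' -f [int]$text[0], [int]$text[1]

"$label is stored as $units code units ($pair); surrogate pair: $isPair"
```

    Because neither half of the pair is a character on its own, passing either one to GetUName would not yield a meaningful name, which is presumably why the script falls back to '???' here.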
    

    Example

    PS D:\PShell> "ábč",([char]'x'),0xBF | Get-CharInfo
    
    Char CodePoint         Category Description                    
    ---- ---------         -------- -----------                    
       á U+00E1     LowercaseLetter Latin Small Letter A With Acute
       b U+0062     LowercaseLetter Latin Small Letter B           
       č U+010D     LowercaseLetter Latin Small Letter C With Caron
       x U+0078     LowercaseLetter Latin Small Letter X           
       ¿ U+00BF    OtherPunctuation Inverted Question Mark         
    
    PS D:\PShell> Get-Content .\DataFiles\getcharinfoczech.txt
    
    Char CodePoint         Category Description                               
    ---- ---------         -------- -----------                               
       á U+00E1     LowercaseLetter Malé písmeno latinky a s čárkou nad vpravo
       b U+0062     LowercaseLetter Malé písmeno latinky b                    
       č U+010D     LowercaseLetter Malé písmeno latinky c s háčkem           
       x U+0078     LowercaseLetter Malé písmeno latinky x                    
       ¿ U+00BF    OtherPunctuation Znak obráceného otazníku                  
    
    PS D:\PShell>
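    In the second listing only the Description column is translated; the Category column stays in English. That is presumably because Category is the name of a .NET enum member rather than text read from getuname.dll. The following is a minimal sketch, relying only on the .NET base class library, that shows where the Category text comes from and which UI culture the current session reports. The variable names are illustrative, and the link between the UI culture and the language returned by getuname.dll is an assumption, not something the script verifies.

```powershell
# Category is a System.Globalization.UnicodeCategory enum value; its name is a
# fixed English identifier that does not depend on the display language.
$category      = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory([char]0x010D)  # č
$categoryName  = $category.ToString()   # enum member name
$categoryValue = [int]$category         # underlying numeric value

# UI culture of the current session. Recording it next to captured output
# documents which language the descriptions were (assumed to be) produced under.
$uiCulture = [System.Globalization.CultureInfo]::CurrentUICulture.Name

'{0} ({1}), UI culture: {2}' -f $categoryName, $categoryValue, $uiCulture
```

    Anyone who needs the category localized as well would have to map the enum values to translated strings themselves; the script does not attempt this.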
    

    Note that the second (semi-localized) output above comes from the following code, run on the same computer under a localized user account:

    "ábč",([char]'x'),0xBF | Get-CharInfo | Out-File .\DataFiles\getcharinfoczech.txt
    

    【Comments】:
