【Question title】: Translating strings character by character
【Posted】: 2020-01-20 11:07:02
【Question description】:

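How should I implement a method that converts a string made up of Latin characters into a string made up of a different set of characters, say Cyrillic?

Here is an example in PHP: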

function latin_to_cyrillic($string)
{
 $array = array(
  "а" => "a",
  "б" => "b",
  "в" => "v",
  "г" => "g",
  "д" => "d",
  "е" => "e",
  "ж" => "zh",
  "з" => "z",
  "и" => "i",
  "й" => "y",
  "к" => "k",
  "л" => "l",
  "м" => "m",
  "н" => "n",
  "о" => "o",
  "п" => "p",
  "р" => "r",
  "с" => "s",
  "т" => "t",
  "у" => "u",
  "ф" => "f",
  "х" => "h",
  "ц" => "ts",
  "ч" => "ch",
  "ш" => "sh",
  "щ" => "sht",
  "ь" => "y",
  "ъ" => "a",
  "ю" => "yu",
  "я" => "ya",
  "А" => "A",
  "Б" => "B",
  "В" => "V",
  "Г" => "G",
  "Д" => "D",
  "Е" => "E",
  "Ж" => "Zh",
  "З" => "Z",
  "И" => "I",
  "Й" => "Y",
  "К" => "K",
  "Л" => "L",
  "М" => "M",
  "Н" => "N",
  "О" => "O",
  "П" => "P",
  "Р" => "R",
  "С" => "S",
  "Т" => "T",
  "У" => "U",
  "Ф" => "F",
  "Х" => "H",
  "Ц" => "Ts",
  "Ч" => "Ch",
  "Ш" => "Sh",
  "Щ" => "Sht",
  "Ь" => "Y",
  "Ъ" => "A",
  "Ю" => "Yu",
  "Я" => "Ya",
  "–" => "-");

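 // Keys are Cyrillic and values are Latin, so this replaces each Latin value with its Cyrillic key.
 // str_replace applies the pairs in array order, so "s" is replaced before "sh" or "sht" can match.
 return str_replace(array_values($array), array_keys($array), $string);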

}

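【Discussion】:

  • Your question is not clearly stated. The terms "Latin characters" and "Cyrillic characters" are ill-defined: many different "Latin" and "Cyrillic" character sets exist. If you have something specific in mind, such as two particular Windows character sets, please say so in your question. In Java, strings use Unicode rather than any such character set, so the question as asked does not make sense; what you may need is transliteration between such character sets and Unicode.
  • @reinierpost The OP uses the PHP example to specify which Latin and Cyrillic characters he wants to "translate".

Tags: java string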


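【Solution 1】:

First you need a translation table that defines the translation of each character.

Then you read the string character by character and look up each character in the table. Simple, right?

You can use something like this: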

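import java.util.HashMap;
import java.util.Map;

class Translator {
 // Maps each source character (as a String) to its translation, which may be several characters long.
 private final Map<String, String> translation = new HashMap<>();

 public Translator() {
  // Populate the translation table here, e.g. translation.put("a", "а");
 }

 public String translate(String origin) {
  StringBuilder destiny = new StringBuilder();
  for (int i = 0; i < origin.length(); i++) {
   String character = Character.toString(origin.charAt(i));
   // Characters without a table entry are kept unchanged instead of appending "null".
   destiny.append(translation.getOrDefault(character, character));
  }
  return destiny.toString();
 }
}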
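
The per-character lookup above only covers one-to-one mappings. The PHP table in the question also contains multi-letter Latin sequences such as "zh", "sh" and "sht", which a character-at-a-time loop never sees as a unit. Here is a minimal sketch of one way to handle them, assuming a greedy longest-match strategy. The class name GreedyTranslator, its constructor, and the small sample table are hypothetical and not part of the original answer:

```java
import java.util.HashMap;
import java.util.Map;

class GreedyTranslator {
    // Source sequence -> replacement; keys may be longer than one character.
    private final Map<String, String> table = new HashMap<>();
    // Length of the longest key, so lookups never try longer substrings.
    private int maxKeyLength = 0;

    GreedyTranslator(Map<String, String> entries) {
        for (Map.Entry<String, String> e : entries.entrySet()) {
            table.put(e.getKey(), e.getValue());
            maxKeyLength = Math.max(maxKeyLength, e.getKey().length());
        }
    }

    String translate(String origin) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < origin.length()) {
            boolean matched = false;
            int longest = Math.min(maxKeyLength, origin.length() - i);
            // Try the longest candidate first, then shorter ones.
            for (int len = longest; len >= 1; len--) {
                String replacement = table.get(origin.substring(i, i + len));
                if (replacement != null) {
                    result.append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No entry for this character: copy it through unchanged.
                result.append(origin.charAt(i));
                i++;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Tiny hypothetical sample table, not the full one from the question.
        Map<String, String> sample = new HashMap<>();
        sample.put("s", "с");
        sample.put("h", "х");
        sample.put("sh", "ш");
        sample.put("a", "а");
        sample.put("p", "п");
        sample.put("k", "к");
        System.out.println(new GreedyTranslator(sample).translate("shapka")); // prints шапка
    }
}
```

Greedy matching is only one possible policy. It cannot tell whether "sh" was meant as one letter or as "s" followed by "h", so an ambiguous table like the one in the question still needs a decision from whoever defines it.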

You can also use

replaceEach(String text, String[] searchList, String[] replacementList) 
           Replaces all occurrences of Strings within another String.

from org.apache.commons.lang.StringUtils. Fill one String[] with the Latin characters (as Strings) and another String[] with the matching Cyrillic characters (also as Strings), then call that function.

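import org.apache.commons.lang.StringUtils;

class ReplaceEachTranslator {
 // Entry i of latinCharacters is replaced by entry i of cyrillicCharacters, so both arrays need the same length.
 String[] latinCharacters = { /* populate them */ };
 String[] cyrillicCharacters = { /* populate them */ };

 public String translate(String origin) {
  return StringUtils.replaceEach(origin, latinCharacters, cyrillicCharacters);
 }
}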
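
replaceEach takes two parallel arrays, so a translation table has to be split into a search list and a replacement list. Here is a small sketch using only the standard library, assuming the table is kept in a Map. Listing longer Latin sequences first is a precaution so that "sht" comes before "sh" and "s". It rests on the assumption that earlier array entries take priority when several match at the same position, which is worth checking against the Commons Lang version in use. The class name TableArrays and its method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TableArrays {
    // Returns the table's keys, longest first; equal lengths are ordered alphabetically.
    static String[] searchList(Map<String, String> table) {
        List<String> keys = new ArrayList<>(table.keySet());
        keys.sort((a, b) -> a.length() != b.length()
                ? Integer.compare(b.length(), a.length())
                : a.compareTo(b));
        return keys.toArray(new String[0]);
    }

    // Returns the replacements in the same order as the given search list.
    static String[] replacementList(Map<String, String> table, String[] searchList) {
        String[] out = new String[searchList.length];
        for (int i = 0; i < searchList.length; i++) {
            out[i] = table.get(searchList[i]);
        }
        return out;
    }
}
```

The two arrays can then be passed to replaceEach as in the snippet above.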

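【Discussion】:

  • There is no Java class named HashTable, only HashMap and an obsolete Hashtable. Neither can be parameterized with the primitive type char. Please correct this.
  • This does not pass the smell test. Hashtable should be avoided in almost all cases in favor of HashMap. You cannot use primitive types for generics. destiny will almost certainly run into an IndexOutOfBoundsException. The question probably involves mapping one character to multiple characters.
  • Fixed per @Michael's comments. I wrote it from memory, so that he can see how to do it.
  • Also, avoid string concatenation in a loop (str = str + something). Use a StringBuilder instead.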