【Question title】: PHP Convert exotic characters to a-z A-Z 0-9 [duplicate]
【Posted】: 2011-06-27 08:50:03
【Question】:

Possible duplicate:
PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

I will be handling external sources that may return data such as:

Tōkyō à á â ã

Is there a way to convert the fancy characters to standard a-z A-Z, like this?

Tokyo a a a a

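If there are other characters that do not match any letter, they can be ignored.

Is a large regular expression with all the from/to values the only way, or is there a simpler approach?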

【Comments】:

  • @Pekka Thanks, I couldn't find that question. Voted.

Tags: php regex special-characters


【Answer 1】:

Something like this (taken from the Symphony CMS project) should get you started:

$transliterations = array(

    // Alphabetical

    '/À/' => 'A',       '/Á/' => 'A',       '/Â/' => 'A',       '/Ã/' => 'A',       '/Ä/' => 'Ae',
    '/Å/' => 'A',       '/Ā/' => 'A',       '/Ą/' => 'A',       '/Ă/' => 'A',       '/Æ/' => 'Ae',
    '/Ç/' => 'C',       '/Ć/' => 'C',       '/Č/' => 'C',       '/Ĉ/' => 'C',       '/Ċ/' => 'C',
    '/Ď/' => 'D',       '/Đ/' => 'D',       '/Ð/' => 'D',       '/È/' => 'E',       '/É/' => 'E',
    '/Ê/' => 'E',       '/Ë/' => 'E',       '/Ē/' => 'E',       '/Ę/' => 'E',       '/Ě/' => 'E',
    '/Ĕ/' => 'E',       '/Ė/' => 'E',       '/Ĝ/' => 'G',       '/Ğ/' => 'G',       '/Ġ/' => 'G',
    '/Ģ/' => 'G',       '/Ĥ/' => 'H',       '/Ħ/' => 'H',       '/Ì/' => 'I',       '/Í/' => 'I',
    '/Î/' => 'I',       '/Ï/' => 'I',       '/Ī/' => 'I',       '/Ĩ/' => 'I',       '/Ĭ/' => 'I',
    '/Į/' => 'I',       '/İ/' => 'I',       '/Ĳ/' => 'Ij',      '/Ĵ/' => 'J',       '/Ķ/' => 'K',
    '/Ł/' => 'L',       '/Ľ/' => 'L',       '/Ĺ/' => 'L',       '/Ļ/' => 'L',       '/Ŀ/' => 'L',
    '/Ñ/' => 'N',       '/Ń/' => 'N',       '/Ň/' => 'N',       '/Ņ/' => 'N',       '/Ŋ/' => 'N',
    '/Ò/' => 'O',       '/Ó/' => 'O',       '/Ô/' => 'O',       '/Õ/' => 'O',       '/Ö/' => 'Oe',
    '/Ø/' => 'O',       '/Ō/' => 'O',       '/Ő/' => 'O',       '/Ŏ/' => 'O',       '/Œ/' => 'Oe',
    '/Ŕ/' => 'R',       '/Ř/' => 'R',       '/Ŗ/' => 'R',       '/Ś/' => 'S',       '/Š/' => 'S',
    '/Ş/' => 'S',       '/Ŝ/' => 'S',       '/Ș/' => 'S',       '/Ť/' => 'T',       '/Ţ/' => 'T',
    '/Ŧ/' => 'T',       '/Ț/' => 'T',       '/Ù/' => 'U',       '/Ú/' => 'U',       '/Û/' => 'U',
    '/Ü/' => 'Ue',      '/Ū/' => 'U',       '/Ů/' => 'U',       '/Ű/' => 'U',       '/Ŭ/' => 'U',
    '/Ũ/' => 'U',       '/Ų/' => 'U',       '/Ŵ/' => 'W',       '/Ý/' => 'Y',       '/Ŷ/' => 'Y',
    '/Ÿ/' => 'Y',       '/Y/' => 'Y',       '/Ź/' => 'Z',       '/Ž/' => 'Z',       '/Ż/' => 'Z',
    '/Þ/' => 'T',
    '/à/' => 'a',       '/á/' => 'a',       '/â/' => 'a',       '/ã/' => 'a',       '/ä/' => 'ae',
    '/å/' => 'a',       '/ā/' => 'a',       '/ą/' => 'a',       '/ă/' => 'a',       '/æ/' => 'ae',
    '/ç/' => 'c',       '/ć/' => 'c',       '/č/' => 'c',       '/ĉ/' => 'c',       '/ċ/' => 'c',
    '/ď/' => 'd',       '/đ/' => 'd',       '/ð/' => 'd',       '/è/' => 'e',       '/é/' => 'e',
    '/ê/' => 'e',       '/ë/' => 'e',       '/ē/' => 'e',       '/ę/' => 'e',       '/ě/' => 'e',
    '/ĕ/' => 'e',       '/ė/' => 'e',       '/ĝ/' => 'g',       '/ğ/' => 'g',       '/ġ/' => 'g',
    '/ģ/' => 'g',       '/ĥ/' => 'h',       '/ħ/' => 'h',       '/ì/' => 'i',       '/í/' => 'i',
    '/î/' => 'i',       '/ï/' => 'i',       '/ī/' => 'i',       '/ĩ/' => 'i',       '/ĭ/' => 'i',
    '/į/' => 'i',       '/ı/' => 'i',       '/ĳ/' => 'ij',      '/ĵ/' => 'j',       '/ķ/' => 'k',
    '/ł/' => 'l',       '/ľ/' => 'l',       '/ĺ/' => 'l',       '/ļ/' => 'l',       '/ŀ/' => 'l',
    '/ñ/' => 'n',       '/ń/' => 'n',       '/ň/' => 'n',       '/ņ/' => 'n',       '/ŋ/' => 'n',
    '/ò/' => 'o',       '/ó/' => 'o',       '/ô/' => 'o',       '/õ/' => 'o',       '/ö/' => 'oe',
    '/ø/' => 'o',       '/ō/' => 'o',       '/ő/' => 'o',       '/ŏ/' => 'o',       '/œ/' => 'oe',
    '/ŕ/' => 'r',       '/ř/' => 'r',       '/ŗ/' => 'r',       '/ś/' => 's',       '/š/' => 's',
    '/ş/' => 's',       '/ŝ/' => 's',       '/ș/' => 's',       '/ť/' => 't',       '/ţ/' => 't',
    '/ŧ/' => 't',       '/ț/' => 't',       '/ù/' => 'u',       '/ú/' => 'u',       '/û/' => 'u',
    '/ü/' => 'ue',      '/ū/' => 'u',       '/ů/' => 'u',       '/ű/' => 'u',       '/ŭ/' => 'u',
    '/ũ/' => 'u',       '/ų/' => 'u',       '/ŵ/' => 'w',       '/ý/' => 'y',       '/ŷ/' => 'y',
    '/ÿ/' => 'y',       '/y/' => 'y',       '/ź/' => 'z',       '/ž/' => 'z',       '/ż/' => 'z',
    '/þ/' => 't',       '/ß/' => 'ss',      '/ſ/' => 'ss',      '/ƒ/' => 'f',       '/ĸ/' => 'k',
    '/ŉ/' => 'n',

    // Symbolic

    '/\(/' => null,     '/\)/' => null,     '/,/' => null,
    '/–/' => '-',       '/-/' => '-',       '/„/' => '"',
    '/“/' => '"',       '/”/' => '"',       '/—/' => '-',
    '/¿/' => null,      '/‽/' => null,      '/¡/' => null,

    // Ampersands

    '/©/' => 'c',
    '/^&(?!&)$/' => 'and',
    '/^&(?!&)/' => 'and-',
    '/&(?!&)$/' => '-and',
    '/&(?!&)/' => '-and-',

);
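
As a minimal sketch of how such a table might be applied: the array keys are already written as regex patterns, so they can be handed to preg_replace directly. The function name transliterate_to_ascii and the shortened table below are assumptions for illustration only, not part of the Symphony code; in practice the full array above would be passed in.

```php
<?php
// Hypothetical helper (the name is illustrative): applies a pattern => replacement
// table and then drops anything that is still outside a-z, A-Z, 0-9 and spaces.
function transliterate_to_ascii(string $input, array $transliterations): string
{
    // preg_replace walks the pattern array in order, one pattern after another.
    $ascii = preg_replace(
        array_keys($transliterations),
        array_values($transliterations),
        $input
    ) ?? $input;

    // Characters with no entry in the table are simply ignored (removed).
    return preg_replace('/[^A-Za-z0-9 ]/', '', $ascii) ?? '';
}

// Shortened excerpt of the table, just enough for the example from the question.
$transliterations = array(
    '/à/' => 'a', '/á/' => 'a', '/â/' => 'a', '/ã/' => 'a',
    '/ō/' => 'o', '/Ü/' => 'Ue', '/ß/' => 'ss',
);

echo transliterate_to_ascii('Tōkyō à á â ã', $transliterations), "\n"; // prints "Tokyo a a a a"
```

The final cleanup step implements the "other characters can be ignored" requirement from the question. Both the source file and the input are assumed to be UTF-8 encoded.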

You can also use iconv, but it is not flawless: for example, Ü is returned as "U when it should be returned as Ue.
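
For comparison, a rough sketch of the iconv route might look like the following. The function name iconv_to_ascii is hypothetical, and the exact transliterations depend on the iconv implementation and the current locale, so no particular output is claimed here. The cleanup step at the end removes leftovers such as the stray quote mentioned above.

```php
<?php
// Hypothetical helper (the name is illustrative): lets iconv approximate
// non-ASCII characters, then strips whatever is not a-z, A-Z, 0-9 or a space.
function iconv_to_ascii(string $input): string
{
    // //TRANSLIT asks iconv for the closest ASCII approximation;
    // //IGNORE asks it to skip characters it cannot convert at all.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $input);

    // iconv returns false on failure; fall back to the raw input in that case.
    if ($converted === false) {
        $converted = $input;
    }

    // Remove punctuation left behind by transliteration and any remaining
    // non-ASCII bytes.
    return preg_replace('/[^A-Za-z0-9 ]/', '', $converted) ?? '';
}

echo iconv_to_ascii('Tōkyō à á â ã Ü'), "\n";
```

Because iconv maps Ü to a plain U (plus punctuation on some systems) rather than Ue, the lookup table remains the better choice when language-specific spellings matter.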

【Comments】:
