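【Posted】: 2017-01-05 12:04:30
【Question】:
I have a problem. My code downloads some XML files and strips out a few tags I don't need. Up to that point everything worked fine: my XML files were UTF-8 and I had no problems.
But since I added code that replaces and changes the title values, my XML files are no longer valid UTF-8, and I get the following error message: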
"D:\Anwendung\PHP 7\php-win.exe" C:\Users\Jan\PhpstormProjects\censored\test.php
PHP Warning: DOMDocument::load(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xE3 0xA4 0x63 0x68 in file:/C:/Users/Jan/PhpstormProjects/censored/data/gamesplanet.xml, line: 1423 in C:\Users\Jan\PhpstormProjects\censored\test.php on line 18
PHP Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null in C:\Users\Jan\PhpstormProjects\censored\test.php:23
Stack trace:
#0 C:\Users\Jan\PhpstormProjects\censored\test.php(86): countAd('data/gamesplane...')
#1 {main}
thrown in C:\Users\Jan\PhpstormProjects\censored\test.php on line 23
Process finished with exit code 255
Line 1423 then contains: W㥣hter Von Mittelerde
If I don't run the code below, I get no error message, and line 1423 reads: Wächter von Mittelerde
Does anyone have an idea that could help me?
Code:
function loadTitles($tagName, $path){
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dom->load($path);
    $marker = $dom->getElementsByTagName($tagName);
    for ($i = $marker->length - 1; $i >= 0; $i--) {
        $word = $marker->item($i)->textContent;
        $escapedWord = escapWord($word);
        $escapedWord = modifyWord($escapedWord);
        $marker->item($i)->textContent = $escapedWord;
    }
    $dom->saveXML();
    $dom->save($path);
}
function escapWord($string){
    $replaceNothing = [":", ",", ";", "`", "#", "'", "´", "–", "!", "(", ")", ".", "@", "’", "+", "™"];
    $replaceSpace = ["-", "–", "_", "/", ":"];
    $delete = ["Steam", "Eu", "Key", "CD", "Gift", "Edition", "Pack", "Uplay", "Required", "Collection", "Origin", "HD", "Complete", "Digital", "Download", "EA", "Europa", "RPG", "Activated", "Access", "Code", "Limited", "Direct", "Bundle", "Special", "CDKEY", "GLOBAL", "EARLY", "ACCESS", "Card", "Cartel", "Player", "Trade", "DE", "GOG", "Multilanguage", "Multi", "Full", "Only", "UNCUT", "Cut", "Box", "Ps Vita", "VIP", "Rockstar", "Subscription"];
    $string = str_replace($replaceNothing, '', $string);
    $string = str_replace($replaceSpace, ' ', $string);
    $string = preg_replace('~\b(?:' . implode('|', $delete) . ')\b~i', '', $string);
    $string = str_replace("&", ' & ', $string);
    $string = strtolower($string);
    $string = ucwords($string);
    $string = preg_replace('/\bAsia\b/i', 'ASIA', $string);
    $string = preg_replace('/\buk\b/i', 'UK', $string);
    $string = preg_replace('/\bAU\b/i', 'AU', $string);
    $string = preg_replace('/\bXBOX\b/i', 'XBOX ', $string);
    $string = preg_replace('/\bpc\b/i', 'PC', $string);
    $string = preg_replace('/\bus\b/i', 'US', $string);
    $string = preg_replace('/\bru\b/i', 'RUS', $string);
    $string = preg_replace('/\bRUS\b/i', 'RUS', $string);
    $string = preg_replace('/\bPS4\b/i', 'PS4', $string);
    $string = preg_replace('/\bAddon\b/i', 'AddOn', $string);
    $string = preg_replace('/\bPlay Station 4\b/i', 'PS4', $string);
    $string = preg_replace('/\bPs4\b/i', 'PS4', $string);
    $string = preg_replace('/\bPs3\b/i', 'PS3', $string);
    $string = preg_replace('/\bPlayStation 4\b/i', 'PS4', $string);
    $string = preg_replace('/\bPlay Station 3\b/i', 'PS3', $string);
    $string = preg_replace('/\bPlayStation 3\b/i', 'PS3', $string);
    $string = preg_replace('/\bPlayStation Network\b/i', 'PSN', $string);
    $string = preg_replace('/\bPSN\b/i', 'PSN', $string);
    $string = preg_replace('/\bXX\b/i', 'XX', $string);
    $string = preg_replace('/\bXIX\b/i', 'XIX', $string);
    $string = preg_replace('/\bXVIII\b/i', 'XVIII', $string);
    $string = preg_replace('/\bXVII\b/i', 'XVII', $string);
    $string = preg_replace('/\bXVI\b/i', 'XVI', $string);
    $string = preg_replace('/\bXV\b/i', 'XV', $string);
    $string = preg_replace('/\bXIV\b/i', 'XIV', $string);
    $string = preg_replace('/\bXiii\b/i', 'XIII', $string);
    $string = preg_replace('/\bXii\b/i', 'XII', $string);
    $string = preg_replace('/\bXi\b/i', 'XI', $string);
    $string = preg_replace('/\bIX\b/i', 'IX', $string);
    $string = preg_replace('/\bVIII\b/i', 'VIII', $string);
    $string = preg_replace('/\bVII\b/i', 'VII', $string);
    $string = preg_replace('/\bVI\b/i', 'VI', $string);
    $string = preg_replace('/\bV\b/i', 'V', $string);
    $string = preg_replace('/\bIV\b/i', 'IV', $string);
    $string = preg_replace('/\bIII\b/i', 'III', $string);
    $string = preg_replace('/\bII\b/i', 'II', $string);
    $string = preg_replace('/\bdlc\b/i', 'DLC', $string);
    $string = trim(preg_replace('/\s\s+/', ' ', str_replace("\n", " ", $string)));
    return $string;
}
function modifyWord($string){
    if(strpos($string, "Counter Strike Offensive") !== false){
        $newstring = explode("Offensive", $string);
        $newstring[0] = $newstring[0] . "Global Offensive";
        $string = $newstring[0] . $newstring[1];
    }
    return $string;
}
Regards, and thanks!
【Comments】:
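- The problem is that you are using functions that are not multibyte-aware (str_replace, ucwords, strtolower, and preg_replace without the u modifier) on multibyte (UTF-8) strings. Use the mb_ functions instead, and use the u modifier with preg_replace.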
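To make the bytes in the warning (0xE3 0xA4 0x63 0x68) concrete, here is a minimal sketch of what a byte-wise case conversion does to a UTF-8 string. It assumes the mbstring extension is loaded. `mb_strtolower` with the `ISO-8859-1` encoding is used only as a stand-in for a byte-oriented, locale-dependent `strtolower`; that stand-in is an assumption for illustration, not something the thread confirms about the asker's environment.

```php
<?php
// "ä" is two bytes in UTF-8: 0xC3 0xA4.
$title = "W\xC3\xA4chter";

// Stand-in for a byte-wise lowercase: each byte is treated as one Latin-1
// character, so the lead byte 0xC3 ("Ã" in Latin-1) becomes 0xE3 ("ã").
$byteWise = mb_strtolower($title, 'ISO-8859-1');

// Multibyte-aware lowercase: the input is decoded as UTF-8 first.
$utf8Safe = mb_strtolower($title, 'UTF-8');

echo bin2hex($title), "\n";     // original bytes
echo bin2hex($byteWise), "\n";  // lead byte of "ä" has been altered
var_dump(mb_check_encoding($byteWise, 'UTF-8')); // byte-wise result is not valid UTF-8
var_dump(mb_check_encoding($utf8Safe, 'UTF-8')); // multibyte-aware result still is
```

The altered sequence starts with 0xE3, which announces a three-byte character, but the byte after 0xA4 is a plain ASCII "c", which would explain why the parser rejects the line.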
- Note that preg_replace can take arrays as its first and second arguments.
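As a sketch of that suggestion, a run of single-pattern calls can be collapsed into one `preg_replace` call with parallel arrays. The function name `normalizePlatforms` and the shortened pattern list are hypothetical, chosen for illustration rather than taken from the original code. The `u` modifier is added so that pattern and subject are treated as UTF-8.

```php
<?php
// Hypothetical helper covering a small subset of the patterns in the question.
function normalizePlatforms(string $title): string
{
    $patterns = [
        '/\bPlay ?Station 4\b/iu',
        '/\bPlay ?Station 3\b/iu',
        '/\bPlayStation Network\b/iu',
        '/\bdlc\b/iu',
        '/\bpc\b/iu',
    ];
    $replacements = ['PS4', 'PS3', 'PSN', 'DLC', 'PC'];

    // Patterns are applied in order, each on the result of the previous one.
    $result = preg_replace($patterns, $replacements, $title);

    // preg_replace returns null on error (for example, invalid UTF-8 with /u),
    // so fall back to the unchanged input in that case.
    return $result ?? $title;
}

// "\u{00E4}" is "ä"; the escape keeps the example independent of file encoding.
echo normalizePlatforms("W\u{00E4}chter Von Mittelerde Playstation 4 Dlc"), "\n";
```

Keeping patterns and replacements in two arrays of equal length makes the list easier to extend than a long sequence of individual calls, at the cost of having to keep the two arrays aligned.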
- Could you give me a code snippet showing how to do that? I don't know what "mb_ functions" means, or what the "u modifier" is.
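- 1) Replace strtolower with mb_strtolower, replace ucwords with a multibyte-aware equivalent such as mb_convert_case with MB_CASE_TITLE (PHP has no built-in mb_ucwords), and so on. 2) Add u to the end of the regex modifiers ("/something/iu").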
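Putting the two steps together, a minimal sketch of a UTF-8-safe version of the case-handling part of `escapWord` might look like this. It assumes the mbstring extension is loaded. `titleCaseUtf8` is a hypothetical name, and `mb_convert_case` with `MB_CASE_TITLE` stands in for `ucwords`.

```php
<?php
// Hypothetical replacement for the strtolower()/ucwords() pair in escapWord().
function titleCaseUtf8(string $title): string
{
    // Lowercase, then title-case, decoding the input as UTF-8 both times.
    $title = mb_strtolower($title, 'UTF-8');
    $title = mb_convert_case($title, MB_CASE_TITLE, 'UTF-8');

    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    // preg_replace returns null on error, so keep the previous value then.
    $title = preg_replace('/\bdlc\b/iu', 'DLC', $title) ?? $title;
    $title = preg_replace('/\s{2,}/u', ' ', $title) ?? $title;

    return trim($title);
}

// "\u{00C4}" is "Ä"; the escape keeps the example independent of file encoding.
$fixed = titleCaseUtf8("W\u{00C4}CHTER  von mittelerde dlc");
echo $fixed, "\n";

// Optional guard before writing the value back into the DOM.
var_dump(mb_check_encoding($fixed, 'UTF-8'));
```

The `str_replace` calls in the question are less of a concern than the case functions: `str_replace` matches whole byte sequences, so it should not split a character as long as both the subject and the search strings are valid UTF-8.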