Telling Chinese and English Characters Apart

Program 1. Counting the Chinese characters in a string

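import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountChinese {
	public static void main(String[] args) {
		int count = 0;
		// Matches a single character in the range U+4E00..U+9FA5
		String regEx = "[\\u4e00-\\u9fa5]";
		String str = "中文fdas ";

		Pattern p = Pattern.compile(regEx);
		Matcher m = p.matcher(str);
		// Every successful find() corresponds to exactly one Chinese character
		while (m.find()) {
			count++;
		}
		System.out.println("Total: " + count + " Chinese character(s)");
	}
}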

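Punctuation such as the comma, the enumeration comma (、), and the exclamation mark is not counted as a Chinese character here. Traditional characters are already matched, because simplified and traditional characters share the CJK Unified Ideographs block that starts at U+4E00. Widening the class to "[\\u4E00-\\u9FA5\\uFE30-\\uFFA0]" does not add traditional characters; it adds compatibility forms and halfwidth/fullwidth forms, for example the fullwidth comma (U+FF0C), so use it only when that punctuation should be counted too.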
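The note above can be checked with a small sketch. The class name HanCounter and the method countHan are illustrative and not part of the original program; the regular expression is the same one Program 1 uses. The string literals are written with Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HanCounter {
    // Same character class as Program 1: U+4E00..U+9FA5
    private static final Pattern HAN = Pattern.compile("[\\u4e00-\\u9fa5]");

    // Counts how many characters of s fall inside the range above
    static int countHan(String s) {
        Matcher m = HAN.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample string from Program 1: U+4E2D U+6587 followed by "fdas "
        System.out.println(countHan("\u4e2d\u6587fdas "));   // prints 2
        // Traditional characters U+7E41 U+9AD4 U+5B57
        System.out.println(countHan("\u7e41\u9ad4\u5b57"));  // prints 3
        // Fullwidth comma, ideographic comma, fullwidth exclamation mark
        System.out.println(countHan("\uff0c\u3001\uff01"));  // prints 0
    }
}
```

The second call shows that the plain range already covers traditional characters, and the third that punctuation is left out.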

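Program 2. Checking whether a string contains Chinese characters (or any of the assorted Chinese punctuation marks). The idea is that Chinese and English characters occupy a different number of bytes once encoded, so comparing the character count with the byte count reveals whether any Chinese characters are present. Strictly speaking, the check flags every character that the charset encodes in more than one byte, not only Chinese ones.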

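public class ContainsChinese {
	public static void main(String[] args) {
		String[] words = {"a", "aa", "中", "中国"};

		for (String w : words) {
			int charLen = w.length();
			// getBytes() without an argument uses the platform default charset
			int byteLen = w.getBytes().length;
			if (charLen == byteLen) {
				// Only single-byte (English) characters: the two lengths are equal
				System.out.println(w + ": char length " + charLen + ", byte length " + byteLen);
			} else {
				// At least one multi-byte character (a Chinese character or Chinese punctuation)
				System.out.println(w + ": char length " + charLen + ", byte length " + byteLen);
			}
		}
	}
}

On a machine whose default charset encodes each Chinese character in 2 bytes (for example GBK), the program prints:

a: char length 1, byte length 1
aa: char length 2, byte length 2
中: char length 1, byte length 2
中国: char length 2, byte length 4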
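
The length comparison reports a hit for any character that takes more than one byte, so under UTF-8, for instance, an accented Latin letter such as é is flagged as well. Where only Han characters should count, one possible alternative is to test each code point with Character.UnicodeScript, a standard JDK API that does not depend on any charset. This is a sketch of a different technique from the one Program 2 uses, and the names HanDetector and containsHan are illustrative.

```java
class HanDetector {
    // True if at least one code point of s belongs to the Han script
    static boolean containsHan(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("aa"));            // false
        System.out.println(containsHan("\u4e2d\u56fd"));  // true: U+4E2D U+56FD
        System.out.println(containsHan("caf\u00e9"));     // false: accented Latin letter
        System.out.println(containsHan("\uff0c"));        // false: fullwidth comma
    }
}
```

Unlike the byte-length check, this version treats Chinese punctuation as "not Han", so it answers a narrower question.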

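The byte length depends on the platform, or more precisely on the default charset that getBytes() falls back to. On the same machine, a single Chinese character takes 2 bytes under Windows (where the default is typically GBK) but 3 bytes under Ubuntu (UTF-8).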
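
One way to remove the platform dependence is to pass the charset explicitly instead of relying on the default. The sketch below uses StandardCharsets.UTF_8, which every Java runtime provides. GBK is looked up by name and guarded with Charset.isSupported, on the assumption that a given runtime may not ship it. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Byte length under a fixed charset, independent of the platform default
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // For well-formed strings, the lengths differ exactly when a non-ASCII character is present
    static boolean containsNonAscii(String s) {
        return s.length() != utf8Length(s);
    }

    public static void main(String[] args) {
        // Same inputs as Program 2; U+4E2D and U+56FD are written as escapes
        String[] words = {"a", "aa", "\u4e2d", "\u4e2d\u56fd"};
        for (String w : words) {
            StringBuilder line = new StringBuilder();
            line.append("chars=").append(w.length());
            line.append(" utf8=").append(utf8Length(w));
            if (Charset.isSupported("GBK")) {
                // Only reported when this runtime provides GBK
                line.append(" gbk=").append(w.getBytes(Charset.forName("GBK")).length);
            }
            line.append(" nonAscii=").append(containsNonAscii(w));
            System.out.println(line);
        }
    }
}
```

With the charset fixed to UTF-8, a single Chinese character is 3 bytes on Windows and Ubuntu alike, so the comparison gives the same answer everywhere.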