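【Question title】: What could cause String.charAt(0) to print nothing, and be of character type "16"?
【Posted】: 2012-02-12 19:28:50
【Question】:

Does anyone know what might be happening here?

The first block shows what I would normally expect to see: the first character of each string is at index '0'. The 'problem' string is commented out and replaced with one that looks exactly the same but has never been run before.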

public void finderTest(){
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    //String wordOne = "‭abc"; // old, pre-used string, used to hold a comma.
    String wordOne = "abc";// new, never run before with a comma
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: "+"'"+wordOne+"'");
    System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}

Output:

/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H

All of wordOne: 'abc'
Type of character at index '0' in wordOne: 2 // okay
Character at index '0' in wordOne: a // okay

Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/

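In the second block the 'new' string is commented out, and the first character of 'wordOne' prints as nothing. It is not a null character or a newline. I had been using that variable to find commas in 'theDoc', but whenever I run it, index '0' holds nothing and the comma sits at index 1. If I copy and paste the string, the problem persists. Commenting it out or deleting it, however, fixes the problem.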

public void finderTest(){
    String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
    String wordOne = "‭abc"; // now running old string, used to hold comma
    //String wordOne = "abc"; 
    String wordTwo = "and";
    System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
    System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
    System.out.println();
    System.out.println("All of wordOne: "+"'"+wordOne+"'");
    System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
    System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
    System.out.println();
    System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
    System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}

Output:

/*  
    Type of character at index '0' in theDoc: 1
    Character at index '0' in theDoc: H

    All of wordOne: '‭abc'
    Type of character at index '0' in wordOne: 16 // What does this mean?
    Character at index '0' in wordOne: ‭   // where is the a? (well, its in wordOne index '1'... but why??)

    Type of Character at index '0' in wordTwo: 2
    Character at index '0' in wordTwo: a
*/

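Can commas or symbols in Java cause this kind of problem? I have tried using character arrays and cleaning the workspace to rebuild everything, but nothing changes. This is a huge problem for finding the index of 'ngrams' in sentences when some of the grams are things like ', and'. At some point last night it was working, and then it suddenly stopped. I'm stumped.

Any ideas?

Thanks,

Andrew

【Question comments】:

    Tags: java string character comma gettype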


    【Answer 1】:

    I tried pasting your example into Eclipse, and it told me:

    Some characters cannot be mapped using "Cp1252" character encoding.

    and pointed at the first character in the string:

    String wordOne = "abc";

    There appears to be a hidden (non-printable) character between the " and the a.

    【Comments】:

      【Answer 2】:

      Character type 16 corresponds to Unicode DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING (U+202B). It is a non-printable character; you can print its hex value to confirm.
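
One way to print the hex value is sketched below, assuming Java 8 or later. The class name CodePointDump and its dump helper are hypothetical, and the sample literal uses U+202D, written as a Unicode escape so it stays visible in the source, as a stand-in for the question's wordOne; substitute the suspect string when checking real data.

```java
import java.util.StringJoiner;

final class CodePointDump {
    /** Returns every code point of s in U+XXXX form, separated by spaces. */
    static String dump(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the question's string: an invisible character before "abc".
        String wordOne = "\u202Dabc";
        System.out.println(dump(wordOne)); // U+202D U+0061 U+0062 U+0063
        System.out.println(dump("abc"));   // U+0061 U+0062 U+0063
    }
}
```

Dumping the whole string rather than only index 0 also shows where the visible characters actually start.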

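      【Comments】:

      • Ah, you're (almost) completely right. It turned out to be '202d'. Still, that solves the problem. Thanks, much appreciated.
      • @user1205526 - Ah, right. Character.getType() actually returns the general category, not the BiDi character type. (I hate that method name.) In this case general category 16 is FORMAT, which covers many characters, including U+202D (and U+202B).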
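
The distinction drawn in the comment above can be checked with a short sketch. The class name CategoryVersusBidi and the isFormat helper are hypothetical; Character.getType, Character.getDirectionality and the constants used are standard java.lang.Character members.

```java
final class CategoryVersusBidi {
    /** True when c belongs to the Unicode general category Cf (FORMAT). */
    static boolean isFormat(char c) {
        return Character.getType(c) == Character.FORMAT;
    }

    public static void main(String[] args) {
        char lro = '\u202D'; // LEFT-TO-RIGHT OVERRIDE, the character found in the question
        char rle = '\u202B'; // RIGHT-TO-LEFT EMBEDDING, the first guess in the answer

        // Both share the general category FORMAT, whose constant value is 16.
        System.out.println(Character.getType(lro));         // 16
        System.out.println(isFormat(lro) && isFormat(rle)); // true

        // The BiDi class is a separate property with its own constants.
        System.out.println(Character.getDirectionality(lro)
                == Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE);  // true
        System.out.println(Character.getDirectionality(rle)
                == Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING); // true
    }
}
```

DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING also happens to have the value 16, which is likely why the two properties were easy to mix up.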
      【Answer 3】:

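      Your string contains a character that you cannot see (before the 'a'). Unicode has dozens of characters with no meaningful visual representation; this is probably one of them.

      '16' is the character type, which is one of: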

      COMBINING_SPACING_MARK, CONNECTOR_PUNCTUATION, CONTROL, CURRENCY_SYMBOL, DASH_PUNCTUATION, DECIMAL_DIGIT_NUMBER, ENCLOSING_MARK, END_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, FORMAT, INITIAL_QUOTE_PUNCTUATION, LETTER_NUMBER, LINE_SEPARATOR, LOWERCASE_LETTER, MATH_SYMBOL, MODIFIER_LETTER, MODIFIER_SYMBOL, NON_SPACING_MARK, OTHER_LETTER, OTHER_NUMBER, OTHER_PUNCTUATION, OTHER_SYMBOL, PARAGRAPH_SEPARATOR, PRIVATE_USE, SPACE_SEPARATOR, START_PUNCTUATION, SURROGATE, TITLECASE_LETTER, UNASSIGNED, UPPERCASE_LETTER

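      All of these are defined in the Character class. I can't tell you which one it is, because that is, in theory, implementation-dependent; you should check the values. Or, better still, use Character.getName to look up a human-readable description of the character.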
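
A minimal sketch of the Character.getName suggestion, assuming Java 7 or later. The class name InvisibleCharCleaner and the stripFormatChars helper are hypothetical, and removing every character in general category Cf is only one possible cleanup, not something proposed in the thread; it would also drop characters such as zero-width joiners that some scripts rely on.

```java
final class InvisibleCharCleaner {
    /** Removes all characters in Unicode general category Cf (FORMAT). */
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        // Stand-in for the question's string, with the hidden character as an escape.
        String wordOne = "\u202Dabc";

        // Human-readable Unicode name of the first code point.
        System.out.println(Character.getName(wordOne.codePointAt(0))); // LEFT-TO-RIGHT OVERRIDE

        String cleaned = stripFormatChars(wordOne);
        System.out.println(cleaned.charAt(0)); // a
        System.out.println(cleaned.length());  // 3
    }
}
```

Applying the same cleanup to both the document and the search terms before an indexOf-style lookup should keep n-gram positions consistent.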

      【Comments】:
