【Question Title】: Transform a String to URL standard String in Java
【Posted】: 2010-09-17 08:23:40
【Question Description】:

I have a string such as:

Cerepedia, una apliación web

I want to convert it into a valid URL, for example:

Cerepedia,unaaplicacionweb

Note the conversion of special characters and the removal of spaces.

By the way, are commas allowed in URLs?

【Question Discussion】:

    Tags: java


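    【Solution 1】:

    Have you looked at URLEncoder? It seems to do what you need, although special characters are converted to percent-escaped sequences rather than stripped of their "special" attributes (accents and the like).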
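To make that behaviour concrete, here is a minimal sketch of what URLEncoder produces for the string in the question. The class name `UrlEncoderDemo` and the helper `encode` are illustrative and not part of the original answer; the accented letter is written as a Unicode escape so the result does not depend on the encoding of the source file.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncoderDemo {

    // Hypothetical helper: applies URLEncoder's form-encoding rules using UTF-8.
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00F3 is the accented 'o' from the question's example string.
        System.out.println(encode("Cerepedia, una apliaci\u00F3n web"));
        // prints Cerepedia%2C+una+apliaci%C3%B3n+web
    }
}
```

The comma and the accented letter are escaped and each space becomes a `+`, so the result is safe to embed in a query string but is not the accent-free form the question asks for.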

    【Discussion】:

      【Solution 2】:

      Try convertNonAscii() in the class below:

      public class AsciiUtils {
      
          /**
           * Contains a list of all the characters that map one to one for UNICODE.
           */
          private static final String PLAIN_ASCII = 
                    "AaEeIiOoUu"    // grave
                  + "AaEeIiOoUuYy"  // acute
                  + "AaEeIiOoUuYy"  // circumflex
                  + "AaEeIiOoUuYy"  // tilde
                  + "AaEeIiOoUuYy"  // umlaut
                  + "Aa"            // ring
                  + "Cc"            // cedilla
                  + "Nn"            // n tilde (spanish)
                  ;
      
          /**
           * Actual accented values, corresponds one to one with ASCII
           */
          private static final String UNICODE =
               "\u00C0\u00E0\u00C8\u00E8\u00CC\u00EC\u00D2\u00F2\u00D9\u00F9"             
              +"\u00C1\u00E1\u00C9\u00E9\u00CD\u00ED\u00D3\u00F3\u00DA\u00FA\u00DD\u00FD" 
              +"\u00C2\u00E2\u00CA\u00EA\u00CE\u00EE\u00D4\u00F4\u00DB\u00FB\u0176\u0177" 
              +"\u00C3\u00E3\u1EBC\u1EBD\u0128\u0129\u00D5\u00F5\u0168\u0169\u1EF8\u1EF9" 
              +"\u00C4\u00E4\u00CB\u00EB\u00CF\u00EF\u00D6\u00F6\u00DC\u00FC\u0178\u00FF" 
              +"\u00C5\u00E5"                                                             
              +"\u00C7\u00E7"  
              +"\u00D1\u00F1"
           ;
      
          // private constructor: this utility class cannot be instantiated
          private AsciiUtils() {      
          }
      
      
          /**
           * Removes accents from a string, replacing each accented character with its ASCII equivalent.
           * @param s The string to convert
           * @return The string with accented characters replaced by plain ASCII ones.
           */
          public static String convertNonAscii(String s) {
      
              StringBuilder b = new StringBuilder();
      
              int n = s.length();
              for (int i = 0; i < n; i++) {
                  char c = s.charAt(i);
                  int pos = UNICODE.indexOf(c);
                  if (pos > -1) {
                    b.append(PLAIN_ASCII.charAt(pos));
                  } else {
                    b.append(c);
                  }
              }
      
             return b.toString();
      
          }
      
      }
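As an alternative to maintaining a lookup table by hand, the same idea can be sketched with `java.text.Normalizer` from the standard library. This is a swapped-in technique, not the answer's method: the string is decomposed (NFD) so that each accent becomes a separate combining mark, and those marks are then removed. The class and method names below are hypothetical.

```java
import java.text.Normalizer;

public class AccentStripper {

    // Hypothetical alternative to convertNonAscii: decompose, then drop combining marks.
    public static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches the combining marks that NFD splits off from base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("apliaci\u00F3n")); // prints apliacion
    }
}
```

Letters that have no canonical decomposition (for example `ø` or `ß`) pass through unchanged, so a small table may still be needed for those.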
      

      【Discussion】:

      • It works if the string is in the source code, but it does not work if the string is read from a UTF-8 encoded file.
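The comment gives no details, so the following is only a guess at the cause: if the file's bytes are decoded with a charset other than UTF-8, each accented letter turns into two unrelated characters, and a table lookup can no longer find it. The sketch below (hypothetical class and method names) shows the difference on the two bytes that encode `ó`.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Decodes raw bytes as UTF-8; for a file, the same charset would be given to the reader.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two bytes that encode the accented 'o' in UTF-8.
        byte[] raw = {(byte) 0xC3, (byte) 0xB3};

        String right = decodeUtf8(raw);
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // prints 1
        System.out.println(wrong.length()); // prints 2
    }
}
```

If this is indeed the problem, reading the file with an explicit charset, for example `Files.newBufferedReader(path, StandardCharsets.UTF_8)`, should make the lookup behave the same as for string literals.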
      【Solution 3】:

      URLEncoder replaces spaces with +. The AsciiUtils class posted by Don does not remove spaces, but the following function can be used for that purpose:

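      // requires: import java.util.StringTokenizer;
      public static String removeSpaces(String s) {
          // Split on spaces and glue the tokens back together without them.
          StringTokenizer st = new StringTokenizer(s, " ", false);
          StringBuilder t = new StringBuilder();
          while (st.hasMoreTokens()) {
              t.append(st.nextToken());
          }
          return t.toString();
      }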
      

      【Discussion】:

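        【Solution 4】:

        Note that Don's solution works for strings in the source code, but not for strings read from a UTF-8 encoded file.

        This is my best solution, using URLEncoder and then replacing the percent-encoded (hex) sequences: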

        String s = "Cerepedia, una apliación web";
        String ENCODING = "utf-8";
        String encoded_s = URLEncoder.encode(s, ENCODING); // author's reported output: Cerepedia+una+aplicaci%C3%83%C2%B3n+web
        String s_hexa_free = EncodingTableUtils.convertHexaDecimal(encoded_s, ENCODING); // author's reported output: Cerepedia+una+aplicacion+web
        

        EncodingTableUtils class

        import java.util.HashMap;
        import java.util.Iterator;
        import java.util.Set;
        
        public class EncodingTableUtils {
            public final static HashMap iso88591 = new HashMap();
            static {
                iso88591.put("%C3%A1", "a"); // á
                iso88591.put("%C3%81", "A"); // Á
                iso88591.put("%C3%A9", "e"); // é
                iso88591.put("%C3%89", "E"); // É
                iso88591.put("%C3%AD", "i"); // í
                iso88591.put("%C3%8D", "I"); // Í
                iso88591.put("%C3%93", "O"); // Ó
                iso88591.put("%C3%B3", "o"); // ó
                iso88591.put("%C3%BA", "u"); // ú
                iso88591.put("%C3%9A", "U"); // Ú
                iso88591.put("%C3%91", "N"); // Ñ
                iso88591.put("%C3%B1", "n"); // ñ
            }
            public final static HashMap utf8 = new HashMap();
            static {
                // These keys appear to be the double-encoded forms produced when UTF-8
                // bytes are misread as windows-1252 and then URL-encoded as UTF-8.
                utf8.put("%C3%83%C2%A1", "a"); // á
                utf8.put("%C3%83%EF%BF%BD", "A"); // Á
                utf8.put("%C3%83%C2%A9", "e"); // é
                utf8.put("%C3%83%E2%80%B0", "E"); // É
                utf8.put("%C3%83%C2%AD", "i"); // í
                utf8.put("%C3%83%EF%BF%BD", "I"); // Í (same key as Á, so this put wins)
                utf8.put("%C3%83%E2%80%9C", "O"); // Ó
                utf8.put("%C3%83%C2%B3", "o"); // ó
                utf8.put("%C3%83%C2%BA", "u"); // ú
                utf8.put("%C3%83%C5%A1", "U"); // Ú
                utf8.put("%C3%83%E2%80%98", "N"); // Ñ
                utf8.put("%C3%83%C2%B1", "n"); // ñ
            }
        
            public final static HashMap enc_table = new HashMap();
            static {
                enc_table.put("iso-8859-1", iso88591);
                enc_table.put("utf-8", utf8);
            }
        
        
            /**
             * Replaces percent-encoded (hexadecimal) sequences with their plain, unaccented equivalents.
             * <p>Example: á, encoded as %C3%A1, gets replaced with a</p>
             * @param s Usually a string coming from URLEncoder.encode
             * @param enc Encoding: UTF-8 or ISO-8859-1
             */
            public static String convertHexaDecimal(String s, String enc) {
                HashMap characters = (HashMap) enc_table.get(enc.toLowerCase());
                if(characters==null) return "";
                Set keys = characters.keySet();
                Iterator it = keys.iterator();
                while(it.hasNext()) {
                    String key = (String) it.next();
                    String regex = EscapeChars.forRegex(key);
                    String replacement = (String) characters.get(key); 
                    s = s.replaceAll(regex, replacement);           
                }
                return s;
            }
        }
        

        EscapeChars class

        import java.text.CharacterIterator;
        import java.text.StringCharacterIterator;

        public final class EscapeChars {
        /**
          * Replace characters having special meaning in regular expressions
          * with their escaped equivalents, preceded by a '\' character.
          *
          * <P>The escaped characters include :
          *<ul>
          *<li>.
          *<li>\
          *<li>?, * , and +
          *<li>&
          *<li>:
          *<li>{ and }
          *<li>[ and ]
          *<li>( and )
          *<li>^ and $
          *</ul>
          */
          public static String forRegex(String aRegexFragment){
            final StringBuilder result = new StringBuilder();
        
            final StringCharacterIterator iterator = new StringCharacterIterator(aRegexFragment);
            char character =  iterator.current();
            while (character != CharacterIterator.DONE ){
              /*
              * All literals need to have backslashes doubled.
              */
              if (character == '.') {
                result.append("\\.");
              }
              else if (character == '\\') {
                result.append("\\\\");
              }
              else if (character == '?') {
                result.append("\\?");
              }
              else if (character == '*') {
                result.append("\\*");
              }
              else if (character == '+') {
                result.append("\\+");
              }
              else if (character == '&') {
                result.append("\\&");
              }
              else if (character == ':') {
                result.append("\\:");
              }
              else if (character == '{') {
                result.append("\\{");
              }
              else if (character == '}') {
                result.append("\\}");
              }
              else if (character == '[') {
                result.append("\\[");
              }
              else if (character == ']') {
                result.append("\\]");
              }
              else if (character == '(') {
                result.append("\\(");
              }
              else if (character == ')') {
                result.append("\\)");
              }
              else if (character == '^') {
                result.append("\\^");
              }
              else if (character == '$') {
                result.append("\\$");
              }
              else {
                //the char is not a special one
                //add it to the result as is
                result.append(character);
              }
              character = iterator.next();
            }
            return result.toString();
          }
        }
        

        【Discussion】:

          【Solution 5】:

          Try this code:

           public class Test {
          
              public static void main(final String[] args) {
                  String str = "Cerepedia, una apliación web";
                  String[] parts = str.split(" ");
                  int sum=0;
                  for (int i=0;i<=parts.length-1;i++) {
                      sum = sum+parts[i].length();
                  }
          
                  int k=0;
                  char[] url = new char[sum]; // sized from the computed total instead of a hard-coded 25
                  for (int i=0;i<=parts.length-1;i++) {
                       char[] temp = parts[i].toCharArray();
          
          
                       for(int j=0;j<temp.length;j++){
          
                           url[k]=temp[j];
                           k++;
                       }
          
                  }
                  System.out.println(url);
          
              }
          }
          

          【Discussion】:

          • This prints Cerepedia,unaapliaciónweb, which is not what was asked for.
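For comparison, here is a minimal sketch that performs both steps the question asks for, accent removal and whitespace removal. It is not taken from any of the answers: it uses `java.text.Normalizer` from the standard library, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class SlugDemo {

    // Hypothetical helper: strips accents, then removes all whitespace.
    public static String toUrlSafe(String s) {
        // NFD separates each accent from its base letter; \p{M} then removes the accents.
        String noAccents = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // \s+ also covers tabs and newlines, not just the space character.
        return noAccents.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(toUrlSafe("Cerepedia, una apliaci\u00F3n web"));
        // prints Cerepedia,unaapliacionweb
    }
}
```

The output keeps the spelling of the input, so it differs from the string in the question by the missing "c" in "apliación". The comma is left in place: RFC 3986 lists it among the sub-delimiters that may appear in a path segment, although it still needs escaping where a particular scheme or application gives it special meaning.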