【Question Title】: Large string split into lines with maximum length in Java
【Posted】: 2011-09-23 11:08:48
【Question】:
String input = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";

//text copied from http://www.nationalgeographic.com/community/terms/

I want to split this large string into lines, where no line contains more than MAX_LINE_LENGTH characters.

What I have tried so far:

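int MAX_LINE_LENGTH = 20;
// Split after every MAX_LINE_LENGTH characters; the constant has to be concatenated into the pattern.
System.out.print(Arrays.toString(input.split("(?<=\\G.{" + MAX_LINE_LENGTH + "})")));
// maximum line length: 20 characters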

Output:

[THESE TERMS AND COND, ITIONS OF SERVICE (t, he Terms) ARE A LEGA, L AND B ...

This breaks words, which I don't want. Instead, I would like output like this:

[THESE TERMS AND , CONDITIONS OF , SERVICE (the Terms) , ARE A LEGAL AND B ...

One more condition: if a word is longer than MAX_LINE_LENGTH, that word should be split.

The solution should also work without the help of external JARs.

【Comments】:

Tags: java regex string split word-wrap


【Solution 1】:

Just iterate through the string word by word and break whenever a word would exceed the limit.

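// requires: import java.util.StringTokenizer;
public String addLinebreaks(String input, int maxLineLength) {
    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // Start a new line when the word plus its separating space would not fit.
        if (lineLen > 0 && lineLen + 1 + word.length() > maxLineLength) {
            output.append("\n");
            lineLen = 0;
        }
        // Put back the space the tokenizer consumed, except at the start of a line.
        if (lineLen > 0) {
            output.append(' ');
            lineLen++;
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}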

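I just typed that freehand; you may need to nudge it a bit to get it to compile.

Known limitation: a word in the input that is longer than maxLineLength is not split; it ends up on a line of its own that exceeds the limit. I assume your line length is something like 80 or 120 characters, in which case this is unlikely to be a problem.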
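The long-word case described above can be covered by hard-splitting any word that does not fit on a line by itself. The following is a minimal sketch rather than the answer author's code: the class name `LineWrapper` and the method `wrap` are made up for illustration, and it assumes that words are separated by whitespace and that `maxLineLength` is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

public class LineWrapper {

    /** Greedy word wrap; words longer than maxLineLength are cut into chunks. */
    public static List<String> wrap(String input, int maxLineLength) {
        if (maxLineLength < 1) {
            throw new IllegalArgumentException("maxLineLength must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            // Flush the current line if the word plus a separating space does not fit.
            if (line.length() > 0 && line.length() + 1 + word.length() > maxLineLength) {
                lines.add(line.toString());
                line.setLength(0);
            }
            // Hard-split a word that is too long even for an empty line.
            while (word.length() > maxLineLength) {
                lines.add(word.substring(0, maxLineLength));
                word = word.substring(maxLineLength);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        wrap("THESE TERMS AND CONDITIONS OF SERVICE (the Terms)", 20)
                .forEach(System.out::println);
    }
}
```

Returning a `List<String>` instead of one string with embedded newlines avoids a second split when the caller wants the lines individually, which is what the question's expected output suggests.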

【Comments】:

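  • No, we need to fix this bug as well. My max_line_length is 30, and my lines may contain file names that can also exceed 30 characters. In that case we need to break the word.
  • I just confirmed that file names won't exceed 15 characters. So cheers, friend!!! \m/
  • I changed just one part of your code: String word = tok.nextToken()+" ";
  • Great solution, works like a charm. Can the text be centered?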
【Solution 2】:

Best option: use Apache Commons Lang:

org.apache.commons.lang.WordUtils

/**
 * <p>Wraps a single line of text, identifying words by <code>' '</code>.</p>
 * 
 * <p>New lines will be separated by the system property line separator.
 * Very long words, such as URLs will <i>not</i> be wrapped.</p>
 * 
 * <p>Leading spaces on a new line are stripped.
 * Trailing spaces are not stripped.</p>
 *
 * <pre>
 * WordUtils.wrap(null, *) = null
 * WordUtils.wrap("", *) = ""
 * </pre>
 *
 * @param str  the String to be word wrapped, may be null
 * @param wrapLength  the column to wrap the words at, less than 1 is treated as 1
 * @return a line with newlines inserted, <code>null</code> if null input
 */
public static String wrap(String str, int wrapLength) {
    return wrap(str, wrapLength, null, false);
}

【Comments】:

    【Solution 3】:

    You can use the WordUtils.wrap method from Apache Commons Lang:

    import java.util.Arrays;
    import org.apache.commons.lang3.text.WordUtils;

    public class Test3 {

        public static void main(String[] args) {
            String text = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
            // Wrap at 20 columns; the wrapped lines are joined with the system line separator.
            String wrapped = WordUtils.wrap(text, 20);
            String[] lines = wrapped.split(System.lineSeparator());
            System.out.println(Arrays.toString(lines));
        }
    }
    

    Output:

       [THESE TERMS AND, CONDITIONS OF, SERVICE (the Terms), ARE A LEGAL AND, BINDING AGREEMENT, BETWEEN YOU AND, NATIONAL GEOGRAPHIC, governing your use, of this site,, www.nationalgeographic.com,, which includes but, is not limited to, products, software, and services offered, by way of the, website such as the, Video Player,, Uploader, and other, applications that, link to these Terms, (the Site). Please, review the Terms, fully before you, continue to use the, Site. By using the, Site, you agree to, be bound by the, Terms. You shall, also be subject to, any additional terms, posted with respect, to individual, sections of the, Site. Please review, our Privacy Policy,, which also governs, your use of the, Site, to understand, our practices. If, you do not agree,, please discontinue, using the Site., National Geographic, reserves the right, to change the Terms, at any time without, prior notice. Your, continued access or, use of the Site, after such changes, indicates your, acceptance of the, Terms as modified., It is your, responsibility to, review the Terms, regularly. The Terms, were last updated on, 18 July 2011.]
    

    【Comments】:

      【Solution 4】:

      Thanks to Barend Garvelink for his answer. I have modified the code above to fix the bug "if a word in the input is longer than maxCharInLine".

      // requires: import java.util.StringTokenizer;
      public String[] splitIntoLine(String input, int maxCharInLine){

          StringTokenizer tok = new StringTokenizer(input, " ");
          StringBuilder output = new StringBuilder(input.length());
          int lineLen = 0;
          while (tok.hasMoreTokens()) {
              String word = tok.nextToken();

              // Break up a word that is longer than a whole line.
              while (word.length() > maxCharInLine) {
                  // lineLen counts the trailing space, so the remaining room can be negative; clamp it to 0.
                  int room = Math.max(0, maxCharInLine - lineLen);
                  output.append(word, 0, room).append("\n");
                  word = word.substring(room);
                  lineLen = 0;
              }

              if (lineLen + word.length() > maxCharInLine) {
                  output.append("\n");
                  lineLen = 0;
              }
              output.append(word).append(" ");

              lineLen += word.length() + 1;
          }
          return output.toString().split("\n");
      }
      

      【Comments】:

      • You should use output.append(word).append(" ");
      【Solution 5】:

      Starting from @Barend's suggestion, here is my final version, slightly modified:

      private static final char NEWLINE = '\n';
      private static final String SPACE_SEPARATOR = " ";
      //if text has \n, \r or \t symbols it's better to split by \s+
      private static final String SPLIT_REGEXP= "\\s+";
      
      public static String breakLines(String input, int maxLineLength) {
          String[] tokens = input.split(SPLIT_REGEXP);
          StringBuilder output = new StringBuilder(input.length());
          int lineLen = 0;
          for (int i = 0; i < tokens.length; i++) {
              String word = tokens[i];
      
              if (lineLen + (SPACE_SEPARATOR + word).length() > maxLineLength) {
                  if (i > 0) {
                      output.append(NEWLINE);
                  }
                  lineLen = 0;
              }
              if (i < tokens.length - 1 && (lineLen + (word + SPACE_SEPARATOR).length() + tokens[i + 1].length() <=
                      maxLineLength)) {
                  word += SPACE_SEPARATOR;
              }
              output.append(word);
              lineLen += word.length();
          }
          return output.toString();
      }
      
      System.out.println(breakLines("THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A     LEGAL AND BINDING " +
                      "AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing     your use of this site, " +
                  "www.nationalgeographic.com, which includes but is not limited to products, " +
                  "software and services offered by way of the website such as the Video Player.", 20));
      

      Output:

      THESE TERMS AND
      CONDITIONS OF
      SERVICE (the Terms)
      ARE A LEGAL AND
      BINDING AGREEMENT
      BETWEEN YOU AND
      NATIONAL GEOGRAPHIC
      governing your use
      of this site,
      www.nationalgeographic.com,
      which includes but
      is not limited to
      products, software
      and services 
      offered by way of
      the website such as
      the Video Player.
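The comment about `\\s+` in the code above matters when the input contains runs of spaces, tabs, or line breaks: splitting on a single space produces empty tokens and leaves tabs and line breaks inside "words". A small sketch of the difference follows; the class `SplitDemo` and its two helper methods are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

public class SplitDemo {

    /** Splits on every single space; extra spaces produce empty tokens. */
    public static String[] splitOnSpace(String text) {
        return text.split(" ");
    }

    /** Splits on runs of any whitespace; trimming first avoids a leading empty token. */
    public static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "ARE A     LEGAL\tAND\nBINDING";

        String[] bySpace = splitOnSpace(text);
        System.out.println(bySpace.length + " tokens: " + Arrays.toString(bySpace));

        String[] byWhitespace = splitOnWhitespace(text);
        System.out.println(byWhitespace.length + " tokens: " + Arrays.toString(byWhitespace));
    }
}
```

Note that `split("\\s+")` on its own still returns an empty first token when the text starts with whitespace, which is why the helper trims the input first.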
      

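      【Comments】:

        【Solution 6】:

        As of Java 8, you can also use Streams to tackle this kind of problem.

        Below you can find a complete example that performs a reduction using the .collect() method.

        I think this one should be shorter than the other non-third-party solutions.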

        private static String multiLine(String longString, String splitter, int maxLength) {
            return Arrays.stream(longString.split(splitter))
                    .collect(
                        ArrayList<String>::new,     
                        (l, s) -> {
                            Function<ArrayList<String>, Integer> id = list -> list.size() - 1;
                            if(l.size() == 0 || (l.get(id.apply(l)).length() != 0 && l.get(id.apply(l)).length() + s.length() >= maxLength)) l.add("");
                            l.set(id.apply(l), l.get(id.apply(l)) + (l.get(id.apply(l)).length() == 0 ? "" : splitter) + s);
                        },
                        (l1, l2) -> l1.addAll(l2))
                    .stream().reduce((s1, s2) -> s1 + "\n" + s2).get();
        }
        
        public static void main(String[] args) {
            String longString = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
            String SPLITTER = " ";
            int MAX_LENGTH = 20;
            System.out.println(multiLine(longString, SPLITTER, MAX_LENGTH));
        }
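To make the reduction easier to follow, the accumulator passed to `.collect()` can be read as the loop below: it keeps a list of lines and either extends the last line or starts a new one. This is a sketch of that reading, not a replacement for the answer. `GreedyJoin` and `multiLineLoop` are hypothetical names, and the sketch assumes the splitter is meant literally (the original passes it to `String.split`, which treats it as a regular expression). Like the stream version, it does not split words that are longer than `maxLength`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class GreedyJoin {

    public static String multiLineLoop(String longString, String splitter, int maxLength) {
        List<String> lines = new ArrayList<>();
        // Pattern.quote treats the splitter as literal text (an assumption of this sketch).
        for (String s : longString.split(Pattern.quote(splitter))) {
            int last = lines.size() - 1;
            if (lines.isEmpty()) {
                // First token starts the first line.
                lines.add(s);
            } else if (lines.get(last).isEmpty()) {
                // An empty last line is simply filled, without a separator.
                lines.set(last, s);
            } else if (lines.get(last).length() + splitter.length() + s.length() > maxLength) {
                // Token does not fit: start a new line with it.
                lines.add(s);
            } else {
                // Token fits: extend the last line.
                lines.set(last, lines.get(last) + splitter + s);
            }
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(multiLineLoop(
                "THESE TERMS AND CONDITIONS OF SERVICE (the Terms)", " ", 20));
    }
}
```

The fit test here adds the splitter's length explicitly; for a one-character splitter this coincides with the `>= maxLength` comparison in the stream version.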
        

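        【Comments】:

          【Solution 7】:

          I recently wrote a few methods to do this. If one of the lines contains no whitespace character, they opt for splitting on other non-alphanumeric characters before resorting to a mid-word split.

          Here is how it turned out for me:

          (This uses the lastIndexOfRegex() methods I posted elsewhere; they are included below.)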

          /**
           * Indicates that a String search operation yielded no results.
           */
          public static final int NOT_FOUND = -1;
          
          
          
          /**
           * Version of lastIndexOf that uses regular expressions for searching.
           * By Tomer Godinger.
           * 
           * @param str String in which to search for the pattern.
           * @param toFind Pattern to locate.
           * @return The index of the requested pattern, if found; NOT_FOUND (-1) otherwise.
           */
          public static int lastIndexOfRegex(String str, String toFind)
          {
              Pattern pattern = Pattern.compile(toFind);
              Matcher matcher = pattern.matcher(str);
          
              // Default to the NOT_FOUND constant
              int lastIndex = NOT_FOUND;
          
              // Search for the given pattern
              while (matcher.find())
              {
                  lastIndex = matcher.start();
              }
          
              return lastIndex;
          }
          
          /**
           * Finds the last index of the given regular expression pattern in the given string,
           * starting from the given index (and conceptually going backwards).
           * By Tomer Godinger.
           * 
           * @param str String in which to search for the pattern.
           * @param toFind Pattern to locate.
           * @param fromIndex Maximum allowed index.
           * @return The index of the requested pattern, if found; NOT_FOUND (-1) otherwise.
           */
          public static int lastIndexOfRegex(String str, String toFind, int fromIndex)
          {
              // Limit the search by searching on a suitable substring
              return lastIndexOfRegex(str.substring(0, fromIndex), toFind);
          }
          
          /**
           * Breaks the given string into lines as best possible, each of which no longer than
           * <code>maxLength</code> characters.
           * By Tomer Godinger.
           * 
           * @param str The string to break into lines.
           * @param maxLength Maximum length of each line.
           * @param newLineString The string to use for line breaking.
           * @return The resulting multi-line string.
           */
          public static String breakStringToLines(String str, int maxLength, String newLineString)
          {
              StringBuilder result = new StringBuilder();
              while (str.length() > maxLength)
              {
                  // Attempt to break on whitespace first,
                  int breakingIndex = lastIndexOfRegex(str, "\\s", maxLength);
          
                  // Then on other non-alphanumeric characters,
                  if (breakingIndex == NOT_FOUND) breakingIndex = lastIndexOfRegex(str, "[^a-zA-Z0-9]", maxLength);
          
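                  // And if all else fails, break in the middle of the word.
                  // maxLength - 1 keeps the line at exactly maxLength characters, since breakingIndex is inclusive.
                  if (breakingIndex == NOT_FOUND) breakingIndex = maxLength - 1;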
          
                  // Append each prepared line to the builder
                  result.append(str.substring(0, breakingIndex + 1));
                  result.append(newLineString);
          
                  // And start the next line
                  str = str.substring(breakingIndex + 1);
              }
          
              // Check if there are any residual characters left
              if (str.length() > 0)
              {
                  result.append(str);
              }
          
              // Return the resulting string
              return result.toString();
          }
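The order of preference used above (whitespace first, then any other non-alphanumeric character, then a mid-word cut) can be seen in isolation in the following sketch. It is a condensed illustration under stated assumptions, not the answer's code: `BreakPriorityDemo`, `lastMatch`, and `breakIndex` are hypothetical names, and the mid-word fallback uses `maxLength - 1` so that the resulting line never exceeds `maxLength` characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BreakPriorityDemo {

    private static final Pattern WHITESPACE = Pattern.compile("\\s");
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    /** Index of the last match of pattern within the first limit characters, or -1. */
    public static int lastMatch(Pattern pattern, String str, int limit) {
        Matcher m = pattern.matcher(str).region(0, Math.min(limit, str.length()));
        int last = -1;
        while (m.find()) {
            last = m.start();
        }
        return last;
    }

    /** Index of the last character that stays on the current line. */
    public static int breakIndex(String str, int maxLength) {
        int idx = lastMatch(WHITESPACE, str, maxLength);
        if (idx == -1) {
            idx = lastMatch(NON_ALNUM, str, maxLength);
        }
        if (idx == -1) {
            idx = maxLength - 1; // no better place: cut mid-word
        }
        return idx;
    }

    public static void main(String[] args) {
        System.out.println(breakIndex("hello world", 8));    // breaks at the space
        System.out.println(breakIndex("/usr/local/bin", 8)); // breaks at the second slash
        System.out.println(breakIndex("abcdefghijkl", 8));   // mid-word cut
    }
}
```

Using `Matcher.region` restricts the search to the first `maxLength` characters without creating a substring for every line.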
          

          【Comments】:

            【Solution 8】:

            My version (the previous ones did not work for me):

            public static List<String> breakSentenceSmart(String text, int maxWidth) {
            
                StringTokenizer stringTokenizer = new StringTokenizer(text, " ");
                List<String> lines = new ArrayList<String>();
                StringBuilder currLine = new StringBuilder();
                while (stringTokenizer.hasMoreTokens()) {
                    String word = stringTokenizer.nextToken();
            
                    boolean wordPut=false;
                    while (!wordPut) {
                        if(currLine.length()+word.length()==maxWidth) { //exactly fits -> dont add the space
                            currLine.append(word);
                            wordPut=true;
                        }
                        else if(currLine.length()+word.length()<=maxWidth) { //whole word can be put
                            if(stringTokenizer.hasMoreTokens()) {
                                currLine.append(word + " ");
                            }else{
                                currLine.append(word);
                            }
                            wordPut=true;
                        }else{
                            if(word.length()>maxWidth) {
                                int lineLengthLeft = maxWidth - currLine.length();
                                String firstWordPart = word.substring(0, lineLengthLeft);
                                currLine.append(firstWordPart);
                                //lines.add(currLine.toString());
                                word = word.substring(lineLengthLeft);
                                //currLine = new StringBuilder();
                            }
                            lines.add(currLine.toString());
                            currLine = new StringBuilder();
                        }
            
                    }
                    //
                }
                if(currLine.length()>0) { //add whats left
                    lines.add(currLine.toString());
                }
                return lines;
            }
            

            【Comments】:
