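【Question title】: How to split string preserving whole words?
【Posted】: 2011-05-22 20:19:45
【Question】:

I need to split a long sentence into parts while preserving whole words. Each part should have at most a given number of characters (including spaces, dots, etc.). For example: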

int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

Output:

1 part: "Silver badges are awarded for"
2 part: "longer term goals. Silver badges are"
3 part: "uncommon."

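【Comments】:

  • Are you trying to implement a word-wrap algorithm?
  • By the way, your example is wrong :).... Part 2 should not contain the "are", as my solution shows.
  • Step 1: split by the given length. Step 2: apply a condition and check the words.

Tags: c# .net console formatting string-concatenation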


【Solution 1】:

Try this:

    static void Main(string[] args)
    {
        int partLength = 35;
        string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
        string[] words = sentence.Split(' ');
        var parts = new Dictionary<int, string>();
        string part = string.Empty;
        int partCounter = 0;
        foreach (var word in words)
        {
            if (part.Length + word.Length < partLength)
            {
                part += string.IsNullOrEmpty(part) ? word : " " + word;
            }
            else
            {
                parts.Add(partCounter, part);
                part = word;
                partCounter++;
            }
        }
        parts.Add(partCounter, part);
        foreach (var item in parts)
        {
            Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
        }
        Console.ReadLine();
    }

【Comments】:

  • A small change in case the first word is longer than partLength: if (!string.IsNullOrEmpty(part)) parts.Add(partCounter, part);
【Solution 2】:

I knew there had to be a nice LINQ-y way of doing this, so here it is for fun:

var input = "The quick brown fox jumps over the lazy dog.";
var charCount = 0;
var maxLineLength = 11;

var lines = input.Split(' ', StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
    .Select(g => string.Join(" ", g));

// That's all :)

foreach (var line in lines) {
    Console.WriteLine(line);
}

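Obviously, this code only works as long as the query is not parallel, since it depends on charCount being incremented "in word order".

【Comments】:

  • It seems you need to change g to g.ToArray() in the string.Join call.
  • There is a bug here, see @JonLord's answer below: stackoverflow.com/a/17571171/364
  • @Jon On .NET Framework v4.5 you may need to change the Split call from input.Split(' ', StringSplitOptions.RemoveEmptyEntries) to input.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)
【Solution 3】:

I have been testing Jon's and Lessan's answers, but they don't work properly if your maximum length needs to be absolute rather than approximate. As their counter increments, it does not account for the empty space left at the end of a line.

Running their code against the OP's example, you get: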

1 part: "Silver badges are awarded for " - 29 Characters
2 part: "longer term goals. Silver badges are" - 36 Characters
3 part: "uncommon. " - 13 Characters

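The "are" on the second line should be on the third line. This happens because the counter does not include the 6 unused characters at the end of the first line.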
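The overflow can be reproduced with a short check. This is only a sketch: the class name DriftDemo and the helper NaiveWrap are made up for illustration, and the grouping expression is the running-counter idea from the LINQ answer, written with the char[] overload of Split so it also compiles on older frameworks. With a limit of 35, the second line comes out 36 characters long.

```csharp
using System;
using System.Linq;

public static class DriftDemo
{
    // Hypothetical helper: groups words by (running character count / max).
    // The counter never resets at a line break, so unused space at the end
    // of one line is silently "spent" by the next line.
    public static string[] NaiveWrap(string text, int max)
    {
        var charCount = 0;
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(w => (charCount += w.Length + 1) / max)
            .Select(g => string.Join(" ", g.ToArray()))
            .ToArray();
    }

    public static void Main()
    {
        var lines = NaiveWrap(
            "Silver badges are awarded for longer term goals. Silver badges are uncommon.", 35);

        // Print each line with its length so an over-long line is easy to spot.
        foreach (var line in lines)
            Console.WriteLine("{0,2} chars: {1}", line.Length, line);
    }
}
```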

To fix this, I came up with the following modification of Lessan's answer:

public static class ExtensionMethods
{
    public static string[] Wrap(this string text, int max)
    {
        var charCount = 0;
        var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return lines.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max) 
                        ? max - (charCount % max) : 0) + w.Length + 1) / max)
                    .Select(g => string.Join(" ", g.ToArray()))
                    .ToArray();
    }
}

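【Comments】:

  • string[] texts = text.Wrap(50); works, thanks a lot.
  • There is still a bug. Pass it the string "The quick brown fox jumps over the lazy" with a max of 20. It should return 2 lines of length 19, but it returns 3 lines. There is room for 'fox' on the first line, which would leave room for the rest of the string on the second line. Perhaps an easier-to-understand non-LINQ version would be less cool, but would actually produce working code? Three people have tried and failed on this question alone ;)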
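For reference, a plain loop along the lines that comment asks for might look like the sketch below. It is not taken from any of the answers: the names WordWrapSketch and Wrap are illustrative, and it assumes that a word longer than the limit is simply left alone on an over-long line. The line length is tracked per line rather than through a global counter, so unused space at the end of a line cannot leak into the next one.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordWrapSketch
{
    // Greedy wrap: a word joins the current line only if the line, the
    // separating space and the word together still fit within max.
    public static List<string> Wrap(string text, int max)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Length of the current line if this word were appended to it.
            int needed = current.Length == 0
                ? word.Length
                : current.Length + 1 + word.Length;

            // Start a new line, but never emit an empty one.
            if (needed > max && current.Length > 0)
            {
                lines.Add(current.ToString());
                current.Clear();
            }

            if (current.Length > 0)
                current.Append(' ');
            current.Append(word); // assumption: an over-long word is not broken up
        }

        if (current.Length > 0)
            lines.Add(current.ToString());

        return lines;
    }

    public static void Main()
    {
        foreach (var line in Wrap("The quick brown fox jumps over the lazy", 20))
            Console.WriteLine("{0,2}: {1}", line.Length, line);
    }
}
```

For the string from the comment and a limit of 20, this yields two lines of 19 characters each.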
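【Solution 4】:

Split the string on spaces and build new strings from the resulting array, stopping before each new segment reaches the limit.

Untested pseudocode: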

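// Requires: using System.Collections.Generic;
int myLimit = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

string[] words = sentence.Split(new char[] {' '});
IList<string> sentenceParts = new List<string>();
sentenceParts.Add(string.Empty);

int partCounter = 0;

foreach (var word in words)
{
  // The +1 accounts for the separating space; never open a new part while the current one is empty.
  if (sentenceParts[partCounter].Length > 0 &&
      sentenceParts[partCounter].Length + 1 + word.Length > myLimit)
  {
     partCounter++;
     sentenceParts.Add(string.Empty);
  }

  // Add the separator only between words, so parts have no trailing space.
  sentenceParts[partCounter] += (sentenceParts[partCounter].Length == 0 ? "" : " ") + word;
}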

【Comments】:

    【Solution 5】:

    At first I thought this might be a job for a regex, but here is what I came up with:

    List<string> parts = new List<string>();
    int partLength = 35;
    string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
    
    string[] pieces = sentence.Split(' ');
    StringBuilder tempString = new StringBuilder("");
    
    foreach(var piece in pieces)
    {
        if(piece.Length + tempString.Length + 1 > partLength) 
        {
            parts.Add(tempString.ToString());
            tempString.Clear();        
        }
        tempString.Append(" " + piece); 
    }
    

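    【Comments】:

      【Solution 6】:

      Expanding on Jon's answer above: I needed to switch g to g.ToArray(), and change max to (max + 2), to get wrapping at exactly the maximum character count.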

      public static class ExtensionMethods
      {
          public static string[] Wrap(this string text, int max)
          {
              var charCount = 0;
              var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
              return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
                          .Select(g => string.Join(" ", g.ToArray()))
                          .ToArray();
          }
      }
      

      Here is sample usage as an NUnit test:

      [Test]
      public void TestWrap()
      {
          Assert.AreEqual(2, "A B C".Wrap(4).Length);
          Assert.AreEqual(1, "A B C".Wrap(5).Length);
      
          Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
          Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);
      
          Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
          Assert.AreEqual(2, "  TEST TEST TEST TEST  ".Wrap(10).Length);
          Assert.AreEqual("TEST TEST", "  TEST TEST TEST TEST  ".Wrap(10)[0]);
      }
      

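      【Comments】:

        【Solution 7】:

        It seems everyone is using some form of "Split, then rebuild the sentence"...

        I thought I would have a go at this the way my brain would logically approach doing it by hand, which is:

        • Split on length
        • Step back to the nearest space and use that chunk
        • Remove the used chunk and start over

        The code ended up a bit more complex than I had hoped, but I believe it handles most (all?) edge cases, including words longer than maxLength, words that end exactly on maxLength, and so on.

        Here is my function: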

        private static List<string> SplitWordsByLength(string str, int maxLength)
        {
            List<string> chunks = new List<string>();
            while (str.Length > 0)
            {
                if (str.Length <= maxLength)                    //if remaining string is less than length, add to list and break out of loop
                {
                    chunks.Add(str);
                    break;
                }
        
                string chunk = str.Substring(0, maxLength);     //Get maxLength chunk from string.
        
                if (char.IsWhiteSpace(str[maxLength]))          //if next char is a space, we can use the whole chunk and remove the space for the next line
                {
                    chunks.Add(chunk);
                    str = str.Substring(chunk.Length + 1);      //Remove chunk plus space from original string
                }
                else
                {
                    int splitIndex = chunk.LastIndexOf(' ');    //Find last space in chunk.
                    if (splitIndex != -1)                       //If space exists in string,
                        chunk = chunk.Substring(0, splitIndex); //  remove chars after space.
                    str = str.Substring(chunk.Length + (splitIndex == -1 ? 0 : 1));      //Remove chunk plus space (if found) from original string
                    chunks.Add(chunk);                          //Add to list
                }
            }
            return chunks;
        }
        

        Test usage:

        string testString = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
        int length = 35;
        
        List<string> test = SplitWordsByLength(testString, length);
        
        foreach (string chunk in test)
        {
            Console.WriteLine(chunk);  
        }
        
        Console.ReadLine();
        

        【Comments】:

          【Solution 8】:

          Joel, there is a small bug in your code, which I have corrected here:

          public static string[] StringSplitWrap(string sentence, int MaxLength)
          {
                  List<string> parts = new List<string>();
          
                  string[] pieces = sentence.Split(' ');
                  StringBuilder tempString = new StringBuilder("");
          
                  foreach (var piece in pieces)
                  {
                      if (piece.Length + tempString.Length + 1 > MaxLength)
                      {
                          parts.Add(tempString.ToString());
                          tempString.Clear();
                      }
                      tempString.Append((tempString.Length == 0 ? "" : " ") + piece);
                  }
          
                  if (tempString.Length>0)
                      parts.Add(tempString.ToString());
          
                  return parts.ToArray();
          }
          

          【Comments】:

            【Solution 9】:

            This works:

            int partLength = 35;
            string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
            List<string> lines =
                sentence
                    .Split(' ')
                    .Aggregate(new [] { "" }.ToList(), (a, x) =>
                    {
                        var last = a[a.Count - 1];
                        if ((last + " " + x).Length > partLength)
                        {
                            a.Add(x);
                        }
                        else
                        {
                            a[a.Count - 1] = (last + " " + x).Trim();
                        }
                        return a;
                    });
            

            It gives me:

            Silver badges are awarded for
            longer term goals. Silver badges
            are uncommon.

            【Comments】:

              【Solution 10】:

              Although CsConsoleFormat† was primarily designed for formatting text for the console, it also supports generating plain text.

              var doc = new Document().AddChildren(
                new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
                  TextWrap = TextWrapping.WordWrap
                }
              );
              var bounds = new Rect(0, 0, 35, Size.Infinity);
              string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);
              

              And, if you really need trimmed strings like in the question:

              List<string> lines = text.Trim()
                .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
                .Select(s => s.Trim())
                .ToList();
              

              In addition to word wrapping on spaces, you get proper handling of hyphens, zero-width spaces, no-break spaces, etc.

              † CsConsoleFormat was developed by me.

              【Comments】:
