【Question Title】:String compression for repeated chars or substrings [closed]
【Posted】:2017-05-06 20:07:16
【Question】:

I have a string that can contain any letters:

string uncompressed = "dacacacd";

I need to compress this string into the following format:

string compressed = "d3(ac)d";

If possible, it should also compress any repeated substring, including nested ones, for example:

string uncompressed = "dabcabcdabcabc";
string compressed = "2(d2(abc))";

Is there a way to implement this without any third-party libraries?

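【Comments】:

  • There is certainly a way. What have you tried?
  • Look it up on Google and Wikipedia. This site is where you get help once you have an implementation.
  • @Abion47 I tried implementing it with run-length encoding, but that way I can only compress single characters.
  • @TaW I have been searching all afternoon but could not find an approach.
  • Shouldn't the second example be "2(dabcabc)"? I think "2(d2(abcabc))" would decode to "dabcabcabcabcdabcabcabcabc".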
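
The decoding claim in the last comment can be checked mechanically. Below is a minimal sketch of a decoder for the count(group) format. It assumes that literals are letters, that every count is immediately followed by a parenthesised group, and that counts fit in an int. The names `GroupDecoder` and `Decompress` are hypothetical and do not appear in the original post.

```csharp
using System;
using System.Text;

public static class GroupDecoder
{
    // Expands strings such as "d3(ac)d" or "2(d2(abc))".
    public static string Decompress(string s)
    {
        int pos = 0;
        string result = Parse(s, ref pos);
        // Parse only stops early on a ')' that has no matching '('.
        if (pos != s.Length)
            throw new FormatException("Unmatched ')' at index " + pos);
        return result;
    }

    private static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Reads literals and groups until end of input or a ')' is reached.
    private static string Parse(string s, ref int pos)
    {
        var sb = new StringBuilder();
        while (pos < s.Length && s[pos] != ')')
        {
            if (!IsAsciiDigit(s[pos]))
            {
                sb.Append(s[pos++]);   // plain literal character
                continue;
            }

            // Read the repeat count.
            int count = 0;
            while (pos < s.Length && IsAsciiDigit(s[pos]))
                count = count * 10 + (s[pos++] - '0');

            if (pos >= s.Length || s[pos] != '(')
                throw new FormatException("Expected '(' at index " + pos);
            pos++;                     // skip '('

            string inner = Parse(s, ref pos);

            if (pos >= s.Length || s[pos] != ')')
                throw new FormatException("Missing ')' at index " + pos);
            pos++;                     // skip ')'

            for (int k = 0; k < count; k++)
                sb.Append(inner);
        }
        return sb.ToString();
    }
}
```

With this sketch, `GroupDecoder.Decompress("2(d2(abc))")` returns "dabcabcdabcabc", while "2(d2(abcabc))" expands to the longer string quoted in the comment.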

Tags: c# string compression


【Solution 1】:

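Here is an example that compresses by looking for the longest repeated substring first. For something like "abababcabc" it may not produce the most efficient or best compression, but it should at least get you started.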

using System.Collections.Generic;
using System.Linq;
using System.Text;

public class CompressedString
{
    private class Segment
    {
        public Segment(int count, CompressedString value)
        {
            Count = count;
            Value = value;
        }
        public int Count { get; set; }
        public CompressedString Value { get; set; }
    }

    private List<Segment> segments = new List<Segment>();
    private string uncompressible;

    private CompressedString(){}

    public static CompressedString Compress(string s)
    {
        var compressed = new CompressedString();
        // longest possible repeating substring is half the length of the
        // string, so try that first and work down to shorter lengths
        for(int len = s.Length/2; len > 0; len--)
        {
            // look for the substring at each index that still leaves room
            // for at least one repeat after it (so "aa" and "xaa" are found)
            for(int i = 0; i + 2 * len <= s.Length; i++)
            {
                var sub = s.Substring(i, len);
                int count = 1;

                // look for repeats of the substring immediately after it.
                for(int j = i + len; j <= s.Length - len; j += len)
                {
                    // increase the count of times the substring is found
                    // or stop looking when it doesn't match
                    if(sub == s.Substring(j, len))
                    {
                        count++;
                    }
                    else
                    {
                        break;
                    }
                }

                // if we found repeats then handle the substring before the
                // repeats, the repeats, and everything after.
                if(count > 1)
                {
                    // if anything is before the repeats then add it to the
                    // segments with a count of one and compress it.
                    if (i > 0)
                    {
                        compressed.segments.Add(new Segment(1, Compress(s.Substring(0, i))));
                    }

                    // Add the repeats to the segments with the found count
                    // and compress it.
                    compressed.segments.Add(new Segment(count, Compress(sub)));

                    // if anything is after the repeats then add it to the
                    // segments with a count of one and compress it.
                    if (s.Length - (count * len) > i)
                    {
                        compressed.segments.Add(new Segment(1, Compress(s.Substring(i + (count * len)))));
                    }

                    // We're done compressing so break this loop and the
                    // outer by setting len to 0.
                    len = 0;
                    break;
                }
            }
        }

        // If we failed to find any repeating substrings then we just have
        // a single uncompressible string.
        if (!compressed.segments.Any())
        {
            compressed.uncompressible = s;
        }

        // Reduce the compression for something like "2(2(ab))" to "4(ab)"
        compressed.Reduce();
        return compressed;
    }

    private void Reduce()
    {
        // Attempt to reduce each segment.
        foreach(var seg in segments)
        {
            seg.Value.Reduce();
            // If there is only one sub segment then we can reduce it.
            if(seg.Value.segments.Count == 1)
            {
                var subSeg = seg.Value.segments[0];
                seg.Value = subSeg.Value;
                seg.Count *= subSeg.Count;
            }
        }
    }

    public override string ToString()
    {
        if(segments.Any())
        {
            StringBuilder builder = new StringBuilder();

            foreach(var seg in segments)
            {
                if (seg.Count == 1)
                    builder.Append(seg.Value.ToString());
                else
                {
                    builder.Append(seg.Count).Append("(").Append(seg.Value.ToString()).Append(")");
                }
            }

            return builder.ToString();
        }

        return uncompressible;
    }
}
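
The innermost loop of Compress creates a new string with Substring for every comparison. As a hedged sketch, the same count can be computed without those allocations by comparing ranges in place with `string.CompareOrdinal`. The names `RepeatScan` and `CountRepeats` are hypothetical, and the sketch assumes `len >= 1` and `i + len <= s.Length`. It only reduces allocation cost and does not change the overall complexity of the search.

```csharp
using System;

public static class RepeatScan
{
    // Counts how many times the block s[i .. i+len) occurs back to back
    // starting at index i. The block itself counts as 1.
    public static int CountRepeats(string s, int i, int len)
    {
        if (len <= 0)
            throw new ArgumentOutOfRangeException(nameof(len));

        int count = 1;
        for (int j = i + len; j <= s.Length - len; j += len)
        {
            // Ordinal comparison of two ranges of the same string,
            // equivalent to sub == s.Substring(j, len) but allocation-free.
            if (string.CompareOrdinal(s, i, s, j, len) != 0)
                break;
            count++;
        }
        return count;
    }
}
```

For "dacacacd", `RepeatScan.CountRepeats(s, 1, 2)` counts the three consecutive "ac" blocks.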

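【Comments】:

  • Thanks, this works for me. I only made a small change so that "aaa" is compressed to "3a" instead of "3(a)".
  • This will be slow for large strings.
  • @Smiley1000 Yes, for very long strings you would need an approach that uses some kind of trie or suffix tree.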
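
The first comment does not show its change, so the following is only a guess at what it might look like: a small formatting helper that drops the parentheses when the repeated value is a single character. `SegmentFormat` and `Format` are hypothetical names. In the answer's class, the equivalent change would go in the else branch of ToString.

```csharp
using System;

public static class SegmentFormat
{
    // Renders one segment: "ab" for count 1, "3a" for a single repeated
    // character, and "3(ab)" for a longer repeated value.
    public static string Format(int count, string value)
    {
        if (count == 1)
            return value;
        if (value.Length == 1)
            return count + value;          // e.g. 3 and "a" give "3a"
        return count + "(" + value + ")";  // e.g. 3 and "ac" give "3(ac)"
    }
}
```

Note that a decoder for this variant also has to accept a count followed directly by a letter, not only a count followed by a parenthesised group.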