【问题标题】:Improve performance when comparing two integer arrays of different lengths比较两个不同长度的整数数组时提高性能
【发布时间】:2013-02-27 16:59:47
【问题描述】:

我对这个比较器有一个瓶颈(或者至少是我认为我可以做得更好的一个领域),它基本上是一个序数字符串比较器,但适用于整数(ushort,虽然我不认为这很重要)数组。

数组可以有不同的长度,但只有元素 [0..n] (其中 n 是最短数组的长度)匹配时,长度才有意义。在这种情况下,更长的数组被认为是“更大”。

1 2 3

1 2 3 5

1 2 5 3 > 1 2 4

1 2 3 4 > 1 2 3

    public int Compare(ushort[] x, ushort[] y)
    {
        int pos = 0;
        int len = Math.Min(x.Length, y.Length);
        while (pos < len && x[pos] == y[pos])
            pos++;

        return pos < len ?
            x[pos].CompareTo(y[pos]) :
            x.Length.CompareTo(y.Length);

    }

关于如何优化它的任何想法?

更新

回应一位评论者关于我在这里实际所做的事情:我意识到我很久以前就问过一个与此相关的问题,它准确地显示了我在上下文中所做的事情。唯一的主要区别是我现在使用ushort 数组而不是字符串作为键,因为它更紧凑。

使用整个路径作为键,我可以使用部分键从排序集中获取视图,从而为子集查询提供高性能。我试图在构建索引时提高性能。 Data structure for indexed searches of subsets

顺便说一句,到目前为止,我对这里的回答印象非常深刻,多年来我已经就 SO 提出了很多问题,但这是我所见过的最周到和最有趣的答案集合。我还不确定我的具体问题的正确答案是什么(即——数百万次短数组的比较),但他们每个人都教会了我一些我不知道的东西。

【问题讨论】:

  • 你可以跳过Math.Min: int len = x.Length &gt; y.Length ? y.Length : x.Length;
  • 您是在对小数组进行大量比较,还是对非常大的数组进行一些比较?
  • 如果你必须遍历数组,那就再好不过了...你能否将你的数据表示为其他东西(即类似于 radix tree ),所以相同部分的比较不需要步行所有元素?
  • 我正在对相对较小的数组进行大量比较。这些数组表示 HTML 文档中节点的路径;元素的数量是节点的深度,因此它在大多数情况下不太可能超过 30 或 40。我正在使用SortedSet,这是关键,因此我无法轻易更改数据结构而不会对其他地方产生重大影响。
  • @Romoku 尚未尝试所有建议,但到目前为止,您的评论已经产生了最大的不同:-/ .. 我还在 return 中删除了 CompareTo 并将其编码出来长格式也有帮助。有时这是最明显的事情..

标签: c# performance algorithm


【解决方案1】:

这是我想出的,我使用您的代码和一些并行代码的组合测试了大约 16 百万 (2^24)。

public int CompareParallel(ushort[]x, ushort[] y, int len, int segLen)
{
    int compareArrLen = ( len / segLen ) + 1;
    int [ ] compareArr = new int [ compareArrLen ];
    Parallel.For ( 0 , compareArrLen , 
                   new Action<int , ParallelLoopState> ( ( i , state ) =>
    {
        if ( state.LowestBreakIteration.HasValue 
                 && state.LowestBreakIteration.Value < i )
            return;
        int segEnd = ( i + 1 ) * segLen;
        int k = len < segEnd ? len : segEnd;
        for ( int j = i * segLen ; j < k ; j++ )
            if ( x [ j ] != y [ j ] )
            {
                compareArr [ i ] = ( x [ j ].CompareTo ( y [ j ] ) );
                state.Break ( );
                return;
            }
    } ) );
    int r = compareArrLen - 1;
    while ( r >= 0 )
    {
        if ( compareArr [ r ] != 0 )
            return compareArr [ r ];
        r--;
    }
    return x.Length.CompareTo ( y.Length );
}

public int CompareSequential ( ushort [ ] x , ushort [ ] y, int len )
{
    int pos = 0;
    while ( pos < len && x [ pos ] == y [ pos ] )
        pos++;

    return pos < len ?
        x [ pos ].CompareTo ( y [ pos ] ) :
        x.Length.CompareTo ( y.Length );

}

public int Compare( ushort [ ] x , ushort [ ] y ) 
{
    //determined through testing to be the best on my machine
    const int cutOff = 4096;
    int len = x.Length < y.Length ? x.Length : y.Length;
    //check if len is above a specific threshold 
    //and if first and a number in the middle are equal
    //chose equal because we know that there is a chance that more
    //then 50% of the list is equal, which would make the overhead
    //worth the effort
    if ( len > cutOff && x [ len - 1 ] == y [ len - 1 ] 
           && x [ len/2 ] == y [ len/2 ] )
    {
        //segment length was determined to be best through testing
        //at around 8% of the size of the array seemed to have the
        //on my machine
        return CompareParallel ( x , y , len , (len / 100)*8 );
    }
    return CompareSequential ( x , y, len );
}

这是我写的测试:

class Program
{
    [Flags]
    private enum InfoLevel:byte
    {
        Detail=0x01, Summary=0x02
    }

    private static InfoLevel logLevel = InfoLevel.Summary;

    private static void LogDetail ( string content ) 
    {
        LogInfo ( InfoLevel.Detail,content );
    }

    private static void LogSummary ( string content ) 
    {
        LogInfo ( InfoLevel.Summary , content );
    }

    private static void LogInfo ( InfoLevel level , string content ) 
    {
        if ( ( level & logLevel ) == level )
            Console.WriteLine ( content );
    }

    private static void LogInfo ( InfoLevel level , string format, 
                                  params object[] arg )
    {
        if ( ( level & logLevel ) == level )
            Console.WriteLine ( format:format, arg:arg  );
    }

    private static void LogDetail ( string format , params object [ ] arg )
    {
        LogInfo ( InfoLevel.Detail , format, arg );
    }

    private static void LogSummary ( string format , params object [ ] arg )
    {
        LogInfo ( InfoLevel.Summary , format, arg );
    }

    const string _randTestResultHeader = "\r\nRandom Array Content\r\n";
    const string _equalArrayResultHeader = "Only Length Different\r\n\r\n";
    const string _summaryTestResultsHeader = 
                                "Size\t\tOrig Elps\tPara Elps\tComp Elps\r\n";
    const string _summaryBodyContent = 
                         "{0}\t\t{1:0.0000}\t\t{2:0.0000}\t\t{3:0.00000}\r\n";

    static void Main ( string [ ] args )
    {
        Console.SetOut(new StreamWriter(File.Create("out.txt")));

        int segLen = 0;
        int segPercent = 7;
        Console.WriteLine ( "Algorithm Test, Time results in milliseconds" );
        for ( ; segPercent < 13; segPercent ++ )
        {
            Console.WriteLine ( 
                      "Test Run with parallel Dynamic segment size at {0}%"
                       +" of Array Size (Comp always at 8%)\r\n" , segPercent);

            StringBuilder _aggrRandResults = new StringBuilder ( );
            StringBuilder _aggrEqualResults = new StringBuilder ( );

            _aggrRandResults.Append ( _randTestResultHeader );
            _aggrEqualResults.Append ( _equalArrayResultHeader );

            _aggrEqualResults.Append ( _summaryTestResultsHeader );
            _aggrRandResults.Append ( _summaryTestResultsHeader );


            for ( int i = 10 ; i < 25 ; i++ )
            {
                int baseLen = ( int ) Math.Pow ( 2 , i );
                segLen = ( baseLen / 100 ) * segPercent;

                var testName = "Equal Length ";
                var equalTestAverage = RandomRunTest ( testName , baseLen , 
                                                       baseLen, segLen );
                testName = "Left Side Larger";
                var lslargerTestAverage=RandomRunTest(testName,baseLen+10, 
                                                      baseLen, segLen );
                testName = "Right Side Larger";
                var rslargerTestAverage = RandomRunTest ( testName , baseLen ,
                                                        baseLen + 10, segLen );

                double [ ] completelyRandomTestAvg = new double [ 3 ];
                for ( int l = 0 ; l < completelyRandomTestAvg.Length ; l++ )
                    completelyRandomTestAvg [ l ] = ( equalTestAverage [ l ] +
                                                 lslargerTestAverage [ l ] +
                                              rslargerTestAverage [ l ] ) / 3;

                LogDetail ( "\r\nRandom Test Results:" );
                LogDetail ("Original Composite Test Average: {0}" ,
                           completelyRandomTestAvg [ 0 ] );
                LogDetail ( "Parallel Composite Test Average: {0}" ,
                            completelyRandomTestAvg [ 1 ]  );

                _aggrRandResults.AppendFormat ( _summaryBodyContent , 
                    baseLen , 
                    completelyRandomTestAvg [ 0 ] , 
                    completelyRandomTestAvg [ 1 ] , 
                    completelyRandomTestAvg [ 2 ]);

                testName = "Equal Len And Values";
                var equalEqualTest = EqualTill ( testName , baseLen , 
                                                 baseLen, segLen );

                testName = "LHS Larger";
                var equalLHSLargerTest = EqualTill ( testName , baseLen + 10 , 
                                                     baseLen, segLen );

                testName = "RHS Larger";
                var equalRHSLargerTest = EqualTill ( testName , baseLen , 
                                                     baseLen + 10, segLen );

                double [ ] mostlyEqualTestAvg = new double [ 3 ];
                for ( int l = 0 ; l < mostlyEqualTestAvg.Length ; l++ )
                    mostlyEqualTestAvg [ l ] = ( ( equalEqualTest [ l ] +
                                            equalLHSLargerTest [ l ] +
                                            equalRHSLargerTest [ l ] ) / 3 );

                LogDetail( "\r\nLength Different Test Results" );
                LogDetail( "Original Composite Test Average: {0}" , 
                           mostlyEqualTestAvg [ 0 ] );
                LogDetail( "Parallel Composite Test Average: {0}" , 
                            mostlyEqualTestAvg [ 1 ] );

                _aggrEqualResults.AppendFormat ( _summaryBodyContent , 
                                                 baseLen , 
                                                 mostlyEqualTestAvg [ 0 ] , 
                                                 mostlyEqualTestAvg [ 1 ] ,
                                                 mostlyEqualTestAvg [ 2 ]);
            }

            LogSummary ( _aggrRandResults.ToString() + "\r\n");
            LogSummary ( _aggrEqualResults.ToString()+ "\r\n");

        }
        Console.Out.Flush ( );
    }


    private const string _testBody = 
                  "\r\n\tOriginal:: Result:{0}, Elapsed:{1}"
                 +"\r\n\tParallel:: Result:{2}, Elapsed:{3}"
                 +"\r\n\tComposite:: Result:{4}, Elapsed:{5}";
    private const string _testHeader = 
                  "\r\nTesting {0}, Array Lengths: {1}, {2}";
    public static double[] RandomRunTest(string testName, int shortArr1Len, 
                                         int shortArr2Len, int parallelSegLen)
    {

        var shortArr1 = new ushort [ shortArr1Len ];
        var shortArr2 = new ushort [ shortArr2Len ];
        double [ ] avgTimes = new double [ 3 ];

        LogDetail ( _testHeader , testName , shortArr1Len , shortArr2Len ) ;
        for ( int i = 0 ; i < 10 ; i++ )
        {
            int arrlen1 = shortArr1.Length , arrlen2 = shortArr2.Length;

            double[] currResults = new double [ 3 ];

            FillCompareArray ( shortArr1 , shortArr1.Length );
            FillCompareArray ( shortArr2 , shortArr2.Length );

            var sw = new Stopwatch ( );

            //Force Garbage Collection 
            //to avoid having it effect 
            //the test results this way 
            //test 2 may have to garbage 
            //collect due to running second
            GC.Collect ( );
            sw.Start ( );
            int origResult = Compare ( shortArr1 , shortArr2 );
            sw.Stop ( );
            currResults[0] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            int parallelResult = CompareParallelOnly ( shortArr1 , shortArr2, 
                                                       parallelSegLen );
            sw.Stop ( );
            currResults [ 1 ] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            int compositeResults = CompareComposite ( shortArr1 , shortArr2 );
            sw.Stop ( );                
            currResults [ 2 ] = sw.Elapsed.TotalMilliseconds;

            LogDetail ( _testBody, origResult , currResults[0] , 
                        parallelResult , currResults[1], 
                        compositeResults, currResults[2]);

            for ( int l = 0 ; l < currResults.Length ; l++ )
                avgTimes [ l ] = ( ( avgTimes[l]*i)+currResults[l]) 
                                    / ( i + 1 );
        }
        LogDetail ( "\r\nAverage Run Time Original: {0}" , avgTimes[0]);
        LogDetail ( "Average Run Time Parallel: {0}" , avgTimes[1]);
        LogDetail ( "Average Run Time Composite: {0}" , avgTimes [ 2 ] );

        return avgTimes;
    }

    public static double [ ] EqualTill ( string testName, int shortArr1Len , 
                                       int shortArr2Len, int parallelSegLen)
    {

        const string _testHeader = 
               "\r\nTesting When Array Difference is "
               +"Only Length({0}), Array Lengths: {1}, {2}";

        int baseLen = shortArr1Len > shortArr2Len 
                          ? shortArr2Len : shortArr1Len;

        var shortArr1 = new ushort [ shortArr1Len ];
        var shortArr2 = new ushort [ shortArr2Len ];
        double [ ] avgTimes = new double [ 3 ];

        LogDetail( _testHeader , testName , shortArr1Len , shortArr2Len );
        for ( int i = 0 ; i < 10 ; i++ )
        {

            FillCompareArray ( shortArr1 , shortArr1Len);
            Array.Copy ( shortArr1 , shortArr2, baseLen );
            double [ ] currElapsedTime = new double [ 3 ];
            var sw = new Stopwatch ( );
            //See previous explaination 
            GC.Collect ( );
            sw.Start ( );
            int origResult = Compare ( shortArr1 , shortArr2 );
            sw.Stop ( );
            currElapsedTime[0] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            int parallelResult = CompareParallelOnly ( shortArr1, shortArr2, 
                                     parallelSegLen );
            sw.Stop ( );
            currElapsedTime[1] = sw.Elapsed.TotalMilliseconds;
            sw.Reset ( );

            GC.Collect ( );
            sw.Start ( );
            var compositeResult = CompareComposite ( shortArr1 , shortArr2 );
            sw.Stop ( );
            currElapsedTime [ 2 ] = sw.Elapsed.TotalMilliseconds;

            LogDetail ( _testBody , origResult , currElapsedTime[0] , 
                parallelResult , currElapsedTime[1], 
                compositeResult,currElapsedTime[2]);

            for ( int l = 0 ; l < currElapsedTime.Length ; l++ )
                avgTimes [ l ] = ( ( avgTimes [ l ] * i ) 
                                   + currElapsedTime[l])/(i + 1);
        }
        LogDetail ( "\r\nAverage Run Time Original: {0}" , avgTimes [ 0 ] );
        LogDetail ( "Average Run Time Parallel: {0}" , avgTimes [ 1 ] );
        LogDetail ( "Average Run Time Composite: {0}" , avgTimes [ 2 ] );
        return avgTimes;
    }


    static Random rand = new Random ( );
    public static void FillCompareArray ( ushort[] compareArray, int length ) 
    {
        var retVals = new byte[length];
        ( rand ).NextBytes ( retVals );
        Array.Copy ( retVals , compareArray , length);
    }

    public static int CompareParallelOnly ( ushort [ ] x , ushort[] y, 
                                            int segLen ) 
    {
       int len = x.Length<y.Length ? x.Length:y.Length;
       int compareArrLen = (len/segLen)+1;
       int[] compareArr = new int [ compareArrLen ];
       Parallel.For ( 0 , compareArrLen , 
           new Action<int , ParallelLoopState> ( ( i , state ) =>
       {
           if ( state.LowestBreakIteration.HasValue 
                    && state.LowestBreakIteration.Value < i )
               return;
           int segEnd = ( i + 1 ) * segLen;
           int k = len<segEnd?len:segEnd;

           for ( int j = i * segLen ; j < k ; j++ )
               if ( x [ j ] != y [ j ] )
               {
                   compareArr [ i ] = ( x [ j ].CompareTo ( y [ j ] ) );
                   state.Break ( );
                   return;
               }
       } ) );
       int r=compareArrLen-1;
       while ( r >= 0 ) 
       {
           if ( compareArr [ r ] != 0 )
               return compareArr [ r ];
           r--;
       }
       return x.Length.CompareTo ( y.Length );
    }

    public static int Compare ( ushort [ ] x , ushort [ ] y )
    {
        int pos = 0;
        int len = Math.Min ( x.Length , y.Length );
        while ( pos < len && x [ pos ] == y [ pos ] )
            pos++;

        return pos < len ?
            x [ pos ].CompareTo ( y [ pos ] ) :
            x.Length.CompareTo ( y.Length );

    }

    public static int CompareParallel ( ushort[] x, ushort[] y, int len, 
                                        int segLen )
    {
        int compareArrLen = ( len / segLen ) + 1;
        int [ ] compareArr = new int [ compareArrLen ];
        Parallel.For ( 0 , compareArrLen , 
            new Action<int , ParallelLoopState> ( ( i , state ) =>
        {
            if ( state.LowestBreakIteration.HasValue 
                 && state.LowestBreakIteration.Value < i )
                return;
            int segEnd = ( i + 1 ) * segLen;
            int k = len < segEnd ? len : segEnd;
            for ( int j = i * segLen ; j < k ; j++ )
                if ( x [ j ] != y [ j ] )
                {
                    compareArr [ i ] = ( x [ j ].CompareTo ( y [ j ] ) );
                    state.Break ( );
                    return;
                }
        } ) );
        int r = compareArrLen - 1;
        while ( r >= 0 )
        {
            if ( compareArr [ r ] != 0 )
                return compareArr [ r ];
            r--;
        }
        return x.Length.CompareTo ( y.Length );
    }

    public static int CompareSequential(ushort [ ] x , ushort [ ] y, int len)
    {
        int pos = 0;
        while ( pos < len && x [ pos ] == y [ pos ] )
            pos++;

        return pos < len ?
            x [ pos ].CompareTo ( y [ pos ] ) :
            x.Length.CompareTo ( y.Length );

    }

    public static int CompareComposite ( ushort [ ] x , ushort [ ] y ) 
    {
        const int cutOff = 4096;
        int len = x.Length < y.Length ? x.Length : y.Length;

        if ( len > cutOff && x [ len - 1 ] == y [ len - 1 ]
                 && x [ len/2 ] == y [ len/2 ] )
            return CompareParallel ( x , y , len , (len / 100)*8 );

        return CompareSequential ( x , y, len );
    }
}

注意:

确保您使用优化的代码进行构建,当我不包含此步骤时,结果会大不相同,它使并行代码看起来比实际的改进要大得多。

我得到的结果是,对于非常长的相等数字集,执行时间减少了大约 33%。它仍然随着输入的增加而线性增长,但速度较慢。对于小型数据集(在我的机器上少于 4092),它的启动速度也较慢,但通常所花费的时间足够小(在我的机器上为 0.001 毫秒),以防万一你得到它是值得使用的一个大的几乎相等的数组。

【讨论】:

  • "我在做很多比较小数组的比较。这些数组代表HTML文档中一个节点的路径;元素个数就是一个节点的深度,所以不是特别可能大多数情况下将超过 30 或 40 个。” (参见 OP 的 cmets)。不管怎样,你做得很好,查尔斯!
  • 接受这个答案,因为它可能是最普遍适用的,尽管除了@Romoku 的评论“摆脱 Math.min”之外,它们都没有真正为我非常具体的用例做太多事情。不过,建议阅读本文的任何人查看所有答案,这里有很多见解。
【解决方案2】:

可能不会有很大的不同,但您可以将最后一个元素设置为不同的,以摆脱 while 循环中的 pos &lt; len 检查。还有相当琐碎的pos++++pos

public int Compare(ushort[] x, ushort[] y)
{
    int pos = 0;
    int len = Math.Min(x.Length, y.Length);

    // the below is probably not worth it for less than 5 (or so) elements,
    //   so just do the old way
    if (len < 5)
    {
      while (pos < len && x[pos] == y[pos])
        ++pos;

      return pos < len ?
        x[pos].CompareTo(y[pos]) :
        x.Length.CompareTo(y.Length);
    }

    ushort lastX = x[len-1];
    bool lastSame = true;
    if (x[len-1] == y[len-1])
        --x[len-1]; // can be anything else
    else
        lastSame = false;

    while (x[pos] == y[pos])
        ++pos;

    return pos < len-1 ?
        x[pos].CompareTo(y[pos]) :
        lastSame ? x.Length.CompareTo(y.Length)
                 : lastX.CompareTo(y[len-1]);
}

编辑:只有在许多元素从一开始就相同的情况下,您才能真正获得性能提升(如 pkuderov 所提到的,当存在早期差异时,情况会更糟)。

【讨论】:

  • 似乎合法!因为在前几个项目不同(早期循环中断)的情况下性能会更差,但我认为这是值得的
  • 有趣。因为数组表示树中的路径(我之前应该提到过),实际上当数组长度相同时,大多数元素会频繁匹配。我要试试这个..
【解决方案3】:

对不起,回答太长了,但这个问题让我很感兴趣,我花了几个小时调查,我想分享结果。 我写了一些测试用例生成器和粗略的性能测试器

那里有什么:

  1. 生成完全随机的数组
  2. 检查 3 种比较方法的执行时间
  3. 生成相似概率高的数组
  4. 检查执行时间。

我用了3种方法

  1. OP 的
  2. 我的 - 想法 - 将两个索引操作更改为指针增量
  3. Dukeling 的 - 想法 - 删除不必要的比较

我从短数组开始(长度为 5-15)

方法 1 在两种测试变体中都是最快的(由 pkuderov 预测)

如果我们增加数组的长度情况会发生变化。

这是我在数组长度在 500 到 1500 之间时得到的

Generating test cases ...
Done. (5258 milliseconds)
Compare1 took 18 milliseconds
Compare2 took 18 milliseconds
Compare3 took 33 milliseconds
Generating 'similar' test cases ...
Done. (5081 milliseconds)
Compare1 took 359 milliseconds
Compare2 took 313 milliseconds
Compare3 took 295 milliseconds

因此,与方法 1 相比,方法 2 的增益较小,与方法 2 相比,方法 3 的增益甚至更小;

分辨率:

1.如果您的阵列足够短和/或很有可能 从小索引值开始的差异 - 你能做的不多 做(使用建议的方法)
2.否则你可以尝试一些 方法 2 和 3 的组合。

代码:

using System;
using System.Diagnostics;

namespace ConsoleExamples
    {
        class ArrayComparePerformance
        {
            static readonly int testArraysNum = 100000;
            static readonly int maxArrayLen = 1500;
            static readonly int minArrayLen = 500;
            static readonly int maxValue = 10;

            public static void RunTest()
            {
                //Generate random arrays;
                ushort[][] a = new ushort[testArraysNum][];
                ushort[][] b = new ushort[testArraysNum][];

                Random rand = new Random();

                Console.WriteLine("Generating test cases ... " );

                Stopwatch sw = new Stopwatch();
                sw.Start();

                for (int i = 0; i < testArraysNum; i++)
                {
                    int len = rand.Next(maxArrayLen) + 1;
                    a[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        a[i][j] = (ushort) rand.Next(maxValue);
                    }



                    len = rand.Next(maxArrayLen) + 1;
                    b[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        b[i][j] = (ushort) rand.Next(maxValue);
                    }



                }

                sw.Stop();
                Console.WriteLine("Done. ({0} milliseconds)", sw.ElapsedMilliseconds);


                //compare1
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare1(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare1 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare2
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare2(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare2 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare3
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare3(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare3 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");


                //Generate "similar" arrays;

                Console.WriteLine("Generating 'similar' test cases ... ");

                sw.Restart();

                for (int i = 0; i < testArraysNum; i++)
                {
                    int len = rand.Next(maxArrayLen - minArrayLen) + minArrayLen -1;
                    a[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        if (j < len/2)
                            a[i][j] = (ushort)j;
                        else
                            a[i][j] = (ushort)(rand.Next(2)  + j);
                    }



                    len = rand.Next(maxArrayLen - minArrayLen) + minArrayLen - 1;
                    b[i] = new ushort[len];
                    for (int j = 0; j < len; j++)
                    {
                        if (j < len/2)
                            b[i][j] = (ushort)j;
                        else
                            b[i][j] = (ushort)(rand.Next(2)  + j);
                    }
                }

                sw.Stop();
                Console.WriteLine("Done. ({0} milliseconds)", sw.ElapsedMilliseconds);


                //compare1
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare1(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare1 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare2
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare2(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare2 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");

                //compare3
                sw.Restart();
                for (int i = 0; i < testArraysNum; i++)
                {
                    int result = Compare3(a[i], b[i]);
                }
                sw.Stop();
                Console.WriteLine("Compare3 took " + sw.ElapsedMilliseconds.ToString() + " milliseconds");


                Console.ReadKey();
            }

            public static int Compare1(ushort[] x, ushort[] y)
            {
                int pos = 0;
                int len = Math.Min(x.Length, y.Length);
                while (pos < len && x[pos] == y[pos])
                    pos++;

                return pos < len ?
                    x[pos].CompareTo(y[pos]) :
                    x.Length.CompareTo(y.Length);
            }

            public unsafe static int Compare2(ushort[] x, ushort[] y)
            {
                int pos = 0;
                int len = Math.Min(x.Length, y.Length);
                fixed (ushort* fpx = &x[0], fpy = &y[0])
                {
                    ushort* px = fpx;
                    ushort* py = fpy;
                    while (pos < len && *px == *py)
                    {
                        px++;
                        py++;
                        pos++;
                    }
                }

                return pos < len ?
                    x[pos].CompareTo(y[pos]) :
                    x.Length.CompareTo(y.Length);
            }

            public static int Compare3(ushort[] x, ushort[] y)
            {
                int pos = 0;
                int len = Math.Min(x.Length, y.Length);

                // the below is probably not worth it for less than 5 (or so) elements,
                //   so just do the old way
                if (len < 5)
                {
                    while (pos < len && x[pos] == y[pos])
                        ++pos;

                    return pos < len ?
                      x[pos].CompareTo(y[pos]) :
                      x.Length.CompareTo(y.Length);
                }

                ushort lastX = x[len - 1];
                bool lastSame = true;
                if (x[len - 1] == y[len - 1])
                    --x[len - 1]; // can be anything else
                else
                    lastSame = false;

                while (x[pos] == y[pos])
                    ++pos;

                return pos < len - 1 ?
                    x[pos].CompareTo(y[pos]) :
                    lastSame ? x.Length.CompareTo(y.Length)
                             : lastX.CompareTo(y[len - 1]);
            }
        }
    }

【讨论】:

  • 这很棒。方法 3(取消比较)的主要问题是它具有破坏性;之后我必须将其设置回来,这会导致线程安全问题。所以这可能不是一个选择。有趣的是,到目前为止,最大的不同是摆脱Math.Min,正如第一个评论者所建议的那样。去尝试你的指针建议。我什至尝试使用 memcmp 来比较等长部分,但这实际上减慢了速度,我认为短数组编组的开销不值得。
【解决方案4】:

只是一些想法(可能有误,需要测试):

首先。较大的项目类型(例如,x32 的 int 或 x64 的 long - 我们将此类型命名为 TLong)可能会提供更好的性能。
如果您将多个ushort 项目打包成TLong 项目(在big-endian 订单中!),您将能够一次比较多个项目。
但是,如果新数组 [TLong] 类型的最后一项未满,您将需要处理它。可能会有一些“棘手的情况”。我现在看不到任何东西,但我不确定。

第二。 更多!在某些情况下,我们可以在TLong 类型的项目中打包更多原始项目。
让我们回到ushort 类型的初始数组:假设K - 存在于所有数组中的最大数字(即您要排序的所有路径!)(即对于每个存储的数字t在每个ushort 真正:t &lt;= K)。然后让我们假设每个t 只是base-K 数字系统中的一个“数字”。
这意味着图表中的每条路径(即每个ushort 数组)只决定了这个数字系统中的一个数字。因此,您不需要像这样操作 ushort 数组:

  1. 确定K 的最大力量适合TLong 类型 - 假设它是p

    int p = 0;
    while (Math.Exp(K, p) <= TLong.MaxValue)
        p++;
    
  2. ushort数组的第i个p项并在base-Knumeric系统中计算适当的数字并将其保留为TLong数组类型的第i项:

    List<TLong> num = new List<TLong>();
    int i = 0;
    while (p * i < a.Length)
    {
        TLong y = 0;
    
        //transform from base-10 to base-K in chunks of p elements
        for (int j = p * i; j < Math.Min(p * (i + 1), a.Length); j++)
            y = y * K + a[j];
    
        num.Add(sum);
        i++;
    }
    TLong[] result = num.ToArray();
    
  3. 这是一个动态的预先计算的转换,因此对于不同的 HTML 文档 K 可能会有所不同,对于 K 远小于 255 的情况,它会比第一个想法更快。此外,预先计算的变换具有线性复杂性,因此不会对您的性能产​​生很大影响。

  4. 通常,您将数组转换为 base-K num 中的大数。系统。存储在数组中。就是这样!
  5. 并且无需通过其他 cmets 的改进来更改初始排序算法(无论如何检查一下 - 我可能错了!)

希望对你有帮助。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2013-04-02
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多