Program becomes much slower with multithreading: answers

【Question title】: Program becomes much slower using multithreading
【Posted】: 2012-03-16 03:23:22
【Question description】:

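My task is to compare the data in two database tables to measure how similar their records are. For example, if each table has 5 records, I need to compare every record in table A with all the records in table B to get a similarity score. With a single thread, 500 records per table took 4 minutes; now that I use 4 threads, it takes half an hour! My idea was to split the first table into 4 smaller tables, each holding part of the data, and then run the comparison on 4 threads from the thread pool. Here is the code; pr1 and pr2 are the tables: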

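// pr1 and pr2 hold the two tables; `part` is the number of pr1 records per chunk
Deduplication d = new Deduplication(pr2, threshold);

// Note: all four calls below share this single Deduplication instance d
Func<List<ParentRecord>, List<ParentRecord>> method = d.Find;

for (int i = 0; i < 4; i++)
{
    // Take the i-th slice of pr1 and run d.Find on it on a thread-pool thread
    IEnumerable<ParentRecord> temp = pr1.Skip(i*part).Take(part);
    method.BeginInvoke(temp.ToList(), CallBackMethod, method);
}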

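private void CallBackMethod(IAsyncResult result)
{
    var target = (Func<List<ParentRecord>, List<ParentRecord>>)result.AsyncState;
    List<ParentRecord> p = target.EndInvoke(result);

    bool allDone;
    lock (_locker)
    {
        records.AddRange(p);
        // Increment inside the lock: a bare countThread++ on several threads is a data race
        countThread++;
        allDone = countThread == 4;
    }
    if (allDone)
    {
        // All four chunks are finished; marshal back to the UI thread to update the grid
        this.BeginInvoke(new PopulateDelegate(PopulateGridView), new object[] { records });
    }
}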

private void PopulateGridView(List<ParentRecord> p)
{ 
    dataGridViewParent.DataSource = p;
    dataGridViewDuplication.DataSource = null;
}
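A minimal sketch of the same split-and-run idea using tasks instead of `Delegate.BeginInvoke` (which is only supported on .NET Framework; on .NET Core and .NET 5+ it throws `PlatformNotSupportedException`). `ChunkedRunner`, `SplitIntoChunks` and `RunAllAsync` are hypothetical names, and the sketch assumes the `find` function passed in only reads shared data, so it is safe to call from several threads at once. It also uses ceiling division for the chunk size so a remainder (for example 501 records split 4 ways) is not silently dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: splits the outer table into chunks and runs a
// caller-supplied, thread-safe "find" function on each chunk in parallel.
public static class ChunkedRunner
{
    // Ceiling division so that a remainder (e.g. 501 records / 4) is not lost.
    public static List<List<T>> SplitIntoChunks<T>(IReadOnlyList<T> source, int chunkCount)
    {
        int size = (source.Count + chunkCount - 1) / chunkCount;
        var chunks = new List<List<T>>();
        for (int i = 0; i < chunkCount; i++)
        {
            // Skip/Take past the end simply yields an empty chunk.
            chunks.Add(source.Skip(i * size).Take(size).ToList());
        }
        return chunks;
    }

    // Starts one task per chunk and waits for all of them. No shared counter
    // or lock is needed because each task returns its own result list.
    public static async Task<List<TResult>> RunAllAsync<T, TResult>(
        IReadOnlyList<T> source, int chunkCount, Func<List<T>, List<TResult>> find)
    {
        var tasks = SplitIntoChunks(source, chunkCount)
            .Select(chunk => Task.Run(() => find(chunk)))
            .ToList();
        List<TResult>[] perChunk = await Task.WhenAll(tasks);
        // Task.WhenAll keeps the results in the same order as the tasks.
        return perChunk.SelectMany(r => r).ToList();
    }
}
```

From an `async` event handler in the form, something like `dataGridViewParent.DataSource = await ChunkedRunner.RunAllAsync(pr1, 4, chunk => new Deduplication(pr2, threshold).Find(chunk));` would update the grid without `this.BeginInvoke`, because code after `await` resumes on the UI thread. Creating one `Deduplication` per chunk is an assumption made here to avoid sharing instance state between threads.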

Sorry, I'm new to multithreading, so this idea may sound a bit silly. I'd be grateful if you could explain. Thanks.

Update

    public List<ParentRecord> Find()
    {
        List<ParentRecord> result = new List<ParentRecord>();

        foreach (ParentRecord p1 in DataSource1)
        {
            List<DuplicateRecord> addedDuplicateRecords = new List<DuplicateRecord>();
            int num = 0;
            foreach (ParentRecord p2 in DataSource2)
            {
                // Skip pairs that share the same primary key
                if (p1.PrimaryKey != p2.PrimaryKey)
                {
                    float similarity = 0F;
                    // Check whether these two rows are simply the same (case-insensitive)
                    if (p1.CompareRow.ToUpper() == p2.CompareRow.ToUpper()) similarity = 1;
                    else similarity = GetSimilarity(p1.CompareRow, p2.CompareRow);
                    if (similarity >= threshold)
                    {
                        DuplicateRecord duplicateRecord = new DuplicateRecord();
                        duplicateRecord.PrimaryKey = p2.PrimaryKey;
                        duplicateRecord.CompareToRow = p2.CompareRow;
                        duplicateRecord.Similarity = similarity;
                        addedDuplicateRecords.Add(duplicateRecord);
                        num++;
                    }
                }
            }
            // Check whether any records meet the threshold
            if (num > 0)
            {
                ParentRecord parentRecord = new ParentRecord();
                parentRecord.PrimaryKey = p1.PrimaryKey;
                parentRecord.CompareRow = p1.CompareRow;
                parentRecord.duplicateRecordList = addedDuplicateRecords;
                result.Add(parentRecord);
            }
        }
        return result;
    }

    private float GetSimilarity(object obj1, object obj2)
    {
        // A new MatchsMaker is created for every pair of rows compared
        MatchsMaker match = new MatchsMaker(obj1.ToString(), obj2.ToString());
        return match.Score;
    }
}
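The `Find` method above compares the rows in a nested loop. A minimal sketch of parallelizing only the outer loop with PLINQ, so that each iteration reads the shared inputs and builds its own list, with no locks or shared counters. `Row`, `DuplicateGroup` and `ParallelDedup` are hypothetical stand-ins for `ParentRecord`/`DuplicateRecord`; the `similarity` delegate stands in for `GetSimilarity` and is assumed to be thread-safe (whether `MatchsMaker` is, the question does not say). One deliberate swap: the exact-match check uses `string.Equals` with `StringComparison.OrdinalIgnoreCase` instead of two `ToUpper()` calls, which avoids allocating two new strings per pair but is ordinal rather than culture-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for ParentRecord / DuplicateRecord from the question.
public sealed record Row(int Key, string Text);
public sealed record DuplicateGroup(Row Parent, List<(Row Other, float Score)> Duplicates);

public static class ParallelDedup
{
    // Compares every row in 'left' with every row in 'right'. The outer loop
    // runs in parallel; each iteration only reads the inputs and fills its own
    // list, so no synchronization is required.
    public static List<DuplicateGroup> Find(
        IReadOnlyList<Row> left,
        IReadOnlyList<Row> right,
        Func<string, string, float> similarity,   // assumed thread-safe
        float threshold)
    {
        return left
            .AsParallel()
            .AsOrdered()                          // keep results in input order
            .Select(p1 =>
            {
                var dups = new List<(Row Other, float Score)>();
                foreach (var p2 in right)
                {
                    if (p1.Key == p2.Key) continue;   // same primary key: skip
                    float score = string.Equals(p1.Text, p2.Text, StringComparison.OrdinalIgnoreCase)
                        ? 1f
                        : similarity(p1.Text, p2.Text);
                    if (score >= threshold) dups.Add((p2, score));
                }
                return new DuplicateGroup(p1, dups);
            })
            .Where(g => g.Duplicates.Count > 0)   // keep only rows with matches
            .ToList();
    }
}
```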

【Question comments】:

Can you post the code for d.Find? That's where the problem is.

Tags: c# multithreading

【Answer 1】:

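It's hard to tell what this is doing. I would try to solve the problem in a completely different way, perhaps with a carefully designed query or by reading all the data into memory, rather than trying to step through queries from multiple threads.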

【Comments】:

【Answer 2】:

My guess is that you are deadlocking in the business layer and/or taking table-level locks in the database.

Off the top of my head, some alternatives:

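Two DataReaders read side by side
A database cursor (this may be a legitimate use for one)
Comparing record counts from a series of INNER JOINs
A CTE (assuming you are using SQL Server)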
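Reading two DataReaders side by side usually means a merge walk: both queries are sorted by the same key, and the code advances whichever reader is behind. A minimal sketch of that walk, with two `IEnumerable<string>` sequences standing in for the readers (an assumption to keep it self-contained; `SideBySide` and `MatchingKeys` are hypothetical names). Each row is read once, so the cost is O(n + m) instead of O(n × m), but it only finds exact key matches, not the fuzzy similarity scores the question computes. It also assumes both inputs are sorted consistently with `string.CompareOrdinal`, which a database collation may not guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class SideBySide
{
    // Walks two sequences that are both sorted by key (as two readers over
    // "ORDER BY key" queries would be), advancing whichever side has the
    // smaller key, and collects the keys present on both sides.
    public static List<string> MatchingKeys(IEnumerable<string> left, IEnumerable<string> right)
    {
        var result = new List<string>();
        using var a = left.GetEnumerator();
        using var b = right.GetEnumerator();
        bool hasA = a.MoveNext(), hasB = b.MoveNext();
        while (hasA && hasB)
        {
            int cmp = string.CompareOrdinal(a.Current, b.Current);
            if (cmp == 0)
            {
                result.Add(a.Current);   // same key on both sides
                hasA = a.MoveNext();
                hasB = b.MoveNext();
            }
            else if (cmp < 0) hasA = a.MoveNext();   // left is behind
            else hasB = b.MoveNext();                // right is behind
        }
        return result;
    }
}
```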

If you're running your performance tests with only 500 records, I can't imagine any of these approaches taking minutes.
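For scale, the timings reported in the question work out as follows. This is simply the reported wall-clock time divided by the number of record pairs, ignoring the few pairs skipped by the primary-key check; even the single-threaded run spends roughly a millisecond per pair.

```latex
\begin{aligned}
\text{pairs} &= 500 \times 500 = 250\,000 \\
\text{1 thread:}\quad \frac{4 \times 60\ \text{s}}{250\,000} &= \frac{240\ \text{s}}{250\,000} = 0.96\ \text{ms per pair} \\
\text{4 threads:}\quad \frac{30 \times 60\ \text{s}}{250\,000} &= \frac{1800\ \text{s}}{250\,000} = 7.2\ \text{ms per pair (wall-clock)} \\
\text{slowdown} &= \frac{1800\ \text{s}}{240\ \text{s}} = 7.5
\end{aligned}
```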

【Comments】:

But none of the data is in the database any more; I've copied it into separate array lists. Would your guess still apply?