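【Posted】:2014-05-26 07:00:03
【Question】:
I'm trying to write some code that sifts records out of a list of roughly 200,000 to 1,000,000 records. Obviously, I want this to be as fast as possible. The basic idea is as follows: each record in the big list is a combination of numbers that must be kept together. For example: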
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400076,400097,800076,800097
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,200032,200078,500032,500078
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300043,300083,600043,600083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,600026,600077,900026,900077
0,0,0,0,0,0,0,0,0,0,0,0,0,0,100008,100028,400028,400056,600008,600056
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400042,400098,500042,500098
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,86,500015,500086
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400013,400076,800013,800076
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,700024,700083,900024,900083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100003,100047,800003,800047
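The maximum record length is 20, which is why the records are padded with zeros; let's not worry about those for now. I want to "fish out" records so that no number is ever seen twice. If a record contains a number that has already appeared, I can discard that record and not look at it any further. So I have to compile a list like the following: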
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400076,400097,800076,800097
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,200032,200078,500032,500078
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300043,300083,600043,600083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,600026,600077,900026,900077
0,0,0,0,0,0,0,0,0,0,0,0,0,0,100008,100028,400028,400056,600008,600056
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400042,400098,500042,500098
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,86,500015,500086
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,700024,700083,900024,900083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100003,100047,800003,800047
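Note the record numbers in the list above: record 8 is missing, because the number 400076 already appears in an earlier record (record 1).
The code I'm using to do this is as follows: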
void Make_List(ConfigList *pathgroups, ConfigList *configlist)
{
    int i,j,k,l,flag,pg_num=0,len,p_num=0;
    for(i = 0;i<configlist->num_total;i++)
    {
        flag = 0;
        for(j = configlist->configsize-1;j>=0;j--)
        {
            if(configlist->pathid[i][j])
            {
                for(k = 0;k<pg_num;k++)
                {
                    for(l = pathgroups->configsize-1;l>=0;l--)
                    {
                        if(pathgroups->pathid[k][l])
                        {
                            if(configlist->pathid[i][j]==pathgroups->pathid[k][l])
                            {
                                flag++;
                                break;
                            }
                        }
                        else
                        {
                            break;
                        }
                    }
                    if(flag)
                    {
                        break;
                    }
                }
            }
            else
            {
                break;
            }
            if(flag)
            {
                break;
            }
        }
        if(!flag)
        {
            len = 0;
            for(j = configlist->configsize-1;j>=0;j--)
            {
                pathgroups->pathid[pg_num][j]=configlist->pathid[i][j];
                if(configlist->pathid[i][j])
                {
                    len++;
                }
            }
            pg_num++;
            p_num+=len;
            if(p_num>=totpaths)
            {
                break;
            }
        }
    }
    Print_ConfigList(stderr,pathgroups);
}
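The ConfigList structure basically stores the 2D array, along with other things used in different parts of the program.
num_total gives the number of rows in the array, and configsize gives the number of columns.
totpaths is a cutoff that terminates the loop early once the allocation is complete.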
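For readers who want to compile the function above, here is a minimal sketch of what ConfigList might look like. The question does not show the real definition, so the layout below (an array of row pointers holding int values) and the helper names ConfigList_Create and ConfigList_Destroy are assumptions made purely for illustration; only the member names pathid, num_total and configsize come from the posted code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal layout; the real ConfigList holds other members too. */
typedef struct {
    int **pathid;     /* pathid[row][col]; 0 marks an unused (padding) slot */
    int   num_total;  /* number of rows */
    int   configsize; /* number of columns (20 in the sample data) */
} ConfigList;

/* Allocate a zero-filled num_total x configsize table.
 * Returns NULL if any allocation fails. (Hypothetical helper.) */
ConfigList *ConfigList_Create(int num_total, int configsize)
{
    assert(num_total > 0 && configsize > 0);

    ConfigList *cl = malloc(sizeof *cl);
    if (cl == NULL)
        return NULL;
    cl->num_total = num_total;
    cl->configsize = configsize;

    cl->pathid = malloc((size_t)num_total * sizeof *cl->pathid);
    if (cl->pathid == NULL) {
        free(cl);
        return NULL;
    }
    for (int i = 0; i < num_total; i++) {
        cl->pathid[i] = calloc((size_t)configsize, sizeof **cl->pathid);
        if (cl->pathid[i] == NULL) {
            /* Undo the rows allocated so far before giving up. */
            while (i-- > 0)
                free(cl->pathid[i]);
            free(cl->pathid);
            free(cl);
            return NULL;
        }
    }
    return cl;
}

/* Release everything ConfigList_Create allocated. (Hypothetical helper.) */
void ConfigList_Destroy(ConfigList *cl)
{
    if (cl == NULL)
        return;
    for (int i = 0; i < cl->num_total; i++)
        free(cl->pathid[i]);
    free(cl->pathid);
    free(cl);
}
```

With this layout, pathgroups would need at least as many rows as configlist, since in the worst case every record is kept. totpaths is used in the posted code without a declaration, so it is presumably a variable defined elsewhere in the program.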
【Comments】:
-
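@MBaas Sorry, I should have mentioned it. I'm using C.
-
The largest number I see is 900083. What is the maximum allowed number?
-
@user3386109 99999999 is the maximum allowed number.
-
Pablo beat me to it. Create an array of 100000000 bytes. Use memset to clear the array to 0. As you process the records, set the byte corresponding to each number. That makes it easy to check which numbers have already been used.
-
@user3386109 I don't understand what you're saying.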
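Since the byte-array suggestion in the comments caused some confusion, here is a rough sketch of how it could look. This is not the asker's code: the function name filter_records, the flat row-major arrays, and the max_value parameter are all assumptions chosen to keep the example self-contained. Zeros are treated as padding wherever they occur, which matches the sample data but is slightly looser than the original loop, which stops at the first zero when scanning from the right. The totpaths early exit is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy into `out` every row of `records` whose non-zero values have not
 * appeared in an earlier kept row. Both arrays are row-major with row_len
 * ints per row (hypothetical layout); `out` needs room for num_rows rows.
 * Returns the number of rows kept, or -1 if the lookup table cannot be
 * allocated or a value lies outside 0..max_value.
 */
long filter_records(const int *records, size_t num_rows, size_t row_len,
                    int max_value, int *out)
{
    assert(records != NULL && out != NULL && max_value >= 0);

    /* One byte per possible value: seen[v] != 0 means v is already used. */
    size_t table_size = (size_t)max_value + 1;
    unsigned char *seen = malloc(table_size);
    if (seen == NULL)
        return -1;
    memset(seen, 0, table_size);

    long kept = 0;
    for (size_t i = 0; i < num_rows; i++) {
        const int *row = records + i * row_len;
        int clash = 0;

        /* Pass 1: reject the row if any non-zero value is already marked. */
        for (size_t j = 0; j < row_len; j++) {
            int v = row[j];
            if (v < 0 || v > max_value) {
                free(seen);
                return -1;
            }
            if (v != 0 && seen[v]) {
                clash = 1;
                break;
            }
        }
        if (clash)
            continue;

        /* Pass 2: keep the row and mark each of its values as used. */
        for (size_t j = 0; j < row_len; j++) {
            if (row[j] != 0)
                seen[row[j]] = 1;
            out[(size_t)kept * row_len + j] = row[j];
        }
        kept++;
    }

    free(seen);
    return kept;
}
```

With max_value set to 99999999 the table takes 100,000,000 bytes, as the comment says. Each membership check is then a single array lookup, so the work grows with the total number of values rather than with the number of records already kept. If the memory is a concern, storing one bit per value instead of one byte would bring the table down to 12.5 MB at the cost of a shift and a mask per access.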
Tags: c arrays optimization bit-manipulation