[Posted]: 2014-09-05 21:11:12
[Question]:
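TCLAP is a templatized, header-only C++ library for parsing command-line arguments.
I'm using TCLAP to handle command-line arguments in a multithreaded program: the arguments are read in the main function, and then multiple threads are launched to work on the task defined by those arguments (a parameterized NLP task).
I have started displaying the number of words the threads process per second, and I found that if I hardcode the arguments in main instead of reading them from the command line with TCLAP, the throughput is 6 times higher!
I'm compiling with gcc and the -O2 flag, and (when not using TCLAP) I see roughly a 10x speedup over an unoptimized build, so it seems that using TCLAP somehow negates part of the benefit of the compiler optimizations.
Here is the main function, which is the only place I use TCLAP: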
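For reference, the throughput figure discussed here is simply words processed divided by elapsed wall-clock time. The following is a minimal sketch of that calculation using std::chrono; the function name words_per_second is hypothetical and is not taken from the original program, whose measurement code is not shown.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: words per second for `words` processed over
// `elapsed` wall-clock time. Returns 0 when no time has elapsed.
double words_per_second(uint64_t words,
                        std::chrono::steady_clock::duration elapsed) {
    const double seconds = std::chrono::duration<double>(elapsed).count();
    return seconds > 0.0 ? static_cast<double>(words) / seconds : 0.0;
}
```

Taking two std::chrono::steady_clock::now() readings around the training call and passing their difference as elapsed keeps the measurement independent of system clock adjustments.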
int main(int argc, char** argv)
{
    uint32_t mincount;
    uint32_t dim;
    uint32_t contexthalfwidth;
    uint32_t negsamples;
    uint32_t numthreads;
    uint32_t randomseed;
    string corpus_fname;
    string output_basefname;
    string vocab_fname;
    Eigen::initParallel();
    try {
        TCLAP::CmdLine cmd("Driver for various word embedding models", ' ', "0.1");
        TCLAP::ValueArg<uint32_t> dimArg("d","dimension","dimension of word representations",false,300,"uint32_t");
        TCLAP::ValueArg<uint32_t> mincountArg("m", "mincount", "required minimum occurrence count to be added to vocabulary",false,5,"uint32_t");
        TCLAP::ValueArg<uint32_t> contexthalfwidthArg("c", "contexthalfwidth", "half window size of a context frame",false,15,"uint32_t");
        TCLAP::ValueArg<uint32_t> numthreadsArg("t", "numthreads", "number of threads",false,12,"uint32_t");
        TCLAP::ValueArg<uint32_t> negsamplesArg("n", "negsamples", "number of negative samples for skipgram model",false,15,"uint32_t");
        TCLAP::ValueArg<uint32_t> randomseedArg("s", "randomseed", "seed for random number generator",false,2014,"uint32_t");
        TCLAP::UnlabeledValueArg<string> corpus_fnameArg("corpusfname", "file containing the training corpus, one paragraph or sentence per line", true, "corpus", "corpusfname");
        TCLAP::UnlabeledValueArg<string> output_basefnameArg("outputbasefname", "base filename for the learnt word embeddings", true, "wordreps-", "outputbasefname");
        TCLAP::ValueArg<string> vocab_fnameArg("v", "vocabfname", "filename for the vocabulary and word counts", false, "wordsandcounts.txt", "filename");
        cmd.add(dimArg);
        cmd.add(mincountArg);
        cmd.add(contexthalfwidthArg);
        cmd.add(numthreadsArg);
        cmd.add(randomseedArg);
        cmd.add(corpus_fnameArg);
        cmd.add(output_basefnameArg);
        cmd.add(vocab_fnameArg);
        cmd.parse(argc, argv);
        mincount = mincountArg.getValue();
        dim = dimArg.getValue();
        contexthalfwidth = contexthalfwidthArg.getValue();
        negsamples = negsamplesArg.getValue();
        numthreads = numthreadsArg.getValue();
        randomseed = randomseedArg.getValue();
        corpus_fname = corpus_fnameArg.getValue();
        output_basefname = output_basefnameArg.getValue();
        vocab_fname = vocab_fnameArg.getValue();
    }
    catch (TCLAP::ArgException &e) {};
    /*
    uint32_t mincount = 5;
    uint32_t dim = 50;
    uint32_t contexthalfwidth = 15;
    uint32_t negsamples = 15;
    uint32_t numthreads = 10;
    uint32_t randomseed = 2014;
    string corpus_fname = "imdbtrain.txt";
    string output_basefname = "wordreps-";
    string vocab_fname = "wordsandcounts.txt";
    */
    string test_fname = "imdbtest.txt";
    string output_fname = "parreps.txt";
    string countmat_fname = "counts.hdf5";
    Vocabulary * vocab;
    vocab = determineVocabulary(corpus_fname, mincount);
    vocab->dump(vocab_fname);
    Par2VecModel p2vm = Par2VecModel(corpus_fname, vocab, dim, contexthalfwidth, negsamples, randomseed);
    p2vm.learn(numthreads);
    p2vm.save(output_basefname);
    p2vm.learnparreps(test_fname, output_fname, numthreads);
}
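One detail visible in the listing above: the defaults given to the TCLAP arguments are not the same as the commented-out hardcoded values (dimension defaults to 300 versus dim = 50, and numthreads to 12 versus 10). Unless those options are passed explicitly on the command line, the two runs may therefore not be doing the same amount of work. The following is a minimal sketch, not part of the original program, that makes such a mismatch explicit. RunParams, differing_fields, tclap_defaults and hardcoded_values are hypothetical names introduced here; only the numeric values are taken from the listing.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the numeric parameters read in main().
struct RunParams {
    uint32_t mincount;
    uint32_t dim;
    uint32_t contexthalfwidth;
    uint32_t negsamples;
    uint32_t numthreads;
    uint32_t randomseed;
};

// Returns the names of the fields whose values differ between two runs.
std::vector<std::string> differing_fields(const RunParams& a, const RunParams& b) {
    std::vector<std::string> out;
    if (a.mincount != b.mincount) out.push_back("mincount");
    if (a.dim != b.dim) out.push_back("dim");
    if (a.contexthalfwidth != b.contexthalfwidth) out.push_back("contexthalfwidth");
    if (a.negsamples != b.negsamples) out.push_back("negsamples");
    if (a.numthreads != b.numthreads) out.push_back("numthreads");
    if (a.randomseed != b.randomseed) out.push_back("randomseed");
    return out;
}

// Defaults declared in the TCLAP::ValueArg constructors in the listing.
RunParams tclap_defaults() { return RunParams{5, 300, 15, 15, 12, 2014}; }

// Values from the commented-out hardcoded block in the listing.
RunParams hardcoded_values() { return RunParams{5, 50, 15, 15, 10, 2014}; }
```

Calling differing_fields(tclap_defaults(), hardcoded_values()) reports dim and numthreads. Printing the parsed values to stderr just before training starts would serve the same purpose when comparing the two builds.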
The only place that uses multithreading is the Par2VecModel::learn function:
void Par2VecModel::learn(uint32_t numthreads) {
    thread* workers;
    workers = new thread[numthreads];
    uint64_t numwords = 0;
    bool killflag = 0;
    uint32_t randseed;
    ifstream filein(corpus_fname.c_str(), ifstream::ate | ifstream::binary);
    uint64_t filesize = filein.tellg();
    fprintf(stderr, "Total number of in vocab words to train over: %u\n", vocab->gettotalinvocabwords());
    for(uint32_t idx = 0; idx < numthreads; idx++) {
        randseed = eng();
        workers[idx] = thread(skipgram_training_thread, this, numthreads, idx, filesize, randseed, std::ref(numwords));
    }
    thread monitor(monitor_training_thread, this, numthreads, std::ref(numwords), std::ref(killflag));
    for(uint32_t idx = 0; idx < numthreads; idx++)
        workers[idx].join();
    killflag = true;
    monitor.join();
}
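Independently of the TCLAP question, the learn function above hands numwords and killflag to several threads through std::ref on a plain uint64_t and bool. The bodies of skipgram_training_thread and monitor_training_thread are not shown, so how they access these variables is unknown. The sketch below only illustrates one common way to share such a counter and stop flag, using std::atomic and a std::vector<std::thread> in place of new thread[]. The names worker, monitor and run_workers are hypothetical stand-ins, not the author's functions.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a training worker: "processes" a fixed number
// of words and bumps the shared counter once per word. A real worker
// might batch its updates to reduce contention on the counter.
void worker(uint64_t words_to_process, std::atomic<uint64_t>& numwords) {
    for (uint64_t i = 0; i < words_to_process; ++i)
        numwords.fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical stand-in for a monitor: polls the counter until told to
// stop, then records one final reading.
void monitor(const std::atomic<uint64_t>& numwords,
             const std::atomic<bool>& killflag,
             std::atomic<uint64_t>& last_seen) {
    while (!killflag.load()) {
        last_seen.store(numwords.load());
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    last_seen.store(numwords.load());
}

// Launches the workers and the monitor, joins them in the same order as
// the listing, and returns the monitor's final reading of the counter.
uint64_t run_workers(uint32_t numthreads, uint64_t words_per_thread) {
    std::atomic<uint64_t> numwords{0};
    std::atomic<bool> killflag{false};
    std::atomic<uint64_t> last_seen{0};

    std::vector<std::thread> workers;
    workers.reserve(numthreads);
    for (uint32_t idx = 0; idx < numthreads; ++idx)
        workers.emplace_back(worker, words_per_thread, std::ref(numwords));

    std::thread mon(monitor, std::cref(numwords), std::cref(killflag),
                    std::ref(last_seen));
    for (auto& w : workers)
        w.join();
    killflag.store(true);
    mon.join();
    return last_seen.load();
}
```

With std::atomic the monitor is guaranteed to observe the stop flag, and the vector releases the thread objects automatically, whereas the array allocated with new in the listing is never deleted.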
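This part doesn't involve TCLAP at all, so what is going on? (I'm also using C++11 features, so I compile with the -std=c++11 flag, in case that makes a difference.)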
[Comments]:
- Without seeing any of your code, this is impossible to answer.
Tags: c++ multithreading compiler-optimization