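【Posted】: 2014-12-11 11:34:36
【Question】:
I am trying to run topic analysis on a set of documents using the latest version of Mahout.
The topic-to-term mapping output is correct: each topic has a list of terms with their corresponding probabilities.
But when I try to get the document-to-topic mapping, it only shows a set of topics whose names start with one particular letter. In this case, all of the topics start with the letter "a".
Below is the sample code used to generate the document-to-topic mapping: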
VectorDumper.main(new String[] {
    "-i", inputDocTopicsDir,
    "-o", oututDocTopicsDir,
    "-d", inputDictionaryDir,
    "-dt", "sequencefile",
    "-sort", "true",
    "-vs", "10"
});
Sample output: {2D:0.019996671414880783,3d:0.019994853350969108,4d:0.02000171234917903,5d:0.019994290328033588,a.config:0.01999309367417373,又名:0.02000227944902019,a.system:0.01999771644223781,AAA:0.020003361639812457,AAM:0.019990182999365072,AAPM:0.020012465032122083,AAPV:0.01999879522431889,AAR:0.019995543474585993,AAS:0.019995157547471696,AAV:0.02000267326012652,AB:0.020025978185034182,ABA:0.01999553819903237,放弃:0.020013355238553677,弃:0.01999559962237951,遗弃:0.019994194616256,退让:0.02001433184497984,减污:0.01997728075793184,abberationa:0.020001189392395737}
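One possible explanation, offered as an assumption and not confirmed anywhere in this post: the entries of a document-topic vector are indexed by topic number, so if the dumper labels each index with the dictionary term that has the same index, the labels would be the first few dictionary terms, which would all sit at the start of the alphabet if the dictionary happens to be ordered alphabetically. The sketch below is plain Java, does not call Mahout, and uses a hypothetical helper named `label` with made-up data purely to illustrate that effect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: this is NOT Mahout code. The class and method names are hypothetical.
class DocTopicLabelSketch {

    // Labels each vector index with the dictionary entry at the same index.
    // When no dictionary is supplied, the bare index is used as the label.
    static Map<String, Double> label(double[] docTopics, String[] dictionary) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (int i = 0; i < docTopics.length; i++) {
            String key = (dictionary != null && i < dictionary.length)
                    ? dictionary[i]
                    : Integer.toString(i);
            out.put(key, docTopics[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up document-topic weights for three topics (indices 0, 1, 2).
        double[] docTopics = {0.5, 0.3, 0.2};
        // Made-up term dictionary, assumed to be in alphabetical order.
        String[] dictionary = {"2d", "3d", "a.config", "zebra"};

        // With a dictionary, topic indices get mislabeled as the first terms.
        System.out.println(label(docTopics, dictionary));
        // Without a dictionary, the labels are the topic indices themselves.
        System.out.println(label(docTopics, null));
    }
}
```

If this assumption holds, running the document-topic dump without the "-d" and "-dt" arguments might print topic indices in place of terms. I have not verified that against Mahout.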
【Discussion】:
Tags: java hadoop cluster-analysis mahout