【Posted】: 2013-10-25 15:28:07
【Question】:
I want to create a document-term matrix using base R only (no additional packages such as tm). The data looks like this:
Doc1: the test was to test the test
Doc2: we did prepare the exam to test the exam
Doc3: was the test the exam
Doc4: the exam we did prepare was to test the test
Doc5: we were successful so we all passed the exam
The result I want to end up with is the following:
  Term   Doc1 Doc2 Doc3 Doc4 Doc5 DF
1 all       0    0    0    0    1  1
2 did       0    1    0    1    0  2
3 exam      0    2    1    1    1  4
4 passed    0    0    0    0    1  1
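【Comments】:
- You could look at the source code of the tm package... and rewrite it... Why don't you want to use a ready-made tool?
- I would start by looking at the source code of the functions that already exist.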
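For reference, one way to build the table from the question with base R alone might look like the sketch below. It is not taken from tm; the variable names (docs, tokens, terms, counts, dtm) are illustrative assumptions, and it assumes that words are separated by whitespace and that DF means the number of documents containing the term.

```r
# The five example documents from the question, as a named character vector.
docs <- c(
  Doc1 = "the test was to test the test",
  Doc2 = "we did prepare the exam to test the exam",
  Doc3 = "was the test the exam",
  Doc4 = "the exam we did prepare was to test the test",
  Doc5 = "we were successful so we all passed the exam"
)

# Assumption: tokens are lower-cased words separated by runs of whitespace.
tokens <- strsplit(tolower(docs), "\\s+")
names(tokens) <- names(docs)

# Sorted vocabulary across all documents.
terms <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every document.
# match() maps words to vocabulary positions; tabulate() counts them,
# keeping zeros for terms that do not occur in a document.
counts <- vapply(
  tokens,
  function(words) tabulate(match(words, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(counts) <- terms

# Assumption: DF is the number of documents in which a term occurs at least once.
dtm <- data.frame(
  Term = terms,
  counts,
  DF = rowSums(counts > 0),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(dtm)
```

The rows for all, did, exam and passed should correspond to the four rows shown in the question. Punctuation handling and stop-word removal are left out on purpose, since the example data does not need them.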