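【Posted】:2016-01-27 16:57:14
【Question】:
I have Java code that extracts the unique words from a string containing multiple sentences and counts how many times each word occurs in each sentence.
This is the Java code I use for that. Alternatively, you can try it here.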
import java.util.*;

class Main {
    public static void main(String[] args) {
        String someText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
        List<List<String>> sort = new ArrayList<>();
        Map<String, ArrayList<Integer>> res = new HashMap<>();
        for (String sentence : someText.split("[.?!]\\s*")) {
            sort.add(Arrays.asList(sentence.split("[ ,;:]+"))); // put each sentence's words in a list
        }
        int sentenceCount = sort.size();
        // give every word a list of zeros, one slot per sentence
        for (List<String> sentence : sort) {
            sentence.stream().forEach(s -> res.put(s, new ArrayList<Integer>(Collections.nCopies(sentenceCount, 0))));
        }
        // count each word in the slot of the sentence it appears in
        int index = 0;
        for (List<String> sentence : sort) {
            for (String s : sentence) {
                res.get(s).set(index, res.get(s).get(index) + 1);
            }
            index++;
        }
        System.out.println(res);
    }
}
The output of the code looks like this:
{standard=[0, 1, 0, 0], but=[0, 0, 1, 0], ..... }
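This means the word "standard" does not appear in sentence 1, appears once in sentence 2, and does not appear in sentences 3 and 4.
However, the data is stored in lists. How can I convert it into a 2D matrix, so that it looks something like this: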
double[][] multi = new double[][]{
    { 0, 1, 0, 0 },
    { 0, 0, 1, 0 },
    { 0, 1, 0, 0 },
    { 0, 0, 1, 0 },
    { 0, 0, 1, 0 } }; // data stored in a 2D array named multi
Any help with this is appreciated. Thanks.
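For illustration, here is a minimal sketch of the conversion being asked about, assuming every list in the map has exactly one entry per sentence (as in the code above). The class name `MapToMatrix` and the helper `toMatrix` are hypothetical names introduced for this example; they are not part of the original code.

```java
import java.util.*;

class MapToMatrix {
    // Hypothetical helper: copies each word's per-sentence counts into one matrix row.
    // Assumes every list holds exactly sentenceCount entries.
    static double[][] toMatrix(Map<String, ? extends List<Integer>> counts, int sentenceCount) {
        double[][] matrix = new double[counts.size()][sentenceCount];
        int row = 0;
        for (List<Integer> perSentence : counts.values()) {
            for (int col = 0; col < sentenceCount; col++) {
                matrix[row][col] = perSentence.get(col); // Integer is unboxed and widened to double
            }
            row++;
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Two entries taken from the sample output in the question
        Map<String, ArrayList<Integer>> res = new HashMap<>();
        res.put("standard", new ArrayList<>(Arrays.asList(0, 1, 0, 0)));
        res.put("but", new ArrayList<>(Arrays.asList(0, 0, 1, 0)));

        double[][] multi = toMatrix(res, 4);
        // Row order follows the map's iteration order, which HashMap does not specify
        System.out.println(Arrays.deepToString(multi));
    }
}
```

With the code from the question, the call would be `toMatrix(res, sentenceCount)`. The rows come out in whatever order the map iterates, so the matrix alone does not say which row belongs to which word.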
【Comments】:
-
Keep in mind that this loses the information about which row corresponds to which word. HashMap does not guarantee any iteration order.
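One possible way to address the concern in this comment is to keep a parallel array of row labels next to the matrix and to fix the row order by copying the map into a `TreeMap`. The sketch below assumes alphabetical order is acceptable; the class `LabeledMatrix` and its fields are hypothetical names made up for this example.

```java
import java.util.*;

class LabeledMatrix {
    final String[] words;    // words[i] is the word counted in row i
    final double[][] counts; // counts[i][j] = occurrences of words[i] in sentence j

    LabeledMatrix(String[] words, double[][] counts) {
        this.words = words;
        this.counts = counts;
    }

    // Hypothetical factory: sorts the words so the row order is deterministic
    static LabeledMatrix from(Map<String, ? extends List<Integer>> res) {
        TreeMap<String, List<Integer>> sorted = new TreeMap<String, List<Integer>>(res);
        String[] words = new String[sorted.size()];
        double[][] counts = new double[sorted.size()][];
        int row = 0;
        for (Map.Entry<String, List<Integer>> e : sorted.entrySet()) {
            words[row] = e.getKey();
            List<Integer> perSentence = e.getValue();
            counts[row] = new double[perSentence.size()];
            for (int col = 0; col < perSentence.size(); col++) {
                counts[row][col] = perSentence.get(col);
            }
            row++;
        }
        return new LabeledMatrix(words, counts);
    }

    public static void main(String[] args) {
        Map<String, ArrayList<Integer>> res = new HashMap<>();
        res.put("standard", new ArrayList<>(Arrays.asList(0, 1, 0, 0)));
        res.put("but", new ArrayList<>(Arrays.asList(0, 0, 1, 0)));

        LabeledMatrix m = LabeledMatrix.from(res);
        System.out.println(Arrays.toString(m.words));      // [but, standard]
        System.out.println(Arrays.deepToString(m.counts)); // [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    }
}
```

If the order of first appearance in the text matters more than alphabetical order, declaring `res` as a `LinkedHashMap` in the original code would preserve insertion order instead.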
Tags: java arrays matrix multidimensional-array