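【Posted】: 2015-11-23 23:30:33
【Question】:
I am working on a program that builds a word cloud from what the candidates said in a presidential debate. The text file is laid out so that one person's turn can span multiple lines, and I want to capture all of those lines so that I can count the frequency of the words they said. There is also a list of stop words that are not counted in the word cloud; some examples are "is", "a", and "the". So far I can read in all the stop words and the entire debate transcript, and remove the stop words from the transcript. Now I want to split the transcript by what each candidate said, but I am having trouble because one person's turn spans multiple lines. Some help would be appreciated.
Code so far: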
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class ResendizYonzon {
    public static void main(String[] args) throws FileNotFoundException
    {
        readTextFile("democratic-debate2015Oct13.txt");
    }

    public static String readTextFile(String text) throws FileNotFoundException {
        File f = new File(text);
        Scanner out = new Scanner(f);
        String word = "";
        File f1 = new File("stopwords.txt");
        Scanner out1 = new Scanner(f1);
        ArrayList<String> stopWords = new ArrayList<String>();
        ArrayList<String> words = new ArrayList<String>();
        while (out1.hasNext()) {
            stopWords.add(out1.next());
        }
        while (out.hasNext()) {
            words.add(out.next());
        }
        words.removeAll(stopWords);
        out.close();
        out1.close();
        return word;
    }
}
Transcript snippet:
CLINTON: No. I think that, like most people that I know, I have a range of views, but they are rooted in my values and my experience. And I don't take a back seat to anyone when it comes to progressive experience and progressive commitment.
You know, when I left law school, my first job was with the Children's Defense Fund, and for all the years since, I have been focused on how we're going to un-stack the deck, and how we're gonna make it possible for more people to have the experience I had.
You know, to be able to come from a grandfather who was a factory worker, a father who was a small business person, and now asking the people of America to elect me president.
COOPER: Just for the record, are you a progressive, or are you a moderate?
CLINTON: I'm a progressive. But I'm a progressive who likes to get things done. And I know...
(APPLAUSE)
...how to find common ground, and I know how to stand my ground, and I have proved that in every position that I've had, even dealing with Republicans who never had a good word to say about me, honestly. But we found ways to work together on everything from...
COOPER: Secretary...
CLINTON: ...reforming foster care and adoption to the Children's Health Insurance Program, which insures...
COOPER: ...thank you...
CLINTON: ...8 million kids. So I have a long history of getting things done, rooted in the same values...
COOPER: ...Senator...
CLINTON: ...I've always had.
COOPER: Senator Sanders. A Gallup poll says half the country would not put a socialist in the White House. You call yourself a democratic socialist. How can any kind of socialist win a general election in the United States?
SANDERS: Well, we're gonna win because first, we're gonna explain what democratic socialism is.
And what democratic socialism is about is saying that it is immoral and wrong that the top one-tenth of 1 percent in this country own almost 90 percent - almost - own almost as much wealth as the bottom 90 percent. That it is wrong, today, in a rigged economy, that 57 percent of all new income is going to the top 1 percent.
That when you look around the world, you see every other major country providing health care to all people as a right, except the United States. You see every other major country saying to moms that, when you have a baby, we're not gonna separate you from your newborn baby, because we are going to have - we are gonna have medical and family paid leave, like every other country on Earth.
Those are some of the principles that I believe in, and I think we should look to countries like Denmark, like Sweden and Norway, and learn from what they have accomplished for their working people.
(APPLAUSE)
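【Discussion】:
-
What problem are you facing? Are there any errors?
-
No errors, I am just trying to figure out how to pick up everything that one particular candidate said. For example, if the user wants Hillary, then I need only everything Hillary said.
-
Hmm, sorry, but I can't believe you don't know how to solve this. How do you know which text comes from Clinton? Hint: when the speaker changes, the line starts with the speaker's name. First identify the speaker's name. Then all text after one name, up to the next name or the end of the file, belongs to that speaker.
-
I know. If everything were on one line, it would be easy. But one person speaks multiple lines, and I am having trouble figuring out how to collect all of those lines.
-
It may be better to read the text line by line. Use a BufferedReader with its readLine method instead of a Scanner. Then split the line on whitespace and look at the first word. If it is all uppercase and ends with a colon, it is a speaker name, and the words that follow belong to that speaker. Read the next line. If its first word is not a speaker name, add the words to the most recent speaker, and so on.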
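The line-by-line approach described in the last comment could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `SpeakerGrouper` and the methods `isSpeakerLabel` and `groupBySpeaker` are made up for the example, and it assumes that speaker labels are a single upper-case token ending in a colon (apostrophes and hyphens allowed, e.g. `O'MALLEY:`) and that lines wrapped entirely in parentheses, such as `(APPLAUSE)`, are stage directions to be skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpeakerGrouper {

    // Assumption: a speaker label is one upper-case token followed by a colon,
    // e.g. "CLINTON:" or "O'MALLEY:".
    public static boolean isSpeakerLabel(String token) {
        return token.matches("[A-Z][A-Z'-]*:");
    }

    // Reads the transcript line by line and collects each speaker's words in order.
    // Lines without a leading label are attributed to the most recent speaker.
    public static Map<String, List<String>> groupBySpeaker(Reader source) {
        Map<String, List<String>> wordsBySpeaker = new LinkedHashMap<>();
        String current = null; // no speaker seen yet
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                // Assumption: blank lines and stage directions like "(APPLAUSE)" are skipped.
                if (line.isEmpty() || (line.startsWith("(") && line.endsWith(")"))) {
                    continue;
                }
                List<String> tokens = new ArrayList<>(Arrays.asList(line.split("\\s+")));
                if (isSpeakerLabel(tokens.get(0))) {
                    String label = tokens.remove(0);
                    current = label.substring(0, label.length() - 1); // drop the colon
                }
                // Words that appear before any speaker label are ignored.
                if (current != null) {
                    wordsBySpeaker.computeIfAbsent(current, k -> new ArrayList<>()).addAll(tokens);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wordsBySpeaker;
    }

    public static void main(String[] args) {
        String sample = "CLINTON: I'm a progressive.\n"
                + "(APPLAUSE)\n"
                + "...how to find common ground\n"
                + "COOPER: Senator Sanders.\n"
                + "CLINTON: ...I've always had.\n";
        Map<String, List<String>> result = groupBySpeaker(new StringReader(sample));
        System.out.println(result.get("CLINTON"));
        System.out.println(result.get("COOPER"));
    }
}
```

To use it on the real transcript, a `java.io.FileReader` for `democratic-debate2015Oct13.txt` could be passed in place of the `StringReader`. The words keep their punctuation and original case, so they would probably need normalizing before the stop words are removed and the frequencies counted.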