【Question title】: Select multiple lines from text file
【Posted】: 2015-11-23 23:30:33
【Question description】:

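I'm working on a program that builds a word cloud from what the candidates said in a presidential debate. The text file is laid out so that one person can speak for several lines, and I want to capture all of those lines so I can count how often they say each word. There is also a list of stop words that are not counted in the word cloud. Some examples of stop words are "is", "a", "the", etc. So far I have been able to read in all the stop words and the whole debate transcript, and remove the stop words from the transcript. Now I want to split the transcript into what each candidate said, but I'm having trouble because one person's turn spans multiple lines. Some help would be greatly appreciated.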

Code so far:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class ResendizYonzon {
   public static void main(String[] args) throws FileNotFoundException
   {
       readTextFile("democratic-debate2015Oct13.txt");
   }

   public static String readTextFile(String text) throws FileNotFoundException {
       File f = new File(text);
       Scanner out = new Scanner(f);
       String word = "";
       File f1 = new File("stopwords.txt");
       Scanner out1 = new Scanner(f1);
       ArrayList<String> stopWords = new ArrayList<String>();
       ArrayList<String> words = new ArrayList<String>();
       while (out1.hasNext()) {
           stopWords.add(out1.next());
       }
       while (out.hasNext()) {
           words.add(out.next());
       }
       words.removeAll(stopWords);
       out.close();
       out1.close();
       return word;
   }
}

Transcript snippet:

CLINTON:  No.  I think that, like most people that I know, I have a range of views, but they are rooted in my values and my experience. And I don't take a back seat to anyone when it comes to progressive experience and progressive commitment.
You know, when I left law school, my first job was with the Children's Defense Fund, and for all the years since, I have been focused on how we're going to un-stack the deck, and how we're gonna make it possible for more people to have the experience I had.
You know, to be able to come from a grandfather who was a factory worker, a father who was a small business person, and now asking the people of America to elect me president.
COOPER:  Just for the record, are you a progressive, or are you a moderate?
CLINTON:  I'm a progressive.  But I'm a progressive who likes to get things done.  And I know...
(APPLAUSE)
...how to find common ground, and I know how to stand my ground, and I have proved that in every position that I've had, even dealing with Republicans who never had a good word to say about me, honestly. But we found ways to work together on everything from...
COOPER:  Secretary...
CLINTON:  ...reforming foster care and adoption to the Children's Health Insurance Program, which insures...
COOPER:  ...thank you...
CLINTON:  ...8 million kids.  So I have a long history of getting things done, rooted in the same values...
COOPER:  ...Senator...
CLINTON:  ...I've always had.
COOPER:  Senator Sanders.  A Gallup poll says half the country would not put a socialist in the White House.  You call yourself a democratic socialist.  How can any kind of socialist win a general election in the United States?
SANDERS:  Well, we're gonna win because first, we're gonna explain what democratic socialism is.
And what democratic socialism is about  is saying that it is immoral and wrong that the top one-tenth of 1 percent in this country own almost 90 percent - almost - own almost as much wealth as the bottom 90 percent.  That it is wrong, today, in a rigged economy, that 57 percent of all new income is going to the top 1 percent.
That when you look around the world, you see every other major country providing health care to all people as a right, except the United States.  You see every other major country saying to moms that, when you have a baby, we're not gonna separate you from your newborn baby, because we are going to have - we are gonna have medical and family paid leave, like every other country on Earth.
Those are some of the principles that I believe in, and I think we should look to countries like Denmark, like Sweden and Norway, and learn from what they have accomplished for their working people.
(APPLAUSE)

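【Question comments】:

  • What problem are you facing? Are there any errors?
  • No errors, I'm just trying to figure out how to get everything one particular candidate says. For example, if the user wants Hillary, then I need only everything Hillary said.
  • Hmm, sorry, but I can't believe you don't know how to solve this. How do you know which text comes from Clinton? Hint: when the speaker changes, the line starts with the speaker's name. So first identify the speaker's name. Then every piece of text after a name, up to the next name or the end of the file, belongs to that speaker.
  • I know that. If everything were on one line, it would be easy. But one person speaks multiple lines, and I'm having trouble figuring out how to collect all of those lines.
  • It may be better to read the text line by line. Use a BufferedReader with its readLine method instead of a Scanner. Then split the line on whitespace and look at the first word. If it is all uppercase and ends with a colon, it is a speaker name. The words that follow belong to that speaker. Read the next line. If its first word is not a speaker name, add the words to the last speaker, and so on.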
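
The first-word check described in the last comment could look roughly like the sketch below. The names `SpeakerTagSketch`, `isSpeakerTag` and `firstWord` are made up for illustration, and the sketch assumes a speaker tag is a single all-uppercase word immediately followed by a colon, as in the transcript snippet.

```java
class SpeakerTagSketch {
    // Hypothetical helper: true if the token looks like "CLINTON:",
    // i.e. one or more uppercase letters followed directly by a colon.
    static boolean isSpeakerTag(String token) {
        return token.matches("[A-Z]+:");
    }

    // Returns the first whitespace-separated word of a line, or "" for a blank line.
    static String firstWord(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return trimmed.split("\\s+")[0];
    }

    public static void main(String[] args) {
        String[] lines = {
            "COOPER:  Secretary...",
            "You know, when I left law school, my first job was with the Children's Defense Fund",
            "(APPLAUSE)"
        };
        // Print whether each sample line starts a new speaker's turn.
        for (String line : lines) {
            System.out.println(isSpeakerTag(firstWord(line)) + " <- " + line);
        }
    }
}
```

Lines for which the check is false would be treated as a continuation of whoever spoke last.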

Tags: java text file-io


【Solution 1】:

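Based on your description of the problem in the question and the comments on it, you want to concatenate everything a given speaker says into one string, so that when the user asks for a speaker, e.g. CLINTON, you can simply hand back CLINTON's part of the debate.

This is easy to achieve. How? The colon (:) is your ticket to solving this problem. If you look at your input file, whenever there is a new speaker, the line begins with the speaker's name followed by a colon.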

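What you need to do is the following list of operations:

  • Open the file
  • Read the file line by line
  • Check whether each line contains a colon (:)
  • If the line contains a colon, split the line using the colon as the delimiter
  • Assuming there is no colon anywhere else, splitting the following line

    COOPER:  Just for the record, are you a progressive, or are you a moderate?

gives you the following tokens (assuming no colon anywhere else)

Token[0] = COOPER

Token[1] = Just for the record, are you a progressive, or are you a moderate?

  • Now that you have the two tokens, check whether you have only just started reading the transcript file or already have a speaker
  • If you have just started reading the file, this is your first speaker, so record him and initialize the variables
  • If there is already a previous speaker, add that previous speaker (if not already added) and update his/her speech before reading the speech of the new (or returning) speaker.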

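If you keep following the steps above, then every time you encounter a speaker you update the previous speaker's speech and store it in the hash map, until you reach the end of the file.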
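
The steps above rely on a colon appearing only after a speaker's name. As a hedged variation, the sketch below matches an all-uppercase name at the very start of the line instead, so a colon inside a sentence does not start a new speaker, and it skips stage directions such as (APPLAUSE). The class name `SpeakerGrouper`, the method `group`, the regular expressions, and the shortened sample lines are all assumptions made for illustration, not part of the original program.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpeakerGrouper {
    // Assumed tag format: an all-uppercase name at the start of the line, then a colon.
    private static final Pattern SPEAKER_LINE = Pattern.compile("^([A-Z]+):\\s*(.*)$");

    // Groups transcript lines by speaker, joining each speaker's lines with a single space.
    static Map<String, StringBuilder> group(List<String> lines) {
        Map<String, StringBuilder> bySpeaker = new LinkedHashMap<>();
        String current = null;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and stage directions such as "(APPLAUSE)".
            if (line.isEmpty() || line.matches("\\([A-Z ]+\\)")) {
                continue;
            }
            String text = line;
            Matcher m = SPEAKER_LINE.matcher(line);
            if (m.matches()) {
                current = m.group(1); // a new turn starts on this line
                text = m.group(2);    // the words after the colon
            }
            if (current == null) {
                continue; // text before the first speaker tag is ignored
            }
            StringBuilder speech = bySpeaker.computeIfAbsent(current, k -> new StringBuilder());
            if (speech.length() > 0) {
                speech.append(' ');
            }
            speech.append(text);
        }
        return bySpeaker;
    }

    public static void main(String[] args) {
        // Made-up sample lines, loosely modeled on the transcript.
        List<String> lines = Arrays.asList(
            "CLINTON:  I'm a progressive.  And I know...",
            "(APPLAUSE)",
            "...how to find common ground: honestly.",
            "COOPER:  Secretary...",
            "CLINTON:  ...reforming foster care.");
        group(lines).forEach((speaker, speech) -> System.out.println(speaker + " -> " + speech));
    }
}
```

A `LinkedHashMap` is used here only so that speakers print in the order they first appear; a plain `HashMap` would work as well if order does not matter.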

Below is sample code for the steps above, fully commented to help you understand it.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

// The class name is a placeholder; the original snippet did not show the enclosing class.
public class TranscriptReader {

    // Map from speaker name to everything that speaker has said so far.
    public static Map<String, String> speakerSpeech = new HashMap<String, String>();

    public static void main(String[] args) {
        readTextFile("C:\\test_java\\transcript.txt");
    }

    public static void readTextFile(String text) {
        String line;

        // try-with-resources opens a buffered reader on the given path and closes it automatically
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(text)))) {

            // StringBuilder collects the lines of the current speaker's turn (String is immutable)
            StringBuilder speech = new StringBuilder();

            // currentSpeaker remembers who is talking, so that when a new speaker is found
            // we know whose speech we have been reading and can save it
            String currentSpeaker = null;

            // readLine() returns null at the end of the file, which ends the loop
            while ((line = br.readLine()) != null) {

                // a line containing ':' starts a new turn, based on the structure of the input file
                if (line.contains(":")) {
                    // split at the first colon only: chunks[0] is the speaker, chunks[1] the sentence
                    String[] chunks = line.split(":", 2);

                    // the speaker name (e.g. CLINTON) is left of the colon; trim surrounding whitespace
                    String speakerName = chunks[0].trim();

                    if (currentSpeaker == null) {
                        // we have just started reading the transcript: this is the first speaker
                        currentSpeaker = speakerName;

                        // append the rest of the line after the colon
                        speech.append(chunks[1]);
                    } else {
                        // a previous speaker's turn just ended, so store it before starting the new one
                        if (speakerSpeech.containsKey(currentSpeaker)) {
                            // speaker is already in the map: join the earlier speech with this turn
                            String previousSpeech = speakerSpeech.get(currentSpeaker);
                            speakerSpeech.put(currentSpeaker, previousSpeech + " >>> " + speech.toString());
                        } else {
                            // speaker is new: add to the map with this turn
                            speakerSpeech.put(currentSpeaker, speech.toString());
                        }

                        // the speaker on this line becomes the current speaker
                        currentSpeaker = speakerName;

                        // start a fresh buffer with the text after the colon
                        speech = new StringBuilder(chunks[1]);
                    }
                } else {
                    // no colon: this line continues the current speaker's turn
                    // (appended as-is, without a separator between lines)
                    speech.append(line);
                }
            }

            // the loop ends at end of file, so the last speaker's turn has to be stored here
            if (currentSpeaker != null) {
                if (speakerSpeech.containsKey(currentSpeaker)) {
                    String previousSpeech = speakerSpeech.get(currentSpeaker);
                    speakerSpeech.put(currentSpeaker, previousSpeech + " >>> " + speech.toString());
                } else {
                    speakerSpeech.put(currentSpeaker, speech.toString());
                }
            }

            System.out.println("No. of speakers: " + speakerSpeech.size());
        } catch (IOException ex) {
            // report the problem instead of silently swallowing it
            ex.printStackTrace();
        }

        // all speakers, each with their speech as one giant string
        System.out.println(speakerSpeech.toString());
    }
}

Executing the above gives the following output:

{COOPER=  Just for the record, are you a progressive, or are you a moderate? >>>   Secretary... >>>   ...thank you... >>>   ...Senator... >>>   Senator Sanders.  A Gallup poll says half the country would not put a socialist in the White House.  You call yourself a democratic socialist.  How can any kind of socialist win a general election in the United States?, SANDERS=  Well, we're gonna win because first, we're gonna explain what democratic socialism is.And what democratic socialism is about  is saying that it is immoral and wrong that the top one-tenth of 1 percent in this country own almost 90 percent - almost - own almost as much wealth as the bottom 90 percent.  That it is wrong, today, in a rigged economy, that 57 percent of all new income is going to the top 1 percent.That when you look around the world, you see every other major country providing health care to all people as a right, except the United States.  You see every other major country saying to moms that, when you have a baby, we're not gonna separate you from your newborn baby, because we are going to have - we are gonna have medical and family paid leave, like every other country on Earth.Those are some of the principles that I believe in, and I think we should look to countries like Denmark, like Sweden and Norway, and learn from what they have accomplished for their working people.(APPLAUSE), CLINTON=  No.  I think that, like most people that I know, I have a range of views, but they are rooted in my values and my experience. 
And I don't take a back seat to anyone when it comes to progressive experience and progressive commitment.You know, when I left law school, my first job was with the Children's Defense Fund, and for all the years since, I have been focused on how we're going to un-stack the deck, and how we're gonna make it possible for more people to have the experience I had.You know, to be able to come from a grandfather who was a factory worker, a father who was a small business person, and now asking the people of America to elect me president. >>>   I'm a progressive.  But I'm a progressive who likes to get things done.  And I know...(APPLAUSE)...how to find common ground, and I know how to stand my ground, and I have proved that in every position that I've had, even dealing with Republicans who never had a good word to say about me, honestly. But we found ways to work together on everything from... >>>   ...reforming foster care and adoption to the Children's Health Insurance Program, which insures... >>>   ...8 million kids.  So I have a long history of getting things done, rooted in the same values... >>>   ...I've always had.}
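
With one string per speaker in the map, the word counts for a single candidate's word cloud could be computed along the lines of the sketch below. The names `WordFrequency` and `countWords` are hypothetical, the hard-coded stop-word set stands in for the contents of stopwords.txt (only "is", "a" and "the" come from the question), and splitting on anything that is not a letter or an apostrophe is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class WordFrequency {
    // Counts how often each non-stop word occurs in one speaker's speech.
    static Map<String, Integer> countWords(String speech, Set<String> stopWords) {
        Map<String, Integer> counts = new TreeMap<>();
        // Lowercase, then split on anything that is not a letter or an apostrophe.
        // This also discards the " >>> " separators between turns.
        for (String word : speech.toLowerCase().split("[^a-z']+")) {
            if (word.isEmpty() || stopWords.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the stop words that the question reads from stopwords.txt.
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "a", "the", "i'm", "but", "who", "to"));
        // In the real program this string would come from speakerSpeech.get("CLINTON").
        String clinton = "  I'm a progressive.  But I'm a progressive who likes to get things done.";
        System.out.println(countWords(clinton, stopWords));
    }
}
```

Note that stage directions such as (APPLAUSE) are part of the stored speech in the output above, so they would be counted as the word "applause" unless they are filtered out first or added to the stop words.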

【Comments】:
