Comparing lines of data in the same file: answers

【Question Title】: Comparing lines of data in the same file
【Posted】: 2016-02-26 14:08:28
【Question Description】:

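I'm currently working on a project for my intro CS course. We're still new to C++ and are using basic concepts such as while and for loops and file streams. The problem below should be solvable without resorting to more advanced features such as arrays, vectors, or functions.

Basically, I take a text file containing student and course data (FILE ONE) and create a new file from it. FILE ONE (the file I read data from) has 6k lines. Here's a sample: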

20424297    1139    CSCI       16000    W   -1   3.00    RNL
20424297    1142    PSYCH      18000    W   -1   3.00    RLA
20424297    1142    PSYCH      22000    W   -1   3.00    RLA
20608974    1082    ENGL       12000    A-  3.7  3.00    RECR
20608974    1082    HIST       15200    B+  3.3  3.00    FUSR
20608974    1082    PHILO      10100    A+  4    3.00    FISR

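See the first column? Each unique number represents one student (also called the eiD). FILE ONE is a huge list of every class each student has taken, including the subject, the course number, and the grade they received.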

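The point of the project is to create a new text file that summarizes each student's GPA. I'm fairly confident I can figure that part out (accumulating the GPA data). What's confusing me is how I'm supposed to compare lines in the file with one another.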

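My professor did make things easier by grouping all the data by student, which takes some of the load off me. I basically have to go through the file line by line and compare each line with the next one to see whether it has the same student ID.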

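My first idea was to create a series of nested while loops. The first loop would be active as long as data was being read. My next inclination was to repeat this inside another loop. I would then create variables to hold the previous line's student ID and the current line's student ID, and build the conditions on whether they are the same: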

   while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2) // This loop will keep running until there's no data left
   {
      string eiD_base = eiD_2; // eiD_base was the variable I made to hold the "previous" student's ID, for comparison to the next line
      while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2) // This loop unfortunately reads the entire file, defeating its intent
      {
         string eiD_temp = eiD_2; // eiD_temp was the variable I made to hold the current student ID, for comparison
         if (eiD_base == eiD_temp)
         {

            outputStream2 << "Same line :( " << endl; 
         }
         else
         {
            outputStream2 << eiD_2 << endl; // this is where you post the student data from the previous line!
         }   
      }
   }

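After compiling and running the code above, I realized this approach would not work, because the second, nested loop runs through every remaining line of FILE ONE without ever returning control to the first loop. I eventually came up with another approach that uses a counter: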

  // NOTE: The logic of the code below is as follows:
  // Create a counter to note what the first student ID is.
  // Store that value in eiD_base when counter == 0. Increment counter.
  // Now change eiD_base every time you find a line where eiD_temp
  // differs from eiD_base.

  string eiD_base;
  string eiD_temp;
  int counter = 0; // counter to help figure out what the first student ID was
   while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2)
   {
      eiD_temp = eiD_2;
      if (counter == 0)
      {
        eiD_base = eiD_2; // basically, set the first student ID to eiD_base when counter is 0. This counter is incremented only once. 
        counter++;
      }

      if (eiD_base == eiD_temp)
      {
        outputStream2 << "Same ID:  " << eiD_2 << endl; 

    // NOTE: This is my first instinct as to where the code for calculating GPAs should go.
    // The problem is that if the code is here, how do I factor in GPA data
    // from a line that doesn't meet (eiD_base == eiD_temp)? I feel like that data would
    // be jettisoned from calculations.

      }
      else
      {
        outputStream2 << "Previous ID: " << eiD_base << " and this is what eiD is now is now: " << eiD_temp <<  endl; // This is my first instinct for
        eiD_base = eiD_2; // if eiD_base != eiD_temp, have eiD_base reset here.
      }   

   }

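This seems closer to what I need. However, I noticed another problem. With this approach, when the variables I created to track changes in student ID (eiD_base and eiD_temp) are not equal on a line of data, that line seems to be dropped. Given that I need to calculate things like GPA data for each student, a method that doesn't let me accumulate the first line of data for each new student isn't a good solution.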

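I don't know whether I should abandon the counter approach entirely (in which case I'd welcome suggestions on how best to replace it), or whether the counter approach can be made to work by placing the GPA calculation code more strategically. Any insight or help would be welcome!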
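On the nested-loop attempt described above: the behavior can be reproduced with a much smaller sketch. The version below is not the code from the question. It reads only an ID column from an in-memory stream, and the name `countOuterIterations` and the use of `std::istringstream` instead of the file stream are assumptions made for illustration. Both loops extract from the same stream, so the inner loop keeps reading until extraction fails, and by then the outer condition fails too.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: counts how many times the OUTER loop body runs
// when an inner loop reads from the same stream.
// Only one column (the ID) is read, to keep the sketch short.
int countOuterIterations(const std::string& text) {
    std::istringstream in(text);
    std::string id;
    int outerRuns = 0;

    while (in >> id) {        // outer loop: reads the first ID
        ++outerRuns;
        while (in >> id) {    // inner loop: reads every remaining ID
            // comparison logic would go here
        }
        // The inner loop only stops once extraction fails, so the stream
        // is now in a failed state and the outer condition is false too.
    }
    return outerRuns;
}
```

With any non-empty input the outer body runs exactly once, which matches the observation that the second loop goes through the whole file without returning to the first.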

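【Comments】:

TL;DR. Sorry, I don't mean to be rude, but you're asking us to read far too much text for a problem that could probably be laid out in a few well-structured sentences. Please read How to Ask and minimal reproducible example. Once you have edited your question to include a short question and a minimal example, we'll be happy to help ;).
So what you basically want to do is: go through the file line by line and, as long as the student id stays the same, add up the GPA scores? That should be doable in a single loop that goes over the file contents line by line. You need one variable holding the current student id and one holding the GPA sum. Then for each new line you compare the ids: if they are the same, add to the sum; if not, output the result and set the sum and id to the current values. Rinse and repeat.
YSC: I'll try to revise it to be more concise and to the point, thanks! LiMuBei, thank you for the suggestion. Unfortunately, I have to calculate multiple GPAs (the main GPA and a GPA for comp sci classes only), and I also need to make sure each GPA is weighted by course hours. But I'll look into your suggestion and see whether it's workable.
@LiMuBei: post that as an answer. (It is the answer.)
After going back to my code, I have to agree with Martin and say I think LiMuBei absolutely nailed it. Thanks! I'm not sure how to mark your comment as the answer, though.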

Tags: c++ file

【Solution 1】:

The style of my answer tries to follow: https://meta.stackexchange.com/questions/10811/how-do-i-ask-and-answer-homework-questions

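One of your questions is whether you should abandon the counter approach entirely (in which case you'd welcome suggestions on how best to replace it), or whether the counter approach can be made to work by placing the GPA calculation code more strategically.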

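For the former, LiMuBei has already described an approach. Since you are calculating multiple GPAs (the main GPA and a GPA for comp sci classes only), you need to accumulate them in multiple variables, one set per GPA.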

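For the latter, you need to think about the unknown factors that change the scenario in each if/while statement. (counter == 0) is the scenario for the first line. (eiD_base == eiD_temp) covers both the first line and the scenario where there are at least 2 lines and the current line has the same ID as the previous line. (eiD_base != eiD_temp) is the scenario where there are at least 2 lines and the current line's ID differs from the previous line's. The unknown factors are therefore {1 line, at least 2 lines} and {sameID, differentID}. When the factor is {1 line}, both (counter == 0) and (eiD_base == eiD_temp) come into play. In (counter == 0) you put the code that applies to the first line, which may also be the only line. The code in (eiD_base == eiD_temp) applies both to {1 line} and to {at least 2 lines} with {sameID}, so it has to work for both scenarios.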

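For a complete solution, you declare the variables before the while loop, accumulate into them in (eiD_base == eiD_temp), print the previous ID's GPA values and set the variables from the new student's first line in (eiD_base != eiD_temp), and print the last ID's GPA values after the while loop.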

string eiD_base;               // ID of the student currently being accumulated
string eiD_temp;               // ID read from the current line
int counter = 0;               // stays 0 until the first line has been read
double csci_Grade_Point = 0.0; // running sum of (gpa * course hours)
// more variables for doing the calculation

while (sdStream2 >> eiD_2 >> semester_2 >> subject_2 >> coursenumSD_2 >> grade_2 >> gpa_2 >> courseHours2 >> code_2) {
  eiD_temp = eiD_2;
  if (counter == 0)
  {
    // first line of the file: remember the first student's ID
    eiD_base = eiD_2;
    counter++;
    csci_Grade_Point = 0.0;
    // more initialization of variables for doing the calculation
  }
  if (eiD_base == eiD_temp)
  {
    // same student as the previous line (or the very first line): keep accumulating
    csci_Grade_Point = csci_Grade_Point + (gpa_2 * courseHours2);
    // more sum calculation, such as total csci credit hours
  }
  else
  {
    outputStream2 << "Previous ID: " << eiD_base << " and this is what eiD is now is now: " << eiD_temp << endl;

    // for the previous ID, calculate gpa for just comp sci classes
    // for the previous ID, calculate more gpa's
    // (do this before eiD_base is overwritten below)

    eiD_base = eiD_2;

    // set the variable to include the first line of data of a new student
    csci_Grade_Point = (gpa_2 * courseHours2);
    // set more variables for doing the calculation
  }
}
// for the last ID, calculate gpa for just comp sci classes
// for the last ID, calculate more gpa's

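Your other question is about the data calculation in (eiD_base == eiD_temp). When a line does not satisfy (eiD_base == eiD_temp), the current line belongs to a different student than the previous line. That line's data is not lost: the GPA totals combine the data accumulated in (eiD_base == eiD_temp) with the data set from the new student's first line in (eiD_base != eiD_temp).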

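If this problem isn't easy for you to solve and you want to get good at programming, you may want to solve a simpler problem first, using files with 1 line and with 2 lines.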
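A reduced version of the problem along those lines might look like the sketch below. It keeps the counter-based structure from the outline but strips the input down to one student ID per line and only counts rows per student. The function name `rowsPerStudent`, the "ID:count " output format, and the use of string streams in place of files are assumptions chosen to keep the example small.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: the input holds one student ID per line (all other
// columns omitted). Returns "ID:rowCount " for each run of equal IDs.
std::string rowsPerStudent(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::string eiD, eiD_base;
    int counter = 0;  // stays 0 until the first line has been read
    int rows = 0;     // rows seen so far for eiD_base

    while (in >> eiD) {
        if (counter == 0) {
            eiD_base = eiD;  // first line: remember the first ID
            counter++;
        }
        if (eiD == eiD_base) {
            rows++;          // first line, or same ID as the previous line
        } else {
            out << eiD_base << ':' << rows << ' ';
            eiD_base = eiD;
            rows = 1;        // the current line is the new student's first row
        }
    }
    if (counter != 0) {
        out << eiD_base << ':' << rows << ' ';  // report the last student
    }
    return out.str();
}
```

Trying it on an empty input, a single line, two lines with the same ID, and two lines with different IDs exercises each scenario from the analysis above separately before moving on to the full eight-column file.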

【Comments】:

This is an excellent synthesis of everything I'd read before, with additional insight. Thank you very much.