Find a line in a file and extract information: answers

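【Question title】: Find a line in a file and extract information
【Posted】: 2015-10-19 21:45:58
【Question description】:

I have to find a specific line in a text file that starts with a keyword, and then analyse that line to extract information from it. Here is an example to make this clear: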

processor       : 0 
vendor_id       : GenuineIntel 
cpu family      : 6 
model           : 5 
model name      : Pentium II (Deschutes) 
stepping        : 2 
cpu MHz         : 400.913520 
cache size      : 512 KB 
fdiv_bug        : no 
hlt_bug         : no

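This is the text file (/proc/cpuinfo on Linux). I have to write a function that parses the file until it finds "model name :" and then stores the information "Pentium II (Deschutes)" in a char array. This is what I have coded so far: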

int get_cpu(char* info)
{
    FILE *fp; 
    char buffer[1024]; 
    size_t bytes_read; 
    char *match; 

    /* Read the entire contents of /proc/cpuinfo into the buffer.  */ 
    fp = fopen("/proc/cpuinfo", "r"); 

    bytes_read = fread(buffer, 1, sizeof (buffer), fp); 

    fclose (fp); 

    /* Bail if read failed or if buffer isn't big enough.  */ 
    if (bytes_read == 0 || bytes_read == sizeof (buffer)) 
        return 0; 

    /* NUL-terminate the text.  */ 
    buffer[bytes_read] == '\0'; 

    /* Locate the line that starts with "model name".  */ 
    match = strstr(buffer, "model name"); 

    if (match == NULL) 
        return 0; 

    /* copy the line */
    strcpy(info, match);
}

It always says the buffer is not big enough...

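【Comments on the question】:

"It always says the buffer is not big enough". That is because it isn't big enough. Just run cat /proc/cpuinfo > output by hand and look at the size of the output file. What do you find?
On my system it is a very large document... and I am supposed to write a program that works on every Linux system.
@kaylum useless use of cat detected *scnr* (still +1) [Explanation: if you want to check with ls, a simple cp would do; better still, use wc -c]

Tags: c stdio string.h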

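【Solution 1】:

Beyond the simple fact that /proc/cpuinfo is usually larger than 1024 bytes:

> wc -c </proc/cpuinfo
3756

so of course your buffer is too small to read the whole file in one go ...

What you are trying to do here is process a text file, and the natural way to do that is line by line.

Try something like

(Edit: finally replaced the whole thing with tested code ... getting strtok() right isn't that easy ... hehe)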

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[1024];
    char *val = 0;
    FILE *fp = fopen("/proc/cpuinfo", "r");
    if (!fp)
    {
        perror("opening `/proc/cpuinfo'");
        return 1;
    }

    while (fgets(buf, sizeof buf, fp))  /* reads one line */
    {
        char *key = strtok(buf, " ");   /* gets first word separated by space */
        if (key && !strcmp(key, "model"))
        {
            key = strtok(0, " \t");     /* gets second word, separated by
                                         * space or tab */
            if (key && !strcmp(key, "name"))
            {
                strtok(0, " \t");         /* read over the colon */
                val = strtok(0, "\r\n");  /* read until end of line */
                break;
            }
        }
    }

    fclose(fp);

    if (val)
    {
        puts(val);
    }
    else
    {
        fputs("CPU model not found.\n", stderr);
    }
    return 0;
}

Usage:

> gcc -std=c89 -Wall -Wextra -pedantic -o cpumodel cpumodel.c
> ./cpumodel
AMD A6-3670 APU with Radeon(tm) HD Graphics
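
The question asked for a function that stores the value in a caller-supplied char array instead of printing it. The following is only a sketch of how the line-by-line idea could be packaged that way. The name get_field, its parameters, and the choice to pass an already opened FILE * are assumptions made for illustration, not part of the answer above. It uses strncmp() and strspn() rather than strtok(), so the line buffer is left unmodified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan fp line by line for a line of the form
 * "<key><blanks>:<blanks><value>" and copy <value> into out.
 * Returns 1 if the key was found, 0 otherwise. */
int get_field(FILE *fp, const char *key, char *out, size_t out_size)
{
    char line[1024];
    size_t key_len;

    assert(fp != NULL && key != NULL && out != NULL && out_size > 0);
    key_len = strlen(key);

    while (fgets(line, sizeof line, fp))
    {
        char *p;
        size_t len;

        if (strncmp(line, key, key_len) != 0)
            continue;               /* line does not start with key */

        p = line + key_len;
        p += strspn(p, " \t");      /* skip padding between key and colon */
        if (*p != ':')
            continue;               /* key was only a prefix of a longer key */

        p++;                        /* step over the colon */
        p += strspn(p, " \t");      /* skip blanks before the value */

        len = strcspn(p, "\r\n");   /* value ends at the line break */
        while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
            len--;                  /* trim trailing blanks */

        if (len >= out_size)
            len = out_size - 1;     /* truncate rather than overflow */
        memcpy(out, p, len);
        out[len] = '\0';
        return 1;
    }
    return 0;
}
```

With a stream obtained from fopen("/proc/cpuinfo", "r"), a call such as get_field(fp, "model name", info, sizeof info) would fill info. A line longer than the 1024-byte line buffer is read in several pieces, which this sketch does not treat specially.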

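【Discussion】:

I used to test pointers against NULL implicitly as 0, as you do, and to pass an explicit 0 to strtok(), until I came across articles such as this SO answer. stackoverflow.com/questions/9894013/is-null-always-zero-in-c
@WeatherVane A null pointer need not be represented as all bits zero, but in a pointer context 0 denotes the null pointer regardless of its representation, and a null pointer always evaluates to false in a boolean context ... so the "traditional" way is safe.
@WeatherVane That said, I simply prefer the "terse" version here. It is a matter of style, and NULL certainly has the benefit of being more expressive.
I take your point about using int implicitly as a boolean (I dislike the formal type): for example, if (apples) is true whenever I have any apples, even if I owe some!
@WeatherVane Not entirely sure what you are aiming at, but I guess you mean that in a boolean context only 0 is false and everything else is true, which leads to odd-looking constructs like if (!strcmp(...)) for testing "strings" for equality.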
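
To make the two points from these comments concrete, here is a small sketch. The function names are made up for illustration. The two pointer tests are equivalent in standard C, and strcmp() returns 0 for equal strings, which is why the equality test is written as a negation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the terse null test, as used in the answer. */
int is_null_terse(const char *p)
{
    return !p;
}

/* Hypothetical helper: the same test spelled out with the NULL macro. */
int is_null_verbose(const char *p)
{
    return p == NULL;
}

/* strcmp() returns 0 when the strings are equal, so "equal" is !strcmp(). */
int same_string(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return !strcmp(a, b);
}
```

Both null tests give the same result for every pointer value, so the choice between them is purely one of style, as the comments say.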

【Solution 2】:

Please try this; it works. There are different ways to do it.

#include <usual.h>  /* not a standard header: the author's own collection of
                     * includes (see discussion); <stdio.h> and <string.h>
                     * cover everything used here */

int get_cpu( char *info )
{
  FILE *fp;
  char buffer[1024];
  size_t bytes_read;
  char *match;
  /* Read the entire contents of /proc/cpuinfo into the buffer.  */
  fp = fopen( "/proc/cpuinfo", "r" );
  if ( fp == NULL )
    return 0;
  bytes_read = fread( buffer, 1, sizeof( buffer ), fp );
  fclose( fp );
  /* Bail if read failed or if buffer isn't big enough.  */
  if ( bytes_read == 0 || bytes_read == sizeof( buffer ) )
    return 0;
  /* NUL-terminate the text.  */
  buffer[bytes_read] = '\0';
  /* Locate the first occurrence of "model name".  */
  match = strstr( buffer, "model name" );
  if ( match == NULL )
    return 0;
  /* copy the first 41 characters of the line; info must be zero-filled
   * beforehand, because strncpy adds no terminator when the source is
   * longer than 41 characters */
  strncpy( info, match, 41 );
  return 1;
}

int main( void )
{
  char info[255];
  memset( info, '\0', 255 );
  get_cpu( info );
  printf( "\nthe data we extracted: %s ", info );
  getchar(  );
  return 0;
}

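【Discussion】:

What on earth is usual.h? And how will this *not* fail in the same way as the code in the question?
That is where I keep my usual includes, so it became usual.h, which is simpler. Now you can compile and run it. It works.
No, it won't, as long as /proc/cpuinfo is larger than 1024 bytes. Same problem as in the question.
By the way, doing this is a really bad idea ... be explicit with your #includes. Include exactly what you need, no more and no less.
I took the file you showed, which has 10 lines, and for that file it works. If your real file is bigger, you will need more space.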
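
If reading the whole file at once is preferred over going line by line, "more space" does not have to be a guessed constant. Below is a rough sketch of a buffer that grows while reading. read_all is a hypothetical helper, not something from either answer, and the starting size of 1024 is arbitrary. Files under /proc typically report a size of 0, so asking for the file size up front is not a reliable alternative there.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from fp into a malloc'd,
 * NUL-terminated buffer that is doubled whenever it fills up.
 * Returns NULL on allocation failure or read error.
 * The caller must free() the result. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 1024, len = 0, n;
    char *buf, *tmp;

    assert(fp != NULL);

    buf = malloc(cap);
    if (!buf)
        return NULL;

    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0)
    {
        len += n;
        if (len == cap - 1)         /* buffer full: double it */
        {
            tmp = realloc(buf, cap * 2);
            if (!tmp)
            {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp))
    {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (len_out)
        *len_out = len;
    return buf;
}
```

The returned text can then be searched with strstr() as in the code above, without the fixed 1024-byte limit.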