Warning: array subscript has type char (answers)

【Question title】: Warning: array subscript has type char
【Posted】: 2021-03-19 00:33:31
【Question description】:

When I run this program, I get the warning "array subscript has type 'char'". Please help me find where I went wrong. I am using the Code::Blocks IDE.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
void NoFive()
{
    long long int cal;
    char alpha[25];
    char given[100] = "the quick brown fox jumped over the cow";
    int num[25];
    int i, k;
    char j;
    j = 'a';
    k = 26;
    cal = 1;
    for(i = 0; i <= 25; i++)
    {
        alpha[i] = j++;
        num[i] = k--;
      //  printf("%c = %d \n", alpha[i], num[i]);
    }
    for(i = 0; i <= (strlen(given) - 1); i++)
    {
        for(j = 0; j <= 25; j++)
        {
         if(given[i] == alpha[j]) // warning: array subscript has type char
         {
            cal = cal * num [j]; // warning: array subscript has type char
         }
         else
         {

         }
        }
    }
printf(" The value of cal is %I64u ", cal);
}

main()
{
NoFive();
}

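【Question discussion】:

gcc.gnu.org/onlinedocs/gcc/Warning-Options.html explains why this is a warning.
for(i = 0; i <= 25; i++) is also wrong (twice). It should be for(i = 0; i < 25; i++) {...} because the arrays have 25 elements. And for(i = 0; i <= (strlen(given) - 1); i++) is questionable.
@ta.speot.is Unfortunately, the GCC documentation says nothing about why. It does not even try to explain the situation.
@RolandIllig It says: Warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall. Why would you want a negative subscript?
@ta.speot.is I don't want a negative subscript; I get one implicitly without doing anything. That is exactly the problem.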

Tags: c codeblocks

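【Solution 1】:

Simple: change

char j;

to

unsigned char j;

or just a plain (u)int

unsigned int j;
int j;

From the GCC warning documentation:

-Wchar-subscripts: Warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall.

The compiler does not want you to specify a negative array index inadvertently. Hence the warning!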
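
As a rough sketch of how the question's loop could look with an int index, the function below keeps the same idea (a = 26, b = 25, ..., z = 1, multiplied per letter). The name letter_product is hypothetical and not from the original post. The sketch also assumes 26-element arrays, an encoding with contiguous lowercase letters such as ASCII, and an unsigned accumulator so that a long input wraps around instead of overflowing a signed type.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): multiplies the value
   assigned to each lowercase letter in `text`, with a = 26, ..., z = 1.
   Characters that are not lowercase letters are ignored. */
unsigned long long letter_product(const char *text)
{
    char alpha[26];               /* 26 letters need 26 slots */
    int num[26];
    unsigned long long cal = 1;   /* unsigned: wraps around on overflow */
    int j;                        /* int index, so -Wchar-subscripts stays quiet */

    for (j = 0; j < 26; j++) {
        alpha[j] = (char)('a' + j);   /* assumes contiguous letters, as in ASCII */
        num[j] = 26 - j;
    }

    for (size_t i = 0; text[i] != '\0'; i++) {
        for (j = 0; j < 26; j++) {
            if (text[i] == alpha[j]) {
                cal *= (unsigned long long)num[j];
            }
        }
    }
    return cal;
}
```

The result can be printed portably with printf("%llu\n", letter_product("abc")); for "abc" the product is 26 * 25 * 24.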

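【Discussion】:

Using an int as the array index indeed produces no warning, even though it allows negative indices as well... @Pavan Manjunath
@alk Ah, that was a typo. I meant unsigned char, not just unsigned. Anyway, I was only emphasizing the negative index. Still, I have edited my post so that it is clear to future visitors :)
@alk: Some differences between int and char: (1) there is no compiler (at least none that would be considered even remotely "normal") on which int could plausibly be expected to be unsigned; (2) code that uses type char as an array subscript is more likely than code that uses int to assume that all character literals, or all characters in string literals, represent positive values. I am not sure whether all characters in the "C character set" are required to be positive, but I know that characters outside that set are not.
I got this warning for the following code: context->ptr[0] = (char)toupper(c); where "ptr" has type "char *". Does the compiler think that 0 is a signed char and could therefore be negative?
Simply changing the type from char to int or unsigned int is wrong. Read any good manual on the <ctype.h> functions for the details.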

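【Solution 2】:

This is a typical case of GCC using overly bureaucratic and indirect wording in its diagnostics, which makes it hard to understand the real problem behind this useful warning.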

// Bad code example
int demo(char ch, int *data) {
    return data[ch];
}

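The root problem is that the C programming language defines several data types for "characters":

char can hold a "character from the basic execution character set" (which includes at least A-Z, a-z, 0-9 and several punctuation characters).
unsigned char can hold values in the range of at least 0 to 255.
signed char can hold values in the range of at least -127 to 127.

The C standard defines that the type char behaves in the same way as either signed char or unsigned char. Which of these is actually chosen depends on the compiler and the operating system, and must be documented by them.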
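
To see which choice a given implementation made, one possible check is to look at CHAR_MIN from <limits.h>. The function name plain_char_is_signed below is made up for this sketch.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
   0 if it is unsigned. CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With GCC, the default can also be overridden per compilation using -fsigned-char or -funsigned-char, which is one more reason not to rely on either behavior.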

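When an element of an array is accessed by the arr[index] expression, GCC calls the index a subscript. In most situations, this array index is a non-negative integer. This is common programming style, and languages like Java or Go throw an exception if the array index is negative.

In C, out-of-bounds array indices are simply defined as invoking undefined behavior. The compiler cannot reject negative array indices in all cases because the following code is perfectly valid: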

const char *hello = "hello, world";
const char *world = hello + 7;
char comma = world[-2];   // negative array index

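There is one place in the C standard library that is difficult to use correctly: the character classification functions from the header <ctype.h>, such as isspace. The expression isspace(ch) looks as if it takes a character as its argument: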

isspace(' ');
isspace('!');
isspace('ä');

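The first two cases are fine, because the space and the exclamation mark come from the basic execution character set and therefore have non-negative values, no matter whether the compiler defines char as signed or as unsigned.

But the last case, the umlaut 'ä', is different. It usually lies outside the basic execution character set. In the character encoding ISO 8859-1, which was popular in the 1990s, the character 'ä' is represented like this: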

unsigned char auml_unsigned = 'ä';   // == 228
signed   char auml_signed   = 'ä';   // == -28

Now imagine that the isspace function is implemented using an array:

static const int isspace_table[256] = {
    0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 0, 0, 1, 0, 0,
    // and so on
};

int isspace(int ch)
{
    return isspace_table[ch];
}

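This implementation technique is typical.

Getting back to the call isspace('ä'), assume that the compiler has defined char to be signed char and that the encoding is ISO 8859-1. When the function is called, the value of the character is -28, and this value is converted to an int, preserving the value.

This results in the expression isspace_table[-28], which accesses the table outside the bounds of the array. This invokes undefined behavior.

It is exactly this scenario that the compiler warning describes.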
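
A small sketch of how a table lookup can stay inside its bounds: convert the char to unsigned char before using it as an index. The names space_table, table_index and lookup_is_space are invented for this illustration, and the table only marks the six whitespace characters of the "C" locale; it is not how any particular C library implements isspace.

```c
#include <limits.h>

/* One entry per possible unsigned char value; unlisted entries are 0. */
static const unsigned char space_table[UCHAR_MAX + 1] = {
    ['\t'] = 1, ['\n'] = 1, ['\v'] = 1, ['\f'] = 1, ['\r'] = 1, [' '] = 1
};

/* Maps any char value, negative or not, to an index in 0..UCHAR_MAX. */
int table_index(char ch)
{
    return (unsigned char)ch;
}

/* Table lookup that cannot produce a negative subscript. */
int lookup_is_space(char ch)
{
    return space_table[table_index(ch)];
}
```

With 8-bit chars, a char holding -28 ends up at index 228 here instead of at isspace_table[-28].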

The correct way to call the functions from the <ctype.h> header is:

// Correct example: reading bytes from a file
int ch;
while ((ch = getchar()) != EOF) {
    isspace(ch);
}

// Correct example: checking the bytes of a string
const char *str = "hello, Ümläute";
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((unsigned char) str[i]);
}
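
Packaged as a complete function, the string case might look like the following sketch. The name count_spaces is hypothetical; the point is only that every byte goes through the unsigned char cast before it reaches isspace.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the whitespace bytes in a NUL-terminated string. */
size_t count_spaces(const char *str)
{
    size_t count = 0;

    for (size_t i = 0; str[i] != '\0'; i++) {
        /* The cast keeps bytes above 127 from becoming negative arguments. */
        if (isspace((unsigned char)str[i])) {
            count++;
        }
    }
    return count;
}
```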

There are also several ways that look very similar but are wrong.

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace(str[i]);   // WRONG: the cast to unsigned char is missing
}

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((int) str[i]);   // WRONG: the cast must be to unsigned char
}

The examples above convert the character value -28 directly to the int value -28, which leads to a negative array index.

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((unsigned int) str[i]);   // WRONG: the cast must be to unsigned char
}

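This example converts the character value -28 directly to unsigned int. Assuming a 32-bit platform with the usual two's complement integer representation, the value -28 is converted by repeatedly adding 2^32 until the value is in the range of unsigned int. In this case, this results in the array index 4_294_967_268, which is far too large.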
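
The arithmetic can be checked with a short sketch: -28 + 2^32 = 4294967268 when unsigned int is 32 bits wide. The helper name widen_to_unsigned_int is made up for this illustration, and signed char is used explicitly so the result does not depend on the signedness of plain char.

```c
#include <limits.h>

/* Converts a (possibly negative) character value straight to unsigned int.
   C defines this conversion as reduction modulo UINT_MAX + 1. */
unsigned int widen_to_unsigned_int(signed char c)
{
    return (unsigned int)c;
}
```

For an argument of -28 the result is UINT_MAX - 27, which is 4294967268 where unsigned int has 32 bits.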

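【Discussion】:

"The type char is equivalent to either signed char or unsigned char": char must behave the same as, and have the same representation as, either signed char or unsigned char, but char, signed char and unsigned char are three distinct types in C. "A negative array index is simply defined as invoking undefined behavior": array indexing with negative values is perfectly well defined in C, because arr[n] is equivalent to *(arr + n). The one way this can lead to undefined behavior is when the pointer arithmetic results in an out-of-bounds access.
Reported as a GCC bug
I think that if the remaining code relies on the respective ctype function to check for EOF, blindly casting the argument to unsigned char may introduce a regression. At least in some hypothetical cases, depending on the value the cast produces, EOF might wrongly be treated by the ctype function as a member of a class. You elegantly avoided that possibility in your example, but I think it is a pitfall worth mentioning. Also, mentioning macro implementations of the ctype functions would make clear why the compiler actually issues this particular warning (?)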
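
One way to keep the two situations from this comment apart is to use separate helpers: one for values that come from getchar or fgetc (already either EOF or an unsigned char value, so no cast), and one for elements of a char array. Both function names below are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* For values returned by getchar/fgetc. These are already EOF or in the
   range of unsigned char, so casting here would turn EOF into a
   character value. */
int stream_value_is_space(int ch)
{
    return isspace(ch) != 0;
}

/* For elements of a char array or string. These can be negative when
   char is signed, so they go through unsigned char first. */
int string_char_is_space(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

Casting EOF itself, as in (unsigned char)EOF, yields a non-negative value that no longer compares equal to EOF, which is the regression the comment warns about.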

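【Solution 3】:

Note that Roland Illig's explanation is somewhat incomplete; these days, 'ä' may not even compile (or it may compile to something that does not fit in a byte, but this is highly implementation-dependent and may even be UB). If you use UTF-8, "ä" is the same as "\xc3\xa4".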
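
To illustrate the UTF-8 case without depending on the source file's encoding, the sketch below spells the two bytes with hex escapes. The function name is_utf8_a_umlaut is invented for this example.

```c
#include <string.h>

/* Returns 1 if `s` consists of exactly the two UTF-8 bytes of U+00E4 (ä). */
int is_utf8_a_umlaut(const char *s)
{
    return strlen(s) == 2
        && (unsigned char)s[0] == 0xC3
        && (unsigned char)s[1] == 0xA4;
}
```

Both bytes are above 127, so on a platform where char is signed each of them is negative when read as a plain char, and the unsigned char cast is needed for every byte.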

【Discussion】: