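【Posted】: 2018-02-03 08:54:03
【Question】:
The C function below validates any combination of URL/FQDN well, but it fails to validate IPv4 addresses, the IPv6 shorthand notation, and some other IPv6 address formats.
Can the regular expression below be adapted to also validate IPv4 and IPv6 addresses?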
#include <regex.h>
#include <stdio.h>

/* Returns 0 if url matches the pattern, -1 otherwise. */
int validateURLPhase2(char *url)
{
    int status;
    regex_t re;
char *regexp = "^((ftp|http|https)://)?([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)|([0-9].[0-9].[0-9].[0-9])|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$";
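    /* Compile as a case-insensitive POSIX extended regex; no submatches are needed. */
    if (regcomp(&re, regexp, REG_EXTENDED | REG_NOSUB | REG_ICASE) != 0)
    {
        /* Reaching this branch means the pattern itself failed to compile. */
        printf("Regex has invalidated FQDN 1\n");
        return -1;
    }
    status = regexec(&re, url, (size_t) 0, NULL, 0);
    regfree(&re);
    if (status != 0)
    {
        /* The URL did not match the pattern. */
        printf("Regex has invalidated FQDN 2\n");
        return -1;
    }
    return 0;
}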
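A valid URL that should ideally be accepted but fails: http://[2001::1]/abc (output: "Regex has invalidated FQDN 2", validation fails)
An invalid URL that should ideally be rejected but passes: http://10.192.1 (validation succeeds)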
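One likely reason for the second result: in POSIX extended regular expressions `|` has the lowest precedence, so the leading `^` binds only to the first top-level alternative and the trailing `$` only to the last one. The first alternative therefore matches any string that merely starts with a label followed by a dot. The sketch below illustrates this with the first alternative copied from the pattern above and, for contrast, a simplified IPv4-only pattern that is anchored on both ends. The names `ere_matches`, `FIRST_BRANCH` and `ANCHORED_IPV4` are hypothetical, and the simplified pattern is an assumption for illustration only (it does not range-check octets).

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if s matches the ERE pattern, 0 if it does not,
 * -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *s)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_ICASE) != 0)
        return -1;
    status = regexec(&re, s, (size_t)0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}

/* First top-level alternative of the original pattern, copied as-is.
 * It starts with ^ but has no $, so trailing text is never checked. */
const char FIRST_BRANCH[] =
    "^((ftp|http|https)://)?([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)";

/* Simplified IPv4-only pattern (an assumption, not the original):
 * both anchors apply to the whole expression and the dots are escaped. */
const char ANCHORED_IPV4[] =
    "^((ftp|http|https)://)?([0-9]{1,3}\\.){3}[0-9]{1,3}$";
```

With these definitions, `ere_matches(FIRST_BRANCH, "http://10.192.1")` reports a match because `http://10.` already satisfies that alternative, while `ere_matches(ANCHORED_IPV4, "http://10.192.1")` does not. Wrapping all alternatives in one group between `^` and `$` is what makes the anchors apply to every branch.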
【Discussion】:
-
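A regular expression is definitely not the right way to solve this problem. You will never get a complete list of top-level domains into it...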
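As one possible alternative to a regex for the address part, the standard `inet_pton` function can decide whether a host string is an IPv4 or IPv6 literal. This is a minimal sketch, assuming the host has already been split out of the URL (scheme, port and path removed). The function name `ip_literal_family` and its return convention are hypothetical, not something from the question.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: returns 4 for an IPv4 literal, 6 for an IPv6 literal
 * (optionally wrapped in [ ] as in a URL), and 0 for anything else. */
int ip_literal_family(const char *host)
{
    unsigned char buf[sizeof(struct in6_addr)];
    char tmp[INET6_ADDRSTRLEN];
    size_t len = strlen(host);

    if (len >= 2 && host[0] == '[' && host[len - 1] == ']') {
        /* Bracketed form: strip the brackets and accept IPv6 only. */
        if (len - 2 >= sizeof tmp)
            return 0;
        memcpy(tmp, host + 1, len - 2);
        tmp[len - 2] = '\0';
        return inet_pton(AF_INET6, tmp, buf) == 1 ? 6 : 0;
    }
    /* inet_pton returns 1 only when the text is a valid address. */
    if (inet_pton(AF_INET, host, buf) == 1)
        return 4;
    if (inet_pton(AF_INET6, host, buf) == 1)
        return 6;
    return 0;
}
```

Under this sketch, `ip_literal_family("[2001::1]")` yields 6 and `ip_literal_family("10.192.1")` yields 0, since `inet_pton` with `AF_INET` expects all four dotted-decimal fields. A host that yields 0 would still need a separate FQDN check.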