今天写pig脚本时,范了个低级错误,在awk中使用了sub作为变量名,结果执行pig脚本总报错

2.txt文件有两列内容

256;005;006;578,005

005;006,007,259

007;598,007

功能要求:从第一列中匹配第二列的内容,匹配到的输出

--*********************************************************************

a = load '2.txt' using PigStorage(',') as (c1:chararray,c2:chararray);
b = stream a through `awk '
{
	sub1=$1;
	subc=$2;
	if(index(sub1,subc) > 0)
	{
		print $1"\t"$2;
	}
}'`;
dump b;

a = load '2.txt' using PigStorage(',') as (c1:chararray,c2:chararray);
b = stream a through `awk '
{
	sub1=$1;
	subc=$2;
	if(sub1 == "NA" || match(sub1,subc))
	{
		print $1"\t"$2;
	}
}'`;
dump b;

 

遇到分号,在pig stream awk中要用ASCII码

g = STREAM f1 THROUGH `awk '
{
	if(index($2,",") > 0)
	{
		split($2,arrjob,",");
	}
	else
	{
		split($2,arrjob,"\\x3B");
	}
	
	result="";
	for(i=1;i<=length(arrjob);i++)
	{
		split(arrjob[i],arrId,"_");
		result=$1"\t"arrId[1];
		print result;
	}	
}'`;

正则匹配举例,$4必须是数字
b8 = STREAM b71 THROUGH `awk '
{
	result=""
	if(length($4) == 15 && match($4,"^[0-9]+$"))
	{
		result=$1"\t"$2"\t"$3"\t1"
	}
	else if(length($4) == 20 && match($4,"^[0-9]+$"))
	{
		result=$1"\t"$2"\t"$3"\t"substr($4,1,9)"_"substr($4,10)
	}
	if(length(result) > 0)
		print result;
}'`;

  

 

 替换字符串中的"{}()"

a = load '1.txt' as (data:chararray);
b = stream a through `awk '
{
u=$1
gsub("([{]|[}]|[\)]|[\(])*","",u)
print u
}'`;
dump b;

  

 

  

相关文章:

  • 2022-12-23
  • 2021-06-27
  • 2022-12-23
  • 2022-12-23
  • 2021-11-05
  • 2021-06-02
  • 2021-12-18
  • 2022-12-23
猜你喜欢
  • 2022-12-23
  • 2021-06-06
  • 2021-08-06
  • 2022-02-08
  • 2021-11-23
  • 2022-12-23
  • 2021-07-13
相关资源
相似解决方案