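【Posted】:2020-12-04 19:29:56
【Question】:
I have a large dataset to analyze, and I am looking for a shell script that filters it down to only the rows I need, so that I can load the result into R for further analysis.
The data is structured as follows: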
Size,ModifiedTime,AccessTime,contentid
4886,"Jun 11, 2009 06:51:08 PM","Mar 15, 2013 09:24:53 AM",000000285b7925f511b3159a72f80a4a
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
84848,"Feb 12, 2007 12:40:00 PM","Apr 07, 2014 09:39:03 AM",000001cec02017ca3eb81ddc4cd1c9ff
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
264158,"Dec 08, 2009 03:28:14 PM","Apr 08, 2013 11:52:15 AM",000003020ba74b9d1b6075d3c1b8fcb3
725963,"Sep 29, 2008 03:45:21 PM","May 17, 2011 08:48:40 AM",0000034b98d29d84ce7b61ee68be7658
1340,"Sep 07, 2011 03:36:54 AM","Mar 12, 2013 02:55:01 AM",000004ed899e26ae1c9b1ece35a98af1
75264,"Jul 28, 2011 05:09:58 PM","Jun 07, 2014 04:21:28 PM",000005a09fd2eb706c5800eb06084160
198724,"Jul 23, 2012 02:25:58 PM","Jan 21, 2013 12:58:07 PM",0000060b9d552c35f281b5033dcfa1b4
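It is essentially one large CSV file.
Now I want to keep only the rows whose AccessTime is more than 10 years old and write them to a separate CSV file. For the sample above, that means the second and fourth data rows (not counting the header).
I tried the following: compute a cutoff time in a temporary variable, compare each row's AccessTime against it, and print the row if it is earlier.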
BEGIN{
FPAT = "([^,]+)|(\"[^\"]+\")"; #this to read csv as some column value contains ,
OFS=",";
date=$(date -d "-3650 days" +"%s"); #temp time variable in epoch format
}
{
command="date -d" $6 " +%s"; #$6 refers to AccessTime column
( command | getline temp ); #converts Accesstime value to epoch format
close(command);
if(temp<date) print $6
}
But when I run this, it prints nothing. Any help is greatly appreciated.
Desired output:
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
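For reference, here is a minimal sketch of one way to get that output. It is not the method from the attempt above: instead of calling `date` once per row, it turns each AccessTime into a sortable `YYYYMMDDHHMMSS` number inside awk and compares it with a cutoff passed in from the shell. This also avoids `$(...)`, which is shell command substitution and is not evaluated inside an awk program. The function name `filter_older_than` and the file name `data.csv` are hypothetical, and the sketch assumes that every data row has exactly the two quoted date fields shown in the sample and that timestamps can be compared as plain local times.

```shell
#!/bin/sh
# Hypothetical helper: print the header plus every row whose AccessTime
# is strictly earlier than the cutoff (a YYYYMMDDHHMMSS number).
# Assumes the layout from the sample: Size,"ModifiedTime","AccessTime",contentid
filter_older_than() {
    cutoff=$1
    file=$2
    awk -F'"' -v cutoff="$cutoff" '
    BEGIN {
        # Map month abbreviations to the numbers 1..12.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = i
    }
    NR == 1 { print; next }              # always keep the header line
    {
        # With the double quote as field separator, $4 holds the
        # AccessTime text, e.g. May 12, 2009 04:45:41 PM
        split($4, t, /[ ,:]+/)           # Mon, day, year, hh, mm, ss, AM/PM
        hour = t[4] % 12                 # 12 AM becomes 0
        if (t[7] == "PM") hour += 12     # 12 PM becomes 12, 04 PM becomes 16
        key = sprintf("%04d%02d%02d%02d%02d%02d", t[3], month[t[1]], t[2], hour, t[5], t[6])
        if ((key + 0) < (cutoff + 0)) print
    }' "$file"
}

# Example usage (assumes GNU date and a hypothetical input file data.csv):
#   cutoff=$(date -d '10 years ago' +%Y%m%d%H%M%S)
#   filter_older_than "$cutoff" data.csv > filtered.csv
```

Splitting on the double quote keeps the program within portable awk; with gawk, the `FPAT` approach from the attempt could be kept instead, in which case AccessTime would be field `$3` rather than `$6`.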
【Comments】: