[Posted]: 2014-10-31 13:33:22
[Question]:
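I need to take a multi-row data frame as input and return the n most frequently occurring strings. All of the values are in a single column named "MissingDates".
Here is some sample data; there are about 5,000 rows in total: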
Symbol Count MissingDates
AD 27 1995-12-26, 1996-01-02, 1996-04-26, 1996-04-30, 1996-05-06, 1996-08-26, 1996-09-03, 1996-09-04, 1996-10-11, 1996-11-13, 1996-11-29, 1996-12-09, 1996-12-20, 1996-12-23, 1996-12-26, 1996-12-27, 1997-01-02, 1997-05-02, 1997-09-10, 1998-01-02, 1998-04-16, 1998-12-08, 1999-12-27, 1999-12-31, 2001-09-12, 2003-08-06, 2003-10-13
BP 14 1995-08-09, 1995-08-15, 1995-12-26, 1996-01-02, 1996-09-06, 1996-12-26, 1997-01-02, 1997-12-26, 1998-01-02, 1998-04-16, 2001-09-12, 2002-12-24, 2003-08-06, 2003-10-13
C 3 1999-12-31, 2001-12-24, 2002-12-24
CC 285 1994-05-18, 1994-05-19, 1994-05-20, 1994-05-23, 1994-05-24, 1994-05-25, 1994-05-26, 1994-05-27, 1994-05-31, 1994-06-01, 1994-06-02, 1994-06-03, 1994-06-06, 1994-06-07, 1994-06-08, 1994-06-09, 1994-06-10, 1994-06-13, 1994-06-14, 1994-06-15, 1994-06-16, 1994-06-17, 1994-06-20, 1994-06-21, 1994-06-23, 1994-06-24, 1994-06-27, 1994-06-28, 1994-06-29, 1994-06-30, 1994-07-01, 1994-07-06, 1994-07-14, 1994-07-15, 1994-07-18, 1994-07-19, 1994-07-21, 1994-07-25, 1994-07-27, 1994-07-28, 1994-08-03, 1994-08-04, 1994-08-08, 1994-08-09, 1994-08-10, 1994-08-11, 1994-08-12, 1994-08-15, 1994-08-17, 1994-08-18, 1994-08-19, 1994-08-22, 1994-08-23, 1994-08-24, 1994-08-25, 1994-08-29, 1994-08-31, 1994-09-01, 1994-09-02, 1994-09-06, 1994-09-07, 1994-09-08, 1994-09-09, 1994-09-12, 1994-09-13, 1994-09-15, 1994-09-16, 1994-09-19, 1994-09-20, 1994-09-21, 1994-09-22, 1994-09-23, 1994-09-27, 1994-09-28, 1994-09-29, 1994-09-30, 1994-10-03, 1994-10-04, 1994-10-06, 1994-10-14, 1994-10-18, 1994-10-19, 1994-10-25, 1994-10-26, 1994-10-27, 1994-10-28, 1994-10-31, 1994-11-01, 1994-11-09, 1994-11-10, 1994-11-11, 1994-11-16, 1994-11-17, 1994-11-25, 1994-11-28, 1994-12-01, 1994-12-02, 1994-12-06, 1994-12-07, 1994-12-08, 1994-12-09, 1994-12-12, 1994-12-13, 1994-12-14, 1994-12-15, 1994-12-16, 1994-12-23, 1994-12-27, 1994-12-29, 1994-12-30, 1995-01-03, 1995-01-05, 1995-01-09, 1995-01-11, 1995-01-13, 1995-01-16, 1995-01-17, 1995-01-18, 1995-01-19, 1995-01-20, 1995-01-24, 1995-01-25, 1995-02-13, 1995-02-17, 1995-05-01, 1995-07-03, 1995-11-24, 1995-12-26, 1996-01-08, 1996-01-09, 1996-07-05, 1996-11-29, 1996-12-26, 1997-11-28, 1997-12-26, 1998-01-02, 1998-11-27, 1999-06-17, 1999-06-18, 1999-06-21, 1999-06-22, 1999-06-23, 1999-06-24, 1999-06-25, 1999-06-28, 1999-06-29, 1999-06-30, 1999-07-01, 1999-07-02, 1999-07-06, 1999-07-07, 1999-07-08, 1999-07-09, 1999-07-12, 1999-07-13, 1999-07-14, 1999-07-15, 1999-07-16, 1999-07-19, 1999-07-20, 1999-07-21, 1999-07-22, 1999-07-23, 1999-07-26, 1999-07-27, 1999-07-28, 
1999-07-29, 1999-07-30, 1999-08-02, 1999-08-03, 1999-08-04, 1999-08-05, 1999-08-06, 1999-08-09, 1999-08-10, 1999-08-11, 1999-08-12, 1999-08-13, 1999-08-16, 1999-08-17, 1999-08-18, 1999-08-19, 1999-08-20, 1999-08-23, 1999-08-24, 1999-08-25, 1999-08-26, 1999-08-27, 1999-08-30, 1999-08-31, 1999-09-01, 1999-09-02, 1999-09-03, 1999-09-07, 1999-09-08, 1999-09-09, 1999-09-10, 1999-09-13, 1999-09-14, 1999-09-15, 1999-09-16, 1999-09-17, 1999-09-20, 1999-09-21, 1999-09-22, 1999-09-23, 1999-09-24, 1999-09-27, 1999-09-28, 1999-09-29, 1999-09-30, 1999-10-01, 1999-10-04, 1999-10-05, 1999-10-06, 1999-10-07, 1999-10-08, 1999-10-11, 1999-10-12, 1999-10-13, 1999-10-14, 1999-10-15, 1999-10-18, 1999-10-19, 1999-10-20, 1999-10-21, 1999-10-22, 1999-10-25, 1999-10-26, 1999-10-27, 1999-10-28, 1999-10-29, 1999-11-01, 1999-11-02, 1999-11-03, 1999-11-04, 1999-11-05, 1999-11-08, 1999-11-09, 1999-11-10, 1999-11-11, 1999-11-12, 1999-11-15, 1999-11-16, 1999-11-17, 1999-11-18, 1999-11-19, 1999-11-22, 1999-11-23, 1999-11-24, 1999-11-26, 1999-11-29, 1999-11-30, 1999-12-01, 1999-12-02, 1999-12-03, 1999-12-06, 1999-12-07, 1999-12-08, 1999-12-09, 1999-12-10, 1999-12-13, 1999-12-31, 2000-07-03, 2000-11-24, 2001-09-13, 2001-09-14, 2001-11-23, 2001-12-24, 2001-12-26, 2001-12-31, 2002-07-05, 2002-11-29, 2002-12-26, 2003-02-18, 2003-11-28, 2004-06-11, 2004-11-26, 2004-12-31, 2005-11-25, 2006-11-24, 2007-01-02, 2007-11-23, 2007-12-24, 2011-01-03
CD 14 1995-08-09, 1995-12-26, 1996-01-02, 1996-06-11, 1996-06-20, 1996-09-09, 1996-09-11, 1996-12-26, 1997-01-02, 1997-12-26, 1998-01-02, 1998-04-16, 2001-01-02, 2001-09-12
CT 154 1995-11-24, 1996-01-08, 1996-07-05, 1996-11-29, 1996-12-24, 1997-11-28, 1997-12-26, 1998-11-27, 1999-11-26, 1999-12-31, 2000-07-03, 2000-11-24, 2001-09-11, 2001-09-12, 2001-09-13, 2001-09-14, 2001-11-12, 2001-11-23, 2001-12-24, 2001-12-31, 2002-05-21, 2002-05-22, 2002-05-23, 2002-05-24, 2002-05-28, 2002-05-29, 2002-05-30, 2002-05-31, 2002-06-03, 2002-06-04, 2002-06-05, 2002-06-06, 2002-06-07, 2002-06-10, 2002-06-11, 2002-06-12, 2002-06-13, 2002-06-14, 2002-06-17, 2002-06-18, 2002-06-19, 2002-06-20, 2002-06-21, 2002-06-24, 2002-06-25, 2002-06-26, 2002-06-27, 2002-06-28, 2002-07-01, 2002-07-02, 2002-07-03, 2002-07-05, 2002-07-08, 2002-07-09, 2002-07-10, 2002-07-11, 2002-07-12, 2002-07-15, 2002-07-16, 2002-07-17, 2002-07-18, 2002-07-19, 2002-07-22, 2002-07-23, 2002-07-24, 2002-07-25, 2002-07-26, 2002-07-29, 2002-07-30, 2002-07-31, 2002-08-01, 2002-08-02, 2002-08-05, 2002-08-06, 2002-08-07, 2002-08-08, 2002-08-09, 2002-08-12, 2002-08-13, 2002-08-14, 2002-08-15, 2002-08-16, 2002-08-19, 2002-08-20, 2002-08-21, 2002-08-22, 2002-08-23, 2002-08-26, 2002-08-27, 2002-08-28, 2002-08-29, 2002-08-30, 2002-09-03, 2002-09-04, 2002-09-05, 2002-09-06, 2002-09-09, 2002-09-10, 2002-09-11, 2002-09-12, 2002-09-13, 2002-09-16, 2002-09-17, 2002-09-18, 2002-09-19, 2002-09-20, 2002-09-23, 2002-09-24, 2002-09-25, 2002-09-26, 2002-09-27, 2002-09-30, 2002-10-01, 2002-10-02, 2002-10-03, 2002-10-04, 2002-10-07, 2002-10-08, 2002-10-09, 2002-10-10, 2002-10-11, 2002-10-14, 2002-10-15, 2002-10-16, 2002-10-17, 2002-10-18, 2002-10-21, 2002-10-22, 2002-10-23, 2002-10-24, 2002-10-25, 2002-10-28, 2002-10-29, 2002-10-30, 2002-10-31, 2002-11-01, 2002-11-04, 2002-11-05, 2002-11-06, 2002-11-07, 2002-11-29, 2002-12-24, 2003-02-18, 2003-11-28, 2003-12-26, 2004-01-02, 2004-06-11, 2004-11-26, 2004-12-31, 2005-11-25, 2006-11-24, 2007-01-02, 2007-11-23, 2007-12-24
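So the function would take an argument n and return the n most frequently occurring dates from the data.frame above.
I looked at which.max, but couldn't figure out how to apply it across multiple rows (the whole data frame column), or how to get it to give me multiple dates (n) as output.
If the code would be much simpler with only a single output value, that would be acceptable as a starting point for my work. Any pointers are appreciated.
Here is a pastebin, because the length of the strings was causing me problems: http://pastebin.com/B1YPicC8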
> str(gaps)
'data.frame': 5560 obs. of 3 variables:
 $ Symbol      : Factor w/ 5560 levels "@AD#","@BP#",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Count       : int 27 14 3 285 14 154 540 11 3 11 ...
 $ MissingDates: Factor w/ 3568 levels "1995-12-26, 1996-01-02, 1996-04-26, 1996-04-30, 1996-05-06, 1996-08-26, 1996-09-03, 1996-09-04, 1996-10-11, 1996-11-13, 1996-11"| __truncated__,..: 1 2 3 4 5 6 7 8 9 10 ...
[Discussion]:
-
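If they are all dates, you could probably just use
table(data) and take the max from it row by row -
Your data structure is unclear. Could you provide
dput(head(df$MissingDates))? -
They are date strings, but they are not of class Date. And I don't need the maximum (highest/most recent); I need the most frequent.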
-
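@DavidArenburg I tried that command, but the output was 20,000 lines (head didn't help). I think that's because in some cases a single row can hold a thousand or more dates, and the output wraps. The column's format is a string of the form: date, comma, space.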
-
What does str(df$MissingDates) give you?
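The table() idea from the comments can be sketched as follows, assuming each MissingDates entry is a single comma-separated string stored as a factor, as the str() output suggests. The helper name top_missing_dates and the small gaps data frame are made up for illustration; they are not from the original post.

```r
# Hypothetical helper (not from the original post): returns the n most
# frequent date strings found in a comma-separated column.
top_missing_dates <- function(df, n = 5, column = "MissingDates") {
  # The column is a factor, so convert to character before splitting.
  dates <- as.character(df[[column]])
  # Split every row on commas, flatten into one vector, and trim spaces.
  all_dates <- trimws(unlist(strsplit(dates, ",", fixed = TRUE)))
  # Drop NAs and empty strings left over from missing or trailing values.
  all_dates <- all_dates[!is.na(all_dates) & nzchar(all_dates)]
  # Count occurrences of each date and sort from most to least frequent.
  counts <- sort(table(all_dates), decreasing = TRUE)
  head(counts, n)
}

# Toy data for illustration only (three rows shaped like the sample above).
gaps <- data.frame(
  Symbol = c("AD", "BP", "C"),
  Count = c(3L, 2L, 1L),
  MissingDates = c("1995-12-26, 1996-01-02, 2001-09-12",
                   "1995-12-26, 2001-09-12",
                   "1995-12-26"),
  stringsAsFactors = TRUE
)

res <- top_missing_dates(gaps, n = 2)
print(res)
```

The result is a named table, so names(res) gives the dates and as.integer(res) gives their counts. The dates are counted as plain strings, which matches the requirement of "most frequent" rather than "most recent", so no conversion to class Date is needed.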