1. Definition
TTL (Time to Live) sets how long data is kept before it expires.
2. How it works
Taking the TTL of a column family (CF) as an example:
hbase(main):001:0> desc 'wxy:test'
Table wxy:test is ENABLED
wxy:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '5', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.9730 seconds
The default TTL of a CF is FOREVER, i.e. its data never expires.
- Change the TTL. A CF's TTL is given in seconds:
hbase(main):003:0> disable 'wxy:test'
0 row(s) in 1.3500 seconds
hbase(main):004:0> alter 'wxy:test', {NAME=>'f1', TTL => '100'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1780 seconds
hbase(main):002:0> desc 'wxy:test'
Table wxy:test is DISABLED
wxy:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '5', TTL => '100 SECONDS (1 MINUTE 40 SECOND)', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0680 seconds
hbase(main):003:0> enable 'wxy:test'
0 row(s) in 0.2460 seconds
- Scan the existing data:
hbase(main):007:0> scan 'wxy:test'
ROW COLUMN+CELL
r1 column=cf:name, timestamp=1503047499079, value=lisi4
r1 column=cf:sex, timestamp=1502788726648, value=male
r2 column=cf:age, timestamp=1503041691183, value=20
r3 column=cf:age, timestamp=1503041723715, value=23
r4 column=cf:name, timestamp=1503041738224, value=Alex
4 row(s) in 0.1140 seconds
- Update the table by writing a cell into the f1 column family:
hbase(main):007:0> put 'wxy:test','r4','f1:address','shandi'
0 row(s) in 0.2590 seconds
hbase(main):008:0> scan 'wxy:test'
ROW COLUMN+CELL
r1 column=cf:name, timestamp=1503047499079, value=lisi4
r1 column=cf:sex, timestamp=1502788726648, value=male
r2 column=cf:age, timestamp=1503041691183, value=20
r3 column=cf:age, timestamp=1503041723715, value=23
r4 column=cf:name, timestamp=1503041738224, value=Alex
r4 column=f1:address, timestamp=1505976958276, value=shandi
4 row(s) in 0.0680 seconds
- Scan the table after 30 seconds:
hbase(main):012:0> scan 'wxy:test'
ROW COLUMN+CELL
r1 column=cf:name, timestamp=1503047499079, value=lisi4
r1 column=cf:sex, timestamp=1502788726648, value=male
r2 column=cf:age, timestamp=1503041691183, value=20
r3 column=cf:age, timestamp=1503041723715, value=23
r4 column=cf:name, timestamp=1503041738224, value=Alex
r4 column=f1:address, timestamp=1505976958276, value=shandi
4 row(s) in 0.0460 seconds
hbase(main):013:0> scan 'wxy:test'
ROW COLUMN+CELL
r1 column=cf:name, timestamp=1503047499079, value=lisi4
r1 column=cf:sex, timestamp=1502788726648, value=male
r2 column=cf:age, timestamp=1503041691183, value=20
r3 column=cf:age, timestamp=1503041723715, value=23
r4 column=cf:name, timestamp=1503041738224, value=Alex
r4 column=f1:address, timestamp=1505976958276, value=shandi
4 row(s) in 0.0390 seconds
As shown above, two consecutive scans return the same data.
- Scan the table after 100 seconds:
hbase(main):019:0> scan 'wxy:test'
ROW COLUMN+CELL
r1 column=cf:name, timestamp=1503047499079, value=lisi4
r1 column=cf:sex, timestamp=1502788726648, value=male
r2 column=cf:age, timestamp=1503041691183, value=20
r3 column=cf:age, timestamp=1503041723715, value=23
r4 column=cf:name, timestamp=1503041738224, value=Alex
4 row(s) in 0.0280 seconds
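The f1 cell of r4 is gone. This is how TTL works.
The TTL is counted from the time the cell was last written (its timestamp), not from the time the column was first created.
Also note that after 100 seconds only the f1 column of r4 was removed. The other columns of r4, such as cf:name, are kept. Expiry is evaluated per cell, against the TTL of that cell's column family, not per row.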
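The per-cell behaviour observed above can be sketched as a toy model in Python. This is not HBase code: `is_expired` and `visible_cells` are hypothetical helper names, and the sketch assumes a cell counts as expired once its age is strictly greater than the TTL of its column family. The timestamps are the ones for r4 in the scan output.

```python
# Toy model of per-cell TTL expiry; not HBase code.
FOREVER = None  # stands in for TTL => 'FOREVER'


def is_expired(cell_ts_ms, ttl_seconds, now_ms):
    """A cell is expired once its age exceeds its column family's TTL."""
    if ttl_seconds is FOREVER:
        return False
    return now_ms - cell_ts_ms > ttl_seconds * 1000


def visible_cells(cells, cf_ttl, now_ms):
    """Return the cells a scan would still show at time now_ms."""
    return [
        (row, column, ts)
        for (row, column, ts) in cells
        if not is_expired(ts, cf_ttl[column.split(":")[0]], now_ms)
    ]


# The two cells of r4 from the scan output.
cells = [
    ("r4", "cf:name", 1503041738224),
    ("r4", "f1:address", 1505976958276),
]
cf_ttl = {"cf": FOREVER, "f1": 100}  # seconds, as set by the alter command

put_time = 1505976958276  # timestamp of the put into f1:address
after_30s = visible_cells(cells, cf_ttl, put_time + 30_000)
after_101s = visible_cells(cells, cf_ttl, put_time + 101_000)

print([c[1] for c in after_30s])   # both cells are still visible
print([c[1] for c in after_101s])  # only cf:name remains
```

The row itself never expires in this model. Only the cell whose column family has a finite TTL drops out, which matches the scans above.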
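If a store file contains only expired rows, it is deleted during minor compaction (see HBase compaction). This feature can be disabled by setting hbase.store.delete.expired.storefile to false, or by setting the minimum number of versions (MIN_VERSIONS) to a value other than 0. The default value of MIN_VERSIONS is 0: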
hbase(main):001:0> desc 'wxy:test'
Table wxy:test is ENABLED
wxy:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '5', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.9730 seconds
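The conditions for dropping a whole store file can be summarised in a small decision function. This is a rough sketch of the rule as described here, not the actual HBase implementation. The function name, its parameters, and the use of the file's newest cell timestamp are all assumptions made for illustration.

```python
# Sketch of the "drop a fully expired store file" rule; names are hypothetical.
def can_drop_store_file(newest_cell_ts_ms, ttl_seconds, now_ms,
                        delete_expired_storefile=True, min_versions=0):
    """True if every cell in the file is past its TTL and the feature is on."""
    if not delete_expired_storefile:  # hbase.store.delete.expired.storefile=false
        return False
    if min_versions != 0:             # MIN_VERSIONS other than 0 disables it
        return False
    if ttl_seconds is None:           # TTL => 'FOREVER'
        return False
    # If the newest cell is expired, every older cell in the file is too.
    return now_ms - newest_cell_ts_ms > ttl_seconds * 1000


newest_ts = 1505976958276  # timestamp of the f1:address put
now = 1505977100000        # hypothetical current time, about 142 s later

print(can_drop_store_file(newest_ts, 100, now))
print(can_drop_store_file(newest_ts, 100, now, min_versions=1))
print(can_drop_store_file(newest_ts, 100, now, delete_expired_storefile=False))
```

With the defaults the first call reports that the file can be dropped. The other two show each switch disabling the behaviour on its own.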
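Note: it is sometimes claimed that a table must be disabled before its schema is modified, or its records will be cleared. In fact, HBase allows an enabled table to be altered directly. In the test below, though, the scan right after the alter returns no rows. A likely cause (not verified here) is the TTL itself rather than the missing disable: the new TTL applies to existing cells too, so any cell whose timestamp is already more than 100 seconds old is expired as soon as the change takes effect: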
hbase(main):004:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1500967679327, value=value1
row2 column=cf:b, timestamp=1500967692945, value=value2
row3 column=cf:c, timestamp=1500967715743, value=value3
3 row(s) in 0.2490 seconds
hbase(main):005:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.0880 seconds
hbase(main):006:0> alter 'test',{NAME => 'cf',TTL => '100'}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 2.2200 seconds
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
0 row(s) in 0.0190 seconds
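Because expiry is judged from each cell's timestamp, a newly set TTL also covers cells written before the alter. Before lowering a TTL it can help to estimate how many existing cells would expire at once. The sketch below is a hypothetical pre-check, not an HBase API. The transcript does not record when the alter ran, so both `alter_time` values are assumptions.

```python
# Hypothetical pre-check: which existing cells would a new TTL expire at once?
def expired_on_alter(cell_timestamps_ms, new_ttl_seconds, alter_time_ms):
    """Return the timestamps already older than the new TTL at alter time."""
    cutoff = alter_time_ms - new_ttl_seconds * 1000
    return [ts for ts in cell_timestamps_ms if ts < cutoff]


# Timestamps of row1, row2 and row3 from the scan of 'test'.
timestamps = [1500967679327, 1500967692945, 1500967715743]
last_put = timestamps[-1]

# Assumption: the alter ran 10 minutes after the last put.
late_alter = expired_on_alter(timestamps, 100, last_put + 600_000)
# Assumption: the alter ran only 50 seconds after the last put.
early_alter = expired_on_alter(timestamps, 100, last_put + 50_000)

print(len(late_alter))   # all three cells are already past the new TTL
print(len(early_alter))  # no cell is past the new TTL yet
```

Under the first assumption all three rows disappear from the next scan, whether or not the table was disabled beforehand.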
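3. Granularity
In earlier versions, TTL could only be controlled per column family. Newer versions support tags on cells, so a TTL can also be set at the cell level. (To be verified.)
(See http://hbase.apache.org/book.html#ttl and https://issues.apache.org/jira/browse/HBASE-10560)
Differences between a cell TTL and a column family TTL:
- A column family TTL is expressed in seconds; a cell TTL is expressed in milliseconds.
- If a cell-level TTL is set, it overrides the CF TTL, but it cannot extend the cell's lifetime beyond the CF-level TTL.
Quoted from http://hbase.apache.org/book.html#ttl: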
Cell TTLs are expressed in units of milliseconds instead of seconds.
A cell TTLs cannot extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting.
Quoted from the author's comments on https://issues.apache.org/jira/browse/HBASE-10560:
We can keep the existing column level definition and enforcement mechanism and extend it to look for a TTL cell tag during compaction. If one is found, it can override the CF setting. TTL overrides can be passed up to the server in an operation attribute.
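The interaction between the two TTL levels, including the unit difference, can be expressed as a short calculation. This is a minimal sketch of the rule as quoted, assuming `None` stands for "no TTL" (FOREVER at the CF level, or no cell TTL set). `effective_ttl_ms` is a hypothetical helper, not part of any HBase client.

```python
# Sketch of combining a CF TTL (seconds) with a cell TTL (milliseconds).
def effective_ttl_ms(cf_ttl_seconds, cell_ttl_ms=None):
    """Return the lifetime that actually applies to a cell, in milliseconds."""
    cf_ttl_ms = None if cf_ttl_seconds is None else cf_ttl_seconds * 1000
    if cell_ttl_ms is None:
        return cf_ttl_ms              # only the CF setting applies
    if cf_ttl_ms is None:
        return cell_ttl_ms            # CF is FOREVER, the cell TTL decides
    return min(cf_ttl_ms, cell_ttl_ms)  # a cell TTL can shorten, never extend


print(effective_ttl_ms(100))           # CF TTL only, converted to ms
print(effective_ttl_ms(100, 30_000))   # shorter cell TTL wins
print(effective_ttl_ms(100, 500_000))  # longer cell TTL is capped by the CF TTL
```

Taking the minimum captures both quoted statements: a cell TTL overrides the CF setting when it is shorter and has no effect when it is longer.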
References:
http://blog.csdn.net/wulantian/article/details/41010947
http://hbase.apache.org/book.html#ttl
https://issues.apache.org/jira/browse/HBASE-10560