HBase Pre-splitting
# Quick check: print 3 random sample keys in the \xHH\xHH format used below
seq 0 7 | awk '{printf("\\x%02x\\x%02x\n", int($1/256), $1%256);}' | sort -R | head -3
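1: mkSplit.sh
#!/bin/sh
# Print 500 evenly spaced integers in the 2-byte key space (0..65535).
# Note: 500 split points produce 501 regions.
step=$((65535 / 500))   # integer division, step = 131
i=1
ret=0
while [ "$i" -le 500 ]
do
    ret=$((ret + step))
    echo "$ret"
    i=$((i + 1))
done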
2: showSplit.sh
#!/bin/sh
# Convert each integer to two escaped bytes (high byte first), e.g. 131 -> \x00\x83
sh mkSplit.sh | awk '{printf("\\x%02x\\x%02x\n", int($1/256), $1%256);}'
3: Usage
sh showSplit.sh > split_500.txt
Then create the table from the HBase shell, pointing SPLITS_FILE at the generated file:
create 'rela_user_acct_relation', {MAX_FILESIZE => '10737418240', SPLITS_FILE => '/home/hdp/preSplit/split_500.txt'}, {NAME => 'd', BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
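The two scripts above can also be folded into one parameterized function. This is only a minimal sketch: the name gen_splits is hypothetical (it is not part of HBase), and it assumes the same 2-byte key space and \xHH\xHH output format as the scripts above. Unlike mkSplit.sh, it takes the desired number of regions N and prints N-1 split points.

```shell
#!/bin/sh
# gen_splits N: print N-1 evenly spaced 2-byte split keys, giving N regions.
# Hypothetical helper for illustration; not part of HBase.
gen_splits() {
    n=$1
    step=$((65536 / n))   # width of each region in the 16-bit key space
    i=1
    while [ "$i" -lt "$n" ]; do
        v=$((i * step))
        # Emit the key as two escaped bytes, high byte first.
        printf '\\x%02x\\x%02x\n' $((v / 256)) $((v % 256))
        i=$((i + 1))
    done
}

# Example: split keys for 4 regions.
gen_splits 4
```

For example, gen_splits 500 > split_500.txt would write 499 split points, so the table is created with exactly 500 regions.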
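The interface for split algorithms is SplitAlgorithm (nested in org.apache.hadoop.hbase.util.RegionSplitter).
Its implementations are HexStringSplit and UniformSplit.
When creating a pre-split table, you can use the split(int numRegions) method of these two classes as a reference. The first key and last key of the range can be set.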
bin/hbase org.apache.hadoop.hbase.util.RegionSplitter -c 60 -f f:d myTable HexStringSplit
Here f and d are column families (separated by a colon), and -c 60 creates the table with 60 regions.
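To get a feel for the kind of keys an even split over a hex-string key space produces, here is a rough sketch in shell. It is not the HexStringSplit source: the function name hex_splits is hypothetical, the range 00000000 to FFFFFFFF is assumed as the first and last key, and the shell is assumed to support 64-bit arithmetic.

```shell
#!/bin/sh
# hex_splits N: print N-1 evenly spaced 8-digit hex split keys over the
# assumed range 00000000..FFFFFFFF. Illustrative helper, not HBase code.
hex_splits() {
    n=$1
    range=4294967296          # 0xFFFFFFFF - 0x00000000 + 1
    size=$((range / n))       # integer division; the last region absorbs any remainder
    i=1
    while [ "$i" -lt "$n" ]; do
        printf '%08x\n' $((i * size))
        i=$((i + 1))
    done
}

# Example: split keys for 4 regions.
hex_splits 4
```

With 4 regions this prints three boundaries, each one quarter of the key space apart. Such an even split only balances load when row keys are themselves uniformly distributed hex strings, for example hashed IDs.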