step1: Start spark-shell
step2: Set the checkpoint directory
scala> sc.setCheckpointDir("hdfs://bigdata121:9000/sparkckpt1004")
Once this call returns, the checkpoint directory has been created on HDFS.
step3: Mark the RDD for checkpointing
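scala> rdd.checkpoint
<console>:24: error: not found: value rdd
       rdd.checkpoint
       ^
The call above fails because no value named rdd exists in this session; the RDD here is named rdd1.
scala> rdd1.checkpoint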
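step4: Count the lines again (checkpoint only marks the RDD; the data is written to the checkpoint directory when this action runs)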
scala> rdd1.count
res16: Long = 45
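
The steps above can be sketched as one short spark-shell session that also shows when the checkpoint is actually written. This is a minimal sketch under stated assumptions, not the original session: the local path /tmp/sparkckpt-demo and the RDD named demo are hypothetical stand-ins for the HDFS directory and for rdd1, whose definition is not shown here. On a cluster the checkpoint directory should be on a shared file system such as HDFS.

```scala
// Paste into spark-shell, where `sc` (the SparkContext) is already defined.
// Assumption: hypothetical local directory, suitable for local mode only.
sc.setCheckpointDir("/tmp/sparkckpt-demo")

// Assumption: a small in-memory RDD standing in for rdd1.
val demo = sc.parallelize(1 to 45)

// Optional: persist first so the lineage is not recomputed
// when the checkpoint files are written.
demo.cache()

// checkpoint() only marks the RDD; nothing is written yet.
demo.checkpoint()
println(demo.isCheckpointed)    // false: no action has run so far

// The first action runs the job and then writes the checkpoint.
demo.count()
println(demo.isCheckpointed)    // true: checkpoint data now exists

// Location of the checkpoint files, as an Option[String].
println(demo.getCheckpointFile)
```

Checking isCheckpointed before and after the count makes the lazy behavior visible: the checkpoint call in step3 writes nothing by itself, which is why step4 runs an action.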