cenzhongman

From the official documentation

1. Write the Python script:

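import sys
import datetime

# Read tab-separated rows from stdin, one per line
for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  # Convert the Unix timestamp to an ISO weekday (1 = Monday ... 7 = Sunday)
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  # Emit the row with the weekday in place of the timestamp
  print('\t'.join([userid, movieid, rating, str(weekday)]))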
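Before handing the script to Hive, it can help to check the per-row logic on its own. The sketch below wraps the same steps in a function; the name map_line, its tz parameter, and the sample row are hypothetical and not part of the original script. Note that fromtimestamp with no timezone uses the local time of the machine it runs on, so a timestamp close to midnight may map to a different weekday on nodes with different timezone settings.

```python
import datetime


def map_line(line, tz=None):
    """Convert one tab-separated row; tz=None means local time, as in the script."""
    userid, movieid, rating, unixtime = line.strip().split('\t')
    moment = datetime.datetime.fromtimestamp(float(unixtime), tz)
    return '\t'.join([userid, movieid, rating, str(moment.isoweekday())])


# Hypothetical sample row; UTC is passed only to make the result reproducible.
sample = '196\t242\t3\t881250949\n'
result = map_line(sample, tz=datetime.timezone.utc)
print(result)
```

With UTC the timestamp falls on a Thursday, so the last field is 4. Calling map_line(sample) without tz reproduces the behavior of the script itself.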

2. Add the script

add file /opt/datas/xxx.py

3. Use the script

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

add FILE weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)  -- columns passed to the script
  USING 'python weekday_mapper.py'               -- script to run
  AS (userid, movieid, rating, weekday)          -- output columns
FROM u_data;

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;
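As a rough illustration of what the final query computes, the sketch below counts rows per weekday in plain Python. The rows are made-up stand-ins for the contents of u_data_new, not real data, and the sorting is only for readable output, since GROUP BY by itself does not promise any row order.

```python
from collections import Counter

# Hypothetical rows as they would sit in u_data_new: userid, movieid, rating, weekday.
rows = [
    (196, 242, 3, 4),
    (186, 302, 3, 6),
    (22, 377, 1, 4),
    (244, 51, 2, 1),
]

# Equivalent of SELECT weekday, COUNT(*) ... GROUP BY weekday.
counts = Counter(weekday for _, _, _, weekday in rows)

for weekday, total in sorted(counts.items()):
    print(weekday, total)
```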
