第10章综合案例1广电大数据分析
第10章综合案例1广电大数据分析
实验目的及要求
(1)现有用户观看历史和用户信息两个广电大数据文件,将对用户数据进行大数据分析。
实验系统环境及版本
-
Linux Ubuntu 20.04
-
JDK1.8
-
Hadoop3.1.0
-
MySQL8.0.28
-
Hive3.1.2
实验任务
-
基本信息查询;
-
单个用户观看市场Top20;
-
用户观看时长Top20;
-
电视观看数Top20;
-
电视观看时长Top20;
-
用户正常状态及数量。
实验内容及步骤
1.创建存储格式为TextFile的表text_see和text_user(用于存储原始数据)。
观看历史表text_see:
hive> create table text_see(
> phone_no string,
> duration int,
> station_name string,
> origin_time string,
> end_time string,
> res_name string,
> owner_code string,
> owner_name string,
> category_name string,
> res_type string,
> vod_title string,
> program_title string,
> day string,
> origin_time1 string,
> end_time1 string,
> wat_time int,
> data string)
> row format delimited fields terminated by ‘,’
> stored as textfile;
用户信息表text_user:
hive> create table text_user(
> phone_no string,
> owner_name string,
> run_name string,
> run_time string,
> sm_name string,
> owner_code string)
> row format delimited fields terminated by ‘,’
> stored as textfile;
创建存储格式为ORC的表orc_see和orc_user
观看历史表orc_see:
hive> create table orc_see(
> phone_no string,
> duration int,
> station_name string,
> origin_time string,
> end_time string,
> res_name string,
> owner_code string,
> owner_name string,
> category_name string,
> res_type string,
> vod_title string,
> program_title string,
> day string,
> origin_time1 string,
> end_time1 string,
> wat_time int,
> data string)
> row format delimited fields terminated by ‘,’
> stored as orc;
用户信息表orc_user:
hive> create table orc_user(
> phone_no string,
> owner_name string,
> run_name string,
> run_time string,
> sm_name string,
> owner_code string)
> row format delimited fields terminated by ‘,’
> stored as orc;
导入数据
观看历史文件存储在本地系统/opt/datas目录下,将其导入表text_see中:
hive> load data local inpath ‘/opt/datas/media3.txt’ into table text_see;
用户信息文件存储在本地系统/opt/datas目录下,将其导入表text_user中:
hive> load data local inpath ‘/opt/datas/userevents.txt’ into table text_user;
将表text_see中数据加载到表orc_see中:
hive> insert into table orc_see select * from text_see;
将表text_user中数据加载到表orc_user中:
hive> insert into table orc_user select * from text_user;
创建完成的表:
hive> show tables;
基本信息查询
查询orc_see的记录总数:
hive> select count( *) from orc_see;
查询表orc_user的记录总数:
hive> select count( *) from orc_user;
查看表text_see的数据大小:
hdfs dfs -du -h /user/hive/warehouse/hivedwh.db/text_see
查看表text_user的数据大小:
hdfs dfs -du -h /user/hive/warehouse/hivedwh.db/text_user
单个用户观看时长Top20
hive> select phone_no,duration from orc_see order by duration desc limit 20;
用户观看时长Top20
hive> select phone_no as name, sum(wat_time) as times from orc_see group by phone_no order by times desc limit 20;
电视观看数Top20
hive> select station_name as station,count(category_name) as num from orc_see group by station_name order by num desc limit 20;
电视观看时长Top20
hive> select station_name as station,sum(duration) as num from orc_see group by station_name order by num desc limit 20;
用户正常状态及数量
hive> select run_name,count(run_name) from orc_user group by run_name;