How Hive Implements DML Data Operations, Partitioned Tables, and Bucketed Tables

This article walks through Hive DML data operations (importing and exporting data), partitioned tables, and bucketed tables.


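1. DML Data Operations

1.1 Data Import

1. Import with load data
	load data [local] inpath 'path_to_data' [overwrite]
	into table table_name [partition (partcol1=val1,…)];
		-- [local]: with local, the path is on the local filesystem; without it, the path is on HDFS.
		-- [overwrite]: with overwrite, the load replaces the data already in the table; without it, the data is appended.
		-- [partition (partcol1=val1,…)]: the target partition (covered later).

tip: set hive.exec.mode.local.auto=true; lets Hive run MR jobs in local mode. Local mode is used only when certain conditions are met; otherwise the job still runs on the cluster.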
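
A minimal sketch of the load and the local-mode tip together. The file path /opt/module/hive/datas/student.txt is hypothetical, and the student table is assumed to have the columns (id int, name string). The two threshold properties are the conditions that, as far as I know, decide whether a job qualifies for local mode; the values shown are the usual defaults and should be checked against your Hive version.

```sql
-- Let Hive decide per job whether it is small enough to run locally.
set hive.exec.mode.local.auto=true;
-- Assumed thresholds: total input size (bytes) and number of input files
-- below which a job may run in local mode.
set hive.exec.mode.local.auto.inputbytes.max=134217728;
set hive.exec.mode.local.auto.input.files.max=4;

-- Append a local file to the table (hypothetical path).
load data local inpath '/opt/module/hive/datas/student.txt'
into table student;

-- Load the same file again, this time replacing the table's contents.
load data local inpath '/opt/module/hive/datas/student.txt'
overwrite into table student;
```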


-----------------------------------------------------------
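2. Insert data into a table with a query (Insert)

	2.1 Insert new rows directly
		insert into student values(1,'aa');

	2.2 Insert the result of a query into a table (note: the query result must match the target table's columns in both number and type)
		insert overwrite table table_name select_statement;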
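
The difference between appending and overwriting can be sketched as follows. This assumes student has the same two columns as db4.student3 (id int, name string); the filter id < 100 is only for illustration.

```sql
-- insert into appends the query result to whatever student already holds.
insert into table student
select id, name from db4.student3;

-- insert overwrite replaces the existing contents of student
-- with the query result.
insert overwrite table student
select id, name from db4.student3 where id < 100;
```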


--------------------------------------------------------------
3. Create a table and load data from a query (As Select)
	create table if not exists table_name
	as select_statement;
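
As a concrete instance of the template above, with student_copy as a hypothetical table name:

```sql
-- The new table takes its column names and types from the query result.
create table if not exists student_copy
as select id, name from db4.student3;
```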
	
	
	
----------------------------------------------------------------
4. Specify the data path with Location when creating the table
	create table if not exists student3(
	id int,
	name string
	)
	row format delimited fields terminated by '\t'
	location '/input';


--------------------------------------------------------------------
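5. Import data (only data produced by export can be imported)
	Note: the target table must not already exist (or must be empty), otherwise the statement fails.
	import table db_name.table_name from 'hdfs_export_path';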

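1.2 Data Export

1. Export with insert
	insert overwrite [local] directory 'path'
	row format delimited fields terminated by '\t' -- field delimiter
	select_statement;
	-- local: with local, the output path is on the local filesystem; without it, the output path is on HDFS.
	-- overwrite: the existing contents of the target directory are replaced.

    Example: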
	insert overwrite local directory '/opt/module/hive/datas2' 
	row format delimited fields terminated by '\t'
	select * from db4.student3;

	insert overwrite directory '/output' 
	row format delimited fields terminated by '\t'
	select * from db4.student3;


-------------------------------------------------------------------
2. Export to the local filesystem with Hadoop commands

	hadoop fs -get 'path_of_table_data'  'local_path'
	hdfs dfs -get 'path_of_table_data'  'local_path'
	In the Hive client: dfs -get 'path_of_table_data'  'local_path';


--------------------------------------------------------------------
3. Export with the Hive shell
	bin/hive -e 'select * from table_name;' > local_path


--------------------------------------------------------------------
4. Export to HDFS with export

	export table db_name.table_name to 'hdfs_path';
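
Since import only accepts what export produced, the two are easiest to read as a round trip. In this sketch the HDFS directory /export/student3 and the table name student3_copy are hypothetical.

```sql
-- Write the table's metadata and data files to an HDFS directory.
export table db4.student3 to '/export/student3';

-- Recreate the table under a new name from that export.
import table db4.student3_copy from '/export/student3';
```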


--------------------------------------------------------------------
5. Export with Sqoop
	Covered later.

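2. Partitioned Tables and Bucketed Tables

2.1 Partitioned Tables

I. Create a partitioned table
	create table table_name(
		deptno int, dname string, loc string
	)
	partitioned by (col_name col_type) -- partition column
	row format delimited fields terminated by '\t';

   Example: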
	create table dept_partition(
	deptno int, dname string, loc string
	)
	partitioned by (day string)
	row format delimited fields terminated by '\t';


---------------------------------------------------------------------------------
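II. Operations on partitioned tables:

	1. Add partitions (multiple partitions are separated by spaces)
	alter table table_name add partition(partition_col='value') partition(partition_col='value') ...;
	
	2. List partitions
	show partitions table_name;
	
	3. Drop partitions (multiple partitions are separated by commas)
	alter table table_name drop partition(partition_col='value'), partition(partition_col='value') ...;
	
	4. Load data into a partition
	load data [local] inpath 'path' [overwrite] into table table_name partition(partition_col='value');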
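
Applied to the dept_partition table created earlier, the four operations might look like the sketch below. The partition values 20200404 and 20200405 are chosen only for illustration, and the log file path is the one used elsewhere in this article.

```sql
-- Add two partitions in one statement (space-separated).
alter table dept_partition add partition(day='20200404') partition(day='20200405');

-- List the partitions Hive knows about.
show partitions dept_partition;

-- Load a local file into one partition; the partition is created if missing.
load data local inpath '/opt/module/hive/datas/dept_20200401.log'
into table dept_partition partition(day='20200401');

-- Filtering on the partition column reads only that partition's directory.
select * from dept_partition where day='20200401';

-- Drop the two illustrative partitions again (comma-separated).
alter table dept_partition drop partition(day='20200404'), partition(day='20200405');
```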


---------------------------------------------------------------------------------------
III. Create a two-level partitioned table
	create table table_name(
	deptno int, dname string, loc string
	)
	partitioned by (col_name1 col_type, col_name2 col_type, ...)
	row format delimited fields terminated by '\t';

   Example:
	create table dept_partition2(
	deptno int, dname string, loc string
	)
	partitioned by (day string, hour string)
	row format delimited fields terminated by '\t';


   Load data into the two-level partitioned table (if the partition does not exist at load time, it is created):
	load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table
	dept_partition2 partition(day='20200401', hour='12');

	load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table
	dept_partition2 partition(day='20200401', hour='13');


---------------------------------------------------------------
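IV. Ways to associate data with partitions

	Files uploaded straight into a partition directory on HDFS cannot be queried until the partition is registered in the metastore.

	1. Method 1: upload the data, then run the repair command
		msck repair table table_name;

	2. Method 2: upload the data, then add the partition
		alter table table_name add partition(col_name='value');

	3. Method 3: create the directory, then load the data into the partition (load creates the partition directly)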
		load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table
		dept_partition2 partition(day='20200401', hour='13');
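
Methods 1 and 2 can be sketched end to end from the Hive client. The warehouse path /user/hive/warehouse/dept_partition2 is an assumption (the default warehouse location, with the table in the default database); adjust it to where your table actually lives. The partition hour=14 is chosen only for illustration.

```sql
-- Upload a file directly into a new partition directory on HDFS.
dfs -mkdir -p /user/hive/warehouse/dept_partition2/day=20200401/hour=14;
dfs -put /opt/module/hive/datas/dept_20200401.log /user/hive/warehouse/dept_partition2/day=20200401/hour=14;

-- The metastore does not know this partition yet,
-- so the query does not see the uploaded file.
select * from dept_partition2 where day='20200401' and hour='14';

-- Method 1: let Hive scan the table directory and register missing partitions.
msck repair table dept_partition2;

-- Method 2 (alternative to the repair): register the partition explicitly.
-- alter table dept_partition2 add partition(day='20200401', hour='14');

-- The same query now reads the uploaded file.
select * from dept_partition2 where day='20200401' and hour='14';
```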

2.2 Bucketed Tables

I. Create a bucketed table:
	create table table_name(id int, name string)
	clustered by(id) -- id is the bucketing column; rows are assigned to buckets by its value
	into num_buckets buckets
	row format delimited fields terminated by '\t';

   Example:
	create table stu_buck(id int, name string)
	clustered by(id) 
	into 4 buckets
	row format delimited fields terminated by '\t';

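   Notes:
	 1. In newer Hive versions, loading data into a bucketed table with load runs an MR job,
		so the source path for the load is best placed on HDFS.

	 2. The number of ReduceTasks should equal the number of buckets.

	 3. Bucketing rule: the hashCode of the bucketing column's value % the number of buckets determines which bucket a row goes into.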
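
The rule in note 3 can be evaluated in a query with Hive's built-in hash and pmod functions. This is an illustrative sketch only: the HDFS path /student.txt is hypothetical, and the computed number shows the rule as stated here, which may not match the physical file layout in every Hive version.

```sql
-- Load from HDFS, as note 1 recommends (hypothetical path).
load data inpath '/student.txt' into table stu_buck;

-- Evaluate "hash of the bucketing column % number of buckets" per row.
-- pmod keeps the result non-negative.
select id, name, pmod(hash(id), 4) as expected_bucket
from stu_buck
order by expected_bucket, id;

-- Read a single bucket by sampling on the bucketing column.
select * from stu_buck tablesample(bucket 1 out of 4 on id);
```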


Source URL: http://chengdu.cdxwcx.cn/article/jcddod.html
