Hive Bucketing in Big Data Hadoop

Bucketing a table is a technique for improving query performance.
Partitioning speeds up queries only when they filter on the partition column, for example by a timestamp range or by location. Bucketing is not applied by default in older Hive releases: before Hive 2.0, inserts write rows into buckets only when hive.enforce.bucketing is set to true.
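A minimal Python model of partition pruning may make the filtering point concrete. It assumes a hypothetical table partitioned by a date column `dt`, where each dictionary key stands in for one partition directory and the row strings are placeholders. It is a sketch of the idea only, not how Hive's query planner is implemented.

```python
from datetime import date

# Hypothetical table partitioned by `dt`: each key stands for one partition
# directory (e.g. dt=2022-01-01) and maps to the rows stored in it.
PARTITIONS = {
    date(2022, 1, 1): ["row1", "row2"],
    date(2022, 1, 2): ["row3"],
    date(2022, 1, 3): ["row4", "row5"],
}


def prune(partitions, start, end):
    """Keep only partitions whose dt lies in [start, end]; the others are never opened."""
    return {dt: rows for dt, rows in partitions.items() if start <= dt <= end}


def scan_date_range(partitions, start, end):
    """Model of SELECT ... WHERE dt BETWEEN start AND end.

    Returns the matching rows and the number of partitions that had to be read.
    """
    selected = prune(partitions, start, end)
    rows = [row for dt in sorted(selected) for row in selected[dt]]
    return rows, len(selected)


rows, partitions_read = scan_date_range(PARTITIONS, date(2022, 1, 2), date(2022, 1, 3))
print(rows, partitions_read)  # ['row3', 'row4', 'row5'] 2
```

A filter on a column other than `dt` could not skip any entry of `PARTITIONS`, which is why the gain depends on queries filtering by the partition column.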
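Partitioning eliminates data from a scan when the partition column is used in the WHERE clause. Bucketing organizes the data inside a table or partition into a fixed number of files, so rows with the same value of the bucketing column are always written to the same bucket. This also helps joins: when two tables are bucketed on the join column, Hive can join them bucket by bucket instead of redistributing both full tables.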
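The sketch below models both ideas in Python. It assumes a simplified hash-mod scheme (an integer key is its own hash, taken modulo the bucket count); Hive's real hash functions differ between versions, and the bucket count, column name and sample rows are hypothetical. Because the bucket depends only on the key, two tables bucketed the same way keep matching keys in same-numbered buckets, so each bucket pair can be joined on its own.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical, as in CLUSTERED BY (employee_id) INTO 4 BUCKETS


def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Simplified assignment (assumed): identity hash, masked non-negative, modulo bucket count."""
    return (key & 0x7FFFFFFF) % num_buckets


def write_buckets(rows, num_buckets: int = NUM_BUCKETS):
    """Group rows by bucket number, as a bucketed insert would split them into files."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row["employee_id"], num_buckets)].append(row)
    return buckets


def bucketed_join(left_rows, right_rows, num_buckets: int = NUM_BUCKETS):
    """Inner join on employee_id that only ever pairs bucket i with bucket i."""
    left = write_buckets(left_rows, num_buckets)
    right = write_buckets(right_rows, num_buckets)
    joined = []
    for b in range(num_buckets):
        # Build a lookup for this bucket only; keys in other buckets cannot match.
        lookup = defaultdict(list)
        for rrow in right[b]:
            lookup[rrow["employee_id"]].append(rrow)
        for lrow in left[b]:
            for rrow in lookup[lrow["employee_id"]]:
                joined.append({**lrow, **rrow})
    return joined


employees = [{"employee_id": 1, "name": "Asha"}, {"employee_id": 2, "name": "Ravi"},
             {"employee_id": 6, "name": "Meena"}]
offices = [{"employee_id": 2, "city": "Pune"}, {"employee_id": 6, "city": "Delhi"},
           {"employee_id": 7, "city": "Goa"}]
# Employees 2 and 6 both land in bucket 2 of each table and are matched there.
print(bucketed_join(employees, offices))
```

The per-bucket lookup is what makes the join cheaper in this model: each bucket pair is small and independent, so pairs could be processed in parallel without looking at the rest of either table.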
With partitioning, a partition is created for every unique value of the partition column, which can produce a large number of tiny partitions. With bucketing, the number of buckets is fixed when the table is created, and the data is distributed across those buckets.
In Hive, a bucket is a file, whereas a partition is a directory.
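The on-disk difference can be sketched in Python. This assumes a hypothetical table named `employees`, partitioned by city and bucketed on employee_id into 4 buckets, stored under Hive's default warehouse location /user/hive/warehouse, with a simplified hash-mod bucket rule that is an assumption rather than Hive's exact hash. The bucket file names imitate Hive's usual 000000_0 style but are illustrative only. The number of directories grows with the number of distinct cities, while each directory holds at most 4 bucket files.

```python
from collections import defaultdict

WAREHOUSE = "/user/hive/warehouse"  # default hive.metastore.warehouse.dir; your setup may differ
NUM_BUCKETS = 4  # hypothetical bucket count


def layout(table, rows, num_buckets=NUM_BUCKETS):
    """Map each row to a file path: partition value -> directory, bucket number -> file."""
    files = defaultdict(list)
    for row in rows:
        partition_dir = f"{WAREHOUSE}/{table}/city={row['city']}"  # one directory per city
        bucket = (row["employee_id"] & 0x7FFFFFFF) % num_buckets  # assumed hash-mod rule
        files[f"{partition_dir}/{bucket:06d}_0"].append(row["employee_id"])  # one file per bucket
    return dict(files)


rows = [{"city": "Delhi", "employee_id": 1}, {"city": "Delhi", "employee_id": 5},
        {"city": "Delhi", "employee_id": 2}, {"city": "Pune", "employee_id": 3}]
for path, ids in sorted(layout("employees", rows).items()):
    print(path, ids)
# /user/hive/warehouse/employees/city=Delhi/000001_0 [1, 5]
# /user/hive/warehouse/employees/city=Delhi/000002_0 [2]
# /user/hive/warehouse/employees/city=Pune/000003_0 [3]
```

Adding a hundred rows for one city in this model still yields at most four files in that city's directory, whereas a hundred distinct city values would yield a hundred directories.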



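For example, the following table is bucketed on employee_id; the bucket column and the count of 4 are chosen only for illustration:
CREATE TABLE mytable (name string, city string, employee_id int)
CLUSTERED BY (employee_id) INTO 4 BUCKETS;  -- illustrative bucket column and bucket count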



Copyright ©2022 coderraj.com. All Rights Reserved.