
Hive: Difference between partitioning and bucketing in Big Data Hadoop

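Partitioning and bucketing are both used to improve query performance on Hive tables.
Partitioning speeds up queries only when they commonly filter on the partition column, for example by timestamp range or by location. Bucketing is not applied by default: the table has to be declared with CLUSTERED BY ... INTO n BUCKETS, and older Hive releases also required hive.enforce.bucketing to be set to true when loading data.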
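The effect of filtering on a partition column can be illustrated with a small simulation. This is only a sketch in Python, not Hive code: the dt=... directory names, the sample rows, and the prune helper are hypothetical and exist purely to show that a range filter on the partition column selects whole directories and never touches the others.

```python
# Hypothetical table layout: one directory per value of the partition column dt.
partitions = {
    "dt=2022-01-01": ["row-a", "row-b"],
    "dt=2022-01-02": ["row-c"],
    "dt=2022-01-03": ["row-d", "row-e"],
}

def prune(partitions, lo, hi):
    """Return only the partition directories whose dt value lies in [lo, hi]."""
    selected = []
    for name in sorted(partitions):
        value = name.split("=", 1)[1]
        # ISO-formatted dates compare correctly as plain strings.
        if lo <= value <= hi:
            selected.append(name)
    return selected

# Roughly what a filter such as WHERE dt BETWEEN '2022-01-02' AND '2022-01-03' achieves.
scanned = prune(partitions, "2022-01-02", "2022-01-03")
rows_read = [row for name in scanned for row in partitions[name]]
print(scanned)    # ['dt=2022-01-02', 'dt=2022-01-03']
print(rows_read)  # ['row-c', 'row-d', 'row-e']
```

A query that does not filter on dt would have to read all three directories, which is why partitioning only pays off when the partition column matches the common filters.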
•Partitioning lets Hive skip irrelevant data when the partition column is used in the WHERE clause. Bucketing organizes the data inside a partition into multiple files so that rows with the same value of the bucketing column are always written to the same bucket. This helps joins on the bucketing column.
•With partitioning, a partition is created for every unique value of the column, so a column with many distinct values can produce a large number of tiny partitions. With bucketing, the number of buckets is fixed in advance and the data is distributed across those buckets.
•In short, a partition is a directory in Hive, whereas a bucket is a file.
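The points above can be sketched as a small Python simulation of where a row ends up. It is a minimal sketch under stated assumptions, not Hive's implementation: it assumes an integer bucketing key whose hash is the value itself (newer Hive versions may use a different hash function), and the country and user_id columns, the bucket count of 4, and the 000001_0 style file names are illustrative only.

```python
NUM_BUCKETS = 4  # hypothetical; fixed when the table is created

def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumption: the hash of an integer key is the integer itself,
    # masked to be non-negative before taking the modulus.
    return (user_id & 0x7FFFFFFF) % num_buckets

def file_path(country: str, user_id: int) -> str:
    # Partition value -> directory, bucket number -> file inside that directory.
    return f"country={country}/{bucket_for(user_id):06d}_0"

paths = [file_path("IN", 101), file_path("IN", 105), file_path("US", 101)]
for p in paths:
    print(p)
```

Two things are worth noticing. The number of files per partition stays bounded by NUM_BUCKETS no matter how many distinct user_id values exist, and the same user_id always maps to the same bucket number in every partition, which is what lets a join on the bucketing column match buckets against each other.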

 


Copyright ©2022 coderraj.com. All Rights Reserved.