Join Type: Bucket Join

In the last three blogs, I discussed Common Join, Map Join and Skewed Join. Common Join is the default join type. Map Join is best used when one of join tables is small and can be fit into memory. Skewed Join improves the query performance for data is skewed in the join keys. What happened when both join tables are very large and none of the above three joins can work? This is where Bucket Join fits in.

Bucket Join is also called Collocated Joins. It is used when all join tables are large and table data has been distributed by the join key. In this case, data copy is unnecessary. It’s map side join and join can happen in the local node. Another condition for Bucket Join is that the number of buckets in one table must be equal or multiple of the number of buckets in another table of the join.

So at the time when creating the table, make sure the buckets are created using the join columns, and BEFORE the data is inserted into the table. Also set both parameter hive.optimize.bucketmapjoin and hive.enforce.bucketing to true before inserting data. One example of creating a bucketed table is shown below:

CREATE TABLE mytable ( 	
name string,	 
city string,	
employee_id int ) 	
CLUSTERED BY (employee_id) INTO 256 BUCKETS	

The above join is also called Bucket Map Join. If the join tables have the same number of buckets and data are also sorted using the join columns, Sort Merge Bucket Map Join is used.

How to Identify the Join
When using EXPLAIN command, you will see Sorted Merge Bucket Map Join Operator below Map Operator Tree.


set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

One thought on “Join Type: Bucket Join

  1. hive bucketing
    Hive bucketing is a method for dividing the data into number of equal parts.
    CLUSTERED BY command is used While creating bucketing in hive.
    Hive bucketing concept can be performed on internal tables or External tables.
    Hive bucketing can perform only on one column to get best result

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s