Only equality joins, outer joins, and left semi joins are supported in Hive. Also, more than two tables can be joined in Hive.
Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.
Allowed Equality Joins
SELECT a.* FROM a JOIN b ON (a.id = b.id)
SELECT a.* FROM a JOIN b ON (a.id = b.id AND a.department = b.department)
Disallowed Joins
SELECT a.* FROM a JOIN b ON (a.id <> b.id)
Multiple Tables can be joined in the same query
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
Largest Table LAST:
Map/reduce stage of the join, the last table in the sequence is streamed through the reducers where as the others are buffered.It helps to reduce the memory needed in the reducer for buffering the rows for a particular value of the join key by organizing the tables such that the largest tables appear last in the sequence.
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
Copyright ©2022 coderraj.com. All Rights Reserved.