
Hive Rank and Over in Big Data Hadoop

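Hive's OLAP (windowing) functions, OVER and RANK, can produce the same result as a self-join without performing the join. A common example is returning the most recent click in each session. The join version reads the clicks table twice: once to compute each session's latest timestamp, and again to join that result back to the full rows. The windowed version reads the table once and ranks the rows within each session, which usually makes it faster. Consider a clicks table stored as SNAPPY-compressed ORC: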

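```sql
-- `timestamp` is a reserved keyword in Hive, so the column name is quoted with backticks.
CREATE TABLE clicks (
  `timestamp` TIMESTAMP,
  sessionID STRING,
  url STRING,
  source_ip STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");
```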

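Old approach (self-join): compute each session's latest timestamp, then join it back to the table to fetch the full rows.

```sql
SELECT clicks.*
FROM clicks
INNER JOIN (
  SELECT sessionID, MAX(`timestamp`) AS max_ts
  FROM clicks
  GROUP BY sessionID
) latest
ON clicks.sessionID = latest.sessionID
AND clicks.`timestamp` = latest.max_ts;
```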
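To see what the self-join returns, here is a minimal sketch that runs the same query through Python's built-in `sqlite3` module rather than Hive. The sample rows and the function name `latest_clicks_by_join` are hypothetical. SQLite only stands in for Hive here, and the ORC/SNAPPY storage settings are not modeled. Timestamps are stored as ISO-8601 text, which sorts in chronological order.

```python
import sqlite3

# Hypothetical sample rows: ("timestamp", sessionID, url, source_ip).
# SQLite is only a stand-in for Hive; ORC/SNAPPY storage is not modeled.
# ISO-8601 timestamp strings compare in chronological order, so MAX() works.
ROWS = [
    ("2022-01-01 10:00:00", "s1", "/home", "10.0.0.1"),
    ("2022-01-01 10:05:00", "s1", "/cart", "10.0.0.1"),
    ("2022-01-01 09:00:00", "s2", "/home", "10.0.0.2"),
    ("2022-01-01 09:30:00", "s2", "/help", "10.0.0.2"),
    ("2022-01-01 09:30:00", "s2", "/faq", "10.0.0.2"),  # ties with /help
]

# Same self-join as the HiveQL above; "timestamp" is quoted as an identifier.
JOIN_QUERY = """
SELECT clicks.* FROM clicks
INNER JOIN (
    SELECT sessionID, MAX("timestamp") AS max_ts
    FROM clicks
    GROUP BY sessionID
) latest
ON clicks.sessionID = latest.sessionID
AND clicks."timestamp" = latest.max_ts
"""


def latest_clicks_by_join(rows):
    """Hypothetical helper: latest click(s) per session via the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE clicks ("timestamp" TEXT, sessionID TEXT, url TEXT, source_ip TEXT)'
    )
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(JOIN_QUERY).fetchall())
    conn.close()
    return result


for row in latest_clicks_by_join(ROWS):
    print(row)
```

With these rows the join returns three clicks: `/cart` for `s1` and both `/help` and `/faq` for `s2`, because those two share the session's latest timestamp.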

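New approach (OVER and RANK): rank the rows within each session by timestamp, newest first, and keep the rows ranked 1. RANK() gives tied rows the same rank, so, like the join, this query returns every click that shares a session's latest timestamp.

```sql
SELECT *
FROM (
  SELECT *,
         RANK() OVER (PARTITION BY sessionID ORDER BY `timestamp` DESC) AS rnk
  FROM clicks
) ranked_clicks
WHERE ranked_clicks.rnk = 1;
```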
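The claim that the windowed query matches the join can be checked with a minimal sketch that again uses Python's `sqlite3` module as a stand-in for Hive. Window functions require SQLite 3.25 or later. The sample rows and the helper names `window_query` and `run` are hypothetical. The sketch also swaps RANK() for ROW_NUMBER() to show how ties are handled differently: ROW_NUMBER() keeps exactly one row per session and picks among tied rows arbitrarily.

```python
import sqlite3

# Hypothetical sample rows: ("timestamp", sessionID, url, source_ip).
# SQLite (3.25+ for window functions) is only a stand-in for Hive.
ROWS = [
    ("2022-01-01 10:00:00", "s1", "/home", "10.0.0.1"),
    ("2022-01-01 10:05:00", "s1", "/cart", "10.0.0.1"),
    ("2022-01-01 09:00:00", "s2", "/home", "10.0.0.2"),
    ("2022-01-01 09:30:00", "s2", "/help", "10.0.0.2"),
    ("2022-01-01 09:30:00", "s2", "/faq", "10.0.0.2"),  # ties with /help
]

SCHEMA = 'CREATE TABLE clicks ("timestamp" TEXT, sessionID TEXT, url TEXT, source_ip TEXT)'

JOIN_QUERY = """
SELECT clicks.* FROM clicks
INNER JOIN (
    SELECT sessionID, MAX("timestamp") AS max_ts FROM clicks GROUP BY sessionID
) latest
ON clicks.sessionID = latest.sessionID AND clicks."timestamp" = latest.max_ts
"""


def window_query(window_fn):
    """Hypothetical helper: keep rows ranked 1 per session (RANK or ROW_NUMBER)."""
    # The outer SELECT lists the columns explicitly so the rank column is
    # dropped and the rows compare directly with the join's clicks.*.
    return f"""
    SELECT "timestamp", sessionID, url, source_ip FROM (
        SELECT *, {window_fn}() OVER (
            PARTITION BY sessionID ORDER BY "timestamp" DESC
        ) AS rnk
        FROM clicks
    ) ranked_clicks
    WHERE ranked_clicks.rnk = 1
    """


def run(query, rows):
    """Hypothetical helper: load rows into an in-memory table and run query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?, ?)", rows)
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result


by_join = run(JOIN_QUERY, ROWS)
by_rank = run(window_query("RANK"), ROWS)
by_row_number = run(window_query("ROW_NUMBER"), ROWS)

print("join == RANK:", by_join == by_rank)
print("RANK rows:", len(by_rank), "ROW_NUMBER rows:", len(by_row_number))
```

With these rows, the join and the RANK query both return the same three clicks. ROW_NUMBER() returns only two, one per session, so it is the better fit when exactly one row per session is wanted and ties can be broken arbitrarily.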



Copyright ©2022 coderraj.com. All Rights Reserved.