Hive: Difference between SORT BY and ORDER BY in Big Data Hadoop

Hive supports SORT BY, which sorts the data per reducer. The difference between ORDER BY and SORT BY is that the former guarantees a total order in the output, while the latter only guarantees the ordering of the rows within each reducer. If there is more than one reducer, SORT BY may give a partially ordered final result.
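A minimal sketch of the contrast, assuming the pv_users table (with userid and gender columns) from the Group By example on this page. The reducer count of 2 is an illustrative assumption, set here through the Hadoop property mapreduce.job.reduces:

    -- Assumption: ask for two reducers so the difference becomes visible.
    SET mapreduce.job.reduces=2;

    -- ORDER BY: the whole result is sorted by userid (total order).
    SELECT userid, gender
    FROM pv_users
    ORDER BY userid;

    -- SORT BY: each reducer sorts only the rows it receives.
    SELECT userid, gender
    FROM pv_users
    SORT BY userid;

With SORT BY, each reducer's output is sorted by userid, but reading the outputs one after another is not guaranteed to give a globally sorted list. ORDER BY gives the global order, typically at the cost of sending the final sort through a single reducer.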
Note: The difference between SORT BY on a single column and CLUSTER BY can be confusing. CLUSTER BY partitions the rows by the field, so rows with the same value go to the same reducer. SORT BY, when there are multiple reducers, partitions the rows randomly in order to distribute data (and load) uniformly across the reducers.
In both cases, the data in each reducer is sorted in the order that the user specified.
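The note on CLUSTER BY can be sketched as follows, again assuming the pv_users table. DISTRIBUTE BY is not mentioned above; it is the Hive clause that controls which reducer a row is sent to, and CLUSTER BY on a column is shorthand for DISTRIBUTE BY plus SORT BY on that same column:

    -- CLUSTER BY: rows with the same userid go to the same reducer
    -- and are sorted by userid within that reducer.
    SELECT userid, gender
    FROM pv_users
    CLUSTER BY userid;

    -- Long form of the query above.
    SELECT userid, gender
    FROM pv_users
    DISTRIBUTE BY userid
    SORT BY userid;

Neither form produces a total order across reducers. They only guarantee that all rows for a given userid land together and are sorted within their reducer.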

Group By:
Aggregation is done using GROUP BY. It works much the same as it would in any other SQL dialect.
    INSERT OVERWRITE TABLE pv_gender_sum
    SELECT pv_users.gender, count(DISTINCT pv_users.userid)
    FROM pv_users
    GROUP BY pv_users.gender;
This query selects pv_users.gender and counts the distinct userid values from the pv_users table, writing the result into pv_gender_sum. In order to count the users of each gender, all the users of the same gender first have to be grouped together, which is what GROUP BY does.
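As a rough illustration of how grouping and ordering combine, the same aggregation can be returned in a guaranteed total order by adding ORDER BY. The alias user_count is a hypothetical name introduced only for this sketch:

    -- Count distinct users per gender, largest group first.
    SELECT pv_users.gender,
           count(DISTINCT pv_users.userid) AS user_count  -- hypothetical alias
    FROM pv_users
    GROUP BY pv_users.gender
    ORDER BY user_count DESC;

Replacing ORDER BY with SORT BY here would only order the rows within each reducer, as described at the top of this page.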


Copyright ©2022 coderraj.com. All Rights Reserved.