What is the difference between a Hive internal table and an external table in Big Data Hadoop

An internal table, also known as a managed table, is one that is managed by Hive. When you load data from HDFS into such a table, the data is moved to the Hive default location, /user/hive/warehouse/. If the internal table is later dropped, its data is deleted along with its metadata.
An external table, on the other hand, is user managed. Its data is not moved to the Hive default directory after loading, i.e. any custom location can be specified. Consequently, when you drop such a table, no data is deleted; only the table schema is dropped.
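The two behaviours described above can be sketched in HiveQL. This is a minimal illustration, not a definitive setup: the table names, columns, and HDFS paths (/data/incoming/sales.csv, /data/shared/sales) are hypothetical, and it assumes the default warehouse directory.

```sql
-- Managed (internal) table: Hive owns the files under its warehouse directory.
CREATE TABLE sales_managed (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- LOAD DATA INPATH moves the HDFS file into the table's directory
-- (hypothetical source path).
LOAD DATA INPATH '/data/incoming/sales.csv' INTO TABLE sales_managed;

-- External table: Hive stores only the schema and a pointer to the
-- user-chosen location (hypothetical path).
CREATE EXTERNAL TABLE sales_external (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/sales';

DROP TABLE sales_managed;   -- removes the metadata and the data files
DROP TABLE sales_external;  -- removes the metadata only; the files stay in place
```

After the second DROP, the files under /data/shared/sales are still there and can be re-exposed by creating another external table over the same location.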
Use EXTERNAL tables when:
1. The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.
2. The data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set, or if you are iterating through various possible schemas.
3. You want to use a custom location such as ASV (Azure Storage Vault).
4. Hive should not own the data or control settings, directories, etc., because another program or process will do those things.
5. You are not creating the table from an existing table with CREATE TABLE ... AS SELECT (CTAS).
6. The files should stay where they are on the HDFS server, with the table only loosely linked to the source files rather than owning them.
7. The files must still remain on the HDFS server if the external table is deleted.
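Point 2 above, several schemas over one data set, might look like the following sketch. The table names, columns, tab delimiter, and the /data/shared/logs path are all assumptions made for illustration.

```sql
-- First schema: treat every line of the files as a single string column.
CREATE EXTERNAL TABLE logs_raw (line STRING)
LOCATION '/data/shared/logs';

-- Second schema over the same files: split each line on tabs
-- (assumes tab-separated log files).
CREATE EXTERNAL TABLE logs_parsed (ts STRING, level STRING, message STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/shared/logs';

-- Dropping one schema leaves both the files and the other table untouched.
DROP TABLE logs_raw;
```

Because neither table owns the files, you can keep iterating on schemas without copying or risking the underlying data.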

Use INTERNAL tables when:
1. The data is temporary.
2. You want Hive to completely manage the lifecycle of the table and its data.
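As a rough sketch of the temporary-data case, a managed table can be created from a query, inspected, and then dropped so that Hive cleans up the data. The source table orders and its columns are hypothetical.

```sql
-- CTAS creates a managed table in the warehouse directory
-- (orders is a hypothetical existing table).
CREATE TABLE tmp_big_orders AS
SELECT id, amount
FROM orders
WHERE amount > 1000;

-- The "Table Type" row of the output distinguishes MANAGED_TABLE
-- from EXTERNAL_TABLE.
DESCRIBE FORMATTED tmp_big_orders;

-- Dropping the managed table removes both its metadata and its data.
DROP TABLE tmp_big_orders;
```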


Copyright ©2022 coderraj.com. All Rights Reserved.