Detailed explanation of HBase and Hive data synchronization

Table data in Hive can be synchronized to Impala. Impala is generally used for real-time queries, so for time-consuming operations such as loading data into the warehouse, we can use Hive and then synchronize the data to Impala. In addition, we can create a table in Hive that is mapped to a table in HBase, so that the two stay in sync.

The two approaches are introduced in turn below.

1. Data synchronization between Impala and Hive

First, we execute show databases; on the Hive command line to list the existing databases.

Then we execute show databases; in Impala as well.

At this point, both sides list the same databases.

Next, we execute create database qyk_test; in Hive to create a database.

Then we create a table in the qyk_test database by executing create table user_info(id bigint, account string, name string, age int) row format delimited fields terminated by '\t';
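Written out as separate statements, the Hive-side setup above looks roughly like the following. This is a sketch: the IF NOT EXISTS clauses and the USE statement are added here for convenience and are not part of the original commands.

```sql
-- Run in the Hive CLI or Beeline.
CREATE DATABASE IF NOT EXISTS qyk_test;
USE qyk_test;

-- Tab-delimited text table with the four columns described above.
CREATE TABLE IF NOT EXISTS user_info (
  id      BIGINT,
  account STRING,
  name    STRING,
  age     INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
```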

At this point, the database and table exist on the Hive side. If we now execute show databases; directly on the Impala side, the result is unchanged.

Even the qyk_test database does not appear.

Next, we execute invalidate metadata; in Impala and then query again.

The database and its table are now synchronized and visible in Impala.

To summarize:

If you add or delete databases, tables, or data in Hive, you need to execute the invalidate metadata; command in Impala to synchronize the Hive changes to Impala.

If you add or delete databases, tables, or data directly in Impala, the changes are visible in Hive automatically, without executing any command.
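As a sketch of the Impala side: INVALIDATE METADATA without an argument reloads metadata for every table, which can be expensive on a large catalog. Impala also accepts a table name, and it provides REFRESH for the case where only data files changed in a table it already knows about. Which form fits best depends on your Impala version and workload.

```sql
-- Run in impala-shell.

-- Pick up databases and tables that were created through Hive.
INVALIDATE METADATA;

-- Narrower form: reload the metadata of a single table.
INVALIDATE METADATA qyk_test.user_info;

-- Only new data was added to a table that Impala already knows about.
REFRESH qyk_test.user_info;
```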

2. Data synchronization between Hive and HBase

First, we create a table in HBase by executing create 'user_sysc', {NAME => 'info'} in the HBase shell.

Then, in Hive, we create an external table that points to the HBase table, and execute insert into table user_sysc select id, name from user_info; to load data into user_sysc. Querying user_sysc in Hive then shows the inserted rows.
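The Hive statement that creates the external table is not reproduced in the text. A minimal sketch follows, under several assumptions: the Hive table is also named user_sysc (as the INSERT statement suggests), id maps to the HBase row key, and name maps to the column info:name. The column names and types are illustrative, not the author's original statement.

```sql
-- Run in Hive, in the database that contains user_info.
-- The HBase table 'user_sysc' with column family 'info' must already
-- exist, because the Hive table is declared EXTERNAL.
-- Assumed mapping: id -> HBase row key, name -> column info:name.
CREATE EXTERNAL TABLE user_sysc (
  id   BIGINT,
  name STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name")
TBLPROPERTIES ("hbase.table.name" = "user_sysc");

-- Copy rows from the plain Hive table into the HBase-backed table.
INSERT INTO TABLE user_sysc SELECT id, name FROM user_info;
```

The entries in hbase.columns.mapping correspond to the Hive columns in order, which is why the Hive column names do not need to match the HBase qualifiers.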

Then, we execute scan 'user_sysc' in HBase to view the same data on the HBase side.

Next, we execute deleteall 'user_sysc', '11' in HBase to delete the row whose row key is 11.

Then we query the table again in Hive to check the result.

The change is reflected there, which means the synchronization is automatic: the Hive table holds no copy of the data, and its reads and writes go directly to HBase. Therefore, as long as the Hive table is mapped to the HBase table when it is created (the table names and field names do not have to match), data added or deleted in either HBase or Hive is visible on the other side automatically.

If the table created in Hive is an external table, the HBase table must be created first. If it is an internal (managed) table, Hive automatically creates the HBase table with the specified name.

Because Hive does not conveniently support row deletion and similar operations, while HBase does, this mapping lets us perform those operations in HBase and still query the data from Hive.
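For comparison, here is a sketch of the internal (managed) variant, in which Hive creates the HBase table itself. The name user_managed is hypothetical, and the column mapping repeats the assumption of a single info column family. Some newer Hive deployments may restrict managed tables that use a storage handler, so check this against your version.

```sql
-- Run in Hive. No prior 'create' in the HBase shell is needed here:
-- Hive creates the HBase table named in hbase.table.name.
-- Dropping this Hive table also drops the underlying HBase table.
CREATE TABLE user_managed (
  id   BIGINT,
  name STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name")
TBLPROPERTIES ("hbase.table.name" = "user_managed");
```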
