vy32

Asked: 2015-11-01 09:27:18 +0800 CST

Hadoop Hive, Impala, Pig, and more — SQL access to Hadoop?

It appears that Hive, Impala, Pig, and others all provide SQL or SQL-like access to data stored on Hadoop clusters. They all seem to support HDFS, S3, and other storage systems.

So why are there so many different ways to access data in Hadoop with SQL? How do they differ, and how does their performance compare?

Do we have so many different tools because all of the projects were started at the same time for more or less the same reason? If so, is there an advantage to knowing more than one of them?

I have found several articles that attempt to explain the differences (e.g. "10 ways to query Hadoop with SQL" and "Selecting the right SQL on Hadoop"), but mostly they just list features.

