Some time ago I wrote a blog post, Use SQL Developer to Access Hive Table on Hadoop. Recently I noticed a similar product, Toad for Hadoop, so I decided to give it a try.
Like many people, I like Toad products in general and use Toad in many of my projects. Toad for Hadoop is a new product in the Toad family. The current version is Toad for Hadoop 1.3.1 Beta, available on Windows only. The software supports Cloudera CDH 5.x and Hortonworks Data Platform 2.3. It is free for now, but you need to create an account with Dell before you can download the zip file. The entire installation and configuration process is simple and straightforward. Here are the steps:
Download the zip file
Go to the Toad for Hadoop page and click the Download button. The zip file is 555 MB in size.
I installed the software on my Windows VM. Just double-click the ToadHadoop_1.3.1_Beta_x64.exe file and accept the default values on all of the installation screens. At the end of the installation, the software opens automatically.
Unlike the regular Toad software with its many buttons, this one looks quite simple.
Click the dropdown box to the right of the Ecosystem box, then click Add New Ecosystem. The Select your Hadoop Cluster setup screen shows up as follows.
Input the name you want for this connection. For this one, I configured the connection for our X3 Big Data Appliance (BDA) full rack cluster with 18 nodes, so I input the Name as Enk-X3-DBA. For Detection Method, you can see it supports Cloudera CDH via Cloudera Manager or Hortonworks HDP via Ambari. I chose CDH managed by Cloudera Manager.
The next screen is Enter your Cloudera Manager credentials. For Server Address, use the same URL and port number that you use to access Cloudera Manager. The user name is the one you use to log in to Cloudera Manager. Make sure you create your user directory on HDFS before you run the installation of the software. For example, create a folder /user/zhouw and give the zhouw user read/write access to it. Otherwise you will see a permission exception later on.
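The HDFS home directory step can be sketched as follows. This is only a minimal sketch: it assumes you run it on a cluster node where sudo works and where the HDFS superuser is named hdfs (both are my assumptions, not something Toad requires). The helper name home_dir_commands is made up for illustration. It only builds and prints the two hdfs dfs commands, so you can review them before running anything.

```python
import shlex


def home_dir_commands(user, superuser="hdfs"):
    """Build the commands that create /user/<user> on HDFS and give it to that user.

    Assumption: `superuser` is the HDFS superuser account and sudo is available.
    """
    home = f"/user/{user}"
    return [
        # Create the home directory (no error if it already exists).
        ["sudo", "-u", superuser, "hdfs", "dfs", "-mkdir", "-p", home],
        # Make the user the owner so they get read/write access.
        ["sudo", "-u", superuser, "hdfs", "dfs", "-chown", user, home],
    ]


# Print the commands for review instead of executing them.
for cmd in home_dir_commands("zhouw"):
    print(shlex.join(cmd))
```

To actually run them, pass each list to subprocess.run(cmd, check=True) on a node that has the hdfs client installed.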
The next screen shows Autodetection. It runs many checks and validations, and you should see a successful status for all of them.
The next one shows Ecosystem Configuration. On this screen, I just input zhouw for User Name and then clicked the Activate button. There is a bug in this version: sometimes both the Activate and Cancel buttons disappear. The workaround is to close and restart the software.
The most frequently used screen is the SQL Screen. You can run SQL statements against either the Hive or the Impala engine.
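For comparison, here is roughly what running the same statement from the command line looks like without Toad. This is a sketch under assumptions: the host name node1 is a placeholder, the ports are the usual defaults (10000 for HiveServer2, 21000 for impala-shell), and the cluster is not using Kerberos. The helper sql_command is hypothetical and only builds the beeline or impala-shell command line.

```python
import shlex


def sql_command(engine, host, query, user="zhouw"):
    """Build a command line that runs `query` on Hive (beeline) or Impala (impala-shell).

    Assumptions: default ports, the `default` database, and no Kerberos.
    """
    if engine == "hive":
        url = f"jdbc:hive2://{host}:10000/default"
        return ["beeline", "-u", url, "-n", user, "-e", query]
    if engine == "impala":
        return ["impala-shell", "-i", f"{host}:21000", "-q", query]
    raise ValueError(f"unknown engine: {engine}")


# Print both variants for the same statement; `node1` is a placeholder host.
for engine in ("hive", "impala"):
    print(shlex.join(sql_command(engine, "node1", "show tables")))
```

The point of the SQL Screen is that you switch engines with a click instead of remembering two different clients and their options.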
The screen is very similar to the traditional Toad screen I am used to. The left panel shows the schemas and table names, and the bottom panel shows the result. Although there is an Explain Plan tab in the result panel, I consider Explain Plan on Hadoop a joke at the time of writing. You can take a look, but I would not spend time checking the plan; you will run into more issues from other parts of the Hadoop world than from suboptimal query plans. The History panel on the right is an interesting one, and I found it very useful later on. It not only shows the timing for my queries (or jobs), but also caches the results from previous runs. It proved to be a smart feature, since I don’t have to rerun my queries to get the results back.
Sometimes you might want to check the DDL for certain tables. Just right-click the table and select Generate Create Table statement as follows:
Here is an example of generated DDL.
The HDFS Screen is another feature I really like. It works just like Windows Explorer and shows the HDFS directories and the files under them in a tree structure. It also shows the size of each directory and file. With just a few clicks, you can quickly find out which directories and files are taking up a lot of space. The right panel can show you some of the content of a file; by default, it shows the first 4K of data. This is very convenient and saves me the time of typing multiple commands to find the same information. If you want to download files from or upload files to HDFS, just click the Download and Upload buttons at the top.
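If you want the same kind of information without the GUI, the WebHDFS REST API is one option. The sketch below only builds the URLs. I am not claiming this is how Toad does it internally. The host name namenode, the port 50070 (the usual NameNode HTTP port on CDH 5) and the file name data.csv are assumptions, and webhdfs_url is a made-up helper.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, user="zhouw", **params):
    """Build a WebHDFS URL for operation `op` on the HDFS `path`.

    Assumptions: WebHDFS is enabled, simple (non-Kerberos) authentication,
    and the NameNode HTTP port is 50070.
    """
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# First 4K of a file, similar to the preview in the right panel.
print(webhdfs_url("namenode", "/user/zhouw/data.csv", "OPEN", length=4096))

# Space used under a directory, similar to the size column.
print(webhdfs_url("namenode", "/user/zhouw", "GETCONTENTSUMMARY"))
```

Both URLs can be fetched with an HTTP GET; the OPEN call redirects to a DataNode, so the client needs to follow redirects.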
Sometimes I am interested in the replication factor and physical locations of certain files on HDFS. Just right-click the file on HDFS, then select Properties.
It shows everything about this file. For sizing information, it shows both Summary and individual block information.
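To make the sizing numbers concrete, here is a small sketch that derives a similar summary from a WebHDFS GETFILESTATUS response. The JSON below is a made-up sample, not output from my cluster, and summarize is a hypothetical helper. The field names length, blockSize and replication are the ones WebHDFS returns.

```python
import json
import math

# Made-up sample response; a real one comes from op=GETFILESTATUS.
SAMPLE = """
{"FileStatus": {"length": 300000000, "blockSize": 134217728,
                "replication": 3, "type": "FILE"}}
"""


def summarize(status_json):
    """Return the replication factor, block count and raw bytes stored for one file."""
    st = json.loads(status_json)["FileStatus"]
    # A file occupies ceil(length / blockSize) blocks; an empty file has none.
    blocks = math.ceil(st["length"] / st["blockSize"]) if st["length"] else 0
    return {
        "replication": st["replication"],
        "blocks": blocks,
        # Every byte is stored once per replica.
        "raw_bytes": st["length"] * st["replication"],
    }


print(summarize(SAMPLE))
```

For the physical location of each block, the command-line equivalent is hdfs fsck with the -files -blocks -locations options on the file path.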
The Chart Screen also looks nice. It does not have as many charts as Cloudera Manager, but it does have the key information I usually want to know. I list just a few of them as follows:
The Service Screen is useful when you want to know where your services are deployed on Hadoop, such as the hostname and port number for a certain service. It does not have everything, but it is good enough.
In general, Toad for Hadoop is a nice tool that can help you quickly find certain information on Hadoop without going through many screens and commands. I would say this tool is for Hadoop administrators rather than regular Hadoop users, because you probably don’t want to give every user access to Cloudera Manager.