Build an Oracle Big Data Discovery Project in 5 minutes

Oracle Big Data Discovery (BDD) is a visual analytics tool with powerful capabilities for turning raw data into business insight within minutes, and it does not require any knowledge of Hadoop or big data. In this blog, I am going to show how to create a simple BDD project and do some interesting analysis within 5 minutes.

Recently there have been several blogs and articles about a potentially strong earthquake in California, with titles like 7.0+ earthquake could shake LA … today or tomorrow or Is California About to Be Destroyed by a Killer Quake?. They made me wonder what earthquake activity has looked like over the past month. I know there are many websites providing this kind of analysis, but I was still interested in what I could find out from the raw data using Oracle BDD.

First, go to the USGS website at https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php and download the All Earthquakes data for the past 30 days. It is a CSV file containing data from Sep 13 to Oct 12, 2016. Save it as all_earthquake_30days.csv.
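If you want to sanity-check the downloaded file before loading it into BDD, a small Python sketch can count the rows and report the time range. It assumes the file has a header row and a `time` column of ISO 8601 timestamps in one fixed format, as the USGS CSV feed uses; `summarize_feed` is a hypothetical helper, not part of BDD.

```python
import csv
import io


def summarize_feed(csv_text):
    """Return (row_count, earliest_time, latest_time) for a USGS-style CSV.

    Assumes a header row with a "time" column holding ISO 8601 timestamps
    in a single fixed format, so plain string comparison orders them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    times = [row["time"] for row in reader]
    if not times:
        return 0, None, None
    return len(times), min(times), max(times)
```

For example, open all_earthquake_30days.csv with `newline=""`, pass its contents to `summarize_feed`, and compare the earliest and latest timestamps with the date range you expected to download.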

Next, go to the BDD login page. I did a fresh install of BDD on our X3 full-rack Big Data Appliance (BDA) a few days ago, so I log in at https://enkbda1node05:7004/bdd/web/home. By default, BDD is installed on the 5th node of the BDA and uses port 7003 for HTTP and port 7004 for HTTPS.

After logging in, click Add Data Set.

Click Browse, select the CSV file you just downloaded from the USGS website, and click Next.

You will see the Preview screen. For columns you don't really care about (in this case, the net column), just uncheck the column header and that column's data will not show up in the analysis. I unchecked the columns net, id, updated, depthError, horizontalError, magError, magNst, type, status and locationSource, then clicked Next. A nice thing about BDD is that it inspects the data and sets the header, delimiter and other settings automatically.
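BDD handles the column selection in the Preview screen, so no code is needed there. If you would rather trim the file before uploading it, here is a hypothetical Python sketch that drops the same ten columns using only the standard `csv` module; `drop_columns` and `DROPPED` are illustrative names, not BDD features.

```python
import csv
import io

# The ten columns unchecked on the BDD Preview screen in this walkthrough
DROPPED = frozenset({
    "net", "id", "updated", "depthError", "horizontalError",
    "magError", "magNst", "type", "status", "locationSource",
})


def drop_columns(csv_text, dropped=DROPPED):
    """Return the CSV text with the given columns removed (header row required)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in dropped]
    out = io.StringIO()
    # extrasaction="ignore" makes the writer skip the dropped keys in each row dict
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Values containing commas, such as the place names, are re-quoted by the `csv` writer, so the trimmed file stays valid CSV.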

In the Data Set screen, enter the data set name, description, and Hive table name, then click Create.

After the data is loaded and indexed, the Explore screen appears. It shows 9100 records in the data set and 18 indexed attributes (columns). Click Add to Project at the top right to create a new project for the data set.

Enter the project name and description, then click Add.

You can add one or more attributes to the scratchpad. For example, I added the mag column to the scratchpad.
Once the attribute is on the scratchpad, BDD shows more detail about it.

Next, click Transform, then Convert. Highlight the latitude column, then click To Geocode.

Select the latitude column for Latitude and the longitude column for Longitude. Name the new attribute location, click Add to Script, and then click Commit to Project.

You will see that the new attribute location has been added to the data panel and committed to the project. When finished, click Discover at the top.

Drag the Map component from the right-hand panel onto the main panel. A nice map showing the earthquake locations appears automatically.

I want to know what happened in California. Click the Search button and enter LA.

A nice view of earthquake activity shows up on the screen.

OK, next I want to filter on the mag column to see only the slightly bigger quakes. I chose mag between 1.8 and 5.2.

You can see that the majority of the quakes are smaller than 1.8: only 832 out of 5070 results are selected.
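The same magnitude filter can be expressed outside BDD as a short Python sketch over rows read with `csv.DictReader`. It assumes inclusive bounds (the exact boundary handling of BDD's range filter is not stated here) and skips rows with a blank mag value; `filter_by_magnitude` is a hypothetical helper.

```python
def filter_by_magnitude(rows, low=1.8, high=5.2):
    """Keep rows whose mag value lies in [low, high]; inclusive bounds are an assumption.

    rows: an iterable of dicts, such as csv.DictReader produces.
    Rows with a missing or blank mag value are skipped.
    """
    selected = []
    for row in rows:
        mag = (row.get("mag") or "").strip()
        if not mag:
            continue
        if low <= float(mag) <= high:
            selected.append(row)
    return selected
```

Run over every row of the downloaded CSV, this covers the whole file rather than only the LA search results, so its count is not expected to match the 832 shown in BDD.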

It seems BDD has an issue recognizing my transformed location column as a geocode column. So I modified the CSV file, added a new column called geocode in the form "latitude longitude", and created a new data set. This way, BDD recognized the new column as geocode-enabled. I added this attribute to the scratchpad.
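The CSV change described above can be sketched in Python as a function that appends a `geocode` field holding the latitude and longitude separated by a single space. The space-separated format follows the "latitude longitude" rule mentioned above and is an assumption about what BDD accepts; `add_geocode_column` is a hypothetical name.

```python
import csv
import io


def add_geocode_column(csv_text, name="geocode"):
    """Append a column holding "latitude longitude" (single space) to each row.

    Assumes the header contains "latitude" and "longitude" columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames) + [name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Space-separated pair, e.g. "38.05 -118.92"
        row[name] = f"{row['latitude']} {row['longitude']}"
        writer.writerow(row)
    return out.getvalue()
```

Write the returned text to a new file and upload it as a separate data set, as described above.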

Go to the Discover page and zoom this thematic map in to the US.

Zoom in further on California.

The map shows that central California (Mono County) has the most earthquake activity with 732 records, followed by two Southern California counties, Riverside County (569) and Imperial County (428). The data covers only 30 days, so it cannot tell whether this level of activity is normal; that would require comparing it with a much longer history. I will leave earthquake forecasting to the experts and won't comment on the earthquakes themselves. Anyway, this blog just demonstrates how easily we can do data discovery within a few minutes using Oracle BDD. It looks like an impressive tool, and I believe it has strong potential in the big data world.
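BDD derives the county breakdown from the geocode attribute on the map. A much coarser tally can be sketched in Python from the USGS `place` column, assuming its values end with a region after the last comma (for example "..., CA"). This is a swapped-in, text-based approximation rather than how BDD computes its counts, and it cannot reproduce county-level results; `count_by_region` is hypothetical.

```python
from collections import Counter


def count_by_region(rows):
    """Tally rows by the text after the last comma in their "place" value.

    Assumes place strings like "10km NW of Bridgeport, CA"; a value with
    no comma is counted under the whole (stripped) string.
    """
    counts = Counter()
    for row in rows:
        place = row.get("place") or ""
        region = place.rsplit(",", 1)[-1].strip()
        counts[region] += 1
    return counts
```

Calling `most_common(3)` on the result lists the three busiest regions, which at best gives a state-level view of the same question the map answers per county.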