Oracle Big Data Discovery (BDD) is a nice visual analytics tool with powerful capabilities that can turn raw data into business insight within minutes. No knowledge of Hadoop or big data is required. In this blog, I am going to show how to create a simple BDD project and do some interesting analysis within 5 minutes.
Recently there have been some blogs and articles about a potential strong earthquake in California, such as "7.0+ earthquake could shake LA … today or tomorrow" and "Is California About to Be Destroyed by a Killer Quake?". They made me wonder what the earthquake activity looked like over the past month. I know there are a lot of websites providing this kind of analysis, but I am still interested in what I can find out from the raw data by using Oracle BDD.
First, go to the USGS website at https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php and download the "All earthquakes" data for the past 30 days. It is a CSV file; my copy contains data from Sep 13 to Oct 12, 2016. Save it as all_earthquake_30days.csv.
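If you prefer to script the download and sanity-check the file before loading it into BDD, something like the following Python sketch could work. The direct feed URL in FEED_URL is my assumption for the "All earthquakes, past 30 days" link on that page, so verify it against the USGS site. The helper names download_feed and summarize are made up for this example, and the summary relies on the time column holding ISO 8601 timestamps, which sort correctly as plain strings.

```python
import csv
import io
import urllib.request

# Assumption: direct link behind "All earthquakes, past 30 days" on the USGS page.
FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"


def download_feed(path="all_earthquake_30days.csv", url=FEED_URL):
    """Save the CSV feed to a local file and return the file path."""
    urllib.request.urlretrieve(url, path)
    return path


def summarize(csv_text):
    """Return (record count, earliest time, latest time) for CSV text.

    Expects a header row containing a 'time' column with ISO 8601 values.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0, None, None
    times = sorted(row["time"] for row in rows)
    return len(rows), times[0], times[-1]
```

Calling download_feed() and then passing the file contents to summarize() gives a quick check of the record count and the date range covered by the file.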
Next, go to the BDD login page. I did a fresh install of BDD on our X3 full rack Big Data Appliance (BDA) a few days ago, so I log in at https://enkbda1node05:7004/bdd/web/home. By default, BDD is installed on the 5th node of the BDA and uses port 7003 for HTTP and port 7004 for HTTPS.
You will see the Preview screen. For columns you don't really care about (for example, the net column), just uncheck the column header and the column data will not show up in the analysis. I unchecked the columns net, id, updated, depthError, horizontalError, magError, magNst, type, status and locationSource, then clicked Next. The nice thing about BDD is that it detects the data and sets the header information, delimiter and other settings automatically.
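Unchecking columns in the Preview screen is all BDD needs, but the same trimming can be done on the CSV file beforehand. Here is a minimal Python sketch of that idea. It is not something BDD requires, and the function name drop_columns is just for illustration. The column names are the ones I unchecked above.

```python
import csv
import io

# Columns unchecked in the BDD Preview screen.
DROPPED = {
    "net", "id", "updated", "depthError", "horizontalError",
    "magError", "magNst", "type", "status", "locationSource",
}


def drop_columns(csv_text, dropped=DROPPED):
    """Return CSV text with the given columns removed, keeping column order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in (reader.fieldnames or []) if name not in dropped]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns we are dropping.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```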
In the Data Set screen, input the data set name, description, and Hive table name. Then click Create.
After the data is loaded and indexed, the Explore screen shows up. You can see there are 9100 records in the dataset and 18 attributes (or columns) are indexed. Click Add to Project at the top right to create a new project for the data set.
You will see that the new attribute location has been added to the data panel and committed to the project. When it finishes, click Discover at the top.
It seems BDD has an issue recognizing my transformed location column as a geocode column. So I modified the CSV file, added a new column called geocode using a rule like Latitude Longitude, and created a new dataset. This way, BDD can recognize the new column as geocode-enabled. I added this attribute to the scratchpad.
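The CSV change can be scripted. Below is a small Python sketch that appends a geocode column built as the latitude, a space, then the longitude, following the rule above. It assumes the file has latitude and longitude columns, as the USGS feed does. The function name add_geocode is my own, and whether BDD accepts other separators is not something I tested.

```python
import csv
import io


def add_geocode(csv_text):
    """Return CSV text with an extra 'geocode' column: '<latitude> <longitude>'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["geocode"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Space-separated pair, matching the "Latitude Longitude" rule.
        row["geocode"] = f'{row["latitude"]} {row["longitude"]}'
        writer.writerow(row)
    return out.getvalue()
```

Run it on the contents of all_earthquake_30days.csv, write the result to a new file, and create the new dataset from that file.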
The graph tells me that the central California area (Mono County) has the most earthquake activity with 732 records, followed by two Southern California counties, Riverside County (569) and Imperial County (428). The data covers only 30 days and definitely cannot tell whether this is normal or not. It would need to be compared with a long history of past data. I will leave earthquake forecasting to the experts and will not comment further on the earthquakes themselves. Anyway, this blog just demonstrates how easily we can do data discovery within a few minutes using Oracle BDD. It looks like an impressive tool, and I believe it has strong potential in the big data world.