Build an Oracle Big Data Discovery Project in 5 minutes

Oracle Big Data Discovery (BDD) is a visual analytics tool with powerful capabilities that can turn raw data into business insight within minutes. No knowledge of Hadoop or big data is required. In this post, I am going to show how to create a simple BDD project and do some interesting analysis within 5 minutes.

Recently there have been several blogs and articles about a potential strong earthquake in California, such as "7.0+ earthquake could shake LA … today or tomorrow" and "Is California About to Be Destroyed by a Killer Quake?". They made me wonder what the earthquake activity looked like over the past month. I know many websites provide this kind of analysis, but I am still interested in what I can find out from the raw data by using Oracle BDD.

First, go to the USGS website at https://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php and download the data for all earthquakes in the past 30 days. It is a CSV file containing the data from Sep 13 to Oct 12, 2016. Save it as all_earthquake_30days.csv.
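If you prefer the command line to the browser, the download can be scripted. The following is a minimal sketch, assuming the 30-day "all earthquakes" feed is served at the summary/all_month.csv path shown below (please verify it against the link on the USGS page) and that curl is installed. The function names are my own, not part of any tool.

```shell
#!/usr/bin/env bash
# Assumed URL of the "All Earthquakes, Past 30 Days" CSV feed; verify on the USGS page.
USGS_FEED_URL="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv"

# Download the feed to the given file (defaults to the file name used in this post).
download_quakes() {
    local out="${1:-all_earthquake_30days.csv}"
    curl --fail --silent --show-error --location --output "$out" "$USGS_FEED_URL"
}

# Count data rows: every non-empty line except the header.
# Assumes no field contains an embedded newline.
count_records() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

Running `download_quakes && count_records all_earthquake_30days.csv` gives a row count that can later be compared with the number of records BDD reports after loading the file.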

Next, go to the BDD login page. I did a fresh install of BDD on our X3 full-rack Big Data Appliance (BDA) a few days ago, so I log in at https://enkbda1node05:7004/bdd/web/home. By default, BDD is installed on the 5th node of the BDA and uses port 7003 for HTTP and port 7004 for HTTPS.

After logging in, click Add Data Set.

Click the Browse button, select the CSV file just downloaded from the USGS website, and click Next.

You will see the Preview screen. For columns you don't really care about (for example, the net column), just uncheck the column header and the column data will not show up in the analysis. I unchecked the columns net, id, updated, depthError, horizontalError, magError, magNst, type, status and locationSource, then clicked Next. The nice thing about BDD is that it detects the data and sets the header information, delimiter and other settings automatically.

On the Data Set screen, enter the data set name, description, and Hive table name. Then click Create.

After the data is loaded and indexed, the Explore screen shows up. You can see there are 9100 records in the data set and 18 attributes (or columns) are indexed. Click Add to Project at the top right to create a new project for the data set.

Enter the project name and description, then click Add.

You can add one or more attributes to the scratchpad. For example, I want to add the mag column to the scratchpad.
After it is added, the scratchpad shows more detail about the target attribute.

Next, click Transform, then click Convert. Highlight the latitude column, then click to Geocode.

Select the latitude column for Latitude and the longitude column for Longitude. Name the new attribute location, click Add to Script, then click Commit to Project.

You will see that the new attribute location has been added to the data panel and committed to the project. When finished, click Discover at the top.

Drag the Map component from the right onto the main panel. A nice map showing the earthquake locations appears automatically.

I want to see what happened in CA. Click the Search button and enter the information for LA.

A nice view of the earthquake activity shows up on the screen.

OK, next I want to filter by the mag column to see only the somewhat bigger quakes. I chose mag between 1.8 and 5.2.

You can see that the majority of the quakes are smaller than 1.8: only 832 results out of 5070 are selected.
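The same kind of range filter can be cross-checked against the raw file. Here is a rough sketch with awk, assuming mag is the 5th column of the USGS CSV and that no earlier column contains a comma (the header starts with time,latitude,longitude,depth,mag). The function name count_mag_between is hypothetical, and the counts will not necessarily match the BDD screen, which also had a search applied.

```shell
#!/usr/bin/env bash
# Print "<matching> <total>" for rows whose magnitude lies in [lo, hi].
# Assumes mag is column 5 and that columns 1-4 contain no commas.
count_mag_between() {
    local file="$1" lo="$2" hi="$3"
    awk -F',' -v lo="$lo" -v hi="$hi" '
        NR == 1 || NF == 0 { next }   # skip the header and blank lines
        { total++ }                   # every remaining line is one quake
        $5 != "" && $5 + 0 >= lo + 0 && $5 + 0 <= hi + 0 { hit++ }
        END { print hit + 0, total + 0 }
    ' "$file"
}
```

For example, `count_mag_between all_earthquake_30days.csv 1.8 5.2` prints the number of quakes in that magnitude range followed by the total number of rows. Rows with an empty mag value are counted in the total but never match.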

It seems BDD has an issue recognizing my transformed location column as a geocode column. So I modified the CSV file, added a new column called geocode in the form Latitude Longitude, and created a new data set. This way, BDD recognizes the new column as geocode-enabled. I added this attribute to the scratchpad.
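Adding the geocode column by hand is tedious for thousands of rows, so it can be scripted. This is a minimal sketch, assuming latitude and longitude are columns 2 and 3 of the USGS CSV, that no earlier column contains a comma, and that the value should be the latitude and longitude separated by a single space as described above. The function name add_geocode_column is my own.

```shell
#!/usr/bin/env bash
# Append a "geocode" column holding "<latitude> <longitude>" to a USGS-style CSV.
# Reads CSV on stdin and writes CSV on stdout. Existing fields are left untouched,
# so quoted values later in the row (such as place) pass through unchanged.
add_geocode_column() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 { print $0 ",geocode"; next }   # extend the header
        NF == 0 { next }                        # drop blank lines
        { print $0 "," $2 " " $3 }              # append "lat lon" to the row
    '
}
```

Usage would be something like `add_geocode_column < all_earthquake_30days.csv > all_earthquake_30days_geo.csv`, where the output file name is only an example. The new file is then loaded into BDD as a new data set.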

Go to the Discover page and zoom in on the thematic map to the US area.

Zoom in further to look at California.

The map shows that the central California area (Mono County) has the most earthquake activity with 732 records, followed by two Southern California counties, Riverside County (569) and Imperial County (428). The data covers only 30 days and cannot tell whether this is normal or not; it would need to be compared with a much longer history. I will leave earthquake forecasting to the experts and not comment on the earthquakes themselves. Anyway, this post just demonstrates how easily we can do data discovery within a few minutes using Oracle BDD. It looks like an impressive tool, and I believe it has strong potential in the big data world.


Disabling Firewall after Turning off Firewall

Many applications require the firewall to be disabled on Linux. The most commonly used commands are as follows:

Stop the ipchains service.
# service ipchains stop
Stop the iptables service.
# service iptables stop
Prevent the ipchains service from starting after reboot.
# chkconfig ipchains off
Prevent the iptables service from starting after reboot.
# chkconfig iptables off

Another common step is to set SELINUX=disabled in the /etc/selinux/config file to turn off the extra security restrictions enforced by SELinux.

The above usually works fine for me when turning off the firewall. Recently I ran into a situation that made me add an extra check. A consultant tried to install Oracle Big Data Discovery on a Red Hat Linux VM and connect it to an Oracle Big Data Appliance (BDA) X6-2 Starter Rack. He used an approach similar to the above to turn off the firewall and Linux security between the Red Hat VM and the BDA, but still ran into a weird issue: when the BDD application on the BDA nodes sent a request to a web service on the Red Hat VM, the response never came back.

I tried ping and ssh. Both worked, so there is basic connectivity between the two hosts. It looked like a firewall issue. I checked with the network infrastructure team: there are firewall rules between the two, but they are not enabled yet.
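A successful ping and ssh only prove that ICMP and port 22 get through; they say nothing about the port the web service listens on. A quick way to test one specific TCP port is sketched below. It assumes bash built with /dev/tcp support (the default on Red Hat) and the coreutils timeout command. The function name port_open is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds,
# non-zero if it is refused, filtered, or times out.
port_open() {
    local host="$1" port="$2"
    # Inside the child shell, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Run from a BDA node, something like `port_open bddhost 7003 && echo open || echo blocked` would show whether the HTTP port is reachable even when ping and ssh succeed.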

I noticed the OS is Red Hat Linux 7.1. Could there be some new firewall feature in 7.1? After some investigation, yes, there is. On Red Hat Linux 7, the firewall runs as the firewalld daemon. So let me find out what it is doing.

[root@bddhost ~]# firewall-cmd --zone=public --list-services 
dhcpv6-client ssh

[root@bddhost ~]# firewall-cmd --get-default-zone 
public

[root@bddhost ~]# firewall-cmd --list-all
public (default, active)
  interfaces: eth0 eth2
  sources:
  services: dhcpv6-client ssh
  ports:
  masquerade: no
  forward-ports:
  icmp-blocks:
  rich rules:

The above commands show that the firewall allows only the dhcpv6-client and ssh services. No wonder the HTTP web service is not working.
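A narrower alternative to stopping the firewall altogether would be to open only the port the web service listens on. The following is a minimal sketch of that option, not what was done in this case. It assumes the public zone shown above and must be run as root. The helper names run and open_tcp_port are my own, and setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Run a command, or just print it when DRY_RUN=1 (useful for reviewing first).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Permanently allow one TCP port in a firewalld zone, then reload the rules
# so the permanent configuration becomes the running configuration.
open_tcp_port() {
    local port="$1" zone="${2:-public}"
    run firewall-cmd --permanent --zone="$zone" --add-port="${port}/tcp" &&
    run firewall-cmd --reload
}
```

For example, `DRY_RUN=1 open_tcp_port 7003` prints the two firewall-cmd commands without changing anything, and `open_tcp_port 7003` applies them.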

Ok, let me stop it.

[root@bddhost ~]# systemctl stop firewalld
[root@bddhost ~]# firewall-cmd --list-ports
FirewallD is not running

Now wget works from the BDA to the BDD VM.

[root@uat-bda1node01 ~]# wget http://192.168.2113:7003/endeca-server/ws/config?wsdl
--2016-10-03 18:56:29--  http://192.168.2113:7003/endeca-server/ws/config?wsdl
Connecting to 192.168.2113:7003... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2529 (2.5K) [text/xml]
Saving to: “config?wsdl”
100%[============================================>] 2,529       --.-K/s   in 0s
2016-10-03 18:56:29 (456 MB/s) - “config?wsdl” saved [2529/2529]

The above change lasts only until the server is rebooted, because the service is still enabled.

[root@bddhost ~]# systemctl status firewalld
firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled)
   Active: inactive (dead) since Mon 2016-10-03 18:56:22 SGT; 10min ago
 Main PID: 1016 (code=exited, status=0/SUCCESS)

Sep 30 12:52:35 localhost.localdomain systemd[1]: Started firewalld - dynamic fire....
Sep 30 15:13:09 bddhost.example.com firewalld[1016]: 2016-09-30 15:13:09 ERR...
Oct 03 18:56:21 bddhost systemd[1]: Stopping firewalld - dynamic firewall dae.....
Oct 03 18:56:22 bddhost systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.

To make the change permanent, do the following:

[root@bddhost ~]# systemctl disable firewalld
rm '/etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service'
rm '/etc/systemd/system/basic.target.wants/firewalld.service'

[root@bddhost ~]# systemctl status firewalld
firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled)
   Active: inactive (dead)

Sep 30 12:52:33 localhost.localdomain systemd[1]: Starting firewalld - dynamic fir....
Sep 30 12:52:35 localhost.localdomain systemd[1]: Started firewalld - dynamic fire....
Sep 30 15:13:09 bddhost.example.com firewalld[1016]: 2016-09-30 15:13:09 ERR...
Oct 03 18:56:21 bddhost systemd[1]: Stopping firewalld - dynamic firewall dae.....
Oct 03 18:56:22 bddhost systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.
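The two things to verify are whether the daemon is running now and whether it will start at the next boot. A small sketch that reports both on one line, using the systemctl is-active and is-enabled subcommands, could look like this. The function name firewalld_state is hypothetical.

```shell
#!/usr/bin/env bash
# Print the runtime and boot-time state of firewalld on one line,
# e.g. "active=inactive enabled=disabled" once it is stopped and disabled.
firewalld_state() {
    local active enabled
    # Both subcommands exit non-zero for inactive/disabled, hence "|| true".
    active=$(systemctl is-active firewalld 2>/dev/null) || true
    enabled=$(systemctl is-enabled firewalld 2>/dev/null) || true
    echo "active=${active:-unknown} enabled=${enabled:-unknown}"
}
```

Calling `firewalld_state` after the stop and disable steps is a quick check to add to an installation checklist, alongside the iptables and SELinux checks.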

To learn more about this firewalld daemon, please check out this link at https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-using-firewalld-on-centos-7.