Previous Definition: Velocity, Variety and Volume
New Definition: Velocity, Variety and Volume + Variability and Complexity
Packt is giving its readers a chance to dive into its comprehensive catalog of over 2,000 books and videos for the next 7 days with the LevelUp program:
Packt is offering all of its eBooks and videos at just $10 each or less.
The more EXP customers want to gain, the more they save.
For more information, please visit: www.packtpub.com/packt/offers/levelup
Here is a list of the top players in the Big Data world, each with direct or indirect influence over billion-dollar (or larger) Big Data projects (in no particular order):
The list is based on each company's direct or indirect involvement in Big Data, whether or not it offers a dedicated Big Data product. All of the companies above are involved in Big Data projects worth a billion dollars or more …
$ psql <dbname> -U <user_name>
After logging in at the PostgreSQL console:
Listing all rows where a column value is NULL:
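For example, a minimal sketch assuming a hypothetical table my_table with a nullable column my_column (substitute your own table and column names):
SELECT * FROM my_table WHERE my_column IS NULL;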
Deleting all rows where a column value is NULL:
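Using the same hypothetical table and column:
DELETE FROM my_table WHERE my_column IS NULL;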
Backing up a single table:
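From the shell (not the psql console), pg_dump with the -t flag dumps a single table; the placeholders follow the psql command above:
$ pg_dump <dbname> -U <user_name> -t <table_name> > table_backup.sql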
Restoring a single table back into the database:
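Assuming the table_backup.sql file produced by the pg_dump sketch above:
$ psql <dbname> -U <user_name> < table_backup.sql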
You can check out the Spark Summit 2014 agenda here: http://spark-summit.org/2014/agenda
Please register at the summit site to get more detailed information.
Sometimes you may need to access the Hadoop runtime from a machine where Hadoop services are not running. In this process you will create password-less SSH access to the Hadoop machine from your local machine; once that is ready, you can use the Hadoop API to access the Hadoop cluster, or run Hadoop commands directly from the local machine by passing the proper Hadoop configuration.
You can use these instructions on any VM running Hadoop, or you can download HDP 1.3 or 2.1 images from the link below:
Now start your VM and make sure your Hadoop cluster is up and running. Once your VM is up and running, you will see its IP address and hostname on the VM screen, which is mostly 192.168.21.xxx, as shown below:
Using the IP address provided, you can check the Hadoop server status on port 8000, as below:
HDP 1.3 – http://192.168.21.187:8000/about/
HDP 2.1 – http://192.168.21.186:8000/about/
The UI for both HDP 1.3 and HDP 2.1 looks as below:
Now, from your host machine, you can also try to SSH to either machine using the user name root and the password hadoop, as below:
$ ssh root@192.168.21.187
The authenticity of host '192.168.21.187 (192.168.21.187)' can't be established.
RSA key fingerprint is b2:c0:9a:4b:10:b4:0f:c0:a0:da:7c:47:60:84:f5:dc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.21.187' (RSA) to the list of known hosts.
root@192.168.21.187's password: hadoop
Last login: Thu Jun 5 03:55:17 2014
Now we will add password-less SSH access to these VMs; there are two options:
Option #1: In this option, we will first make sure we have an RSA key for SSH on our local machine and then use it for password-less SSH access, as sketched below:
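The commands below are a minimal sketch, assuming your local public key already exists at ~/.ssh/id_rsa.pub (if it does not, generate one as in Option #2); they append the key to the VM's authorized_keys file and then log in:
$ cat ~/.ssh/id_rsa.pub | ssh root@192.168.21.187 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
$ ssh root@192.168.21.187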
Last login: Thu Jun 5 06:35:31 2014 from 192.168.21.1
Note: You will see that no password is needed this time, as password-less SSH is working.
Option #2: In this option, we will first create an SSH key and then use it exactly as in Option #1.
$ ssh-keygen -C 'SSH Access Key' -t rsa
Enter file in which to save the key (/home/avkashchauhan/.ssh/id_rsa): ENTER
Enter passphrase (empty for no passphrase): ENTER
Enter same passphrase again: ENTER
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 $HOME && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*
Adding the correct JAVA_HOME path
To get this working, we have to copy the Hadoop configuration files from the HDP server to the local machine, as below:
For HDP 1.3, create a folder named hdp13 in your working folder, then use the scp command to copy the configuration files over password-less SSH:
$ scp -r root@192.168.21.187:/etc/hadoop/conf.empty/ ~/hdp13
For HDP 2.1, create a folder named hdp21 in your working folder, then use the scp command to copy the configuration files over password-less SSH:
$ scp -r root@192.168.21.186:/etc/hadoop/conf/ ~/hdp21
Now go to your hdp13 or hdp21 folder and edit the hadoop-env.sh file with the correct JAVA_HOME, as below:
# The java implementation to use. Required.
# export JAVA_HOME=/usr/jdk/jdk1.6.0_31
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
Now you need to add the Hortonworks HDP hostnames to your local machine's hosts file. On Mac OS X, edit the /private/etc/hosts file to add the following:
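192.168.21.187 sandbox
192.168.21.186 sandbox.hortonworks.com
(These entries match the HDP 1.3 and HDP 2.1 IP addresses used above; adjust them to whatever your VMs display on boot.)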
Once added, make sure you can ping the hosts by name, as below. For HDP 1.3:
$ ping sandbox
PING sandbox (192.168.21.187): 56 data bytes
64 bytes from 192.168.21.187: icmp_seq=0 ttl=64 time=0.461 ms
And for HDP 2.1:
$ ping sandbox.hortonworks.com
PING sandbox.hortonworks.com (192.168.21.186): 56 data bytes
64 bytes from 192.168.21.186: icmp_seq=0 ttl=64 time=0.420 ms
Now, using your local machine's Hadoop runtime, you can connect to Hadoop on the HDP VM as below. First, for HDP 1.3:
$ ./hadoop --config /Users/avkashchauhan/hdp13/conf.empty fs -ls /
Found 4 items
drwxr-xr-x - hdfs hdfs 0 2013-05-30 10:34 /apps
drwx------ - mapred hdfs 0 2014-06-05 03:54 /mapred
drwxrwxrwx - hdfs hdfs 0 2014-06-05 06:19 /tmp
drwxr-xr-x - hdfs hdfs 0 2013-06-10 14:39 /user
And for HDP 2.1:
$ ./hadoop --config /Users/avkashchauhan/hdp21/conf fs -ls /
Found 6 items
drwxrwxrwx - yarn hadoop 0 2014-04-21 07:21 /app-logs
drwxr-xr-x - hdfs hdfs 0 2014-04-21 07:23 /apps
drwxr-xr-x - mapred hdfs 0 2014-04-21 07:16 /mapred
drwxr-xr-x - hdfs hdfs 0 2014-04-21 07:16 /mr-history
drwxrwxrwx - hdfs hdfs 0 2014-05-23 11:35 /tmp
drwxr-xr-x - hdfs hdfs 0 2014-05-23 11:35 /user
If you are using the Hadoop API, you can pass the path of the conf folder to the API and get access to the Hadoop runtime, as in the sketch below.
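Here is a minimal Java sketch (the class name is illustrative, and it assumes the Hadoop client libraries are on the classpath) that loads the conf files copied above and lists the HDFS root, equivalent to the hadoop fs -ls / commands:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the configuration copied from the HDP 1.3 VM
        conf.addResource(new Path("/Users/avkashchauhan/hdp13/conf.empty/core-site.xml"));
        conf.addResource(new Path("/Users/avkashchauhan/hdp13/conf.empty/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        // Equivalent of: ./hadoop --config ... fs -ls /
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}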
Ambari Blueprint allows an operator to instantiate a Hadoop cluster quickly—and reuse the blueprint to replicate cluster instances elsewhere, for example, as development and test clusters, staging clusters, performance testing clusters, or co-located clusters.
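To make this concrete, here is a minimal sketch of what a blueprint might look like; the blueprint name, host group layout, and component list are illustrative assumptions, not taken from an actual cluster:
{
  "Blueprints": {
    "blueprint_name": "single-node-hdfs",
    "stack_name": "HDP",
    "stack_version": "2.1"
  },
  "host_groups": [
    {
      "name": "host_group_1",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "SECONDARY_NAMENODE" },
        { "name": "DATANODE" },
        { "name": "HDFS_CLIENT" }
      ]
    }
  ]
}
It would then be registered with the Ambari server through its REST API (assuming Ambari's default admin credentials and port; replace <ambari-host> with your server's hostname):
$ curl -u admin:admin -H 'X-Requested-By: ambari' -X POST -d @blueprint.json http://<ambari-host>:8080/api/v1/blueprints/single-node-hdfs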
Ambari now extends database support for Ambari DB, Hive and Oozie to include PostgreSQL. This means that Ambari now provides support for the key databases used in enterprises today: PostgreSQL, MySQL and Oracle. The PostgreSQL configuration choice is reflected in this database support matrix.