PostgreSQL – Tips and Tricks

July 9, 2014


$ psql <dbname> -U <user_name>

After logging in, at the PostgreSQL console:

  • Exit:
    • dbname=# \q
  • List all tables:
    • dbname=# \dt
  • Show information about a specific table:
    • dbname=# \d+ <table_name>

Listing all rows where a column value is null:

  • perspectivedb=# select * from cluster_config_property where key is null;

Deleting all rows where column value is null:

  • perspectivedb=# delete from cluster_config_property where key is null;
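Before running a delete like the one above, it can help to count the matching rows first. The sketch below wraps that check in a small shell function. The function name count_null_rows and its argument order are assumptions made for illustration, and the table and column names are interpolated into the SQL as-is, so pass only trusted values.

```shell
# count_null_rows is a hypothetical helper, not part of PostgreSQL.
# Usage: count_null_rows <db_name> <user_name> <table_name> <column_name>
count_null_rows() {
    db="$1"; user="$2"; table="$3"; column="$4"
    # -t prints rows only (no header or footer), -A turns off aligned output,
    # -c runs a single command and exits.
    psql -U "$user" -d "$db" -t -A \
        -c "SELECT count(*) FROM $table WHERE $column IS NULL;"
}

# Example (run from the regular shell prompt, not inside psql):
# count_null_rows perspectivedb <user_name> cluster_config_property key
```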

Backing up a single table:

  • Run this from the regular shell prompt (not from inside psql).
  • $ pg_dump -t <table_name> <db_name> -U <user_name> > <target_file_name>.sql

Restoring a single table back into the database:
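The restore command itself is missing from this copy of the post. One common way to do it, assuming the backup is a plain-SQL file produced by pg_dump as above, is to feed the file back to psql from the regular shell prompt. The helper name restore_table below is made up for illustration.

```shell
# restore_table is a hypothetical helper; it assumes the dump is plain SQL
# (the default pg_dump output format), not a custom-format archive.
# Usage: restore_table <db_name> <user_name> <dump_file>.sql
restore_table() {
    db="$1"; user="$2"; dump_file="$3"
    if [ ! -r "$dump_file" ]; then
        echo "cannot read dump file: $dump_file" >&2
        return 1
    fi
    # -f reads and executes the SQL commands from the file.
    # ON_ERROR_STOP makes psql stop at the first failing statement.
    psql -U "$user" -d "$db" -v ON_ERROR_STOP=1 -f "$dump_file"
}
```

If the table already exists in the target database, the CREATE TABLE statement in the dump will fail, so drop or rename the existing table first.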


Watch Spark Summit 2014 on UStream

You can check out the Spark Summit 2014 agenda here:

UStream sessions:

Please register at the summit site for more detailed information.

Keywords: Spark Summit, Hadoop, Spark

Accessing Remote Hadoop Server using Hadoop API or Tools from local machine (Example: Hortonworks HDP Sandbox VM)

Sometimes you need to access a Hadoop runtime from a machine where Hadoop services are not running. In this walkthrough you will set up password-less SSH access from your local machine to the Hadoop machine. Once that is in place, you can use the Hadoop API to access the cluster, or run Hadoop commands directly from the local machine by passing the proper Hadoop configuration.

Starting Hortonworks HDP 1.3 and/or 2.1 VM

You can use these instructions on any VM running Hadoop or you can download HDP 1.3 or 2.1 Images from the link below:

Now start your VM and make sure your Hadoop cluster is up and running. Once the VM is up, its IP address and hostname are displayed on the VM screen, typically as shown below:

[Screenshot: VM screen showing the IP address and hostname]

Accessing Hortonworks HDP 1.3 and/or 2.1 from browser:

Using the IP address provided, you can check the Hadoop server status on port 8000, as below:

HDP 1.3 –

HDP 2.1 –

The UI for both HDP 1.3 and HDP 2.1 looks as below:

From your host machine you can also SSH into either VM using user name root and password hadoop, as below:

$ ssh root@

The authenticity of host ' (' can't be established.
RSA key fingerprint is b2:c0:9a:4b:10:b4:0f:c0:a0:da:7c:47:60:84:f5:dc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '' (RSA) to the list of known hosts.
root@'s password: hadoop
Last login: Thu Jun 5 03:55:17 2014

Next we will set up password-less SSH access to these VMs. There are two options:

Option 1: You already have an SSH key and want to reuse it:

In this option we first make sure an RSA key for SSH sessions exists on the local machine, and then use it for password-less SSH access:

  1. In your home folder (/Users/<yourname>), open the folder named .ssh.
  2. Locate your public key file there (under /Users/avkashchauhan/.ssh/ in this example); it contains one long key string.
  3. Also locate the authorized_keys file there (i.e. /Users/avkashchauhan/.ssh/authorized_keys); it contains one or more long key strings.
  4. Check that the key from your public key file also appears in authorized_keys, alongside any other keys.
  5. Copy the key string from the public key file to the clipboard.
  6. SSH to your HDP machine as in the previous section, using the user name and password.
  7. Go to the /root/.ssh folder.
  8. Open the authorized_keys file there in an editor and append the key you copied in step 5.
  9. Save the authorized_keys file.
  10. In the same folder on the VM, find the VM's own public key file and copy its contents to the clipboard.
  11. Exit the HDP VM.
  12. On your host machine, append the key from the HDP VM to the authorized_keys file you checked in step 3, and save it.
  13. Now try logging into the HDP VM as below:

ssh root@

Last login: Thu Jun 5 06:35:31 2014 from

Note: No password is requested this time, because password-less SSH is working.
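Appending your local public key to the VM's authorized_keys file (steps 5 to 9 above) can also be done with one command from the host. This is a sketch under assumptions: the helper name push_public_key is invented here, and the default path ~/.ssh/id_rsa.pub is the name ssh-keygen normally gives an RSA public key, which may differ on your machine. For logins from the host to the VM, only this direction is required.

```shell
# push_public_key is a hypothetical helper.
# Usage: push_public_key <vm_host_or_ip> [public_key_file]
push_public_key() {
    host="$1"
    pubkey="${2:-$HOME/.ssh/id_rsa.pub}"   # assumed default key name
    if [ ! -r "$pubkey" ]; then
        echo "public key not found: $pubkey" >&2
        return 1
    fi
    # The remote shell creates /root/.ssh if needed and appends the key
    # that arrives on standard input to authorized_keys.
    ssh "root@$host" \
        'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
        < "$pubkey"
}
```

You will be asked for the root password once during this call; later logins should go through without it.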

Option 2: You haven’t created an SSH key on your local machine and will do everything from scratch:

In this option we first create an SSH key and then use it exactly as in Option 1.

  • Log into your host machine and open a terminal.
  • For example, your home folder will be /Users/<username>.
  • Create a folder named .ssh inside that folder.
  • Now go inside the .ssh folder and run the following command:

$ ssh-keygen -C 'SSH Access Key' -t rsa

Enter file in which to save the key (/home/avkashchauhan/.ssh/id_rsa): ENTER

Enter passphrase (empty for no passphrase): ENTER

Enter same passphrase again: ENTER

  • You will see that id_rsa and its matching public key file have been created. Now append the contents of the public key file to the authorized_keys file; if authorized_keys does not exist yet, it will be created. The same command handles both cases:

$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

  • After this step, the contents of the public key file are included in authorized_keys.
  • Now set the proper permissions on the keys and folders as below:

$ chmod 700 $HOME && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*

  • Finally, follow Option 1 to add each machine's key to the other machine's authorized_keys file, so that password-less SSH works.
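The key creation and permission steps above can be scripted without the interactive prompts. The sketch assumes an RSA key with an empty passphrase, as in the walkthrough; the helper name create_ssh_key and its optional directory argument are illustrative assumptions.

```shell
# create_ssh_key is a hypothetical helper.
# Usage: create_ssh_key [ssh_dir]   (defaults to ~/.ssh)
create_ssh_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    if [ -e "$ssh_dir/id_rsa" ]; then
        echo "key already exists, leaving it untouched: $ssh_dir/id_rsa" >&2
    else
        # -N '' sets an empty passphrase, -f names the private key file;
        # ssh-keygen writes the public key next to it with a .pub suffix.
        ssh-keygen -t rsa -C 'SSH Access Key' -N '' -f "$ssh_dir/id_rsa" || return 1
    fi
    pub="$(cat "$ssh_dir/id_rsa.pub")" || return 1
    # Append the public key only if it is not already listed;
    # >> creates authorized_keys when it does not exist yet.
    grep -qxF -e "$pub" "$ssh_dir/authorized_keys" 2>/dev/null ||
        printf '%s\n' "$pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/id_rsa" "$ssh_dir/authorized_keys"
}
```

Because the function checks for an existing key and an existing authorized_keys entry, running it a second time should leave both files unchanged.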


Migrating Hadoop configuration from Remote Machine to local Machine:

To get this working, copy the Hadoop configuration files from the HDP server to the local machine as below:

HDP 1.3:

Create a folder named hdp13 in your working folder, then use the scp command to copy the configuration files over password-less SSH as below:

$ scp -r root@ ~/hdp13

HDP 2.1:

Create a folder named hdp21 in your working folder, then use the scp command to copy the configuration files over password-less SSH as below:

$ scp -r root@ ~/hdp21
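Both copy steps follow the same pattern, so they can be wrapped in one function. The remote configuration directory is not shown in this copy of the post, so the sketch takes it as a required argument rather than guessing it; fetch_hadoop_conf is a made-up name.

```shell
# fetch_hadoop_conf is a hypothetical helper.
# Usage: fetch_hadoop_conf <vm_host_or_ip> <remote_conf_dir> <local_dir>
fetch_hadoop_conf() {
    host="${1:-}"; remote_dir="${2:-}"; local_dir="${3:-}"
    if [ -z "$host" ] || [ -z "$remote_dir" ] || [ -z "$local_dir" ]; then
        echo "usage: fetch_hadoop_conf <host> <remote_conf_dir> <local_dir>" >&2
        return 2
    fi
    mkdir -p "$local_dir" || return 1
    # -r copies the directory recursively; with password-less SSH in place
    # no password prompt appears.
    scp -r "root@$host:$remote_dir" "$local_dir"
}
```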

Adding correct JAVA_HOME to imported Hadoop configuration

Now go to your hdp13 or hdp21 folder and edit the file that sets JAVA_HOME so that it points to your local Java installation, as below:

# The java implementation to use. Required.
# export JAVA_HOME=/usr/jdk/jdk1.6.0_31  
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`

Adding correct HDP Hostname into local machine hosts entries:

Now add the Hortonworks HDP hostnames to your local machine's hosts file. On Mac OS X, edit the /private/etc/hosts file and add the following:

#HDP 2.1
#HDP 1.3 sandbox


Once added, make sure you can ping the hosts by name as below:

$ ping sandbox

PING sandbox ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=0.461 ms

And for HDP 2.1

$ ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=0.420 ms
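Adding the hosts entries by hand works, but a small idempotent helper avoids duplicate lines when you rerun the setup. In this sketch the helper name add_hosts_entry and the sample IP address in the usage line are assumptions; use the IP address shown on your VM screen.

```shell
# add_hosts_entry is a hypothetical helper.
# Usage: add_hosts_entry <hosts_file> <ip_address> <hostname>
add_hosts_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    # Skip the append when a non-comment line already lists the hostname.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit(found ? 0 : 1) }' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example on a scratch copy (the IP address is a placeholder):
# add_hosts_entry ./hosts.copy 192.168.56.101 sandbox
```

Writing to the real hosts file requires root privileges, so trying the function on a copy first is the safer route.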

Access Hadoop Runtime on Remote Machine from Hadoop commands (or API) at Local Machine:

Now, using the Hadoop runtime on the local machine, you can connect to Hadoop on the HDP VM as below:

HDP 1.3

$ ./hadoop --config /Users/avkashchauhan/hdp13/conf.empty fs -ls /
Found 4 items
drwxr-xr-x - hdfs hdfs 0 2013-05-30 10:34 /apps
drwx------ - mapred hdfs 0 2014-06-05 03:54 /mapred
drwxrwxrwx - hdfs hdfs 0 2014-06-05 06:19 /tmp
drwxr-xr-x - hdfs hdfs 0 2013-06-10 14:39 /user

HDP 2.1

$ ./hadoop --config /Users/avkashchauhan/hdp21/conf fs -ls /
Found 6 items
drwxrwxrwx - yarn hadoop 0 2014-04-21 07:21 /app-logs
drwxr-xr-x - hdfs hdfs 0 2014-04-21 07:23 /apps
drwxr-xr-x - mapred hdfs 0 2014-04-21 07:16 /mapred
drwxr-xr-x - hdfs hdfs 0 2014-04-21 07:16 /mr-history
drwxrwxrwx - hdfs hdfs 0 2014-05-23 11:35 /tmp
drwxr-xr-x - hdfs hdfs 0 2014-05-23 11:35 /user

If you are using the Hadoop API, you can pass the configuration file path to the API to get access to the Hadoop runtime.
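To avoid retyping the configuration path on the command line, the two invocations above can be wrapped in a function that takes the configuration directory as its first argument. The name remote_hadoop is an assumption, and the sketch expects a hadoop executable on the PATH rather than the ./hadoop used above.

```shell
# remote_hadoop is a hypothetical wrapper around the local hadoop client.
# Usage: remote_hadoop <conf_dir> <hadoop arguments...>
remote_hadoop() {
    conf_dir="${1:-}"
    if [ ! -d "$conf_dir" ]; then
        echo "configuration directory not found: $conf_dir" >&2
        return 1
    fi
    shift
    # --config points the client at the copied cluster configuration,
    # so the command talks to the remote cluster.
    hadoop --config "$conf_dir" "$@"
}

# Examples, using the folders from the walkthrough:
# remote_hadoop ~/hdp13/conf.empty fs -ls /
# remote_hadoop ~/hdp21/conf fs -ls /
```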


Apache Ambari 1.6.0 support with Blueprints is released

What is Ambari Blueprint?

Ambari Blueprint allows an operator to instantiate a Hadoop cluster quickly—and reuse the blueprint to replicate cluster instances elsewhere, for example, as development and test clusters, staging clusters, performance testing clusters, or co-located clusters.

Release URL:


Ambari Blueprint supports PostgreSQL:

Ambari now extends database support for Ambari DB, Hive and Oozie to include PostgreSQL. This means that Ambari now provides support for the key databases used in enterprises today: PostgreSQL, MySQL and Oracle. The PostgreSQL configuration choice is reflected in this database support matrix.

More Links:


Content Source:

Free ebook: Introducing Microsoft Azure HDInsight

New Free eBook by Microsoft Press:

Microsoft Press is thrilled to share another new free ebook with you: Introducing Microsoft Azure HDInsight, by Avkash Chauhan, Valentine Fontama, Michele Hart, Wee Hyong Tok, and Buck Woody.



Introduction (excerpt)

Microsoft Azure HDInsight is Microsoft’s 100 percent compliant distribution of Apache Hadoop on Microsoft Azure. This means that standard Hadoop concepts and technologies apply, so learning the Hadoop stack helps you learn the HDInsight service. At the time of this writing, HDInsight (version 3.0) uses Hadoop version 2.2 and Hortonworks Data Platform 2.0.

In Introducing Microsoft Azure HDInsight, we cover what big data really means, how you can use it to your advantage in your company or organization, and one of the services you can use to do that quickly—specifically, Microsoft’s HDInsight service. We start with an overview of big data and Hadoop, but we don’t emphasize only concepts in this book—we want you to jump in and get your hands dirty working with HDInsight in a practical way. To help you learn and even implement HDInsight right away, we focus on a specific use case that applies to almost any organization and demonstrate a process that you can follow along with.

We also help you learn more. In the last chapter, we look ahead at the future of HDInsight and give you recommendations for self-learning so that you can dive deeper into important concepts and round out your education on working with big data.

Here are the download links (and below the links you’ll find an ebook excerpt that describes this offering):

Download the PDF (6.37 MB; 130 pages) from

Download the EPUB (8.46 MB) from

Download the MOBI (12.8 MB) from

Download the code samples (6.83 KB) from

Packt celebrates International Day Against DRM, May 6th

International Day Against DRM, May 6th

Packt Publishing firmly believes that you should be able to read and interact with your content when you want, where you want, and how you want – to that end they have been advocates of DRM-free content since their very first eBook was published back in 2004.

To show their continuing support for Day Against DRM, Packt Publishing is offering all its DRM-free content, eBooks and Videos, at $10 for 24 hours only on May 6th.

“Our top priority at Packt has always been to meet the evolving needs of developers in the most practical way possible, while at the same time protecting the hard work of our authors. DRM-free content continues to be instrumental in making that happen, providing the flexibility and freedom that is essential for an efficient and enhanced learning experience. That’s why we’ve been DRM-free from the beginning – we’ll never put limits on the innovation of our users.”

– Dave Maclean, Managing Director

Advocates of Day Against DRM are invited to spread the word and celebrate on May 6th by exploring Packt's full range of DRM-free content, with all eBooks and Videos at $10 for 24 hours.


Buy my book Learning Cloudera Impala from Packt Publication


Categories: Announcement, Big Data

Hadoop 2.4.0 release (helpful links)

April 14, 2014

Kudos to the Hadoop community: the Hadoop 2.4.0 release is now available for everyone to use. Some of the improvements in HDFS, MapReduce, and the overall framework are listed below; the list is not exhaustive:

Hadoop 2.4.0 Highlights:

  • HDFS:
    • Full HTTPS support
    • ACL support in HDFS, which allows easier access to Apache Sentry-managed data by the components that use it
    • Native support for rolling upgrades in HDFS
    • HDFS FSImage using protocol buffers for smoother operational upgrades
  • YARN:
    • ResourceManager HA automatic failover
    • YARN Timeline Server (preview) for storing and serving generic application history

Hadoop 2.4.0 Release Notes:

Hadoop 2.4.0 Source download:

Hadoop 2.4.0 Binary download:

