Archive
Adventure with Postgresql on Ubuntu
Setting postgresql to not start during startup:
To control whether PostgreSQL starts at machine startup, edit start.conf in the /etc/postgresql/9.2/main/ folder and set it to one of auto | manual | disabled:
$ sudo vi /etc/postgresql/9.2/main/start.conf
# Automatic startup configuration
# auto: automatically start/stop the cluster in the init script
# manual: do not start/stop in init scripts, but allow manual startup with
# pg_ctlcluster
# disabled: do not allow manual startup with pg_ctlcluster (this can be easily
# circumvented and is only meant to be a small protection for
# accidents).
auto
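To check which mode is currently active without opening an editor, the file can be read from a script. This is a minimal sketch, assuming the file holds a single uncommented word as in the listing above; the function name start_mode is made up for illustration and is not part of PostgreSQL.

```shell
# start_mode FILE: print the active startup mode (auto, manual or disabled)
# from a start.conf-style file. Hypothetical helper for illustration only.
start_mode() {
  # Skip blank lines and lines whose first word starts with '#',
  # then print the first word of the first remaining line.
  awk 'NF > 0 && $1 !~ /^#/ { print $1; exit }' "$1"
}
```

For example, start_mode /etc/postgresql/9.2/main/start.conf would print the mode set in the file edited above.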
Starting or shutting down the postgresql database server:
The PostgreSQL database server can be started as a service with the service tool, i.e. "sudo service postgresql start|stop|*", or directly with the pg_ctl utility, as below:
$ sudo service postgresql start
$ /usr/lib/postgresql/9.2/bin/pg_ctl
If PostgreSQL is started through pg_ctl with a specific data directory, the two status commands report different results, as below:
$ sudo service postgresql status
9.2/main (port 5432): down
$ /usr/lib/postgresql/9.2/bin/pg_ctl status -D /home/hadoopuser/hadoopuser_DATA/pgdb -l /home/hadoopuser/hadoopuser_DATA/logs/pg.log -w
pg_ctl: server is running (PID: 3381)
/usr/lib/postgresql/9.2/bin/postgres "-D" "/home/hadoopuser/hadoopuser_DATA/pgdb"
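When scripting around pg_ctl, the PID can be pulled out of the status line. The following is only a sketch, assuming the status line has exactly the form shown above; pid_from_status is a hypothetical helper name.

```shell
# pid_from_status: read `pg_ctl status` output on stdin and print the PID
# from a "server is running (PID: N)" line; print nothing otherwise.
pid_from_status() {
  sed -n 's/^pg_ctl: server is running (PID: \([0-9][0-9]*\)).*$/\1/p'
}
```

It would be used as: /usr/lib/postgresql/9.2/bin/pg_ctl status -D /home/hadoopuser/hadoopuser_DATA/pgdb | pid_from_status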
Handling postgresql error "could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied"
Sometimes starting the PostgreSQL server returns the following error:
WARNING: could not create listen socket for "*"
FATAL: could not create any TCP/IP sockets
FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied
The problem is that on Ubuntu the /var/run/postgresql directory is owned by the postgres user, so when hadoopuser starts the server it cannot create the lock file there and sees the error shown in the log above. Check the ownership and open up the permissions on the directory as below:
hadoopuser@hadoopserver:~$ ls -l /var/run/
drwxrwsr-x 2 postgres postgres 40 May 28 22:16 postgresql
hadoopuser@hadoopserver:~$ sudo chmod 777 /var/run/postgresql
hadoopuser@hadoopserver:/usr/local/hadoopuser/current$ ls -l /var/run/
drwxrwsrwx 2 postgres postgres 80 May 28 22:32 postgresql
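Before changing permissions, it can help to confirm that the directory really is the problem. This is a minimal sketch for a POSIX shell; can_create_lock is a hypothetical helper that only tests whether the current user may create files in the socket directory.

```shell
# can_create_lock DIR: succeed if DIR exists and the current user can
# create files in it, which the server needs for its .lock file.
can_create_lock() {
  [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

if can_create_lock /var/run/postgresql; then
  echo "socket directory is writable"
else
  echo "socket directory is not writable by $(id -un)"
fi
```

Keep in mind that mode 777 lets every local user write to the directory. A narrower alternative, not tested here, would be to point the server at a directory that hadoopuser owns through the unix_socket_directory setting in 9.2.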
Handling postgresql error "Is another postmaster already running on port 5432"
Sometimes starting the PostgreSQL server returns the following error:
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: could not bind IPv6 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "*"
FATAL: could not create any TCP/IP sockets
To solve this problem, first check whether PostgreSQL is already running, as below:
hadoopuser@hadoopserver:~$ ps auxwww | grep postg
hadoopuser 1805 0.0 0.1 137520 10212 pts/0 S 22:25 0:00 /usr/lib/postgresql/9.2/bin/postgres -D /home/hadoopuser/hadoopuser_DATA/pgdb
hadoopuser 1807 0.0 0.0 137520 1492 ? Ss 22:25 0:00 postgres: checkpointer process
hadoopuser 1808 0.0 0.0 137520 1732 ? Ss 22:25 0:00 postgres: writer process
hadoopuser 1809 0.0 0.0 137520 1492 ? Ss 22:25 0:00 postgres: wal writer process
hadoopuser 1810 0.0 0.0 138300 2776 ? Ss 22:25 0:00 postgres: autovacuum launcher process
hadoopuser 1811 0.0 0.0 97208 1572 ? Ss 22:25 0:00 postgres: stats collector process
hadoopuser 2165 0.0 0.0 8104 924 pts/0 S+ 22:32 0:00 grep --color=auto postg
Stop the postgresql service as below:
hadoopuser@hadoopserver:~$ sudo service postgresql stop
* Stopping PostgreSQL 9.2 database server [ OK ]
Note: The above command is not guaranteed to stop the server if it was started under another user's context and is actively running with an open database.
Alternatively, kill the postgres processes as below:
hadoopuser@hadoopserver:~$ killall postgres
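Since the service command may miss a server started by another user, a more targeted option is to look up the PID that the server records in its data directory. This is a minimal sketch, assuming the standard postmaster.pid file whose first line is the server PID; postmaster_pid is a hypothetical helper, and the data directory is the one from the ps output above.

```shell
# postmaster_pid DATADIR: print the PID recorded on the first line of
# DATADIR/postmaster.pid; fail if the file does not exist.
postmaster_pid() {
  [ -f "$1/postmaster.pid" ] || return 1
  head -n 1 "$1/postmaster.pid"
}

datadir=/home/hadoopuser/hadoopuser_DATA/pgdb
if pid=$(postmaster_pid "$datadir"); then
  echo "server PID for $datadir: $pid"
else
  echo "no postmaster.pid in $datadir"
fi
```

Once the data directory is confirmed, running pg_ctl stop with the same -D value is a gentler way to shut the server down than killing it.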
Adding SSH access to additional users in an Amazon EC2 instance
Step 1: First, find the location of the SSH authorized_keys file for the main user, i.e. ubuntu, as below:
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ ls -la /home/ubuntu/.ssh
total 12
drwx------ 2 ubuntu ubuntu 4096 May 21 19:40 .
drwxr-xr-x 6 ubuntu ubuntu 4096 May 22 01:11 ..
-rw------- 1 ubuntu ubuntu 394 May 21 19:40 authorized_keys
Step 2: Once you have found it, follow the steps below to copy that authorized_keys file to the same location in the new user's home folder. In the example below the new user account is amazonuser, while the main user account is ubuntu:
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo mkdir /home/amazonuser/.ssh
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo chmod 700 /home/amazonuser/.ssh
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo chown amazonuser:amazonuser /home/amazonuser/.ssh
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo cp /home/ubuntu/.ssh/authorized_keys .
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ ls -l
total 4
-rw------- 1 root root 394 May 22 01:21 authorized_keys
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo cp authorized_keys .ssh/
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ ls -l
total 4
-rw------- 1 root root 394 May 22 01:21 authorized_keys
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo chmod 600 .ssh/authorized_keys
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo ls -la .ssh/
total 12
drwx------ 2 amazonuser amazonuser 4096 May 22 01:21 .
drwxr-xr-x 3 amazonuser amazonuser 4096 May 22 01:21 ..
-rw------- 1 root root 394 May 22 01:21 authorized_keys
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo chown amazonuser:amazonuser .ssh/authorized_keys
ubuntu@ip-xx-xx-xx-xx:/home/amazonuser$ sudo ls -la .ssh/
total 12
drwx------ 2 amazonuser amazonuser 4096 May 22 01:21 .
drwxr-xr-x 3 amazonuser amazonuser 4096 May 22 01:21 ..
-rw------- 1 amazonuser amazonuser 394 May 22 01:21 authorized_keys
That's all you need. Now try to SSH into the Amazon instance as the new user amazonuser.
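The copy steps above can be condensed into one function. This is a sketch only: copy_authorized_keys is a hypothetical name, and it deliberately leaves ownership alone because chown needs root.

```shell
# copy_authorized_keys SRC_HOME DEST_HOME: copy SRC_HOME/.ssh/authorized_keys
# into DEST_HOME/.ssh using the same modes as the manual steps (700 and 600).
# Ownership is left to the caller because chown requires root.
copy_authorized_keys() {
  src=$1/.ssh/authorized_keys
  dest_dir=$2/.ssh
  [ -f "$src" ] || return 1
  mkdir -p "$dest_dir" &&
    chmod 700 "$dest_dir" &&
    cp "$src" "$dest_dir/authorized_keys" &&
    chmod 600 "$dest_dir/authorized_keys"
}
```

Run from a root shell as copy_authorized_keys /home/ubuntu /home/amazonuser, it would still need to be followed by chown -R amazonuser:amazonuser /home/amazonuser/.ssh, as in the transcript above.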
Upgrading Pycrypto using pip in Ubuntu
Here are the steps to upgrade the pycrypto library on an Ubuntu machine:
Step 1: Check the pycrypto version
ubuntu@ip-***:~$ pip show pycrypto
---
Name: pycrypto
Version: 2.4.1
Location: /usr/local/lib/python2.7/dist-packages
Requires:
Note: If you don't have pip working, try installing it:
$ sudo apt-get install python-dev
$ easy_install pip
Step 2: Upgrade pycrypto using pip
ubuntu@ip-10-254-71-179:~$ pip install --upgrade pycrypto
Downloading/unpacking pycrypto from https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.tar.gz#md5=88dad0a270d1fe83a39e0467a66a22bb
Downloading pycrypto-2.6.tar.gz (443kB): 443kB downloaded
Running setup.py egg_info for package pycrypto
Installing collected packages: pycrypto
Found existing installation: pycrypto 2.4.1
Uninstalling pycrypto:
…..
Successfully installed pycrypto
Cleaning up...
Step 3: Verify the upgrade
ubuntu@ip-10-254-71-179:~$ pip show pycrypto
---
Name: pycrypto
Version: 2.6
Location: /usr/local/lib/python2.7/dist-packages
Requires:
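To compare the version before and after the upgrade in a script, the Version field can be extracted from the pip show output. This is a minimal sketch, assuming the "Name: value" layout shown above; pip_show_version is a hypothetical helper.

```shell
# pip_show_version: read `pip show` output on stdin and print the value
# of the "Version:" field.
pip_show_version() {
  awk -F': ' '$1 == "Version" { print $2 }'
}
```

It would be used as: pip show pycrypto | pip_show_version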
Amazon EC2 Security Group (Firewall) settings for Hadoop Cluster
When setting up a Hadoop cluster in Amazon EC2, you need to configure the security group (firewall) settings so that you can access the Hadoop cluster directly. The following are the settings for the Cloudera CDH4 Hadoop distribution on EC2:
Port 22 for SSH, ports 7180/82 for CDH Manager, 7432 for PSQL, 8888 for Hue, and finally ports 50000-50100 for the Hadoop JT and HDFS.
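The port list can be written down as a small check, which is handy when reviewing a security group by hand. This is only a sketch: is_cdh_port is a hypothetical helper, and it reads "7180/82" as ports 7180 and 7182, which is an assumption about the shorthand above.

```shell
# is_cdh_port PORT: succeed if PORT is covered by the list above.
# Assumption: "7180/82" is read as ports 7180 and 7182.
is_cdh_port() {
  case "$1" in
    22|7180|7182|7432|8888) return 0 ;;
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  # Remaining numeric input is accepted only inside the 50000-50100 range.
  [ "$1" -ge 50000 ] && [ "$1" -le 50100 ]
}

for port in 22 8080 50070; do
  if is_cdh_port "$port"; then
    echo "$port: covered by the rules"
  else
    echo "$port: not covered"
  fi
done
```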
Connecting Amazon EC2 Linux machine over SSH with Secure Key
While creating a Linux instance at Amazon EC2, you can choose the "Create and Download SSH Key" option, which lets you download a PEM file (SSH secure key) to your local machine. You can then use this key to connect to your Linux instance at Amazon.
Once your Linux instance is ready and running at Amazon, you can get the instance URL from the Amazon portal, shown as below:
You can use iTerm/Terminal (Mac) or PuTTY/Bitvise SSH client (Windows) to connect to your Linux instance as below:
1. Change your PEM file permissions so that only you can access the key:
$ chmod 400 Downloads/your_instance_ssh_key.pem
2. Now use the ssh command to connect to your Linux instance, passing the key with the -i parameter:
$ ssh -i Downloads/your_instance_ssh_key.pem root@ec2-NN-NNN-N-NN.us-west-2.compute.amazonaws.com
That's all!
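ssh refuses a private key that other users can access, so it is worth checking the key's mode before connecting. This is a minimal sketch that reads the mode string printed by ls; key_is_private is a hypothetical helper, and the file name is the placeholder used above.

```shell
# key_is_private FILE: succeed if FILE grants no permissions to group or
# others, i.e. characters 5-10 of the `ls -l` mode string are all dashes.
key_is_private() {
  perms=$(ls -ld "$1" 2>/dev/null | cut -c5-10)
  [ "$perms" = "------" ]
}

key=Downloads/your_instance_ssh_key.pem
if key_is_private "$key"; then
  echo "$key is private enough for ssh"
else
  echo "$key is missing or accessible by group/others"
fi
```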
Finding Hadoop specific processes running in a Hadoop Cluster
Recently I was asked to provide info on all Hadoop-specific processes running in a Hadoop cluster. I decided to run a few commands, as below, to provide that info.
Hadoop 2.0.x on Linux (CentOS 6.3) – Single Node Cluster
First, list all Java processes running in the cluster:
[cloudera@localhost usr]$ ps -A | grep java
1768 ? 00:00:28 java
2197 ? 00:00:54 java
2439 ? 00:00:30 java
2507 ? 00:01:19 java
2654 ? 00:00:35 java
2784 ? 00:00:52 java
2911 ? 00:00:56 java
3028 ? 00:00:31 java
3239 ? 00:00:59 java
3344 ? 00:01:11 java
3446 ? 00:00:27 java
3551 ? 00:00:30 java
3644 ? 00:00:22 java
3878 ? 00:01:08 java
4142 ? 00:02:16 java
4201 ? 00:00:36 java
4223 ? 00:00:25 java
4259 ? 00:00:21 java
4364 ? 00:00:29 java
4497 ? 00:11:11 java
4561 ? 00:00:44 java
Next, dig into each Java process to see which Hadoop-specific application is running within it:
[cloudera@localhost usr]$ ps -aef | grep java
499 1768 1 0 08:29 ? 00:00:29 /usr/java/jdk1.6.0_31/bin/java -Dzookeeper.datadir.autocreate=false -Dzookeeper.log.dir=/var/log/zookeeper -********
yarn 2197 1 0 08:29 ? 00:00:55 /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn ********
sqoop2 2439 1 0 08:29 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/sqoop2/sqoop-server/conf/logging.properties -Dsqoop.config.dir=/etc/sqoop2/conf ****************
yarn 2507 1 0 08:29 ? 00:01:21 /usr/java/jdk1.6.0_31/bin/java -Dproc_nodemanager -Xmx1000m -server -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn **********
mapred 2654 1 0 08:30 ? 00:00:36 /usr/java/jdk1.6.0_31/bin/java -Dproc_historyserver -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-mapreduce -Dhadoop.log.file=yarn-mapred-historyserver-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********
hdfs 2784 1 0 08:30 ? 00:00:53 /usr/java/jdk1.6.0_31/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-localhost.localdomain.log ********
hdfs 2911 1 0 08:30 ? 00:00:57 /usr/java/jdk1.6.0_31/bin/java -Dproc_namenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-localhost.localdomain.log *********
hdfs 3028 1 0 08:30 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Dproc_secondarynamenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********
hbase 3239 1 0 08:31 ? 00:01:00 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log *******
hbase 3344 1 0 08:31 ? 00:01:13 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC ****
hbase 3446 1 0 08:31 ? 00:00:28 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-rest-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/*******
hbase 3551 1 0 08:31 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-thrift-localhost.localdomain.log *******
flume 3644 1 0 08:31 ? 00:00:23 /usr/java/jdk1.6.0_31/bin/java -Xmx20m -cp /etc/flume-ng/conf:/usr/lib/flume-ng/lib/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar *******
root 3865 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java -- -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log ********
mapred 3878 3865 0 08:31 ? 00:01:09 java -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop **********
root 4139 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java -- -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ************
mapred 4142 4139 1 08:31 ? 00:02:19 java -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ***************
httpfs 4201 1 0 08:31 ? 00:00:37 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/hadoop-httpfs/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager ******
hive 4223 1 0 08:31 ? 00:00:26 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *********
hive 4259 1 0 08:31 ? 00:00:22 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-server.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *****
hue 4364 4349 0 08:31 ? 00:00:30 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dlog4j.configuration=log4j.properties -Dhadoop.log.dir=//usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log *******
oozie 4497 1 6 08:31 ? 00:11:27 /usr/bin/java -Djava.util.logging.config.file=/usr/lib/oozie/oozie-server-0.20/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx1024m -Doozie.https.port=11443 *********
sqoop 4561 1 0 08:31 ? 00:00:45 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop *******
cloudera 15657 8150 0 11:26 pts/4 00:00:00 grep java
Note: The above output is trimmed, as each process prints its full classpath along with other process-specific details.
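Several of the processes above identify themselves with a -Dproc_<name> flag, which makes a short summary easy to produce. This is a minimal sketch; hadoop_daemons is a hypothetical helper, and it only sees processes that carry such a flag.

```shell
# hadoop_daemons: read `ps -ef` output on stdin and print the unique
# daemon names taken from -Dproc_<name> flags, one per line, sorted.
hadoop_daemons() {
  grep -o -e '-Dproc_[a-z]*' | sed 's/^-Dproc_//' | sort -u
}
```

It would be used as: ps -ef | hadoop_daemons. Processes that show no such flag in the listing above, such as the HBase and Hive ones, are not reported and still have to be identified from their log or config paths.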
HDInsight On Windows – Single Node Cluster
| Name | Status | Startup Type | Log On As |
| Apache Hadoop datanode | Running | Automatic | .\hadoop |
| Apache Hadoop historyserver | Running | Automatic | .\hadoop |
| Apache Hadoop isotopejs | Running | Automatic | .\hadoop |
| Apache Hadoop jobtracker | Running | Automatic | .\hadoop |
| Apache Hadoop namenode | Running | Automatic | .\hadoop |
| Apache Hadoop secondarynamenode | Running | Automatic | .\hadoop |
| Apache Hadoop tasktracker | Running | Automatic | .\hadoop |
| Apache Hive Derbyserver | Running | Automatic | .\hadoop |
| Apache Hive hiveserver | Running | Automatic | .\hadoop |
| Apache Hive hwi | Running | Automatic | .\hadoop |
Windows Azure HDInsight on Windows 8 (Single Node Hadoop Cluster) Walkthrough
Windows Azure HDInsight is Microsoft's response to the Big Data movement. Microsoft's end-to-end roadmap for Big Data embraces Apache Hadoop™ by distributing enterprise-class, Hadoop-based solutions on both Windows Server and Windows Azure.
In this video you will learn details about HDInsight running on Windows 8 as a single node Hadoop cluster. It shows examples of running C# and Java based MapReduce jobs, along with an Apache Pig example.
Microsoft’s roadmap for Big Data: http://www.microsoft.com/bigdata/
Apache Hadoop on Windows Azure: http://hadooponazure.com
Windows Azure HDInsight Introduction and Installation Video
Windows Azure HDInsight is Microsoft's response to the Big Data movement. Microsoft's end-to-end roadmap for Big Data embraces Apache Hadoop™ by distributing enterprise-class, Hadoop-based solutions on both Windows Server and Windows Azure.
In this video you will learn how to install HDInsight on a Windows 8 machine, as a single node Hadoop cluster.
Microsoft’s roadmap for Big Data: http://www.microsoft.com/bigdata/
Apache Hadoop on Windows Azure: http://hadooponazure.com
Contact me: @avkashchauhan
Windows Azure HDInsight – Installation Walkthrough
The CTP version of HDInsight for Windows Server and Windows clients is available to download from here.
When you install HDInsight through WebPI, the following components are installed on your Windows machine:
Once installation is done, you can launch the Hadoop console to verify the installation and check the Hadoop version with the command "hadoop version", as below:
You can also check System > Services to verify that all HDInsight-specific services are running as expected:
Cloud Storage Performance Tests are out and Windows Azure Cloud Storage is #1 in most categories
Windows Azure Cloud Storage is #1 in most categories, as you can see below:
Cloud Storage Delete Speed Report: (Azure Cloud Storage #1)

Cloud Storage Read Speed Report: (Azure Cloud Storage #1)

Cloud Storage Read/Write Error Report: (Azure Cloud Storage #1)
Cloud Storage Response Time/Uptime Report: (Azure Cloud Storage #1 in response time)
Windows Azure Cloud Storage is not #1 in uptime due to SIE.
Cloud Storage Scaling Test Report: (Azure Cloud Storage #2, behind Amazon)
Cloud Storage Write Speed Report: (Azure Cloud Storage #1 for all file sizes)

Read more about Nasuni Cloud Storage Report details here.
Read the full details here.