Archive for the ‘Uncategorized’ Category

Adventure with Postgresql on Ubuntu


Preventing PostgreSQL from starting at boot:

To enable or disable PostgreSQL startup at boot, edit start.conf in the /etc/postgresql/9.2/main/ folder and set it to one of auto | manual | disabled:

$ sudo vi /etc/postgresql/9.2/main/start.conf

# Automatic startup configuration
# auto: automatically start/stop the cluster in the init script
# manual: do not start/stop in init scripts, but allow manual startup with
# pg_ctlcluster
# disabled: do not allow manual startup with pg_ctlcluster (this can be easily
# circumvented and is only meant to be a small protection for
# accidents).

auto
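
If you would rather not open an editor, the same change can be scripted. This is only a minimal sketch: the function name set_start_mode is my own, and it assumes the file holds comment lines plus a single mode line, as in the listing above.

```shell
# set_start_mode FILE MODE
# Rewrites the mode line of a start.conf-style file and keeps the comments.
# The function name is hypothetical; FILE and MODE come from the caller.
set_start_mode() {
    ssm_file=$1
    ssm_mode=$2
    case "$ssm_mode" in
        auto|manual|disabled) ;;
        *) echo "invalid mode: $ssm_mode (expected auto, manual or disabled)" >&2
           return 1 ;;
    esac
    [ -f "$ssm_file" ] || return 1
    ssm_tmp=$(mktemp) || return 1
    # Copy everything except an existing bare mode line (comments are kept).
    grep -v -E '^[[:space:]]*(auto|manual|disabled)[[:space:]]*$' "$ssm_file" > "$ssm_tmp" || true
    echo "$ssm_mode" >> "$ssm_tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$ssm_tmp" > "$ssm_file" && rm -f "$ssm_tmp"
}
```

For example, running set_start_mode /etc/postgresql/9.2/main/start.conf manual from a root shell would keep the packaged cluster from starting at boot while still allowing pg_ctlcluster.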

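PostgreSQL database server startup or shutdown:

The PostgreSQL database server can be started as a service with the service tool, i.e. “sudo service postgresql start|stop|*”, or through a wrapper named pg_ctl pointed at a specific data directory, as below:

$ sudo service postgresql start

$ /usr/lib/postgresql/9.2/bin/pg_ctl start -D /home/hadoopuser/hadoopuser_DATA/pgdb -l /home/hadoopuser/hadoopuser_DATA/logs/pg.log -w

If PostgreSQL is started through the wrapper with its own data directory, the two status commands report different results, because the service command only looks at the packaged 9.2/main cluster: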

$ sudo service postgresql status
9.2/main (port 5432): down

$ /usr/lib/postgresql/9.2/bin/pg_ctl status -D /home/hadoopuser/hadoopuser_DATA/pgdb -l /home/hadoopuser/hadoopuser_DATA/logs/pg.log -w 

pg_ctl: server is running (PID: 3381)
/usr/lib/postgresql/9.2/bin/postgres "-D" "/home/hadoopuser/hadoopuser_DATA/pgdb"
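
Since the service command does not know about a server started this way, a script has to ask pg_ctl directly. Here is a rough sketch, assuming the documented behaviour that pg_ctl status exits with 0 when the server is running and 3 when it is not. The PG_CTL variable and the cluster_state function are names I made up for illustration.

```shell
# Path to pg_ctl; override PG_CTL if your installation differs (assumption).
PG_CTL=${PG_CTL:-/usr/lib/postgresql/9.2/bin/pg_ctl}

# cluster_state DATADIR
# Prints "running", "stopped" or "unknown" for the server that uses DATADIR.
cluster_state() {
    if "$PG_CTL" status -D "$1" >/dev/null 2>&1; then
        cs_rc=0
    else
        cs_rc=$?
    fi
    case $cs_rc in
        0) echo running ;;
        3) echo stopped ;;   # exit status 3: no server running in DATADIR
        *) echo unknown ;;   # anything else, e.g. pg_ctl missing or bad DATADIR
    esac
}
```

With the paths from this post, cluster_state /home/hadoopuser/hadoopuser_DATA/pgdb would report the state of the private server regardless of what sudo service postgresql status says.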

Handling PostgreSQL error “could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied”

Sometimes starting postgresql server returns the following error:

WARNING: could not create listen socket for "*"
FATAL: could not create any TCP/IP sockets
FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied

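The problem is that on Ubuntu the packaged PostgreSQL runs as the postgres user, which owns /var/run/postgresql. When hadoopuser starts its own server, it cannot create the lock file in that directory because of the ownership, and the error above appears in the log. The listing below shows the ownership, followed by a quick workaround that opens the directory to all users: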

 

hadoopuser@hadoopserver:~$ ls -l /var/run/
drwxrwsr-x 2 postgres postgres 40 May 28 22:16 postgresql

hadoopuser@hadoopserver:~$ sudo chmod 777 /var/run/postgresql

hadoopuser@hadoopserver:/usr/local/hadoopuser/current$ ls -l /var/run/
drwxrwsrwx 2 postgres postgres 80 May 28 22:32 postgresql
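
chmod 777 makes the directory writable by every user, so here is a possible alternative that leaves its permissions alone: put the private server's socket somewhere hadoopuser can already write. This is only a sketch. pick_socket_dir is a hypothetical helper, /tmp is just an example fallback, and the commented pg_ctl line relies on the postgres -k option (socket directory) passed through pg_ctl -o.

```shell
# pick_socket_dir PREFERRED FALLBACK
# Prints PREFERRED if it is a directory the current user can write to,
# otherwise prints FALLBACK. The function name is hypothetical.
pick_socket_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example (not executed here): start the private server with its socket
# and lock file in the chosen directory.
# sockdir=$(pick_socket_dir /var/run/postgresql /tmp)
# /usr/lib/postgresql/9.2/bin/pg_ctl start \
#     -D /home/hadoopuser/hadoopuser_DATA/pgdb \
#     -l /home/hadoopuser/hadoopuser_DATA/logs/pg.log -o "-k $sockdir"
```

Clients then have to be pointed at the same directory, for example with psql -h /tmp, because they look in the default socket directory otherwise.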

 

Handling PostgreSQL error “Is another postmaster already running on port 5432”

Sometimes starting postgresql server returns the following error:

HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
LOG: could not bind IPv6 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "*"
FATAL: could not create any TCP/IP sockets

To solve this problem, first check whether PostgreSQL is already running:

hadoopuser@hadoopserver:~$ ps auxwww | grep postg
hadoopuser 1805 0.0 0.1 137520 10212 pts/0 S 22:25 0:00 /usr/lib/postgresql/9.2/bin/postgres -D /home/hadoopuser/hadoopuser_DATA/pgdb
hadoopuser 1807 0.0 0.0 137520 1492 ? Ss 22:25 0:00 postgres: checkpointer process
hadoopuser 1808 0.0 0.0 137520 1732 ? Ss 22:25 0:00 postgres: writer process
hadoopuser 1809 0.0 0.0 137520 1492 ? Ss 22:25 0:00 postgres: wal writer process
hadoopuser 1810 0.0 0.0 138300 2776 ? Ss 22:25 0:00 postgres: autovacuum launcher process
hadoopuser 1811 0.0 0.0 97208 1572 ? Ss 22:25 0:00 postgres: stats collector process
hadoopuser 2165 0.0 0.0 8104 924 pts/0 S+ 22:32 0:00 grep --color=auto postg

Stop the postgresql service as below:

hadoopuser@hadoopserver:~$ sudo service postgresql stop
* Stopping PostgreSQL 9.2 database server [ OK ]

Note: The above command is not guaranteed to stop the server if it was started under a different user account (for example through pg_ctl) and is running with an open database.

Or kill the postgres process directly (killall matches the exact process name, which is postgres):

hadoopuser@hadoopserver:~$ killall postgres
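
A gentler option than killall is to find the data directory of the running server and stop it with pg_ctl. The sketch below pulls the PID and data directory out of ps aux output like the listing above. postmaster_pids is a name I made up, and it assumes the main process was started as .../postgres -D <datadir>.

```shell
# postmaster_pids
# Reads `ps aux` output on stdin and prints "PID DATADIR" for every main
# postgres process, i.e. one whose command is .../postgres followed by -D.
# Helper processes ("postgres: writer process" etc.) are skipped.
postmaster_pids() {
    awk '{
        for (i = 11; i < NF; i++)
            if ($i == "-D" && $11 ~ /\/postgres$/) {
                print $2, $(i + 1)
                break
            }
    }'
}

# Example (not executed here):
# ps aux | postmaster_pids
# /usr/lib/postgresql/9.2/bin/pg_ctl stop -D <datadir printed above> -m fast
```

Run pg_ctl stop as the user that owns the server, which is hadoopuser in the listing above.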

 

Categories: Uncategorized

Upgrading Pycrypto using pip in Ubuntu


Here are the steps to upgrade the pycrypto library on an Ubuntu machine:

Step 1: check pycrypto version

ubuntu@ip-***:~$ pip show pycrypto

Name: pycrypto
Version: 2.4.1
Location: /usr/local/lib/python2.7/dist-packages
Requires:

Note: If pip is not working, try installing it first:

$ sudo apt-get install python-dev

$ sudo easy_install pip

 

Step 2: upgrade pycrypto using pip

ubuntu@ip-10-254-71-179:~$ pip install --upgrade pycrypto

Downloading/unpacking pycrypto from https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.tar.gz#md5=88dad0a270d1fe83a39e0467a66a22bb
Downloading pycrypto-2.6.tar.gz (443kB): 443kB downloaded
Running setup.py egg_info for package pycrypto

Installing collected packages: pycrypto
Found existing installation: pycrypto 2.4.1
Uninstalling pycrypto:

…..

Successfully installed pycrypto
Cleaning up…

 

Step 3: Verifying the upgrade

ubuntu@ip-10-254-71-179:~$ pip show pycrypto

Name: pycrypto
Version: 2.6
Location: /usr/local/lib/python2.7/dist-packages
Requires:
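
If you want a script to do this check instead of reading the output by eye, something along these lines could work. It is a sketch under a couple of assumptions: pip show prints a "Version: x.y" line as above, and sort supports -V (GNU coreutils does). Both function names are my own.

```shell
# installed_version
# Reads `pip show <package>` output on stdin and prints the Version field.
installed_version() {
    awk -F': ' '$1 == "Version" { print $2 }'
}

# version_at_least CURRENT REQUIRED
# Succeeds when CURRENT >= REQUIRED under GNU sort's version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (not executed here):
# v=$(pip show pycrypto | installed_version)
# version_at_least "$v" 2.6 && echo "pycrypto is up to date"
```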

 

 

Amazon EC2 Security Group (Firewall) settings for Hadoop Cluster


When setting up a Hadoop cluster on Amazon EC2, you need to configure the security group (firewall) settings so that you can access the cluster directly. The following are the settings for the Cloudera CDH4 Hadoop distribution on EC2:


Port 22 for SSH, ports 7180 and 7182 for Cloudera Manager, 7432 for PostgreSQL, 8888 for Hue, and finally ports 50000-50100 for the Hadoop JobTracker and HDFS.
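
The rules above were set up by hand, but they could also be scripted with the AWS command line tool. The sketch below only prints the commands so you can review them first. The group ID and CIDR are placeholders you supply, print_ingress_commands is a made-up name, and I am reading "7180/82" as ports 7180 and 7182.

```shell
# print_ingress_commands GROUP_ID CIDR
# Prints one `aws ec2 authorize-security-group-ingress` command per port
# (or port range) listed in this post. Nothing is sent to AWS.
print_ingress_commands() {
    for pic_port in 22 7180 7182 7432 8888 50000-50100; do
        echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $pic_port --cidr $2"
    done
}

# Example (placeholder values, not executed here):
# print_ingress_commands sg-example 203.0.113.0/24
```

Restricting the CIDR to your own address range rather than 0.0.0.0/0 keeps the Cloudera Manager and database ports off the open internet.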

 


Finding Hadoop specific processes running in a Hadoop Cluster


Recently I was asked to provide information on all the Hadoop-specific processes running in a Hadoop cluster. I ran the few commands below to collect that information.

Hadoop 2.0.x on Linux (CentOS 6.3) – Single Node Cluster

First, list all the Java processes running on the node:

[cloudera@localhost usr]$ ps -A | grep java
1768 ?        00:00:28 java
2197 ?        00:00:54 java
2439 ?        00:00:30 java
2507 ?        00:01:19 java
2654 ?        00:00:35 java
2784 ?        00:00:52 java
2911 ?        00:00:56 java
3028 ?        00:00:31 java
3239 ?        00:00:59 java
3344 ?        00:01:11 java
3446 ?        00:00:27 java
3551 ?        00:00:30 java
3644 ?        00:00:22 java
3878 ?        00:01:08 java
4142 ?        00:02:16 java
4201 ?        00:00:36 java
4223 ?        00:00:25 java
4259 ?        00:00:21 java
4364 ?        00:00:29 java
4497 ?        00:11:11 java
4561 ?        00:00:44 java

Next, look at the full command line of each Java process to see which Hadoop-specific application is running inside it:

[cloudera@localhost usr]$ ps -aef | grep java

499       1768     1  0 08:29 ?        00:00:29 /usr/java/jdk1.6.0_31/bin/java -Dzookeeper.datadir.autocreate=false -Dzookeeper.log.dir=/var/log/zookeeper -********

yarn 2197 1 0 08:29 ? 00:00:55 /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn ********

sqoop2 2439 1 0 08:29 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/sqoop2/sqoop-server/conf/logging.properties -Dsqoop.config.dir=/etc/sqoop2/conf ****************

yarn 2507 1 0 08:29 ? 00:01:21 /usr/java/jdk1.6.0_31/bin/java -Dproc_nodemanager -Xmx1000m -server -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn **********

mapred 2654 1 0 08:30 ? 00:00:36 /usr/java/jdk1.6.0_31/bin/java -Dproc_historyserver -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-mapreduce -Dhadoop.log.file=yarn-mapred-historyserver-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********

hdfs 2784 1 0 08:30 ? 00:00:53 /usr/java/jdk1.6.0_31/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-localhost.localdomain.log ********

hdfs 2911 1 0 08:30 ? 00:00:57 /usr/java/jdk1.6.0_31/bin/java -Dproc_namenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-localhost.localdomain.log *********

hdfs 3028 1 0 08:30 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Dproc_secondarynamenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********

hbase 3239 1 0 08:31 ? 00:01:00 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log *******

hbase 3344 1 0 08:31 ? 00:01:13 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC ****

hbase 3446 1 0 08:31 ? 00:00:28 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-rest-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/*******

hbase 3551 1 0 08:31 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-thrift-localhost.localdomain.log *******

flume 3644 1 0 08:31 ? 00:00:23 /usr/java/jdk1.6.0_31/bin/java -Xmx20m -cp /etc/flume-ng/conf:/usr/lib/flume-ng/lib/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar *******

root 3865 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java -- -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log ********

mapred 3878 3865 0 08:31 ? 00:01:09 java -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop **********

root 4139 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java -- -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ************

mapred 4142 4139 1 08:31 ? 00:02:19 java -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ***************

httpfs 4201 1 0 08:31 ? 00:00:37 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/hadoop-httpfs/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager ******

hive 4223 1 0 08:31 ? 00:00:26 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *********

hive 4259 1 0 08:31 ? 00:00:22 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-server.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *****

hue 4364 4349 0 08:31 ? 00:00:30 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dlog4j.configuration=log4j.properties -Dhadoop.log.dir=//usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log *******

oozie 4497 1 6 08:31 ? 00:11:27 /usr/bin/java -Djava.util.logging.config.file=/usr/lib/oozie/oozie-server-0.20/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx1024m -Doozie.https.port=11443 *********

sqoop 4561 1 0 08:31 ? 00:00:45 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop *******

cloudera 15657 8150 0 11:26 pts/4 00:00:00 grep java

Note: The above output is trimmed, as each process prints its full classpath along with other process-specific details.
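
Several of the daemons above identify themselves with a -Dproc_<name> flag, so the long listing can be boiled down to a short summary. This is a rough sketch: hadoop_daemons is a made-up name, it expects ps -ef output on stdin, and it only reports processes that carry the flag. ZooKeeper, HBase, Hive and the others that do not carry it are skipped.

```shell
# hadoop_daemons
# Reads `ps -ef` output on stdin and prints "USER PID DAEMON" for each
# java process whose command line contains a -Dproc_<name> flag.
# The `su ... java -- -Dproc_...` wrapper lines are ignored because their
# command (field 8) is su, not java.
hadoop_daemons() {
    awk '$8 ~ /java$/ {
        for (i = 9; i <= NF; i++)
            if ($i ~ /^-Dproc_/) {
                print $1, $2, substr($i, 8)   # strip the "-Dproc_" prefix
                break
            }
    }'
}

# Example (not executed here):
# ps -ef | hadoop_daemons
```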

HDInsight On Windows – Single Node Cluster

Apache Hadoop datanode Running Automatic .\hadoop
Apache Hadoop historyserver Running Automatic .\hadoop
Apache Hadoop isotopejs Running Automatic .\hadoop
Apache Hadoop jobtracker Running Automatic .\hadoop
Apache Hadoop namenode Running Automatic .\hadoop
Apache Hadoop secondarynamenode Running Automatic .\hadoop
Apache Hadoop tasktracker Running Automatic .\hadoop
Apache Hive Derbyserver Running Automatic .\hadoop
Apache Hive hiveserver Running Automatic .\hadoop
Apache Hive hwi Running Automatic .\hadoop

Windows Azure HDInsight on Windows 8 (Single Node Hadoop Cluster) Walkthrough


Windows Azure HDInsight is Microsoft’s response to the Big Data movement. Microsoft’s end-to-end roadmap for Big Data embraces Apache Hadoop™ by distributing enterprise class, Hadoop-based solutions on both Windows Server and Windows Azure.

In this video you will learn the details of running HDInsight on Windows 8 as a single node Hadoop cluster. It shows examples of running C# and Java based MapReduce jobs, along with an example of Apache Pig.

Microsoft’s roadmap for Big Data: http://www.microsoft.com/bigdata/
Apache Hadoop on Windows Azure: http://hadooponazure.com

Windows Azure HDInsight Introduction and Installation Video


Windows Azure HDInsight is Microsoft’s response to the Big Data movement. Microsoft’s end-to-end roadmap for Big Data embraces Apache Hadoop™ by distributing enterprise class, Hadoop-based solutions on both Windows Server and Windows Azure.

In this video you will learn how to install HDInsight on a Windows 8 machine, as a single node Hadoop cluster.

Microsoft’s roadmap for Big Data: http://www.microsoft.com/bigdata/
Apache Hadoop on Windows Azure: http://hadooponazure.com

Contact me: @avkashchauhan


Windows Azure HDInsight – Installation Walkthrough


 

 

The CTP version of HDInsight for Windows Server and Windows clients is available for download.

When you install HDInsight through WebPI the following components are installed in your Windows machine:

Image

Once installation is complete, you can launch the Hadoop console to verify it and check the Hadoop version using the command “hadoop version” as below:

Image

Also you can check System > Services to verify that all HDInsight specific services are running as expected:

Image

 

 

Cloud Storage Performance Tests are out and Windows Azure Cloud Storage is #1 in most categories

February 21, 2013

Windows Azure Cloud Storage is #1 in most categories, as you can see below:

Cloud Storage Delete Speed Report: (Azure Cloud Storage #1)

Storage-DeleteSpeed


Cloud Storage Read Speed Report: (Azure Cloud Storage #1)
Storage-ReadSpeed


Cloud Storage Read/Write Error Report: (Azure Cloud Storage #1)
Storage-ReadWriteErrors


Cloud Storage Response Time/ UpTime Report: (Azure Cloud Storage #1 in response time)
Windows Azure Cloud Storage is not #1 in uptime due to SIE.
Storage-ResponseTime


Cloud Storage Scaling Test Report: (Azure Cloud Storage is #2, behind Amazon)

Storage-SCaling


Cloud Storage Write Speed Report: (Azure Cloud Storage #1 for all file sizes)
Storage-WriteSpeed
Read the full details in the Nasuni Cloud Storage Report.
