Data360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics (Presentation Slides)

Yesterday I participated in the Data360 conference and gave an introductory presentation about Big Data, Hadoop, and Big Data Analytics. It was a great way to connect with the community and share some of the information.


The full presentation slides are hosted on SlideShare, and you can access them directly from the link below:

Keywords: Hadoop, Big Data, Analytics

Open Source Distributed Analytics Engine with SQL interface and OLAP on Hadoop by eBay – Kylin

What is Kylin?

  • Kylin is an open source Distributed Analytics Engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets stored in Hadoop.


Key Features:

  • Extremely Fast OLAP Engine at Scale:
    • Kylin is designed to reduce query latency on Hadoop for tables with 10+ billion rows of data
  • ANSI-SQL Interface on Hadoop:
    • Kylin offers ANSI-SQL on Hadoop and supports most ANSI-SQL query functions
  • Interactive Query Capability:
    • Users can interact with Hadoop data via Kylin at sub-second latency, better than Hive queries on the same dataset
  • MOLAP Cube:
    • Users can define a data model in Kylin and pre-build cubes containing more than 10 billion raw data records
  • Seamless Integration with BI Tools:
    • Kylin currently offers integration capability with BI Tools like Tableau.
  • Other Highlights:
    • Job Management and Monitoring
    • Compression and Encoding Support
    • Incremental Refresh of Cubes
    • Leverages HBase Coprocessors to reduce query latency
    • Approximate Query Capability for Distinct Counts (HyperLogLog)
    • Easy Web interface to manage, build, monitor and query cubes
    • Security capability to set ACL at Cube/Project Level
    • Support LDAP Integration
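The approximate distinct-count highlight above refers to HyperLogLog. To give a feel for the idea, here is a minimal illustrative sketch in Python — not Kylin's actual implementation, just the core trick of bucketing hash values and tracking leading-zero runs:

```python
import hashlib
import math

def hll_estimate(items, p=14):
    """Approximate the number of distinct items with HyperLogLog.

    p is the number of index bits; m = 2**p registers gives a
    relative error of roughly 1.04 / sqrt(m)."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # 64-bit hash of the item
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64-p bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    # Small-range correction: fall back to linear counting
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw
```

With the default p=14 (16,384 registers, a few KB of state) the estimate is typically within about 1% of the true cardinality, which is why this technique is attractive for OLAP engines that cannot afford exact distinct counts at scale.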

Keywords: Kylin, Big Data, Hadoop, Jobs, OLAP, SQL, Query

The present and future of Hadoop from its creator Doug Cutting

At Strata + Hadoop World, Hadoop creator Doug Cutting explained Hadoop and talked more about its present and future. Doug talked about:

  • What is Hadoop?
  • Where the name “Hadoop” came from
  • What a Hadoop application looks like
  • Ethical use of data
  • Upcoming plans and future strategy

Here is the full interview:

A collection of Big Data Books from Packt Publication

I found that Packt Publishing has some great books on Big Data, and here is a collection of a few that I found very useful:

Packt is giving its readers a chance to dive into its comprehensive catalog of over 2,000 books and videos for the next 7 days with the LevelUp program:


Packt is offering all of its eBooks and Videos at just $10 each or less

The more EXP customers want to gain, the more they save:

  • Any 1 or 2 eBooks/Videos – $10 each
  • Any 3 to 5 eBooks/Videos – $8 each
  • Any 6 or more eBooks/Videos – $6 each

More information is available at the links below:

Big Data $1B Club – Top 20 Players

Here is a list of the top players in the Big Data world, each with direct or indirect influence over billion-dollar (or larger) Big Data projects (in no particular order):

  1. Microsoft
  2. Google
  3. Amazon
  4. IBM
  5. HP
  6. Oracle
  7. VMware
  8. Teradata
  9. EMC
  10. Facebook
  11. GE
  12. Intel
  13. Cloudera
  14. SAS
  15. 10gen
  16. SAP
  17. Hortonworks
  18. MapR
  19. Palantir
  20. Splunk

The list is based on each company's direct or indirect involvement in Big Data, whether or not through a direct product. All of the above companies are involved in Big Data projects worth a billion dollars or more.

PostgreSQL – Tips and Tricks


Logging in to a database:

$ psql <dbname> -U <user_name>

After logging in at the PostgreSQL console:

  •  Exit:
    • dbname=# \q
  • List all tables:
    • dbname=# \dt
  • Info about a specific table:
    • dbname=# \d+ <table_name>

Listing all rows where a column value is null:

  • perspectivedb=# select * from cluster_config_property where key is null;

Deleting all rows where a column value is null:

  • perspectivedb=# delete from cluster_config_property where key is null;
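The select-then-delete pattern above can be tried end-to-end without a running PostgreSQL server. This sketch uses Python's built-in SQLite driver purely as a self-contained stand-in — the `IS NULL` SQL is identical in PostgreSQL — with the table and column names borrowed from the examples above:

```python
import sqlite3

# In-memory stand-in database (the SQL below is the same in PostgreSQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cluster_config_property (key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO cluster_config_property VALUES (?, ?)",
    [("a", "1"), (None, "orphan"), ("b", "2"), (None, "stale")],
)

# List all rows where the column value is null
null_rows = conn.execute(
    "SELECT * FROM cluster_config_property WHERE key IS NULL"
).fetchall()

# Delete all rows where the column value is null
conn.execute("DELETE FROM cluster_config_property WHERE key IS NULL")
remaining = conn.execute(
    "SELECT COUNT(*) FROM cluster_config_property"
).fetchone()[0]
```

Note that `WHERE key = NULL` would match nothing in either engine; `IS NULL` is the correct comparison for null values.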

Backing up a single table:

  • This is done from the regular shell prompt (not from within psql)
  • $ pg_dump -t <table_name> <db_name> -U <user_name> > <target_file_name>.sql

Restoring a single table back into the database:

  • This is also done from the regular shell prompt
  • $ psql <db_name> -U <user_name> < <target_file_name>.sql