R Refresher for Beginners


RStudio Environment

R Location (OSX)

$ ls -l /Library/Frameworks/R.framework/Versions

#Get R Version

version
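The shell command above lists every R version installed on disk. From inside R you can check which one is actually running. A minimal sketch using base R only; the "4.0.0" threshold is just an illustrative value:

```r
# Version information for the R session that is currently running
R.version.string          # a single string such as "R version x.y.z (date)"
getRversion()             # the version as a comparable version object
getRversion() >= "4.0.0"  # TRUE on any R 4.x (or later) installation
R.home()                  # installation directory of the running R
```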


Environment

getwd()

setwd("/Users/avkashchauhan/work/global")

getwd()

dir()
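When a script changes the working directory, it is good practice to restore it afterwards. A minimal sketch, assuming tempdir() as a stand-in for a real project folder so the code runs on any machine:

```r
old_wd <- getwd()          # remember the current working directory
setwd(tempdir())           # tempdir() stands in for your own project folder
getwd()                    # now the session's temporary directory
list.files()               # same result as dir()
file.exists("test.csv")    # check that a file is there before reading it
setwd(old_wd)              # go back to where you started
```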

#Getting Help

help(getwd)

#Reading a File

help(read.csv)

filename <- "test.csv"

filex <- read.csv(filename, header = TRUE, sep = ",")

filex

summary(filex)

filex$id

filex$name

filex$age

filex$zip

names(filex)

attributes(filex)
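The contents of test.csv are not shown, so here is a self-contained sketch that writes a small hypothetical file with the same columns used above (id, name, age, zip) and reads it back. The rows are made-up sample data:

```r
# Hypothetical sample data with the columns used above
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,age,zip",
  "1,Alice,34,94040",
  "2,Bob,28,10001",
  "3,Carol,45,60601"
), csv_path)

filex <- read.csv(csv_path, header = TRUE, sep = ",")

str(filex)        # one line per column with its type; name is character in R >= 4.0
names(filex)      # "id" "name" "age" "zip"
nrow(filex)       # 3
mean(filex$age)   # 35.66667
```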

# Listing all variables

ls()  # lists every object in the current environment

# Data types and numeric assignment

asc <- c(1,2,3,4,5,6,7,8,9,10)

# What is c? c() means "combine": it joins values into a vector

asc[2]

asc[5]

asc[5:6]

asc[1:9]

View(asc)

a <- 10

a

a[1]

a[3]

help(sqrt)

a <- sqrt(10)

a

a <- sqrt(10*a)

a

asc

mean(asc)

median(asc)

help(var)

typeof(asc)

typeof(a)
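A few more ways to index the same vector and summarize it, as a minimal sketch in base R. The negative and logical indexes are additions not used above; the values in the comments follow from asc holding 1 to 10:

```r
asc <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # same values as 1:10, stored as doubles

asc[2]          # 2
asc[5:6]        # 5 6
asc[-1]         # a negative index drops element 1: 2 3 4 5 6 7 8 9 10
asc[asc > 7]    # a logical index keeps the matching elements: 8 9 10

mean(asc)       # 5.5
median(asc)     # 5.5
var(asc)        # 9.166667 (sample variance, divides by n - 1)
typeof(asc)     # "double"
typeof(1:10)    # "integer": the colon operator builds integers
```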

# String data type

a <- c("this", "is", "so", "fun")

a

a[1]

typeof(a)
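A character vector holds whole strings as its elements. A minimal sketch of a few base R string helpers applied to the same vector:

```r
a <- c("this", "is", "so", "fun")

length(a)                 # 4 elements (strings), not 11 characters
nchar(a)                  # characters per element: 4 2 2 3
paste(a, collapse = " ")  # joins the elements: "this is so fun"
toupper(a[4])             # "FUN"
```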

#Understanding c or combine

> a <- 10

> a

[1] 10

> a[1]

[1] 10

> a[2]

[1] NA

> a <- c(10)

> a

[1] 10

> a[2]

[1] NA
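Reading past the end of a vector gives NA, as shown above, but writing past the end silently grows the vector. A minimal sketch in base R:

```r
a <- c(10)
length(a)   # 1: a single number is just a length-one vector
a[2]        # NA: reading past the end does not raise an error

a[3] <- 30  # writing past the end grows the vector and fills the gap with NA
a           # 10 NA 30
length(a)   # 3
is.na(a)    # FALSE TRUE FALSE
```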

# DATAFRAME

# creating a data frame

a <- c(1,2,3,4,5,6,7,8,9,10)

b <- c(10,20,30,40,50,60,70,80,90,100)

ab <- data.frame(first=a, second=b)

ab

ab[1]

ab[1][1]

ab[1][2]  # error: undefined columns selected (ab[1] has only one column)

ab[2]

ab[2][1]

ab[2][2]  # error: undefined columns selected (ab[2] has only one column)

ab$first

ab$second

ab$second[1]

ab$second[3]

ab$first[10]

View(ab)
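Why does ab[1][2] fail? ab[1] returns a one-column data frame, so [2] asks it for a second column that does not exist. A minimal sketch of indexing forms that do reach individual values, using the same a, b and ab as above:

```r
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
b <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
ab <- data.frame(first = a, second = b)

ab[1]                    # one-column data frame (column "first")
ab[[1]]                  # the column itself, as a plain vector
ab[[1]][2]               # second element of column 1: 2
ab[2, 1]                 # [row, column] indexing: 2
ab[3, "second"]          # columns can also be selected by name: 30
ab$second[ab$first > 8]  # values of second where first > 8: 90 100
dim(ab)                  # 10 rows, 2 columns
```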

#Logical

a <- c(TRUE)

a

typeof(a)

a <- c(FALSE)

a

typeof(a)

#Conditions in R

a <- c(TRUE)

if(!a) a <- c(FALSE)

a  # still TRUE

if(a) a <- c(FALSE)

a  # now FALSE

a <- c(TRUE,FALSE)

a

a[1]

a[2]

if (a[1]) a[2] <- TRUE

a
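The if() calls above work because each condition is a single TRUE or FALSE. A minimal sketch of what to do with a longer logical vector; x is a hypothetical vector added for illustration, and the note about R 4.2.0 refers to the release that turned a length > 1 if() condition from a warning into an error:

```r
a <- c(TRUE, FALSE)

# if() expects one TRUE/FALSE value, so index into the vector
# or collapse it with any()/all() first.
if (a[1]) a[2] <- TRUE
if (all(a)) message("every element is TRUE")

# ifelse() is the vectorised form: it returns one result per element.
x <- c(5, 12, 8, 20)              # hypothetical example values
ifelse(x > 10, "big", "small")    # "small" "big" "small" "big"

# Logical vectors can be counted: TRUE counts as 1, FALSE as 0.
sum(x > 10)                       # 2
```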


Factor in R – A “factor” is a vector whose elements can take on one of a specific set of values. For example, “Sex” will usually take on only the values “M” or “F,” whereas “Name” will generally have lots of possibilities. The set of values that the elements of a factor can take are called its levels.

a <- factor(c("Male", "Female", "Female", "Male", "Male"))

a

a <- factor(c("A", "A", "B", "A", "B", "B", "C", "A", "C"))

a

Tables: (One way and two way)

a <- factor(c("Male", "Female", "Female", "Male", "Male"))

a

mytable <- table(a)

a

mytable

summary(a)

attributes(a)
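The heading promises one-way and two-way tables, but only a one-way table is built above. A minimal sketch of both, where group is a hypothetical second factor invented for illustration:

```r
sex <- factor(c("Male", "Female", "Female", "Male", "Male"))
levels(sex)        # "Female" "Male": levels are sorted alphabetically by default
as.integer(sex)    # the integer codes stored internally: 2 1 1 2 2

# One-way table: counts per level
table(sex)         # Female 2, Male 3

# Two-way table: cross-tabulate against a second (hypothetical) factor
group <- factor(c("A", "B", "A", "A", "B"))
tab2 <- table(sex, group)
tab2               # Female row: A = 1, B = 1; Male row: A = 2, B = 1
```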

# Data type checks in R

#Example #1

a <- c(1,2,4)

is.numeric(a)

is.factor(a)

#Example #2

b <- factor(c("M", "F"))

b

is.factor(b)

is.numeric(b)
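A factor is not numeric even though R stores it as integer codes. This matters when numbers arrive as a factor (for example, a CSV column read with stringsAsFactors = TRUE). A minimal sketch; f is a hypothetical example factor:

```r
b <- factor(c("M", "F"))
is.factor(b)      # TRUE
is.numeric(b)     # FALSE, even though the codes are stored as integers
typeof(b)         # "integer"
class(b)          # "factor"

# Numbers held in a factor: convert through character to get the values back
f <- factor(c("10", "5", "20"))   # levels sort as text: "10" "20" "5"
as.numeric(f)                     # level codes, not values: 1 3 2
as.numeric(as.character(f))       # the real values: 10 5 20
```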

Graph Plotting in R

Using the ggplot2 library

#installing ggplot2

install.packages("ggplot2")


# Console output: also installing the dependencies ‘colorspace’, ‘Rcpp’, ‘stringr’, ‘RColorBrewer’, ‘dichromat’, ‘munsell’, ‘labeling’, ‘plyr’, ‘digest’, ‘gtable’, ‘reshape2’, ‘scales’, ‘proto’

Using ggplot2 Library


library(ggplot2)

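# detach(package:ggplot2) unloads the package again. Run it only after the examples below: the diamonds data set ships with ggplot2 and is not found once the package is detached.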

head(diamonds)

View(diamonds)

qplot(clarity, data=diamonds, fill=cut, geom="bar")


qplot(clarity, data=diamonds, geom="bar", fill=cut, position="stack")

qplot(clarity, data=diamonds, geom="freqpoly", group=cut, colour=cut, position="identity")


qplot(carat, data=diamonds, geom="histogram", binwidth=0.1)

qplot(carat, data=diamonds, geom="histogram", binwidth=0.01)
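Recent ggplot2 releases (3.4.0 and later) deprecate qplot() in favour of ggplot(). A sketch of the stacked bar chart and the 0.1-carat histogram above written with ggplot(); it assumes ggplot2 is installed, and the names p_bar and p_hist are just illustrative:

```r
library(ggplot2)   # provides ggplot(), the geoms, and the diamonds data set

# Stacked bar chart of clarity, filled by cut
p_bar <- ggplot(diamonds, aes(x = clarity, fill = cut)) +
  geom_bar(position = "stack")

# Histogram of carat with a 0.1-carat bin width
p_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

print(p_bar)    # in RStudio, plots appear in the Plots pane
print(p_hist)
```

Because a ggplot() plot is built by adding layers with +, the same object can be extended later, for example with + facet_wrap(~ cut) to split the histogram by cut.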


Graph Source: http://www.ceb-institute.org/bbs/wp-content/uploads/2011/09/handout_ggplot2.pdf

Keywords: R, Analysis, ggplot

Git error “git-sh-setup: No such file or directory” on OS X Yosemite with oh-my-zsh (Z-Shell)


I recently received the following error while pushing/pulling code to/from git:

$ git pull
/Applications/Xcode.app/Contents/Developer/usr/libexec/git-core/git-pull: line 11: git-sh-setup: No such file or directory

This happened after I updated my MacBook to OS X Yosemite. I use Zsh (Z-Shell with oh-my-zsh) as my shell and iTerm2 as my terminal.

After looking around, I found the cause: Zsh does not invoke /usr/bin/login when a new window opens, and it does not clear the environment variables when the window closes…

The potential solutions:

1. Edit the command that runs when you open a new shell (preferred, as it keeps your encoding and theme intact):

  • Open iTerm2 Preferences > Profiles > Default > Command, choose Command, and enter: /bin/bash -c /bin/zsh

2. You can also edit the same command to use your login:

  • Open iTerm2 Preferences > Profiles > Default > Command, choose Command, and enter: /usr/bin/login -f <your user name>

More info is available on Stack Overflow.

Unknown Entity exception with Java Hibernate


If you hit an “Unknown entity” exception with Java Hibernate, such as the one below:

org.hibernate.MappingException: Unknown entity: com.myapplication.mymodel.modelname.ModalClass

It means the class has not been added to the Hibernate configuration. To solve it, add the class to your hibernate.cfg.xml as below:

  <session-factory>
      ……
      <mapping class="com.myapplication.mymodel.modelname.ModalClass" />
  </session-factory>

You can find more details on Stack Overflow.

Keywords: Java, Hibernate, Class, Unknown Entity, Exception

Data360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics (Presentation Slides)


Yesterday I participated in the Data360 conference and gave an introductory presentation about Big Data, Hadoop, and Big Data Analytics. It was a great way to connect with the community and share information.


The full presentation slides are on SlideShare, and you can get them directly from the link below:

http://www.slideshare.net/Avkashslide/data-360-conference-introduction-to-big-data-hadoop-and-big-data-analytics

Keywords: Hadoop, Big Data, Analytics

Big Data Billion-Dollar Club – Top 20 Players


Here is a list of the top players in the Big Data world that influence billion-dollar (or larger) Big Data projects, directly or indirectly (in no particular order):

  1. Microsoft
  2. Google
  3. Amazon
  4. IBM
  5. HP
  6. Oracle
  7. VMWare
  8. Teradata
  9. EMC
  10. Facebook
  11. GE
  12. Intel
  13. Cloudera
  14. SAS
  15. 10Gen
  16. SAP
  17. Hortonworks
  18. MapR
  19. Palantir
  20. Splunk

The list is based on each company's involvement in Big Data, directly or indirectly, whether or not it has a dedicated Big Data product. All of the companies above are involved in Big Data projects worth a billion dollars or more…

PostgreSQL – Tips and Tricks


Login:

$ psql <dbname> -U <user_name>

After logging in, at the PostgreSQL console:

  •  Exit:
    • dbname=# \q
  • List all tables:
    • dbname=# \dt
  • Info about a specific table:
    • dbname=# \d+ <table_name>

List all rows where a column value is null:

  • perspectivedb=# select * from cluster_config_property where key is null;

Deleting all rows where a column value is null:

  • perspectivedb=# delete from cluster_config_property where key is null;

Backing up a single table:

  • This is done from regular prompt (not when you are logged into psql)
  • $ pg_dump -t <table_name> <db_name> -U <user_name> > <target_file_name>.sql

Restoring a single table back into the database: