Archive for the ‘NoSQL’ Category

Facts about any NoSQL Database

February 20, 2012

Read both articles for more info:

Here are some facts about any NoSQL database:

  • Experimental spontaneity
  • Performance lies in application design
  • The structure of the query is more important than the structure of the data
  • To design a DB, you always start from the kind of query that will be used to access the data
  • The approach is results-centric: it is driven by what you really expect as the result
  • This also helps you design how to run an efficient query
  • Queries are used to define data objects
  • Accepting less-than-perfect consistency provides huge improvements in performance
  • Big-data-backed applications have the DB spread across thousands of nodes, so:
    • The performance penalty of locking the DB to write/modify is huge
    • With frequent writes, serializing all writes loses the advantage of a distributed database
  • In an eventually consistent DB, changes reach the nodes within tenths of a second, after which the DB arrives at a consistent state
  • For web applications, availability always gets priority over consistency
  • If consistency is not guaranteed by the DB, the application has to manage it, which makes the application more complex
  • With added replication and failover strategies, a database designed for consistency can deliver very high availability
    • HBase is an example here
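
The eventual-consistency point above can be illustrated with a small sketch. This is a toy in-memory model, not how any particular database implements replication: the `Replica` and `Cluster` classes, the version counter, and the explicit `sync()` step are all assumptions made for illustration.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, value, version):
        # Keep only the newest version of each key (last write wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class Cluster:
    """Hypothetical cluster that replicates writes lazily, without locking."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # updates not yet delivered to the other replicas
        self.version = 0

    def write(self, key, value, node=0):
        # The write is acknowledged as soon as one node has it.
        self.version += 1
        self.replicas[node].apply(key, value, self.version)
        for i in range(len(self.replicas)):
            if i != node:
                self.pending.append((i, key, value, self.version))

    def sync(self):
        # Deliver the queued updates; afterwards all replicas agree.
        for i, key, value, version in self.pending:
            self.replicas[i].apply(key, value, version)
        self.pending.clear()


cluster = Cluster(size=3)
cluster.write("status", "hello", node=0)
stale = cluster.replicas[1].read("status")  # update not propagated yet
cluster.sync()
fresh = cluster.replicas[1].read("status")  # replicas are now consistent
```

Between `write()` and `sync()` a reader on another node sees old data; that window is the price paid for not locking the whole DB on every write.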

Types of NoSQL databases and extensive details

February 20, 2012

Please read the first article as background for this one: Why there is a need for NoSql Database?

What about NoSQL:

  • NoSQL is not completely a “No Schema” DB
  • There are mainly 3 types of NoSQL DBs
    • Document DB
    • Data Structure Oriented DB
    • Column Oriented DB

 What is a Document DB?

  • Documents are key-value pairs
  • Documents can also be stored in JSON format
  • Because of JSON, a document is considered an object
  • JSON documents are used as key-value pairs
  • A document can have any set of keys
  • Any key can be associated with an arbitrarily complex value that is itself a JSON document
  • Documents can be added with different sets of keys
    • Missing keys
    • Extra keys
    • Keys can be added in the future when needed
    • The application must know whether a certain key is present
    • Queries are made on keys
    • Indexes are set on keys to make searches efficient
  • Examples: CouchDB, MongoDB, Riak
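
The points about varying key sets and key indexes can be sketched with plain Python dictionaries. The `DocumentStore` class and its methods are hypothetical and exist only for this illustration; real document databases expose their own APIs.

```python
from collections import defaultdict


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self, indexed_keys):
        self.docs = {}  # doc_id -> document (a dict)
        # One index per chosen key: value -> set of doc_ids.
        self.indexes = {key: defaultdict(set) for key in indexed_keys}

    def put(self, doc_id, doc):
        # Drop index entries of any document being overwritten.
        old = self.docs.get(doc_id)
        if old is not None:
            for key, index in self.indexes.items():
                if key in old:
                    index[old[key]].discard(doc_id)
        self.docs[doc_id] = doc
        # Index only the keys this document actually has.
        # Indexed values must be hashable in this sketch.
        for key, index in self.indexes.items():
            if key in doc:
                index[doc[key]].add(doc_id)

    def find(self, key, value):
        if key in self.indexes:
            ids = self.indexes[key].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index on this key: fall back to scanning every document.
        return [d for d in self.docs.values() if d.get(key) == value]


store = DocumentStore(indexed_keys=["city"])
store.put(1, {"name": "Ann", "city": "Pune"})
store.put(2, {"name": "Bob"})                                # missing key
store.put(3, {"name": "Cy", "city": "Pune", "tags": ["a"]})  # extra key
in_pune = store.find("city", "Pune")  # answered from the index
bobs = store.find("name", "Bob")      # answered by a full scan
```

Documents without the indexed key are simply absent from that index, which is why the application has to know which keys it can rely on.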

Example of Document DB – CouchDB

  • The value is a plain string in JSON format
  • Queries are views
  • Views are documents in the DB that specify searches
  • Views can be complex
  • Views can use map/reduce to process and summarize results
  • Writes data to an append-only file, which is extremely efficient and makes writes significantly faster
  • Single-headed database
  • Can run in a cluster environment (not available in core)
  • From the CAP theorem:
    • Partition Tolerance
    • Availability
    • In a non-clustered environment, availability is the main concern
    • In a clustered environment, consistency is the main concern
  • BigCouch
    • Integrates clustering with CouchDB
    • Cloudant is merging CouchDB and BigCouch
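
The way a view uses map/reduce to summarize documents can be sketched as follows. CouchDB views are normally written in JavaScript inside the database; this Python version is only a stand-in for the idea, and `run_view`, `map_by_category`, and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def run_view(docs, map_fn, reduce_fn):
    """Apply map_fn to every document, group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


def map_by_category(doc):
    # Emit one (key, value) pair per document that has a category.
    if "category" in doc:
        yield doc["category"], 1


docs = [
    {"title": "Facts about NoSQL", "category": "nosql"},
    {"title": "Types of NoSQL", "category": "nosql"},
    {"title": "Indexing basics", "category": "sql"},
    {"title": "Untitled draft"},  # no category: the map step skips it
]

posts_per_category = run_view(docs, map_by_category, sum)
```

The map step decides which documents take part and under which key; the reduce step collapses each group into a summary value.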

Example of Document DB – MongoDB

  • The value is a plain string in JSON format
  • Queries are expressed as JSON documents
  • Query documents specify the fields and values to match
  • Query results can be processed by built-in map/reduce
  • Single-headed database
  • Can run in a cluster environment (not available in core)
  • From the CAP theorem:
    • Partition Tolerance
    • Availability
    • In a non-clustered environment, availability is the main concern
    • In a clustered environment, consistency is the main concern
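
The idea of a query that is itself a JSON-style document listing fields and values to match can be sketched in plain Python. This covers equality matching only, and `matches`, `find`, and the sample collection are hypothetical names for illustration, not MongoDB's own implementation.

```python
def matches(doc, query):
    """True if every field in the query is present in doc with an equal value."""
    return all(key in doc and doc[key] == value for key, value in query.items())


def find(collection, query):
    """Return all documents in the collection that match the query document."""
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"name": "Ann", "city": "Pune", "active": True},
    {"name": "Bob", "city": "Pune", "active": False},
    {"name": "Cy", "city": "Delhi", "active": True},
]

active_in_pune = find(users, {"city": "Pune", "active": True})
everyone = find(users, {})  # an empty query document matches everything
```

Because the query has the same shape as the data, an application can build queries with the same tools it uses to build documents.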

Example of Document DB – Riak

  • A document database with more flexible document types
  • Supports JSON, XML, and plain text
  • A plugin architecture supports adding other document types
  • Queries must know the structure of the JSON or XML for proper results
  • Query results can be processed by built-in map/reduce
  • Built-in control over replication and distribution
  • The core is designed to run in a cluster environment
  • From the CAP theorem:
    • Partition Tolerance
    • Availability
    • Note: The tradeoff between availability and consistency is tunable
  • Writes data to an append-only file, which is extremely efficient and makes writes significantly faster
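
One common way to reason about a tunable availability/consistency tradeoff is the quorum rule: with N replicas, W acknowledgements required per write, and R replicas consulted per read, a read is guaranteed to overlap the latest write when R + W > N. The sketch below only checks that rule; the function name and the example settings are assumptions made for illustration.

```python
def quorum_overlaps(n, r, w):
    """Return True if every read quorum shares a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Hypothetical settings for a 3-replica cluster: (n, r, w).
settings = {
    "favor availability": (3, 1, 1),  # fast, but reads may be stale
    "balanced quorum": (3, 2, 2),     # reads overlap the latest write
    "favor consistency": (3, 1, 3),   # writes wait for every replica
}

report = {name: quorum_overlaps(*nrw) for name, nrw in settings.items()}
```

Lower R and W keep the system responsive when nodes are down; higher values trade that availability for reads that always see the latest write.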

Data Structure Oriented DB – Redis:

  • In-memory DB for the fastest read and write speed
  • If the dataset can fit in memory, it is a top choice
  • Great for raw speed
  • Data isn’t saved on disk and is lost in case of a crash
  • Can be configured to save to disk, but with a hit in performance
  • Limited scalability with some replication
  • Cluster replication support is coming
  • In Redis there is a difference:
    • The value can be a data structure (lists or sets)
    • You can do union and intersection on lists and sets
Column Oriented DB

  • Also considered a “sparse row store”
  • Equivalent to a “relational table”: a “set of rows” identified by key
  • The concept starts with columns
  • Data is organized in columns
  • Columns are stored contiguously
  • Columns tend to have similar data
  • A row can have as many columns as needed
  • Columns are essentially keys that let you look up values in the rows
  • Columns can be added at any time
  • Unused columns in a row do not occupy storage
  • NULLs don’t exist
  • Writes data to an append-only file, which is extremely efficient and makes writes significantly faster
  • Built-in control over replication and distribution
  • Examples: HBase and Cassandra
  • HBase
    • From the CAP theorem:
      • Partition Tolerance
      • Consistency
  • Cassandra
    • From the CAP theorem:
      • Partition Tolerance
      • Availability
      • Note: The tradeoff between availability and consistency is tunable
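
The "sparse row store" idea, where each row holds only the columns it actually uses, can be sketched with nested dictionaries. `SparseTable` is a hypothetical class for illustration and says nothing about how HBase or Cassandra lay out data on disk.

```python
class SparseTable:
    """Toy sparse row store: row key -> {column name -> value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        # A column springs into existence the first time a row uses it.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        # Absent cells are simply not stored; there is no NULL placeholder.
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self.rows.get(row_key, {}))


table = SparseTable()
table.put("user:1", "name", "Ann")
table.put("user:1", "email", "ann@example.com")
table.put("user:2", "name", "Bob")
table.put("user:2", "twitter", "@bob")  # column that user:1 never stores

cols_user1 = table.columns("user:1")
cols_user2 = table.columns("user:2")
missing = table.get("user:1", "twitter")
```

Each row pays only for the cells it holds, so adding a new column later costs nothing for existing rows.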

Additional functionality supported by NoSQL DBs:

  • Scripting language support
    • JavaScript
      • CouchDB, MongoDB
    • Pig
      • Hadoop
    • Hive
      • Hadoop
    • Lua
      • Redis
  • RESTful interface:
    • CouchDB and Riak
    • CouchDB can be considered the best fit with web application frameworks
    • Riak provides a traditional protocol buffer interface

Why there is a need for NoSql Database?

February 20, 2012

Let’s start with the issues and requirements around data in this generation:

  • Issues with data size
    • Scalability
      • Vertical
        • CPU limit
      • Horizontal
        • Distributed
        • Scalability on multiple servers
  • Response
    • No overnight queries at all
    • No nightly batch processing; the application needs instant results
    • Instant analytics
  • Availability
    • Data is a living, breathing part of your application
    • No single point of failure
  • Distributed in nature
    • Manual distribution: sharding
      • Relational databases are split between multiple hosts by manual sharding
      • Energy is spent on sharding and replication design
    • Inherent distribution
      • Built-in control over replication and distribution
    • Hybrid (manual and inherent) distribution: not inherently distributed, but designed to be partitioned easily (automatically or manually)
  • Architecture:
    • For any RDBMS, the schema is needed even before the program is written
    • Schemaless (best for agile development)
  • Latency while interacting with data:
    • Read latency
      • A traditional RDBMS with proper indexing gives fast read access
    • Write latency
      • Writing data to an append-only file is extremely efficient and makes writes significantly faster
  • Every database must take the following into consideration:
    • ACID properties
      • Atomicity
      • Consistency
      • Isolation
      • Durability
    • Two-phase commit
    • CAP theorem: you can get 2 out of the following 3, which means sacrificing the one you need least. Partition tolerance is a must for any distributed database, so most DBs choose to sacrifice either consistency or availability
      • Partition Tolerance
      • Consistency
      • Availability
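
The write-latency point about append-only files can be sketched as a log that only ever grows, plus an index pointing at the newest record for each key. Here a Python list stands in for the file, and `AppendOnlyStore` is a hypothetical class for illustration only.

```python
import json


class AppendOnlyStore:
    """Toy append-only store: records are only ever added, never rewritten."""

    def __init__(self):
        self.log = []    # stands in for the append-only file, one JSON line per write
        self.index = {}  # key -> position of the latest record for that key

    def put(self, key, value):
        # A write is a single append; no existing record is located or modified.
        self.log.append(json.dumps({"key": key, "value": value}))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        position = self.index.get(key)
        if position is None:
            return None
        return json.loads(self.log[position])["value"]


store = AppendOnlyStore()
store.put("tweet:1", {"text": "hello"})
store.put("tweet:1", {"text": "hello", "location": "Pune"})  # update = new record

latest = store.get("tweet:1")
records_written = len(store.log)
```

The old record stays in the log after an update; a real system would need some form of compaction to reclaim that space, which this sketch leaves out.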

Here is an example (Twitter) of how data evolves in this generation:

  • Twitter started with 140 chars + a few things
  • Later added pic and
  • Then added location
  • So you can see that lots of metadata has been added
  • So the data schema changes regularly, and a fixed schema will not work for this kind of data model
  • There are more and more examples showing that data requirements are fluid
  • So applications need little DB planning at the start
  • Data design is more query-centric (what you are looking for) than about what kind of data it is
Based on the above, the following are the requirements a database must fulfill:

  • Scalable
  • Super fast data insertion without concurrency
  • Extremely fast random reads on large datasets
  • Consistent read/write speed across the whole data set
  • Super efficient data storage
  • Scales well for cloud application needs
  • Easy to maintain
  • Stable, of course
  • Replicates data across machines to avoid issues if a certain machine goes down
  • No more 80s batch processing
  • No more schema, because data changes frequently
  • Structured data is not a priority, as unstructured data is growing faster than we can establish a schema

Source: what-you-need-to-know-about-nosql-databases
