Introduction to NoSQL
The term NoSQL first appeared as the name of an open-source relational database, Strozzi NoSQL. Its creator, Carlo Strozzi, designed it to store tables as ASCII files, with each tuple on its own line and its fields separated by tabs. The name came from the fact that the system did not use SQL as its query language; instead, it was manipulated through shell scripts run from the Unix command line. However, this relational database is not the ancestor of the NoSQL systems that exist today.
Leaving the history aside, the most obvious trait of NoSQL databases is that they do not use SQL. Some of them do have query languages of their own, and it makes sense for these to be as easy to learn as SQL. Cassandra's CQL, for example, looks much like SQL even though it is not SQL. So far, however, none of these query languages comes close to standard SQL. A logical question is what would happen if a NoSQL database tried to adopt standard SQL; it is not obvious what it would gain from doing so.
These databases are also mostly open-source projects. Although the term NoSQL is sometimes applied to closed-source systems as well, NoSQL is generally regarded as an open-source phenomenon. Most NoSQL database systems are designed to run on clusters, and this shapes both their data model and their approach to consistency. Relational databases achieve consistency across the whole database through ACID transactions, which are difficult to provide efficiently in a cluster environment. Consequently, NoSQL systems handle consistency and distribution through other means.
Not all NoSQL systems are cluster-oriented. Graph databases use a distribution model similar to that of relational databases, but they offer a different data model that is aimed at handling data with complex relationships. In addition, only systems that were developed in the early twenty-first century, driven by the needs of large web estates, are usually called NoSQL systems. Older non-relational systems, including those from BC (Before Codd), are not classified as NoSQL.
NoSQL databases operate without a fixed schema. Accordingly, it is easy to keep adding records, and new fields, to a NoSQL data store without first declaring any change to the structure of the database. Another benefit is that non-uniform data and custom fields are easy to accommodate. Relational databases typically handle custom fields with column names like customField6, or with separate custom field tables that are awkward to process and understand.
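The schemaless behaviour described above can be sketched in a few lines of Python. This is only an illustration using plain dictionaries, not the API of any particular NoSQL product; the names customers, field_names, and get_field, as well as the sample fields, are assumptions made for the example.

```python
# A hypothetical in-memory "collection": a list of records with no declared schema.
customers = []

# Records can carry different fields; nothing has to be declared beforehand.
customers.append({"id": 1, "name": "Asha"})
customers.append({"id": 2, "name": "Ben", "loyalty_tier": "gold"})
customers.append({"id": 3, "name": "Chen", "preferred_contact": "email"})


def field_names(records):
    """Return the set of every field name that appears in any record."""
    names = set()
    for record in records:
        names.update(record)  # iterating a dict yields its keys
    return names


def get_field(record, field, default=None):
    """Read a field that may be absent from this particular record."""
    return record.get(field, default)


all_fields = field_names(customers)
missing = get_field(customers[0], "loyalty_tier")
```

The flexibility has a cost: because no schema guarantees that a field exists, the reading code has to cope with absent fields itself, which is what get_field does here.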
All of these are common characteristics of NoSQL databases. However, there is no strict definition of a NoSQL database to date. Relational databases remain the most common form of database, and for most projects they are still the more useful choice because of their familiarity, feature set, available support, and stability.
Even so, relational databases have become just one option among several for storing data. This approach is known as 'polyglot persistence': different types of data stores are used in different circumstances. The choice of database is therefore no longer settled by the popularity of relational databases alone. Instead, it depends on the type of data that needs to be stored and on the way it will be manipulated. Accordingly, organizations today run a mix of database technologies that address varying data storage needs.
Likewise, there is a shift from integration databases to application databases. NoSQL databases are best used as application databases; they are poorly suited to serving as integration databases. This need not be a limitation, because a shared integration database can be avoided by encapsulating the different data stores in services (Sadalage and Fowler, 2013).
Types of NoSQL databases
1. Sorted ordered column-oriented databases
Google's Bigtable is an example of this type of database, in which data is stored in columns. In contrast, RDBMS products store data in a row-oriented format. Storing data by column can be efficient, and it also saves space with null values: when a value is missing, the DBMS simply does not store the corresponding column.
Each unit of data is a set of key/value pairs identified by a primary key, which is called the row-key in Bigtable and similar databases. The data units are stored sorted and ordered on the basis of the row-key. The difference from a row-oriented store becomes clear with an example. In such a database, one column holds the names of all the people, another holds all of their genders, and another holds all of their locations. Unlike in an RDBMS, the values for each person are not stored together as a single row in the database (Tiwari, 2011).
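The people example can be made concrete with a small Python sketch. It is a toy model of the layout only, not how Bigtable is implemented; the names people, to_columns, and read_row, and the sample data, are assumptions for illustration.

```python
# Hypothetical people data keyed by row-key; not every person has every field.
people = {
    "row3": {"name": "Carol", "gender": "F", "location": "Pune"},
    "row1": {"name": "Alice", "gender": "F"},       # no location stored
    "row2": {"name": "Bob", "location": "Lima"},    # no gender stored
}


def to_columns(rows):
    """Regroup row-keyed records into one list of (row_key, value) per column.

    Row-keys are visited in sorted order, so every column ends up ordered by
    row-key. A missing value produces no entry at all, so nulls cost no space.
    """
    columns = {}
    for row_key in sorted(rows):
        for column, value in rows[row_key].items():
            columns.setdefault(column, []).append((row_key, value))
    return columns


def read_row(columns, row_key):
    """Reassemble one person's values by probing each column for the row-key."""
    row = {}
    for column, cells in columns.items():
        for key, value in cells:
            if key == row_key:
                row[column] = value
    return row


columns = to_columns(people)
bob = read_row(columns, "row2")
```

Reading a whole column, such as every location, touches a single list, whereas rebuilding one person's record means probing every column. That trade-off is the essential difference from a row-oriented layout.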
2. Key/value stores
Key/value pairs are stored in a HashMap or an associative array. This data model is very popular because lookups take constant time, O(1), on average. Each key/value pair has a unique key, which is used for looking up the data.
There are different types of key/value stores. Some hold data in memory, while others can persist the data to disk. Key/value stores can also be distributed and held in a cluster of nodes.
Oracle's Berkeley DB is one of the more efficient key/value stores. It is a storage engine that stores both keys and values as arrays of bytes and attaches no meaning to either. It simply takes byte arrays and hands them back to its clients. It also caches data in memory and can flush it to disk. The keys are indexed for faster lookup and access (Tiwari, 2011).
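The ideas in this section (opaque byte-array keys and values, an in-memory cache, and flushing to disk) can be sketched with a toy Python class. This is not Berkeley DB's API; TinyKVStore, its method names, and the hex-encoded JSON file format are all assumptions chosen to keep the example short.

```python
import json
import os
import tempfile


class TinyKVStore:
    """A toy key/value store: byte keys and values, cached in memory, flushed on request."""

    def __init__(self, path):
        self.path = path
        self.cache = {}  # a Python dict is a hash map, so lookups are O(1) on average
        if os.path.exists(path):
            with open(path, "r", encoding="ascii") as handle:
                stored = json.load(handle)
            # Keys and values were written as hex text; turn them back into bytes.
            self.cache = {
                bytes.fromhex(key): bytes.fromhex(value)
                for key, value in stored.items()
            }

    def put(self, key: bytes, value: bytes) -> None:
        """Store the value under the key; the store never interprets either."""
        self.cache[key] = value

    def get(self, key: bytes, default=None):
        """Look up a value by its unique key."""
        return self.cache.get(key, default)

    def flush(self) -> None:
        """Persist the in-memory cache to disk."""
        with open(self.path, "w", encoding="ascii") as handle:
            json.dump({k.hex(): v.hex() for k, v in self.cache.items()}, handle)


path = os.path.join(tempfile.mkdtemp(), "store.json")
store = TinyKVStore(path)
store.put(b"user:1", b"\x00\x01raw bytes")
store.flush()

reopened = TinyKVStore(path)  # a fresh instance reads the flushed data back
value = reopened.get(b"user:1")
```

A real engine would write incrementally and maintain an on-disk index instead of rewriting the whole file, but the client-facing contract is the same: put bytes in under a key, and get the same bytes back.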
3. Document databases
These databases treat each document as a whole and do not split it into its constituent key/value pairs. Documents can then be gathered into a single collection. Indexing is possible both on the primary key and on document properties. Most of these document databases are open source, with MongoDB and CouchDB being the most popular (Tiwari, 2011).
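The two kinds of index mentioned above, one on the primary key and one on document properties, can be illustrated with a minimal Python sketch. It does not reflect the API of MongoDB or CouchDB; TinyDocumentStore and its methods are hypothetical, and the sketch assumes that each document id is inserted only once and that indexed property values are hashable.

```python
from collections import defaultdict


class TinyDocumentStore:
    """A toy document collection indexed by primary key and by chosen properties."""

    def __init__(self, indexed_fields):
        self.docs = {}  # primary key -> whole document
        # One secondary index per chosen property: value -> set of primary keys.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, doc_id, document):
        """Store the document whole and update the property indexes."""
        self.docs[doc_id] = document
        for field, index in self.indexes.items():
            if field in document:  # documents need not share the same fields
                index[document[field]].add(doc_id)

    def get(self, doc_id):
        """Look a document up through the primary-key index."""
        return self.docs.get(doc_id)

    def find(self, field, value):
        """Look documents up through a property index, ordered by primary key."""
        ids = self.indexes[field].get(value, set())
        return [self.docs[doc_id] for doc_id in sorted(ids)]


store = TinyDocumentStore(indexed_fields=["city"])
store.insert("p1", {"name": "Alice", "city": "Pune", "tags": ["admin"]})
store.insert("p2", {"name": "Bob", "city": "Lima"})
store.insert("p3", {"name": "Carol", "city": "Pune"})

in_pune = [doc["name"] for doc in store.find("city", "Pune")]
```

Each document is kept intact as a single value, and the property index stores only primary keys that point back to it.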
4. Graph databases
A graph is like a network. Graph databases do not store data in rows or columns. Instead, the data is stored in three constructs: nodes, edges, and properties. A node is a standalone object that is independent of everything else. An edge is an object that depends on the existence of two nodes. Properties are attributes that can belong to both nodes and edges; a person node, for example, will have name and gender properties. The advantage of these databases is that they are whiteboard friendly: the data is already in graph form, so there is no need to construct the graph separately. They are also very good at mapping social networking data in Web 2.0 applications (Hewitt, 2011).
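The three constructs can be sketched in Python as follows. This is a toy in-memory model, not the API of any graph database; TinyGraph, its methods, the FOLLOWS label, and the sample people are assumptions for illustration.

```python
class TinyGraph:
    """A toy property graph: nodes and edges, each carrying their own properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties of that node
        self.edges = []  # (source id, label, target id, properties of the edge)

    def add_node(self, node_id, **properties):
        """A node stands alone: it can be added without any other object existing."""
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target, **properties):
        """An edge depends on two nodes, so both endpoints must already exist."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoint nodes must exist before adding an edge")
        self.edges.append((source, label, target, properties))

    def neighbors(self, node_id, label):
        """Follow outgoing edges with the given label from one node."""
        return [
            target
            for source, edge_label, target, _ in self.edges
            if source == node_id and edge_label == label
        ]


graph = TinyGraph()
graph.add_node("alice", name="Alice", gender="F")
graph.add_node("bob", name="Bob", gender="M")
graph.add_node("carol", name="Carol", gender="F")
graph.add_edge("alice", "FOLLOWS", "bob", since=2010)
graph.add_edge("alice", "FOLLOWS", "carol", since=2011)

followed = graph.neighbors("alice", "FOLLOWS")
```

A social-network question such as "whom does Alice follow?" becomes a direct walk along edges, with no join between tables.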