Somehow we missed the moment when SQL started to be seen as an outdated and less relevant query language. As a result, many NoSQL solutions appeared and quickly positioned themselves as replacements for the SQL language and the relational data storage model.
The main arguments for this approach are:
- The ability to work with very large volumes of data (the well-known Big Data);
- The ability to store data in the most exotic structures;
- And, most importantly, the ability to store and process such data as quickly as possible.
Below, we look at how well the best-known NoSQL databases actually do this.
Where Does NoSQL Get Its Speed?
First of all, it comes from a completely different data storage model. Parsing and translating SQL queries, running the optimizer, joining tables, and many other steps all delay the response.
If you remove all these layers, simplify the requests, send data from disk straight to the network, and keep the data in main memory, you save time. Once all of this is done, each request takes much less time to process, and the number of requests served per second grows.
This is how key-value stores came about, the most famous of which is memcached. Memcached is a common component of web applications, used to speed up access to data, and it is another variant of NoSQL.
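To make the idea concrete, here is a minimal Python sketch of the cache-aside pattern that memcached is typically used for. It assumes a plain in-process dict in place of a real memcached server and a hypothetical `SlowDatabase` class that simulates query cost with a sleep. A real deployment would also need expiry (TTL) and invalidation on writes.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a relational database: every query pays a fixed cost."""

    def __init__(self, rows, delay_s=0.01):
        self.rows = rows
        self.delay_s = delay_s
        self.queries = 0  # how many times the expensive path was taken

    def query(self, key):
        self.queries += 1
        time.sleep(self.delay_s)  # simulate parsing, planning and disk I/O
        return self.rows.get(key)


class CacheAside:
    """Check the key-value cache first; fall back to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # a plain dict stands in for memcached

    def get(self, key):
        if key in self.cache:  # hit: a single hash-table lookup
            return self.cache[key]
        value = self.db.query(key)  # miss: pay the full query cost once
        self.cache[key] = value  # remember the answer for later reads
        return value


db = SlowDatabase({"user:1": "Alice", "user:2": "Bob"})
app = CacheAside(db)
for _ in range(100):
    app.get("user:1")
print(db.queries)  # 1: only the first read reached the database
```

Only the first read pays the query cost. The remaining 99 are answered from the cache with a single hash lookup.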
Kinds of NoSQL
Modern NoSQL systems fall into four main types:
- Key-value. Essentially a huge hash table in which data can be written and read only by key.
- Column. Tables with rows and columns. Unlike in SQL, the number of columns can vary from row to row, and the total number of columns can reach billions. Each row has a unique key. You can think of this structure as a hash table of hash tables, where the first key is the row key and the second is the column name. Thanks to secondary indexes, rows can also be selected by column value, not only by row key.
- Document-oriented. Collections of structured documents. You can select specific documents and modify part of a document. This category also includes search engines, which are essentially indexes and do not store the documents themselves.
- Graph. Designed specifically for storing mathematical graphs: vertices and the edges between them. They usually let you attach sets of attributes to vertices and edges, and they support graph traversal and path-finding algorithms.
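The "hash table of hash tables" description of the column model can be sketched in a few lines of Python. This is a toy illustration, not how Cassandra or any other product stores data. The `ColumnFamily` class and its methods are hypothetical names, and the secondary index is a simple in-memory map from (column, value) to row keys.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column-oriented table: row key -> {column name -> value}.

    Rows may hold different sets of columns. An optional secondary index maps
    (column, value) -> row keys, so rows can be selected by column value.
    """

    def __init__(self, indexed_columns=()):
        self.rows = defaultdict(dict)  # first key: row key; second key: column name
        self.indexed = set(indexed_columns)
        self.index = defaultdict(set)  # (column, value) -> set of row keys

    def put(self, row_key, column, value):
        row = self.rows[row_key]
        if column in self.indexed and column in row:
            self.index[(column, row[column])].discard(row_key)  # drop the stale entry
        row[column] = value
        if column in self.indexed:
            self.index[(column, value)].add(row_key)

    def get(self, row_key, column):
        return self.rows.get(row_key, {}).get(column)

    def select_by(self, column, value):
        """Select row keys by column value through the index, without a full scan."""
        if column not in self.indexed:
            raise KeyError(f"no secondary index on {column!r}")
        return sorted(self.index.get((column, value), set()))


users = ColumnFamily(indexed_columns=["city"])
users.put("alice", "city", "Berlin")
users.put("alice", "email", "alice@example.com")
users.put("bob", "city", "Berlin")
users.put("bob", "phone", "555-0100")  # bob has a different set of columns
print(users.get("alice", "email"))  # alice@example.com
print(users.select_by("city", "Berlin"))  # ['alice', 'bob']
```

The two rows share only the `city` column, which shows the variable-width rows described above. Selection by `city` goes through the index rather than scanning every row.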
An Example of a Test
We tested databases from the first three categories.
- Aerospike. A very popular key-value database (starting with version 3.0, it became document-oriented). It supports SSDs and can use them directly as block devices, bypassing the file system. It was originally developed for the online advertising market, which needs large in-memory caches.
- Couchbase. A merger of CouchDB and Membase. It is considered a successor to memcached, and to key-value stores in particular. Unlike memcached, it is persistent and supports clustering, and these features are implemented in Erlang.
- Cassandra. The oldest DBMS on this list. It was developed at Facebook, open-sourced in 2008, and later became an Apache project. It is essentially a column-oriented database, a descendant of Google's Bigtable.
- MongoDB. Another very popular document-oriented database, often regarded as the pioneer of document-oriented NoSQL. Documents are JSON-like objects. It was originally created for the needs of web applications.
The Nature of the Test
For the test we used four server machines, each with an 8-core Xeon CPU, 32 GB of RAM, and four 120 GB Intel SSDs.
The test is based on the Yahoo! Cloud Serving Benchmark (YCSB), a benchmark that Yahoo! developed in 2010 and released under the Apache license specifically for testing NoSQL databases.
It is still a very popular tool for testing NoSQL products and can be considered a de facto standard. It is written in Java.
For our test, we added an Aerospike driver to YCSB, updated the MongoDB driver, and changed the output format of the results.
To generate load on the cluster, we used eight client machines, each with a 4-core Intel Core i5 processor and 4 GB of RAM.
NoSQL Configuration
The default configuration was changed only where the documentation recommends it for achieving the expected performance. The most common recommendation is to tune the network settings to the number of CPU cores and the amount of main memory.
Setting up Couchbase is very simple. The product has its own web console, and you just need to start the service on every cluster node.
Then you create a new bucket (specifically for key-value data) on one node and add the other nodes to the cluster. All of this can be done in the simple web interface, and there are no hidden or complicated configuration parameters.
Cassandra and Aerospike are about equally easy to set up: you just create a configuration file on every cluster node.
These files are almost identical across nodes. Then you start the daemon, and if everything is configured correctly, the nodes join into a single cluster.
MongoDB is more complicated. Unlike the other databases, its nodes are not peers: different nodes play different roles. In MongoDB, the degree of data replication is determined by the cluster topology, so getting a MongoDB cluster to work correctly requires studying the documentation thoroughly.
Every NoSQL database has its own data model and its own set of supported operations. YCSB therefore uses a generalized model that can be mapped onto almost any database (including SQL ones).
The data YCSB works with consists of simple keys and values.
A key is a unique string that contains a 64-bit hash. In other words, YCSB addresses records by an integer index, since it knows the total number of records, but the hashing makes the keys look random to the database.
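The key scheme can be sketched as follows. This is an approximation, not YCSB's actual code. It assumes the 64-bit FNV-1a hash (YCSB uses an FNV-style 64-bit hash for this) and a `user` prefix. Because YCSB's Java implementation works with signed longs, its exact key strings will differ. `build_key` is a hypothetical helper name.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = (1 << 64) - 1


def fnv1a_64(n: int) -> int:
    """FNV-1a hash over the 8 bytes of a 64-bit integer, lowest byte first."""
    h = FNV_OFFSET_BASIS_64
    for _ in range(8):
        h ^= n & 0xFF  # fold in the next byte
        h = (h * FNV_PRIME_64) & MASK_64  # multiply, keeping 64-bit wraparound
        n >>= 8
    return h


def build_key(record_index: int, prefix: str = "user") -> str:
    """Map a sequential record number to a string key that looks random."""
    return prefix + str(fnv1a_64(record_index))


for i in range(3):
    print(i, build_key(i))
```

The client can still address any record by its index 0, 1, 2, and so on. The resulting keys show no visible order, so the database cannot take advantage of records arriving in sequence.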
The value is a set of fields filled with random binary data. By default, YCSB generates records of about 1 KB (10 fields of 100 bytes each). On a 1 Gbit/s network, this caps throughput at roughly 100,000 operations per second, so it makes sense to reduce the record size to 100 bytes.
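The network limit is simple arithmetic. A 1 Gbit/s link carries at most 125,000,000 bytes per second, and dividing that by the record size gives an upper bound on operations per second. The sketch below assumes that each operation transfers exactly one full record and ignores all overhead. That is why the practical figure of about 100,000 ops/s for 1 KB records sits below the 125,000 ceiling.

```python
def max_ops_per_second(link_bits_per_s: float, record_bytes: int) -> float:
    """Ceiling on ops/s if each operation moves one full record over the link.

    Protocol headers, keys and TCP/IP overhead are ignored, so real
    throughput will be noticeably lower than this bound.
    """
    bytes_per_s = link_bits_per_s / 8
    return bytes_per_s / record_bytes


GIGABIT = 1_000_000_000  # 1 Gbit/s in bits per second

print(round(max_ops_per_second(GIGABIT, 1000)))  # 125000 (~1 KB records)
print(round(max_ops_per_second(GIGABIT, 100)))  # 1250000 (100-byte records)
```

Shrinking records tenfold raises the network ceiling tenfold. The benchmark then measures the databases rather than the network.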
YCSB performs simple operations on this data: inserting a new record with a key and random data, and reading or updating a record by its key.
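The three operations can be sketched against a hypothetical in-memory store. This is not a YCSB driver. `KeyValueStore` and `random_record` are illustrative names, the `field0` to `field9` naming is an assumption modeled on YCSB's defaults, and 10 fields of 10 bytes is just one way to reach the 100-byte record size discussed above.

```python
import os

FIELD_COUNT = 10  # assumption: YCSB's default number of fields per record
FIELD_LENGTH = 10  # assumption: 10 fields x 10 bytes = a 100-byte record


def random_record(field_count=FIELD_COUNT, field_length=FIELD_LENGTH):
    """Build a record: field name -> random binary value."""
    return {f"field{i}": os.urandom(field_length) for i in range(field_count)}


class KeyValueStore:
    """Hypothetical in-memory stand-in for the database under test."""

    def __init__(self):
        self.data = {}

    def insert(self, key, record):
        self.data[key] = dict(record)  # store a copy of the new record

    def read(self, key):
        return self.data.get(key)  # None if the key does not exist

    def update(self, key, fields):
        self.data[key].update(fields)  # overwrite only the given fields


store = KeyValueStore()
store.insert("user42", random_record())
store.update("user42", {"field3": os.urandom(FIELD_LENGTH)})
record = store.read("user42")
print(sum(len(v) for v in record.values()))  # 100
```

An update here rewrites only the fields it is given, so the rest of the record stays untouched.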
The test should be run under different loads. The key parameter is the operation mix, that is, the ratio of reads to updates.
You can use Heavy Write (50% reads and 50% updates) or Mostly Read (95% reads and 5% updates). Each operation is chosen at random, and the percentages only set the probability with which each operation is chosen.
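Random selection by probability can be sketched with Python's `random.choices`. In YCSB itself, the mix is configured through workload properties such as `readproportion` and `updateproportion`, and the two mixes here correspond to its core workloads A (50/50) and B (95/5). The `WORKLOADS` dict and the `choose_operation` and `run` helpers are hypothetical.

```python
import random
from collections import Counter

# Operation mixes from the text: proportions act as probabilities.
WORKLOADS = {
    "heavy_write": {"read": 0.50, "update": 0.50},
    "mostly_read": {"read": 0.95, "update": 0.05},
}


def choose_operation(mix, rng):
    """Draw one operation at random, weighted by its proportion."""
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


def run(mix, n, seed=42):
    rng = random.Random(seed)  # fixed seed for a reproducible sequence
    return Counter(choose_operation(mix, rng) for _ in range(n))


counts = run(WORKLOADS["mostly_read"], 10_000)
print(counts["read"] / 10_000)  # close to 0.95, but rarely exactly 0.95
```

Because each operation is an independent draw, a short run will not hit the target ratio exactly. The proportions converge only over many operations.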
Quality assurance teams can divide NoSQL databases into two groups: slow and fast. Based on the test described here and the general consensus online, key-value databases can be considered the fast ones. Couchbase and Aerospike are well ahead of the competition.
Aerospike is a genuinely fast database. With the right configuration, it can perform about a million operations per second.
Couchbase is fast too, but only for in-memory operations. It is a good choice if your data set has fewer than 500 million records.
MongoDB writes slowly but reads quickly. A big advantage of this database is its configurable locking for reads and writes, which can be very helpful under heavy load.