Data storage is regarded as the core of corporate software systems for two main reasons: (1) it largely determines how quickly an application responds to a request, and (2) data loss is unacceptable because it disrupts critical business operations. Relational database management systems (RDBMS) were the only option until the advent of NoSQL (Not Only SQL) databases. However, as the volume of stored data grows, the limitations of relational database management systems, such as limited scalability and storage capacity and declining query efficiency on very large data volumes, become more pronounced, making the storage and maintenance of large databases more challenging.
NoSQL databases, meaning "Not only SQL", have been known since 2009 and emerged to meet new performance needs when processing large volumes of data. NoSQL does not replace relational databases outright; rather, it complements them, or takes over part of their functionality, to provide better-suited solutions in specific situations. The term "NoSQL" is made up of two words: "no" and "SQL".
The CAP theorem, also known as Brewer's theorem, takes its name from "consistency", "availability", and "partition tolerance". Formulated as a conjecture by Eric Brewer in 2000 and proved by Seth Gilbert and Nancy Lynch in 2002, it states that a distributed computing system cannot guarantee the following three properties at the same time:
Consistency: At the same moment, all nodes (servers) in the system see the same data.
Availability: Every request receives a response, even if that response does not reflect the most recent update.
Partition tolerance: Short of a total network outage, the system must be able to respond correctly to all requests under all conditions. When the network is split into subnets, each subnet must be able to function independently.
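The trade-off between these three properties can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: the two-replica store, the `CP`/`AP` mode names, and the `demo` helper are hypothetical and do not describe how MongoDB, Cassandra, or Redis are implemented.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store that favours consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True while the replicas cannot communicate

    def write(self, node, key, value):
        """Write through one node; return True if the write was accepted."""
        local, remote = (self.a, self.b) if node == "a" else (self.b, self.a)
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than let replicas diverge.
                return False
            # Availability first: accept locally and let the replicas diverge.
            local.data[key] = value
            return True
        # No partition: the write reaches both replicas.
        local.data[key] = value
        remote.data[key] = value
        return True

    def read(self, node, key):
        replica = self.a if node == "a" else self.b
        return replica.data.get(key)


def demo(mode):
    """Write, partition the network, write again, then read from both nodes."""
    store = TwoNodeStore(mode)
    store.write("a", "x", 1)
    store.partitioned = True
    accepted = store.write("a", "x", 2)
    return accepted, store.read("a", "x"), store.read("b", "x")
```

In this sketch, `demo("CP")` rejects the second write so both nodes still agree on the old value, while `demo("AP")` accepts it and the two nodes return different values until the partition heals.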
MongoDB is an open-source, document-oriented database provided under the AGPL license (a free license), designed for high performance, availability, and on-demand scalability. MongoDB has been developed in C++ by the company 10gen since 2007, when it was working on a widely distributed cloud computing platform akin to Google's App Engine. The initial version was released in 2009, but only version 1.4, in 2010, was declared production-ready.
Cassandra is a large-scale data management system originally designed in 2007 by engineers at Facebook to address issues related to the storage and use of large volumes of data. In 2008, they tried to democratize it by providing a stable, documented version on Google Code. However, Cassandra did not receive a particularly enthusiastic reception.
Redis, which stands for Remote Dictionary Server, is a BSD-licensed key-value NoSQL database written in C.
Author
Yahoo! Cloud Serving Benchmark is used by
The behavior of two of the most popular document-based NoSQL databases, MongoDB and document-based MySQL, was examined in this research.
In Article
In this research, two widely used NoSQL database management systems, MongoDB and Apache Cassandra, are compared and contrasted in the author's
Cornelia A. Gyorodi, Diana V. Dumse-Burescu, Doina R. Zmaranda, and Robert S. Gyorodi
In the literature reviewed above, the authors test performance on specific record counts such as 1,000 and 10,000. This article addresses that limitation with a comparative analysis of the performance of three commonly used databases: MongoDB, Cassandra, and Redis. Our tests measure overall throughput at 100000, 250000, 500000, 750000, and 1000000 operations. We also measure the average latency under different workload scenarios that mix read, write, and update activities. We employ the widely used Yahoo! Cloud Serving Benchmark (YCSB), a performance measurement tool for NoSQL databases. This helps a user better understand database performance and choose the system best suited to a given workload.
The rest of this study is organized as follows. The following section covers related work. The methodology is described in Section 2. The results and discussion of the tests are presented in Section 3, followed by the experimental evaluation. Finally, we offer our conclusions and recommendations in Section 4.
YCSB (Yahoo! Cloud Serving Benchmark) is a benchmarking tool provided by Yahoo. It is an open-source, extensible benchmarking framework that lets users build their own packages by defining additional workload parameters or, if necessary, writing Java code. A Yahoo! study reported benchmark data for four commonly used systems, Apache HBase, Apache Cassandra, Yahoo! PNUTS, and a sharded MySQL implementation, in terms of performance and elasticity. In
| Component | Specification |
|---|---|
| Hardware | Processor: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz 2.90 GHz; RAM: 12.00 GB; HDD: 1 TB |
| Operating system | Windows 10, PC 64-bit |
| Benchmark tool | YCSB 0.17.0 |
| Databases | MongoDB 4.4; Cassandra 4.0.3; Redis 6.2.6 |
We use the three databases covered in this paper for our testing: MongoDB, Cassandra, and Redis. Each of them is tested with the Yahoo! Cloud Serving Benchmark framework. The YCSB tool is made up of two parts: a YCSB client that generates the load, and a set of predefined workloads, which are the scenarios that the client runs.
After downloading and installing MongoDB 4.4, Cassandra 4.0.3, and Redis 6.2.6, we installed and configured YCSB 0.17.0, which first required installing Java, Maven, and Git on our machine. Each test began with an empty database. We began by creating six workloads in the YCSB tool. The workloads (detailed below) are run on the three databases after the data has been loaded. The health of each database is checked between runs.
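The load-then-run procedure described above can be sketched as a small helper that assembles YCSB command lines. This is a minimal sketch, not the exact commands used in the study: the `ycsb_command` helper, the localhost connection values, the launcher path, and the record count are assumptions, while the binding names and connection properties are those documented for the YCSB MongoDB, Cassandra CQL, and Redis bindings.

```python
# YCSB binding name for each database under test.
BINDINGS = {
    "MongoDB": "mongodb",
    "Cassandra": "cassandra-cql",
    "Redis": "redis",
}

# Connection properties per binding. The localhost values are assumptions.
CONNECTION_PROPS = {
    "mongodb": {"mongodb.url": "mongodb://localhost:27017/ycsb"},
    "cassandra-cql": {"hosts": "localhost"},
    "redis": {"redis.host": "localhost", "redis.port": "6379"},
}


def ycsb_command(phase, database, workload_file, record_count,
                 operation_count, launcher="bin/ycsb"):
    """Build the argument list for one YCSB invocation.

    phase is "load" (populate the empty database) or "run" (execute the
    workload). The launcher path is an assumption; on Windows the
    distribution also provides a batch launcher.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    binding = BINDINGS[database]
    cmd = [
        launcher, phase, binding,
        "-s",                      # print status while the client runs
        "-P", workload_file,       # workload property file
        "-p", f"recordcount={record_count}",
        "-p", f"operationcount={operation_count}",
    ]
    for key, value in CONNECTION_PROPS[binding].items():
        cmd += ["-p", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # Print the load and run commands for one database and one workload.
    for phase in ("load", "run"):
        print(" ".join(ycsb_command(phase, "Redis", "workloads/workloada",
                                    record_count=100000,
                                    operation_count=100000)))
```

The returned list can be passed to `subprocess.run` once the databases are running; the sketch only prints the commands so that it has no side effects.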
The general objectives of the tests were to:
1. Choose workloads that are representative of modern applications.
2. Use data volumes similar to those found in "big data" datasets.
3. Vary the read/write mix of the workloads to compare the performance of the three solutions.
4. For better comparability of results, keep the same workload names and operation ratios as in
Workloads are a set of scenarios that include a mix of read, write, and update activities. The following are the workloads that we used in our testing:
• Workload A: 50% reads and 50% updates;
• Workload B: 95% reads and 5% updates;
• Workload C: 100% reads;
• Workload D: 95% reads and 5% inserts;
• Workload E: 95% scans and 5% inserts;
• Workload F: 50% reads and 50% read-modify-write operations.
Operation counts of 100000, 250000, 500000, 750000, and 1000000 were chosen for testing. Records are selected using the Uniform request distribution, which is the default.
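The six operation mixes and five operation counts can be written down as YCSB workload properties. The following is a sketch under stated assumptions: the `workload_properties` helper and the record count are hypothetical, and the `CoreWorkload` class name is the one used by recent YCSB releases and should be checked against the installed version. The property names (`readproportion`, `updateproportion`, `scanproportion`, `insertproportion`, `readmodifywriteproportion`, `requestdistribution`) are standard YCSB core workload properties.

```python
# Operation mix of each workload, as listed in the text.
WORKLOAD_MIX = {
    "A": {"readproportion": 0.5, "updateproportion": 0.5},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "C": {"readproportion": 1.0},
    "D": {"readproportion": 0.95, "insertproportion": 0.05},
    "E": {"scanproportion": 0.95, "insertproportion": 0.05},
    "F": {"readproportion": 0.5, "readmodifywriteproportion": 0.5},
}

OPERATION_COUNTS = [100_000, 250_000, 500_000, 750_000, 1_000_000]

ALL_PROPORTIONS = (
    "readproportion",
    "updateproportion",
    "scanproportion",
    "insertproportion",
    "readmodifywriteproportion",
)


def workload_properties(name, operation_count, record_count):
    """Return the text of a YCSB property file for one workload.

    record_count is an assumption: the text specifies operation counts only.
    """
    mix = WORKLOAD_MIX[name]
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError(f"proportions of workload {name} do not sum to 1")
    lines = [
        # Class name assumed from recent YCSB releases; verify locally.
        "workload=site.ycsb.workloads.CoreWorkload",
        f"recordcount={record_count}",
        f"operationcount={operation_count}",
        "requestdistribution=uniform",
    ]
    # Write every proportion explicitly so that unused operations are
    # set to zero instead of falling back to YCSB's built-in defaults.
    for key in ALL_PROPORTIONS:
        lines.append(f"{key}={mix.get(key, 0)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(workload_properties("E", OPERATION_COUNTS[0], record_count=100_000))
```

Writing all five proportions explicitly is a deliberate choice: YCSB applies default read and update proportions when a property is omitted, which would silently change the mix of workloads such as E.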
For all databases and workloads across all operation counts, all tests have been executed successfully, confirming no insert, read, or update failure.
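Overall throughput, average latency, and the absence of failed operations can all be read from the summary that the YCSB client prints at the end of a run, in lines of the form `[SECTION], Metric, value`. The parser below is a minimal sketch: the `parse_ycsb_output` and `failed_operations` helpers are hypothetical, and the numbers in `SAMPLE` are invented for illustration and are not results from this study.

```python
import re

# One summary line, e.g. "[READ], AverageLatency(us), 120.5".
LINE = re.compile(
    r"^\[(?P<section>[^\]]+)\],\s*(?P<metric>[^,]+),\s*(?P<value>\S+)\s*$"
)

# Invented sample output, used only to exercise the parser.
SAMPLE = """\
[OVERALL], RunTime(ms), 2000
[OVERALL], Throughput(ops/sec), 500.0
[READ], Operations, 500
[READ], AverageLatency(us), 120.5
[READ], Return=OK, 500
[UPDATE], Operations, 500
[UPDATE], AverageLatency(us), 300.0
[UPDATE], Return=OK, 500
"""


def parse_ycsb_output(text):
    """Return {section: {metric: value}} for every numeric summary line."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip status lines and anything else
        try:
            value = float(match["value"])
        except ValueError:
            continue  # skip non-numeric values
        results.setdefault(match["section"], {})[match["metric"].strip()] = value
    return results


def failed_operations(results):
    """Count operations whose return status is anything other than OK."""
    failed = 0
    for metrics in results.values():
        for metric, value in metrics.items():
            if metric.startswith("Return=") and metric != "Return=OK":
                failed += int(value)
    return failed


if __name__ == "__main__":
    parsed = parse_ycsb_output(SAMPLE)
    print(parsed["OVERALL"]["Throughput(ops/sec)"])
    print(parsed["READ"]["AverageLatency(us)"])
    print(failed_operations(parsed))
```

A run counts as successful in the sense used above when `failed_operations` returns zero for every database, workload, and operation count.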
The total performance of MongoDB and Cassandra databases drops precipitously as the number of operations rises, but Redis' overall throughput rises (
The read average latency for the Cassandra database is steadily rising (
Compared to the overall throughput and read latency graphs, the update latency has a somewhat distinct curve. From 100,000 to 1,000,000 operations, MongoDB and Redis have a steady rise in latency, but Cassandra has fairly stable latency. However, it is clear from
In workload B, MongoDB delivers many more operations per second than Redis and Cassandra. Redis and Cassandra maintain stable performance as the number of operations increases, whereas MongoDB's performance varies: it drops sharply at 500,000 and 750,000 operations. Even so, MongoDB retains a substantially higher throughput than Redis and Cassandra (
As might be predicted with a 95 percent read operation mix, MongoDB has significantly lower read operation latency than Redis and Cassandra. Due to the high volume of active operations, there is only a little increase in MongoDB's latency when compared to Redis and Cassandra (
Once more, we see that Cassandra still has a larger update latency than Redis and MongoDB in workload B. With an increase in operation counts, Cassandra's latency increases marginally, while MongoDB's latency is also rising (
In workload C, MongoDB has a substantially higher total operations per second than Redis and Cassandra. Cassandra's performance falls as the number of operations rises, but MongoDB's throughput begins to marginally decline at 1,000,000 operations. With Redis, throughput is almost constant throughout the process. But compared to Redis and Cassandra, MongoDB still has a considerably greater throughput (
With a 100% read operation, the latency in read operations is lower for MongoDB as compared to Redis and Cassandra. From the above figure, MongoDB and Redis latency are almost consistent across the operation count. Overall, MongoDB’s latency is lower than the other two databases (
Compared to MongoDB and Cassandra, Redis has a substantially higher total operations per second in workload D. Redis improves throughput whereas Cassandra reduces it as the number of operations rises, although MongoDB experiences a tiny drop in performance when the number of operations reaches one million.
With a 95% read operation, the latency in read operations is lower for Redis as compared to MongoDB and Cassandra. From the
For workload D, Redis performs better in terms of both overall throughput and read latency.
Compared to Redis and Cassandra, MongoDB has a substantially higher total operations per second for workload E. While MongoDB modestly increases throughput as the operation count reaches 750,000, Cassandra loses throughput as the operation count rises. With Redis, throughput is almost constant throughout.
Again, MongoDB outperforms Redis and Cassandra in workload F in terms of total operations per second. The performance of both Cassandra and Redis falls as the number of operations rises. However, MongoDB shows a slight gain in throughput at one million operations.
With workload F read operations, the latency in read operations is lower for MongoDB as compared to Redis and Cassandra. From
We analyzed the results from three NoSQL databases, MongoDB 4.4 as a document store, Cassandra 4.0.3 as a column store, and Redis 6.2.6 as a key-value store, after executing six workloads of 100000, 250000, 500000, 750000, and 1000000 operations. We conclude that the optimizations used by the designers of NoSQL solutions to improve performance, such as effective use of cache memory, have a direct impact on execution time.
From the performance tests of MongoDB, Cassandra, and Redis, we have learned a few things.
• Redis has the best read performance of all the databases. This is because data is stored in and retrieved from volatile memory.
• In terms of read operations, MongoDB outperformed Cassandra. This is because MongoDB maps its records into RAM, which improves read performance.
• MongoDB outperformed Redis and Cassandra when it came to scan operations.
• Cassandra outperformed Redis in terms of scan operations.
• Cassandra struggled with read and update operations. This is mostly due to a lack of optimization for these types of operations.
• In all workloads except workload D, MongoDB has significantly lower latency across all operations.
Based on our tests and research, we conclude that MongoDB is the best-performing NoSQL database of the three tested.
However, the study described in this article has a number of shortcomings that can be addressed with new research approaches. One option for extending the study is to evaluate several NoSQL databases in the cloud. A second direction for developing and improving this work is to examine additional NoSQL databases in order to test other aspects of performance.