Relational database management system (RDBMS) have been a primary data storage mechanism for decades. NoSQL databases have existed since the 1960s, but have been recently gaining traction and the business faces a challenge of their efficient adoption.
There are many tutorials explaining how to use a particular flavor of SQL or NoSQL, but few discuss why you should choose one in preference to the other (“SQL or NoSQL – that is the question”). Hope to answer the tough question here covering the fundamental differences in business capabilities. Here should be noted that since the author is Microsoft fan all the next is written keeping in mind Azure environment with such a products as Azure SQL Database, Azure Parallel Data Warehouse, Azure Data Lake Analytics and Azure Spark on HDInsight.
The Holy War
Before going further, let’s dispel a number of myths…
* NoSQL supersedes RDBMS *
NoSQL and RDBMS do the same thing: store data. They take different approaches, which may help or hinder a project. Despite feeling newer and grabbing recent headlines, NoSQL is not a replacement for RDBMS — it only increments business capability in specific areas (text processing, stream analytics, etc). In very few cases it’s reasonable to replace completely RDBMS with NoSQL.
* NoSQL is better / worse than RDBMS *
Some data is better to be stored and processed using an RDBMS database. Some is better suited for NoSQL. Some could use either interchangeably.
* RDBMS vs NoSQL is a clear distinction *
SQL Server 2016 and Azure Parallel Data Warehouse already adopted NoSQL features and vice versa: Hive (on HDInsight cluster) and Data Lake Analytics can work with relational data. The choices are likely to become increasingly blurred, and NewSQL hybrid databases could provide some interesting options in the future.
That said, don’t make life purposely difficult! Choosing an unusual technology combination or a mix of RDBMS and NoSQL is possible, but it’s tougher to find support and employ experienced developers. And this is THE MAIN REASON to stay with one and the same vendor providing all services for ALL cases – Microsoft.
With that in mind, let’s look at the primary differences in impact on business…
Pros and cons of both or How to make the difficult choice
Projects where RDBMS is ideal:
- logical related discrete data requirements,
- data requirements which can be identified up-front (NoSQL allows to leave data until data requirements arrive from a business),
- data integrity is essential,
- standards-based proven technology with good developer experience and support.
Projects where NoSQL is ideal:
- unrelated, indeterminate or evolving data requirements (NoSQL has capability allowing to perform cleaning/filtering out rogue data on the fly),
- simpler or looser project objectives, able to start coding immediately,
- scalability is imperative.
Also it’s important to note that from budgeting perspective RDBMS is more preferable because:
- it’s easier to find RDBMS developers;
- wages in RDBMS area are lower;
- development is faster;
- administration is cheaper due to better transparency;
- RDBMS is Azure might appear expensive (price should be carefully calculated in each case because there are multiple options here like for example new suspension feature of PDW that allows to save a lot in the most of cases).
It’s uncertain which variant is more preferable from speed perspective. Again, every case is unique and should be considered separately.
in the most of cases it’s clear which approach to use – NoSQL or RDBMS – however it’s necessary to choose particular technology carefully and this choice in some cases can alter initial choice between the approaches.