
ORMs for NoSQL: main challenges and solutions

Name: Danil Kireev

Introduction

Choosing a database for an application is always a challenge: do you want the safety of SQL or the speed of NoSQL? What if you could combine both? ORMs for NoSQL promise precisely that. But why aren't they as popular as ORMs for SQL, and why are there so few of them by comparison?

Main Part

First, let us look at what NoSQL and ORMs are.

What is NoSQL?

NoSQL stands for “non-SQL”, or sometimes “Not only SQL”, and according to [1] describes “next Generation Database Management Systems mostly addressing some of the points: being non-relational, distributed, open-source, and horizontally scalable.” In other words, NoSQL is a modern alternative to traditional relational databases. I think the important distinction is that SQL databases are built on relations while NoSQL databases are not. This gives NoSQL some advantages: the scalability and distribution mentioned above, higher productivity because programmers do not need to learn relational algebra, and many others. It also brings disadvantages, such as stale reads or data loss.

What are ORMs?

ORM stands for Object-Relational Mapping: a mapping between relations (tables in a database) and objects (constructs in a programming language). It is a technique for converting rows of a relational database into objects in an object-oriented programming language and back, which makes working with such databases more productive and more accessible to those who do not know relational algebra.
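As a minimal sketch of this idea, the snippet below maps rows of a relation to objects by hand, using Python's built-in sqlite3 module and an in-memory database. The `User` class, the `users` table, and the helper functions are assumptions made up for illustration; a real ORM generates this kind of glue code for you.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Hypothetical entity, used only for illustration.
    id: int
    name: str
    email: str


def save_user(conn: sqlite3.Connection, user: User) -> None:
    """Object -> row: write the object's fields into the `users` relation."""
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, email) VALUES (?, ?, ?)",
        (user.id, user.name, user.email),
    )


def find_user(conn: sqlite3.Connection, user_id: int) -> Optional[User]:
    """Row -> object: build a User from one row, or return None if absent."""
    row = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
save_user(conn, User(1, "Ada", "ada@example.com"))
print(find_user(conn, 1))
```

The calling code only sees `User` objects; the SQL stays inside the two mapping functions.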

Relations in non-relational DB?

When talking about ORMs for NoSQL, one commonly raised question is the naming itself: how can there be a mapping to relations if the database is non-relational? For this reason, many people call them ODMs instead, for Object-Document Mapping, as in [2], a MongoDB ODM library for Python. I fully support this naming.
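To illustrate what "object-document" means, here is a small sketch in which an object with a nested list is converted to a JSON-like document and back. A plain dict stands in for the document collection; the `Customer` and `Address` classes and their methods are hypothetical and do not reproduce the API of [2] or of any real library.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Customer:
    id: str
    name: str
    addresses: List[Address] = field(default_factory=list)

    def to_document(self) -> Dict[str, Any]:
        """Object -> document: nested dataclasses become nested dicts."""
        return asdict(self)

    @classmethod
    def from_document(cls, doc: Dict[str, Any]) -> "Customer":
        """Document -> object: rebuild the nested Address objects."""
        return cls(
            id=doc["id"],
            name=doc["name"],
            addresses=[Address(**a) for a in doc.get("addresses", [])],
        )


# A dict stands in for a document collection (assumption: no real database).
collection: Dict[str, Dict[str, Any]] = {}

customer = Customer("c1", "Ada", [Address("London", "UK")])
collection[customer.id] = customer.to_document()

loaded = Customer.from_document(collection["c1"])
print(collection["c1"])
print(loaded == customer)
```

Note that the whole object, including its nested addresses, is stored as one document; there are no joins or foreign keys to map, which is why "relational" does not describe the mapping well.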

Problems

Now that we are on the same page about NoSQL, ORMs, and naming, let us look at the main problems of implementing an ODM for NoSQL:

1. The most obvious problem is the variety of NoSQL databases and the lack of a standard or common query language. Firstly, there are key-value stores, like Redis, which store values under keys for fast retrieval. Secondly, there are document databases, like Couchbase or MongoDB, which store information as documents (nested, JSON-like objects). Thirdly, there are graph databases, like Neo4j or Dgraph, which store information as graphs. These are almost a separate kind of database, even though they are commonly grouped with NoSQL. Even their ORM counterpart has its own name: OGM[3], which stands for Object-Graph Mapping. Lastly, there are multi-model NoSQL databases, like ArangoDB, which can present the same data as key-value pairs, documents, or a graph. This variety exists because NoSQL databases target many different problems, such as object storage or graph networks, that are hard to solve with traditional SQL databases.

2. Another problem, and the one most commonly brought up, is that ORMs sacrifice performance, and ODMs would too, which contradicts the main selling point of NoSQL: speed. The cost comes from keeping the “local” (in-application) state synchronized with the “original”, or “remote”, state in the database.

3. Last but not least is the problem of need. Is there any actual need for ODMs, or is NoSQL enough on its own? Since NoSQL databases are so varied and specialized, there is probably a database that already solves your application's needs and problems.
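The synchronization cost from the second problem can be made concrete with a rough sketch. The hypothetical `Session` class below keeps a snapshot of every loaded document and compares it with the live copy on `flush()`. The extra copies and the comparison are work that a direct write to the database would not do. This is one possible change-tracking design assumed for illustration, not how any particular ORM or ODM is implemented.

```python
import copy
from typing import Any, Dict


class Session:
    """Hypothetical unit of work that tracks changes to loaded documents."""

    def __init__(self, store: Dict[str, Dict[str, Any]]) -> None:
        self.store = store  # dict standing in for the remote database
        self._snapshots: Dict[str, Dict[str, Any]] = {}  # state when loaded
        self._live: Dict[str, Dict[str, Any]] = {}       # state the app mutates

    def load(self, key: str) -> Dict[str, Any]:
        doc = copy.deepcopy(self.store[key])       # local working copy
        self._snapshots[key] = copy.deepcopy(doc)  # second copy for diffing
        self._live[key] = doc
        return doc

    def flush(self) -> int:
        """Write changed or added fields back; return how many were written.

        Removed fields are not handled in this sketch.
        """
        written = 0
        for key, doc in self._live.items():
            before = self._snapshots[key]
            changed = {f: v for f, v in doc.items() if before.get(f) != v}
            if changed:
                self.store[key].update(changed)
                self._snapshots[key] = copy.deepcopy(doc)
                written += len(changed)
        return written


store = {"u1": {"name": "Ada", "visits": 1}}
session = Session(store)
user = session.load("u1")
user["visits"] += 1
print(session.flush())
print(store["u1"])
```

Every loaded document is copied twice and every field is compared on flush, even if only one counter changed.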

Solutions to these problems

1. The easiest and most common solution is to support only one NoSQL database, like in [2,3]. This makes abstractions easier to design and connections easier to plan. A more complex solution is to create a NoSQL standard with a common query language and build an ODM framework on top of it. This mirrors SQL databases, which already have a common language, SQL itself, and a standard, ISO/IEC 9075[4]. However, this approach might be impossible because NoSQL databases differ so much. Nevertheless, it is possible to target a smaller subset of NoSQL databases, such as graph databases. There is already a candidate for a common language: GraphQL[5], which Dgraph supports natively and which can be used with Neo4j. The only thing left is a standard for graph databases.

2. As for performance, I think it is a matter of tradeoffs: if you want performance, use the native solution; if you want readability and ease of use, use an ODM. Either way, library maintainers should keep the performance impact of their ODM in mind.

3. Lastly, the question of need. The answer is personal: if you know the database's query language and how to access the database, you don't need an ODM. If you are still learning the database but already need to access it, an ODM lets you rely on your familiarity with the programming language.
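The "common standard" idea from the first solution can be sketched as a mapping layer written against a small shared interface rather than against one database. Everything below is an assumption for illustration: `DocumentBackend` is a hypothetical lowest-common-denominator interface, and `InMemoryBackend` stands in for a real driver.

```python
from typing import Any, Dict, Optional, Protocol


class DocumentBackend(Protocol):
    """Hypothetical minimal interface every supported database must offer."""

    def get(self, key: str) -> Optional[Dict[str, Any]]: ...

    def put(self, key: str, doc: Dict[str, Any]) -> None: ...


class InMemoryBackend:
    """Stand-in for one concrete database driver (no real database involved)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        self._docs[key] = dict(doc)


class Repository:
    """Mapping layer that depends only on DocumentBackend."""

    def __init__(self, backend: DocumentBackend, collection: str) -> None:
        self.backend = backend
        self.collection = collection

    def _key(self, doc_id: str) -> str:
        return f"{self.collection}:{doc_id}"

    def save(self, doc_id: str, fields: Dict[str, Any]) -> None:
        self.backend.put(self._key(doc_id), fields)

    def find(self, doc_id: str) -> Optional[Dict[str, Any]]:
        return self.backend.get(self._key(doc_id))


users = Repository(InMemoryBackend(), "users")
users.save("1", {"name": "Ada"})
print(users.find("1"))
print(users.find("2"))
```

The design choice shows the limitation as well: an interface that only offers `get` and `put` works for any backend, but it cannot express graph traversals or rich document queries, which is why a shared standard is more realistic for a narrow family of databases.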

Conclusion

In conclusion, I would like to state my opinion. I personally believe that ODMs are unnecessary. There is no uniformity among NoSQL databases, so every solution will be partial. Moreover, trading away speed makes little sense for databases aimed at speed. Lastly, I find that many NoSQL database connectors already have a familiar and understandable interface, like in [3]. However, I think there are two exceptions:

1. Prototyping: if you are only prototyping and don't need to delve deeper, an ODM is the right fit. You don't need to know everything about the database and can leverage your knowledge of the programming language.

2. Readability: if readability is important to you, an ODM is the right choice. Someone reading your source code will understand ODM calls more easily than an unfamiliar query language.