Domain-Driven vs Storage-Driven Object Modeling

One of the main concerns in modern-day programming is how best to represent the most important rules of a business domain. Object-oriented programming, which encodes interconnected objects as representatives of real-life entities in that domain, is by far the most widespread approach. Yet regardless of language or technology, developers approach this in many different ways.

Most typical backend applications today have a database and some form of JSON-based API. A common pattern in the engineering community is to start all design from the database: deciding first how data will be stored, that is, how entities of the business domain will be represented when persisted. Engineers then either build the JSON-based API to match the database structure as closely as possible, or design it independently and make the application responsible for the appropriate conversions.

Let's look at an example of what this may look like, assuming a relational database and Scala as the programming language. Suppose we need to represent passengers of a US travel agency, where domestic passengers must have a Social Security number (SSN) and a state of residence, while foreign passengers must have a passport number, a country of origin and, optionally, a visa number.

Following this approach, we start by designing the database schema, which may look like this:

CREATE TABLE passenger(
    id UUID PRIMARY KEY,
    is_domestic BOOLEAN NOT NULL,
    ssn VARCHAR NULL,
    state VARCHAR NULL,
    passport VARCHAR NULL,
    country VARCHAR NULL,
    visa VARCHAR NULL
);

With such a table in place, engineers are often tempted to mirror it directly in application code:

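import java.util.UUID

// USState and Country are assumed to be types defined elsewhere in the codebase.
case class Passenger(
    id: UUID,
    isDomestic: Boolean,
    ssn: Option[String],
    state: Option[USState],
    passport: Option[String],
    country: Option[Country],
    visaNum: Option[String]
)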

and in the JSON API as:

{
    "id": "c4be40a9-87a8-43ca-8269-d757cb6341d2",
    "isDomestic": true,
    "ssn": "123456789",
    "state": "Minnesota"
}

While this approach has an undeniable appeal in its simplicity and conciseness, the example above reveals an important drawback: many invariants of the business domain are not represented by the model. Nothing prevents a domestic passenger without an SSN, a foreign passenger with an SSN and a US state of residence, or any other combination that the rules of the business domain forbid.
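To make the drawback concrete, here is a minimal sketch showing that the storage-driven model happily accepts both of the invalid combinations just mentioned. Since the original does not define USState and Country, the sketch uses hypothetical single-field stand-ins for them.

```scala
import java.util.UUID

// Hypothetical stand-ins for the USState and Country types used above.
final case class USState(name: String)
final case class Country(name: String)

// The storage-driven model from above, unchanged.
final case class Passenger(
    id: UUID,
    isDomestic: Boolean,
    ssn: Option[String],
    state: Option[USState],
    passport: Option[String],
    country: Option[Country],
    visaNum: Option[String]
)

// Compiles without complaint, although it breaks the business rules:
// a domestic passenger with neither an SSN nor a state...
val domesticWithoutSsn = Passenger(
  id = UUID.randomUUID(),
  isDomestic = true,
  ssn = None,
  state = None,
  passport = None,
  country = None,
  visaNum = None
)

// ...and a foreign passenger carrying an SSN and a US state but no passport.
val foreignWithSsn = Passenger(
  id = UUID.randomUUID(),
  isDomestic = false,
  ssn = Some("123456789"),
  state = Some(USState("Minnesota")),
  passport = None,
  country = None,
  visaNum = None
)
```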

One common way of compensating for a domain model that lacks these restrictions is to apply a validation layer to all user input: even though the model can represent invalid objects, they are never created, because such data never passes validation. While this solves the more obvious part of the problem, an application's responsibilities rarely stop at validating, persisting and retrieving data. Engineers writing applications with several layers of fairly complex business logic will quickly find that they must always keep in mind the rules applied during validation, just to know which invariants they can safely assume when working with objects of this inherently permissive model. Apart from causing significant mental overhead, this can be one of the biggest sources of mistakes, especially when a long time passes between implementing the validations and introducing a new piece of business logic that relies on those assumptions. How can we avoid this?
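The following sketch illustrates the problem under a few assumptions: a hypothetical validate function stands in for the validation layer, and a hypothetical identityDocument function plays the part of business logic written later. Note how the latter has to call Option.get, relying on rules the compiler cannot see.

```scala
import java.util.UUID

// Hypothetical stand-ins for the USState and Country types.
final case class USState(name: String)
final case class Country(name: String)

// The same permissive, storage-driven model.
final case class Passenger(
    id: UUID,
    isDomestic: Boolean,
    ssn: Option[String],
    state: Option[USState],
    passport: Option[String],
    country: Option[Country],
    visaNum: Option[String]
)

// Hypothetical validation layer applied to all user input.
def validate(p: Passenger): Either[String, Passenger] =
  if (p.isDomestic) {
    if (p.ssn.isEmpty || p.state.isEmpty) Left("domestic passengers need an SSN and a state")
    else if (p.passport.isDefined || p.country.isDefined || p.visaNum.isDefined)
      Left("domestic passengers must not have passport, country or visa data")
    else Right(p)
  } else {
    if (p.passport.isEmpty || p.country.isEmpty) Left("foreign passengers need a passport and a country")
    else if (p.ssn.isDefined || p.state.isDefined)
      Left("foreign passengers must not have an SSN or a state")
    else Right(p)
  }

// Hypothetical business logic written much later. The .get calls are only
// safe because of rules enforced in validate; the compiler knows nothing of them.
def identityDocument(p: Passenger): String =
  if (p.isDomestic) s"SSN ${p.ssn.get}"
  else s"Passport ${p.passport.get}"
```

If an unvalidated domestic passenger with no SSN ever reaches identityDocument, the call fails at run time with a NoSuchElementException rather than at compile time.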

One of the most important concepts of Domain-Driven Design is what some call "model-driven design": dedicating a central portion of the application to a reasonably accurate representation of the most relevant entities of the business domain (the "model"). On top of this, strongly typed programming languages give us the opportunity to adopt the approach commonly known as "making illegal states unrepresentable", ensuring that the types we define for our domain entities also encode the most relevant business rules. Having only valid objects then stops being an assumption and becomes a guarantee.

It would simply not be true to claim that databases encode no business rules at all; that is essentially what database schemas are for. The passenger table above, for example, guarantees that every passenger is either domestic or not, and that the system always has this information. The same applies to JSON and many other data representation standards. However, technologies differ in how well they can express the potentially complex rules of a business domain. More often than not, statically typed programming languages offer more sophisticated ways of representing these rules than database schemas or JSON API specifications do. With this in mind, let's redesign our application, but this time start from the layer where we can express the most: the object model. With the same requirements as above, we may represent passengers as follows:

// id is declared abstractly, so every variant must provide it
sealed trait Passenger { def id: UUID }
case class DomesticPassenger(id: UUID, ssn: String, state: USState) extends Passenger
case class ForeignPassenger(id: UUID, passportNumber: String, country: Country, visa: Option[String]) extends Passenger

Our database schema and JSON API resource may remain the same as before.
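Keeping the flat table means the application needs a mapping between storage rows and the domain model. The sketch below assumes a hypothetical PassengerRow case class mirroring the passenger table column by column, with states and countries stored as plain strings, and reuses hypothetical single-field stand-ins for USState and Country.

```scala
import java.util.UUID

// Hypothetical stand-ins for the USState and Country types.
final case class USState(name: String)
final case class Country(name: String)

// Domain model, as defined above.
sealed trait Passenger { def id: UUID }
final case class DomesticPassenger(id: UUID, ssn: String, state: USState) extends Passenger
final case class ForeignPassenger(id: UUID, passportNumber: String, country: Country, visa: Option[String]) extends Passenger

// Hypothetical flat row mirroring the passenger table column by column.
final case class PassengerRow(
    id: UUID,
    isDomestic: Boolean,
    ssn: Option[String],
    state: Option[String],
    passport: Option[String],
    country: Option[String],
    visa: Option[String]
)

// Writing is total: every domain object maps to a valid row.
def toRow(p: Passenger): PassengerRow = p match {
  case DomesticPassenger(id, ssn, state) =>
    PassengerRow(id, true, Some(ssn), Some(state.name), None, None, None)
  case ForeignPassenger(id, passport, country, visa) =>
    PassengerRow(id, false, None, None, Some(passport), Some(country.name), visa)
}

// Reading can fail: the table can hold combinations the model rejects, so they
// are reported instead of leaking inward. Columns that don't belong to the
// variant are simply ignored here; a stricter version could reject them.
def fromRow(r: PassengerRow): Either[String, Passenger] =
  if (r.isDomestic) {
    (r.ssn, r.state) match {
      case (Some(ssn), Some(state)) => Right(DomesticPassenger(r.id, ssn, USState(state)))
      case _                        => Left(s"domestic passenger ${r.id} lacks an SSN or a state")
    }
  } else {
    (r.passport, r.country) match {
      case (Some(passport), Some(country)) =>
        Right(ForeignPassenger(r.id, passport, Country(country), r.visa))
      case _ => Left(s"foreign passenger ${r.id} lacks a passport or a country")
    }
  }
```

Writing a row is total, while reading one returns an Either, because the table can still hold combinations that the model rules out. This asymmetry is part of the mapping overhead discussed next.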

Like the previous approach, this one has drawbacks, chiefly that converting between the three representations of the same data (database schema, object model, API resource) is no longer trivial. Writing the appropriate mappings adds development overhead and possibly some performance overhead as well. In exchange, we have gained something we lacked before in the realm of business rules, since this encoding of the domain model introduces several invariants that were not represented before:

domestic passengers must have ssn and state defined, while these are irrelevant for foreign passengers
foreign passengers must have passportNumber and country defined, while these are irrelevant for domestic passengers
only foreign passengers may need a visa

Now, anyone writing business logic that relies on objects of this model no longer needs to resort to assumptions or keep going back to the validations to check what should always hold true. Furthermore, the compiler catches mistakes in respecting the encoded restrictions, effectively reminding engineers of the rules of the business domain even if another engineer defined them months or years ago.
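As an illustration, consider a hypothetical documentsToCheck function describing what to verify at boarding. It needs neither an isDomestic flag nor Option.get, because each case of the sealed hierarchy carries exactly the data it is guaranteed to have. USState and Country are again hypothetical stand-ins.

```scala
import java.util.UUID

// Hypothetical stand-ins for the USState and Country types.
final case class USState(name: String)
final case class Country(name: String)

// Domain model, as defined above.
sealed trait Passenger { def id: UUID }
final case class DomesticPassenger(id: UUID, ssn: String, state: USState) extends Passenger
final case class ForeignPassenger(id: UUID, passportNumber: String, country: Country, visa: Option[String]) extends Passenger

// Hypothetical business rule: list the documents to check at boarding.
// Each branch can use its fields directly; the types carry the invariants.
def documentsToCheck(p: Passenger): List[String] = p match {
  case DomesticPassenger(_, ssn, state) =>
    List(s"SSN $ssn", s"Resident of ${state.name}")
  case ForeignPassenger(_, passport, country, visa) =>
    // The visa stays optional, exactly as the business rules allow.
    List(s"Passport $passport (${country.name})") ++ visa.map(v => s"Visa $v").toList
}
```

Because Passenger is sealed, removing either case from the match makes the compiler warn that the match may not be exhaustive; with fatal warnings enabled (for example via -Xfatal-warnings) that warning becomes a compile error.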

The takeaway, of course, should not be merely that the standard constructs of object-oriented programming are useful; that would be stating the obvious. Rather, it is to consider the object model of the business domain as the heart of an application and to approach software design from a model-first perspective, treating the database as a tool serving the application rather than the other way around.

As with any other engineering decision, this approach comes with a tradeoff. Backend services whose sole purpose is to store and retrieve data as efficiently as possible may not benefit from it. More often than not, however, services are responsible for much more, including business logic of varying complexity and data structures significantly more complex than the simplified example above. For such services, a strong model can be paramount in the long run, since lost or forgotten knowledge often becomes the largest cost over an application's lifetime, taking a toll on both maintenance and further growth.