Canonical Models and your Data & Analytics Strategy

raulporras · 04-04-2022

As many organizations continue to invest in a data and analytics (DnA) strategy fundamentally centered on the care and feeding of an enterprise Data Lake, the idea of defining and maintaining a single, holistic semantic layer on top of the data is worth evaluating.

The concept of a canonical data model usually refers to a common data model that represents every single entity across a whole enterprise and that is complete and exhaustive enough to support any use case, business capability, purpose, or line of business. Many data architecture approaches strive to establish one (and to “expose” it through various technical means). The guiding principle of a canonical model is to abstract what could be a complex combination of data sources and physical data structures into a coherent model that is relevant and understandable to non-technical consumers and applicable across the whole organization.

Since these canonical models are, by definition, not use-case specific, they’re usually constructed with a “field of dreams” mentality (“if you build it, they will come”). Their success can only be determined after sufficient adoption is demonstrated, which makes it hard to justify the effort required to build and maintain them. In my experience, successful definition and long-term implementation of these models are rare.[1]

However, recent excitement about and advances in AI technologies have renewed interest in canonical models at the enterprise level, since Machine Learning models are also meant to find patterns and insights whether or not a specific use case has been described in advance. Also, the relevance of any AI model largely depends on the breadth of its training data: the more diverse the data, the better.


Domain Driven Design

“A model is a simplification. It is an interpretation of reality that abstracts the aspects relevant to solving the problem at hand and ignores extraneous detail.” Eric Evans

One of the challenges tied to defining semantic models is that someone needs to make a design choice about what is relevant to the model, based on the context in which the model is expected to be useful. Even though they represent the same real-world region, a street map, a topographical map, and an aviation chart will look very different because they serve different purposes.

This is what Domain Driven Design[2] refers to, and a few implications come to mind in the context of defining enterprise-wide (canonical) semantic models for analytics purposes:

Domain expertise is needed
If you were curious enough to follow the links, I’m pretty sure your understanding of the maps above will vary greatly depending on whether you’re a certified pilot or have ever had to interpret contour lines while hiking a mountain. The people or organization defining a canonical model need to understand the business, or whatever other context the model will be used in. Which means that…
Even if it is broad, some notion of what the domain is needs to be explicitly defined
I’ve tried to imagine a canonical model that is so broad that it can serve any imaginable domain in which its described entities participate. I couldn’t. If you must create a representation of reality that lives in some abstract data structure, you must make choices on what is relevant and what isn’t. And you won’t be able to tell what is relevant and what isn’t unless you know the context (i.e. domain) in which that model will be used.
The broader the domain, the more complex the model
And the more complex the model is, the harder it will be for non-technical consumers to adopt (which is where we started this discussion). Also, complex models need more maintenance over time as the real world changes. Your company’s strategy might change, its business might diversify, etc.

Does SAP have a Canonical Model for its solutions?

Not that I know of.

You may have heard about the One Domain Model (ODM) that SAP has created as part of its Intelligent Enterprise program to facilitate integration across functional components. This model does not promise to represent anything and everything needed by the different solutions within the Intelligent Enterprise suite (and that’s a good thing).

The ODM is instead meant to facilitate integration among these components and can be thought of as a “least common denominator” definition of the entities relevant across functional domains. It also simplifies the underlying (domain-specific) structure with the intention of facilitating extensibility (minimizing dependencies, taking only what’s necessary, etc.).
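To make the “least common denominator” idea concrete, here is a minimal Python sketch. It is not SAP’s ODM or any SAP API: the domain names and attribute names are hypothetical, chosen only to show how a shared core can be derived as the intersection of what each domain knows about an entity, with everything else staying in the domain that owns it.

```python
# Hypothetical attribute sets for a "customer" entity in three domains.
# These names are illustrative assumptions, not taken from SAP's ODM.
DOMAIN_CUSTOMER_ATTRIBUTES = {
    "sales": {"customer_id", "name", "country", "sales_rep", "discount_tier"},
    "finance": {"customer_id", "name", "country", "credit_limit", "payment_terms"},
    "service": {"customer_id", "name", "country", "support_plan"},
}


def common_core(domain_attributes):
    """Return the attributes shared by every domain (the common core)."""
    attribute_sets = list(domain_attributes.values())
    if not attribute_sets:
        return set()
    return set.intersection(*attribute_sets)


def domain_extensions(domain_attributes):
    """Return, per domain, the attributes that remain domain-specific."""
    core = common_core(domain_attributes)
    return {domain: attrs - core for domain, attrs in domain_attributes.items()}


core = common_core(DOMAIN_CUSTOMER_ATTRIBUTES)
extensions = domain_extensions(DOMAIN_CUSTOMER_ATTRIBUTES)

print("shared core:", sorted(core))
for domain, attrs in extensions.items():
    print(domain, "keeps:", sorted(attrs))
```

The shared core stays small on purpose: adding a fourth domain can only shrink it or leave it unchanged, which is one way to see why such a model minimizes dependencies between domains.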

How else can SAP solutions help me provide a semantic layer?

Beyond canonical models, there are a couple of things you can count on from SAP’s strategy:

End-to-end analytics for end-to-end processes
The ODM, like other technical guidelines under the Intelligent Suite program, is meant to support end-to-end processes across SAP applications (or other modular components). And end-to-end processes produce end-to-end insights that will be relevant for every functional area involved. The ODM will only contain what’s relevant to all business domains, and anything that’s domain specific will live in that domain’s application, but we will continue to invest in providing business context to reconcile these different perspectives across end-to-end processes.
A flexible semantic layer
One of the reasons Data Marts exist is that many challenges come from either commingling different domains in a single entity (like “customer”) or restricting that entity to be the “least common denominator” across domains. But Data Marts and other “physical” data structures have historically caused fragmentation or unneeded replication of data, both of which are to be avoided. The concept of Data Warehouse Cloud spaces allows somewhat independent models to coexist with curated enterprise-wide data and should be seriously considered.
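The trade-off described above, independent domain models that coexist with curated shared data instead of copying it, can be sketched in a few lines of Python. This is only an illustration of the idea and not how Data Warehouse Cloud spaces are implemented: the `Space` class, its fields, and the sample records are all hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SharedCustomer:
    """Curated, enterprise-wide customer record (hypothetical fields)."""
    customer_id: str
    name: str
    country: str


@dataclass
class Space:
    """Hypothetical stand-in for an independent modeling area; not an SAP API."""
    name: str
    shared: dict  # reference to the curated records, never a copy
    local_attributes: dict = field(default_factory=dict)

    def view(self, customer_id):
        """Combine the shared record with this space's own attributes."""
        base = asdict(self.shared[customer_id])
        return {**base, **self.local_attributes.get(customer_id, {})}


# One curated data set, referenced by both spaces.
curated = {"C001": SharedCustomer("C001", "Acme", "DE")}

sales = Space("sales", curated, {"C001": {"discount_tier": "gold"}})
finance = Space("finance", curated, {"C001": {"credit_limit": 50000}})

print(sales.view("C001"))
print(finance.view("C001"))
```

Each space sees the same curated customer plus its own attributes, and neither space has to store a second copy of the shared record or agree with the other on what else a “customer” is.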

Start with the end (user) in mind

Finally, just like anything else in the software world, it pays to consider the ultimate business user as we build abstract concepts and structures to coalesce a diverse collection of information. There’s always a reason why analyzing business data is a worthwhile activity, and our models (canonical or otherwise) will only be better if that reason is top of mind as we make architectural decisions and tradeoffs.

I welcome any comments challenging my perspective. Have you been involved in an implementation of a successful data strategy based on a holistic canonical model that has stood the test of time? I’d love to hear from you.

[1] To be clear: I’m referring to the use of canonical models to expose enterprise-wide data for broad analytics purposes. The success of canonical models as an integration pattern is a different discussion.