As many organizations continue to invest in a data and analytics (DnA) strategy fundamentally centered on the care and feeding of an enterprise Data Lake, the idea of defining and maintaining a single, holistic semantic layer on top of the data is worth evaluating.
The concept of a canonical data model usually refers to a common data model that represents every entity across a whole enterprise and that is complete enough to support any use case, business capability, purpose, or line of business. Many data architecture approaches strive to establish one (and to “expose” it through various technical means). The guiding principle of a canonical model is to abstract what could be a complex combination of data sources and physical data structures into a coherent model that is relevant and understandable to non-technical consumers and applicable across the whole organization.
Since these canonical models are, by definition, not use-case specific, they’re usually constructed with a “field of dreams” mentality (“if you build it, they will come”). The success of their deployment can only be determined after sufficient adoption is demonstrated, which makes it hard to justify the effort required to build and maintain them. In my experience, successful definition and long-term implementation of these models are rare.[1]
However, recent excitement about and advancements in AI technologies have renewed interest in canonical models at the enterprise level, since Machine Learning models are also meant to find patterns and insights whether or not a specific use case has been described in advance. In addition, the relevance of any AI model depends largely on the breadth of its training data: the more diverse the data, the better.
“A model is a simplification. It is an interpretation of reality that abstracts the aspects relevant to solving the problem at hand and ignores extraneous detail.” Eric Evans
One of the challenges tied to defining semantic models is that someone needs to make a design choice about what is relevant to the model, based on the context in which the model is expected to be useful. Even though they represent the same real-world region, a street map, a topographical map, and an aviation chart will look very different because they serve different purposes.
This is what Domain Driven Design[2] refers to, and a few implications come to mind in the context of defining enterprise-wide (canonical) semantic models for analytics purposes:
Not that I know of.
You may have heard about the One Domain Model (ODM) that SAP has created as part of its Intelligent Enterprise program to facilitate integration across functional components. This model does not promise to represent everything and anything needed by the different solutions within the Intelligent Enterprise suite (and that’s a good thing).
The ODM is instead meant to facilitate integration between these components and can be thought of as a “least common denominator” definition of entities relevant across functional domains. It also simplifies the underlying (domain-specific) structure with the intention of facilitating extensibility (minimizing dependencies, taking only what’s necessary, etc.).
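To make the “least common denominator” idea concrete, here is a minimal sketch in Python. It is not SAP’s ODM or any SAP API: the `CorePartner` type, the field names, and the two sample records are all hypothetical, chosen only to show how a shared core can be derived from, and projected out of, richer domain-specific records.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CorePartner:
    """Hypothetical shared core: only attributes every domain agrees on."""
    partner_id: str
    name: str
    country: str


def common_fields(*records: dict) -> set:
    """Return the field names present in every record."""
    if not records:
        return set()
    common = set(records[0])
    for record in records[1:]:
        common &= set(record)
    return common


def to_core(record: dict) -> CorePartner:
    """Project a domain-specific record onto the shared core, dropping the rest."""
    core_names = [f.name for f in fields(CorePartner)]
    return CorePartner(**{name: record[name] for name in core_names})


# Hypothetical domain-specific views of the same real-world business partner.
sales_customer = {
    "partner_id": "P-100", "name": "Acme", "country": "DE",
    "credit_limit": 50000, "sales_region": "EMEA",
}
procurement_supplier = {
    "partner_id": "P-100", "name": "Acme", "country": "DE",
    "payment_terms_days": 30,
}

shared = common_fields(sales_customer, procurement_supplier)
core_from_sales = to_core(sales_customer)
core_from_procurement = to_core(procurement_supplier)
```

In this sketch each domain keeps its own attributes (credit limit, payment terms), while integration only relies on the small core that both sides share. That is the trade-off described above: the shared model stays deliberately narrow so that domains can extend it without depending on each other.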
Beyond canonical models, there are a couple of things you can count on from SAP’s strategy:
Finally, just like anything else in the software world, it pays to consider the ultimate business user as we build abstract concepts and structures to coalesce a diverse collection of information. There’s always a reason why analyzing business data is worthwhile, and our models (canonical or otherwise) will only be better if that reason is at the top of our minds as we make architectural decisions and tradeoffs.
I welcome any comments challenging my perspective. Have you been involved in an implementation of a successful data strategy based on a holistic canonical model that has stood the test of time? I’d love to hear from you.