Integrating Analytical and Operational Data: A Non-Negotiable
Apr 14, 2024

This blog article was inspired by Jan Hegewald's question posted to the data community on LinkedIn. Thank you!
Operational systems generate data that is immediate but raw and often incomplete, while analytical systems work on enriched, valuable data that is not being utilized in real-time decision-making.
This disconnect causes stagnation that prevents businesses from fully understanding and engaging with their customers and stakeholders.
This is, in very large part, because holding onto the division between analytical and operational data is antiquated. Here's why, and what to do about it.
The Consequence: Missed Opportunities, Organization-Wide Frustration and Lackluster Customer Experience
When companies fail to integrate these two data streams, they risk delivering a service that’s disconnected from the customer's and/or stakeholder's current needs and behaviors. They miss out on the chance to personalize, to be proactive, and to truly delight the customer/stakeholder, which can lead to frustration, customer churn, and ultimately, a hit to the bottom line.
In short, the outputs don't generate the expected and necessary outcomes.
Faced with this predicament, there's a palpable frustration in knowing that the insights to propel a product or service to the next level were already within reach, yet they remained untapped due to outdated structures. The potential of ETL-processed data—clean, integrated, and ready for action—is being wasted. The good news is that this can be prevented.
Integrating rich analytical insights directly into operational systems or customer interfaces marks a significant leap towards creating personalized and engaging user experiences. This strategy leverages the depth and breadth of analytical data to inform and enhance operational processes and customer interactions in real-time. Personalized offerings, predictive maintenance, and dynamic pricing models are just a few examples of how enriched operational systems can transform the customer experience.
The Solution: Integrating Your Analytical and Operational Data
The solution is a strategic integration of your analytical and operational data. Here's how to make it happen:
- Eliminating the barriers between operational and analytical systems and providing high-quality data in use case-specific formats.
- Embedding privacy and consent filters into the process, ensuring that the data leveraged respects customer consent and regulatory requirements.
- Fostering cultural and organizational changes that break down silos, promote cross-functional teams, and align everyone towards a common data-driven goal.
Max Schultze, our Master Coach for Data Architecture:
"It is interesting to observe that the boundaries between operational and analytical data are getting more and more blurry and that it is no longer uncommon to use analytical systems to feed back into operational processes simply because they are the best you have. Ultimately, we might drop that separation entirely and focus on the things that actually matter: the requirements for the use case. If you can clearly express what your needs are, e.g., latency and quality, and you have the right tooling in place to enable that use case, then you can make a conscious decision about how to serve it. A data product inherently does not care where it is stored, how it is built, or how it is accessed; it should exist for a clear purpose. Then, often, the cost of fulfilling that purpose becomes the driving factor. You won't be serving all your data in sub-second freshness just because you can."
"It's fundamentally a culture issue (plus sometimes justified). MLEs and DSs feel like the warehouse data are not for them. 1. Their voice is not included in setting up / adding to / maintaining the DWH. The use cases that were considered when choosing the architecture don't reflect current reality. The "data model" might not fit their needs and it's too hard to change anything about the DWH architecture. 2. It takes too long to add new data sources to the DWH. Sometimes it's because the process is over engineered and the DE team operates too much like an application engineering team. It's faster to just get the data from source. 3. Sometimes it's easier to get access to the source than to data in the needlessly coveted DWH. Once the MVP is built to leverage data a certain way, to re-configure everything to fit the DWH is non-value-adding work. 4. Corollary to 1, data need to be processed in unique ways for different applications. Especially for ML use cases, it might be easier, and healthier, to start feature engineering the raw data than modeled data. Lots of choices are made during data "quality validation" and modeling. Similarly for feature engineering. These choices don't always line up."
Francesco Mucio, Lead Data Engineer (Contract) at Allianz on LinkedIn:
"There is a whole industry doing what they call Reverse ETL, which is more or less "feed back your operational systems with your DWH data".
People and use cases never cared where the data were. The problem was the technology. If you try to run an analytical query on an operational system you will make it look bad for your users: end users will probably get bad performance, your analysts will get bad performance and bad data quality.
With the shift to cloud, event processing, the separation of storage and compute (and other nice things), data become more accessible and usable for multiple purposes. Can I query operational data to put them in a dashboard? Sure, just use the events. Can I serve the DWH data to my external users? Sure, just spin up a compute engine not used by the analysts."
Jon Cooke, CTO at Dataception on LinkedIn:
"There is no difference between transactional and analytical data - any data that is used in a business process is "operational" I.e. it needs some love and people to make sure it's good enough for usage in it's part in the business process, there are different workloads (analytical/transactional). Best to think away from technology solution, towards which bit of business process the data is created for and which component consumes upstream and creates data for that bit. One should always source data from the system that created it - I've seen many a person saying data warehouses are sources of truth - which for most data is not correct. The key is the life-cycle of the data when it's created as any copies in downstream systems will have a different life-cycle and business context is usually lost. AI is very interesting as the whole way we approach data management actually changes quite a bit. E.g. generally not interested in the "quality" of one row but of statistically significant data points in a large number of rows, to show a pattern that we can infer e.g. If an individual payment is wrong a biggie for finance, but for a ML based forecast it may not be significant. So the notion of "quality" changes."
Tom Redman, the Data Doc, our Master Coach for Data Strategy:
"Every now and then I find that asking a question at the right level of abstraction leads to a simple answer. I think the slightly more abstract question than the one you asked is as follows. "I have to do something very important. There is relevant data from multiple sources. Should I bring all that data together, or only use one source?" The answer to this more abstract question is, "Yes, of course, bring all the data together!!"
FURTHER READING:
- Is data mesh only for analytical data?
- What are key differences between operational and analytical data?
- What is Reverse ETL?
Eliminating the barriers between operational and analytical systems
Removing the obstacles between analytical and operational systems involves several crucial tactics that guarantee the unrestricted flow of high-quality, actionable data across the business. These tactics aim to increase data quality and accessibility, which ultimately facilitates better decision-making and higher operational effectiveness. Here are the main things to consider when removing these obstacles.
Create a single data architecture that serves both operational and analytical purposes. Data should be able to flow freely around the organization and be stored in formats that are compatible with a variety of use cases and systems. Data segmentation and parallel processing are two ways to manage huge data volumes more efficiently, and cloud computing's distributed processing and elastic scalability can enhance this further. There are three highly relevant integration patterns:
- Reverse ETL (Extract, Transform, Load) basically turns the traditional ETL method on its head. As a basic idea in data engineering, traditional ETL involves getting data from different sources, transforming it into a usable format, and then loading it into a data warehouse or data lake so that it can be analyzed. Reverse ETL, on the other hand, moves data from a central data warehouse or data lake back into operational systems and tools (see the sketch after this list). The at-rest nature of data in warehouses and lakes means that Reverse ETL may not naturally support real-time use cases.
- Event Streaming: As an alternative to or in addition to Reverse ETL, event streaming platforms such as Apache Kafka can process data while it is in motion, enabling real-time data integration. Real-time data processing can therefore be an important component, reducing the time between the capture and the analysis of operational data. Stream processing and in-memory databases let data be analyzed instantly, enabling decision-makers to act on it.
- Landing zones are foundational structures within a data mesh architecture designed to support both operational and analytical data usage. They serve as a standardized and automated method for deploying cloud workloads, ensuring consistency across different domains for a modern data platform. These zones include components for data storage, real-time and offline processing, analysis, consumption, and governance controls, facilitating the distribution and integration of data across various domains.
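To make the Reverse ETL pattern from the first bullet more concrete, here is a minimal sketch in Python. It assumes a warehouse reachable through SQLAlchemy and a hypothetical CRM REST endpoint; the connection string, table, and field names are placeholders rather than a recommendation for any particular tool.

```python
"""Minimal reverse ETL sketch: read modeled attributes from the warehouse
and push them back into an operational tool (here, a hypothetical CRM API)."""
import requests
from sqlalchemy import create_engine, text

WAREHOUSE_URL = "postgresql://user:password@warehouse-host:5432/analytics"   # placeholder
CRM_ENDPOINT = "https://crm.example.com/api/customers/{id}/attributes"       # hypothetical

engine = create_engine(WAREHOUSE_URL)

def sync_customer_scores() -> None:
    """Copy recently updated churn-risk scores from an analytical table into the CRM."""
    query = text(
        "SELECT customer_id, churn_risk, lifetime_value "
        "FROM analytics.customer_scores "
        "WHERE updated_at > now() - interval '1 day'"
    )
    with engine.connect() as conn:
        rows = conn.execute(query).fetchall()

    for row in rows:
        payload = {"churn_risk": float(row.churn_risk),
                   "lifetime_value": float(row.lifetime_value)}
        resp = requests.patch(CRM_ENDPOINT.format(id=row.customer_id),
                              json=payload, timeout=10)
        resp.raise_for_status()   # surface failed updates instead of silently dropping them

if __name__ == "__main__":
    sync_customer_scores()
```

In practice, a dedicated Reverse ETL product or an event-streaming pipeline would also handle retries, rate limits, and incremental state, but the shape of the job stays the same: read modeled data, map it to the operational tool's schema, and write it back.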
Establish explicit norms and procedures for data management through a robust data governance framework. To guarantee accurate, secure, and legally compliant data throughout the company, this framework ought to cover privacy, security, and data quality requirements. Keeping an eye on the health of data using observability principles guarantees its integrity and control; the key criteria include data freshness, distribution, volume, schema, and lineage. To guarantee data consistency and dependability, standardize data formats and carry out quality control. For system compatibility, this entails developing shared data models and metadata standards as well as routinely cleaning, validating, and enriching data.
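As an illustration of the observability criteria just mentioned, here is a minimal sketch of freshness and volume checks on a warehouse table. It assumes a table with a timezone-aware updated_at column; all names are placeholders, and a dedicated observability tool would typically cover distribution, schema, and lineage as well.

```python
"""Minimal data observability sketch: freshness and volume checks on a warehouse table."""
from datetime import datetime, timedelta, timezone
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")  # placeholder

def check_table_health(table: str, max_staleness: timedelta, min_rows: int) -> list[str]:
    """Return human-readable findings; an empty list means the table looks healthy."""
    with engine.connect() as conn:
        # The table name comes from our own configuration, not user input,
        # and updated_at is assumed to be a timestamptz column.
        last_update, row_count = conn.execute(
            text(f"SELECT max(updated_at), count(*) FROM {table}")
        ).one()

    findings = []
    if last_update is None or datetime.now(timezone.utc) - last_update > max_staleness:
        findings.append(f"{table}: data is stale (last update: {last_update})")
    if row_count < min_rows:
        findings.append(f"{table}: unexpected volume drop ({row_count} rows)")
    return findings

print(check_table_health("analytics.orders",
                         max_staleness=timedelta(hours=6),
                         min_rows=10_000))
```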
Embedding privacy and consent filters into the process
A piece of data that means nothing in an operational sense could reveal sensitive information when studied in more detail. This shows how important it is to have a sophisticated approach to privacy and consent, one that can adapt as the way data is used changes.
When operational and analytical data are combined, consent methods need to be strong and flexible so they can record users' choices in a range of situations. To do this, frameworks for consent must be created that not only meet regulatory requirements but also give people detailed control over how their data is used for both operational and analytical purposes. A customer might agree that their information can be used for practical reasons, like making services more relevant to them, but not for certain types of deep analytical processing.
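As a sketch of what such granular, purpose-based consent control can look like inside a pipeline, the snippet below filters records by a per-purpose consent map. The field names and purposes are hypothetical; in a real setup, consent records usually live in a dedicated consent management platform.

```python
"""Minimal consent filter sketch: only pass records whose owner has consented
to the specific purpose of this pipeline."""
from typing import Iterable, Iterator

def filter_by_consent(records: Iterable[dict], purpose: str) -> Iterator[dict]:
    """Yield only records whose 'consent' map explicitly allows the given purpose."""
    for record in records:
        consents = record.get("consent", {})   # e.g. {"personalization": True, "analytics": False}
        if consents.get(purpose) is True:      # absent or ambiguous consent is treated as "no"
            yield record

customers = [
    {"id": 1, "email": "a@example.com", "consent": {"personalization": True, "analytics": False}},
    {"id": 2, "email": "b@example.com", "consent": {"personalization": False}},
]
print(list(filter_by_consent(customers, "analytics")))   # -> [] : nobody consented to analytics
```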
Anonymizing data becomes even more important when operational data, which often contains personally identifiable information (PII), is mixed with analytical datasets to extract more information. PII is kept safe by using dynamic anonymization methods, which adjust how strongly data is masked based on how it is used. This is especially important when analytical processes need data to be shared between teams or used in new ways that weren't planned when the data was collected.
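A minimal sketch of such purpose-dependent anonymization is shown below: the same record is masked more or less aggressively depending on the declared purpose. The purposes, field names, and keyed-hash approach are illustrative assumptions, not a complete anonymization scheme.

```python
"""Minimal dynamic anonymization sketch: how strongly PII is masked depends on
the purpose the data is used for."""
import hashlib
import hmac

PSEUDONYMIZATION_KEY = b"rotate-me-regularly"   # placeholder; keep real keys in a secret store

def pseudonymize(value: str) -> str:
    """Stable, keyed hash so records stay joinable without exposing the raw value."""
    return hmac.new(PSEUDONYMIZATION_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def anonymize_record(record: dict, purpose: str) -> dict:
    out = dict(record)
    if purpose == "operational":    # operations may need the real identifier
        return out
    if purpose == "analytics":      # analytics only needs a stable pseudonym
        out["email"] = pseudonymize(out["email"])
        return out
    out.pop("email", None)          # any other purpose: drop the PII entirely
    return out

print(anonymize_record({"id": 1, "email": "a@example.com", "spend": 120.5}, "analytics"))
```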
Real-time monitoring is needed to make sure that privacy rules are always followed, because data is always moving between operational and analytical systems. This includes keeping an eye out for possible data breaches or unauthorized access and making sure that all data-processing activities stay in line with both the original consent agreements and any privacy laws that apply.
It is very important that privacy and consent are built right into the tools and technologies that are used to combine operational and analytical data. This "privacy by design" approach makes sure that protecting data is not an afterthought but an integral part of the data integration process. It means choosing data management and analytics tools that support privacy laws and consent management by design, which lowers the risk of privacy violations.
Data engineers are vital in detecting and preventing unwanted access to data by keeping an eye on audit trails and access logs. Protecting sensitive data and staying in compliance with regulatory standards are both made easier by performing system-wide security tests.
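As an illustration, the sketch below scans an access log for reads of sensitive tables by principals that are not on an approved list. The log format, table names, and principal names are assumptions; in practice such checks run against the audit logs of whatever warehouse or catalog is in use.

```python
"""Minimal audit-log sketch: flag reads of sensitive tables by principals that
are not on the approved list."""
SENSITIVE_TABLES = {"analytics.customer_pii"}
APPROVED_READERS = {"svc_personalization", "svc_reporting"}

def find_suspicious_access(access_log: list[dict]) -> list[dict]:
    """Return log entries where a non-approved principal touched a sensitive table."""
    return [
        entry for entry in access_log
        if entry["table"] in SENSITIVE_TABLES and entry["principal"] not in APPROVED_READERS
    ]

log = [
    {"principal": "svc_reporting", "table": "analytics.customer_pii", "action": "SELECT"},
    {"principal": "jdoe_laptop", "table": "analytics.customer_pii", "action": "SELECT"},
]
print(find_suspicious_access(log))   # -> flags the jdoe_laptop read
```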
Fostering cultural and organizational changes that break down silos
Data silos form when various groups or departments inside an organization employ separate methods or systems to manage data, resulting in data stores that are inaccessible to one another. Because the organization cannot view and evaluate its data holistically, this segmentation leads to inefficiencies and missed opportunities. Software engineering and data engineering, as two separate fields, frequently employ different approaches, jargon, and end goals, and this divide can cause problems with data collection, processing, and utilization.

Form interdisciplinary teams with experts in software engineering, data science, business analysis, and any other fields that may be pertinent, and have them work on projects that call for an all-encompassing perspective of the company's data resources. Collaboration between these groups allows for the development of integrated solutions that improve customer interactions and operational efficiencies by making use of both analytical and operational data.

To achieve the organization's data-driven goals, create and disseminate a transparent data strategy that details the responsibilities of each department. Regular cross-functional meetings to evaluate these goals and the progress made towards them keep teams focused on the bigger picture of the data-driven mission.
By integrating richer analytical data into operational systems or customer-facing products, companies can improve user experiences, personalize interactions, and launch unique features based on deep insights. The best team is powerless in the face of a siloed culture. It shouldn't just collect data; it should use it, wisely and with precision, across every touchpoint.