International Journal of Sustainability and Innovation in Engineering (IJSIE)
2024
https://www.doi.org/10.56830/IJSIE202407
Author
Chandra Bonthu
Abstract
Enterprise data environments continue to be plagued by incompleteness, inconsistency, duplication, staleness, and distributional drift, which directly impair decision-making, machine learning performance, and regulatory compliance. Conventional data-quality strategies, which range from narrow, semi-static rules to inefficient manual cleaning, cannot keep pace with the speed and variety of modern pipelines. This paper proposes Data Quality as a Service (DQaaS), a paradigm that redefines quality as a managed, cloud-native capability delivered through APIs, contracts, and measurable service level objectives (SLOs). DQaaS combines declarative rules, statistical anomaly detectors, and machine learning models on a common multi-tenant platform and delivers continuous monitoring, lineage-enabled diagnostics, and remediation as a service. This work makes four contributions. First, it presents a reference architecture with control and data planes spanning both streaming and batch pipelines. Second, it formalizes service level indicators (SLIs), SLOs, and error budgets for the critical dimensions of completeness, validity, timeliness, and accuracy. Third, it operationalizes data contracts and schema evolution in CI/CD pipelines, ensuring compatibility and accountability between producers and consumers. Fourth, it evaluates DQaaS on enterprise datasets spanning ERP, CRM, and streaming data, showing improvements in reliability, incident recovery times, and business performance with very little latency overhead. The results show how DQaaS can turn ad hoc quality activities into a scalable, enforceable service that is economically sustainable and aligns technical assurance with governance and organizational objectives.
Keywords:
Autonomous Rule Discovery, Causal Anomaly Detection, Counterfactual Data Augmentation, Federated Learning, Contract-Aware LLM Validators
