White Paper

From Chaos to Clarity: The Role of Clean Data in Banks' Digital Journeys

Digitalization has transformed the financial services sector over the past decade. Yet, with the rapid pace of change and continual technological advances, it can often feel like a never-ending journey. 

Banks must choose the architecture model most appropriate to their business: one that can help them efficiently absorb and process the huge influx of unstructured data, particularly when it comes to artificial intelligence (AI) and large language models (LLMs).

In a Risk Live Europe panel session sponsored by Numerix, experts explored the role that clean and accurate data plays in digital transformation. This white paper examines the main themes arising from the discussion:

  • Data architecture considerations for banks
  • Ethics and the increased risks of synthetic fraud using generative AI
  • Defining clean and accurate data
  • Responsibility, culture and governance over data quality
  • Benefits of self-service, cloud-based data infrastructure 
 

FAQs

How do capital markets banks solve data quality problems when the
majority of failures originate in the front office, not in IT systems?

Most data quality initiatives target technology — but at BNP Paribas, analysis
demonstrated that only approximately 10% of data quality failures were IT problems,
while approximately 90% were process or data ownership problems, according to
Mikael Sörböen, Head of Risk Systems and CIO Risk Markets at BNP Paribas. This
means technology investment alone addresses only a fraction of the root cause. The
primary lever is governance: establishing clear data ownership accountability at the
source — particularly in the front office — with risk, finance, and IT empowered to
escalate quality failures rather than absorb them.

---

How do banks choose between public cloud, private cloud, and on-premise
architecture for risk and regulatory analytics workloads?

Architecture decisions in capital markets are not one-size-fits-all — the right
model depends on the specific use case. According to Sarthak Shreya, Product
Manager at Numerix, XVA calculations require deep domain expertise, significant
compute power, and data traceability and lineage, making a specialist cloud vendor
the appropriate choice for that workload. For sensitive customer information,
on-premise or private cloud may be preferable. According to Aiman El-Ramly, Chief
Business Officer at Zema Global Data Corporation, banks must also weigh culture,
legacy systems, cost, speed, and the operational resilience requirements of
third-party providers under regulations such as the EU's Digital Operational
Resilience Act.

---

What is the difference between using LLMs and simpler statistical models
for time series analytics in capital markets risk management?

Deploying large language models for tasks that simpler models can handle
creates unnecessary cost and energy consumption without analytical benefit.
According to Deenar Toraskar, Risk CTO Architect at UBS, a simple rules-based
algorithm can suffice for many queries that firms are routing to LLMs at 10 times
the energy and cost. Sarthak Shreya of Numerix noted that approaches such as
EWMA (exponentially weighted moving average) and GARCH remain highly relevant
for time series analytics in capital markets, and can be applied to fill historical data
gaps and produce consistent, complete time series data — which is specifically what
Numerix clients require for risk model inputs.
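
The kind of simple model the panellists had in mind fits in a few lines of code. Below is a minimal sketch of the EWMA variance recursion used for volatility estimation; the decay factor of 0.94 is the RiskMetrics convention for daily data, and the return series is invented for illustration:

```python
import numpy as np

def ewma_volatility(returns, lam=0.94):
    """EWMA recursion: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2.

    Returns the annualised volatility estimate for each day.
    """
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns[0] ** 2          # seed with the first squared return
    for t in range(1, len(returns)):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return np.sqrt(sigma2 * 252)         # annualise assuming 252 trading days

# Hypothetical daily returns, for illustration only
rng = np.random.default_rng(seed=42)
returns = rng.normal(0.0, 0.01, size=500)
print(ewma_volatility(returns)[-1])
```

A model this small is transparent, cheap to run, and easy to validate, which is exactly the trade-off the panellists weighed against routing the same query to an LLM.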

---

How do banks measure data quality when poor model outputs are often
the first — and only — visible signal that input data has degraded?

Data quality measurement is complicated by a fundamental detection lag:
degraded model outputs are often the earliest signal that input data has a problem,
but by then the bad data has already influenced analytics. According to Sarthak
Shreya, Product Manager at Numerix, it is difficult to look at data objectively and
distinguish good from bad in isolation. Shreya recommended beginning with a clearly
defined problem scope — for example, ensuring equity swaps or discount factor
curves are consistent across systems — and building a roadmap incrementally
rather than attempting an all-at-once data quality remediation.

---

How much of a bank's data quality problem is actually a technology
problem, and how should this change investment priorities?

The assumption that data quality is an IT problem significantly misallocates
investment. BNP Paribas demonstrated through internal analysis that only
approximately 10% of data quality failures originated in IT, with approximately 90%
attributable to process and data ownership failures, according to Mikael Sörböen,
CIO Risk Markets at BNP Paribas. Banks that continue to invest primarily in data
technology without establishing governance frameworks, data ownership
accountability, and cross-functional cultural change are solving the smaller problem
while the larger one persists.

---

How do banks establish governance frameworks that hold the front office
accountable for data quality rather than treating it as an IT problem?

Data governance fails when accountability is diffuse and front-office data
originators do not bear the consequences of data quality failures downstream.
According to Mikael Sörböen of BNP Paribas, culture is the primary lever — every
function including risk, finance, and IT must take responsibility, and risk and finance
teams should not be in the position of chasing data quality issues that originated
in the front office. Marina Antoniou of the ICAEW Financial Services Faculty noted
that governance requires leadership buy-in and integrated decision-making, with
key governance forums in place to monitor breaches and escalate problems —
framing data quality as a strategic issue, not a technology incident.

---

What EU regulatory requirements are driving banks to invest in scalable
data infrastructure and clean data capabilities in 2024?

The EU's Digital Operational Resilience Act (DORA) introduces requirements
around the operational resilience of third-party providers — including cloud vendors
— that banks must factor into architecture decisions, according to Aiman El-Ramly
of Zema Global Data Corporation. Separately, AI and large language model adoption
requires scalable infrastructure to absorb and process large influxes of unstructured
data, according to Marina Antoniou of the ICAEW Financial Services Faculty. These
regulatory and operational drivers together are pushing banks to build data
infrastructure that is not just analytically capable but demonstrably resilient and
governable.

---

How does self-service cloud data infrastructure reduce data quality
problems and eliminate the IT bottleneck in capital markets firms?

When users cannot access data directly and must route quality issues through
IT, resolution times extend and accountability diffuses. According to Deenar Toraskar,
Risk CTO Architect at UBS, self-service cloud-based data infrastructure with complete
visibility eliminates reliance on IT to diagnose what is wrong — democratising data
access so users can identify data owners and drill down into quality issues themselves.
Toraskar noted that implementing this at scale — with efficient catalogues, intuitive
interfaces, and collaboration features — reduces both data quality problems and silo
effects across the organisation.
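
As a rough illustration of the self-service pattern Toraskar described, the sketch below models a minimal catalogue entry that records an accountable owner and lineage for each dataset, so a user who hits a quality issue can escalate directly to the owner rather than routing through IT. All names and fields here are hypothetical, not a description of any specific platform:

```python
from dataclasses import dataclass

@dataclass
class CatalogueEntry:
    dataset: str        # logical dataset name
    owner: str          # accountable data owner (often front office, not IT)
    source_system: str  # system of record where the data originates
    lineage: list       # upstream datasets this one is derived from

# Hypothetical catalogue; in practice this lives in a shared metadata service
CATALOGUE = {
    "eq_swap_marks": CatalogueEntry(
        dataset="eq_swap_marks",
        owner="equities-trading-desk",
        source_system="front-office-booking",
        lineage=["trade_capture", "market_close_prices"],
    ),
}

def report_quality_issue(dataset: str) -> str:
    """Route a data quality issue straight to the accountable owner."""
    entry = CATALOGUE[dataset]
    return f"Issue on '{dataset}' escalated to {entry.owner} (source: {entry.source_system})"

print(report_quality_issue("eq_swap_marks"))
```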

---

How does Numerix approach time series data completeness for clients
running risk models that require consistent historical data inputs?

Gaps in historical time series data undermine the reliability of risk models that
depend on complete historical inputs. According to Sarthak Shreya, Product Manager
at Numerix, the firm applies statistical approaches including EWMA and GARCH to
fill historical data gaps and produce consistent, complete time series data — which
is precisely what Numerix clients require for their risk model inputs. Shreya noted
that while machine learning approaches can be easier to apply, simpler and
well-understood statistical methods remain highly relevant and are preferred by
clients who need model transparency and explainability alongside accuracy.
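
Numerix's actual methodology is not reproduced here, but the general idea of statistically consistent gap filling can be sketched simply. In this illustrative example (invented series and parameters), each missing return is drawn from a distribution whose variance comes from the running EWMA estimate, so filled values match the local volatility regime rather than flattening it:

```python
import numpy as np

def fill_gaps_ewma(returns, lam=0.94, seed=0):
    """Fill NaN gaps in a daily return series.

    Each missing return is replaced with a draw from N(0, sigma2_t),
    where sigma2_t is the running EWMA variance, keeping the completed
    series consistent with its recent history.
    """
    rng = np.random.default_rng(seed)
    filled = returns.copy()
    sigma2 = np.nanvar(returns[:20])     # seed the recursion from early history
    for t in range(len(filled)):
        if np.isnan(filled[t]):
            filled[t] = rng.normal(0.0, np.sqrt(sigma2))
        sigma2 = lam * sigma2 + (1 - lam) * filled[t] ** 2
    return filled

# Hypothetical series with a two-day gap
r = np.array([0.004, -0.012, np.nan, np.nan, 0.007, -0.003])
print(fill_gaps_ewma(r))
```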

---

How do banks integrate clean data governance frameworks with digital
transformation strategies without treating them as separate workstreams?

Treating data quality as a standalone technology initiative separate from
digital strategy creates misaligned investments and governance gaps. According to
Marina Antoniou of the ICAEW Financial Services Faculty, decisions around data
quality and technology should be considered as part of an overall digital strategy
and framework — not looked at separately. Leadership buy-in is essential and
governance forums must be active enough to monitor breaches and escalate
problems. Antoniou noted that when data quality failures surface, they should be
treated as strategic issues requiring cross-functional response, not technology
incidents assigned to IT.

---

How does Numerix's data platform connect to a bank's risk analytics
infrastructure to ensure consistent market data and model inputs across systems?

Inconsistent data across systems — for example, equity swap or discount
factor curve discrepancies — is one of the most common sources of model output
error in capital markets risk functions. Sarthak Shreya, Product Manager at Numerix,
recommended a phased approach: define the specific consistency problem first,
build a structured roadmap, and progress incrementally rather than expecting
enterprise-wide clean data from the outset. Numerix's platform supports EWMA
and GARCH-based time series completion, data lineage tracking for XVA and
regulatory use cases, and the governance framework components — including manual
steps, incident management procedures, and fallback mechanisms — required for
production-grade risk model data integrity.
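
To make the consistency problem concrete, here is an illustrative cross-system check (hypothetical curve data and tolerance, not a Numerix API) that compares discount factor curves exported from two systems and flags tenors where they disagree:

```python
# Illustrative consistency check for discount factor curves. Curve values
# and the tolerance are hypothetical; a production check would also verify
# curve construction conventions (day count, interpolation, settlement lag).

TOLERANCE = 1e-4  # maximum acceptable discount factor discrepancy

def compare_curves(curve_a: dict, curve_b: dict, tol: float = TOLERANCE):
    """Return the tenors where two discount factor curves disagree."""
    breaks = []
    for tenor in sorted(set(curve_a) | set(curve_b)):
        df_a, df_b = curve_a.get(tenor), curve_b.get(tenor)
        if df_a is None or df_b is None:
            breaks.append((tenor, "missing in one system"))
        elif abs(df_a - df_b) > tol:
            breaks.append((tenor, f"diff {abs(df_a - df_b):.2e}"))
    return breaks

risk_system  = {"1Y": 0.9702, "2Y": 0.9410, "5Y": 0.8580}
front_office = {"1Y": 0.9702, "2Y": 0.9415, "5Y": 0.8580, "10Y": 0.7300}
for tenor, reason in compare_curves(risk_system, front_office):
    print(f"{tenor}: {reason}")
```

Scoping the check to one well-defined data element, as above, mirrors the incremental roadmap Shreya recommended: prove consistency for one curve family before widening the net.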


Want More from Numerix?

Subscribe to our mailing list to stay current on what we're doing and thinking at Numerix.