By Dave Sissens and Phillip Straley
It’s ironic that analytics professionals often work in situations with limited data availability, poor data quality, or both. In many cases, large amounts of granular data are available, but legal or security concerns make them difficult to access in a timely or secure manner. The current COVID-19 situation, with much of the world working from home, often without the data access they would have from their offices, has accentuated this problem.
FNA has operated under these limitations for years, given the nature and sensitivity of the data we deal with. For instance, we work heavily with payment system infrastructures around the world in designing, testing and monitoring their systems. To be meaningful, the “monitoring” component must ultimately run on historical production data. However, there are often legal or practical impediments to our teams gaining timely access to large amounts of production data for research, design and testing. Even monitoring solutions can be developed and tested with synthetic data, and often much more quickly.
For obvious reasons, thorough testing is required before deploying changes to mission-critical infrastructures. This is vital not just to ensure that the system is production-ready and meets security standards, but also to test and iterate on the design of the system itself. Some of the questions that need to be answered:
- How does the payment system design cope under stress?
- What is the most liquidity-efficient configuration for the system to function?
- What happens when new participants join the system?
- What happens when a participant fails?
All of these questions require the design to be thoroughly tested before even a line of code is written. You will often hear us refer to FNA deploying “simulators” or “replicas” of infrastructures for design and testing. Other modeling approaches can also be used for such validation. However, all models and simulators require a precious commodity… data. So, what to do when access to significant transactional production datasets is just not possible?
Over the last decade, FNA has developed an industry-vetted methodology to overcome this challenge and provide credible, usable “synthetic” (or “representative”) data. For instance, this enables us to generate a set of payments with a look and feel similar to the historical data set. Not only do the data sets look similar to the historical ones, but once processed through a model or a simulator, they also produce results that are economically similar to those obtained from the historical data. They can also be used to train AI/machine learning models, something that is often difficult to do using historical data.
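To make the idea of “representative” payments concrete, here is a minimal sketch in Python. It is not FNA’s methodology, which this post does not describe in detail. It assumes a deliberately simple approach: estimate how often each sender-receiver pair appears and fit a log-normal distribution to payment amounts, then sample new payments from those estimates. The participant names, the amounts, and the `fit` and `generate` functions are all hypothetical.

```python
import math
import random
import statistics
from collections import Counter

# Hypothetical historical sample: (sender, receiver, amount).
# Names and values are illustrative only, not real payment data.
HISTORICAL = [
    ("BANK_A", "BANK_B", 120_000.0),
    ("BANK_A", "BANK_C", 45_000.0),
    ("BANK_B", "BANK_A", 98_000.0),
    ("BANK_B", "BANK_C", 15_500.0),
    ("BANK_C", "BANK_A", 230_000.0),
    ("BANK_A", "BANK_B", 76_000.0),
]


def fit(payments):
    """Estimate pair frequencies and log-normal parameters for amounts."""
    pair_counts = Counter((s, r) for s, r, _ in payments)
    log_amounts = [math.log(a) for _, _, a in payments]
    mu = statistics.mean(log_amounts)
    sigma = statistics.stdev(log_amounts)
    return pair_counts, mu, sigma


def generate(payments, n, seed=0):
    """Sample n synthetic payments from the fitted estimates."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    pair_counts, mu, sigma = fit(payments)
    pairs = list(pair_counts)
    weights = [pair_counts[p] for p in pairs]
    synthetic = []
    for _ in range(n):
        # Pick a sender-receiver pair in proportion to its historical frequency.
        sender, receiver = rng.choices(pairs, weights=weights, k=1)[0]
        # Draw an amount from the fitted log-normal distribution.
        amount = round(rng.lognormvariate(mu, sigma), 2)
        synthetic.append((sender, receiver, amount))
    return synthetic


synthetic_day = generate(HISTORICAL, n=1000, seed=42)
```

A real methodology would need to preserve far more structure (intraday timing, network topology, dependencies between payments), but the sketch shows the basic principle: the synthetic records share the statistical shape of the historical ones without reproducing any of them.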
We provide these synthetic data sets to research facilities, central banks and market infrastructures around the world where gaining access to the actual payment system data just isn’t possible. We like to think of our synthetic data as “possible future” data, given its likeness to the historical data and the ability to build in variability and forward-looking assumptions. And we can create unlimited days of the possible future, which helps us understand potential situations that may not yet have manifested historically.
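As a rough illustration of “unlimited days of the possible future”, the sketch below generates many synthetic days under a forward-looking volume assumption and records, for each day, the largest intraday net debit position any participant reaches. This is a toy example built on assumptions of our own choosing: the participant names, the log-normal amounts, the `growth` parameter, and the `peak_net_debit` measure are all hypothetical and stand in for a proper simulator.

```python
import math
import random

PARTICIPANTS = ["BANK_A", "BANK_B", "BANK_C"]  # hypothetical names


def synthetic_day(rng, n_payments, median_amount=50_000.0, sigma=1.0):
    """Draw one day of payments between two distinct random participants."""
    mu = math.log(median_amount)  # log-normal median equals exp(mu)
    day = []
    for _ in range(n_payments):
        sender, receiver = rng.sample(PARTICIPANTS, 2)
        day.append((sender, receiver, rng.lognormvariate(mu, sigma)))
    return day


def peak_net_debit(payments):
    """Largest net debit any participant reaches, processing payments in order."""
    balance = {p: 0.0 for p in PARTICIPANTS}
    peak = 0.0
    for sender, receiver, amount in payments:
        balance[sender] -= amount
        balance[receiver] += amount
        # Only the sender's balance falls, so it is the only one to check.
        peak = max(peak, -balance[sender])
    return peak


def simulate(days, base_volume=200, growth=0.0, seed=0):
    """Peak net debit per day; growth scales the number of daily payments."""
    rng = random.Random(seed)
    n_payments = int(base_volume * (1 + growth))
    return [peak_net_debit(synthetic_day(rng, n_payments)) for _ in range(days)]


baseline = simulate(days=250)               # today's assumed volume
stressed = simulate(days=250, growth=0.5)   # assumed 50% more payments
```

The two lists can then be compared, for example by looking at their means or upper percentiles, to get a feel for how liquidity needs might shift under the assumed scenario.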
Synthetic data is as “real” as historical data for many uses, from understanding economic dynamics to training machine learning models and test-running different algorithms. It’s also great for fast-tracking the development of new ways of looking at data in dashboards, because dashboards built on synthetic data can be shared among a larger community of people. Clearly, we don’t get the same insights as we would from historical or real-time data. However, once the analytics and dashboards have been developed with synthetic data, they can be quickly deployed in production, saving a lot of time.
The synthetic data we’re working on isn’t limited to large payment infrastructures either. We’re doing similar work with central banks on granular loan-level credit data, and with banks on optimizing liquidity in their payment systems, among other data sets.
Please get in touch if you think FNA can help you with your data needs.