Hard Problem, Simple Solution: Categorizing Data from Financial Transactions

Author: Ntropy Team

Data from Financial Transactions is Everywhere

For neobanks, payment processors, fintechs, and really anyone who is shipping financial products or services, transactions are at the core. And behind each transaction is data. At its most fundamental level, financial data contains vital information that tells the story of who, what, where, when, and how much money has been moved between parties.

It also includes important metadata, such as the currency, context, merchant and third-party processor information, merchant category codes (MCCs), and so on. A single transaction, let alone thousands of records, is never easy to understand in its raw state.

Why does this matter? What someone spends their money on is the single most important piece of information about them, and with digital payments on the rise, there will be no shortage of critical financial information that needs analysis. In just two years, cash-based interactions have become markedly less common as a result of COVID-19, and that trend is not expected to reverse. According to the 2022 Commerce and Payments Trends Survey from Global Payments, 38% of merchants reported expanding their digital payment options in 2021, and 53% expect to expand their digital options over the course of this year.

Making Use of Financial Transaction Data

Myriad use cases are already underway - from identifying business fraud, to categorizing spending behavior, to shipping beautiful consumer banking experiences. The applications range from obvious to magical, such as instant capitalization or automated wealth creation and management. But how do we actually get from Point A to Point B when the data we rely on is inherently hard to understand?

Before financial transaction data can be effectively utilized for product development, the merchant information must first be accurately named, and the transaction itself must be accurately labeled. Historically, understanding each transaction has been a difficult problem to solve at scale because the sheer number of players involved results in inevitable inconsistencies. When banks, payment processors and other fintechs receive financial transaction data, they are collecting data from a multitude of different sources, each with their own naming conventions and schemas, and with varying degrees of processing.
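To make the schema problem concrete, here is a minimal sketch of mapping records from two hypothetical sources into one common shape. The field names (`desc`, `amount_cents`, `narrative`, `amt`) and the description strings are invented for illustration and do not come from any real bank or processor feed.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    description: str
    amount: float
    currency: str


def from_source_a(record: dict) -> Transaction:
    # Hypothetical source A: amounts in minor units (cents), lowercase currency.
    return Transaction(
        description=record["desc"].strip(),
        amount=record["amount_cents"] / 100,
        currency=record["ccy"].upper(),
    )


def from_source_b(record: dict) -> Transaction:
    # Hypothetical source B: signed decimal strings, different field names.
    return Transaction(
        description=record["narrative"].strip(),
        amount=abs(float(record["amt"])),
        currency=record["currency"].upper(),
    )


# Two invented records describing the same kind of purchase.
a = from_source_a({"desc": " AMZN WEB SVCS 4829 ", "amount_cents": 1250, "ccy": "usd"})
b = from_source_b({"narrative": "Amazon Web Services*AWS", "amt": "-12.50", "currency": "USD"})

# Amount and currency now agree, but the merchant text still does not.
same_amount = a.amount == b.amount
same_merchant_text = a.description == b.description
```

Unifying the schema settles the amount and the currency, but the two descriptions still read as different merchants. That remaining gap is the naming and labeling problem.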

For example, what should be a simple description of an AWS purchase often carries unnecessary additional information:

With the application of machine learning models and the right inputs, the same transaction can and should be labeled like this:

How to get more out of your raw transaction data

Categorizing Financial Transactions at Scale

Labeling and categorizing transactions effectively at scale requires human-level reasoning, which rule-based systems cannot keep up with.

Prior to advancements in AI and machine learning, categorizing transaction data relied on a look-up methodology. The information returned would not necessarily be wrong, but its ability to scale and adapt to changes in the data was limited. And while this method has certain benefits, it lacks important advantages that ML brings to the table, namely intelligence through context and accuracy through adaptability to variations. For example, a monthly subscription payment to Netflix means something different from a recurring monthly payment to your student loan provider. Further, a personal transaction with Seamless means something different from a business transaction with Seamless where the business is subsidizing employee meals.
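The limits of a look-up approach can be sketched in a few lines. This is an illustrative toy, not Ntropy's system or any production rule set: the keyword table and the transaction descriptions are invented for the example.

```python
# Hypothetical keyword -> category table (assumption, for illustration only).
LOOKUP = {
    "netflix": "subscription",
    "seamless": "food delivery",
}


def categorize_by_lookup(description: str) -> str:
    """Return the category of the first known keyword found in the description."""
    text = description.lower()
    for keyword, category in LOOKUP.items():
        if keyword in text:
            return category
    return "unknown"


# Invented descriptions: a personal order and an employer-subsidized meal.
personal = categorize_by_lookup("SEAMLESS ORDER 1234")
business = categorize_by_lookup("SEAMLESS CORP MEAL PLAN")

# A variant spelling that the table has never seen.
unseen = categorize_by_lookup("NFLX.COM MONTHLY")
```

The table gives both Seamless transactions the same label, because it has no notion of who is paying or why. It also returns "unknown" for the variant spelling until someone adds a new rule by hand. These are the context and adaptability gaps described above.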

At Ntropy, we’re breaking some of these rules by building the best multilingual, multi-geo categorization API with natural language models. The Ntropy API delivers intelligent labeling of raw transaction data and, in turn, an understanding of what those transactions mean. We like to say we label transaction data like a human would - but better. What really excites us, though, is empowering developers to build awesome financial products without spending years, or millions of dollars, fixing a problem they shouldn’t have to fix.
