Co-Founder and CEO
The term “transaction categorization” is widely used in financial services and across the broader software industry, especially in fintech circles. Here we unpack what transaction categorization means and why it matters, and share what is available today and what we are making possible at Ntropy.
Transaction categorization is a clear need for anyone shipping financial products or services: banks, standalone fintechs and embedded-finance use cases, such as Uber offering cards and salary advances to its drivers or ServiceTitan offering payroll and cards to its CRM customers, among many others.
A financial transaction can have a date, an amount, a currency code, a description, a merchant category code (MCC), a pending status and other fields. It can also carry information about a third-party payment processor if one is involved (e.g. in P2P transfers).
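As a rough illustration, the fields listed above could be held in a record like the one below. This is a minimal sketch: the class name `Transaction` and its field names are hypothetical, not a standard schema or Ntropy's data model, and amounts are kept as `Decimal` to avoid floating-point rounding on money.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One raw transaction record (hypothetical field names)."""
    posted_date: date
    amount: Decimal                   # Decimal avoids binary floating-point rounding
    currency: str                     # ISO 4217 code, e.g. "USD"
    description: str                  # free-text descriptor as supplied by the bank
    mcc: Optional[str] = None         # merchant category code, if the network supplied one
    pending: bool = False
    processor: Optional[str] = None   # third-party payment processor, e.g. for P2P transfers


tx = Transaction(
    posted_date=date(2022, 3, 1),
    amount=Decimal("42.50"),
    currency="USD",
    description="TST* SUBPAR MINIATURE SAN FRANCISCOCA USA",
)
print(tx.mcc, tx.pending)  # None False
```

Most fields are optional because, in practice, many of them arrive empty; the description is often the only rich signal.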
A transaction means different things to the originator and the receiver of the order. For instance, a bill payment for a corporate dinner party is a sale for the restaurant and an employee expense for the business organizing the party. Similarly, a payment to AWS is cloud computing spend for a startup and revenue for Amazon.
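The point that one payment needs two different labels can be made concrete with a small sketch. Everything here is hypothetical: the `Payment` class, the `category_for` function and the label strings are illustrative assumptions, and a real categorizer would infer the payee's line of business rather than receive it as a field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    payer: str
    payee: str
    payee_category: str  # what the payee sells, e.g. "cloud computing" (assumed known here)


def category_for(payment: Payment, account_holder: str) -> str:
    """Label the same payment from the point of view of one party (hypothetical labels)."""
    if account_holder == payment.payee:
        return "revenue"
    if account_holder == payment.payer:
        return f"expense: {payment.payee_category}"
    raise ValueError(f"{account_holder!r} is not a party to this payment")


aws_bill = Payment(payer="Acme Startup", payee="Amazon", payee_category="cloud computing")
print(category_for(aws_bill, "Acme Startup"))  # expense: cloud computing
print(category_for(aws_bill, "Amazon"))        # revenue
```

The key design choice is that the category is a function of both the transaction and the account holder, not of the transaction alone.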
Here are a few examples of transaction descriptions:
PAYONEER PAYONEE BUSBILLPAY TRAN#78
BP#9538547FLEET AVE CLEVELAND OHUSA
TST* SUBPAR MINIATURE SAN FRANCISCOCA USA
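To see why rules alone fall short, here is a minimal rule-based cleanup of the three descriptions above. The regular expressions are hypothetical and tuned only to these examples: they drop reference tokens such as `TRAN#78` and `BP#9538547`, and strip the `TST*` prefix, which is assumed here not to be part of the merchant name.

```python
import re

# Hypothetical cleanup rules, tuned only to the three examples above.
_REFERENCE_TOKEN = re.compile(r"\b[A-Z]*#\d+")   # e.g. "TRAN#78", "BP#9538547"
_PREFIX = re.compile(r"^TST\*\s*")                # assumed not to be part of the merchant name
_SPACES = re.compile(r"\s+")


def normalize(description: str) -> str:
    """Strip reference numbers and a known prefix, then collapse whitespace."""
    text = description.upper()
    text = _PREFIX.sub("", text)
    text = _REFERENCE_TOKEN.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


for raw in [
    "PAYONEER PAYONEE BUSBILLPAY TRAN#78",
    "BP#9538547FLEET AVE CLEVELAND OHUSA",
    "TST* SUBPAR MINIATURE SAN FRANCISCOCA USA",
]:
    print(normalize(raw))
# PAYONEER PAYONEE BUSBILLPAY
# FLEET AVE CLEVELAND OHUSA
# SUBPAR MINIATURE SAN FRANCISCOCA USA
```

Even after cleanup, tokens such as `OHUSA` and `FRANCISCOCA` still glue city, state and country together, and nothing in the output says what the merchant actually is. Every new descriptor format needs yet another rule.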
Although financial transactions are standardized by ISO 8583 and other formats, there are approximately 26,895 active ABA routing transit numbers (RTNs) in use in the US alone, and each one may format the information in a transaction in its own way. Every financial institution in the United States has at least one ABA RTN and can be assigned up to five, and the formatting behind each one also evolves over time. Furthermore, the parties involved in executing a transaction order each hold a piece of the information and have no incentive to share it with each other. The usual suspects are the issuing bank, the acquiring bank, the card network, the merchant and the cardholder.
It is hard even for humans to interpret and correctly piece together the information around a transaction, and parsing it programmatically is harder still. When we trained our first models to interpret transaction data, we found that we needed tens of crowd-sourced human labels per transaction for the model to distinguish any signal from the noise.
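The text does not say how the many crowd-sourced labels per transaction were combined. One common baseline, assumed here purely for illustration, is a majority vote with an agreement threshold: the function name `aggregate_labels`, the `min_agreement` parameter and the example labels are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def aggregate_labels(labels: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Majority vote over crowd labels for one transaction.

    Returns the most common label if its share of the votes exceeds
    min_agreement; otherwise returns None (too ambiguous to train on).
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None


votes = ["restaurant", "restaurant", "entertainment", "restaurant", "bar"]
print(aggregate_labels(votes))  # restaurant (3 of 5 votes)
```

With noisy annotators, a single label per transaction rarely clears such a threshold, which is one way to see why many labels per transaction were needed.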
An optimal solution has to meet the following criteria:
More than 10k new businesses are started every day in the US alone, each with its own cohort of customers and hence its own types of transactions. It is essential that new transaction labels and merchants can be added on the fly, without system downtime or engineering overhead.
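Adding labels "on the fly" implies that the taxonomy is data the running service can update, not something compiled into it. The sketch below is hypothetical (the `LabelRegistry` class is not an Ntropy API) and keeps labels in memory behind a lock; a production system would also persist them and let the model use new labels without retraining.

```python
import threading
from typing import Iterable


class LabelRegistry:
    """Hypothetical in-memory taxonomy that can grow while the service keeps running."""

    def __init__(self, labels: Iterable[str] = ()) -> None:
        self._labels = set(labels)
        self._lock = threading.Lock()  # guards concurrent readers and writers

    def add(self, label: str) -> bool:
        """Register a new label; returns False if it already existed."""
        with self._lock:
            if label in self._labels:
                return False
            self._labels.add(label)
            return True

    def __contains__(self, label: str) -> bool:
        with self._lock:
            return label in self._labels


registry = LabelRegistry({"groceries", "payroll"})
registry.add("cloud computing")       # no redeploy or restart needed
print("cloud computing" in registry)  # True
```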
A growing fraction of our customers are expanding globally. To make such transitions seamless, it is essential to support multiple currencies, languages, account-holder types and geographies with minimal re-tooling or workarounds.
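One small example of what "minimal re-tooling" means for currencies: assuming amounts arrive as integers in minor units, hardcoding two decimal places breaks as soon as a currency with a different ISO 4217 exponent appears, such as JPY (0) or BHD (3). The table below lists only a few codes, and the function name `from_minor_units` is hypothetical.

```python
from decimal import Decimal

# ISO 4217 minor-unit exponents for a few currencies (a real table covers every code).
MINOR_UNITS = {"USD": 2, "EUR": 2, "GBP": 2, "JPY": 0, "BHD": 3}


def from_minor_units(amount: int, currency: str) -> Decimal:
    """Convert an integer amount in minor units (cents, fils, ...) to major units."""
    exponent = MINOR_UNITS[currency]  # raises KeyError for unknown codes
    return Decimal(amount).scaleb(-exponent)


print(from_minor_units(1050, "USD"))  # 10.50
print(from_minor_units(1050, "JPY"))  # 1050
print(from_minor_units(1050, "BHD"))  # 1.050
```

Driving the conversion from a table rather than from per-currency code paths means a new market is a data change, not an engineering project.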
For companies building products and services driven by information extracted from transaction data, accuracy is critical: it is the difference between a seamless, powerful user experience and a product that doesn’t work.
Whether you are building a personal finance manager or a savings tool, getting the basics wrong results in poor analytics and handicaps fintech developers from the start.
The cost of an error grows with the amount of the transaction. For example, a single miscategorized equity investment in a startup can result in a badly priced loan from a bank or in significant errors in VAT returns and tax credits.
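Because the cost of an error scales with the amount, a plain per-transaction error rate can hide the mistakes that matter most. The sketch below, with hypothetical labels and amounts, reports both the share of transactions miscategorized and the share of money miscategorized; the function name `error_rates` is an illustrative assumption.

```python
from decimal import Decimal
from typing import Sequence, Tuple

# Each record: (amount, predicted_label, true_label); labels are hypothetical.
Record = Tuple[Decimal, str, str]


def error_rates(records: Sequence[Record]) -> Tuple[float, float]:
    """Return (share of transactions miscategorized, share of money miscategorized)."""
    wrong = [r for r in records if r[1] != r[2]]
    count_rate = len(wrong) / len(records)
    total = sum(abs(r[0]) for r in records)
    amount_rate = float(sum(abs(r[0]) for r in wrong) / total)
    return count_rate, amount_rate


records = [
    (Decimal("12.00"), "restaurants", "restaurants"),
    (Decimal("45.00"), "groceries", "groceries"),
    (Decimal("8.50"), "coffee", "coffee"),
    (Decimal("250000.00"), "revenue", "equity investment"),  # one costly mistake
]
print(error_rates(records))  # 25% of transactions, but roughly 99.97% of the money
```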
Build vs buy: the need for an API
Transaction categorization has received a lot of interest from internal data and engineering teams at fintechs, including Flinks, Plaid, Finicity, Yodlee, MX, Mastercard and Marqeta. Specialized vendors have also appeared over the last few years, competing to own the intelligence ecosystem on top of money movement. These include pave.dev for consumer transaction insights, Heron Data for more general categorization and Spade.dev for merchant-name cleansing.
In a world where software experiences are increasingly built on APIs, spending resources on in-house transaction categorization rarely makes sense, for a few reasons:
- There is the cold-start problem. Some teams spend multiple years and more than 10M USD to solve the categorization problem internally, and on top of the time and money, the risk of an unsatisfactory outcome is high. As with any machine-learning pipeline, an approach that seems to work well can plateau fast and become a nightmare to maintain and improve past a certain threshold.
- Diminishing returns. More in-house data does not always mean proportionally better results. To keep accuracy from plateauing in the long tail, a model needs to learn from diverse data across multiple sources.
- It will involve lots of manual labelling and re-training, which your engineers are most definitely going to hate.
- It will divert focus from core features.
At Ntropy we have combined recent advances in machine learning and data processing, including algorithmic privacy, weak supervision, knowledge transfer and transformer-based language models, to understand transaction data with a level of accuracy and generality that has never before been possible. We cover how we do this in more detail here.
There are a few ways to integrate with our transaction categorization API, and it usually takes less than 10 minutes.
You can start by uploading individual transactions to get a feel for the product with a no-code version that requires no sign-in: try.ntropy.network.