04 Jul 2023

The Power of Transaction Categorization: A Practical Approach



Ntropy Team


The term “transaction categorization” is widely used in financial services and the broader software industry, and is well known in fintech circles.

Here we will unpack what transaction categorization means and why it is important, share what is available today and what we are making possible going forward.

Transaction data is an uncut gem

Transactional data is generated every time money is transferred between two entities, one initiating the transaction order and the other receiving it.

A financial transaction consists of a date, an amount, a description or memo and often, but not always, the explicit identity of the sending financial institution. Depending on the parties involved, the pieces contained in the description string may vary.

Although financial transactions are standardized by ISO 8583 and other formats, there are approximately 28,000 active ABA RTNs (i.e. distinct financial institutions) currently in the US alone and each may have its own way of formatting the transaction string. 

Banks are not the only parties that may be involved in a transaction between two entities. Different payment processors could be involved, making the transaction description string even less legible. A transaction routed through a processor such as PayPal often looks as if the processor is the merchant, and is very hard to decipher as a result.

Here are a few examples of messy transaction descriptions:

Messy transaction examples

These inconsistencies and the lack of a common standard across banks and payment service providers make it almost impossible to interpret transactions programmatically. If you also factor in cross-border transactions, which involve financial institutions from other countries with yet more formats and languages, you begin to realise the scale of the problem.

Even for humans, transactions are non-trivial to understand. Obtaining data to train machine-learning models to parse transactions is even harder. When we trained our first models, we found that we needed crowd-sourced human labels per transaction for the model to achieve reasonable accuracy. Even with some of the best off-the-shelf labelling solutions, such as Scale or Amazon SageMaker, you end up paying a high cost and still not getting the desired output.

Context Matters for Transaction Categorization

There is effectively no single ground truth about a transaction. The same transaction means different things to the sender and the receiver of the payment.

For instance, a bill payment for a corporate dinner party is sales for the restaurant, entertainment for the consumer, and a corporate employee expense for the business organizing the party. Similarly, an AWS transaction can be tracked as cloud infrastructure by one company and as a cost of sales by another, depending on the internal classification logic each one follows.
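One way to picture this is to make the category a function of both the transaction and the viewpoint of whoever is asking. The sketch below is purely illustrative: `Perspective`, `merchant_kind`, the `LABELS` table and the category strings are hypothetical names chosen for this example, not part of any Ntropy API.

```python
from dataclasses import dataclass
from enum import Enum


class Perspective(Enum):
    # Hypothetical viewpoints, mirroring the dinner-party example above.
    MERCHANT = "merchant"
    CONSUMER = "consumer"
    EMPLOYER = "employer"


@dataclass(frozen=True)
class Transaction:
    description: str
    amount: float
    merchant_kind: str  # assumed to be known already, e.g. from enrichment


# Illustrative table: the label depends on who is looking at the payment.
LABELS = {
    ("restaurant", Perspective.MERCHANT): "sales",
    ("restaurant", Perspective.CONSUMER): "entertainment",
    ("restaurant", Perspective.EMPLOYER): "employee expense",
}


def categorize(tx: Transaction, perspective: Perspective) -> str:
    """Return the category of a transaction as seen from one viewpoint."""
    return LABELS.get((tx.merchant_kind, perspective), "uncategorized")


bill = Transaction("RESTAURANT DINNER PARTY", 480.0, "restaurant")
for perspective in Perspective:
    print(perspective.value, "->", categorize(bill, perspective))
```

The point of the design is that the perspective is an explicit input, so the same bill can legitimately carry three different labels.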

Although a variety of solutions have been built to optimize moving money from point A to point B, the movement of information is still a TODO, despite real business needs and the abundance of counterfactual effects. This is a very apparent, yet very hard, technical problem to solve.

The reason this problem exists in the first place is antiquated technology and a lack of incentives for the five parties to a transaction (the issuing bank, the acquiring bank, the card network, the merchant and the cardholder) to effectively transport and reveal information.

Different Approaches for Transaction Enrichment

Given the above, transaction categorization has become an area with lots of activity, both from the internal data and engineering teams of certain companies and from external vendors competing to own the intelligence ecosystem on top of money movement. These include vendors focused on consumer transaction insights, Heron Data for general categorization for lending, Spade for merchant name cleansing, as well as the in-house solutions of incumbent aggregators like Flinks, Plaid, Finicity, Yodlee and MX.

In-house workarounds are combinations of rules, lookup tables, internally labelled datasets and rudimentary SLM and LLM models. You can find a guide on the different approaches to transaction enrichment here. It’s a must-read if you are considering building an in-house solution or partnering with a provider like Ntropy.

All of these vendors and workarounds have different approaches with advantages and disadvantages.
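To make the rules-and-lookup-table style of workaround concrete, here is a minimal sketch. The table contents, the regex rules and the sample descriptions are all hypothetical. It shows both the appeal of the approach (a few lines of code) and its main weakness (anything not anticipated falls through to "unknown").

```python
import re

# Hypothetical lookup table from a cleaned merchant token to a category.
MERCHANT_LOOKUP = {
    "uber": "transport",
    "netflix": "subscriptions",
}

# Hypothetical regex rules, tried in order when the lookup misses.
RULES = [
    (re.compile(r"\b(payroll|salary)\b"), "income"),
    (re.compile(r"\batm\b"), "cash withdrawal"),
]


def clean(description: str) -> str:
    """Lower-case the text and replace digits and punctuation with spaces."""
    text = re.sub(r"[^a-z ]+", " ", description.lower())
    return " ".join(text.split())


def categorize(description: str) -> str:
    """Try the merchant lookup first, then the rules, else give up."""
    text = clean(description)
    for token in text.split():
        if token in MERCHANT_LOOKUP:
            return MERCHANT_LOOKUP[token]
    for pattern, category in RULES:
        if pattern.search(text):
            return category
    return "unknown"


for raw in ["UBER *TRIP 8842 HELP.UBER.COM", "ATM WDL 0042 MAIN ST", "SQ *JOES COFFEE"]:
    print(raw, "->", categorize(raw))
```

Every new merchant or description format needs another table entry or rule, which is where the maintenance burden of this approach comes from.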

An optimal solution has to meet the following criteria:

Scalability, While Keeping Its Flexibility

More than 10k new businesses are started every day in the US alone. Each of these businesses has a specific cohort of customers and hence types of transactions. It is essential that new transaction labels and merchants can be added on-the-fly without any system downtime or engineering overhead.


As it is becoming increasingly easy for fintechs to serve customers globally, the API that parses transaction data has to handle new transaction patterns that it has never seen before without any drop in reliability. It should be able to understand and distinguish between the transactions of consumers, freelancers and small and large businesses. It should also be able to interpret transactions in multiple languages and currency codes. We are seeing a trend of  increasingly global, multi-currency, multi-lingual, multi-account-holder-type transactions, even within single batches, across many of our customers.

Whether it is a Mastercard payment, a post-processed string from a Plaid or a Finicity or one you are getting from an issuer like Marqeta, your categorization engine needs to be able to parse and infer information equally well and be accessible within all fintech playgrounds.


For the companies building products and services driven by information extracted from transaction data, accuracy is critical. It is the difference between a seamless and powerful user experience and a product that doesn’t work.

For many use cases, the cost of mistakes is high. For lenders, accuracy is paramount, as a bad underwriting decision is hugely costly and can result in a total loss of funds. It is easy to give money away; the hardest thing is getting it back.

For all transactions, the cost of a mistake rises sharply with the amount involved. For example, just a single wrongly categorized equity investment in a startup can result in a badly priced loan for a bank or significant errors in VAT returns and tax credits.

Whether you are building a personal finance manager or a savings tool, getting the basics wrong will result in poor analytics and is a handicap for fintech developers.

The counterfactual power of this information is massive too: the potential of things you could build if you got it right.

Build vs Buy: The Need for an API

Transaction categorization is a clear need for anyone who is shipping financial products or services, whether those are banks, standalone fintechs or embedded finance use cases, such as Uber offering cards and salary advances to its drivers, ServiceTitan offering payroll and cards to their CRM customers and more.

If an engineer inside a company has touched or seen payments, they know how bad the data is and have worked on solving this problem.

Here are the two core issues with in-house categorization engines:

Cold Start

To solve the categorization problem efficiently, some fintech teams spend multiple years and $10M+. Along with the time and monetary cost, the variance of the expected outcome is high. As is especially true for machine learning, an approach that seems to work well can turn out to plateau fast and become a nightmare to maintain and improve past a certain threshold.

Diminishing Returns

More in-house data does not always mean better results. To keep accuracy from plateauing in the long tail, a model needs to learn from diverse data, not just more of the same.

In a world where software experiences are increasingly supported by APIs, the lego bricks of the modern economy, spending resources on transaction categorization in-house does not make sense.

Firstly, it will involve lots of manual labelling and re-training, which your engineers are most definitely going to hate.

Secondly, it will divert focus from core features.

Thirdly, building a standardized source of truth about transactions can only be done across the industry, training on a variety of diverse datasets and use cases. We have covered how we do this here.

Once you start getting clean and enriched data from your customers, the opportunities and use cases you can build on top are endless.

Here are some of the use cases that we love the most.

Business Underwriting

Use of transaction categories in business underwriting

Years ago if you were an entrepreneur about to start something or needing to grow what you have built, you would have to have great “friends and family” to access capital.

Later on you had to look and act the part to gain the trust of your bank manager.

Most recently, you need years of history and a credit score, and you have to fill out a bunch of paperwork. With a bit of luck, you will get alright terms and the capital you need.

With great transaction categorization, luck is overrated. Cash flows are read and interpreted in near real-time by machines providing an optimal view on the current and the future of a business. This means cutting the time to access capital from weeks to seconds.

Fraud Detection

With payments becoming increasingly digital and payment systems going real-time, the vectors of fraud are shifting and new approaches must be taken. The ability to perform anomaly detection for new payment behaviours is a huge opportunity but is reliant on understanding historical patterns.

There is a big opportunity to use account holder history for anomaly detection when making account-to-account payments. Once enriched, a business or consumer's transaction history is like a unique fingerprint of their history and behaviours, and can be used as the basis for authorising or declining a payment depending on whether it appears to fit normal behaviour patterns.
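As a rough illustration of the idea, the sketch below flags a payment whose amount sits far from the account holder's past amounts in the same category. It is a deliberately simple z-score check, not a description of how any production fraud system works. The function name, the threshold of 3 standard deviations and the sample history are assumptions made for this example.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag an amount that deviates strongly from past amounts in a category."""
    if len(history) < 2:
        # Assumption: with too little history, send the payment for review.
        return True
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts were identical, so any other amount is unusual.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past grocery payments for one account holder.
groceries = [42.0, 38.5, 45.0, 40.0, 41.5]
print(is_anomalous(groceries, 43.0))   # close to the usual range
print(is_anomalous(groceries, 900.0))  # far outside the usual range
```

The check only becomes meaningful once transactions are grouped by a reliable category or merchant, which is why enrichment has to come first.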

Corporate Spend

In the current economic climate, CFOs are the new CEOs as they control the purse strings. Business banking, corporate spend and treasury operations are the in-vogue industries for investors because of this. But how can you build a product that gives a Finance team a holistic picture of the financial health of their organisation, given the poor quality of their bank data?

Different iterations of Amazon appearing in transactions

Finance teams want to know where money is being spent, but bank transactions are messy and not standardized, which can make it seem like there are multiple new vendors when they are all in fact variations of the same entity.

Transparency of a company’s finances can only truly be gained with enriched financial data, and enrichment should be the first step to providing CFOs and Finance teams the controls to effectively manage corporate spend.
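The vendor-variation problem described above can be sketched as collapsing raw descriptions onto a canonical vendor before totalling spend. The alias table and the sample descriptions below are hypothetical, and a first-word match is far too naive for real bank data. It only shows why spend looks fragmented without this step.

```python
import re
from collections import defaultdict

# Hypothetical alias table: leading word of a description -> canonical vendor.
ALIASES = {
    "amzn": "Amazon",
    "amazon": "Amazon",
}


def canonical_vendor(description: str) -> str:
    """Map a raw description to a canonical vendor, or return it unchanged."""
    match = re.match(r"[a-z]+", description.lower())
    first_word = match.group(0) if match else ""
    return ALIASES.get(first_word, description)


def spend_by_vendor(transactions):
    """Sum amounts per canonical vendor for (description, amount) pairs."""
    totals = defaultdict(float)
    for description, amount in transactions:
        totals[canonical_vendor(description)] += amount
    return dict(totals)


# Hypothetical ledger: three spellings of one vendor plus an unrelated one.
ledger = [
    ("AMZN Mktp US*2K4", 25.0),
    ("Amazon.com*MK1", 10.0),
    ("AMAZON PRIME", 15.0),
    ("STAPLES 0042", 30.0),
]
print(spend_by_vendor(ledger))
```

Without the alias step, the same ledger would report four separate vendors instead of two.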

Enabling Climate-Positive Purchases

Where and how you spend your money directly affects the planet. To be able to change, we need to keep track and get an understanding of our spend, as well as the merchants we spend with.

High-resolution transaction data is going to play an important role in building a carbon-negative future.

Embedded Finance and Vertical SaaS

Imagine if you are the operating system, aka the CRM, the payment processor, the employee management system, the customer comms layer for the majority of car repair shops in the US. You have saved them from the pen and paper operations and the pains associated with that. Your software environment is technically the home for their business. However, when they want to grow, make changes, hire more, invest in inventory or get started, they need to go to a bank. An entity that knows nothing about them or their business and who has to start from scratch to understand them. Banks spend time and money to assess and later charge that back to customers in onboarding costs, communication and interest rates.

Instead, being where these businesses live, you can allow them to connect their cash flows and financial information without leaving your premises. You get very happy customers and 3-5x LTV. They get time and resources to run their business instead of messing about with banks.

In order to make this happen, banking data needs to be machine-readable at scale and easily married to other metadata about the business, for instance CRM information, customer service records and reviews. Enter transaction categorization.

Hyper-Personal Rewards

As much as we hate the word hyper, getting things that you actually care for, rather than generic discounts and points, is fun. To be able to do this, one needs to know what you spend your money, and hence your time, on.

A well-tuned transaction categorization engine serves as the backbone for being able to build a rewards system for your customers.

Making Money Last

Business finance management and personal finance management tooling have so far been the core consumers of transaction histories and spend data. In order to allow your users, whether a business or an individual, to have a great overview of how to make their money last, you need a granular understanding of where it is coming from and where it is going.

Merchant recognition, understanding the recurrence of payments, as well as the individual descriptions of transactions are key to this.
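To illustrate what understanding the recurrence of payments can mean in practice, the sketch below checks whether payments to one merchant arrive at roughly regular intervals. The function name, the 3-day tolerance and the sample dates are assumptions for this example. It also presumes the merchant has already been recognised, so that the dates really belong to the same counterparty.

```python
from datetime import date
from statistics import median


def recurrence_period(dates, tolerance_days=3):
    """Return the typical gap in days if payments look recurring, else None."""
    ordered = sorted(dates)
    if len(ordered) < 3:
        return None  # too few payments to call it a pattern
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    if all(abs(gap - typical) <= tolerance_days for gap in gaps):
        return typical
    return None


# Hypothetical monthly subscription with slightly shifting billing days.
subscription = [date(2023, 1, 5), date(2023, 2, 5), date(2023, 3, 6), date(2023, 4, 5)]
# Hypothetical one-off purchases at irregular intervals.
one_offs = [date(2023, 1, 5), date(2023, 1, 9), date(2023, 3, 20)]

print(recurrence_period(subscription))
print(recurrence_period(one_offs))
```

Using the median gap with a tolerance keeps the check robust to billing dates that drift by a day or two around weekends and month lengths.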


Transaction categorization is a key part of being able to fully utilise bank transactions. Without the additional context that categories provide, transaction data is not easily understood by humans or machines, which makes categorization a vital next step now that access to bank data is widely available.

At Ntropy we combine multiple approaches as part of our transaction categorization pipeline to provide the flexibility and accuracy that customers across use cases such as underwriting, payments, financial operations and digital banking require. You can learn more about our approach and what makes us the most accurate transaction categorization solution on the market here.

Get Started with Ntropy

Ntropy self-serve dashboard

There are two ways that you can get started with Ntropy for transaction categorization.  You can sign up and get started in less than 5 minutes with our self-serve dashboard without needing to speak to us.  

Check our developer documentation, which includes quickstart guides for Python, the REST API and Postman!

Alternatively you can book a call to talk to our team!
