Yes, for a company that provides transaction categorization, we know how controversial our title sounds. We’re not saying you can’t build a really good general transaction classifier (we can and we have!), but the truth is, no matter what you do, you can’t build a solution that will satisfy all people at all times.
“Fine,” you might say, “then we’ll just build a transaction classifier in-house.” The appeal of owning a pipeline end-to-end, with the ability to tweak and tune as you see fit, is enticing. Unfortunately, in-house models are inadequate and brittle. Transaction categorization is inherently a long-tail problem. If I were to hand you 1 million transactions, you might find that 500,000 of those belong to the same 300 companies, but the other 500,000 belong to 100,000 different companies. This leaves you with two options: build a narrow model that covers the head and gives up on the long tail, or try to cover the tail too and end up building a general model yourself.
Ironically, to build the best specialized model, you also need to build the best general model. Transaction classification requires access to expansive databases and knowledge graphs (continuously updated as businesses constantly come and go), as well as exposure to a diverse array of transaction formats and syntaxes. For any practitioner, this is a daunting task, leaving most with no choice but to opt for external general transaction categorization models, no matter how imperfect.
At Ntropy, we refuse to accept the status quo, and we’ve found a better way to understand transactions by discarding the premise that offering general transaction categorization is the only option. We’ve built and released Ntropy Custom Models, which allows users to build customized categorization models, combining both the power and expansiveness of the Ntropy General Models, with the finesse and precision of specialized in-house models. We encourage you to read more about how it works, get the main talking points, and see some code.
This post isn’t about customization though.
It’s about convincing you that general transaction categorization is the wrong way of understanding your financial data.
It’s about understanding the story that transactions tell.
It’s about understanding what it means to categorize a transaction.
By the end of this post, we hope you’ll understand why transaction categorization is hard, what information exists in a group of transactions, and how we can build smarter insights 💡 from transaction data.
At a high level, there are three main things that degrade the performance of a transaction classifier.
The typical user approach is to map the output of our general transaction classifier onto their internal set of categories. This process is both inefficient and error-prone.
Label mappings (mapping every category in our hierarchy to one in yours) can also fail when they depend on more than just the label. An important example is the “loans” category. If the user has both “loan payments” and “loan disbursements” as categories, then it is impossible to map from our label to theirs using the label alone. To do so, they would need additional logic that checks the transaction entry type, assigning disbursement to incoming and payment to outgoing transactions.
From a practical standpoint, label mappings carry infrastructure burdens as well. How should code adapt to changes in the Ntropy set of categories? In general, we never change categories once they are public, but we will add categories. In some cases this could break workflows, such as when the new category is not the child of an existing category in the user’s mapping.
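To make these pitfalls concrete, here’s a toy sketch of such a mapping layer. The category names on both sides are invented for illustration, not Ntropy’s real hierarchy, and the fallback behavior is one possible design, not a recommendation:

```python
# Hypothetical label-mapping layer. All category names are illustrative.
NTROPY_TO_INTERNAL = {
    "restaurants": "meals",
    "cloud computing": "software",
    # "loans" is intentionally absent: it cannot be mapped from the
    # label alone, because this user distinguishes payments from
    # disbursements.
}

def map_category(ntropy_label: str, entry_type: str) -> str:
    # Labels that need extra logic beyond a 1:1 lookup.
    if ntropy_label == "loans":
        # Incoming money is a disbursement; outgoing is a payment.
        return "loan disbursements" if entry_type == "incoming" else "loan payments"
    # Fall back to a catch-all so a newly added upstream category
    # degrades gracefully instead of crashing the pipeline.
    return NTROPY_TO_INTERNAL.get(ntropy_label, "uncategorized")

print(map_category("loans", "incoming"))               # loan disbursements
print(map_category("restaurants", "outgoing"))         # meals
print(map_category("brand new category", "outgoing"))  # uncategorized
```

The catch-all default is exactly the kind of defensive logic a mapping forces you to maintain.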
Transactions are small records describing the exchange of money between two parties. A transaction can contain myriad types of data, but there are three principal fields that it absolutely must contain: description, amount, and entry type. Here’s an example of a (fake) transaction:
Description: ACH PULL ORIG CO NAME:Wood LTD ORIG ID:400005498 EED:210510 INDN:Door LLC SEC:CCD PMT DET:MATER
Amount: $5000 USD
Entry type: Outgoing
which can be summarized as
“Door LLC sent $5000 to Wood LTD for building materials”.
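That example can be held in a minimal structure with exactly the three required fields. The field names below are our own shorthand, not the Ntropy API schema:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    description: str   # the raw bank string
    amount: float      # in the account's currency
    entry_type: str    # "incoming" or "outgoing"

# The (fake) transaction from above.
tx = Transaction(
    description=("ACH PULL ORIG CO NAME:Wood LTD ORIG ID:400005498 "
                 "EED:210510 INDN:Door LLC SEC:CCD PMT DET:MATER"),
    amount=5000.0,
    entry_type="outgoing",
)
print(tx.entry_type)  # outgoing
```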
Broadly, the goal of transaction categorization is to convert the raw snippet into the summary sentence. In doing so, there are three distinct steps that take place. In the first step, we need to extract all of the relevant information, including the sender and receiver of money, the amount, the entry type, people’s names, dates, locations, and any natural language descriptions. In the second step, we have to use some kind of external database to figure out who and what the named entities we found are (e.g., we need to confirm with Google that Door LLC does in fact sell doors). Then, in the third and final step, we need to piece all of that information together into what makes the most sense logically and assign a category (in this case, a payment for building materials).
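As a rough sketch of those three steps, here’s a toy pipeline. The regexes, the dictionary standing in for an external knowledge base, and the category rule are all illustrative placeholders, not how our production system works:

```python
import re

# Step 2 stand-in: in reality this is a continuously updated entity
# database / knowledge graph, not a dict.
KNOWLEDGE_BASE = {"Wood LTD": "lumber supplier", "Door LLC": "door manufacturer"}

def extract(description: str) -> dict:
    """Step 1: pull the named entities out of the raw ACH string."""
    def grab(pattern):
        m = re.search(pattern, description)
        return m.group(1).strip() if m else ""
    return {
        "receiver": grab(r"CO NAME:(.*?) ORIG ID:"),
        "sender": grab(r"INDN:(.*?) SEC:"),
        "detail": grab(r"PMT DET:(.*)$"),
    }

def resolve(name: str) -> str:
    """Step 2: figure out who a named entity actually is."""
    return KNOWLEDGE_BASE.get(name, "unknown entity")

def categorize(description: str, entry_type: str) -> str:
    """Step 3: combine the evidence into a category."""
    parts = extract(description)
    receiver_kind = resolve(parts["receiver"])
    if (entry_type == "outgoing"
            and "supplier" in receiver_kind
            and parts["detail"].startswith("MATER")):
        return "building materials"
    return "uncategorized"

desc = ("ACH PULL ORIG CO NAME:Wood LTD ORIG ID:400005498 "
        "EED:210510 INDN:Door LLC SEC:CCD PMT DET:MATER")
print(categorize(desc, "outgoing"))  # building materials
```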
Why is transaction categorization tough?
Hopefully the above process seemed relatively straightforward. Unfortunately, nearly every single one of those steps can quickly become intractable. Here are three quick examples: (a) the merchant name can be corrupted or abbreviated beyond easy recognition; (b) the same name can resolve to several different real-world organizations; and (c) even with both parties identified, the purpose of the payment can remain ambiguous.
Fortunately, we can solve or get around most of these problems. (a) Name corruptions have patterns that our models can learn (usually deletions that preserve the spoken sound, like amazon -> amzn). (b) Given an account holder’s industry, plus the amount, location, and entry type, some candidate organizations are much more likely than others; we can also filter based on the popularity of an organization. (c) Given the amount and the transaction history, we can nudge predictions towards materials instead of equipment, though it’s still not totally obvious which to choose.
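For intuition on (a): since many corruptions are deletions, a candidate abbreviation should appear as an in-order subsequence of the full merchant name. This is a toy heuristic, not Ntropy’s actual matcher:

```python
def is_deletion_of(abbrev: str, full_name: str) -> bool:
    # True if `abbrev` can be formed by deleting characters from
    # `full_name` while keeping the remaining ones in order.
    it = iter(full_name.lower())
    return all(ch in it for ch in abbrev.lower())

print(is_deletion_of("AMZN", "Amazon"))   # True
print(is_deletion_of("AMZN", "Walmart"))  # False
```

(The `ch in it` trick advances the iterator, so each character must be found after the previous one.)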
All of these solutions depend on one absolutely critical component that will differ for every single user: context. Even in solution (a), we are making the assumption that our transactions are in English, which can fail spectacularly if not true. Suppose we have a transaction “PIX PMT”. If we knew the transaction were in Portuguese, we could reasonably infer that PIX is the Brazilian payment processor, but if the transaction were in English, it might be more probable that pix is a shorthand for the word “pictures”. In solution (b), Spectrum is a major cable company in the U.S., but in India it’s a major clothing company. And in (c), we are making the (highly probable) assumption that a door company sending money to a wood company is for supplies to build doors. But this is still an assumption, and all categorization models need to decide what qualifies as a reasonable assumption. There’s always the possibility that this transaction could be for something else, like wood to make facilities improvements.
We still haven’t defined what it means to categorize a transaction. Let’s do that now. The problem is as follows:
Given a transaction consisting of a snippet of text as well as further pieces of metadata such as amount, entry type, date, account_holder, etc., label this transaction as one of several human-understandable categories.
We’ll work through the pieces of this definition case by case.
More data equals better results. However, not all input fields are equally important. In order from most to least important, the ranking looks something like:
Description => Entry Type => Amount => Account Holder Type (business, consumer,…) => ISO Currency Code and Country => Account Holder => Date.
Accuracy is proportional to how many of these fields you correctly supply, and, like a Jenga tower, as soon as you start removing pieces you risk collapsing the whole thing. We won’t discuss how to perform cleaning, deduplication, and error removal on input data; that deserves its own post. Here, we will focus on one sneaky field: account holder. Why is this field so sneaky? Because it provides the much-needed context we discussed in the previous sections. Without the account holder ID, we can only work at the single-transaction level, which prevents us from properly finding things that require an account history, like recurrence or fraud.
Account holder is the most natural way to mark a particular set of transactions as different from another. It also suggests that the most accurate solutions should be customized at either the user or account holder level. Instead of supplying repetitive, but useful, metadata such as the account holder industry, it can be more effective and efficient to build custom classifiers per user/account holder.
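A toy illustration of why the account holder ID matters: recurrence only becomes visible once transactions are grouped per holder. The data and the tolerance threshold below are fabricated for the sketch:

```python
from collections import defaultdict
from datetime import date

# (account_holder_id, merchant, date) triples — fabricated data.
txs = [
    ("holder_1", "NETFLIX.COM", date(2023, 1, 5)),
    ("holder_1", "NETFLIX.COM", date(2023, 2, 5)),
    ("holder_1", "NETFLIX.COM", date(2023, 3, 5)),
    ("holder_2", "NETFLIX.COM", date(2023, 3, 17)),
]

# Without the holder ID, these four rows are indistinguishable noise.
by_holder_merchant = defaultdict(list)
for holder, merchant, day in txs:
    by_holder_merchant[(holder, merchant)].append(day)

def looks_recurring(dates, tolerance_days=3):
    # At least 3 occurrences with roughly constant gaps between them.
    if len(dates) < 3:
        return False
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return max(gaps) - min(gaps) <= tolerance_days

for key, dates in by_holder_merchant.items():
    print(key, looks_recurring(sorted(dates)))
```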
Everybody wants one, and only one, category assigned to every transaction. It makes sense why; such a system is easy to understand, fits nicely in data pipelines, and allows one to unambiguously group transactions. The only problem is, it’s an impossible task.
In our first iteration of the Ntropy categorization API, this problem was our central concern. Our solution was to provide multi-label classification. Thanks to composability, this greatly increased the expressibility of our model. Instead of, say, C fixed classes, there now existed 2^C possible classes, one for each combination of output labels (to put that in perspective, 100 classes would mean roughly 10³⁰ possible outputs). In practice, we truncated to at most 4 labels per transaction, so the number of classes was more like C choose 4, which for 100 classes is still a whopping ~4 million. To see this in action, consider a Netflix subscription. It would be composed of three labels: television, subscription, entertainment.
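As a sanity check on that combinatorics, the number of unordered label combinations for C = 100 base labels can be computed directly:

```python
from math import comb

C = 100
# Sets of exactly 4 labels out of 100.
print(comb(C, 4))                              # 3921225
# All sets of 1 to 4 labels — the effective class count.
print(sum(comb(C, k) for k in range(1, 5)))    # 4087975
```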
There was a glaring issue with that approach though: millions of labels is far too many. Ain’t nobody got time for that. The point of categorization is to make things simpler, not more complex.
In our second iteration of the Ntropy categorization API, we had no choice but to adopt a single-label classifier. Sure, no such classifier can ever reach 100% accuracy, but that doesn’t mean we can’t get close. There are two degenerate ways to “hack” 100% accuracy though, and they give us insight into the tradeoffs involved: use one category so broad it contains everything, or use so many hyper-specific categories that every transaction fits one exactly.
It may seem surprising that such opposite limits can both yield “100% accuracy”, but it tells us a lot about what we hope to achieve. A good set of categories should have just enough categories to fit in a human’s memory (7 ± 2 seems to be the natural limit), while still carrying meaning. Perhaps Operating Expenses is too general and Electric Bills too specific, but Facilities Expenses might be just right. With enough thought and effort, it’s possible to build a reasonably self-consistent, broad, and usable set of categories. However, once you do, you’ll eventually remember why the single-class method was only iteration two of the Ntropy API.
In a single-class categorization system, class overlaps are inevitable. Consider the two categories of wages and taxes. They seem rather distinct, until you come across a transaction like “GUSTO PAYRLL TAX 693557 5bmsdkgmmo MONEYMAN LLC” (easter egg: grab some good headphones and look up that company ;)). This is payroll tax. Does that count as wages or as taxes? Is wages supposed to include the overall cost of an employee (which would include their taxes as well), or simply the after-tax compensation?
Thankfully, in this case, and also in the majority of other cases as well, this problem is solvable.
To fix the category overlap problem, all we need to do is tell our models how to make the choice! This is where customization shines. Machine learning is incredibly adept at finding patterns in data, and so long as enough examples of classifying payroll tax are supplied, our algorithms can pick up on the pattern. The question of how many is “enough” can be tough (and will be addressed more quantitatively later), but intuitively, it helps to think about how many sample transactions a human would need to understand what’s going on.
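As a toy illustration of “telling the model how to make the choice”, here’s a 1-nearest-neighbour classifier over token overlap, trained on a handful of labelled examples. The examples and labels encode one hypothetical user’s decision that payroll tax belongs under taxes; a real custom model is far more capable than this sketch:

```python
# A few user-labelled examples resolving the wages/taxes overlap.
LABELLED_EXAMPLES = [
    ("GUSTO PAYRLL TAX 693557", "taxes"),
    ("GUSTO PAYROLL NET WAGES EMP 001", "wages"),
    ("IRS USATAXPYMT Q2 EST", "taxes"),
]

def classify(description: str) -> str:
    # Pick the label of the example sharing the most tokens.
    query = set(description.lower().split())
    def overlap(example):
        return len(query & set(example[0].lower().split()))
    return max(LABELLED_EXAMPLES, key=overlap)[1]

print(classify("GUSTO PAYRLL TAX 123456 MONEYMAN LLC"))  # taxes
```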
When we try to explain this concept, we usually start with an anecdote. Over the many customers we’ve worked with, there is one category that we’ve seen consistently pop up: Special Transactions.
The principal difficulty with this category is that you, we, and anyone else who didn’t create it have no idea what it means. One person’s trash is another person’s treasure, and one person’s “special transaction” is another person’s “yeah, who cares about that”. There is no hope that a general transaction model could ever classify this category correctly, and even if it could, it would necessarily come at the cost of classifying it incorrectly for someone else!
We can quickly see that customization is the only solution that untangles things. However, this section doesn’t stop here, because even with model customization, some categories remain difficult: they require learning an unknown or indeterminate pattern.
To illustrate, suppose that I give you 300 plates of food: 100 dishes from each of the Japanese, Italian, and Moroccan cuisines. If I asked you to build a classifier that tastes 50 dishes from each cuisine and then predicts the next 50, I imagine you would be able to build a pretty good model. The categories are well defined, and there should be patterns that distinguish one cuisine from another. Now let’s make it harder. Let’s say I want you to additionally classify whether or not a dish is considered a “special dish”. Let’s consider three possible strategies for marking dishes as special, each of which illustrates a different phenomenon:
A dish is special if I flip a coin 5 times, and it comes up heads 4 times.
Dorayaki, Lamb Tagine, and Cacio e Pepe are all considered special.
A dish is special if its name consists of two, and only two, words.
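Written out as code (interpreting the first rule as exactly 4 heads, which is our assumption), the three strategies make the learnability gap obvious: rule one is pure noise, rule two is a lookup that can only be memorized, and rule three is a pattern a model can actually generalize:

```python
import random

def special_by_coin(rng: random.Random) -> bool:
    # Rule 1: unlearnable — the label is independent of the dish.
    flips = [rng.random() < 0.5 for _ in range(5)]
    return sum(flips) == 4

# Rule 2: memorizable, but won't generalize to unseen dishes.
SPECIAL_LIST = {"Dorayaki", "Lamb Tagine", "Cacio e Pepe"}

def special_by_list(dish: str) -> bool:
    return dish in SPECIAL_LIST

def special_by_rule(dish: str) -> bool:
    # Rule 3: a learnable pattern — exactly two words in the name.
    return len(dish.split()) == 2

print(special_by_rule("Lamb Tagine"))   # True
print(special_by_rule("Cacio e Pepe"))  # False
print(special_by_list("Dorayaki"))      # True
```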
We’ve spent a lot of time trashing 🗑 general categorization models, and while it’s been fun doing so, we did it to drive home why we’re so excited about our solution to all of the problems outlined above: model customization. Model customization is a realization of something we’ve long strived for at Ntropy: the Data Network. Customization brings us one step closer by allowing every user to create high-performance individual classifiers, which can in turn be used to boost the performance of all other customized models. For more information, we suggest you check out our blogs.
If all of this sounds cool, you can also drop us a line at firstname.lastname@example.org, and we’d be happy to get you started with Ntropy Custom Models. Thanks for reading 👋!