The new pursuit for a Mint alternative: powered by AI
Solving the “million users, million feature requests” problem with LLMs + Transaction Enrichment
With the recent improvements in language models we have been experimenting with new ideas that they enable. In this post we will showcase Cookie, a financial copilot leveraging Large Language Models (LLMs) and our transaction enrichment capabilities to provide categories and merchant information the LLMs can make use of.
We will outline the possibilities as well as dig deeper into the issues that still exist with current LLMs and what the future can look like.
Current state of banking apps and personal finance
Below are two examples of typical budgeting apps and Mint alternatives
All personal finance apps have standard features such as transferring money and displaying recent transactions broken down into spending categories and saving opportunities. Sometimes they’re bucketed into monthly trends of each category over time. All of the above are useful features, but these features are kept very generic to appeal to the broader user base. Each feature has to be researched and implemented by engineers, which takes time and resources.
Why personalisation matters
Personalization is the holy grail of finance. Here are some practical examples.
For instance, the “Eating out” category exists in most personal finance apps and is standard, but what if a user wants to dive deeper to find out how much they spent on eating fast food because they’re health-conscious? The feature in existing personal finance software has to be tailored to the average user, which does not perfectly satisfy the needs of everyone. And if the feature doesn’t make the cut, this user has to manually go through their transactions and count each fast food transaction manually. You end up in a world where users have to take on all the manual work to get a custom solution that fits their needs. Back to Mint again in order to get the experience you need.
In fact the top reason people are looking for Mint alternatives is not to have to do all the work. Most of the time you still have to settle with a very generic financial advice on how to manage money and where to allocate your personal capital.
For another example, let’s take budgeting and expense tracking. A lot of popular apps offer tools to identify recurring cash flow and subscriptions as well as help canceling them. However, doing this accurately enough is tough and no app will has enough context on you to find real new ways to save.
One of the key points fintech writer Alex Johnson made in his review of PFMs was to "be wary of the infrastructure you build your product on" and we couldn't agree more. Open banking and data aggregation alone is not enough because of the data that comes out is often very messy and difficult to understand.
Like in lending, in personal finance, context is king. The more recent iterations on Mint like Monarch Money are focused to solve this. Regardless of your net worth, the keyword in personal finance software is the word personal.
The perfect solution for your personal capital and money management can exist
In a perfect world, one would want a financial copilot for expense tracking , to create budgets and to save money, as well as for retirement planning and acting as a wealth management service. This however should be one that has full understanding of who you are and what you need, a place where we can ask questions in natural language about all our needs without being limited to a feature set that has been crafted to suit the average user. Given this we wanted to experiment how close we can get using Ntropy API + LLMs.
Cookie, the open-source skeleton to create Mint alternatives and your ideal budgeting app
At Ntropy we are building infrastructure to unlock the next generation of AI powered financial software. Our transaction enrichment solution already allows anyone to categorize their transactions, find merchants, locations of transactions and more.
You can get started within minutes here
To test the full potential of our offering, we made a version of a mobile budgeting app and called it Cookie to show what is possible today and how AI can help reach your financial goals.
The UI choice was in favor of the quickest possible option. We do not claim it is the best one.
Cookie is a Discord based budgeting app and bot. Its main purpose is to solve the issue of “a million users with a million feature requests”. This is the problem we mentioned above that most budgeting apps are facing.
Instead of hand coding generic features for all the users like the other apps, we rely on enriched transactions from Ntropy API and LLMs to provide personal answers and recommendations.
The next generation of budgeting apps and wealth management services
The first step to use Cookie for a new user is to link bank accounts. We support US and European bank connections for this specific application via Plaid for the former and Nordigen for the latter.
If they do not want to connect their bank account for privacy or other reasons, users can also submit transactions directly via a CSV file they have downloaded from their digital banking app.
Linking your bank accounts
Once a consumer has linked their bank account to the budgeting app , they can either write free-form messages to ask questions about their finances, or they can choose from a list of pre-made commands that are crowdsourced from the most popular questions asked. Cookie then takes the user’s question as well as their transactions that have already been pre-enriched by Ntropy in order to respond. It is both possible for the consumer to directly chat with the personal finance app to ask further questions or to ask Cookie to correct itself when it gets something wrong.
Now we can look into testing the actual queries. Let’s try to answer our previous question of how much we spent on fast food last month which as mentioned is something that existing banking apps struggle to answer. For this we use a free-form prompt where we directly ask the question.
Cookie tells us which fast-food restaurants we spent at, how much we spent and the total amount in an easy to understand format. You can also see the locations of restaurants.
Without Cookie, to do this in existing apps you would have to crack open a new spreadsheet to manually sift through all of your transactions, check the merchant names and/or categories and find the ones that are in the fast-food group, then sum the amounts.
At a first glance what Cookie returns looks good, however when adding up the actual amount spent we should get $172.11 but as you can see, Cookie returned $172.01, so is off by $0.10. This is a common problem with current LLMs. Outputs can be hallucinated and are difficult to verify. In this case, giving the LLM access to a calculator tool could help so it doesn’t have to do its own calculations. More on this later.
Ready made Commands
Now let us try some of the premade commands.
Here we used /essentials to get a breakdown of our transactions split by our essential and non-essential category. Most financial institutions like to know this about their users too as it is very indicative of their ability to pay and affordability.
Cookie, our new AI powered budgeting app, gives us a well formatted list of spend categories across all of our financial accounts that have been linked, thanks to running the transactions through Ntropy’s enrichment API, something we go into detail a little later. As you can see however with the above response from Cookie, the message is cut off because we ran into the maximum length limit that was set when calling GPT-4, another limitation which will be addressed later. Compared to other Mint alternatives, Cookie is still very raw and has a lot of issues that need to be addressed.
In the example here Cookie counted Pets and Education as non-essential, which we thought should be essential, so we asked it to correct itself in plain English. No matter how much into saving money and improving your spending habits you are , your pets need to be fed. There are other ways to cut spending in order to get to financial freedom. This functionality unlike Mint allows a lot of flexibility and cuts the time and effort to manually map and fix every single thing. Best Mint alternatives like Monarch Money for instance allow this too.
As expected after the correction it put those categories into essential while keeping the rest the same. Now we want to know the sum of the essential amounts.
Cookie gives us the answer we wanted and this time the amounts even add up to the correct number. Notably, it also did not include the credit transaction (a pay check) in this calculation. Pay checks or salary transactions are incoming, not outgoing and rightfully should not be included in the essential/non-essential spend breakdown.
Another command that we wanted to add is /saving_opportunities which shows you different ways in which one can save money and control monthly expenses . You can also add bespoke commands such as investment tracking or transactions grouping by merchants or amounts, in order to get an entire financial picture . No more google sheets, struggling to import data or trying to manually enter transactions. Being able to automatically sync transactions is the big relief most of the current mint alternatives offer.
All of the advice on savings seems reasonable and you get useful tips and hopefully more money as a result of using it , such as looking out for bank fees that were incurred that the account holder might have not seen.
Given the right amount of context, Cookie can go further to help you do better financially and to empower personal wealth . It can identify that you have a credit card payment that failed last month and if you want to make sure it does not fail going forward you need to cancel your trip you have pre-booked on Booking.com.
Preemptive action is the holy grail for personal finance software and it is definitely on the roadmap with current technology.
We also tried some more fun commands that have been made popular by existing PFM apps. The app known for this the most is Cleo. For instance we created a /roast command that makes Cookie tease you based on your spend. A copy cat feature with infinite possibilities.
Companies like Monarch and Parthean are already experimenting with AI-powered budgeting apps and finance assistants to provide a more personalized experience for consumers. They will not be the only ones. But what does it take to do it?
Behind the scenes
Here is a breakdown of what is happening behind the scenes, a lot of which is obfuscated from the end user. Our open source personal finance app is made to help you break this down.
Once a user has linked their bank account and the is consented to be shared, the user’s query as well as their transactions are fed into a Large Language Model to generate a response. In our experiments we used OpenAI’s GPT-3.5-Turbo and GPT-4 models. The former is faster and cheaper, but the latter generated more interesting and accurate responses so we decided to use GPT-4. Would be interesting to try Claude 2 or an open-source model.
- For this experiment we used Plaid for linking bank accounts in the US and Nordigen for European accounts. As mentioned before, users can also upload the transactions directly via a CSV, which allows for more flexibility. For investment tracking and estimating net worth, you can also use Plaid's Assets and Investments products. Here we kept things as minimal as possible.
- Secondly we used our Merchant Enrichment and Categorization products (check out our API docs here) to enrich the users transactions, which gives us reliable and information rich fields that are better suited to run queries on. Without this steps, the responses were worse and getting to a similar state of reliability was expensive. We have covered this in our benchmarking post.
- The fields Ntropy API returns are the following:
- Merchant name / website: who the merchant of the transaction was if there was one present
- Category: the category of the transaction, eg. Food and Drink or Bank fee (check out our consumer category hierarchy here and our business category hierarchy here)
- Location: where the transaction was made, for example using store ids within the transaction description
- We take the enriched output and the user’s query and create a prompt as a new input for GPT-4. We used the following fields for this:
- Amount and currency
- Merchant name and website
Again, here you can experiment even more and get better answers. The better the financial data and the input, the better the responses
GPT-4 limitations and specifics
With ordinary GPT-4 the maximum context length is around 6,000 words. This means we can’t just feed hundreds or thousands of transactions to the copilot by giving it every single field for every single transaction. We need to compress the transaction information in some way. The OpenAI API charges per token which is another motivation to keep prompts short. One input token costs $0.03 per 1000 and output tokens cost $0.06 per 1000. A long prompt with 8000 input tokens and 100 output tokens will thus cost around $0.25. Finally, a longer prompt means the latency will be larger as the LLM has to process more data. This adds a lot of friction to the end user experience.
One trick we used here when making the prompt is to group the transactions by the merchant. When we have dozens of transactions with the same merchant, we only need to spell out the merchant name and website once instead of many times without losing any information. The raw transaction description is also only given for the transactions where we don’t have a merchant.
Locations for the same merchant are often the same, so we also used one location instead of many. This is not always correct, but is one sacrifice we made in order to save on prompt size. Depending on your own needs and dials, you can adjust this.
The fields we used for every transaction are date, amount, currency code and the category. The encoded transactions then look something like this:
Amazon amazon.com2023–03–01 -14.99 USD Media2023–03–02 -43.21 USD eCommerce purchase2023–04–01 -14.99 USD Media2023–04–14 -120.01 USD eCommerce purchase2023–05–01 -14.99 USD Media
Transfer from savings2023–04–13 1500 USD Intra account transfer
Chipotle chipotle.com 620 9th Ave, New York, US2023–05–03 21.43 USD Food and Drink2023–05–07 13.98 USD Food and Drink
We use this together with a short system message explaining its behaviour, so GPT-4 knows how it should respond to the user’s question. With this format we could safely fit around 300 transactions in our prompt, which is around 2–4 months of the most recent transactions for a typical user.
Despite seeing interesting results, we encountered a number of blockers too while building the copilot. These are limitations that are present for anyone working on a similar product, hence we would like to share and address them.
Data privacy and security
Financial transactions are highly sensitive and private information. Leveraging a commercial LLM out of the box, the prompts are sent to OpenAI who will store them in the US. There are companies and end users who are fine with this as they are getting a completely free financial dashboard and the cutting edge of personal finance tools, but many are not.
This is especially a big issue in Europe where data is not allowed to leave the EU. Moreover not all the users would trust a commercial LLM with their transaction data. To track spending and access your bank account information in a more human readable way, this may seem too big of a sacrifice for some. In return you are getting a glimpse into a better financial future and how to enable it.
To address the privacy questions, Ntropy leverages its vast transactions cache made specifically for financial services. The tradeoffs between usability and privacy will always be there, yet we believe in giving the end users the dials to decide where, how and what they value most.
The latency for a full prompt is between 30 and 60 seconds. This is too long when a user wants real-time responses. We can try to reduce prompt size by reducing the number of transactions or by compressing the prompt further, but this isn’t always easy and risks losing valuable information. Given any existing budgeting tool is faster, this can be a serious adoption barrier.
A simple way to reduce latency is to only allow pre-specified prompts instead of free-form prompts. That way we could run all the commands when we get the bank transactions and cache them, so when a user runs one of the commands the response can be returned immediately. This is of course very limiting, and also potentially expensive, if there are too many commands. To differentiate from existing financial tools you want to allow the free form conversation.
Latency can also be improved by self-hosting an open-source LLM and fine-tuning it to this specific task set. We will be releasing our benchmarks here very soon. For now GPT-4 performance with the right prompting and using Ntropy’s Enrichment API is still superior to all out of box open source alternatives we have experimented with.
Another way to reduce perceived latency is to display the answer token-by-token using OpenAI’s streaming api. This is a common design choice that works and addresses the issue, at least partially.
Model flaws and hallucinations. Is this where financial AI goes to die?
Often the model will refer to transactions that do not exist or it will miss existing transactions in its answer. When asking for new restaurants nearby, it will sometimes give restaurants with locations where none exist, either because its knowledge is outdated and the restaurant closed, or because it just made it up. Imagine you have to sync bank accounts, your savings account and investment accounts and are hoping for the smartest ever budgeting tool and copilot and instead you get made up suggestions. This is a user experience and reputation nightmare for the company behind the app. Many users will not hesitate to go to Reddit and other places on the internet to complain.
When doing math such as summing amounts in the example above, it will often get the amounts wrong. This could be improved by letting the model use a calculator instead of doing the math itself, eg. by evaluating expressions generated by the model, or by using a command-based approach like LangChain and Transformers Agents. These enable LLMs to use calculators and interact with other external tools by themselves.
Ideas for improvement
There are many avenues to improve from where we are today in order to create a truly transformational and delightful financial experience for end users. These do not only rely on model performance enhancements but are also closely related to design and product choices. Hence we are bullish on the best of product teams to get ahead of the game here.
Beyond queries and questions: enabling financial actions
Currently one can only ask for information related to their transaction data. Designing a 10x customer experience, you could take this up a notch and allow users to get recommendations for merchants and make bookings and purchases directly from the app. This means not just advice on how to spend money and budgeting features but real actions based on your personal finance situation.
Moreover, moving beyond the budgeting tools use case, a command line for your personal finance can be the perfect interface to sort what and when you should buy the next thing and at what price. This is way more powerful than getting boxed into your interests on social media or merely following your intent on search engines.
Addressing the context length issue
As mentioned before, the context of GPT-4 is limited to around 6000 words. There is another version of GPT-4 which supports 4 times as many words, and Anthropic supports 100,000 tokens. Using these would allow us to fit more transactions into our prompt.
We could also add more information to our prompt. With the APIs we used we could also get the balance history of the account which would help with things like budgeting and retirement estimates.
Location data is paramount both for hyper personalizing recommendations, but also for alerting on potentially fraudulent activity.
Here is the summary of key conclusions we came to while running this experiment
- Augmenting raw data and smart prompting were critical for accuracy
- Our enrichment capabilities proved to be critical to augmented the raw bank data so that the LLM could give more accurate answers.
- We have been talking about personal finance for a while now, but the reality is most out of the box tools are not that personal. Transaction data, Ntropy’s enrichment products and LLMs are a powerful combination to create the next generation of truly contextual and personal finance apps, savings and investment tools .
- Moving beyond questions into agents capable of performing actions on your behalf empowered by your financial incentives and information is the next logical step. Early iterations on this look promising. This is the world where money moves into the background and is a supporting act powering one’s goals and dreams throughout their life.
- Depending on your or your user's preferences, products should account for multiple dials optimizing for privacy, reliability or cost. We are currently working in this direction at Ntropy.
Want access to try Cookie?
Cookie is currently an experiment. There are commercial efforts to build AI based financial copilots. Monarch's assistant is the one we love.
Do reach out if you have any questions. Take a look at our open source repo for ideas and inspiration or a skeleton to build on.