Financial services & data science

Every day, enormous amounts of data are generated and stored, and these data can contain a lot of valuable information. One of the industries this applies to is finance. Each day millions of people and companies purchase goods in shops all around the world (both in person and online), and they arrange almost all of their financial affairs online. Many technical improvements have been made to make this possible. Within the financial industry these improvements are often referred to as Fintech, a term used to describe new technology that seeks to improve and automate the use of financial services[1]. A staple of the Fintech domain is the use of data science techniques. In this article we discuss several areas within the finance industry where data science techniques can shine.

Financial insights

As finance is such an important part of life, reliable insights into financial situations can add a lot of value for both companies and people around the world.

When people are in financial trouble, the first step is to get an overview of all incoming and outgoing payments. This shows them where their money goes, so they can adjust their budgets accordingly. Doing this by hand can be a tedious and time-consuming process, and data science can be used to generate these insights automatically. Think of transaction classification: every time you pay with your debit card at the supermarket, the banking application automatically classifies the transaction as ‘Supermarket’. By doing this for all transactions it is possible to automatically generate a report that shows where all the money went during the past month. Combine that with a cash flow forecast based on a time series model, and an individual gets quite a good insight into their own financial situation. This applies not only to individuals but also to businesses. These insights can be useful to financial institutions that offer such a product to their customers, whether businesses or individuals, and also to individual companies that want to develop their own real-time financial insights.
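As a rough illustration of transaction classification and the monthly report described above, the sketch below uses simple keyword matching. The keyword lists, category names and transactions are all made up for this example, and a production system would more likely learn the categories from labelled transactions than rely on hand-written rules.

```python
from collections import defaultdict

# Hypothetical keyword-to-category rules; a real system would more likely
# learn these from labelled transactions.
CATEGORY_KEYWORDS = {
    "Supermarket": ("supermarket", "grocer"),
    "Transport": ("railway", "fuel", "parking"),
    "Housing": ("rent", "mortgage"),
}


def classify(description: str) -> str:
    """Return the first category whose keyword occurs in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"


def monthly_report(transactions):
    """Sum outgoing amounts per (month, category).

    Each transaction is a (date 'YYYY-MM-DD', description, amount) tuple,
    with negative amounts for outgoing payments.
    """
    report = defaultdict(float)
    for date, description, amount in transactions:
        if amount < 0:
            report[(date[:7], classify(description))] += -amount
    return dict(report)


# Made-up example transactions, for illustration only.
transactions = [
    ("2024-01-03", "CITY SUPERMARKET 0123", -54.20),
    ("2024-01-05", "Monthly rent January", -950.00),
    ("2024-01-09", "National Railway ticket", -18.50),
    ("2024-01-17", "City Supermarket 0123", -31.80),
    ("2024-01-25", "Salary", 2400.00),
]

report = monthly_report(transactions)
for (month, category), total in sorted(report.items()):
    print(f"{month}  {category:<12} {total:8.2f}")
```

The per-month totals produced this way could also serve as the input series for a cash flow forecast.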

Fraud detection

For financial institutions the detection of fraudulent activities is of great importance. In the past this was mainly done with a rule-based approach: whenever certain criteria were met, the suspicious activity was flagged or blocked. Because the rules are static, they do not adapt when fraudulent behaviour changes, which can greatly affect how well fraudulent activities are found. To address this, different data science techniques can be used, in particular anomaly detection. The goal of anomaly detection is to find observations that stand out from the majority of the data. In some cases simple techniques can be quite effective, especially when the dimensionality of the problem is low. In those cases it can be handy to apply a dimensionality reduction technique, such as PCA or t-SNE. After the data are reduced to, for example, two dimensions, we can plot the results and see whether there are outliers that behave differently from the rest of the data points. These can then be investigated further for possible fraudulent activity.

Fig 1: A 2D plot with one clear outlier.

When the dimensionality of the data increases, these methods become less effective and we have to use different, more advanced techniques such as deep learning.
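For the low-dimensional case, the project-and-inspect approach can be sketched as follows. This is a minimal example on synthetic data, assuming NumPy is available. PCA is computed here through a singular value decomposition, and the flagging rule (a robust z-score on the distance from the centre, with a threshold of 5) is an arbitrary choice for illustration, not a recommended fraud rule.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data for illustration: 200 "normal" observations with five
# features, plus one observation that deviates strongly.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
outlier = np.full((1, 5), 8.0)
X = np.vstack([normal, outlier])

# PCA via singular value decomposition of the centred data.
X_centred = X - X.mean(axis=0)
_, _, components = np.linalg.svd(X_centred, full_matrices=False)
projected = X_centred @ components[:2].T  # shape (201, 2)

# Flag points that lie far from the centre of the 2D projection, using
# the median and the median absolute deviation (MAD) as robust statistics.
distance = np.linalg.norm(projected, axis=1)
median = np.median(distance)
mad = np.median(np.abs(distance - median))
robust_z = (distance - median) / (1.4826 * mad)
suspects = np.flatnonzero(robust_z > 5.0)

print("Indices flagged for further investigation:", suspects)
```

The two columns of `projected` can be drawn as a scatter plot to inspect the outliers visually, as in Fig 1.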

Risk analysis

Companies and individuals often have to take actions that carry some sort of risk. This could be continuing with a project that is not yet profitable and might cost a lot of money without delivering future benefits, or crossing at a red light to be just in time for an appointment instead of waiting for the green light. For these kinds of actions we assess the risks and the possible rewards; in other words, we apply some sort of risk analysis. The end goal of these analyses is to maximize the reward and minimize the risk. Within the financial industry different risk questions are analysed constantly, such as the creditworthiness of a company, the probability of someone defaulting on their loan, or the possibility of a customer churning within the next three months. To answer questions like these, many different factors have to be taken into account. This is where data science techniques such as supervised learning come into play. These techniques can identify underlying relationships between the factors and then answer the question based on historical data. For example, if we want to know the probability that a customer will stop doing business with the company within three months, we need to analyse the data of that customer to make an estimate. Some factors to take into consideration could be:

  • How long has the customer been a client of the company?
  • Has the customer been in contact with the company in the past six months?
  • Has the customer used the provided service in the past six months?

There are most likely many more factors, and the combination of them can contain very interesting insights. Some of these insights may already be known, but new, unexpected information may also emerge, or long-held assumptions may turn out to be less important than imagined. Insights like these can lead to a model that can accurately predict whether a customer will indeed stop doing business with the company within the next three months. Knowing that a certain customer might churn, the company can proactively try to retain the customer by, for example, offering a more personal approach, a discount or another measure to maintain a healthy client relationship.
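To make the churn example more concrete, the sketch below fits a small logistic regression, one possible supervised learning technique, on the three factors listed above. The training data are invented for illustration, and the learning rate and number of epochs are arbitrary. In practice one would use far more data and factors, and most likely an established machine learning library.

```python
import math

# Made-up training data, for illustration only. Each row holds
# (years as a client, contact in past six months, service used in past
# six months); the label is 1 if the customer churned within three months.
customers = [
    ((0.5, 0, 0), 1),
    ((1.0, 0, 0), 1),
    ((1.5, 1, 0), 1),
    ((2.0, 0, 1), 0),
    ((3.0, 0, 0), 1),
    ((4.0, 1, 1), 0),
    ((6.0, 1, 1), 0),
    ((8.0, 0, 1), 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features) -> float:
    """Estimated probability of churn for one customer."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit(data, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [
            w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


weights, bias = fit(customers)
at_risk = predict_proba(weights, bias, (0.5, 0, 0))
loyal = predict_proba(weights, bias, (6.0, 1, 1))
print(f"short tenure, inactive: {at_risk:.2f}")
print(f"long tenure, active:    {loyal:.2f}")
```

The fitted weights indicate how each factor pushes the estimated churn probability up or down, which is one way the underlying relationships mentioned above become visible.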

The risk of customers churning is only one example of data science being used to assess possible risks. Data science and artificial intelligence also play a role in, for example, the application of more advanced algorithms to assess credit risk. Traditionally, classic linear algorithms such as logistic regression (logit) were used in credit models, but recently more advanced, possibly non-linear algorithms are being used in risk assessment processes. Another way in which AI can support a company’s risk management process is by assessing and modelling more qualitative risk indicators: by trawling various sources of information on an organisation (e.g. social media, news sites) and extracting terms from these sources with NLP, classification models can be trained to improve the risk categorisation of possible customers.

Customer satisfaction

Closely related to customer churn is customers’ satisfaction with the company. As finance is such an important part of life, customers, especially those of financial institutions, should feel safe and satisfied with the product they get from the company. This requires knowing what the customer wants and needs. To get this information, it is possible to look at the different ways customers interact with the company: online reviews, email contact, phone call transcriptions or other forms of unstructured narrative data, which can contain complaints and compliments. By collecting these data and applying natural language processing techniques, such as sentiment analysis, it is possible to get a clearer insight into what needs to be improved as well as what customers are happy with. Acting on these insights can improve customer satisfaction and hopefully keep customers more loyal to the company.
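A very simple form of sentiment analysis counts positive and negative words in each message. The sketch below does this with tiny hand-made word lists and invented feedback messages. It is only meant to show the idea, since real sentiment models rely on much larger lexicons or on trained classifiers.

```python
import re
from collections import Counter

# Tiny hand-made word lists, purely illustrative.
POSITIVE = {"helpful", "fast", "friendly", "easy", "great", "clear"}
NEGATIVE = {"slow", "confusing", "rude", "expensive", "error", "unclear"}


def tokenize(text: str):
    """Split a message into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Positive minus negative word count; above zero reads as positive."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in tokenize(text))


def frequent_complaint_terms(messages, top_n=3):
    """Most common negative words across negatively scored messages."""
    counts = Counter()
    for message in messages:
        if sentiment_score(message) < 0:
            counts.update(w for w in tokenize(message) if w in NEGATIVE)
    return counts.most_common(top_n)


# Made-up feedback messages, for illustration only.
feedback = [
    "The app is fast and easy to use, great work.",
    "Support was slow and the fee overview is confusing.",
    "Friendly staff, but the transfer was slow.",
    "Confusing statement and an error during login.",
]

for message in feedback:
    print(sentiment_score(message), message)
print(frequent_complaint_terms(feedback))
```

The most frequent terms in negatively scored messages give a first indication of what needs to be improved.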

Another field where data science can add value to customer satisfaction is personalised offers. Analysing customers’ characteristics and behaviour (e.g. transactions, credit use, product portfolio) and comparing them to similar customers can give an idea of other products or services that might interest them. The outcome can then be used to further tailor customers’ product and service offerings. When done properly this will lead to higher customer satisfaction and to more sales for the company itself. Products like this are obviously only possible when they comply with the GDPR. The same concept can be applied to the company’s website by analysing the website traffic. Different website visitors search for different things, and by analysing their behaviour it is possible to recommend specific products to different groups. Furthermore, analysing the website traffic data can show whether customers can find what they are searching for. With this in mind it is possible to make targeted website adjustments that add to the satisfaction and experience of the customer.
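Comparing a customer to similar customers can be sketched with a nearest-neighbour approach. In the example below the product list, the customer holdings and the choice of cosine similarity with two neighbours are all assumptions made for illustration. A real system would use richer behavioural data and would have to respect the GDPR constraints mentioned above.

```python
import math

# Made-up product holdings, for illustration only (1 = customer has product).
PRODUCTS = [
    "current account", "savings account", "credit card", "mortgage", "insurance",
]
holdings = {
    "A": [1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 1],
    "C": [1, 0, 0, 1, 1],
    "D": [1, 1, 0, 0, 0],
}


def cosine_similarity(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_products(customer, holdings, k=2):
    """Rank products the customer lacks by how many of the k most
    similar customers hold them."""
    target = holdings[customer]
    neighbours = sorted(
        (other for other in holdings if other != customer),
        key=lambda other: cosine_similarity(target, holdings[other]),
        reverse=True,
    )[:k]
    scores = {}
    for i, product in enumerate(PRODUCTS):
        if target[i] == 0:
            votes = sum(holdings[n][i] for n in neighbours)
            if votes:
                scores[product] = votes
    return sorted(scores, key=scores.get, reverse=True)


print(suggest_products("D", holdings))  # ['credit card', 'insurance']
```

Customer D is most similar to customers A and B, who both hold a credit card, so that product is ranked first.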

Points of attention

As one can see, there are many interesting ways in which data science can deliver value in the financial services sector. However, there are a number of important points to keep in mind when venturing into the world of data science in this specific industry: the financial crisis of 2008 and the inherently sensitive nature of (some of) the data that can be used have resulted in increased scrutiny from both consumers and regulatory bodies when it comes to using data to add value.

For example, when using data for a more personal customer approach one has to take the creep factor into account: due to the sensitive nature of financial data, customers may become uncomfortable when confronted with the notion that a financial institution knows so much about their situation. This may make customers less welcoming to personalisation efforts from a bank than they would be to those from, for example, an online retailer. So when using personalisation techniques it is important to get them just right, since people may become annoyed or creeped out more quickly when an offer is not entirely relevant. Another point to note is that scrutiny by regulatory bodies may limit the use of data science techniques. When algorithms are used to arrive at, for example, a credit score or a particular piece of advice for a client, the workings of the algorithm may need to be explained when audited. This limits the use of black box models such as certain types of neural networks or deep learning, since the inner workings of the model and the relative importance of the features fed into it are difficult to explain.


With finance being a part of everyday life, and given the amount of data this generates each day, the opportunities described in this article are just a small part of all the possibilities within the financial industry. The opportunities to improve and automate financial services in the future therefore seem endless, whether they improve the business from the inside or provide a better service to the customers of the company.

If you are interested in the possibilities that data science can offer to you and the company you work for we would like to get in touch with you to see what the possibilities are and how we can achieve the data-driven ambitions of the company together.