The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.


Finding Value In Transaction Data: Part 2

December 9, 2015 at 9:50 AM

In the first part of our series on "Finding Value in Transaction Data" we explored a problem encountered by many organisations – how to identify and extract value from the ever-growing amount of data they hold.

I proposed a 3 step approach (the 3D approach) to realising value from a variety of large data sets:

  1. Determination – scour data sources to establish if and where there might be value
  2. Development – build models for the decision areas where value was identified, using data that is predictive of the chosen outcome
  3. Deployment – implement and run the developed models

In this post, we will explore the first step of the 3D approach: “Determination”.

How to eat an elephant…

A comment we frequently hear from clients is: “I have loads of data; I know there must be value in it, but I don’t know where to start!”

A mistake that is often made is that managers assume that the first steps in tackling this problem should result in a tangible outcome or product.

The problem is actually a lot bigger than this. It is more prudent to initiate an exploratory exercise to determine what value there is, for what purpose and in what area. This is the “determination” phase.

The determination phase involves the identification of valuable data within the data set(s).

The determination phase can be divided into the following steps:


1. Incorporating the data

Today credit granters, customer managers and marketers have access to a plethora of data sources both internal and external. The first step in the “Determination” phase is deciding which data source to explore while being aware that the data should be in a usable state once a solution is deployed. Types of data that might be available include:

  • credit bureau,
  • customer demographic,
  • internal behavioural,
  • transactional (e.g. retail purchases, mobile telemetry),
  • geo-data,
  • store data,
  • social-media data, to name a few.

These data sources should be selected, sourced and then linked to one another, typically by keys such as customer IDs, store codes and account numbers. Linking the data across sources will be critical in determining where the data might add value.
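As a minimal sketch of this linking step (the column names and values are illustrative, not from any real source), two extracts can be joined on a shared customer ID:

```python
import pandas as pd

# Hypothetical extracts from two sources
transactions = pd.DataFrame({
    "customer_id": [101, 101, 102, 103],
    "till_slip_value": [250.0, 80.0, 410.0, 95.0],
})
demographics = pd.DataFrame({
    "customer_id": [101, 102, 104],
    "age_band": ["25-34", "35-44", "45-54"],
})

# Left join keeps every transaction, even where demographics are missing
linked = transactions.merge(demographics, on="customer_id", how="left")
print(linked)
```

A left join is the safer default here: it preserves the full transaction history and simply leaves gaps where a customer cannot be matched, which is itself useful information about data coverage.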

Data sets comprise both structured and unstructured data. Structured data are typically fixed fields that can easily be grouped, analysed and modelled upon. Unstructured data are the rest – free text, such as Twitter tweets – and require a different sort of analytics to assess.

In this step of the process, data-cleansing is also essential. This involves the identification of valid (clean) data, the adjusting of data to make it usable and the understanding of the data universe to be analysed.
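A rough sketch of what such cleansing can look like in practice (the field names and validity rules are illustrative assumptions):

```python
import pandas as pd

# Hypothetical raw extract with typical quality issues
raw = pd.DataFrame({
    "customer_id": ["101", "102", "103", None, "104"],
    "till_slip_value": ["250.00", "-1", "80.50", "19.99", "abc"],
})

# Coerce to numeric; unparseable values become NaN rather than crashing the run
raw["till_slip_value"] = pd.to_numeric(raw["till_slip_value"], errors="coerce")

# Keep only rows with a customer ID and a sensible (non-negative) value
clean = raw.dropna(subset=["customer_id", "till_slip_value"])
clean = clean[clean["till_slip_value"] >= 0]
print(clean)
```

The point is less the specific rules than the discipline: record what was dropped and why, so the "data universe to be analysed" is explicitly understood rather than silently shrunk.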

Once data is linked within a relational database or in a single file format and it has been cleaned, aggregation can take place.

2. Aggregation

Aggregation is essential for analysing transactional or behavioural type data where trends need to be measured. Raw data typically lists single events which may have a degree of value. However, there is more value in single events when they are included in a group of events and measured in relationship to other events or over a period of time.

Aggregation is what credit bureaux have been doing with account data for years; similar work should be done with other transactional data.

Example of aggregated fields across various industries:

  1. Number of SMSs sent in the last month
  2. Cleaning products purchased as a percentage of all purchases this month
  3. Highest till-slip value in the last six months
  4. Average monthly spend in the last three months
  5. Minimum value of products viewed online
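A sketch of how a few such fields might be computed from raw till-slip lines (the data and field names are invented for illustration):

```python
import pandas as pd

# Hypothetical till-slip events, one row per shopping event
tills = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "month": ["2015-10", "2015-11", "2015-12", "2015-11", "2015-12"],
    "till_slip_value": [120.0, 340.0, 95.0, 60.0, 210.0],
})

# Roll single events up to one row per customer
agg = tills.groupby("customer_id")["till_slip_value"].agg(
    max_till_slip="max",        # highest till-slip value in the window
    avg_monthly_spend="mean",   # average spend (one slip per month here)
    n_shopping_events="count",  # number of shopping events
)
print(agg)
```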

Within the different transactional data sets – such as credit card, fashion purchases, mobile data, and e-commerce data – a variety of aggregation is possible. The key is to follow a methodical approach through event classification. For example, in transactional fashion retail:

  1. Categories (high/med/low fashion, men/women/children, clothing/apparel/other, premium/average/sale pricing, till-slip value, number of shopping events, store name)
  2. Time periods (1d, 7d, 1m, 3m, 6m, 12m)
  3. Metrics (number, average, maximum, minimum, worst, percentage)
Aggregation will often combine two or three of the above dimensions.
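Crossing those dimensions methodically is what keeps the exercise exhaustive rather than ad hoc. A small sketch of enumerating candidate aggregated fields (the category names are illustrative):

```python
from itertools import product

categories = ["high_fashion", "menswear", "sale_priced"]
periods = ["1m", "3m", "6m"]
metrics = ["count", "avg", "max"]

# Cross the three dimensions to enumerate candidate aggregated fields,
# e.g. "avg_sale_priced_3m" = average sale-priced spend in the last 3 months
candidate_fields = [f"{m}_{c}_{p}" for c, p, m in product(categories, periods, metrics)]
print(len(candidate_fields))
```

Even this toy example yields 27 candidates from 3 × 3 × 3; with realistic category lists the candidate set grows quickly, which is why the later value-identification step matters.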

Aggregation can be coded in a language or tool such as R, SAS or MS SQL. Ultimately, an aggregation process will be required when a solution goes live.

3. Identifying the target areas of value

Another important step in Determination is to identify target areas of value. A long list of outcome target areas can be identified with very little additional analytical effort. Areas of interest include:

  1. Probability of attrition/churn (in the next 1m/2m/3m)
  2. Probability of missing a payment/rolling (in the next 1m)
  3. Probability of missing three payments (in the next 6m/12m)
  4. Probability of increasing spend (by 20%/50%, e.g.)
  5. Propensity to take up a cross-sell or up-sell offer (1m/2m)
  6. Propensity to increase wallet-share (i.e. spend as percentage of spend at competitors)
  7. Propensity to make an (insurance) claim (1m/2m/3m)
  8. Propensity to migrate to a high value segment/cluster (1m/2m/3m)

Once these areas of value have been identified and the time period set, you’ll be ready to aggregate your data.

Stay tuned for my next blog when we look at the final 2 steps in the Determination phase of finding value in your transaction data.

Note: Your aggregated/observational data may have to be a few months old to allow sufficient time between observation and outcome.


Image Credit: Freepik

Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 13 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
