The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.


Finding Value In Transaction Data (Part 3)

December 15, 2015 at 3:12 PM

In my previous post, we looked at the first “D” of the 3D approach to identifying and extracting value from your transaction data: Determination.

If you recall, I proposed a 3-step approach (the 3D approach) to realising value from a variety of large data sets:

  1. Determination – scour data sources to establish if and where there might be value
  2. Development – build models for the decision areas where value was identified, using the data found to be predictive of the target outcome
  3. Deployment – implement and run the developed model

In this post, we will continue looking at the 5 steps of the Determination phase:

  1. Incorporating the data
  2. Aggregating the data
  3. Identifying the target areas of value
  4. Scouring the data for value
  5. Reviewing and planning mini-projects

Let’s now look at steps 4 and 5.

4. Scouring the data for value

The process of data scouring, or prospecting, involves looking at the observation data (such as the aggregated data) and assessing whether it might predict the target areas of value. Essentially, is there a correlation between a historic piece or group of data and a future outcome?

To determine this, analysts will typically assess the regression relationship between the observation characteristic and the outcome (target) variable. The analysis starts out univariate (one variable at a time); multivariate analysis can follow later.

Univariate analysis involves taking an observation characteristic, creating attribute groups and then calculating the information value. The equation to calculate this is displayed below:

[Figure: univariate information value calculation]
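For reference, the commonly used definition of information value can be sketched as follows. The notation is an assumption rather than the notation of the figure: $p_i$ is the share of target customers who fall in attribute group $i$, $q_i$ is the share of non-target customers who fall in that group, and $D_i$ is the contribution of group $i$.

```latex
\mathrm{IV} = \sum_{i=1}^{n} D_i,
\qquad
D_i = (p_i - q_i)\,\ln\!\left(\frac{p_i}{q_i}\right)
```

Each $D_i$ is non-negative, because the difference and the logarithm always share the same sign, so a larger total indicates a stronger separation between target and non-target customers.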

The resulting information value, calculated per field for a given target variable, indicates whether the field has value. An indicative table of strength values is displayed below.

Information value range    Strength
0 to 0.02                  Non-predictive
0.02 to 0.1                Weak
0.1 to 0.3                 Medium
0.3 to 0.5                 Strong
0.5 and above              Very Strong
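The strength table can be turned into a small lookup. This is only a sketch: the function name `iv_strength` is hypothetical, and treating each lower bound as inclusive is an assumption, since the table does not say which band a boundary value belongs to.

```python
def iv_strength(iv: float) -> str:
    """Map an information value to the indicative strength label."""
    if iv < 0:
        raise ValueError("information value cannot be negative")
    # (upper bound, label) pairs taken from the strength table.
    # Assumption: a value equal to a boundary falls into the higher band.
    bands = [
        (0.02, "Non-predictive"),
        (0.1, "Weak"),
        (0.3, "Medium"),
        (0.5, "Strong"),
    ]
    for upper, label in bands:
        if iv < upper:
            return label
    return "Very Strong"


print(iv_strength(0.2))
```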

An example of the calculation is given below. Here an online retailer wants to understand whether customers who bought books in January 2015 - March 2015 were likely to buy music in April 2015 - June 2015. In the tables that follow, the rows represent the observation data and the columns represent the outcome, or target, data.


Does 3 months of book purchases predict a music purchase 3 months later?

Rows: book purchases (Jan 2015-Mar 2015). Columns: music purchases (Apr 2015-Jun 2015).

                                   | Customers with no music purchases | Customers with music purchases | ALL Customers
Customers with no book purchases   |                                   |                                |
Customers with book purchases      |                                   |                                |
ALL Customers                      |                                   |                                |

[Cell values not available]

This is then converted to column percentages:

Does 3 months of book purchases predict a music purchase 3 months later?

Rows: book purchases (Jan 2015-Mar 2015). Columns: music purchases (Apr 2015-Jun 2015).

                                   | Customers with no music purchases | Customers with music purchases
Customers with no book purchases   |                                   |
Customers with book purchases      |                                   |
ALL Customers                      |                                   |

[Cell values not available]

The calculations are then run:

[Table: calculation of D1 and D2, and the sum of D1 and D2]

Therefore, book purchases in the first quarter of 2015 had a medium-strength predictive relationship with music purchases in the second quarter of 2015 (an information value between 0.1 and 0.3).
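A calculation of this kind can be sketched in a few lines of Python. The counts below are hypothetical and are not the figures from the tables above, the function name `information_value` is illustrative, and the per-group contributions are only assumed to correspond to D1 and D2.

```python
import math


def information_value(counts):
    """Return (IV, per-group contributions).

    counts: one (no_music_count, music_count) pair per attribute group.
    """
    total_no_music = sum(no for no, _ in counts)
    total_music = sum(yes for _, yes in counts)
    contributions = []
    for no, yes in counts:
        if no == 0 or yes == 0:
            # The logarithm is undefined for an empty cell.
            raise ValueError("every cell needs at least one customer")
        q = no / total_no_music  # column percentage, no music purchases
        p = yes / total_music    # column percentage, music purchases
        contributions.append((p - q) * math.log(p / q))
    return sum(contributions), contributions


# Hypothetical counts, for illustration only.
counts = [
    (6000, 1000),  # customers with no book purchases
    (2000, 1000),  # customers with book purchases
]
iv, (d1, d2) = information_value(counts)
print(f"D1={d1:.4f}  D2={d2:.4f}  IV={iv:.4f}")
```

With these made-up counts the information value comes to roughly 0.27, which would fall in the Medium band of the strength table.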

This calculation is run for all observation variables against each of the target variables. The results can be displayed in a data-scouring heat map. An example is displayed here.


The rows comprise the observation (aggregated) characteristics. The columns represent each outcome (target) variable. The colours represent the degree of correlation (according to the mapping table, with red being the strongest and blue the weakest).
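The numbers behind such a heat map could be produced along the following lines. This is a minimal sketch under stated assumptions: the customer flags, the variable names and the helper `scouring_grid` are all hypothetical, and skipping attribute groups that are empty in either column is a simplification rather than a recommended treatment.

```python
import math
from collections import Counter


def information_value(observed, target):
    """IV of a categorical observation against a binary (0/1) target."""
    non = Counter(o for o, t in zip(observed, target) if t == 0)
    tgt = Counter(o for o, t in zip(observed, target) if t == 1)
    n_non, n_tgt = sum(non.values()), sum(tgt.values())
    iv = 0.0
    for group in set(non) | set(tgt):
        # Simplification: skip groups with an empty cell (log undefined).
        if non[group] == 0 or tgt[group] == 0:
            continue
        p, q = tgt[group] / n_tgt, non[group] / n_non
        iv += (p - q) * math.log(p / q)
    return iv


def scouring_grid(observations, targets):
    """Rows are observation characteristics, columns are target variables."""
    return {
        obs_name: {
            tgt_name: information_value(obs, tgt)
            for tgt_name, tgt in targets.items()
        }
        for obs_name, obs in observations.items()
    }


# Hypothetical flags for eight customers.
observations = {
    "bought_books_q1": [1, 1, 1, 1, 0, 0, 0, 0],
    "bought_games_q1": [1, 0, 1, 0, 0, 1, 0, 1],
}
targets = {"bought_music_q2": [1, 1, 1, 0, 1, 0, 0, 0]}

grid = scouring_grid(observations, targets)
for obs_name, row in grid.items():
    print(obs_name, {name: round(value, 3) for name, value in row.items()})
```

Each cell of the grid can then be coloured according to the strength table.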

This exercise, if done well, will set the business up for a well-ordered series of projects to extract value from the data.

Measuring incremental lift

Once the univariate analysis is complete, it is worthwhile assessing how correlated each characteristic is with the others. Correlation analysis is often run to determine this. For example, it may be that “Number of purchases in the last 1m” is positively correlated to someone responding to a marketing offer. Similarly, “Number of purchases in the last 3m” may also be positively correlated. The naïve conclusion may be that both characteristics should be used together to predict response to an offer. The reality is that these two characteristics are highly correlated with each other, so the second adds little incremental information once the first is in use.
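As a rough illustration, the correlation between two such characteristics can be checked with a Pearson coefficient. Pearson is just one possible measure, and the purchase counts below are made up for six hypothetical customers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical purchase counts for six customers.
purchases_1m = [0, 1, 2, 0, 3, 1]
purchases_3m = [1, 3, 5, 0, 8, 2]

r = pearson(purchases_1m, purchases_3m)
print(f"correlation between 1m and 3m purchase counts: {r:.2f}")
```

A coefficient close to 1 suggests the two characteristics carry largely the same information.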

Many statistical techniques and measures can be used to determine the correlation. Ultimately, the characteristics can be grouped into sets of correlated items, which shows how many predictive groups with low correlation to one another are available. This information is important when it comes to determining what model (for example, decision tree/segmentation, clusters or scorecard) should be considered in the next phase.
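One simple way to form such groups is a greedy pass over the characteristics. This is a sketch of one possible technique, not a prescribed method: the function name `group_correlated`, the 0.7 threshold, the characteristic names and the correlation values are all assumptions made for illustration.

```python
def group_correlated(names, corr, threshold=0.7):
    """Greedily group characteristics by pairwise correlation.

    A characteristic joins the first group whose first member it is
    correlated with at or above the threshold; otherwise it starts a
    new group.
    """
    groups = []
    for name in names:
        for group in groups:
            if abs(corr[frozenset((name, group[0]))]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical characteristics and pairwise correlations.
names = ["purchases_1m", "purchases_3m", "avg_basket_value"]
corr = {
    frozenset(("purchases_1m", "purchases_3m")): 0.98,
    frozenset(("purchases_1m", "avg_basket_value")): 0.12,
    frozenset(("purchases_3m", "avg_basket_value")): 0.15,
}

groups = group_correlated(names, corr)
print(groups)
```

The number of resulting groups gives a feel for how many largely independent sources of predictive information are available.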

Reviewing and planning mini-projects

Once the heat maps are developed, the following should be determined:

  1. For each target variable – was there a significant amount of data that strongly predicts the target?
  2. Could this data be used to create models, such as scores, segmentation or clustering? The variables identified should be tested for co-linearity/correlation to determine whether each strong characteristic adds value beyond the others.
  3. Can the data merging, aggregation and model deployment be successfully accomplished given current hardware and software constraints? If not, what needs to change and is the business ready for it?
  4. Which exercise is likely to produce the most value?
  5. Should the business run projects in parallel sprints or in a relay (one-at-a-time) fashion?


The Determination step gives data analysts and risk, loyalty and marketing managers a very good insight into which fields might be valuable and for what purposes.

In my next blog post, I’ll take you through the next phase of finding value in your transaction data: Development.


Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 13 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
