The Data Analytics Blog

Our news and views relating to Data Analytics, Big Data, Machine Learning, and the world of Credit.

Finding Value In Transaction Data (Part 3)

December 15, 2015 at 3:12 PM

In my previous post, we looked at the first “D” of the 3D approach of identifying and extracting value out of your transaction data: Determination.

If you recall, I proposed a three-step approach (the 3D approach) to realising value from a variety of large data sets:

  1. Determination – scour data sources to establish if and where there might be value
  2. Development – build models for the decision areas where value was identified, using the data that is predictive of the chosen outcome
  3. Deployment – implement and run the developed models

In this post, we will continue looking at the 5 steps of the Determination phase:

  1. Incorporating the data
  2. Aggregating the data
  3. Identifying the target areas of value
  4. Scouring the data for value
  5. Reviewing and planning mini-projects

Let’s now look at steps 4 and 5.

4. Scouring the data for value

The process of data scouring, or prospecting, involves taking the observation data (such as the aggregated data) and assessing whether it might predict the future target areas of value. Essentially, is there a correlation between a historic piece or group of data and a future outcome?

To determine this, analysts will typically assess a regression relationship between the observation characteristic and the outcome (target) variable. Initially the analysis will be univariate (one variable at a time); multivariate analysis can follow later.

Univariate analysis involves taking an observation characteristic, creating attribute groups and then calculating the information value. The equation to calculate this is displayed below:

[Equation image: univariate information value calculation]

The information value calculated for each field against a given target variable indicates whether the field has predictive value. An indicative table of strength values is displayed below.

Information value range    Strength
0 to 0.02                  Non-predictive
0.02 to 0.1                Weak
0.1 to 0.3                 Medium
0.3 to 0.5                 Strong
0.5 and above              Very Strong
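
The calculation and the strength lookup can be sketched in a few lines of Python. This is a minimal sketch, assuming the equation is the standard information value definition: for each attribute group, take the difference between the two column proportions, multiply it by the natural log of their ratio, and sum over the groups. The function names `information_value` and `strength` are illustrative, and assigning a boundary value such as 0.02 to the higher band is also an assumption.

```python
import math

def information_value(target_counts, non_target_counts):
    """Information value of one characteristic against one binary target.

    target_counts[i] and non_target_counts[i] are the numbers of customers in
    attribute group i who did and did not show the target outcome.
    Assumes every count is positive; zero cells would need smoothing first.
    """
    total_target = sum(target_counts)
    total_non_target = sum(non_target_counts)
    iv = 0.0
    for t, n in zip(target_counts, non_target_counts):
        p_target = t / total_target          # column proportion, target column
        p_non_target = n / total_non_target  # column proportion, non-target column
        iv += (p_target - p_non_target) * math.log(p_target / p_non_target)
    return iv

def strength(iv):
    """Map an information value to the indicative strength bands."""
    bands = [(0.02, "Non-predictive"), (0.1, "Weak"), (0.3, "Medium"), (0.5, "Strong")]
    for upper, label in bands:
        if iv < upper:  # assumption: a value on a boundary falls in the higher band
            return label
    return "Very Strong"
```

For instance, `information_value([30, 70], [60, 40])` comes to roughly 0.376, which `strength` reports as "Strong".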

An example of the calculation is given. Here an online retailer wants to understand whether customers who bought books in January 2015 - March 2015 were likely to buy music in April 2015 - June 2015. Therefore, the rows represent the observation data and the columns represent the outcome, or target, data.


[Table: Does 3 months of book purchases predict a music purchase 3 months later? Customer counts. Rows: book purchases, Jan 2015-Mar 2015 (customers with no book purchases, customers with book purchases, all customers). Columns: music purchases, Apr 2015-Jun 2015 (customers with no music purchases, customers with music purchases, all customers).]

This is then converted to column percentages:

[Table: Does 3 months of book purchases predict a music purchase 3 months later? Column percentages. Rows: book purchases, Jan 2015-Mar 2015 (no book purchases, book purchases, all customers). Columns: music purchases, Apr 2015-Jun 2015 (customers with no music purchases, customers with music purchases).]

The calculations are then run:

[Table: the contributions D1 and D2 calculated for the two book-purchase rows, followed by the sum of D1 and D2.]

Therefore book purchases in the first quarter of 2015 had a medium correlation to music purchases in the second quarter of 2015.
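
To make the arithmetic concrete, here is a minimal Python sketch of the same steps: counts, column percentages, one term per row (D1 and D2), and their sum. The customer counts are hypothetical, chosen only for illustration, and are not the figures from the retailer example. The formula used is the standard information value definition, which is an assumption about the equation referred to earlier.

```python
import math

# Hypothetical customer counts, for illustration only (not the post's figures).
# Rows: book purchases Jan-Mar 2015; columns: music purchases Apr-Jun 2015.
counts = {
    "no book purchases": {"no music": 6000, "music": 1000},
    "book purchases":    {"no music": 2000, "music": 1000},
}

# Column totals, used to turn the counts into column percentages.
col_totals = {
    col: sum(row[col] for row in counts.values())
    for col in ("no music", "music")
}

contributions = {}
for label, row in counts.items():
    pct_music = row["music"] / col_totals["music"]
    pct_no_music = row["no music"] / col_totals["no music"]
    # One term per row: D1 for the first row, D2 for the second.
    contributions[label] = (pct_music - pct_no_music) * math.log(pct_music / pct_no_music)

iv = sum(contributions.values())  # the "Sum of D1 and D2"
for label, d in contributions.items():
    print(f"{label}: {d:.3f}")
print(f"information value: {iv:.3f}")
```

With these made-up counts the two terms are about 0.101 and 0.173, giving an information value of about 0.275, which falls in the Medium band of the strength table.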

This calculation is run for every observation variable against each of the target variables. The results can be displayed in a data-scouring heat map. An example is displayed here.

[Figure: example data-scouring heat map]

The rows are the observation (aggregated) characteristics and the columns are the outcome (target) variables. The colours represent the strength of the relationship according to the strength table, with red the strongest and blue the weakest.
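
The grid behind such a heat map can be sketched as follows. This is a minimal Python sketch, assuming the standard information value definition and the indicative strength bands given earlier. The characteristics, targets and counts are hypothetical, and a plotting library would be needed to turn the labels into colours.

```python
import math

def information_value(target_counts, non_target_counts):
    """Standard information value over attribute groups (all counts must be positive)."""
    total_t, total_n = sum(target_counts), sum(non_target_counts)
    return sum(
        (t / total_t - n / total_n) * math.log((t / total_t) / (n / total_n))
        for t, n in zip(target_counts, non_target_counts)
    )

def strength(iv):
    """Indicative strength band for an information value."""
    for upper, label in [(0.02, "Non-predictive"), (0.1, "Weak"), (0.3, "Medium"), (0.5, "Strong")]:
        if iv < upper:
            return label
    return "Very Strong"

# Hypothetical counts per attribute group, keyed by (observation, target):
# (customers with the outcome, customers without the outcome).
crosstabs = {
    ("Bought books", "Buys music"): ([1000, 1000], [6000, 2000]),
    ("Bought books", "Buys games"): ([500, 500], [5100, 4900]),
    ("Bought music", "Buys music"): ([300, 1700], [6400, 1600]),
    ("Bought music", "Buys games"): ([850, 150], [6000, 4000]),
}

observations = sorted({obs for obs, _ in crosstabs})
targets = sorted({tgt for _, tgt in crosstabs})

# One row per observation characteristic, one column per target variable.
grid = {
    obs: {tgt: strength(information_value(*crosstabs[(obs, tgt)])) for tgt in targets}
    for obs in observations
}

for obs in observations:
    print(obs, [grid[obs][tgt] for tgt in targets])
```

Storing the strength label rather than the raw value keeps the grid directly comparable with the strength table; keeping the raw information values as well would allow finer colour gradations.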

This exercise, if done well, will set the business up for a well-ordered series of projects to extract value from the data.

Measuring incremental lift

Once the univariate analysis is complete, it is worthwhile assessing how correlated each characteristic is with the others. Correlation analysis is often run to determine this. For example, it may be that “Number of purchases in the last 1m” is positively correlated to someone responding to a marketing offer. Similarly “Number of purchases in the last 3m” may also be positively correlated. The naïve conclusion may be that both characteristics should be used together to predict response to an offer. The reality is that these two characteristics are highly correlated with each other, so using both adds little incremental lift over using just one.

Many statistical techniques and measures can be used to determine the correlation. Ultimately, the characteristics can be grouped into sets of correlated items, which shows how many predictive groups with low correlation to one another are available. This information is important when it comes to determining what type of model (for example, decision tree/segmentation, clusters or scorecard) should be considered in the next phase.
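
One simple way to do this grouping is sketched below in Python. It is only an illustration: it uses the Pearson correlation coefficient and a greedy grouping rule with a 0.8 threshold, all of which are assumptions rather than the method used in practice. The characteristic names and per-customer values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def group_correlated(characteristics, threshold=0.8):
    """Greedily place each characteristic in the first group whose lead member
    it correlates with at or above the threshold; otherwise start a new group."""
    groups = []
    for name, values in characteristics.items():
        for group in groups:
            lead = characteristics[group[0]]
            if abs(pearson(values, lead)) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical per-customer values, for illustration only.
characteristics = {
    "purchases_last_1m": [0, 1, 2, 1, 4, 0, 3, 2],
    "purchases_last_3m": [1, 3, 5, 2, 9, 0, 8, 5],
    "days_since_signup": [400, 30, 250, 90, 310, 45, 120, 600],
}

print(group_correlated(characteristics))
```

Here the two purchase counts land in the same group, while the unrelated characteristic forms a group of its own, so a model would typically draw one representative from each group.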

5. Reviewing and planning mini-projects

Once the heat maps are developed the following should be determined:

  1. For each target variable – was there a significant amount of data that strongly predicts the target?
  2. Could this data be used to create models, such as scores, segmentation or clustering? The variables identified should be tested for collinearity/correlation to determine whether each strong characteristic adds value beyond the others.
  3. Can the data merging, aggregation and model deployment be successfully accomplished given current hardware and software constraints? If not, what needs to change and is the business ready for it?
  4. Which exercise is likely to produce the most value?
  5. Should the business run projects in parallel sprints or in a relay (one-at-a-time) fashion?


The Determination step gives data analysts and risk, loyalty and marketing managers a very good insight into which fields might be valuable and for what purposes.

In my next blog post, I’ll take you through the next phase of finding value in your transaction data: Development.

Thomas Maydon
Thomas Maydon is the Head of Credit Solutions at Principa. With over 17 years of experience in the Southern African, West African and Middle Eastern retail credit markets, Tom has primarily been involved in consulting, analytics, credit bureau and predictive modelling services. He has experience in all aspects of the credit life cycle (in multiple industries) including intelligent prospecting, originations, strategy simulation, affordability analysis, behavioural modelling, pricing analysis, collections processes, and provisions (including Basel II) and profitability calculations.
