7.26 Cluster Analysis in Market Research

You've been tracking your visitors carefully, trying to establish customer profiles. But it's very difficult, because you sell a range of products and a baffling range of people buy them: people from different countries and social groupings, responding to different promotions and webpage marketing copy. Nonetheless, you do need proper customer profiles, because you're starting a major pay-per-click (PPC) campaign that depends on careful targeting. How can you make best use of your web analytics data?

Cluster analysis finds the statistically most significant groupings in a collection of data, often presenting the groupings hierarchically as a dendrogram. More formally, cluster analysis or clustering is the objective assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. It is one of many ways of grouping data.

Using a Clustering Computer Program

How do you run a clustering program? You key the relevant data into the program input box, set a few details (number of clusters required, learning rate, epochs, initial weighting) and click the start button.
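The settings listed above (learning rate, epochs, initial weighting) suggest a competitive-learning style of program, though that is an assumption: packages differ widely. As a rough sketch only, here is what such a program might do internally, written in plain Python. The function name, the data and the starting weights are all hypothetical, and the number of clusters is simply the number of starting weight vectors you supply.

```python
def competitive_clustering(rows, initial_weights, learning_rate=0.3, epochs=20):
    """Winner-take-all clustering: one weight vector (centre) per cluster."""
    weights = [list(w) for w in initial_weights]  # number of clusters = len(weights)

    def nearest(row):
        # Index of the centre with the smallest squared Euclidean distance to row.
        return min(range(len(weights)),
                   key=lambda k: sum((w - x) ** 2 for w, x in zip(weights[k], row)))

    for epoch in range(epochs):
        # Shrink the learning rate linearly so later passes make finer adjustments.
        rate = learning_rate * (1 - epoch / epochs)
        for row in rows:
            k = nearest(row)
            # Move only the winning centre a fraction of the way towards the row.
            weights[k] = [w + rate * (x - w) for w, x in zip(weights[k], row)]

    return [nearest(row) for row in rows], weights


# Hypothetical, already-numeric data: (pages viewed, value of last sale).
rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (9.0, 9.0), (9.1, 8.8), (8.9, 9.2)]
labels, centres = competitive_clustering(rows, initial_weights=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

A higher learning rate makes the centres jump about more on each record; more epochs give them longer to settle. Poorly chosen starting weights can leave a cluster empty, which is one reason for rerunning with different settings.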


                                                Customer 1     Customer 2     Customer 3
Total Previous Sales
Value of Last Sale
Payment By                                      Credit Card                   Credit Card
Date of Last Sale
Discount Applying
Pages viewed prior to sale
Guarantees page visits on day of sale
Shipping Charges page visits on day of sale
VDU Resolution                                  1024 x 768     800 x 600      1280 x 1024
Country of

The program will then crunch through the data and present the groupings as a tree diagram, or dendrogram (usually with an explanation of the groupings).

The bracketing (vertical tie-lines) indicates how the groupings are related: the further to the right the vertical line joining two groups lies, the looser their grouping.
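To see where the tie-lines come from, it can help to build the hierarchy by hand. The sketch below is a minimal agglomerative clustering in plain Python, assuming single linkage (the distance between two groups is the distance between their closest members). That choice is an assumption, since packages offer several linkage rules, and the five data points are hypothetical. Each merge records the distance at which two groups were joined, which corresponds to how far to the right the tie-line is drawn.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest each other.
        (a, b), d = min(
            (((a, b), min(dist(points[i], points[j])
                          for i in clusters[a] for j in clusters[b]))
             for a, b in combinations(clusters, 2)),
            key=lambda item: item[1])
        # Replace the two clusters with one merged cluster under a new id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Hypothetical customers described by two numeric factors.
points = [(1, 1), (1, 2), (8, 8), (9, 8), (20, 20)]
for a, b, d in agglomerate(points):
    print(f"join {a} and {b} at distance {d:.2f}")
```

Tight groups are joined at small distances; the final, loosest joins happen at the largest distances, so a customer who only joins the tree at the very end is effectively an outlier.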

Sometimes the data will give you only an oblique indication of the spending power of your customer (PayPal versus credit card, VDU resolution). Or you'll have to think beyond the obvious categories: not year and quarter, but possible Christmas present, spring booking for summer holidays, or the economic outlook at the time.
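Clustering programs generally work on numbers, so factors like these have to be recoded before they are keyed in. The following is one possible recoding, not a recommended scheme: the field names, the November and December "possible Christmas present" flag and the sample record are all assumptions made for illustration. Payment method becomes a pair of 0/1 flags, screen resolution becomes a pixel count, and each column is then rescaled to the range 0 to 1 so that no single factor dominates the distance calculation.

```python
from datetime import date


def encode_customer(record):
    """Turn one raw customer record (a dict) into a list of numbers."""
    width, height = (int(n) for n in record["resolution"].split("x"))
    sale_date = date.fromisoformat(record["last_sale_date"])
    return [
        float(record["total_previous_sales"]),
        1.0 if record["payment"] == "Credit Card" else 0.0,  # one flag per method
        1.0 if record["payment"] == "PayPal" else 0.0,
        float(width * height),                               # screen area in pixels
        1.0 if sale_date.month in (11, 12) else 0.0,         # possible Christmas present
    ]


def min_max_scale(rows):
    """Rescale every column to 0..1; constant columns become all zeros."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]


# Hypothetical record, invented for this example.
example = {"total_previous_sales": 120, "payment": "PayPal",
           "resolution": "800 x 600", "last_sale_date": "2009-12-03"}
print(encode_customer(example))  # [120.0, 0.0, 1.0, 480000.0, 1.0]
```

How you recode is itself a judgment about what matters, so it is worth trying more than one scheme and seeing whether the groupings survive.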

Perhaps a simple classification will emerge, based entirely on product type. If so, that will be the most important grouping, and your PPC campaigns must respect it, with different marketing efforts for each product type. Or you may find the grouping is by discounts offered, in which case discounts should feature in your ad copy. If the grouping points to customer loyalty as the important factor, then perhaps an email campaign would be wiser.

Real Time Data

Your own data will only take you so far. The more go-ahead companies access the voluminous data available from eBay and social media sites. Rather than search, identify and key in the relevant data, however, they write programs to continuously (i.e. in real time) tap into data through an API that links the raw data to analysis programs running on their own computers: a quicker and less error-prone process.
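The details of any particular API are beyond this section, but the analysis side of such a program can be sketched. In the example below a generator stands in for the API feed (it is entirely hypothetical, as are the data and function names), and each incoming record nudges the centre of its nearest cluster, so the groupings stay current without rerunning the whole analysis. This is one simple way of clustering a stream, assumed here for illustration; it is not a description of what any named company does.

```python
def update_centroids(centroids, counts, record):
    """Assign one new record to its nearest centroid and update that centroid."""
    k = min(range(len(centroids)),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], record)))
    counts[k] += 1
    # Running mean: the centroid moves by 1/count of the gap to the new record.
    centroids[k] = [c + (x - c) / counts[k] for c, x in zip(centroids[k], record)]
    return k


def fake_feed():
    """Stand-in for an API client: yields records one at a time (made-up data)."""
    yield from [(1.0, 1.0), (9.0, 9.0), (3.0, 1.0), (11.0, 9.0)]


centroids = [[0.0, 0.0], [10.0, 10.0]]  # starting guesses for two clusters
counts = [1, 1]                         # treat each guess as one prior observation
for record in fake_feed():
    update_centroids(centroids, counts, record)
print(centroids)
```

In a real program the generator would be replaced by calls to the data provider's API, subject to its terms of use and rate limits.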

Further Work

You may also want to run clustering on visitors who didn't buy anything (though you'll have less data on them).

Above all, you need to experiment, and to understand what the results mean. You'll have to quantify the relevance of each factor by running the clustering program with that factor present and then removed. Once the major groupings are established, you'll want to find further groupings, whether by setting larger numbers for the factors, by analyzing data in batches selected by the major groupings, or both.
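One way to put a number on "present and removed" is to cluster twice, once with all factors and once with a single factor dropped, and then measure how far the two sets of groupings agree. The sketch below uses a bare-bones k-means and the Rand index (the fraction of customer pairs that both runs treat the same way). Both are choices made for this example rather than anything a particular package prescribes, and the data are hypothetical.

```python
from itertools import combinations


def kmeans(rows, k, iterations=20):
    """Plain k-means; centres start at the first k rows (a simplification)."""
    centres = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign every row to its nearest centre.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(centres[c], r)))
                  for r in rows]
        # Move each centre to the mean of the rows assigned to it.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of row pairs on which two groupings agree (1.0 = identical)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def factor_relevance(rows, k, factor):
    """Agreement between clustering on all factors and with one column dropped."""
    reduced = [[v for i, v in enumerate(r) if i != factor] for r in rows]
    return rand_index(kmeans(rows, k), kmeans(reduced, k))


# Hypothetical data: column 0 separates two groups, column 1 is mostly noise.
rows = [[0.0, 0.5], [10.0, 0.4], [0.2, 0.6], [9.8, 0.5], [0.1, 0.4], [10.1, 0.6]]
for factor in range(2):
    print(factor, factor_relevance(rows, k=2, factor=factor))
```

A score near 1.0 means the groupings hardly changed when the factor was removed, so it contributed little; a markedly lower score means the factor was doing real work.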

Uncertainties will doubtless remain after cluster analysis, but you can redesign your website to explore customer profiles further (funnel analysis, split testing, click density and task completion analysis), and refine the profiles of the customers who give you the greatest profit margin. That done, and your important customer profiles established, you can think about tweaking the website and automating customer selection, sending customers to the products they are most likely to buy.


1. Free Statistics. Good listing of open source and freeware statistics packages.
2. Wil's Domain. Straightforward listing of statistics software, both free and commercial.
3. Statistical Analysis Software Survey. Useful tables if you're familiar with statistics packages.
4. COMPACT — Comparative Package for Clustering Assessment. Free cluster analysis package.
5. Chameleon Statistics. Uses cluster analysis. Free evaluation model.


1. Define cluster analysis. What is the significant word in the definition?
2. How could it be useful in webpage design, pay-per-click marketing and search engine optimization?
3. Evaluate some statistical packages available, both free and commercial.

Sources and Further Reading

1. Information Theory, Inference, and Learning Algorithms. David MacKay. 2003. Inference. Includes neural networks and their theory. Hardback $60 but free as PDF download.
2. Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran. O'Reilly. August 2007. Includes specimen code in Python.
3. Cluster Analysis. StatSoft. Straightforward but extended treatment of various approaches.