Thursday, 16 April 2009

What is data mining?

After a couple of posts about coal and diamonds I thought it might be a good idea to post a straightforward answer to the question: What is data mining?

Data mining is the application of statistical techniques and artificial intelligence to find patterns in data that are not apparent using queries or other database techniques. Data patterns can provide insights into behaviours and trends that would otherwise remain hidden. Data mining is perhaps more descriptively known as knowledge discovery in data.

The statistical techniques are run as software programmes which allow parameters to define how the algorithms are applied. The pattern-finding process can be run on different data sets and with different parameter settings. Models can be refined to improve the accuracy of the results.
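To make the idea of parameters concrete, here is a minimal sketch in Python rather than any particular data mining product. It runs a very small clustering routine on the same data twice, changing only the parameter k (the number of groups to look for). The spend figures and the function name kmeans_1d are invented for illustration.

```python
# Illustrative sketch only: a tiny one-dimensional k-means clustering routine.
# The data values and the function name are hypothetical.

def kmeans_1d(values, k, iterations=10):
    """Group numbers into k clusters and return the sorted cluster centres."""
    ordered = sorted(values)
    # Deterministic starting centres: k values spread evenly through the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return sorted(centres)

# Hypothetical ticket spend per customer, in pounds.
spend = [12, 15, 14, 48, 52, 50, 95, 99, 101]

# Same algorithm, same data, different parameter settings.
two_segments = kmeans_1d(spend, k=2)
three_segments = kmeans_1d(spend, k=3)
print(two_segments)
print(three_segments)
```

With k set to 2 the low and middle spenders are merged into one group; with k set to 3 they separate. Which setting is more useful depends on the question being asked, which is why models are usually run several times and refined.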

Although data mining algorithms can be run on any data file, they are often applied to files where data has been brought together from a number of different sources. Different statistical techniques, or algorithms, are suited to different types of data, and different problems.

The basic premise of data mining is that predictions can be made about the future from a sample of past behaviour, i.e. the existing data files. For example, theatre bookings, together with other information about the people who made them, can be used to find patterns and predict what types of production those customers might book in the future. Segments can be found and different marketing messages sent to each one according to its profile.
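The theatre example can be sketched in a few lines of Python. This is a deliberately simple frequency count, not one of the algorithms a data mining tool would actually use, and the age bands, production types and function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical past bookings: (customer age band, production type booked).
past_bookings = [
    ("under 30", "comedy"),
    ("under 30", "musical"),
    ("under 30", "comedy"),
    ("30 to 60", "drama"),
    ("30 to 60", "musical"),
    ("30 to 60", "drama"),
    ("over 60", "opera"),
    ("over 60", "drama"),
    ("over 60", "opera"),
]

def learn_preferences(bookings):
    """Count how often each segment booked each production type."""
    counts = defaultdict(Counter)
    for segment, production in bookings:
        counts[segment][production] += 1
    return counts

def predict(counts, segment):
    """Return the production type this segment booked most often, or None if unseen."""
    if segment not in counts:
        return None
    return counts[segment].most_common(1)[0][0]

model = learn_preferences(past_bookings)
for segment in ("under 30", "30 to 60", "over 60"):
    print(segment, "->", predict(model, segment))
```

The pattern learned from past bookings becomes the prediction for future ones, and each segment can then be sent the marketing message that matches its most likely choice.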

Data mining is the automatic or semi-automatic means of finding patterns and making predictions.

Data mining has now been built into Microsoft’s SQL Server database: starting with two algorithms in SQL Server 2000, extended to seven algorithms in SQL Server 2005, and with some further enhancements in SQL Server 2008.

Get in touch if you would like to find out whether your data files are suitable for data mining.
