Data Mining - Technology Driven by Standards?

Friedemann Schwenkreis
editor of ISO/IEC 13249-6 SQL/MM Data Mining

IBM Deutschland Entwicklung GmbH

fschwenk@acm.org

Currently, most data mining products focus on the data mining technology itself rather than on ease of use, scalability, or portability. Given that data mining technology has only recently become usable in real-world scenarios, this seems natural. At the same time, standards organizations and consortia are making several attempts to agree on a standardized way to use data mining together with today's data management products such as SQL databases and data warehouses. Three major efforts shall be mentioned here:

         ISO/IEC JTC1 SC32 WG4: SQL/MM Part 6 Data Mining
A collection of SQL user-defined types and routines to compute and apply data mining models.

         The Data Mining Group (DMG): Predictive Model Markup Language (PMML)
An XML-based specification for data mining models.

         OMG: Common Warehouse Metamodel (CWM): Chapter 14 Data Mining
A UML/XML-based specification for data mining metadata.

All three approaches abstract from specific data mining technologies. They clearly address the ease-of-use problem by hiding the complexity of the underlying data mining algorithms. The data mining extensions defined in SQL/MM go even further: they introduce SQL routines that allow data mining functions to be invoked as part of SQL statements. Hence, the SQL implementation (in particular, its optimizer) is in control of the execution: it decides whether parallelism is used and where the actual computation takes place.
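
The idea of invoking data mining functions from within SQL statements can be illustrated with a minimal sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in SQL engine. The routine names train_centroids and apply_centroids, the tables, and the toy nearest-centroid model are hypothetical and chosen only for this illustration; they are not the user-defined types or routines defined by SQL/MM.

```python
import json
import sqlite3


class TrainCentroids:
    """Aggregate: builds a toy nearest-centroid model and returns it as JSON text."""

    def __init__(self):
        self.sums = {}  # label -> [sum of feature values, row count]

    def step(self, feature, label):
        total = self.sums.setdefault(label, [0.0, 0])
        total[0] += feature
        total[1] += 1

    def finalize(self):
        return json.dumps({label: s / n for label, (s, n) in self.sums.items()})


def apply_centroids(model_json, feature):
    """Scalar function: returns the label whose centroid is closest to the feature."""
    centroids = json.loads(model_json)
    return min(centroids, key=lambda label: abs(centroids[label] - feature))


conn = sqlite3.connect(":memory:")
# Register the hypothetical mining routines so that SQL statements can call them.
conn.create_aggregate("train_centroids", 2, TrainCentroids)
conn.create_function("apply_centroids", 2, apply_centroids)

conn.executescript("""
    CREATE TABLE training (income REAL, segment TEXT);
    INSERT INTO training VALUES (20, 'low'), (30, 'low'), (80, 'high'), (100, 'high');
    CREATE TABLE customers (id INTEGER, income REAL);
    INSERT INTO customers VALUES (1, 28), (2, 95);
    CREATE TABLE models (name TEXT, body TEXT);
""")

# Compute the model as part of one SQL statement ...
conn.execute(
    "INSERT INTO models SELECT 'segments', train_centroids(income, segment) FROM training"
)

# ... and apply it as part of another one.
predictions = conn.execute("""
    SELECT c.id, apply_centroids(m.body, c.income)
    FROM customers AS c, models AS m
    WHERE m.name = 'segments'
    ORDER BY c.id
""").fetchall()
print(predictions)  # [(1, 'low'), (2, 'high')]
```

Because both the computation and the application of the model appear as ordinary routine calls inside SQL statements, the SQL engine decides how and where they are evaluated, and the caller never sees the underlying algorithm.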

The standards place high requirements on data mining products, requirements that are mainly demanded by users of data mining technology. It seems that, in the case of data mining, standards are intended not only to unify existing products with well-known functionality but also to (partially) design the functionality so that future products match real-world requirements.

This can be seen as a general trend in today's standardization efforts. The objective is to have a standardized specification as early as possible rather than to define a standard after major products have already set a de facto standard. Hence, with this new approach, products are driven by standards, not standards by products.