Data Mining Concepts 5. Exploring and Validating Models

Exploring and Validating Models

The fifth step in the data mining process, as highlighted in the following diagram, is to explore the mining models that you have built and test their effectiveness.

Before you deploy a model into a production environment, you will want to test how well it performs. In practice, you typically build several models with different configurations and test them all to see which yields the best results for your problem and your data.

Analysis Services provides tools that help you separate your data into training and testing datasets so that you can accurately assess the performance of all models on the same data. You use the training dataset to build the model, and the testing dataset to test the accuracy of the model by creating prediction queries. This partitioning can be done automatically while building the mining model. For more information, see Testing and Validation (Data Mining).
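The idea of holding out part of the data for testing can be sketched outside of Analysis Services. The following Python example is only an illustration, not the Analysis Services API: the function name `split_holdout`, its parameters, and the 30 percent holdout value are assumptions chosen for the sketch. It shuffles the cases with a fixed seed so that the same partition can be reproduced for every model being compared.

```python
import random


def split_holdout(cases, holdout_fraction=0.3, seed=42):
    """Return (training, testing) lists.

    Hypothetical helper: roughly holdout_fraction of the cases are
    reserved for testing, and the rest are used for training.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(cases)
    # A fixed seed makes the partition repeatable, so every model
    # is trained and tested on exactly the same cases.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Illustrative data: 100 case identifiers.
cases = list(range(100))
training, testing = split_holdout(cases, holdout_fraction=0.3, seed=42)
print(len(training), len(testing))
```

Because the seed is fixed, calling the function again with the same arguments returns the same partition, which is what makes results comparable across models.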

You can explore the trends and patterns that the algorithms discover by using the viewers in Data Mining Designer in SQL Server Data Tools. For more information, see Data Mining Model Viewers. You can also test how well the models create predictions by using tools in the designer such as the lift chart and classification matrix. To verify whether the model is specific to your data, or can be used to make inferences about the general population, you can use the statistical technique called cross-validation to automatically create subsets of the data and test the model against each subset. For more information, see Testing and Validation (Data Mining).
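To make cross-validation and the classification matrix more concrete, here is a minimal Python sketch. It is not how the designer implements these tools. The toy threshold model, the synthetic data, and the names `train_threshold_model`, `classification_matrix`, and `cross_validate` are all assumptions introduced for illustration. The data is divided into folds, each fold is held out once for testing, and a matrix of actual versus predicted values is counted for each fold.

```python
import random
from collections import Counter


def train_threshold_model(rows):
    """Toy model (assumption): predict True when the feature is at least
    the midpoint between the mean of the True and False cases."""
    pos = [x for x, label in rows if label]
    neg = [x for x, label in rows if not label]
    if not pos or not neg:
        raise ValueError("training rows must contain both classes")
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x >= threshold


def classification_matrix(model, rows):
    """Count cases by (actual, predicted) pair."""
    matrix = Counter()
    for x, actual in rows:
        matrix[(actual, model(x))] += 1
    return matrix


def cross_validate(rows, train, folds=5, seed=0):
    """Hold out each fold once; return a list of (accuracy, matrix)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    results = []
    for i in range(folds):
        # Every folds-th row starting at i is the test subset.
        test_rows = shuffled[i::folds]
        train_rows = [r for j, r in enumerate(shuffled) if j % folds != i]
        model = train(train_rows)
        matrix = classification_matrix(model, test_rows)
        correct = matrix[(True, True)] + matrix[(False, False)]
        results.append((correct / len(test_rows), matrix))
    return results


# Synthetic data (assumption): one numeric feature, two balanced classes.
rng = random.Random(1)
rows = [(rng.gauss(1.0 if label else -1.0, 1.0), label)
        for label in [True, False] * 50]

results = cross_validate(rows, train_threshold_model, folds=5)
for fold, (accuracy, matrix) in enumerate(results, start=1):
    print(f"fold {fold}: accuracy={accuracy:.2f} "
          f"TP={matrix[(True, True)]} FN={matrix[(True, False)]} "
          f"FP={matrix[(False, True)]} TN={matrix[(False, False)]}")
```

If accuracy varies widely from fold to fold, that suggests the model is sensitive to the particular cases it was trained on; similar results across folds suggest it generalizes more reliably.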

If none of the models that you created in the Building Models step perform well, you might have to return to a previous step in the process and redefine the problem or reinvestigate the data in the original dataset.