The King County Homes prices prediction challenge is an excellent dataset for trying out and experimenting with various regression models. As we’ll see in the following post on Moscow flats, the modeler deals with similar challenges: skewed data and outliers, highly correlated variables (predictors), heteroskedasticity and a geographical correlation structure. Ignoring one of these may lead to undeperforming models, so in this post we’re going to carefully explore the dataset, which should inform which modeling strategy to choose.
”I can live with doubt and uncertainty and not knowing. We absolutely must leave room for doubt or there is no progress and there is no learning. There is no learning without having to pose a question. And a question requires doubt. People search for certainty, but there is no certainty.” — Richard Feynman The pleasure of finding things out
The first series of posts gives an overview of the probabilistic perspective on Machine Learning 1, a flexible and adaptive approach which can be used in modeling complex phenomena.