King County Homes Challenge. Exploratory Data Analysis

The King County home price prediction challenge offers an excellent dataset for experimenting with regression models. As we’ll see in the following post on Moscow flats, the modeler faces similar challenges in both cases: skewed data and outliers, highly correlated variables (predictors), heteroskedasticity, and a geographical correlation structure. Ignoring any of these may lead to underperforming models, so in this post we’re going to carefully explore the dataset to inform the choice of modeling strategy.
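To make two of these checks concrete, here is a minimal sketch of how one might flag a skewed variable and highly correlated predictors before modeling. It uses only the Python standard library. The column names (`price`, `living_area`, `age`), the toy numbers, and the 0.8 threshold are all assumptions made up for illustration; they are not taken from the King County data.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical toy data (made-up numbers, one extreme observation).
toy = {
    "price": [200, 220, 250, 300, 1500],
    "living_area": [90, 100, 115, 140, 600],
    "age": [30, 5, 60, 12, 45],
}

price_skew = skewness(toy["price"])
# A log transform is a common way to reduce right skew in prices.
log_price_skew = skewness([math.log(p) for p in toy["price"]])
pairs = correlated_pairs(toy)

print(f"skew(price) = {price_skew:.2f}, skew(log price) = {log_price_skew:.2f}")
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

On real data the same two checks would typically be run on every numeric column; heteroskedasticity and spatial correlation need model residuals and coordinates, so they are not covered by this sketch.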