a wafer metrology consortium under MAGNET

A distance function for data with missing values and its application

22 October, 2013
Authors: Loai AbdAllah and Ilan Shimshoni
Source: CDMKE 2013: International Conference on Data Mining and Knowledge Engineering
Abstract: Missing values are common in real-world data. Because the performance of many data mining algorithms depends critically on having a good metric over the input space, we define in this paper a distance function for unlabeled datasets with missing values. We build the new distance function on the Bhattacharyya distance, which measures the similarity of two probability distributions. Under this definition, the distance between two points with no missing attribute values is simply the Mahalanobis distance.
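
The abstract names two standard quantities without giving formulas, so a small sketch may help make them concrete. It is not the paper's distance function, whose construction is not stated here. It only illustrates the textbook Bhattacharyya distance between two univariate normal distributions and the Mahalanobis distance under an assumed diagonal covariance matrix. The function names and the diagonal-covariance simplification are assumptions made for this illustration.

```python
import math


def bhattacharyya_gaussian_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2).

    Textbook closed form for univariate normals; illustrative only,
    not the construction used in the paper.
    """
    # Term driven by the gap between the two means.
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    # Term driven by the mismatch between the two variances.
    spread_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + spread_term


def mahalanobis_diagonal(x, y, variances):
    """Mahalanobis distance between x and y.

    Assumption: the covariance matrix is diagonal, so each squared
    coordinate difference is simply divided by that attribute's variance.
    """
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


if __name__ == "__main__":
    # Identical distributions are at Bhattacharyya distance zero.
    print(bhattacharyya_gaussian_1d(0.0, 1.0, 0.0, 1.0))
    # With unit variances the Mahalanobis distance equals the Euclidean one.
    print(mahalanobis_diagonal([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))
```

With a full (non-diagonal) covariance matrix, the Mahalanobis distance would instead use the inverse covariance matrix in the quadratic form; the diagonal case is shown only to keep the sketch dependency-free.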