Efficient Non-Parametric Function Induction in Semi-Supervised Learning
There has been an increase of interest for semi-supervised learning recently, because of the many datasets with large amounts of unlabeled examples and only a few labeled ones. This paper follows up on proposed non-parametric algorithms which provide an estimated continuous label for the given unlabeled examples. It extends them to function induction algorithms that correspond to the minimization of a regularization criterion applied to an out-of-sample example, and happens to have the form of a Parzen windows regressor. The advantage of the extension is that it allows predicting the label for a new example without having to solve again a linear system of dimension 'n' (the number of unlabeled and labeled training examples), which can cost O(n^3). Experiments show that the extension works well, in the sense of predicting a label close to the one that would have been obtained if the test example had been included in the unlabeled set. This relatively efficient function induction procedure can also be used when 'n' is large to approximate the solution by writing it only in terms of a kernel expansion with 'm' << 'n' terms, and reducing the linear system to 'm' equations in 'm' unknowns.
[ - ]