
Instance-Based Learning, KNN

Introduction

[Image: SL4_instance_based_Learning_intro.png]

Pros

  • Remembers the training data exactly (nothing is lost or approximated)

  • Fast: there is no training phase; learning is just storing the data

  • Simple

Cons

  • No generalization: a query that was never stored has no answer
  • Overfitting: the learner believes the data completely, noise included
  • Can't handle multiple references to the same input: if the same x is stored twice with different values, the lookup is ambiguous (see the sketch below)
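
To make these pros and cons concrete, here is a minimal sketch of pure instance-based learning (the names and the toy house data are illustrative, not from the lecture): training just stores the examples, and prediction is a table lookup.

```python
# "Learning" is remembering: build a lookup table from the data.
def train(examples):
    table = {}
    for x, y in examples:
        table[x] = y  # duplicate inputs silently overwrite each other
    return table

# Fast and simple, but an unseen query has no answer (no generalization).
def predict(table, query):
    return table.get(query)

# Toy data: (square feet, bedrooms) -> price
houses = [((1000, 2), 200_000), ((1500, 3), 300_000)]
model = train(houses)
print(predict(model, (1000, 2)))  # 200000: exact recall
print(predict(model, (1001, 2)))  # None: no generalization
```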

Cost of the house

[Image: SL4_KNN_definition.png]

KNN

[Image: SL4_KNN_definition_formal.png]
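
A minimal sketch of the k-NN rule formalized above, assuming Euclidean distance (the function names are illustrative): find the k stored points nearest to the query, then take a majority vote for classification or a mean for regression.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(data, query, k, regression=False, dist=euclidean):
    # The k training points closest to the query under the chosen distance.
    neighbors = sorted(data, key=lambda pair: dist(pair[0], query))[:k]
    labels = [y for _, y in neighbors]
    if regression:
        return sum(labels) / k                   # average the neighbors' values
    return Counter(labels).most_common(1)[0][0]  # majority vote

data = [((1, 1), 100), ((2, 2), 150), ((8, 8), 400)]
print(knn_predict(data, (1.5, 1.5), k=2, regression=True))  # 125.0
```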

KNN Bias

Preference bias: our belief about what makes a good hypothesis

  • Locality -> points near each other are similar

  • Smoothness -> averaging over neighbors makes sense

  • All features matter equally (every dimension contributes to the distance in the same way; see the sketch below)
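
The last assumption is easy to violate in practice: a feature measured on a large raw scale dominates the distance and effectively picks the neighbors by itself. A small hedged illustration (the house numbers are invented):

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# (square feet, bedrooms); the query is a 1000 sqft, 3-bedroom house.
q = (1000, 3)
a = (1010, 10)  # similar size, wildly different bedroom count
b = (1200, 3)   # different size, same bedroom count

# On raw scales, square feet dominate, so a looks closer than b.
print(euclidean(a, q) < euclidean(b, q))  # True

# After a crude rescaling of square feet, the verdict flips.
scale = lambda p: (p[0] / 1000.0, p[1])
print(euclidean(scale(a), scale(q)) < euclidean(scale(b), scale(q)))  # False
```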

Won't you compute my neighbors?

[Image: SL4_KNN_calculation_time.png]

Curse of Dimensionality

As the number of features or dimensions grows, the amount of data we need in order to generalize accurately grows exponentially.
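
A short sketch (not from the lecture) that makes the claim concrete in two ways: keeping a fixed sampling density in the unit hypercube takes 10^d points, and with a fixed budget of random points, the nearest neighbor of a query drifts farther away as d grows.

```python
import math
import random

# (1) A grid point every 0.1 along each axis of the d-dimensional
# unit hypercube requires 10^d points.
for d in (1, 2, 3, 10):
    print(f"grid spacing 0.1 in {d}-D needs 10^{d} = {10 ** d:,} points")

# (2) With a fixed number of random points, the nearest neighbor of a
# random query gets farther away as the dimension grows.
def nearest_dist(d, n=200):
    points = [[random.random() for _ in range(d)] for _ in range(n)]
    query = [random.random() for _ in range(d)]
    return min(math.dist(query, p) for p in points)

for d in (1, 2, 10, 100):
    print(f"{d:3d}-D: nearest of {200} random points is ~{nearest_dist(d):.2f} away")
```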

Some other stuff

  • Distance(x, q): this is where domain knowledge enters. Euclidean or (weighted) Manhattan distances suit numeric/regression problems; mismatch counts suit categorical attributes such as "colors". See the sketches below.
    • Define a function that measures how similar two points are.
    • The distance is really a similarity measure in disguise.
  • K: how to pick K? Even K = n can work if we replace the plain average with a distance-weighted one, which leads to:
    • Locally weighted regression
    • Locally weighted linear regression
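
Hedged sketches of these ideas (illustrative code, not the lecture's): Euclidean and weighted Manhattan distances for numeric features, a mismatch count for categorical ones, and a distance-weighted average, which keeps K = n useful and is the first step toward locally weighted regression.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b, weights=None):
    w = weights or [1.0] * len(a)
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b))

def mismatch(a, b):
    # For categorical attributes like colors: count disagreements.
    return sum(x != y for x, y in zip(a, b))

# With K = n, a plain average ignores the query entirely; weighting
# each stored point by 1/distance keeps the prediction local.
def weighted_average(data, query, dist=euclidean, eps=1e-9):
    weights = [1.0 / (dist(x, query) + eps) for x, _ in data]
    return sum(w * y for w, (_, y) in zip(weights, data)) / sum(weights)

data = [((1.0,), 10.0), ((2.0,), 20.0), ((10.0,), 100.0)]
print(weighted_average(data, (1.5,)))  # ~17.4, dominated by the near points
```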

What have we learned

  • Instance-based learning
  • Lazy vs. eager learning
  • KNN
  • Nearest neighbor: the similarity (distance) function
    • Domain knowledge matters.
  • Classification vs. regression
  • Averaging
  • Locally weighted X regression, where X can be any learner (a line, a decision tree, a neural network, ...)

No free lunch

For any learning algorithm you create, if you average its performance over all possible instances, it does about as well as random guessing.

Solution

Use domain knowledge to choose a similarity function that fits the problem.

Motivation of this class

Teach students to use domain knowledge to pick the right approach for each learning task.