What Is KNN

You're The Average Of The Five People You Spend The Most Time With

--Jim Rohn

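KNN (k-nearest neighbors) is arguably one of the simplest classification algorithms in machine learning. The idea is straightforward: a sample is assigned whichever class is most common among its k nearest neighbors in the training set.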
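
The voting rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the TensorFlow demo's code: the function name `knn_predict`, the toy points, and the choice of squared Euclidean distance are all assumptions made for the example.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Hypothetical helper for illustration; uses squared Euclidean distance.
    """
    # Distance from the query to every training point, paired with its label
    scored = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count how often each label occurs
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two clusters in the plane
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5), k=3))
print(knn_predict(points, labels, (5.5, 5.5), k=3))
```

With `k=1` the vote reduces to copying the label of the single nearest neighbor.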

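This post implements a small KNN demo with TensorFlow. The code uses the TensorFlow 1.x API (placeholders, sessions, and the bundled MNIST loader).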

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
  • Load the dataset
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Xtrain, Ytrain = mnist.train.next_batch(5000)  # training set, X: [5000, 784], Y: [5000, 10]
Xtest, Ytest = mnist.test.next_batch(200)      # test set, X: [200, 784], Y: [200, 10]

# Placeholders: the full batch of training images, and a single test image
xtr = tf.placeholder("float", [None, 784])
xte = tf.placeholder("float", [784])
  • Distance formulas
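# L1 (Manhattan) distance between the test sample and every training sample
distance = tf.reduce_sum(tf.abs(tf.add(xtr, tf.negative(xte))), axis=1)

# L2 (Euclidean) distance. This second assignment overrides the L1 version above,
# so keep only the definition you want to use.
distance = tf.sqrt(tf.reduce_sum(tf.square(tf.add(xtr, tf.negative(xte))), axis=1))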
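
To make the two formulas concrete, here is a small dependency-free sketch that computes both distances for a single pair of vectors. The helper names `l1_distance` and `l2_distance` are made up for this example; the TensorFlow expressions compute the same quantities for every training row at once.

```python
import math


def l1_distance(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Square root of the sum of squared differences (Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


u = [0.0, 0.0]
v = [3.0, 4.0]
print(l1_distance(u, v))  # 7.0
print(l2_distance(u, v))  # 5.0
```

Because the square root is monotonic, dropping it from the L2 formula would not change which training sample comes out nearest.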
# Index of the training sample with the smallest distance
pred = tf.argmin(distance, 0)

# Classification accuracy
accuracy = 0.

# Variable initializer
init = tf.global_variables_initializer()
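# Run the session; KNN has no training step, so this only classifies the test samples
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Loop over the test samples
    for i in range(len(Xtest)):
        # Index of the nearest training sample for the current test sample
        nn_index = sess.run(pred, feed_dict={xtr: Xtrain, xte: Xtest[i, :]})  # feed the training set and one test sample
        # Compare the nearest neighbor's label with the true label
        print("Test", i, "Prediction:", np.argmax(Ytrain[nn_index]),
              "True Class:", np.argmax(Ytest[i]))
        # Accumulate accuracy
        if np.argmax(Ytrain[nn_index]) == np.argmax(Ytest[i]):
            accuracy += 1./len(Xtest)

    print("Done!")
    print("Accuracy:", accuracy)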
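
The accuracy bookkeeping in the loop can also be checked in isolation. The sketch below decodes one-hot labels and counts matches, dividing once at the end instead of adding `1./len(Xtest)` per hit. The helper names and the three-class toy labels are assumptions for illustration, not part of the demo.

```python
def argmax(values):
    """Index of the largest entry, e.g. the class encoded by a one-hot row."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predicted_one_hot, true_one_hot):
    """Fraction of rows whose decoded class indices agree."""
    correct = sum(
        argmax(p) == argmax(t) for p, t in zip(predicted_one_hot, true_one_hot)
    )
    return correct / len(true_one_hot)


# Toy one-hot labels for four samples and three classes
predicted = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
actual = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(accuracy(predicted, actual))  # 0.75
```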
Some funny posts written by Luke Lee