什么是KNN

07/03/2020 #machine-learning

什么是KNN

You're The Average Of The Five People You Spend The Most Time With
--Jim Rohn

KNN(k-nearest neighbors)算法应该是机器学习领域内最简单的分类算法。它的思路十分朴素，一言以蔽之，在一定范围内，数量最多的类别即是样本的类别。

本文将通过tensorflow来实现一个KNN的Demo

导入相关库

 class="highlight">import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
获取数据集
 class="highlight">mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Xtrain, Ytrain = mnist.train.next_batch(5000)  #训练集 X:[5000,784], Y:[5000,10]
Xtest, Ytest = mnist.test.next_batch(200)    #测试集 X:[200,784], Y:[200,10]

xtr = tf.placeholder("float", [None, 784])
xte = tf.placeholder("float", [784])
距离计算公式
 class="highlight"># L1-distance
distance = tf.reduce_sum(tf.abs(tf.add(xtr, tf.negative(xte))), axis=1)

# L2-distance
distance = tf.sqrt(tf.reduce_sum(tf.square(tf.add(xtr,tf.negative(xte))),axis=1))
 class="highlight"># 获取最小距离的索引
pred = tf.arg_min(distance, 0)

#分类精确度
accuracy = 0.

# 初始化变量
init = tf.global_variables_initializer()
 class="highlight"># 运行会话，训练模型
with tf.Session() as sess:

    # 运行初始化
    sess.run(init)


    # 遍历测试数据
    for i in range(len(Xtest)):
        # 获取当前样本的最近邻索引
        nn_index = sess.run(pred, feed_dict={xtr: Xtrain, xte: Xtest[i, :]})   #向占位符传入训练数据
        # 最近邻分类标签与真实标签比较
        print("Test", i, "Prediction:", np.argmax(Ytrain[nn_index]), \
            "True Class:", np.argmax(Ytest[i]))
        # 计算精确度
        if np.argmax(Ytrain[nn_index]) == np.argmax(Ytest[i]):
            accuracy += 1./len(Xtest)

    print("Done!")
    print("Accuracy:", accuracy)
Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a>
Related posts


    Some funny posts
    written by Luke Lee