Video: https://www.youtube.com/watch?v=cKxRvEZd3Mw
The code below is written for Python 3; the videos still use Python 2.7.
Environment: PyCharm 2019
[TOC]
Hello World (decision tree classifier)
Decision tree classifier
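```python
from sklearn import tree

# Each sample is [weight, texture]; texture is a binary 0/1 code
features = [[140, 1], [130, 1], [150, 1], [170, 0]]
# Labels: 0 = apple, 1 = orange
labels = [0, 0, 1, 1]

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))  # prints [1], i.e. orange
```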
Visualizing a decision tree (Iris dataset)
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
print(iris.feature_names)  # names of the 4 features
print(iris.target_names)   # names of the 3 species
print(iris.data[0])        # features of the first sample
print(iris.target[0])      # label of the first sample

# Hold out one example of each species (rows 0, 50, 100) for testing
test_idx = [0, 50, 100]

# Training data: every row except the held-out ones
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# Testing data: only the held-out rows
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]

clf = tree.DecisionTreeClassifier()
clf = clf.fit(train_data, train_target)

print(test_target)             # true labels: [0 1 2]
print(clf.predict(test_data))  # predicted labels
```
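Before running the code below, install the Python packages with `pip install pydot`, `pip install six` and `pip install GraphViz`. You also need the Graphviz software itself, with its install directory added to PATH.
Windows installer: graphviz-2.38.msi, downloaded from https://graphviz.gitlab.io/_pages/Download/Download_windows.html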
```python
# Continues from the previous block (uses clf, iris and tree)
from six import StringIO
import pydot

dot_data = StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     special_characters=True)
# In recent pydot versions this returns a list of graphs (a single one here)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
print(len(graph))
print(graph)
print(graph[0])
graph[0].write_pdf("iris.pdf")  # requires the Graphviz `dot` program on PATH
```
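If installing Graphviz is a hassle, scikit-learn can also print a fitted tree as plain text with `sklearn.tree.export_text`. This assumes scikit-learn 0.21 or later. The minimal sketch below retrains on the same training set as the iris code above so that it runs on its own. `random_state=0` is added only to make the tree repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
test_idx = [0, 50, 100]

# Same training set as above: all rows except the three held-out samples
train_data = np.delete(iris.data, test_idx, axis=0)
train_target = np.delete(iris.target, test_idx)

clf = DecisionTreeClassifier(random_state=0).fit(train_data, train_target)

# One line per node; leaves show the predicted class (0, 1 or 2)
report = export_text(clf, feature_names=list(iris.feature_names))
print(report)
```

The text output carries the same split thresholds as the PDF, just without colors.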
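What makes a good feature?
Not all features are useful, and some can even hurt a classifier's accuracy. For example, a dog's height, within a certain range, helps predict its breed, but a dog's eye color has nothing to do with its breed.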
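What `np.random.randn()` does: it returns samples from the standard normal distribution $N(0, 1)$.
For random samples from $N(\mu, \sigma^2)$, use `sigma * np.random.randn(…) + mu`, where $\mu$ is the mean and $\sigma$ the standard deviation.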
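Here is a quick numerical check of the rule above. It scales and shifts standard-normal draws, then compares the sample mean and standard deviation with $\mu$ and $\sigma$. The values `mu = 28`, `sigma = 4` and the fixed seed are arbitrary choices for this sketch.

```python
import numpy as np

np.random.seed(0)
mu, sigma = 28, 4

# sigma * N(0, 1) + mu  ~  N(mu, sigma^2)
samples = sigma * np.random.randn(100_000) + mu

print(samples.mean(), samples.std())  # close to 28 and 4

# For a normal distribution, about 68% of the values lie within one sigma of the mean
within_one_sigma = np.mean(np.abs(samples - mu) < sigma)
print(within_one_sigma)
```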
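```python
import numpy as np
import matplotlib.pyplot as plt

greyhounds = 500
labs = 500

# Heights drawn from normal distributions: greyhounds 28 ± 4, labradors 24 ± 4
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# One stacked histogram: red = greyhounds, green = labradors
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'g'])
plt.show()  # needed to display the figure when run as a script (e.g. in PyCharm)
```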
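To put a number on the height-versus-eye-color point, the sketch below reuses the simulated heights and adds a made-up `eye_color` feature. That feature is drawn independently of breed. The feature, its 0/1 coding and the label encoding (0 = greyhound, 1 = labrador) are all assumptions for illustration. A shallow decision tree is scored on each feature alone with 5-fold cross-validation. Height should land well above 0.5 but far from 1.0, because the two height distributions overlap. Eye color should land at about 0.5, no better than a coin flip.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

np.random.seed(0)
greyhounds, labs = 500, 500
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Assumed label encoding: 0 = greyhound, 1 = labrador
y = np.array([0] * greyhounds + [1] * labs)
height = np.concatenate([grey_height, lab_height])
# Hypothetical eye-color feature, random and unrelated to breed (0 = blue, 1 = brown)
eye_color = np.random.randint(0, 2, size=greyhounds + labs)

scores_by_feature = {}
for name, feature in [("height", height), ("eye_color", eye_color)]:
    X = feature.reshape(-1, 1)  # a single feature column
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    # Mean accuracy over 5 stratified folds
    scores_by_feature[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(name, round(scores_by_feature[name], 3))
```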
Let's write a machine learning pipeline! (k-nearest neighbors classifier)
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
x = iris.data    # features
y = iris.target  # labels

# Randomly split the data: half for training, half for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)

my_classifier = KNeighborsClassifier()
my_classifier.fit(x_train, y_train)

predictions = my_classifier.predict(x_test)
print(accuracy_score(y_test, predictions))  # fraction of correct predictions
```
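Every scikit-learn classifier exposes the same `fit`/`predict` interface, so the pipeline above does not depend on which classifier it uses. The minimal sketch below runs the same split through k-NN and through a decision tree. `random_state=0` is added here only to make the split repeatable, and the exact accuracies depend on it.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=.5, random_state=0)

results = {}
# Only the classifier changes; the rest of the pipeline stays the same
for name, clf in [("k-NN", KNeighborsClassifier()),
                  ("decision tree", DecisionTreeClassifier(random_state=0))]:
    clf.fit(x_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(x_test))
    print(name, results[name])
```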
Writing our first classifier
Write a classifier from scratch.
```python
from scipy.spatial import distance
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def euc(a, b):
    """Euclidean distance between two points."""
    return distance.euclidean(a, b)


class ScrappyKNN():
    """A bare-bones nearest-neighbor classifier (k = 1)."""

    def fit(self, x_train, y_train):
        # "Training" just memorizes the training data
        self.x_train = x_train
        self.y_train = y_train

    def predict(self, x_test):
        predictions = []
        for row in x_test:
            label = self.closest(row)
            predictions.append(label)
        return predictions

    def closest(self, row):
        # Linear scan for the training point nearest to `row`
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]


iris = datasets.load_iris()
x = iris.data
y = iris.target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)

my_classifier = ScrappyKNN()
my_classifier.fit(x_train, y_train)

predictions = my_classifier.predict(x_test)
print(accuracy_score(y_test, predictions))
```
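`ScrappyKNN` only consults the single nearest neighbor, i.e. k = 1. Below is a minimal sketch of the usual generalization: a majority vote among the k nearest training points. The class name `VotingKNN` and its `k` parameter are hypothetical and not part of the original code. The sketch also uses NumPy to compute all distances in one step instead of calling `euc` in a loop.

```python
from collections import Counter

import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class VotingKNN:
    """Hypothetical k-NN classifier: majority vote among the k nearest training points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, x_train, y_train):
        # Like ScrappyKNN, "training" just stores the data
        self.x_train = np.asarray(x_train)
        self.y_train = np.asarray(y_train)
        return self

    def predict(self, x_test):
        predictions = []
        for row in np.asarray(x_test):
            # Euclidean distance from this row to every training row at once
            dists = np.linalg.norm(self.x_train - row, axis=1)
            nearest = np.argsort(dists)[: self.k]
            # Counter keeps first-seen order on ties, so the closer neighbor's label wins
            label = Counter(self.y_train[nearest]).most_common(1)[0][0]
            predictions.append(label)
        return predictions


iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=.5, random_state=0)

clf = VotingKNN(k=3).fit(x_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(x_test))
print(accuracy)
```

With `k=1` this behaves like `ScrappyKNN`. A larger `k` makes single noisy training points less influential.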

Training an image classifier with TensorFlow for Poets
Requires the tensorflow library (`pip install tensorflow`).
(Abandoned: I gave up on this part.)