Google Machine Learning Lecture Notes

Video: https://www.youtube.com/watch?v=cKxRvEZd3Mw

The code below is written for Python 3; the videos use Python 2.7.

Environment: PyCharm 2019

[TOC]

Hello World (Decision Tree Classifier)

Decision tree classifier

from sklearn import tree
features = [[140, 1], [130, 1], [150, 1], [170, 0]]  # features
labels = [0, 0, 1, 1]  # labels
clf = tree.DecisionTreeClassifier()  # decision tree classifier
clf = clf.fit(features, labels)  # train the classifier
print(clf.predict([[160, 0]]))  # predicted label for a new sample

Visualizing a Decision Tree (Iris Dataset)

import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
print(iris.feature_names) # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(iris.target_names) # ['setosa' 'versicolor' 'virginica']
print(iris.data[0]) # [5.1 3.5 1.4 0.2]
print(iris.target[0]) # 0
test_idx = [0, 50, 100] # hold out samples 0, 50, and 100 as the test set

# training data
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# testing data
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]

clf = tree.DecisionTreeClassifier()
clf = clf.fit(train_data, train_target)

print(test_target)
print(clf.predict(test_data))

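Before running the code below, run pip install pydot, pip install six, and pip install graphviz. The Graphviz software itself must also be installed, and PATH updated to include it.

The installer used here is graphviz-2.38.msi, available from https://graphviz.gitlab.io/_pages/Download/Download_windows.html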

from six import StringIO  # text buffer used while building the PDF
import pydot
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     special_characters=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
print(len(graph))  # 1
print(graph)  # [<pydot.Dot object at 0x000001F7BD1A9630>]
print(graph[0])  # <pydot.Dot object at 0x000001F7BD1A9630>
graph[0].write_pdf("iris.pdf")  # saved to the working directory by default; any other path (e.g. the desktop) also works
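
If installing Graphviz is inconvenient, the learned rules can also be inspected as plain text. The following is a minimal sketch, assuming a scikit-learn version that provides tree.export_text (0.21 or later). It is not part of the video, and unlike the notes above it trains on the full iris dataset purely to stay self-contained.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
# Assumption: trained on all samples here, only to keep the sketch standalone.
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text returns the tree's decision rules as an indented string.
rules = tree.export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each printed line is one question the tree asks about a feature, which is the same information the PDF shows graphically.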

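What Makes a Good Feature

Some features hurt a classifier's accuracy. For example, a dog's height can, within a certain range, help predict its breed, whereas its eye color has nothing to do with breed.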

Meaning of np.random.randn():

For random samples from $N(\mu, \sigma^2)$, use:

$\sigma$ * np.random.randn(…) + $\mu$
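
As a quick check of the formula above, the sketch below draws a large sample and compares its mean and standard deviation with the chosen parameters. The values mu = 28 and sigma = 4 and the fixed seed are assumptions picked for illustration.

```python
import numpy as np

# Assumed parameters for illustration.
mu, sigma = 28.0, 4.0

np.random.seed(0)  # fixed seed so the run is repeatable
samples = sigma * np.random.randn(100_000) + mu

print(samples.mean())  # should be close to mu
print(samples.std())   # should be close to sigma
```

Scaling by sigma stretches the standard normal distribution and adding mu shifts its center, so the sample statistics should land near the two parameters.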

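import numpy as np
import matplotlib.pyplot as plt  # plotting library

greyhounds = 500  # number of dogs per breed
labs = 500
grey_height = 28 + 4 * np.random.randn(greyhounds)  # greyhound heights: normally distributed, mean 28 inches
lab_height = 24 + 4 * np.random.randn(labs)  # Labrador heights: normally distributed, mean 24 inches
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'g'])  # stacked histogram of both breeds
plt.show()  # display the figure when run as a script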
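
To make the height versus eye color point concrete, here is a rough sketch that trains a one-split decision tree on each feature separately and compares held-out accuracy. The eye color encoding (a random 0/1 value independent of breed), the helper holdout_accuracy, and the fixed seeds are assumptions made for this illustration and do not come from the video.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
n = 500  # dogs per breed, matching the histogram example

# Height depends on breed: means of 28 and 24 inches, same spread.
height = np.concatenate([28 + 4 * rng.randn(n), 24 + 4 * rng.randn(n)])
# Eye color is drawn independently of breed (assumed 0/1 encoding).
eye_color = rng.randint(0, 2, size=2 * n)
breed = np.array([0] * n + [1] * n)  # 0 = greyhound, 1 = Labrador


def holdout_accuracy(feature):
    """Train a single-split tree on one feature and score it on held-out dogs."""
    X = feature.reshape(-1, 1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, breed, test_size=0.5, random_state=0)
    clf = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
    return clf.score(X_test, y_test)


height_acc = holdout_accuracy(height)
eye_acc = holdout_accuracy(eye_color)
print(height_acc, eye_acc)
```

Height should score clearly above chance, though far from perfectly because the two distributions overlap, while eye color should hover around a coin flip.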

Let's Write a Machine Learning Pipeline! (k-Nearest Neighbors Classifier)

from sklearn import datasets  # load the dataset
iris = datasets.load_iris()
x = iris.data
y = iris.target

from sklearn.model_selection import train_test_split  # split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)
from sklearn.neighbors import KNeighborsClassifier  # k-nearest neighbors classifier
my_classifier = KNeighborsClassifier()
my_classifier.fit(x_train, y_train)  # fit on the training set

predictions = my_classifier.predict(x_test)  # predict labels for the test set
from sklearn.metrics import accuracy_score  # accuracy on the test set
print(accuracy_score(y_test, predictions))  # true test labels vs. predicted labels
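
Because scikit-learn classifiers share the same fit and predict interface, the pipeline above stays the same when the classifier is swapped. The sketch below runs the same split through two classifiers. The fixed random_state values and the scores dictionary are assumptions added so the comparison is repeatable.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0)

scores = {}
for clf in (KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)):
    clf.fit(x_train, y_train)  # same call for every classifier
    scores[type(clf).__name__] = accuracy_score(y_test, clf.predict(x_test))

print(scores)
```

Only the line that constructs the classifier changes. The split, training, prediction, and scoring steps are untouched.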

Writing Our First Classifier

Write a classifier from scratch.

from sklearn import datasets  # load the dataset
iris = datasets.load_iris()
x = iris.data
y = iris.target
# write the classifier as a class
from scipy.spatial import distance
def euc(a, b):  # Euclidean distance between two points
    return distance.euclidean(a, b)
class ScrappyKNN():  # hand-written nearest-neighbor classifier
    def fit(self, x_train, y_train):
        # "training" just memorizes the data
        self.x_train = x_train
        self.y_train = y_train
    def predict(self, x_test):
        predictions = []
        for row in x_test:
            label = self.closest(row)
            predictions.append(label)
        return predictions
    def closest(self, row):  # label of the single nearest training point (k = 1)
        best_dist = euc(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            dist = euc(row, self.x_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.y_train[best_index]

from sklearn.model_selection import train_test_split  # split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)
# from sklearn.neighbors import KNeighborsClassifier  # the library classifier used earlier
# my_classifier = KNeighborsClassifier()
my_classifier = ScrappyKNN()
my_classifier.fit(x_train, y_train)  # fit on the training set

predictions = my_classifier.predict(x_test)  # predict labels for the test set
from sklearn.metrics import accuracy_score  # accuracy on the test set
print(accuracy_score(y_test, predictions))  # true test labels vs. predicted labels
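
The classifier above only looks at the single closest training point. One possible extension, sketched here, takes a majority vote among the k closest points. The class name ScrappyKNNk, the k parameter, and the toy data are hypothetical and not from the video.

```python
from collections import Counter

import numpy as np


class ScrappyKNNk:
    """Hypothetical k-nearest-neighbors variant of ScrappyKNN (not from the video)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, x_train, y_train):
        # As before, "training" only stores the data.
        self.x_train = np.asarray(x_train, dtype=float)
        self.y_train = np.asarray(y_train)

    def predict(self, x_test):
        return [self._vote(row) for row in np.asarray(x_test, dtype=float)]

    def _vote(self, row):
        # Euclidean distance from this row to every training point.
        dists = np.linalg.norm(self.x_train - row, axis=1)
        nearest = np.argsort(dists)[: self.k]
        # Most common label among the k nearest; ties go to the label seen first.
        return Counter(self.y_train[nearest].tolist()).most_common(1)[0][0]


# Toy data (assumed): two well-separated clusters.
knn = ScrappyKNNk(k=3)
knn.fit([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], [0, 0, 0, 1, 1, 1])
print(knn.predict([[0.2, 0.1], [5.5, 5.5]]))  # [0, 1]
```

With k=1 this behaves like the closest method above. Larger k makes predictions less sensitive to a single mislabeled or unusual neighbor.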

Training an Image Classifier with TensorFlow for Poets

Requires the tensorflow library (pip install tensorflow).

# This no longer appears to be supported.

(Abandoned)