This article is an introduction to using naive Bayes in Python. Naive Bayes is a probabilistic model that only holds under strong assumptions. In practice I seldom use it, as data from industrial environments rarely satisfies those assumptions (such as having enough samples for the law of large numbers to apply), and the maximum-likelihood estimates it relies on are themselves built on simplifying assumptions that can make the model inaccurate. In this article I won't derive the formula, since typing math formulas on a computer is terrible work for me. :)
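To make the assumptions concrete before diving into the Kaggle data, here is a minimal sketch of Gaussian naive Bayes on a hypothetical toy dataset (the cluster means and sizes below are made up for illustration): each class is modeled as a Gaussian per feature, and prediction just picks the class with the highest posterior.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy 2-class data: each class drawn from its own Gaussian cluster.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # class 0 around (0, 0)
               rng.normal(4, 1, (50, 2))])  # class 1 around (4, 4)
y = np.array([0] * 50 + [1] * 50)

# GaussianNB estimates a per-class mean and variance for every feature,
# then classifies by Bayes' rule assuming the features are independent.
clf = GaussianNB()
clf.fit(X, y)
print(clf.predict([[0, 0], [4, 4]]))  # -> [0 1]
```

With clusters this well separated the independence assumption is harmless; the real digit data is far less friendly.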
I will use the digit recognition dataset from Kaggle to demonstrate how the basic flow works.
Import libraries & data
%%time
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB  # the naive Bayes classifier

dataset = pd.read_csv("train.csv")
target = dataset["label"].values              # first column holds the digit labels
train = dataset.iloc[:, 1:].values            # remaining columns are pixel values
x_finaltest = pd.read_csv("test.csv").values  # Kaggle test data
kf = KFold(n_splits=10)
Train & predict with 10-fold cross-validation
total_score1 = []
gnb = GaussianNB()
for train_index, test_index in kf.split(train):
    x_train = train[train_index]
    y_train = target[train_index]
    x_test = train[test_index]
    y_test = target[test_index]
    gnb.fit(x_train, y_train)         # train the model on this fold
    y_predict1 = gnb.predict(x_test)  # predict on the held-out fold
    total_score1.append(metrics.accuracy_score(y_test, y_predict1))  # score the fold

# print the average score across the 10 folds
print(np.mean(total_score1))
y_pred1 = gnb.predict(x_finaltest)  # predict on the Kaggle test set
# save predictions in Kaggle submission format
np.savetxt('bayes.csv', np.c_[range(1, len(x_finaltest) + 1), y_pred1],
           delimiter=',', header='ImageId,Label', comments='', fmt='%d')
0.556166666667 Wall time: 4min 32s
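As a side note, the `np.savetxt` call above can be replaced by a pandas `DataFrame.to_csv`, which some people find easier to read. A sketch, with a small made-up array standing in for the real `y_pred1`:

```python
import numpy as np
import pandas as pd

# Hypothetical predictions standing in for y_pred1 from the model above.
y_pred1 = np.array([3, 7, 1, 0])

# Build the two-column submission table and write it without the index,
# producing the same ImageId,Label format as the np.savetxt call.
submission = pd.DataFrame({
    "ImageId": np.arange(1, len(y_pred1) + 1),
    "Label": y_pred1,
})
submission.to_csv("bayes.csv", index=False)
```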