Sunday, September 10, 2017

Introductory Python code for naive Bayes

This article is an introduction to using naive Bayes in Python. Naive Bayes is a probabilistic model that only holds under a number of strong assumptions. In practice I seldom use it, because data from industrial environments usually doesn't satisfy those assumptions (features that are conditionally independent given the class, and enough samples for the law of large numbers to make the estimated probabilities reliable), and maximum likelihood estimation is itself built on many assumptions, which makes the model inaccurate. In this article I won't derive the formula, as typing math formulas on a computer is terrible work for me. :)
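
Even without the derivation, the decision rule is short enough to sketch in plain Python. This is only a rough illustration, assuming every feature is independent given the class and normally distributed. The helper names (`fit_gaussian_nb`, `predict_gaussian_nb`), the toy data and the small variance floor `eps` are my own assumptions for the example, not sklearn's implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the log prior, feature means and feature variances per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(n_features)]
        # eps keeps every variance strictly positive (illustrative value)
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / n + eps
            for j in range(n_features)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log prior + summed log likelihood."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for value, mean, var in zip(x, means, variances):
            # log of the normal density for one feature
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: two well-separated classes with two features each
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(toy_model, [1.1, 2.0]))
print(predict_gaussian_nb(toy_model, [5.1, 6.0]))
```

Working in log space avoids multiplying many tiny probabilities together, which matters once there are hundreds of pixel features per image.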

I will use the digit recognition data set from Kaggle to demonstrate how the basic flow works.

Import libs & data


import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB  # Gaussian naive Bayes classifier
dataset = pd.read_csv("train.csv")
target = dataset.iloc[:, 0].values  # first column holds the digit labels
train = dataset.iloc[:, 1:].values  # remaining columns hold the pixel values
test = pd.read_csv("test.csv").values

x_finaltest = test  # test data

kf = KFold(n_splits = 10)
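
To see what the splitter hands back, here is a small sketch in plain Python of how unshuffled k-fold indices can be generated. `kfold_indices` is a hypothetical helper written for this post, not part of sklearn; it is meant to mirror how `KFold` behaves when shuffling is off, where each fold is a contiguous block and the earlier folds absorb any remainder.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) lists for contiguous, unshuffled folds."""
    # The first n_samples % n_splits folds get one extra sample
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# 10 samples in 5 folds: every sample is held out exactly once
folds = list(kfold_indices(n_samples=10, n_splits=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each pair of index lists plays the same role as `train_index` and `test_index` in the loop further down.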

Train & predict with 10-fold cross-validation

total_score1 = []
gnb = GaussianNB()
for train_index, test_index in kf.split(train):
    x_train = train[train_index]
    y_train = target[train_index]
    x_test  = train[test_index]
    y_test  = target[test_index]
    gnb.fit(x_train, y_train)  # train the model
    y_predict1 = gnb.predict(x_test)  # predict on the held-out fold

    total_score1.append(metrics.accuracy_score(y_test, y_predict1))  # compare and score

# print average score
print(np.mean(total_score1))

# predict on the Kaggle test set (gnb still holds the fit from the last fold)
y_pred1 = gnb.predict(x_finaltest)

# save into file
np.savetxt('bayes.csv', np.c_[range(1,len(x_finaltest)+1),y_pred1], delimiter=',', header = 'ImageId,Label', comments = '', fmt='%d')

Wall time: 4min 32s
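
If you don't have the Kaggle files at hand, the same flow can be tried on the small 8x8 digits set bundled with scikit-learn. Treat this as a sketch under that assumption: `load_digits` is only a stand-in for the Kaggle data, and `cross_val_score` is swapped in for the manual loop, so the scores and timing won't match the run above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data: 1797 8x8 digit images bundled with scikit-learn (not the Kaggle set)
X_digits, y_digits = load_digits(return_X_y=True)

# cross_val_score fits a fresh copy of the model on each of the 10 folds.
# With an integer cv and a classifier, scikit-learn uses stratified folds
# rather than the plain KFold used in the manual loop.
scores = cross_val_score(GaussianNB(), X_digits, y_digits, cv=10)
print("mean accuracy: %.3f" % scores.mean())
```

This version is shorter, but the manual loop keeps every intermediate array visible, which is handy while learning the flow.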

