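A Simple Example of Applying Classification Algorithms with Python's sklearn Library
This article walks through a simple application of classification algorithms using Python's sklearn library. It is shared here for reference; the details are as follows:
scikit-learn ships with Anaconda. It can also be installed from the source package available on the official site. The code below wraps the following machine learning algorithms; by changing only the data-loading function, you can test them all in a single run: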
# -*- coding: utf-8 -*-
'''
Created on June 4, 2016
@author: bryan
'''
import time
import pickle

import pandas as pd
from sklearn import metrics


# Multinomial Naive Bayes Classifier
def naive_bayes_classifier(train_x, train_y):
    from sklearn.naive_bayes import MultinomialNB
    model = MultinomialNB(alpha=0.01)
    model.fit(train_x, train_y)
    return model


# KNN Classifier
def knn_classifier(train_x, train_y):
    from sklearn.neighbors import KNeighborsClassifier
    model = KNeighborsClassifier()
    model.fit(train_x, train_y)
    return model


# Logistic Regression Classifier
def logistic_regression_classifier(train_x, train_y):
    from sklearn.linear_model import LogisticRegression
    model = LogisticRegression(penalty='l2')
    model.fit(train_x, train_y)
    return model


# Random Forest Classifier
def random_forest_classifier(train_x, train_y):
    from sklearn.ensemble import RandomForestClassifier
    model = RandomForestClassifier(n_estimators=8)
    model.fit(train_x, train_y)
    return model


# Decision Tree Classifier
def decision_tree_classifier(train_x, train_y):
    from sklearn import tree
    model = tree.DecisionTreeClassifier()
    model.fit(train_x, train_y)
    return model


# GBDT (Gradient Boosting Decision Tree) Classifier
def gradient_boosting_classifier(train_x, train_y):
    from sklearn.ensemble import GradientBoostingClassifier
    model = GradientBoostingClassifier(n_estimators=200)
    model.fit(train_x, train_y)
    return model


# SVM Classifier
def svm_classifier(train_x, train_y):
    from sklearn.svm import SVC
    model = SVC(kernel='rbf', probability=True)
    model.fit(train_x, train_y)
    return model


# SVM Classifier using cross validation
def svm_cross_validation(train_x, train_y):
    # sklearn.grid_search was removed in scikit-learn 0.20;
    # GridSearchCV now lives in sklearn.model_selection
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    model = SVC(kernel='rbf', probability=True)
    param_grid = {'C': [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000], 'gamma': [0.001, 0.0001]}
    # cv=3 was the default in older releases (newer releases default to 5)
    grid_search = GridSearchCV(model, param_grid, n_jobs=1, verbose=1, cv=3)
    grid_search.fit(train_x, train_y)
    best_parameters = grid_search.best_estimator_.get_params()
    for para, val in best_parameters.items():
        print(para, val)
    # Retrain on the full training set with the best C and gamma found
    model = SVC(kernel='rbf', C=best_parameters['C'], gamma=best_parameters['gamma'], probability=True)
    model.fit(train_x, train_y)
    return model


def read_data(data_file):
    # The CSV must contain a 'label' column; the first 90% of rows are used
    # for training and the remaining 10% for testing (no shuffling)
    data = pd.read_csv(data_file)
    train = data[:int(len(data) * 0.9)]
    test = data[int(len(data) * 0.9):]
    train_y = train.label
    train_x = train.drop('label', axis=1)
    test_y = test.label
    test_x = test.drop('label', axis=1)
    return train_x, train_y, test_x, test_y


if __name__ == '__main__':
    data_file = "H:\\Research\\data\\trainCG.csv"
    thresh = 0.5
    model_save_file = None  # set to a file path to pickle the trained models
    model_save = {}
    test_classifiers = ['NB', 'KNN', 'LR', 'RF', 'DT', 'SVM', 'SVMCV', 'GBDT']
    classifiers = {'NB': naive_bayes_classifier,
                   'KNN': knn_classifier,
                   'LR': logistic_regression_classifier,
                   'RF': random_forest_classifier,
                   'DT': decision_tree_classifier,
                   'SVM': svm_classifier,
                   'SVMCV': svm_cross_validation,
                   'GBDT': gradient_boosting_classifier
                   }
    print('reading training and testing data...')
    train_x, train_y, test_x, test_y = read_data(data_file)
    for classifier in test_classifiers:
        print('******************* %s ********************' % classifier)
        start_time = time.time()
        model = classifiers[classifier](train_x, train_y)
        print('training took %fs!' % (time.time() - start_time))
        predict = model.predict(test_x)
        if model_save_file is not None:
            model_save[classifier] = model
        precision = metrics.precision_score(test_y, predict)
        recall = metrics.recall_score(test_y, predict)
        print('precision: %.2f%%, recall: %.2f%%' % (100 * precision, 100 * recall))
        accuracy = metrics.accuracy_score(test_y, predict)
        print('accuracy: %.2f%%' % (100 * accuracy))
    if model_save_file is not None:
        with open(model_save_file, 'wb') as f:
            pickle.dump(model_save, f)
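As noted earlier, only the data-loading function has to change to try the classifiers on a different dataset. The following is a minimal sketch of such a replacement, assuming scikit-learn's bundled breast cancer dataset stands in for the author's CSV file. The function name read_data_builtin and its train_fraction parameter are hypothetical and not part of the original script; the shuffle is also an addition, since the original read_data splits rows in file order.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def read_data_builtin(train_fraction=0.9):
    """Hypothetical stand-in for read_data() that needs no CSV file."""
    dataset = load_breast_cancer()
    data = pd.DataFrame(dataset.data, columns=dataset.feature_names)
    data['label'] = dataset.target
    # Assumption: shuffle with a fixed seed before splitting
    # (the original read_data keeps the file order instead)
    data = data.sample(frac=1.0, random_state=0).reset_index(drop=True)
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    train_y = train['label']
    train_x = train.drop('label', axis=1)
    test_y = test['label']
    test_x = test.drop('label', axis=1)
    return train_x, train_y, test_x, test_y


train_x, train_y, test_x, test_y = read_data_builtin()
print(train_x.shape, test_x.shape)
```

The return value has the same four-part shape as read_data, so replacing the read_data(data_file) call in the main block with read_data_builtin() should be the only change needed. All features in this dataset are non-negative, which MultinomialNB requires.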
Running the full script on the author's data file (trainCG.csv) produces the following results:
reading training and testing data...
******************* NB ********************
training took 0.004986s!
precision: 78.08%, recall: 71.25%
accuracy: 74.17%
******************* KNN ********************
training took 0.017545s!
precision: 97.56%, recall: 100.00%
accuracy: 98.68%
******************* LR ********************
training took 0.061161s!
precision: 89.16%, recall: 92.50%
accuracy: 90.07%
******************* RF ********************
training took 0.040111s!
precision: 96.39%, recall: 100.00%
accuracy: 98.01%
******************* DT ********************
training took 0.004513s!
precision: 96.20%, recall: 95.00%
accuracy: 95.36%
******************* SVM ********************
training took 0.242145s!
precision: 97.53%, recall: 98.75%
accuracy: 98.01%
******************* SVMCV ********************
Fitting 3 folds for each of 14 candidates, totalling 42 fits
[Parallel(n_jobs=1)]: Done 42 out of 42 | elapsed: 6.8s finished
probability True
verbose False
coef0 0.0
degree 3
tol 0.001
shrinking True
cache_size 200
gamma 0.001
max_iter -1
C 1000
decision_function_shape None
random_state None
class_weight None
kernel rbf
training took 7.434668s!
precision: 98.75%, recall: 98.75%
accuracy: 98.68%
******************* GBDT ********************
training took 0.521916s!
precision: 97.56%, recall: 100.00%
accuracy: 98.68%
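The precision, recall and accuracy figures reported for each classifier all derive from the confusion matrix. The short sketch below uses made-up labels (the lists y_true and y_pred are illustrative only and are not taken from the author's data) to show how the three values relate to the counts of true and false positives and negatives:

```python
from sklearn import metrics

# Illustrative labels only; not taken from the article's dataset
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct

# The hand-computed values agree with sklearn's helper functions
assert abs(precision - metrics.precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - metrics.recall_score(y_true, y_pred)) < 1e-12
assert abs(accuracy - metrics.accuracy_score(y_true, y_pred)) < 1e-12

print('precision: %.2f%%, recall: %.2f%%' % (100 * precision, 100 * recall))
print('accuracy: %.2f%%' % (100 * accuracy))
```

Read this way, a result such as 100.00% recall with 97.56% precision means the classifier missed no positive samples in the test split but raised a small number of false alarms.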
We hope this article is helpful for your Python programming.