Computing information gain for feature selection in Python
This post implements information gain for feature selection in Python, handling data whose features are a mix of continuous and binary discrete attributes.
A senior lab-mate asked me to write a feature-selection script. Most of the implementations I found online only compute information gain for discrete attributes, but my data contains both binary discrete and continuous attributes, so I implemented it myself.
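The key observation is that a binary 0/1 attribute can be treated exactly like a continuous one: any threshold strictly between 0 and 1 splits it into its two values. Candidate thresholds for a continuous feature are the midpoints of consecutive sorted values. A minimal sketch of that step (a hypothetical helper, separate from the full code below):

```python
def candidate_thresholds(values):
    """Midpoints between consecutive sorted feature values."""
    s = sorted(values)
    return sorted({(a + b) / 2 for a, b in zip(s, s[1:])})

# A continuous feature yields one midpoint per gap between sorted values;
# a binary 0/1 feature yields the single threshold 0.5.
print(candidate_thresholds([1, 4, 2]))  # [1.5, 3.0]
print(candidate_thresholds([0, 1]))     # [0.5]
```

The full code additionally drops any threshold equal to the minimum or maximum feature value, so that neither side of the split is empty.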
Code:
import numpy as np
import math

class IG():
    def __init__(self, X, y):
        X = np.array(X)
        n_feature = np.shape(X)[1]
        n_y = len(y)
        # Entropy of the class labels (natural log)
        orig_H = 0
        for i in set(y):
            orig_H += -(y.count(i) / n_y) * math.log(y.count(i) / n_y)
        condi_H_list = []
        for i in range(n_feature):
            feature = X[:, i]
            sourted_feature = sorted(feature)
            # Candidate thresholds: midpoints of consecutive sorted values
            threshold = [(sourted_feature[inde - 1] + sourted_feature[inde]) / 2
                         for inde in range(len(feature)) if inde != 0]
            thre_set = set(threshold)
            # Drop thresholds equal to the min or max feature value,
            # which would leave one partition empty
            if float(max(feature)) in thre_set:
                thre_set.remove(float(max(feature)))
            if min(feature) in thre_set:
                thre_set.remove(min(feature))
            pre_H = 0
            for thre in thre_set:
                # Split the labels by the threshold
                lower = [y[s] for s in range(len(feature)) if feature[s] < thre]
                highter = [y[s] for s in range(len(feature)) if feature[s] >= thre]
                H_l = 0
                for l in set(lower):
                    H_l += -(lower.count(l) / len(lower)) * math.log(lower.count(l) / len(lower))
                H_h = 0
                for h in set(highter):
                    H_h += -(highter.count(h) / len(highter)) * math.log(highter.count(h) / len(highter))
                # Conditional entropy of the split, then the information gain
                temp_condi_H = len(lower) / n_y * H_l + len(highter) / n_y * H_h
                condi_H = orig_H - temp_condi_H
                # Keep the best threshold's gain for this feature
                pre_H = max(pre_H, condi_H)
            condi_H_list.append(pre_H)
        self.IG = condi_H_list

    def getIG(self):
        return self.IG

if __name__ == "__main__":
    X = [[1, 0, 0, 1],
         [0, 1, 1, 1],
         [0, 0, 1, 0]]
    y = [0, 0, 1]
    print(IG(X, y).getIG())
The output is:
[0.17441604792151594, 0.17441604792151594, 0.17441604792151594, 0.6365141682948128]
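The per-threshold computation can be checked independently. The sketch below (hypothetical helper names, natural-log entropy as in the code above) recomputes the gain of the fourth feature column, [1, 1, 0], at its only surviving threshold, 0.5:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in nats of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def info_gain(values, labels, thre):
    """Information gain of splitting `values` at threshold `thre`."""
    lower = [lab for v, lab in zip(values, labels) if v < thre]
    upper = [lab for v, lab in zip(values, labels) if v >= thre]
    n = len(labels)
    cond = (len(lower) / n) * entropy(lower) + (len(upper) / n) * entropy(upper)
    return entropy(labels) - cond

# Splitting [1, 1, 0] at 0.5 separates the classes perfectly, so the gain
# equals the full class entropy, ~0.6365 nats (the last entry above).
print(info_gain([1, 1, 0], [0, 0, 1], 0.5))
```

Since the split leaves both partitions pure, the conditional entropy is zero and the gain reaches its maximum possible value, which matches the last entry of the output.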