PyTorch: implementing L2 and L1 regularization
1. Implementing L2 regularization with the torch.optim optimizers
torch.optim bundles many optimizers, such as SGD, Adadelta, Adam, Adagrad, RMSprop, and so on. Each of them takes a weight_decay parameter that sets the weight-decay rate, which plays the role of the λ coefficient in L2 regularization. Note that the optimizers in torch.optim only provide this L2-style regularization; the docstring for the weight_decay parameter reads:
weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
With a torch.optim optimizer, L2 regularization can therefore be enabled like this:
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=0.01)
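For reference, setting weight_decay = λ makes the optimizer add λ·w to every parameter's gradient before the update step, which corresponds to minimizing the penalized objective

$\mathcal{L}_{\mathrm{total}}(w) = \mathcal{L}_{\mathrm{data}}(w) + \frac{\lambda}{2}\sum_{i}\lVert w_i\rVert_2^2$

Note, however, that this penalty lives inside the optimizer and never appears in the value returned by the loss function, which is relevant to point (3) below.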
However, this approach has a few issues:
(1) Normally, regularization only penalizes the weight parameters W, while the bias parameters b are left alone. The weight_decay option of a torch.optim optimizer, however, applies the decay to every parameter handed to the optimizer, weights and biases alike. Penalizing b can sometimes cause serious underfitting, so in many cases you only want to regularize the weights. (weight_decay is applied uniformly to all parameters in a parameter group; to exempt the biases you have to place them in a separate parameter group with weight_decay=0, as shown in the sketch after this list.)
(2) Drawback: the torch.optim optimizers hard-code L2 regularization and cannot perform L1 regularization. If you need L1 regularization, you can use the custom implementation described in Section 3 below.
(3) According to the regularization formula, adding the penalty should make the loss larger: for example, if the loss is 10 at weight_decay=1, then at weight_decay=100 the reported loss should grow by roughly a factor of 100. With a torch.optim optimizer, however, if you keep computing the loss with loss_fun=nn.CrossEntropyLoss(), you will find that no matter how you change weight_decay, the printed loss stays about the same as without regularization. This is because your loss_fun never adds the penalty on the weights W; the decay is applied only inside the optimizer's update step.
(4) Implementing regularization through the torch.optim optimizers is perfectly valid! It is just easy to misinterpret. Personally, I prefer TensorFlow's way of doing regularization: all you need is tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), and the implementation maps almost one-to-one onto the regularization formula.
(5) Github project source code: click here
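Regarding issue (1): if you want to keep using the built-in weight_decay but exempt the biases, the usual way is to pass two parameter groups to the optimizer. A minimal sketch (selecting parameters by whether 'weight' appears in their name is a simplification, not code from the project):

import torch.optim as optim

# Apply weight decay only to the weight tensors, not to biases (or other parameters)
decay_params = [param for name, param in model.named_parameters() if 'weight' in name]
no_decay_params = [param for name, param in model.named_parameters() if 'weight' not in name]

optimizer = optim.Adam(
    [
        {'params': decay_params, 'weight_decay': 0.01},   # weights: L2 penalty
        {'params': no_decay_params, 'weight_decay': 0.0},  # biases: no penalty
    ],
    lr=learning_rate,
)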
To address these issues, I wrote a custom regularization method, similar in spirit to the TensorFlow approach.
2. How to tell whether regularization is acting on the model
Generally speaking, the main purpose of regularization is to prevent the model from overfitting. Admittedly, overfitting itself can sometimes be hard to diagnose, but checking whether regularization is acting on the model is easy. Below are two sets of loss and Accuracy logs produced during training, one without regularization and one with it:
2.1 Loss and Accuracy without regularization
The optimizer is Adam with weight_decay=0.0, i.e. no regularization:
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=0.0)
The loss and Accuracy printed during training:
step/epoch:0/0,TrainLoss:2.418065,Acc:[0.15625]
step/epoch:10/0,TrainLoss:5.194936,Acc:[0.34375]
step/epoch:20/0,TrainLoss:0.973226,Acc:[0.8125]
step/epoch:30/0,TrainLoss:1.215165,Acc:[0.65625]
step/epoch:40/0,TrainLoss:1.808068,Acc:[0.65625]
step/epoch:50/0,TrainLoss:1.661446,Acc:[0.625]
step/epoch:60/0,TrainLoss:1.552345,Acc:[0.6875]
step/epoch:70/0,TrainLoss:1.052912,Acc:[0.71875]
step/epoch:80/0,TrainLoss:0.910738,Acc:[0.75]
step/epoch:90/0,TrainLoss:1.142454,Acc:[0.6875]
step/epoch:100/0,TrainLoss:0.546968,Acc:[0.84375]
step/epoch:110/0,TrainLoss:0.415631,Acc:[0.9375]
step/epoch:120/0,TrainLoss:0.533164,Acc:[0.78125]
step/epoch:130/0,TrainLoss:0.956079,Acc:[0.6875]
step/epoch:140/0,TrainLoss:0.711397,Acc:[0.8125]
2.2 Loss and Accuracy with regularization
The optimizer is Adam with weight_decay=10.0, i.e. a regularization weight of lambda=10.0:
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=10.0)
The loss and Accuracy printed during training:
step/epoch:0/0,TrainLoss:2.467985,Acc:[0.09375]
step/epoch:10/0,TrainLoss:5.435320,Acc:[0.40625]
step/epoch:20/0,TrainLoss:1.395482,Acc:[0.625]
step/epoch:30/0,TrainLoss:1.128281,Acc:[0.6875]
step/epoch:40/0,TrainLoss:1.135289,Acc:[0.6875]
step/epoch:50/0,TrainLoss:1.455040,Acc:[0.5625]
step/epoch:60/0,TrainLoss:1.023273,Acc:[0.65625]
step/epoch:70/0,TrainLoss:0.855008,Acc:[0.65625]
step/epoch:80/0,TrainLoss:1.006449,Acc:[0.71875]
step/epoch:90/0,TrainLoss:0.939148,Acc:[0.625]
step/epoch:100/0,TrainLoss:0.851593,Acc:[0.6875]
step/epoch:110/0,TrainLoss:1.093970,Acc:[0.59375]
step/epoch:120/0,TrainLoss:1.699520,Acc:[0.625]
step/epoch:130/0,TrainLoss:0.861444,Acc:[0.75]
step/epoch:140/0,TrainLoss:0.927656,Acc:[0.625]
When weight_decay=10000.0:
step/epoch:0/0,TrainLoss:2.337354,Acc:[0.15625]
step/epoch:10/0,TrainLoss:2.222203,Acc:[0.125]
step/epoch:20/0,TrainLoss:2.184257,Acc:[0.3125]
step/epoch:30/0,TrainLoss:2.116977,Acc:[0.5]
step/epoch:40/0,TrainLoss:2.168895,Acc:[0.375]
step/epoch:50/0,TrainLoss:2.221143,Acc:[0.1875]
step/epoch:60/0,TrainLoss:2.189801,Acc:[0.25]
step/epoch:70/0,TrainLoss:2.209837,Acc:[0.125]
step/epoch:80/0,TrainLoss:2.202038,Acc:[0.34375]
step/epoch:90/0,TrainLoss:2.192546,Acc:[0.25]
step/epoch:100/0,TrainLoss:2.215488,Acc:[0.25]
step/epoch:110/0,TrainLoss:2.169323,Acc:[0.15625]
step/epoch:120/0,TrainLoss:2.166457,Acc:[0.3125]
step/epoch:130/0,TrainLoss:2.144773,Acc:[0.40625]
step/epoch:140/0,TrainLoss:2.173397,Acc:[0.28125]
2.3 Notes on the effect of regularization
Overall, comparing the training logs of the regularized and unregularized models, we can see that with regularization the loss decreases more slowly and the Accuracy climbs more slowly. The loss and Accuracy of the unregularized model also fluctuate much more (higher variance), whereas the regularized model's training loss and Accuracy evolve more smoothly.
Moreover, the larger the regularization weight lambda, the smoother they become. This is exactly the penalizing effect of regularization on the model: it keeps the model's behavior smoother, which is how regularization helps to counter overfitting.
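Besides watching the loss and Accuracy curves, a more direct check (a minimal sketch, not part of the original project code) is to monitor the total norm of the weights during training; with an effective L2 penalty it should shrink, or at least grow much more slowly, compared with the unregularized run:

def total_weight_norm(model, p=2):
    # Sum the p-norms of all weight tensors (bias parameters excluded)
    return sum(param.norm(p).item()
               for name, param in model.named_parameters()
               if 'weight' in name)

# e.g. print it every few steps alongside the loss
print("total weight norm: {:.4f}".format(total_weight_norm(model)))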
3. A custom regularization method
To work around the limitations of the torch.optim optimizers, which only implement L2 regularization and penalize every parameter in the network, here is an implementation similar to TensorFlow's regularization method.
3.1 A custom Regularization class
The whole thing is wrapped in a Regularization class. Every method is commented, so read through it at your own pace and leave a comment if anything is unclear:
import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = 'cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))


class Regularization(torch.nn.Module):
    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model to regularize
        :param weight_decay: regularization coefficient
        :param p: order of the norm; p=2 (default) gives L2 regularization,
                  p=1 gives L1 regularization
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            print("param weight_decay can not <= 0")
            exit(0)
        self.model = model
        self.weight_decay = weight_decay
        self.p = p
        self.weight_list = self.get_weight(model)
        self.weight_info(self.weight_list)

    def to(self, device):
        '''
        Select the device to run on
        :param device: cuda or cpu
        :return:
        '''
        self.device = device
        super().to(device)
        return self

    def forward(self, model):
        self.weight_list = self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss

    def get_weight(self, model):
        '''
        Collect the model's weight parameters (biases are excluded)
        :param model:
        :return:
        '''
        weight_list = []
        for name, param in model.named_parameters():
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list

    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Compute the norm-based penalty over the weight tensors
        :param weight_list:
        :param p: order of the norm, 2 by default
        :param weight_decay:
        :return:
        '''
        reg_loss = 0
        for name, w in weight_list:
            l2_reg = torch.norm(w, p=p)
            reg_loss = reg_loss + l2_reg
        reg_loss = weight_decay * reg_loss
        return reg_loss

    def weight_info(self, weight_list):
        '''
        Print the list of regularized weights
        :param weight_list:
        :return:
        '''
        print("---------------regularization weight---------------")
        for name, w in weight_list:
            print(name)
        print("---------------------------------------------------")
3.2 How to use the Regularization class
Usage is simple: just treat it as an ordinary PyTorch module, for example:
import torch
import torch.nn as nn
import torch.optim as optim

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

weight_decay = 100.0  # regularization coefficient

model = my_net().to(device)
# initialize the regularization term
if weight_decay > 0:
    reg_loss = Regularization(model, weight_decay, p=2).to(device)
else:
    print("no regularization")

criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = softmax + cross entropy
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no need to set weight_decay here

# train
batch_train_data = ...
batch_train_label = ...

out = model(batch_train_data)

# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss.item()  # scalar value, for logging only

# backprop
optimizer.zero_grad()  # clear all accumulated gradients
loss.backward()        # backward() must be called on the loss tensor, not on the Python float total_loss
optimizer.step()
The loss and Accuracy printed during training:
(1) With weight_decay=0.0, i.e. without regularization
step/epoch:0/0,TrainLoss:2.379627,Acc:[0.09375]
step/epoch:10/0,TrainLoss:1.473092,Acc:[0.6875]
step/epoch:20/0,TrainLoss:0.931847,Acc:[0.8125]
step/epoch:30/0,TrainLoss:0.625494,Acc:[0.875]
step/epoch:40/0,TrainLoss:2.241885,Acc:[0.53125]
step/epoch:50/0,TrainLoss:1.132131,Acc:[0.6875]
step/epoch:60/0,TrainLoss:0.493038,Acc:[0.8125]
step/epoch:70/0,TrainLoss:0.819410,Acc:[0.78125]
step/epoch:80/0,TrainLoss:0.996497,Acc:[0.71875]
step/epoch:90/0,TrainLoss:0.474205,Acc:[0.8125]
step/epoch:100/0,TrainLoss:0.744587,Acc:[0.8125]
step/epoch:110/0,TrainLoss:0.502217,Acc:[0.78125]
step/epoch:120/0,TrainLoss:0.531865,Acc:[0.8125]
step/epoch:130/0,TrainLoss:1.016807,Acc:[0.875]
step/epoch:140/0,TrainLoss:0.411701,Acc:[0.84375]
(2) With weight_decay=10.0, i.e. with regularization
---------------------------------------------------
step/epoch:0/0,TrainLoss:1563.402832,Acc:[0.09375]
step/epoch:10/0,TrainLoss:1530.002686,Acc:[0.53125]
step/epoch:20/0,TrainLoss:1495.115234,Acc:[0.71875]
step/epoch:30/0,TrainLoss:1461.114136,Acc:[0.78125]
step/epoch:40/0,TrainLoss:1427.868164,Acc:[0.6875]
step/epoch:50/0,TrainLoss:1395.430054,Acc:[0.6875]
step/epoch:60/0,TrainLoss:1363.358154,Acc:[0.5625]
step/epoch:70/0,TrainLoss:1331.439697,Acc:[0.75]
step/epoch:80/0,TrainLoss:1301.334106,Acc:[0.625]
step/epoch:90/0,TrainLoss:1271.505005,Acc:[0.6875]
step/epoch:100/0,TrainLoss:1242.488647,Acc:[0.75]
step/epoch:110/0,TrainLoss:1214.184204,Acc:[0.59375]
step/epoch:120/0,TrainLoss:1186.174561,Acc:[0.71875]
step/epoch:130/0,TrainLoss:1159.148438,Acc:[0.78125]
step/epoch:140/0,TrainLoss:1133.020020,Acc:[0.65625]
(3) With weight_decay=10000.0, i.e. with regularization
step/epoch:0/0,TrainLoss:1570211.500000,Acc:[0.09375]
step/epoch:10/0,TrainLoss:1522952.125000,Acc:[0.3125]
step/epoch:20/0,TrainLoss:1486256.125000,Acc:[0.125]
step/epoch:30/0,TrainLoss:1451671.500000,Acc:[0.25]
step/epoch:40/0,TrainLoss:1418959.750000,Acc:[0.15625]
step/epoch:50/0,TrainLoss:1387154.000000,Acc:[0.125]
step/epoch:60/0,TrainLoss:1355917.500000,Acc:[0.125]
step/epoch:70/0,TrainLoss:1325379.500000,Acc:[0.125]
step/epoch:80/0,TrainLoss:1295454.125000,Acc:[0.3125]
step/epoch:90/0,TrainLoss:1266115.375000,Acc:[0.15625]
step/epoch:100/0,TrainLoss:1237341.000000,Acc:[0.0625]
step/epoch:110/0,TrainLoss:1209186.500000,Acc:[0.125]
step/epoch:120/0,TrainLoss:1181584.250000,Acc:[0.125]
step/epoch:130/0,TrainLoss:1154600.125000,Acc:[0.1875]
step/epoch:140/0,TrainLoss:1128239.875000,Acc:[0.125]
Compared with the L2 regularization implemented by the torch.optim optimizers, this Regularization class achieves the same regularizing effect, and, as in TensorFlow, the reported loss now includes the regularization loss as well.
You can also change the parameter p: p=2 (the default) gives L2 regularization, and p=1 gives L1 regularization.
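For example, switching the Regularization class defined above from L2 to L1 only requires changing p when constructing it:

reg_loss = Regularization(model, weight_decay, p=1).to(device)  # L1 regularization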
4. Github project source code download
Github project source code: click here
The above is my personal experience. I hope it gives everyone a useful reference, and I hope you will keep supporting 毛票票. If anything here is wrong or incomplete, corrections are very welcome.