
Deep learning beginner to expert: 10 techniques from Fast.ai




Right now, Jeremy Howard, the co-founder of fast.ai, holds the 105th highest score in a Kaggle competition, but he's dropping fast. Why? His own students are beating him. And their names can now be found across the tops of leaderboards all over Kaggle.


When I left you I was but the learner, but now I am the master.

So what are these secrets that are allowing novices to implement world-class algorithms in mere weeks, leaving behind experienced deep learning practitioners in their GPU-powered wake? Allow me to tell you in ten simple steps.

Read on if you're already practicing deep learning and want to quickly get an overview of the powerful techniques that fast.ai uses in their courses. Read on if you've already completed fast.ai and want to recap some of what you were supposed to have already learned. Read on if you're flirting with the idea of studying deep learning, and would like to see how the field is evolving and what fast.ai can offer beginners.

Now, before we begin, you should know that you'll need access to GPUs to run fast.ai content effectively. For my fast.ai projects, I've been using FloydHub. After much experimentation and research with other cloud-based solutions, I've found FloydHub is the best and easiest way to train deep learning models on cloud GPUs. I love being able to easily keep track of my experiments in Projects on FloydHub, making it especially easy to visualize and filter which models are performing best. They also have the simplest solution for managing (and automatically versioning) your datasets, which you'll learn is going to be super valuable down the road in any DL project.


Okay, let's get started.


1. Use the Fast.ai library

from fastai import *


The fast.ai library is not only a toolkit to get newbies quickly implementing deep learning, but a powerful and convenient source of current best practices. Each time the fast.ai team (and their network of AI researchers & collaborators) finds a particularly interesting paper, they test it out on a variety of datasets and work out how to tune it. If they are successful, it gets implemented in the library, and the technology can be quickly accessed by its users.


The result is a powerful toolbox, including quick access to current best practices such as stochastic gradient descent with restarts, differential learning rates, and test-time augmentation (not to mention many more).

Each of these techniques will be described below, and we will show how you can rapidly implement them using the fast.ai library. The library is built upon PyTorch, and you can use them together quite fluidly. To get going with the library on FloydHub, check out their 2-min installation.


2. Don’t use one learning rate, use many

Differential learning rates mean that the higher layers change more than the deeper layers during training. Building deep learning models on top of pre-existing architectures is a proven method to generate much better results in computer vision tasks.


Most of these architectures (e.g. ResNet, VGG, Inception…) are trained on ImageNet, and depending on the similarity of your data to the images in ImageNet, these weights will need to be altered more or less greatly. When it comes to modifying these weights, the last layers of the model will often need the most changing, while the deeper levels that are already well trained at detecting basic features (such as edges and outlines) will need less.

So firstly, to get a pre-trained model with the fast.ai library, use the following code:


from fastai.conv_learner import *
# import the library for creating a learner object for convolutional networks

model = vgg16
# assign the architecture: resnet34, vgg16, or even your own custom model

PATH = './folder_containing_images'
data = ImageClassifierData.from_paths(PATH)
# create a fast.ai data object; with from_paths, each image class is
# separated into a different folder inside PATH

learn = ConvLearner.pretrained(model, data, precompute=True)
# create a learn object to quickly utilise state-of-the-art
# techniques from the fast.ai library


With the learn object now created, we can fine-tune only the last layers at first by quickly freezing all the others:


learn.freeze()
# freeze all layers up to the last one, so their weights will not be updated

learning_rate = 0.1
learn.fit(learning_rate, 3)
# train only the last layer for a few epochs

Once the last layers are producing good results, we implement differential learning rates to alter the lower layers as well. The lower layers need to be altered less, so it is good practice to set each learning rate to be 10 times lower than the one above it:


learn.unfreeze()
# set requires_grad to True for all layers, so they can be updated

learning_rate = [0.001, 0.01, 0.1]
# the deepest third of the layers train with a rate of 0.001,
# the middle layers with 0.01, and the final layers with 0.1

learn.fit(learning_rate, 3)
# train the model for three epochs using differential learning rates

3. How to find the right learning rate

The learning rate is the most important hyper-parameter for training neural networks, yet until recently deciding its value has been incredibly hacky. Leslie Smith may have stumbled upon the answer in his paper on cyclical learning rates, a relatively unknown discovery until it was promoted by the fast.ai course.


In this method, we do a trial run and train the neural network using a low learning rate, but increase it exponentially with each batch. This can be done with the following code:


learn.lr_find()
# run on the learn object; the learning rate is increased exponentially

learn.sched.plot_lr()
# plot the learning rate against the number of iterations


The learning rate is increased exponentially with every iteration

Meanwhile, the loss is recorded for every value of the learning rate. We then plot loss against learning rate:


learn.sched.plot()
# plots the loss against the learning rate 
Find where the loss is still decreasing but has not plateaued.

The optimum learning rate is determined by finding the value where the learning rate is highest and the loss is still descending; in the case above, this value would be about 0.01.
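If you are curious what lr_find is doing under the hood, the learning rate range test is easy to sketch in plain PyTorch. The following is a minimal, illustrative version only (not the fast.ai implementation); the model, loader and criterion are placeholders you would supply, and the smoothing constant is just a reasonable default:

import torch

def lr_range_test(model, loader, criterion, lr_start=1e-7, lr_end=10, beta=0.98):
    # increase the learning rate exponentially each batch and record the loss
    mult = (lr_end / lr_start) ** (1 / max(1, len(loader) - 1))
    lr = lr_start
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    avg_loss, best_loss = 0.0, float('inf')
    lrs, losses = [], []
    for i, (xb, yb) in enumerate(loader, start=1):
        optimizer.param_groups[0]['lr'] = lr
        loss = criterion(model(xb), yb)
        # exponentially smooth the loss so the resulting plot is readable
        avg_loss = beta * avg_loss + (1 - beta) * loss.item()
        smoothed = avg_loss / (1 - beta ** i)
        if smoothed > 4 * best_loss:
            break                      # stop once the loss explodes
        best_loss = min(best_loss, smoothed)
        lrs.append(lr)
        losses.append(smoothed)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        lr *= mult                     # exponential increase per batch
    return lrs, losses

Plotting losses against lrs on a log scale reproduces the curve above; pick a learning rate a little below the point where the loss stops falling.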


4. Cosine annealing

With each batch of stochastic gradient descent (SGD), your network should be getting closer and closer to a global minimum value for the loss. As it gets closer to this minimum, it therefore makes sense for the learning rate to get smaller, so that your algorithm does not overshoot and instead settles as close to this point as possible. Cosine annealing solves this problem by decreasing the learning rate following the cosine function, as seen in the figure below.


As we increase x, we see cosine(x) decrease following this wavy shape.

Looking at the figure above, we see that as we increase x the cosine value descends slowly at first, then more quickly and then slightly slower again. This mode of decreasing works well with the learning rate, yielding great results in a computationally efficient manner.
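The schedule itself is just a scaled cosine. As a rough sketch (the exact constants and cycle handling inside fast.ai may differ), the learning rate at iteration t of a cycle of length T can be computed as follows:

import math

def cosine_annealed_lr(t, T, lr_max, lr_min=0.0):
    # start at lr_max and decay smoothly to lr_min over T iterations
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))

# example: an epoch of 200 iterations with a maximum learning rate of 0.1
schedule = [cosine_annealed_lr(t, 200, lr_max=0.1) for t in range(200)]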


learn.fit(0.1, 1)
# calling learn.fit automatically takes advantage of cosine annealing

The technique is implemented automatically by the fast.ai library when using learn.fit(). The above code would have our learning rate decrease across the epoch as shown in the figure below.

Learning rate decreases across an epoch that takes 200 iterations

However, we can go one step further still and introduce restarts.


5. Stochastic Gradient Descent with restarts

During training, it is possible for gradient descent to get stuck in a local minimum rather than the global minimum.


Gradient descent can get stuck at local minima

By increasing the learning rate suddenly, gradient descent may “hop” out of the local minimum and find its way toward the global minimum. Doing this is called stochastic gradient descent with restarts (SGDR), an idea shown to be highly effective in Loshchilov & Hutter's paper on warm restarts.


SGDR is also handled for you automatically by the fast.ai library. When calling learn.fit(learning_rate, epochs), the learning rate is reset at the start of each epoch to the original value you entered as a parameter, then decreases again over the epoch as described above in cosine annealing.

The learning rate is restored to its original value after each epoch.

Each time the learning rate drops to its minimum point (every 100 iterations in the figure above), we call this a cycle.


cycle_len = 1
# decide how many epochs it takes for the learning rate to fall to
# its minimum point. In this case, 1 epoch

cycle_mult = 2
# at the end of each cycle, multiply the cycle_len value by 2

learn.fit(0.1, 3, cycle_len=cycle_len, cycle_mult=cycle_mult)
# in this case there will be three restarts. The first has a
# cycle_len of 1, so it will take 1 epoch to complete the cycle.
# cycle_mult=2, so the next cycle will have a length of two epochs,
# and the next four.
Each cycle taking twice as many epochs to complete as the prior cycle
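To make the effect of cycle_len and cycle_mult concrete, here is an illustrative sketch (not the fast.ai internals) of how a schedule like the one pictured above could be generated:

import math

def sgdr_schedule(lr_max, n_cycles, iters_per_epoch, cycle_len=1, cycle_mult=2):
    # cosine annealing restarted at the start of each cycle; every new
    # cycle lasts cycle_mult times as many epochs as the one before it
    lrs = []
    epochs_in_cycle = cycle_len
    for _ in range(n_cycles):
        T = epochs_in_cycle * iters_per_epoch          # iterations in this cycle
        lrs += [0.5 * lr_max * (1 + math.cos(math.pi * t / T)) for t in range(T)]
        epochs_in_cycle *= cycle_mult                  # the next cycle is longer
    return lrs

# three cycles of 1, 2 and 4 epochs, matching the learn.fit call above
schedule = sgdr_schedule(lr_max=0.1, n_cycles=3, iters_per_epoch=100)

Each restart jumps the learning rate straight back to lr_max, which is what gives gradient descent its chance to hop out of a local minimum.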

Playing around with these parameters, along with using differential learning rates, is what allows fast.ai users to perform so well on image classification problems.

Cycle_mult and cycle_len are discussed on the fast.ai forum, while the learning rate concepts above are explained more fully in the fast.ai lectures.


6. Anthropomorphise your activation functions

Softmax likes to pick just one thing. Sigmoid wants to know where you are between 0 and 1, and beyond these values won’t care how much you increase. Relu is a club bouncer who won’t let negative numbers through the door.

It may seem silly to treat activation functions in such a manner, but giving them a character ensures you don't use them for the wrong task. As Jeremy Howard points out, even academic papers often use softmax for multi-label classification, and I too have already seen it used incorrectly in blogs and papers during my short time studying DL.
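A quick PyTorch illustration of the distinction (my own example, not fast.ai code): softmax makes the class scores for one image compete and sum to 1, which is what you want when exactly one label is correct, while per-class sigmoids score each label independently, which is what you want for multi-label problems.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores for one image, three classes

# single-label, multi-class: the probabilities compete and sum to 1
print(F.softmax(logits, dim=1))            # roughly [[0.66, 0.24, 0.10]]

# multi-label: each class gets its own independent probability
print(torch.sigmoid(logits))               # roughly [[0.88, 0.73, 0.52]]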

7. Transfer learning is hugely effective in NLP

Just as using pre-trained models has proven immensely effective in computer vision, it is becoming increasingly clear that natural language processing (NLP) models can benefit from doing the same.


In one of the fast.ai lessons, Jeremy Howard builds a model to determine whether IMDB reviews are positive or negative using transfer learning. The power of this technique is observed instantly: the accuracy he achieves beats all previously published efforts of the time.


Pre-existing architectures deliver state of the art NLP results.

The secret to success lies in first training a model to gain some understanding of the language, before using this pre-trained model as part of a model that analyzes sentiment.

To create the first model, a recurrent neural network (RNN) is trained to predict the next word in a sequence of text. This is known as language modeling. Once the network is trained to a high degree of accuracy, its encodings for each word are passed on to a new model that is used for sentiment analysis.
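Schematically, the two stages might look something like this heavily simplified PyTorch sketch; the module and variable names are made up for illustration and this is not the fast.ai implementation, which is considerably more sophisticated:

import torch.nn as nn

class LanguageModel(nn.Module):
    # stage 1: predict the next word, learning a general-purpose encoder for the language
    def __init__(self, vocab_size, emb_size=300, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.encoder = nn.LSTM(emb_size, hidden_size, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size)   # next-word prediction head

    def forward(self, tokens):
        out, _ = self.encoder(self.embed(tokens))
        return self.decoder(out)

class SentimentClassifier(nn.Module):
    # stage 2: reuse the pre-trained embeddings and encoder, add a small classification head
    def __init__(self, lm, n_classes=2):
        super().__init__()
        self.embed, self.encoder = lm.embed, lm.encoder      # transferred weights
        self.head = nn.Linear(self.encoder.hidden_size, n_classes)

    def forward(self, tokens):
        out, _ = self.encoder(self.embed(tokens))
        return self.head(out[:, -1])                         # classify from the final state

The point is that embed and encoder arrive in the classifier already knowing a great deal about the language, so only the small head has to be learned from the sentiment labels.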

In the example we see this language model being integrated with a model to perform sentiment analysis, but this same method could be used for any NLP task from translation to data extraction.

And again the same principles as above in computer vision apply here, where freezing and using differential learning rates can yield better results.


The implementation of this method for NLP is too detailed for me to share the code in this post, but if you are interested, watch the lesson and access the accompanying code.


8. Deep learning can challenge ML in tackling structured data

Fast.ai shows techniques to rapidly generate great results on structured data without having to resort to feature engineering or apply domain specific knowledge.


Their library makes the most of PyTorch’s embedding functions, allowing rapid conversion of categorical variables into embedding matrices.


The technique they show is relatively straightforward, and simply involves turning the categorical variables into numbers and then assigning each value an embedding vector:


Each day of the week is given an embedding with four values.

The advantage of doing this compared to the traditional approach of creating dummy variables (i.e. one-hot encoding) is that each day can be represented by four numbers instead of one, so we gain higher dimensionality and much richer relationships.
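In PyTorch terms this is nothing more exotic than an nn.Embedding per categorical column. A minimal sketch of the day-of-week example (illustrative only, not the fast.ai internals):

import torch
import torch.nn as nn

# 7 categories (Monday..Sunday), each mapped to a learned 4-dimensional vector
day_embedding = nn.Embedding(num_embeddings=7, embedding_dim=4)

days = torch.tensor([0, 2, 6])        # e.g. Monday, Wednesday, Sunday as integer codes
vectors = day_embedding(days)         # shape (3, 4); trained jointly with the rest of the network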

The approach shown in the lesson gained third place in the Rossmann Store Sales Kaggle competition, only beaten by domain experts who had their own code to create many, many extra features.


The idea that using deep learning dramatically reduces the need for feature engineering has been confirmed by Pinterest too, who have said this has been the case ever since they switched to deep learning models, gaining state-of-the-art results with a lot less work!

9. A game-winning bundle: building up sizes, dropout and TTA

On April 30th, the fast.ai team won the DAWNBench competition (run by Stanford University) on ImageNet and CIFAR10 classification. In Jeremy's write-up of the win, he credits their success to the little extra touches available in the fast.ai library.


One of these is the concept of Dropout, first proposed by Geoffrey Hinton and his collaborators back in 2012. Despite its initial popularity, it seems to be somewhat ignored in recent computer vision papers. However, PyTorch has made its implementation incredibly easy, and with fast.ai on top it's easier than ever.

Blank spaces represent activations knocked out by the dropout function.

Dropout combats overfitting and so would have proved crucial in winning on a relatively small dataset such as CIFAR10. Dropout is implemented automatically by fast.ai when creating a learn object, though it can be altered using the ps parameter as shown here:


learn = ConvLearner.pretrained(model, data, ps=0.5, precompute=True)
# apply dropout of 0.5 (i.e. drop half the activations) during training.
# Dropout is automatically turned off for the validation set

For more information on dropout, see the fast.ai lesson video (from 4:57).
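The "automatically turned off for the validation set" behaviour is simply PyTorch's standard train/eval switching, which a tiny standalone example makes clear (my own illustration, not fast.ai code):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()           # training mode: roughly half the activations are zeroed
print(drop(x))         # surviving values are scaled by 1/(1-p) to keep the expectation

drop.eval()            # evaluation mode: dropout is a no-op
print(drop(x))         # returns the input unchanged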

Another incredibly simple and effective method they used for tackling overfitting and improving accuracy is training on smaller image sizes, then increasing the size and training the same model on the larger images again.

# create a data object with images of sz * sz pixels
def get_data(sz):
    tfms = tfms_from_model(model, sz)
    # tells what size the images should be; additional transformations such
    # as image flips and zooms can easily be added here too

    data = ImageClassifierData.from_paths(PATH, tfms=tfms)
    # creates a fastai data object of the correct size

    return data

learn.set_data(get_data(299))
# changes the data in the learn object to be images of size 299
# without changing the model

learn.fit(0.1, 3)
# train for a few epochs on larger versions of the images, avoiding overfitting

A final technique that can raise accuracy by one or two percent is test time augmentation (TTA). This involves taking a series of different versions of the original image (for example cropping different areas, or changing the zoom) and passing them through the model. The average output is then calculated for the different versions and this is given as the final output score for the image. It can be called by running learn.TTA().


preds, target = learn.TTA()

This technique is effective because the original cropping of an image may miss out a vital feature. Providing the model with multiple versions of the picture and taking an average makes this less likely to have an effect.
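Conceptually, TTA is nothing more than an average over augmented copies of each image. An illustrative sketch (not the fast.ai implementation; augmentations is a placeholder list of transforms you would define yourself):

import torch

def tta_predict(model, image, augmentations):
    # average the model's predictions over several augmented versions of one image
    model.eval()
    with torch.no_grad():
        preds = [model(aug(image).unsqueeze(0)) for aug in augmentations]
    return torch.stack(preds).mean(dim=0)    # the final score is the mean prediction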

10. Creativity is key

Not only did the fast.ai team win prizes for fastest training speed in the DAWNBench competition, but these same algorithms also won the prize for being cheapest to run. The lesson to be learnt here is that creating successful DL applications is not just a case of throwing huge amounts of GPU power at the problem, but should instead be a question of creativity, of intuition and innovation.

Most of the breakthroughs discussed in this article (dropout, cosine annealing, SGD with restarts, the list goes on…) were in fact exactly such moments, where someone thought of approaching the problem differently. These approaches then brought increases in accuracy greater than those that would have been achieved by simply throwing another thousand images at the problem on a handful of IBM computers.


So just because there are a lot of big dogs out there with a lot of big GPUs in Silicon Valley, don’t think that you can’t challenge them, or that you can’t create something special or innovative.


In fact, perhaps sometimes you can see constraints as a blessing; after all, necessity is the mother of invention.

About Samuel Lynn-Evans

For the last 10 years, Sam has combined his passion for science and languages by teaching life sciences in foreign countries. Seeing the huge potential for ML in scientific progress, he began studying AI at school 42 in Paris, with the aim of applying NLP to biological and medical problems.

You can follow along with Sam's work online.



Original article: https://blog.floydhub.com/ten-techniques-from-fast-ai/

