In earlier generations of machine learning algorithms, it was common practice to sample hyperparameters in a grid. But if you sample in a grid, you've really only tried out 5 values of each hyperparameter. In deep learning, what we tend to do, and what I recommend you do instead, is sample at random: with the same 25 trials you will have tried out 25 distinct values of the learning rate, and you are therefore more likely to find a value that works really well.
Another common practice is to use a coarse-to-fine sampling scheme.
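The grid-versus-random contrast can be sketched in a few lines; the 5x5 trial budget and the learning-rate range below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid search: a 5x5 grid costs 25 trials but tries only
# 5 distinct values of each hyperparameter (e.g. the learning rate).
grid_lr = np.repeat(np.linspace(0.0001, 1.0, 5), 5)

# Random search: the same 25 trials give 25 distinct learning rates.
random_lr = rng.uniform(0.0001, 1.0, size=25)

print(len(np.unique(grid_lr)))    # 5 distinct values
print(len(np.unique(random_lr)))  # 25 distinct values
```

The point is that the grid wastes most of its budget re-testing the same value of the hyperparameter that matters, while random sampling explores 25 different values of it for the same cost.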
So let's say in this two-dimensional example that you sample these points,
and maybe you find that this point works best, and maybe a few other points around it also tend to work really well.
Then in the coarse-to-fine
scheme, what you might do is zoom in to a smaller region of the hyperparameters and then sample more densely within this space.
Or maybe again at random,
but then focus more resources on searching within this blue square if you suspect that the best setting of the hyperparameters may be in this region.
So after doing a coarse sample of the entire square, that tells you to then focus on a smaller square.
You can then sample more densely within the smaller square.
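This coarse-to-fine process can be sketched as follows; the `evaluate` function is a hypothetical stand-in for training a model and scoring it on a dev set, and the sample counts and zoom factors are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def evaluate(lr, beta):
    # Hypothetical stand-in for a training run: returns a score
    # peaked near lr = 0.01, beta = 0.9 (purely illustrative).
    return -(np.log10(lr) + 2) ** 2 - (beta - 0.9) ** 2

# Coarse pass: sample broadly over the whole square.
coarse = [(10 ** rng.uniform(-4, 0), rng.uniform(0.5, 0.999))
          for _ in range(25)]
best_lr, best_beta = max(coarse, key=lambda p: evaluate(*p))

# Fine pass: zoom in to a smaller square around the coarse winner
# and sample more densely there.
fine = [(best_lr * 10 ** rng.uniform(-0.5, 0.5),
         float(np.clip(best_beta + rng.uniform(-0.05, 0.05), 0.5, 0.999)))
        for _ in range(25)]
best = max(coarse + fine, key=lambda p: evaluate(*p))
```

Because the fine pass concentrates the same number of samples in a much smaller region, it usually improves on the coarse winner.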
The two key takeaways:
1. Use random sampling.
2. Optionally, consider implementing a coarse-to-fine search process.
There's even more to hyperparameter search than these two keys. Sampling uniformly at random over the range you're contemplating might seem like a reasonable thing to do, but for a hyperparameter like the learning rate it is better to sample on a log scale. Now you have more resources dedicated to searching between 0.0001 and 0.001, between 0.001 and 0.01, and so on.
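A minimal sketch of log-scale sampling over that 0.0001-to-1 range: draw the exponent uniformly, then exponentiate, so each decade gets an equal share of the samples.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample the learning rate uniformly on a log scale in [1e-4, 1):
# draw the exponent r uniformly, then set alpha = 10^r.
r = rng.uniform(-4, 0, size=10_000)
alpha = 10.0 ** r

# Roughly a quarter of the samples land in each decade,
# e.g. [1e-4, 1e-3), [1e-3, 1e-2), and so on.
frac_low = np.mean((alpha >= 1e-4) & (alpha < 1e-3))
```

With a plain uniform draw over [0.0001, 1], about 90% of the samples would fall between 0.1 and 1; the log-scale draw spreads the budget evenly across the decades instead.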
Before wrapping up our discussion on hyperparameter search,
I want to share with you just a couple of final tips and tricks for how to organize your hyperparameter search process.
Deep learning today is applied to many different application areas, and
intuitions about hyperparameter settings from one application area
may or may not transfer to a different one.
Finally, in terms of how people go about searching for hyperparameters,
I see maybe two major ways in which people go about it.
One is to babysit a single model; I'm going to call babysitting one model the panda approach.
The other is to train many models in parallel; I'm going to call that the caviar strategy.
If you have enough computers to train a lot of models in parallel,
then by all means take the caviar approach and try a lot of different hyperparameters and see what works.
But in some application domains (such as online advertising and computer vision), there's so much data and the models you want to train are so big that it's difficult to train a lot of models at the same time.
I've seen those communities use the panda approach a little bit more.
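The caviar strategy can be sketched as launching many randomly configured runs side by side and keeping the best; here `train` is a hypothetical stand-in for a full training run, and a thread pool stands in for the separate machines you would really use:

```python
import concurrent.futures
import random

def train(config):
    # Hypothetical stand-in for training a model to completion and
    # returning its dev-set score; deterministic per configuration.
    rnd = random.Random(hash((config["lr"], config["layers"])) % 2**32)
    return rnd.random()

# Caviar strategy: sample many hyperparameter settings at random
# and train them all in parallel.
configs = [{"lr": 10 ** random.uniform(-4, 0),
            "layers": random.randint(2, 6)} for _ in range(8)]

with concurrent.futures.ThreadPoolExecutor() as pool:
    scores = list(pool.map(train, configs))

best = configs[scores.index(max(scores))]
```

The panda approach, by contrast, would run a single `train`-like loop and adjust its hyperparameters by hand day by day while watching the learning curve.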
Thank you for sharing this useful information.