COMPARISON OF OPTIMIZATION TECHNIQUES BASED ON GRADIENT DESCENT ALGORITHM: A REVIEW
Whether one is tackling a real-life problem or building a software product, optimization is often the ultimate goal, and that goal is reached by applying an optimization algorithm. The increasingly popular Gradient Descent (GD) optimization algorithms are frequently used as black-box optimizers for unconstrained optimization problems. Each iteration of a gradient-based algorithm attempts to approach the minimizer (or maximizer) of the cost function by using the gradient of the objective function. This paper presents a comparative study of the main GD variants: Gradient Descent (GD), Batch Gradient Descent (BGD), Stochastic Gradient Descent (SGD), and Mini-batch GD. Additionally, it outlines the challenges these algorithms face and surveys the most widely used GD-based optimization algorithms, including Momentum, Nesterov Momentum, Adaptive Gradient (AdaGrad), Adaptive Delta (AdaDelta), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Maximum Adaptive Moment Estimation (AdaMax), and Nesterov Accelerated Adaptive Moment Estimation (Nadam), each of which is presented separately in this paper. Finally, these GD-based optimization algorithms are compared in terms of training speed, convergence rate, performance, and their pros and cons.
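As a minimal illustration of the distinction the abstract draws between the GD variants, the sketch below fits a one-parameter least-squares model with batch GD, SGD, and mini-batch GD. The variants differ only in how many samples feed each gradient estimate. The toy dataset, learning rate, and function names here are illustrative assumptions, not taken from the paper.

```python
import random

# Toy problem (assumed for illustration): fit w in y = w * x by least
# squares on synthetic data with true slope w = 2.
random.seed(0)
DATA = [(x, 2.0 * x) for x in range(1, 21)]

def grad(w, samples):
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w.
    return sum((w * x - y) * x for x, y in samples) / len(samples)

def train(w, lr=0.001, epochs=200, batch_size=None):
    samples = list(DATA)
    for _ in range(epochs):
        if batch_size is None:
            # Batch GD: one update per epoch, using every sample.
            w -= lr * grad(w, samples)
        else:
            # SGD (batch_size=1) or mini-batch GD: several noisier,
            # cheaper updates per epoch on shuffled subsets.
            random.shuffle(samples)
            for i in range(0, len(samples), batch_size):
                w -= lr * grad(w, samples[i:i + batch_size])
    return w

print(round(train(0.0), 3))                 # batch GD
print(round(train(0.0, batch_size=1), 3))   # SGD
print(round(train(0.0, batch_size=5), 3))   # mini-batch GD
```

All three variants recover the same slope on this well-behaved problem; the trade-off the paper examines is between the stable but expensive full-batch updates and the cheap but noisy stochastic ones.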