Comparative Study of Deep Learning Optimizers: SGD, Momentum SGD, RMSProp, AMSGrad, Adam, Yogi, and Lion

Khalil Shehada, Ibrahim and Ragheb Atallah, Rasha and Yunis Maghari, Ashraf (2025) Comparative Study of Deep Learning Optimizers: SGD, Momentum SGD, RMSProp, AMSGrad, Adam, Yogi, and Lion. International Journal of Innovative Science and Research Technology, 10 (8): 25aug007. pp. 593-601. ISSN 2456-2165

Abstract

Training deep learning models requires adjusting parameters to minimize a loss function and thereby improve prediction accuracy. In supervised learning, models learn the mapping between inputs and their correct outputs from labeled examples: predictions are compared with the true outcomes, and an optimization algorithm updates the parameters to reduce the error, iterating until convergence. This study evaluates seven optimization algorithms, Stochastic Gradient Descent (SGD), Momentum SGD, RMSProp, AMSGrad, Adam, Yogi, and Lion, in terms of training accuracy, test accuracy, training loss, and sensitivity to the learning rate. Experiments were run on two benchmark datasets, MNIST and CIFAR-10. On MNIST, SGD with a learning rate of 0.5 reached a test accuracy of 99.14% and the highest training accuracy of 99.89%. Momentum SGD and Adam also performed strongly at a learning rate of 1e-2, with test accuracies of 99.15% and 98%, respectively. Optimizers such as Yogi and Lion were competitive at lower learning rates but degraded at higher ones; at 1e-5, Lion reached a test accuracy of 98.69%. On CIFAR-10, all optimizers achieved lower accuracies, reflecting the dataset's greater complexity. Momentum SGD outperformed the other optimizers, including Adam, Yogi, and Lion, achieving the highest training accuracy of 98.90% and the best test accuracy of 72.94% at a learning rate of 1e-2. Lion performed well and remained stable on both datasets at the low learning rate of 1e-5. These results underline the importance of choosing the learning rate and the optimization method to match the characteristics of each dataset.
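The comparison described in the abstract, sweeping several optimizers and learning rates over a benchmark dataset and recording accuracy and loss, can be illustrated with a minimal sketch. This is not the authors' code: the model architecture, batch size, epoch count, and the helper names `make_model` and `train_one` are illustrative assumptions, and the learning-rate grid is only an example. Yogi and Lion are not part of `torch.optim` and would require third-party implementations (for example the `torch-optimizer` and `lion-pytorch` packages), so only the built-in optimizers are shown.

```python
# Minimal sketch of an optimizer comparison on MNIST in PyTorch (illustrative, not the paper's code).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_model():
    # Small fully connected classifier for 28x28 MNIST images (architecture is an assumption).
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Optimizers available in torch.optim; Yogi and Lion would come from third-party packages.
optimizer_factories = {
    "SGD":          lambda p, lr: torch.optim.SGD(p, lr=lr),
    "Momentum SGD": lambda p, lr: torch.optim.SGD(p, lr=lr, momentum=0.9),
    "RMSProp":      lambda p, lr: torch.optim.RMSprop(p, lr=lr),
    "Adam":         lambda p, lr: torch.optim.Adam(p, lr=lr),
    "AMSGrad":      lambda p, lr: torch.optim.Adam(p, lr=lr, amsgrad=True),
}

def train_one(name, lr, epochs=1):
    # Train one model with the chosen optimizer and learning rate, returning the final average loss.
    train_ds = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    loader = DataLoader(train_ds, batch_size=128, shuffle=True)
    model = make_model()
    optimizer = optimizer_factories[name](model.parameters(), lr)
    loss_fn = nn.CrossEntropyLoss()
    total, batches = 0.0, 0
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            total, batches = total + loss.item(), batches + 1
    return total / batches

# Sweep each optimizer over an example learning-rate grid.
for name in optimizer_factories:
    for lr in (1e-2, 1e-3, 1e-5):
        print(name, lr, train_one(name, lr))
```

The same loop would extend to CIFAR-10 by swapping the dataset and model; test accuracy would be computed on the held-out split after training.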
