Deep Neural Networks Training by Stochastic Quasi-Newton Trust-Region Methods

Yousefi, Mahsa
•
MARTINEZ CALOMARDO, ANGELES
2023
  • journal article

Journal
ALGORITHMS
Abstract
While first-order methods are popular for solving optimization problems arising in deep learning, they come with some acute deficiencies. To overcome these shortcomings, there has been recent interest in introducing second-order information through quasi-Newton methods that are able to construct Hessian approximations using only gradient information. In this work, we study the performance of stochastic quasi-Newton algorithms for training deep neural networks. We consider two well-known quasi-Newton updates, the limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS) and the symmetric rank one (SR1). This study fills a gap concerning the real performance of both updates in the minibatch setting and analyzes whether more efficient training can be obtained when using the more robust BFGS update or the cheaper SR1 formula, which—allowing for indefinite Hessian approximations—can potentially help to better navigate the pathological saddle points present in the non-convex loss functions found in deep learning. We present and discuss the results of an extensive experimental study that includes many aspects affecting performance, like batch normalization, the network architecture, the limited memory parameter or the batch size. Our results show that stochastic quasi-Newton algorithms are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer, run with the optimal combination of its numerous hyperparameters, and the stochastic second-order trust-region STORM algorithm.
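As a rough illustration of the two updates the abstract compares, the sketch below applies dense BFGS and SR1 updates to a Hessian approximation `B` using only a step `s` and a gradient difference `y`. It is not the authors' implementation: the paper works with limited-memory, stochastic variants inside a trust-region framework, whereas this example uses full matrices, a deterministic toy quadratic, and hypothetical helper names (`sr1_update`, `bfgs_update`) and skip tolerances chosen here for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(B: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in B]


def sr1_update(B: Matrix, s: Vector, y: Vector, tol: float = 1e-8) -> Matrix:
    """Symmetric rank-one update; the result may be indefinite."""
    Bs = matvec(B, s)
    res = [yi - bi for yi, bi in zip(y, Bs)]  # residual y - B s
    denom = dot(res, s)
    # Standard safeguard: skip when the denominator is too small (assumed tolerance).
    if abs(denom) <= tol * dot(s, s) ** 0.5 * dot(res, res) ** 0.5:
        return [row[:] for row in B]
    n = len(s)
    return [[B[i][j] + res[i] * res[j] / denom for j in range(n)] for i in range(n)]


def bfgs_update(B: Matrix, s: Vector, y: Vector, eps: float = 1e-8) -> Matrix:
    """BFGS update; skipped unless the curvature condition y^T s > 0 holds."""
    ys = dot(y, s)
    if ys <= eps * dot(s, s):
        return [row[:] for row in B]  # keep B positive definite
    Bs = matvec(B, s)
    sBs = dot(s, Bs)
    n = len(s)
    return [
        [B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys for j in range(n)]
        for i in range(n)
    ]


# Toy indefinite quadratic with Hessian A, so that y = A s for any step s.
A: Matrix = [[2.0, 0.0], [0.0, -1.0]]
steps: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]

B_sr1: Matrix = [[1.0, 0.0], [0.0, 1.0]]
B_bfgs: Matrix = [[1.0, 0.0], [0.0, 1.0]]
for s in steps:
    y = matvec(A, s)
    B_sr1 = sr1_update(B_sr1, s, y)
    B_bfgs = bfgs_update(B_bfgs, s, y)
```

On this toy problem the SR1 approximation ends up equal to the indefinite matrix `A`, while BFGS skips the negative-curvature pair and stays positive definite. This is the trade-off the abstract alludes to when contrasting the more robust BFGS update with the SR1 formula that allows indefinite Hessian approximations.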
DOI
10.3390/a16100490
WOS
WOS:001090519800001
Archive
https://hdl.handle.net/11368/3061238
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85174862761
https://doi.org/10.3390/a16100490
Rights
open access
license:creative commons
license uri:http://creativecommons.org/licenses/by/4.0/
FVG url
https://arts.units.it/bitstream/11368/3061238/1/algorithms-16-00490.pdf
Subjects
  • stochastic optimizati...

  • quasi-Newton method

  • trust-region method

  • BFGS

  • SR1

  • deep neural networks ...
