---
title: Long Short-Term Memory
authors:
- S. Hochreiter
- J. Schmidhuber
fieldsOfStudy:
- Computer Science
meta_key: 1997-long-short-term-memory
numCitedBy: 52035
reading_status: TBD
ref_count: 68
tags:
- gen-from-ref
- other-default
- paper
urls:
- semanticscholar url
venue: Neural Computation
year: 1997
---
# Long Short-Term Memory

## Abstract

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
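
To make the gating mechanism concrete, here is a minimal NumPy sketch of one forward step of a 1997-style memory cell: an additive cell state plays the role of the constant error carousel (CEC), while multiplicative input and output gates control write and read access (the original formulation has no forget gate). Function and weight names are illustrative, not from the paper, and `tanh` stands in for the paper's squashing functions g and h.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm97_step(x, h_prev, c_prev, W_in, W_g, W_out):
    """One step of a 1997-style LSTM cell (input/output gates, no forget gate).

    x      : input vector at the current time step
    h_prev : previous cell output vector
    c_prev : previous cell state (the constant error carousel)
    W_*    : weight matrices acting on [x; h_prev] (illustrative names)
    """
    z = np.concatenate([x, h_prev])   # current input plus recurrent feedback
    i = sigmoid(W_in @ z)             # input gate: opens/closes write access
    g = np.tanh(W_g @ z)              # candidate cell input
    c = c_prev + i * g                # CEC: identity self-connection keeps error flow constant
    o = sigmoid(W_out @ z)            # output gate: opens/closes read access
    h = o * np.tanh(c)                # gated cell output
    return h, c

# Usage: a tiny random example with 4 inputs and 3 memory cells.
rng = np.random.default_rng(0)
n_in, n_cell = 4, 3
W_in, W_g, W_out = (rng.standard_normal((n_cell, n_in + n_cell)) * 0.1
                    for _ in range(3))
h = np.zeros(n_cell)
c = np.zeros(n_cell)
for t in range(5):
    x = rng.standard_normal(n_in)
    h, c = lstm97_step(x, h, c, W_in, W_g, W_out)
```

Because the cell state is updated purely additively, the error signal flowing back through `c` is neither amplified nor attenuated, which is what lets this sketch of the architecture bridge the long time lags discussed in the abstract.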

## Paper References

  1. Learning long-term dependencies in NARX recurrent neural networks
  2. Learning Unambiguous Reduced Sequence Descriptions
  3. Bridging Long Time Lags by Weight Guessing and "Long Short-Term Memory"
  4. Induction of Multiscale Temporal Structure
  5. Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm
  6. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks
  7. Continuous history compression
  8. Learning long-term dependencies with gradient descent is difficult
  9. Learning Complex, Extended Sequences Using the Principle of History Compression
  10. Generalization of backpropagation with application to a recurrent gas market model
  11. Gradient calculations for dynamic recurrent neural networks - a survey
  12. Finding Structure in Time
  13. Credit Assignment through Time - Alternatives to Backpropagation
  14. Finite State Automata and Simple Recurrent Networks
  15. Learning Sequential Tasks by Incrementally Adding Higher Orders
  16. Learning State Space Trajectories in Recurrent Neural Networks
  17. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories
  18. Language Induction by Phase Transition in Dynamical Recognizers
  19. Adaptive neural oscillator using continuous-time back-propagation learning
  20. Experimental Comparison of the Effect of Order in Recurrent Neural Networks
  21. The Recurrent Cascade-Correlation Architecture
  22. LSTM can Solve Hard Long Time Lag Problems
  23. Contrastive Learning and Neural Oscillations
  24. Dynamics and architecture for neural computation
  25. Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks
  26. A time-delay neural network architecture for isolated word recognition
  27. Generalization of back-propagation to recurrent neural networks.
  28. Holographic Recurrent Networks
  29. A Theory for Neural Networks with Time Delays
  30. Induction of Finite-State Languages Using Second-Order Recurrent Networks
  31. Guessing can Outperform Many Long Time Lag Algorithms
  32. Bifurcations in the learning of recurrent neural networks
  33. Untersuchungen zu dynamischen neuronalen Netzen [Investigations on dynamic neural networks]
  34. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment
  35. Gradient-Based Learning Algorithms for Recurrent Networks
  36. Gradient-based learning algorithms for recurrent networks and their computational complexity
  37. A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks
  38. Netzwerkarchitekturen, Zielfunktionen und Kettenregel [Network architectures, objective functions, and chain rule]
  39. Learning long-term dependencies is not as difficult with NARX recurrent neural networks
  40. A time delay neural network architecture for speech recognition
  41. Time Warping Invariant Neural Networks