| title | authors | fieldsOfStudy | meta_key | numCitedBy | reading_status | ref_count | tags | urls | venue | year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Long Short-Term Memory | Sepp Hochreiter, Jürgen Schmidhuber |  | 1997-long-short-term-memory | 52035 | TBD | 68 |  |  | Neural Computation | 1997 |
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
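The abstract hinges on two mechanisms: a constant error carousel (an additively updated internal cell state, so error signals are not repeatedly rescaled as they flow back through time) and multiplicative input/output gates that learn when to write to and read from that state. The snippet below is a minimal NumPy sketch of one forward step under those assumptions, following the original 1997 formulation without a forget gate; the names `lstm_step`, `W_i`, `W_o`, `W_c` and the tanh squashing functions are illustrative choices, not the paper's exact notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W_i, W_o, W_c):
    """One forward step of a 1997-style LSTM memory cell (illustrative sketch, no forget gate).

    The cell state c is the constant error carousel: it is updated purely
    additively, so gradients passing through it are not repeatedly scaled,
    which is what lets errors survive long time lags. The multiplicative
    gates control write (input gate) and read (output gate) access to it.
    """
    z = np.concatenate([x, h_prev])   # external input plus recurrent cell outputs
    i = sigmoid(W_i @ z)              # input gate: opens/closes write access
    o = sigmoid(W_o @ z)              # output gate: opens/closes read access
    g = np.tanh(W_c @ z)              # squashed candidate cell input
    c = c_prev + i * g                # additive carousel update (no forgetting)
    h = o * np.tanh(c)                # gated, squashed cell output
    return h, c

# Toy usage: 4-dimensional inputs, 3 memory cells, random weights.
rng = np.random.default_rng(0)
n_in, n_cell = 4, 3
W_i, W_o, W_c = (rng.normal(scale=0.1, size=(n_cell, n_in + n_cell)) for _ in range(3))
h, c = np.zeros(n_cell), np.zeros(n_cell)
for t in range(1000):                 # long sequence: c carries state across all steps
    x_t = rng.normal(size=n_in)
    h, c = lstm_step(x_t, h, c, W_i, W_o, W_c)
```

Because the carousel update is additive and the gates are the only multiplicative interactions, the per-step, per-weight cost of the forward pass stays constant, which is the O(1) complexity claim in the abstract.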
- Learning long-term dependencies in NARX recurrent neural networks
- Learning Unambiguous Reduced Sequence Descriptions
- Bridging Long Time Lags by Weight Guessing and "Long Short-Term Memory"
- Induction of Multiscale Temporal Structure
- Learning Sequential Structure with the Real-Time Recurrent Learning Algorithm
- A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks
- Continuous history compression
- Learning long-term dependencies with gradient descent is difficult
- Learning Complex, Extended Sequences Using the Principle of History Compression
- Generalization of backpropagation with application to a recurrent gas market model
- Gradient calculations for dynamic recurrent neural networks - a survey
- Finding Structure in Time
- Credit Assignment through Time - Alternatives to Backpropagation
- Finite State Automata and Simple Recurrent Networks
- Learning Sequential Tasks by Incrementally Adding Higher Orders
- Learning State Space Trajectories in Recurrent Neural Networks
- An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories
- Language Induction by Phase Transition in Dynamical Recognizers
- Adaptive neural oscillator using continuous-time back-propagation learning
- Experimental Comparison of the Effect of Order in Recurrent Neural Networks
- The Recurrent Cascade-Correlation Architecture
- LSTM can Solve Hard Long Time Lag Problems
- Contrastive Learning and Neural Oscillations
- Dynamics and architecture for neural computation
- Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks
- A time-delay neural network architecture for isolated word recognition
- Generalization of back-propagation to recurrent neural networks
- Holographic Recurrent Networks
- A Theory for Neural Networks with Time Delays
- Induction of Finite-State Languages Using Second-Order Recurrent Networks
- Guessing can Outperform Many Long Time Lag Algorithms
- Bifurcations in the learning of recurrent neural networks
- Untersuchungen zu dynamischen neuronalen Netzen [German: Investigations of dynamic neural networks]
- A learning rule for asynchronous perceptrons with feedback in a combinatorial environment
- Gradient-Based Learning Algorithms for Recurrent Networks
- Gradient-based learning algorithms for recurrent networks and their computational complexity
- A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks
- Netzwerkarchitekturen, Zielfunktionen und Kettenregel [German: Network architectures, objective functions, and the chain rule]
- Learning long-term dependencies is not as difficult with NARX recurrent neural networks
- A time delay neural network architecture for speech recognition
- Time Warping Invariant Neural Networks