fast inference on long sequences.
hard to optimize and slow to train.
careful design of deep RNNs using standard signal propagation arguments
can linearizing and diagonalizing the recurrence, using better parameterizations and initializations
, and proper normalization
of the forward pass.