Vanilla rnns often run into vanishing and exploding gradients
Due to their recurrent nature, the parameters of RNNs are changed proportional to a partial derivative which is exponential. If larger than one, it often explodes with long considered sequences. If smaller than one, it often vanishes. LSTMs attempt to solve this.