Convolutional LSTM is an increasingly popular and promising approach to machine learning tasks involving video data. Next-frame prediction, i.e. predicting how a video will continue, is the machine learning task addressed in this thesis. The data stems from car camera data sets: the scenes to be predicted show ordinary traffic situations recorded by a roof-mounted camera of a car. Convolutional LSTM has already proven feasible for such tasks. This thesis explores new ways of presenting the data to a known model architecture, the Predictive Coding Network, which is based on convolutional LSTM. One approach is to present delta information, i.e. the change between two consecutive frames, to the model in order to ease the learning task. The second approach examined tries to bundle the model's capacity so as to concentrate on the prediction of a single frame instead of a sequence. In both cases, the model architecture is not or only marginally affected by the necessary changes, which ensures that the models remain comparable. Prior to these experiments, a feature of convolutional LSTMs called peepholes is studied with respect to its effectiveness in the context of next-frame prediction. This is done on a toy data set named Moving MNIST, while the prediction techniques are examined on the real-world car camera data sets KITTI and Caltech Pedestrian.
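The delta-information idea described above can be sketched minimally as follows. This is an illustrative NumPy sketch, not code from the thesis; the function names and the frame layout (a sequence of grayscale frames shaped `(T, H, W)`) are assumptions for the example.

```python
import numpy as np

def to_deltas(frames):
    """Convert a frame sequence (T, H, W) into per-step deltas.

    The first frame is kept unchanged so that the original
    sequence can be recovered by a cumulative sum over the deltas.
    """
    deltas = np.empty_like(frames)
    deltas[0] = frames[0]
    deltas[1:] = frames[1:] - frames[:-1]  # change between consecutive frames
    return deltas

def from_deltas(deltas):
    """Invert to_deltas via cumulative summation along the time axis."""
    return np.cumsum(deltas, axis=0)

# Toy check: a 3-frame sequence of 2x2 images round-trips exactly.
frames = np.arange(12, dtype=np.float32).reshape(3, 2, 2)
assert np.allclose(from_deltas(to_deltas(frames)), frames)
```

The model would then be trained on the (typically sparser) delta sequence rather than on the raw frames, while predictions can be mapped back to pixel space by cumulative summation.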