Training artificial neural networks is hard. To achieve high predictive performance on previously unseen data, artificial neural networks need a large number of samples to train on. It gets even harder if those training samples are dominated by one or more classes, i.e. if the training data is unbalanced. This is because the training algorithms, mostly the backpropagation algorithm or algorithms based on it, consider every training sample as equally important. In their 2005 paper "Optimal Gradient-Based Learning Using Importance Weights", Hochreiter and Obermayer argue that some samples are more important for training than others. They further show that the standard backpropagation algorithm does not exploit this fact, and introduce a method called "gradient-based importance weighting" that does. With experiments on artificial and real-world data they show that the method works for feed-forward as well as recurrent neural networks. However, compared to modern datasets, the datasets used in their experiments are fairly small, which made it possible to apply the sample weighting to the whole training set. The size of current datasets makes this infeasible, so mini-batch learning becomes necessary. In mini-batch learning, only a subset of the training samples is used per training step to compute the weight update of an artificial neural network. This thesis is concerned with evaluating the performance of the gradient-based importance weighting method for feed-forward artificial neural networks trained with mini-batch learning. The theoretical background is covered, as is the implementation of importance weighting for modern feed-forward neural networks. With this implementation, the performance of gradient-based importance weighting under mini-batch learning is evaluated on five datasets, among them three unbalanced ones.
The experiments show that under mini-batch learning the method does not yield better results than the standard backpropagation algorithm in terms of predictive accuracy on the test set. Finally, an outline of an algorithm is provided that may improve the performance of gradient-based importance weighting under mini-batch learning.
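To make the mini-batch setting concrete, the following is a minimal sketch of one importance-weighted mini-batch gradient step for a linear model with squared loss. All names, shapes, and the weighting scheme (per-sample weights scaling the gradient contributions) are illustrative assumptions, not the thesis implementation or Hochreiter and Obermayer's exact method.

```python
import numpy as np

def minibatch_step(w, X_batch, y_batch, sample_weights, lr=0.1):
    """One weighted mini-batch SGD step for a linear model with squared loss.
    (Hypothetical sketch; names and weighting scheme are assumptions.)"""
    preds = X_batch @ w                       # model predictions for the batch
    errors = preds - y_batch                  # per-sample residuals
    # per-sample gradients of the squared loss, each scaled by its importance weight
    grads = (sample_weights * errors)[:, None] * X_batch
    return w - lr * grads.mean(axis=0)        # averaged weighted update

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                                # noiseless targets for illustration
w = np.zeros(3)
weights = np.ones(32)                         # uniform weights reduce to plain SGD
for _ in range(500):
    idx = rng.choice(32, size=8, replace=False)   # draw a mini-batch
    w = minibatch_step(w, X[idx], y[idx], weights[idx])
```

With uniform weights this is ordinary mini-batch SGD; a non-uniform `sample_weights` vector lets individual samples contribute more or less to each update, which is the mechanism the importance weighting method builds on.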