Stochastic Gradient Descent on a Portfolio Management Training Criterion Using the IPA Gradient Estimator
In this paper, we lay the groundwork for learning a portfolio management technique over multiple asset types that relies on no assumptions about the distributions of the financial data. The neural-network-based model attempts to capture patterns in the evolution of the market. Furthermore, the model allows a stochastic perturbation of the asset pricing produced by the network in order to avoid local maxima in the decision space. Under these settings, we prove that our investment decision is a Markovian decision process that is almost surely Lipschitz continuous in its parameters. Therefore, the infinitesimal perturbation analysis (IPA) gradient estimator, obtained here by the classical backpropagation algorithm, can be used in a stochastic gradient procedure to converge to a local maximum of our learning criterion, the Sharpe ratio.
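As a rough illustration of the idea, the sketch below computes a pathwise (IPA) gradient of a Sharpe ratio by differentiating through one simulated path while the injected noise is held fixed. It is not the model of the paper: the softmax allocation over a bare parameter vector `theta`, the noise scale `sigma`, the synthetic Gaussian returns, and the function name `sharpe_and_ipa_grad` are all assumptions made for this example, and the chain rule is written out by hand where the paper uses backpropagation through a neural network.

```python
import numpy as np

def sharpe_and_ipa_grad(theta, returns, eps, sigma=0.1):
    """Sharpe ratio of one simulated path and its pathwise (IPA) gradient in theta.

    theta   : (n,) allocation scores (hypothetical stand-in for network outputs)
    returns : (T, n) per-period asset returns
    eps     : (T, n) noise sampled independently of theta and held fixed
    """
    # Stochastic perturbation of the scores, then a softmax to get weights.
    z = theta + sigma * eps
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)         # (T, n), each row sums to 1

    p = (w * returns).sum(axis=1)             # portfolio return per period
    mu, sd = p.mean(), p.std()                # population std (ddof=0)
    sharpe = mu / sd

    # Backward pass (chain rule), with eps treated as a constant.
    T = len(p)
    dS_dp = 1.0 / (T * sd) - mu * (p - mu) / (T * sd**3)
    dp_dz = w * (returns - p[:, None])        # softmax Jacobian applied to returns
    grad = (dS_dp[:, None] * dp_dz).sum(axis=0)  # dz/dtheta is the identity
    return sharpe, grad

# Stochastic gradient ascent on the Sharpe ratio
# (equivalently, descent on its negative), on synthetic data.
rng = np.random.default_rng(0)
T, n = 250, 3
returns = rng.normal(loc=[0.0005, 0.001, 0.0002],
                     scale=[0.01, 0.02, 0.005], size=(T, n))
theta = np.zeros(n)
for step in range(200):
    eps = rng.standard_normal((T, n))         # fresh noise at every step
    sharpe, grad = sharpe_and_ipa_grad(theta, returns, eps)
    theta += 0.5 * grad                       # step size chosen arbitrarily
```

Because the noise enters as a fixed input, the estimator is simply the derivative of the sampled criterion with respect to the parameters, which is what makes it obtainable by ordinary backpropagation. A finite-difference comparison with the same `eps` is a simple sanity check for the hand-written gradient.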