Simplifying Neural Networks - RNN

This post explains a basic RNN. RNNs are typically used for sequence-related ML tasks. The input is not cross-sectional data; it is sequential data, and the order of the inputs matters.

Consider your input a 1D vector containing the features of one time step. If your model has 10 features, it will be a vector of length 10. In NLP, for most tasks, the inputs are a sequence of word indices. These are either mapped to pretrained word embeddings (say, 300 dimensions) and updated during training, or the representations are initialized randomly and learnt from scratch; pretrained vectors reduce the amount of training data needed. The input is not restricted to a 1D vector. It can be a 2D matrix, or any shape you want.
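As a rough sketch of the two initialization options, here is how the index-to-embedding lookup might look in PyTorch; the vocabulary size and embedding dimension are made up for illustration:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10000, 300  # assumed values

# Option 1: randomly initialized embeddings, learnt during training
embedding = nn.Embedding(vocab_size, embed_dim)

# Option 2: start from pretrained vectors and fine-tune them
# (torch.randn here is just a stand-in for real pretrained vectors)
pretrained = torch.randn(vocab_size, embed_dim)
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

word_indices = torch.tensor([[12, 45, 7, 301]])  # one sentence, 4 tokens
vectors = embedding(word_indices)                # shape: (1, 4, 300)
```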

Each time step has a vector of the same dimension; the input dimension is the same for all time steps. In NLP, if the inputs are sentences of varying length, the sequences are brought to the same length by padding them.
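A minimal sketch of padding, assuming sequences of word indices and 0 as the pad index:

```python
import numpy as np

def pad_sequences(seqs, pad_value=0):
    """Right-pad variable-length index sequences to the longest one."""
    max_len = max(len(s) for s in seqs)
    return np.array([s + [pad_value] * (max_len - len(s)) for s in seqs])

batch = [[12, 45, 7], [3, 8], [99, 4, 21, 6]]
print(pad_sequences(batch))
# [[12 45  7  0]
#  [ 3  8  0  0]
#  [99  4 21  6]]
```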

First, the input layer transforms the input vector into a shape suitable for matrix operations with the hidden layer matrix. Each input in the sequence goes through the same input matrix: the weights are shared. It is one unit unrolled again and again for each input in the sequence, which is one reason why an RNN has so few parameters (see the NumPy sketch a few paragraphs below).

When you read about neural networks, you will come across phrases like "functions applied on inputs". Things become easier to understand once you know that these functions (at least their linear parts) are represented as matrices.

For the 1D input example, the hidden layer weight will be a 2D matrix with the number of rows and columns equal to the number of hidden units.

After each input, a new hidden state is generated, which is used in the computation with the next input. An output is also generated at each step.
 
Activation functions are also applied to the results of these matrix operations to modify their impact.
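Putting the last few paragraphs together, here is a minimal NumPy sketch of a vanilla RNN forward pass; the sizes and the tanh activation are my assumptions for illustration:

```python
import numpy as np

n_features, n_hidden, n_outputs = 10, 16, 3  # assumed sizes

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(n_hidden, n_features))  # input -> hidden, shared across steps
W_hh = rng.normal(size=(n_hidden, n_hidden))    # hidden -> hidden, the square matrix
W_hy = rng.normal(size=(n_outputs, n_hidden))   # hidden -> output
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_outputs)

def rnn_forward(inputs):
    """inputs: list of 1D feature vectors, one per time step."""
    h = np.zeros(n_hidden)  # initial hidden state
    outputs = []
    for x in inputs:
        # the same weight matrices are reused at every step: this is the unrolling
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # new hidden state
        outputs.append(W_hy @ h + b_y)          # output for this step
    return outputs, h

seq = [rng.normal(size=n_features) for _ in range(5)]  # 5 time steps
outs, final_h = rnn_forward(seq)
```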

These weight matrices are learnt during the training process.

A typical RNN only moves forward: once the weight matrices are learnt, it does not take into account the values that come after the current input when deciding its fate. This is handled by a bidirectional LSTM, where the subsequent inputs are also considered when classifying the current one.
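As a sketch, PyTorch's built-in LSTM can be made bidirectional with a single flag; the sizes below are arbitrary:

```python
import torch
import torch.nn as nn

# reads the sequence left-to-right and right-to-left, so each step's
# representation also reflects what comes after it
lstm = nn.LSTM(input_size=10, hidden_size=16, bidirectional=True, batch_first=True)

x = torch.randn(1, 5, 10)  # (batch, time steps, features)
out, (h_n, c_n) = lstm(x)
print(out.shape)           # torch.Size([1, 5, 32]): 16 forward + 16 backward
```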

The problem is that neural networks are thought to be a panacea for everything. They are not. Most of the time the data is not adequate, and the patterns keep evolving over time due to various factors. An NN may learn most of the representations (a neural network is a universal function approximator), but it will overfit. And if regularised heavily to avoid overfitting, it may not be effective enough. Good feature engineering is important.

I can guarantee that a multivariate time series is going to perform better than a univariate one for stock market prediction. A univariate series lacks a lot of information. A very simple example: a univariate series of closing prices has no way of knowing whether there was volatility on the previous day, yet that volatility might have a significant impact on the next day's price. This is just a basic example. Some might argue for sampling at a finer interval to capture the volatility, but that would just increase the NN's workload without achieving much.
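Instead, the volatility can simply be handed to the model as an extra feature. A pandas sketch (the 5-day window is my assumption), with the feature shifted so the model only sees information available before the prediction day:

```python
import pandas as pd

# toy closing prices, stand-ins for real data
df = pd.DataFrame({"close": [100.0, 102.0, 99.0, 101.0, 105.0, 104.0, 108.0]})

returns = df["close"].pct_change()
# previous days' realized volatility as an extra input column
df["vol_5d"] = returns.rolling(5).std().shift(1)
```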

I can cite many more examples. Without proper feature engineering, neural networks will overfit. They may learn features that are extremely good, beyond what can be achieved by manual feature engineering, but they may also learn absurd features that simply overfit.

I don't understand why people keep building univariate LSTMs for stock market prediction. The stock market is not a univariate time series problem; it is a multivariate one.

Technical indicators are similar to states of the system. Such states are also learnt by algorithms like HMMs and LSTMs, albeit through more complex means. TA states are more intuitive but will most likely perform worse than states learnt by a neural network (ignoring overfitting). Keep technical indicators to a minimum. Some can be based on quant formulas that are difficult for neural networks to learn. Add mostly complementary variables, i.e. volume, delivery ratio etc., if found significant. Add C-H, C-L and C-O as percentage variables; this way the neural network can learn candlestick analysis.
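A pandas sketch of the C-H, C-L, C-O features, assuming a standard OHLC DataFrame and normalising by the close (the normalisation choice is mine):

```python
import pandas as pd

df = pd.DataFrame({  # toy OHLC rows, stand-ins for real data
    "open":  [100.0, 103.0], "high": [104.0, 106.0],
    "low":   [ 99.0, 101.0], "close": [103.0, 102.0],
})

# candle geometry as percentages of the close
df["c_h"] = (df["close"] - df["high"]) / df["close"] * 100
df["c_l"] = (df["close"] - df["low"])  / df["close"] * 100
df["c_o"] = (df["close"] - df["open"]) / df["close"] * 100
```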

Use a z-score instead of Bollinger Bands: (current price - sma10) / sd10. The algorithm will automatically learn what settings (lower band, upper band) work best. Maybe use more than one, based on different timeframes and different types of averages.
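In pandas, this z-score feature could be computed like so (the 10-period window follows the formula above):

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 99.0, 101.0, 105.0, 104.0, 108.0,
                   107.0, 110.0, 109.0, 112.0, 111.0])

sma10 = close.rolling(10).mean()
sd10 = close.rolling(10).std()
zscore = (close - sma10) / sd10  # >0: above the moving average, <0: below it
```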

Target variables: next-day OHLC (%), 3-day trend, weekly trend, monthly trend (the longer horizons will require more variables). Build separate models for each. Could also try the slope-and-duration-of-trend idea.
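A sketch of how such targets might be constructed in pandas; the column names and the choice to normalise by today's close are my assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(95, 105, size=(30, 4)),
                  columns=["open", "high", "low", "close"])  # toy OHLC data

# next-day OHLC as percentage change from today's close
for col in ["open", "high", "low", "close"]:
    df[f"next_{col}_pct"] = (df[col].shift(-1) - df["close"]) / df["close"] * 100

# 3-day trend label: sign of the close 3 days ahead vs today
df["trend_3d"] = np.sign(df["close"].shift(-3) - df["close"])
```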

Market phases are important. Features will behave differently in different market regimes. The phase can be given as a separate input variable, produced as the output of another model.
