30. RNN Basic


Recurrent Neural Network (RNN)

  • A class of artificial neural network where connections between units form a directed cycle. This allows it to exhibit dynamic temporal behavior. RNNs can use their internal memory to process arbitrary sequences of inputs. - Wiki
  • Data in nature usually comes as a sequence, and sequential data gives us many clues about context.
    • Weather
    • Speech and conversation
    • Object detection in a scene
  • An RNN is a neural network for sequential data. Compared to other neural networks, an RNN has a feedback (recurrence) path. Through this path, previous states affect the next output, so the RNN learns context from sequential data.
Image 1. RNN and FCN
  • As with a CNN, a fully connected network (FCN) follows the RNN. The RNN extracts context information from the sequential data, and the fully connected layers then analyze that context to produce the final result.
Image 2. Representation of RNN
  • The left side of the image above is the basic representation of an RNN, and the right side is the unrolled representation. The unrolled form shows several RNN cells, but the actual RNN layer is just one; the repeated cells represent the flow of time.
  • In the unrolled representation, the RNN has at least 2 inputs and 2 outputs. The inputs are the real input data and the state from the previous step. Likewise, the outputs are the real output and the current state computed from those inputs.
  • It is represented by

$$ h_t = f_W(h_{t-n}, ..., h_{t-2}, h_{t-1}, x_t) $$

  • \(h_t\) is the new state.
  • \(f_W\) is a function with weights \(W\).
  • \(h_{t-n}, ..., h_{t-1}\) are the old states; \(n\) is the number of previous steps considered. The larger \(n\) is, the more previous states the RNN takes into account at once. If \(n\) is 1, the RNN considers only the one previous state; if \(n\) is 3, it considers the three previous states.
  • \(x_t\) is the input data at time \(t\). (A small code sketch of this recurrence follows below.)
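
To make the recurrence concrete, here is a minimal Python sketch. The names `run_rnn`, `f_W`, and `h0` are illustrative, not from any library; the point is that the same function \(f_W\) is reused at every time step, and only the state changes.

```python
# Minimal sketch of the recurrence (illustrative names, not a library API).
# The same function f_W is applied at every time step; only the state changes.
def run_rnn(f_W, x_sequence, h0):
    h = h0                    # initial state (e.g., zeros)
    states = []
    for x_t in x_sequence:    # "unrolling" over time
        h = f_W(h, x_t)       # new state from the old state and current input
        states.append(h)
    return states
```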

Weights of Recurrent Neural Network

  • A vanilla RNN is an RNN whose state consists of a single hidden vector \(h\). In this chapter, the vanilla RNN is used for explanation because it is simple.
  • A vanilla RNN is trained with the input data and only the last state, so \(n\) is 1.

$$ h_t = f_W(h_{t-1}, x_t) $$

  • For RNNs, \(\tanh()\) is widely used as the activation function.

$$ h_t = \tanh(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t) $$

  • For the state update, there are two weight matrices: one (\(W_{hh}\)) for the previous state, and the other (\(W_{xh}\)) for the input data.

  • The output of the RNN is

$$ y_t = W_{hy} \cdot h_t $$

  • In addition to the weights for the previous state and the input data, the RNN has a weight matrix \(W_{hy}\) that maps the current state to the output.
  • In a vanilla RNN, there are therefore 3 weight matrices, and the same weights and activation function are reused at every step, as the sketch below shows.
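
The following NumPy sketch implements one vanilla RNN step with the three weight matrices above. The sizes (input size 4, hidden size 3) and the small random initialization are assumptions for illustration, not values from the original post.

```python
import numpy as np

# One vanilla RNN step:
#   h_t = tanh(W_hh . h_{t-1} + W_xh . x_t)
#   y_t = W_hy . h_t
# Sizes are assumptions for illustration: input size 4, hidden size 3.
input_size, hidden_size = 4, 3

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.01   # input  -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden
W_hy = rng.standard_normal((input_size, hidden_size)) * 0.01   # hidden -> output

def rnn_step(h_prev, x_t):
    """The same three weight matrices are reused at every time step."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # new state
    y_t = W_hy @ h_t                           # output (unnormalized scores)
    return h_t, y_t

h = np.zeros(hidden_size)           # first state is 0: there is no previous step
x = np.array([1.0, 0.0, 0.0, 0.0])  # a one-hot input vector
h, y = rnn_step(h, x)
```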

Example: Character-Level Language Model

  • Now, an RNN will be trained on the word "hello". After training, the RNN will suggest the next character for whatever the user types. For instance, when the user types "h", the RNN will suggest "e"; if the user types "e", the RNN will suggest "l".
Image 3. Character level training for "hello" with RNN
  • The numbers of input and output characters are 4, not 5, because there is no need to predict a next character after the last character.
Image 4. Train "hello" word with RNN
  • To train the RNN, the characters should be represented as one-hot encoded vectors. "hello" has 4 unique characters: "h", "e", "l" and "o".
  • The state for the first input is set to 0, because there is no previous step.
  • The RNN is trained on the input data and updates \(W_{hh}\), \(W_{xh}\) and \(W_{hy}\).

$$ h_t = \tanh(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t) $$

$$ y_t = W_{hy} \cdot h_t $$

  • \(y_t\) is fed to a softmax layer, and the final decision is the character with the highest probability.
  • In this example, the results of the first 2 steps are wrong. The results should be "e" and "l", but they are "o" and "o". These wrong results are used to calculate the cost and to update the weights and biases. (A forward-pass sketch of this example follows below.)
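
Here is a minimal NumPy sketch of the forward pass for this example. The hidden size and the random initialization are assumptions; with untrained random weights the predictions will generally be wrong, just as described above, and a real run would repeat forward and backward passes to reduce the cost.

```python
import numpy as np

# Character-level "hello": 4 unique characters, one-hot encoded.
chars = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(chars)}
vocab_size, hidden_size = len(chars), 3        # hidden size is an assumption

def one_hot(c):
    v = np.zeros(vocab_size)
    v[char_to_ix[c]] = 1.0
    return v

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_size, vocab_size)) * 0.01
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
W_hy = rng.standard_normal((vocab_size, hidden_size)) * 0.01

inputs, targets = "hell", "ello"   # 4 inputs and 4 targets, not 5
h = np.zeros(hidden_size)          # first state is 0: no previous step
loss = 0.0
for x_c, t_c in zip(inputs, targets):
    h = np.tanh(W_hh @ h + W_xh @ one_hot(x_c))  # state update
    y = W_hy @ h                                 # scores for each character
    p = np.exp(y) / np.sum(np.exp(y))            # softmax probabilities
    loss += -np.log(p[char_to_ix[t_c]])          # cross-entropy cost
    print(x_c, '->', chars[int(np.argmax(p))])   # predicted next character
print('total cost:', loss)
```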

Network Variation of RNN

  • Based on the concept of the RNN, there are many network variations; a small sketch contrasting two of them follows the list below.
Image 5. Network variations (src: karpathy.github.io)
  • One-to-one: ex) Vanilla neural network
  • One-to-many: ex) Image captioning
  • Many-to-one: ex) Sentiment classification
  • Many-to-many (sequence in, sequence out): ex) Machine translation
  • Many-to-many (synced, one output per input): ex) Video classification on frame level
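
As an illustrative sketch (the function names are hypothetical, and `rnn_step` is assumed to behave like the step function above), the variations differ mainly in which outputs are kept:

```python
# Illustrative sketch: the variations differ in which outputs are kept.
# rnn_step is assumed to behave like the step function defined above.
def many_to_one(rnn_step, xs, h):
    y = None
    for x in xs:              # read the whole input sequence...
        h, y = rnn_step(h, x)
    return y                  # ...keep only the last output (e.g., a sentiment)

def many_to_many_synced(rnn_step, xs, h):
    ys = []
    for x in xs:              # one output per input (e.g., per video frame)
        h, y = rnn_step(h, x)
        ys.append(y)
    return ys
```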

Further RNN

  • The RNN is a great approach for training on sequential data, but better approaches have since been proposed:

  • Long Short-term Memory (LSTM) - Wiki

  • Gated Recurrent Unit (GRU) - Wiki
