NATURAL LANGUAGE PARSING WITH RECURSIVE NEURAL NETWORK
Mujeeb Rehman O., Second Sem M.Tech, Dept. of CSE, Govt. Engg. College Sreekrishnapuram


omrehman@gmail.com

April 10, 2012


Overview:

Introduction to Artificial Neural Network
Natural Language Parsing using ANN
Recursive Neural Network
Recursive Neural Networks for Structure Prediction
Learning of RNN
Conclusion


Introduction to Artificial Neural Network

The concept of an Artificial Neural Network (ANN) is motivated by the human brain. The most important property of an ANN is that it can learn from training data. The basic model of an ANN is the McCulloch-Pitts model. The learning process is the adjustment of the weights of the synaptic interconnections.



Natural Language parsing using ANN:

Definition: "A parsing algorithm can be described as a procedure that searches through various ways of combining grammatical rules to find a combination that generates a tree that could be the structure of the input sentence" [1].
Parsing generates a tree that could be the structure of an input sentence.
In natural language parsing, the most closely matching featured words are grouped together.
Parsing needs an input sentence and a set of grammatical rules.


Natural Language parsing using ANN (continued):

Parsing with an example: "John ate the cat"

Lexicon: NAME → John, V → ate, ART → the, N → cat

Rules:
1. S → NP VP
2. VP → V NP
3. NP → NAME
4. NP → ART N
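The four rules above can be applied bottom-up by a small CKY-style recognizer. This is an illustrative sketch: the lexicon and rules mirror the slide, while the chart machinery and function names are assumptions for the example.

```python
# Toy bottom-up recognizer for the grammar on this slide (illustrative sketch).
lexicon = {"John": "NAME", "ate": "V", "the": "ART", "cat": "N"}
rules = {("NP", "VP"): "S", ("V", "NP"): "VP",
         ("NAME",): "NP", ("ART", "N"): "NP"}

def parse(words):
    # chart[i][j] holds the categories that can span words[i..j]
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i].add(lexicon[w])
        # apply unary rules such as NP -> NAME at the leaves
        for cat in list(chart[i][i]):
            if (cat,) in rules:
                chart[i][i].add(rules[(cat,)])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):          # try every split point
                for a in chart[i][k]:
                    for b in chart[k + 1][j]:
                        if (a, b) in rules:
                            chart[i][j].add(rules[(a, b)])
    return chart[0][n - 1]

print(parse("John ate the cat".split()))   # contains 'S': sentence accepted
```

The chart ends up containing S over the whole sentence, built from NP → NAME, NP → ART N, VP → V NP, and finally S → NP VP.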



Natural Language parsing using ANN (continued):

Issues of parsing with a Neural Network:
The NN classifies words into classes.
A different ANN must be set up for each level of the tree.
It is computationally very expensive.


Recursive Neural Network:

Definition: Recursive Neural Networks (RNNs) are able to process structured inputs by repeatedly applying the same neural network at each node of a directed acyclic graph (DAG) [2].
The same RNN is used repeatedly at each node. Here we consider binary tree structures.



Recursive Neural Network (continued):

In an RNN the given binary tree is assumed to be in the form of branching triplets (p → c1 c2). p is the parent node and has two children c1 and c2. Each ck can be either an input xi or a non-terminal node in the tree.
E.g.: y1 → x3 x4, y2 → x2 y1, y3 → x1 y2

The activation function at each node is

p = tanh(W[c1; c2] + b)    (1)

where c1, c2, p ∈ R^n, and [c1; c2] ∈ R^{2n} denotes the concatenation of the two child column vectors, so W ∈ R^{n×2n}.
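Eq. (1) can be sketched in a few lines of NumPy. The dimensionality and the random parameters are assumptions for illustration; the point is that the same (W, b) is reused at every node.

```python
import numpy as np

# Sketch of Eq. (1): one parent vector from two child vectors (shapes assumed).
n = 4                                # dimensionality of every node vector
rng = np.random.default_rng(0)
W = rng.standard_normal((n, 2 * n))  # composition matrix, R^{n x 2n}
b = rng.standard_normal(n)           # bias, R^n

def compose(c1, c2):
    """p = tanh(W [c1; c2] + b): the same network at every tree node."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

x3, x4 = rng.standard_normal(n), rng.standard_normal(n)
y1 = compose(x3, x4)                 # the triplet y1 -> x3 x4
print(y1.shape)                      # (4,): the parent lives in the same space
```

Because the parent p has the same dimensionality n as its children, it can itself serve as a child in the next triplet (e.g. y2 → x2 y1), which is what makes the recursion possible.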


Recursive Neural Networks for Structure Prediction

RNNs are used to construct the tree in a bottom-up fashion.
The input vectors (x1, ..., xn) come from a look-up table L ∈ R^{n×|V|}, where |V| is the size of the vocabulary.
Each word in the input sequence has an associated index k in the table, so the input can be written as

x_i = L b_k ∈ R^n    (2)

where b_k is a binary vector whose only non-zero entry is at index k in the look-up table.
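A minimal sketch of Eq. (2), with an assumed embedding size and vocabulary size; it also shows that the matrix-vector product is just a column read of the table.

```python
import numpy as np

# Sketch of Eq. (2): word vectors fetched from a look-up table by one-hot index.
n, V = 4, 6                           # embedding size and |V| (assumed values)
rng = np.random.default_rng(1)
L = rng.standard_normal((n, V))       # look-up table, R^{n x |V|}

k = 2                                 # index of some word in the vocabulary
b_k = np.zeros(V)
b_k[k] = 1.0                          # binary vector, non-zero only at index k
x_i = L @ b_k                         # Eq. (2): x_i = L b_k

# The product simply reads out column k of the table:
assert np.allclose(x_i, L[:, k])
```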


Recursive Neural Networks for Structure Prediction (continued)

There are four models for predicting the structure:
1. Model 1: Greedy RNN
2. Model 2: Greedy, Context-sensitive RNN
3. Model 3: Greedy, Context-sensitive RNN and Category Classifier
4. Model 4: Global, Context-sensitive RNN and Category Classifier


Model 1: Greedy RNN

The problem has two phases:
First, compute a new representation of the phrase, combining the vectors of the two children into a new vector p.
Second, score how likely it is that this is a correct phrase.
The algorithm takes the first pair of neighboring vectors, i.e., (c1, c2) = (x1, x2), and gives it as the input to the RNN. For simplicity we assume the network is single-layered. The score is the inner product of a scoring vector W^score with the parent vector, so we get

s_{1,2} = W^score p    (3)


Model 1: Greedy RNN (continued)

This score measures how well the two words combine into a phrase.
After computing the first score we shift one position to the right, take the neighboring pair (x2, x3), and compute Eq. (3) again. This process is repeated until all neighboring pairs have been scored.
Then the highest-scoring pair is replaced by its parent vector p, and all steps are repeated for the next level.
This process continues until all inputs and non-terminals have been collapsed into a single parent vector.
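The greedy loop above can be sketched as follows. The parameters are random placeholders and the helper names are assumptions; the structure (score all neighbors, collapse the best pair, repeat) follows the slide.

```python
import numpy as np

# Greedy bottom-up collapsing (Model 1), with assumed random parameters.
n = 4
rng = np.random.default_rng(2)
W = rng.standard_normal((n, 2 * n))
b = rng.standard_normal(n)
w_score = rng.standard_normal(n)      # scoring vector: s = w_score . p

def compose(c1, c2):
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

def greedy_parse(xs):
    nodes = list(xs)
    while len(nodes) > 1:
        # score every neighboring pair with Eq. (3)
        parents = [compose(nodes[i], nodes[i + 1])
                   for i in range(len(nodes) - 1)]
        scores = [w_score @ p for p in parents]
        i = int(np.argmax(scores))    # highest-scoring pair collapses first
        nodes[i:i + 2] = [parents[i]] # replace the pair by its parent p
    return nodes[0]

sentence = [rng.standard_normal(n) for _ in range(5)]
root = greedy_parse(sentence)
print(root.shape)                     # (4,): the sentence reduced to one vector
```

Each iteration removes exactly one node, so a sentence of length m needs m − 1 collapsing decisions, mirroring the m − 1 internal nodes of a binary parse tree.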


Model 2: Greedy, Context-sensitive RNN:

Parsing decisions are often context dependent, so here we embed the context features along with the input features. The parent node equation then becomes

p = tanh(W[x_{i−1}; c1; c2; x_{j+1}] + b)    (4)

where x_{i−1} and x_{j+1} are the words immediately to the left and right of the pair being combined.

Model 3: Greedy, Context-sensitive RNN and Category Classifier

One of the main advantages of the RNN-based approach is that each phrase has a distributed feature representation associated with it. We can leverage this representation by adding to each Context-sensitive RNN's (CRNN's) parent node (after removing the scoring layer) a simple softmax layer to predict class labels such as syntactic categories or named-entity classes.


Model 4: Global, Context-sensitive RNN and Category Classifier

Instead of the greedy model, we can formulate a global, regularized risk objective in a max-margin framework.
For each sentence, dynamic programming parsing algorithms can efficiently find the globally optimal tree.
Let the training data consist of (sentence, tree) pairs (xi, yi), and let A(xi) be the set of all possible trees that can be generated from a sentence. We want to maximize the objective

J = Σ_i [ s(xi, yi) − max_{y∈A(xi)} ( s(xi, y) + Δ(y, yi) ) ]    (5)


Model 4: Global, Context-sensitive RNN and Category Classifier (continued)

The structure loss Δ penalizes trees more when they deviate from the correct tree.
The total score of each tree is the sum of the scores of its collapsing decisions:

s(xi, yi) = Σ_{d∈T(yi)} s_d(c1, c2)    (6)

The structure loss can be given as

Δ(y, yi) = Σ_{d∈T(y)} λ 1{d ∉ T(yi)}    (7)
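Eqs. (6) and (7) can be sketched on toy data: here a tree is represented simply as a set of its collapsing decisions (spans), and all the scores and values are illustrative assumptions.

```python
# Sketch of Eqs. (6)-(7): trees as sets of collapsing decisions (toy values).
lam = 0.1                                  # the margin constant lambda

def tree_score(tree, span_scores):
    # Eq. (6): the total score sums over the tree's collapsing decisions
    return sum(span_scores[d] for d in tree)

def structure_loss(y, y_gold):
    # Eq. (7): pay lambda for every decision in y not in the correct tree
    return lam * sum(1 for d in y if d not in y_gold)

gold = {(0, 1), (0, 3)}                    # T(y_i): decisions of the gold tree
cand = {(0, 1), (1, 3)}                    # T(y): a competing tree
scores = {(0, 1): 1.0, (0, 3): 0.5, (1, 3): 0.8}

print(tree_score(gold, scores))            # 1.5
print(structure_loss(cand, gold))          # 0.1: one wrong decision
```

Plugging these into Eq. (5), a competing tree hurts the objective only if its score plus its structure loss exceeds the score of the correct tree, which is exactly the max-margin condition.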


Learning of RNN

Learning means adjusting the weights to get a better output. Here we use a sigmoid function as the activation function. There are two methods for learning an RNN:
1. Error Back-propagation Through Structure
2. Subgradient Methods


Error Back-propagation Through Structure

In order to maximize the objective in Eq. (5), we compute the derivative using back-propagation through structure.
The hidden-layer backprop derivatives for tree i are

∂Ji/∂W = Σ_{d∈T(yi)} δ^{(d+1)} ([c1; c2]^{(d)})^T + CW    (8)

δ^{(d)} = (W^T δ^{(d+1)})_{1:n} · f′([c1; c2]^{(d)})    (9)

where C is a weight-regularization constant, · is the element-wise product, and [·]_{1:n} takes the first n elements of a vector, giving the error for the left child.
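For a single collapsing decision, the derivative used above reduces to an outer product of a delta with the child concatenation. The following sketch (with assumed shapes, random parameters, and tanh as f) verifies that analytic form against finite differences.

```python
import numpy as np

# Numeric check of the backprop derivative for one collapsing decision:
# with p = tanh(W[c1;c2] + b) and s = w_score . p, the gradient of s w.r.t. W
# is delta [c1;c2]^T, where delta = w_score * (1 - p^2) since f = tanh.
n = 3
rng = np.random.default_rng(3)
W = rng.standard_normal((n, 2 * n))
b = rng.standard_normal(n)
w_score = rng.standard_normal(n)
c = rng.standard_normal(2 * n)            # the concatenation [c1; c2]

def score(Wm):
    return w_score @ np.tanh(Wm @ c + b)

p = np.tanh(W @ c + b)
delta = w_score * (1.0 - p ** 2)          # f'(z) = 1 - tanh(z)^2
grad = np.outer(delta, c)                 # analytic gradient, R^{n x 2n}

# central finite-difference check, entry by entry
eps = 1e-6
num = np.zeros_like(W)
for i in range(n):
    for j in range(2 * n):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        num[i, j] = (score(Wp) - score(Wm)) / (2 * eps)

print(np.max(np.abs(grad - num)))         # tiny: analytic and numeric agree
```

Summing such per-decision gradients over all nodes of the tree, and adding the regularization term CW, gives Eq. (8).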


Subgradient Methods

This method generalizes gradient ascent via the subgradient method, which computes a gradient-like direction called the subgradient. For any of our parameters, such as W, the gradient becomes

∂J/∂W = Σ_i ( ∂s(xi, yi)/∂W − ∂s(xi, ymax)/∂W )    (10)
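A single update following Eq. (10) can be sketched on scalar scores. The function name, learning rate, and values are illustrative assumptions, not from the slides.

```python
# One subgradient ascent step on Eq. (10), sketched with scalar gradients:
# push the parameters toward the correct tree and away from the best competitor.
def subgradient_step(w, grad_correct, grad_max, lr=0.01):
    # w moves so that s(x_i, y_i) rises relative to s(x_i, y_max)
    return w + lr * (grad_correct - grad_max)

w = 0.5
w_new = subgradient_step(w, grad_correct=2.0, grad_max=1.5)
print(w_new)  # 0.505
```

When the highest-scoring competitor ymax equals the correct tree yi, the two gradient terms cancel and the parameters are left unchanged, which is the desired fixed point.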


Conclusion

The Artificial Neural Network is a complex and expensive technique for tasks such as parsing; a solution is to use an RNN.
The Recursive Neural Network is an extension of the ANN: the same network is applied repeatedly at each node of the parse tree.
There are four models for predicting a tree structure using an RNN.
An RNN is usually trained by a supervised learning method. Error Back-propagation Through Structure and the subgradient method are the two approaches for learning an RNN.


References

[1] James Allen, "Natural Language Understanding," second edition, Pearson, 2012.
[2] Richard Socher, Christopher D. Manning, and Andrew Y. Ng, "Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks," Department of Computer Science, Stanford University, 2010.
[3] B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning, "Max-margin parsing," EMNLP, 2004.
[4] Don Hush, Chaouki Abdallah, and Bill Horne, "The Recursive Neural Network," Department of Electrical Engineering and Computer Engineering, University of New Mexico, 2010.



THANK YOU



Seminar slides.