#Gradient Symbolic Computation
##Paul Smolensky and Matt Goldrick
with the essential help of
##Nicholas Becker and Pyeong Whan Cho
###LSA 2015 Summer Linguistic Institute, University of Chicago
<font; color="red">
Class 3 (Monday, July 13, 2015)
GSC Implementation in Local Neural Networks
In this class we take the first step towards defining the neural network architecture that undergirds GSC.
A neural network $\mathcal{N}$ is a mathematical object defined by:
- a set of $n$ 'units';
- a set of 'connections', one from each unit to each other unit;
- the state of $\mathcal{N}$ at a given time $t$, which consists in
    - a particular activation state $\mathbf{a}$: for each unit, labeled $\beta$, an 'activation level' $a_\beta$ in some set of possible activation values $\mathcal{A}$; the activation vector $\mathbf{a}$ is the ordered list of all activation levels $(a_1, \ldots, a_n)$;
    - a particular connection-weight state $\mathbf{W}$: for each connection, from the unit labelled $\gamma$ to the unit labelled $\beta$, a 'connection weight' $W_{\beta\gamma}$; $\mathbf{W}$ is the 2-D, $n \times n$, array or matrix in which $W_{\beta\gamma}$ is the number in the cell located in row $\beta$, column $\gamma$;
- the dynamics of $\mathcal{N}$, which consists in
    - an activation dynamics, an equation that dictates how the activation state $\mathbf{a}(t)$ of the network evolves over time $t$;
    - a learning dynamics, an equation that dictates how the connection-weight state $\mathbf{W}(t)$ of the network evolves over time $t$.

We will not take up the activation dynamics of GSC networks until Class 6; we will not consider the learning dynamics of GSC networks in this course.
Typically, the set $\mathcal{A}$ of possible activation levels is the entire real number line $\mathbb{R}$ or a subsegment such as the interval $[0, 1]$ between 0 and 1, inclusive: this is a continuous neural network, which we will consider shortly. For the moment, we consider a discrete (indeed, a binary) neural network in which $\mathcal{A} = \{0, 1\}$.
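As a concrete illustration of these definitions, here is a minimal NumPy sketch of one state of a small binary network; the number of units and the weight values are arbitrary placeholders, not those of any network actually used in the course.

```python
import numpy as np

n = 6                                # number of units (placeholder)
a = np.array([1, 0, 1, 0, 1, 0])     # binary activation vector: one level per unit, each in {0, 1}

W = np.zeros((n, n))                 # weight matrix: W[beta, gamma] = weight on the connection
W[0, 2] = W[2, 0] = 2.0              #   from unit gamma to unit beta (placeholder values)
W[0, 4] = W[4, 0] = 2.0

print(a.shape, W.shape)              # (6,) (6, 6)
```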
Until we take up the activation dynamics in Class 6, our concern is with:
- How the network activation pattern at some moment, $\mathbf{a}(t)$, encodes a Gradient Symbolic Representation (GSR) $S$, and
- Given a weight matrix $\mathbf{W}$ that encodes a Harmonic Grammar $G$ in a network $\mathcal{N}$, how the Network Harmony of the activation pattern $\mathbf{a}$, $H_{\mathcal{N}}(\mathbf{a})$, is defined so that it equals the Harmonic Grammar $G$'s Harmony value of the GSR $S$ that is encoded by $\mathbf{a}$: $H_{\mathcal{N}}(\mathbf{a}) = H_G(S)$ (see the sketch below).
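The following sketch assumes the standard quadratic network Harmony used for Harmonic Grammar networks, $H_{\mathcal{N}}(\mathbf{a}) = \tfrac{1}{2}\,\mathbf{a}^{\top}\mathbf{W}\,\mathbf{a}$ (optionally plus unit-bias terms $\mathbf{b}^{\top}\mathbf{a}$); whether the course's implementation includes the factor of $\tfrac{1}{2}$ or bias terms is an assumption here, and the weights are again placeholders.

```python
import numpy as np

def network_harmony(a, W, b=None):
    """Quadratic network Harmony: 0.5 * a^T W a, plus b^T a if bias terms are used."""
    a = np.asarray(a, dtype=float)
    H = 0.5 * a @ W @ a
    if b is not None:
        H = H + np.asarray(b, dtype=float) @ a
    return H

# Placeholder network: two symmetric connections reward co-activation of units 0 & 2 and 0 & 4.
W = np.zeros((6, 6))
W[0, 2] = W[2, 0] = 2.0
W[0, 4] = W[4, 0] = 2.0
a = [1, 0, 1, 0, 1, 0]

print(network_harmony(a, W))   # 4.0
```

If the weight matrix correctly encodes a Harmonic Grammar, this number should coincide with the grammar's Harmony value for the GSR that $\mathbf{a}$ encodes.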
In this class we consider networks in which the activity of each possible filler/role binding in a Gradient Symbol Structure is encoded as the activation level of a single network unit -- "local representation".
In Class 5 we take up the general GSC architecture, which uses distributed representations, in which the activity of a filler/role binding is encoded by the strength of an activation pattern across many network units that also simultaneously host (through a superposition of activation patterns) the encoding of the activity of many different filler/role bindings.
We begin by continuing the example of Harmonic Context-Free Grammars (HCFGs), then introduce a new example of phonological alternation, Devoicing.
<font; color="blue"> ##Recall from Class 1: From a Context-Free rewrite-rule Grammar to a Context-Free Harmonic Grammar in Harmonic Normal Form
Recall: how to specify exactly the trees generated by a given HNF grammar (or the language of their terminal strings) using a Harmonic Grammar?
For the proof, see THM vol. 1 pp. 397-398. For the intuition, see the picture in Fig. 2.
<font; color="purple"> ##An example network
Consider the simple grammar introduced in Class 1 (the default grammar of the course):
Here we've adopted a further simplification of the grammar, in which we have dropped the original start symbol S; instead, both S[1] and S[2] are considered start symbols: grammatically eligible to label the tree root node.
In a GSC neural network, as in most neural networks, the spatial arrangement of network units is irrelevant to the network's functioning: all that matters is the strength of the connections between units. For ease of interpreting network states, however, we can choose to lay the units out spatially as in the following Figure 3-1:
###Figure 3-1
A Gradient Symbolic Representation $S$ is encoded by a state of this network in which the unit labelled $f/r$ has an activation level that equals the activity in $S$ of the particular filler/role binding $f/r$. The three pools of units are arranged spatially by the roles, with the role of root-position ($r_\epsilon$) at the top, and the left- ($r_0$) and right-child-of-root ($r_1$) roles at the lower left and right, respectively. In each pool there are four units, one for each of the symbols in the alphabet. In the IPython implementation, the filler/role bindings corresponding to the units shown in gray rather than black are actually absent: they are never active in any grammatical tree.
The tree [S[1] A A], i.e. `Tree('S[1] (A A)')`, is encoded by the vector (1 0, 1 0, 1 0) -- assuming the gray units are not present. What is its Harmony?
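Here is a minimal sketch of how such a local encoding can be set up in Python, assuming (as an illustration) that the non-gray units are exactly {S[1], S[2]} in the root pool and {A, B} in each child pool; the binding-to-index ordering is an assumption, not necessarily the one used in the course's IPython implementation.

```python
import numpy as np

# One network unit per (non-gray) filler/role binding.
bindings = [('S[1]', 'root'), ('S[2]', 'root'),
            ('A', 'left'),    ('B', 'left'),
            ('A', 'right'),   ('B', 'right')]
index = {binding: i for i, binding in enumerate(bindings)}

def encode(gsr):
    """Local encoding: unit i's activation = the activity of binding i in the GSR."""
    a = np.zeros(len(bindings))
    for binding, activity in gsr.items():
        a[index[binding]] = activity
    return a

# The tree [S[1] A A]: each of its three filler/role bindings has activity 1.0.
a = encode({('S[1]', 'root'): 1.0, ('A', 'left'): 1.0, ('A', 'right'): 1.0})
print(a)   # [1. 0. 1. 0. 1. 0.]
```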
¿ How should Figure 3-1 be amended in order to encode the full HNF grammar?
(Output from the IPython implementation: `SystemExit: Error: Invalid filler (S).` -- the symbol S is not a valid filler in this network.)
Homework Exercise 3-1. What vector encodes the tree?
What is its Harmony?
<font; color="green"> #Continuous Neural Networks: (Gradient) Devoicing
##Unit Harmony
As in the discrete case, the probability (now a density) assigned to a state $\mathbf{a}$ is $p(\mathbf{a}) \propto e^{H(\mathbf{a})/T}$.
But since $\mathbf{a}$ now varies over all of $\mathbb{R}^n$ ($n$ = the total number of possible filler/role bindings), it is essential that $H(\mathbf{a}) \to -\infty$ as $\|\mathbf{a}\| \to \infty$, i.e., $e^{H(\mathbf{a})/T} \to 0$.
(Actually, the full requirement is stronger: in order for it to be possible to normalize the distribution, so that it assigns probability 1 to the entire state space $\mathbb{R}^n$, we must have $\int_{\mathbb{R}^n} e^{H(\mathbf{a})/T}\, d\mathbf{a} < \infty$.)
The raw Harmony from the Harmonic Grammar, $H_G$, will typically not meet this condition, so we will add another term to the total Harmony: $H_{\text{Unit}}$, the "Unit Harmony", which will ensure that $H \to -\infty$ as $\|\mathbf{a}\| \to \infty$; in Class 6, it will also be seen to play an important role in the neural dynamics, preventing the activity of units from going to $+\infty$ or $-\infty$. $H_{\text{Unit}}$ favors the activity value halfway between the two discrete values $\{0, 1\}$: the further an activity value is from 0.5, the greater the penalty (negative Harmony) assessed by $H_{\text{Unit}}$.
(The shape of $H_{\text{Unit}}$ is that of an inverted bowl, so it is sometimes referred to as "the bowl".)
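One common way to realize such a term is a quadratic penalty centered at 0.5, sketched below; the exact functional form and the strength parameter `b` used in the course may differ, so both should be read as assumptions.

```python
import numpy as np

def unit_harmony(a, b=1.0, center=0.5):
    """'Bowl' Harmony: 0 when every activity equals `center`, increasingly
    negative the further any activity strays from it (in either direction)."""
    a = np.asarray(a, dtype=float)
    return -b * np.sum((a - center) ** 2)

for a in ([0.5], [0.9], [2.0]):
    print(a, unit_harmony(a))   # penalty grows with distance from 0.5
```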
##The Devoicing Grammar
The Devoicing Grammar consists of two simple constraints:
FAITH(voice): The value of the feature [voice] of a segment in the output must be the same as the value of the [voice] feature in the corresponding input segment.
MARK(voice): No [+voice] obstruent output segments in syllable coda position.
Let: $a$ = activity of output [+voice]; $x$ = underlying /voice/ value ($x = +1$ for underlying /+voice/, $x = -1$ for underlying /$-$voice/); $w_F$, $w_M$ = weights of Faithfulness, Markedness.
The FAITH(voice) constraint will assess Harmony $H_F = w_F\, x\, a$: if $x = +1$ (underlying [+voice]), this rewards positive activity of surface [+voice]; if $x = -1$ (underlying [$-$voice]), it penalizes positive activity of surface [+voice].
The MARK(voice) constraint could be implemented as the Harmony penalty $-w_M\, a$ for a coda obstruent, 0 otherwise. But we will instead (for reasons to be explained in Class 4) implement it as the non-linear (quadratic or 2nd-order) penalty $-w_M\, a^2$.
Define total Harmony for the devoicing example as $H = H_G + H_{\text{Unit}}$ (sketched in code below), where:
$H_G = w_F\, x\, a - w_M\, a^2$ is the Harmonic Grammar value (Harmony from the 2 constraints),
$H_{\text{Unit}}$ is the "Unit Harmony" (or "the bowl").
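Putting the pieces together, here is a sketch of the total Harmony for the devoicing example as a function of the single output activity $a$; the quadratic bowl centered at 0.5 and the particular weight values are illustrative assumptions (the course's exact bowl and weights may differ).

```python
import numpy as np

def total_harmony(a, x, w_f=1.0, w_m=1.0, b=2.0):
    """H(a) = H_G(a) + H_Unit(a) for the devoicing example.
    a   : activity of output [+voice]
    x   : underlying /voice/ value (+1 for /+voice/, -1 for /-voice/)
    w_f : FAITH(voice) weight;  w_m : MARK(voice) weight;  b : assumed bowl strength
    """
    H_G    = w_f * x * a - w_m * a**2       # Faithfulness reward, quadratic Markedness penalty
    H_unit = -b * (a - 0.5)**2              # 'bowl': penalizes distance of a from 0.5
    return H_G + H_unit

# Harmony landscapes for underlyingly voiced vs. voiceless coda obstruents.
grid = np.linspace(0.0, 1.0, 5)
print([round(total_harmony(a, x=+1), 3) for a in grid])
print([round(total_harmony(a, x=-1), 3) for a in grid])
```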
Figure 3-2 provides a picture:
###Figure 3-2
Note the definitions marked in the figure -- they are used below.
###Incomplete Neutralization (as in Dutch)
Here, the optima can be computed with simple algebra because the Harmony function is just quadratic (2nd-order).
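Under the assumptions of the sketch above (linear Faithfulness, quadratic Markedness, quadratic bowl centered at 0.5), the optimum follows from setting $dH/da = 0$; here is a SymPy check. The two underlying values of $x$ yield distinct, nonzero optima, which is the incomplete-neutralization pattern; the specific numbers depend entirely on the assumed weights.

```python
import sympy as sp

a, x, w_f, w_m, b = sp.symbols('a x w_f w_m b', real=True)
H = w_f*x*a - w_m*a**2 - b*(a - sp.Rational(1, 2))**2   # total Harmony (assumed form)

a_star = sp.solve(sp.diff(H, a), a)[0]
print(sp.simplify(a_star))               # the unique optimum: (b + w_f*x) / (2*(b + w_m))

# Example values: both underlying voicing values yield distinct, positive optima.
vals = {w_f: 1, w_m: 1, b: 2}
print(a_star.subs(vals).subs(x, +1))     # 1/2  (underlying /+voice/)
print(a_star.subs(vals).subs(x, -1))     # 1/6  (underlying /-voice/)
```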
###Finding optima using canned algorithms
When we add Quantization Harmony, which is 4th order, it will be useful to appeal to canned numerical optimization algorithms to find total Harmony optima. When these algorithms provide global optima, those results are significant: they give the competence-theoretic predicted output values.
But for temporal dynamics intended to model real-time internal psycholinguistic processing, we must wait for the GSC dynamics presented in Class 6.
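As an illustration of the 'canned algorithm' route, here is a sketch using `scipy.optimize.minimize` to find the same optima numerically by minimizing the negative of the (assumed) total Harmony; a 4th-order Quantization term could be added to the objective in exactly the same way.

```python
from scipy.optimize import minimize

def total_harmony(a, x, w_f=1.0, w_m=1.0, b=2.0):
    # Same illustrative total Harmony as above (Faithfulness + Markedness + assumed bowl).
    return w_f * x * a - w_m * a**2 - b * (a - 0.5)**2

for x in (+1, -1):
    # Maximize H by minimizing -H, starting from the bowl's center a = 0.5.
    res = minimize(lambda v: -total_harmony(v[0], x), x0=[0.5])
    print(f"underlying x = {x:+d}:  a* = {res.x[0]:.3f},  H(a*) = {-res.fun:.3f}")
```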
Homework Exercise 3-2. What happens if we increase the Markedness weight $w_M$?