#Gradient Symbolic Computation
Paul Smolensky and Matt Goldrick with the essential help of Nicholas Becker and Pyeong Whan Cho
LSA 2015 Summer Linguistic Institute, University of Chicago
Perhaps it seems undeniable, but it isn't: the claim that, in language, mental representations must have discrete structure. Recurring examples in this course:
syllables in phonology
phrase constituency in syntax
It appears that general knowledge of language must refer to structural positions or roles:
obstruents must be voiceless in coda position
elements of a chain must be constituents
Calling into question the existence of structure:
Extreme Reducto-Empiricist philosophies of science (e.g., low-level exemplar theory)
will not figure in this class
Physicalism: Knowledge is physically realized in neural properties (e.g., synaptic efficacies/interconnection weights)
will figure prominently in this course (Class 3 onward)
In Gradient Symbolic Computation: the same notional structure exists, but in continuous or gradient rather than inherently discrete form.
e.g., a segment might occupy a structural role (say, the coda of a syllable) only to a partial degree
or equivalently, the role might be only partially occupied or filled by that segment
##Filler/role decompositions
We will use two filler/role decompositions for binary trees:
"Recursive roles" (useful for production/generation of trees)
with roles $r_x$ indexed by bit strings $x$, where '0' = 'left child of'; '1' = 'right child of'
"Span roles" (good for comprehension via chart-parsing of strings of terminal symbols)
a span role = 'constituent spanning from terminal-string position $i$ to position $j$, with the internal break between its two subconstituents at position $k$'
perhaps this is why the language processing system in the human mind/brain seems to treat comprehension and production fairly separately
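As a concrete illustration of the recursive-role decomposition (our own plain-Python sketch, not course code; the nested-tuple tree encoding is an assumption made for the example), the function below lists a binary tree's filler/role bindings, naming each role r_x by the bit-string path x from the root:

```python
# A minimal sketch: binary trees as nested tuples, e.g. ('S', ('A', 'B')), where
# a leaf is a bare symbol and an internal node is (label, (left, right)).
def recursive_role_bindings(tree, path=''):
    """Return a list of (filler, role) pairs, with roles named r_x for the
    bit string x ('0' = left child of, '1' = right child of)."""
    if isinstance(tree, tuple):
        label, (left, right) = tree
        bindings = [(label, 'r_' + (path or 'ε'))]
        bindings += recursive_role_bindings(left, path + '0')
        bindings += recursive_role_bindings(right, path + '1')
        return bindings
    return [(tree, 'r_' + (path or 'ε'))]

# Example: the local tree with S at the root, A as left child, B as right child.
print(recursive_role_bindings(('S', ('A', 'B'))))
# [('S', 'r_ε'), ('A', 'r_0'), ('B', 'r_1')]
```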
To use the iP[y] Tree class, simply make a Tree object and pass in a text version of the desired tree. Formatting is crucial here:
Begin the string with the root node.
Indicate subtrees using parentheses.
Separate sibling nodes using spaces.
Nodes can contain any characters except parentheses, commas, and percent signs (%). Thus, it is possible to manually enter a tree in HNF using brackets.
Works for n-ary trees, too:
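A sketch of what this might look like, assuming the Tree class from the course notebook is in scope and its constructor takes just the text string (the exact call signature may differ):

```python
# Hypothetical usage sketch of the Tree class described above.
# Root first, parentheses for subtrees, spaces between siblings.
binary_tree = Tree("S (NP (D the) (N dog)) (VP (V barked))")

# n-ary trees follow the same textual format:
nary_tree = Tree("VP (V gave) (NP Mary) (NP (D a) (N book))")
```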
##Grammars
[The Harmonic Mind Section 10.1: "$\S$10.1" in the Master Tableau]
###Harmonic Grammars
The Grammatical Harmony $H_G(s)$ of a discrete symbol structure $s$ is gotten by summing the contributions of all the "soft rules" $\{R_i\}$ that define the Harmonic Grammar $G$.
(1) $R_i$: If $s$ simultaneously contains the constituents $a_i$ and $b_i$, then add the numerical quantity $w_i$ to the grammatical Harmony $H_G(s)$.
Equivalently, we can speak of "soft constraints":
(2) $C_i$: $s$ must not/may simultaneously contain the constituents $a_i$ and $b_i$ (strength: $|w_i|$).
where $w_i \neq 0$; if $w_i < 0$, "must not" applies in (2), otherwise "may" applies.
A Harmonic Grammar can (and will) be used in several ways:
Specify a set of trees as grammatical: those with the highest value of $H_G$ (often designed to be 0; illicit trees then have negative Grammatical Harmony)
Specify a well-formedness function over all trees: the well-formedness of $s$ is simply $H_G(s)$.
Specify a probabilistic language: the probability of any tree $s$ is proportional to $e^{H_G(s)/T}$, where $T$ is the randomness parameter temperature.
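To make these uses concrete, here is a small self-contained sketch (our own illustration in plain Python, not the course's iP[y] code; the rule weights and constituent labels are invented for the example). It evaluates a 2nd-order Harmonic Grammar as a weighted sum over co-present constituent pairs and converts Harmonies to probabilities via $e^{H/T}$:

```python
import math

# Soft rules a la (1): ((constituent_a, constituent_b), weight), where a
# constituent is a filler/role binding written here simply as 'filler/role'.
soft_rules = {
    ('S/r_ε', 'A/r_0'): +2.0,   # reward S at the root co-present with A as left child
    ('S/r_ε', 'B/r_1'): +2.0,   # reward S at the root co-present with B as right child
    ('A/r_0', 'A/r_1'): -3.0,   # penalize two A's as siblings
}

def harmony(structure, rules):
    """Grammatical Harmony: sum the weights of all soft rules whose pair of
    constituents is simultaneously present in the structure."""
    present = set(structure)
    return sum(w for (a, b), w in rules.items() if a in present and b in present)

def boltzmann_probs(structures, rules, T=1.0):
    """Probability of each candidate, proportional to exp(H/T)."""
    expH = [math.exp(harmony(s, rules) / T) for s in structures]
    Z = sum(expH)
    return [e / Z for e in expH]

candidates = [
    ['S/r_ε', 'A/r_0', 'B/r_1'],   # a licit local tree
    ['S/r_ε', 'A/r_0', 'A/r_1'],   # an illicit local tree
]
print([harmony(c, soft_rules) for c in candidates])        # [4.0, -1.0]
print(boltzmann_probs(candidates, soft_rules, T=1.0))      # higher H -> higher p
```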
###Harmonic Normal Form (HNF)
As illustrated in (2), Harmonic Grammar -- in its strictest sense -- allows constraints only up to 2nd order, i.e., rewarding/penalizing the co-presence of at most 2 filler/role bindings.
We will see in Class 6 that this originates in the fact that (standard) neural network interactions occur between pairs of units/nodes in the network.
Even for trees with as low a branching factor as 2, though, 2nd-order constraints do not suffice; consider, for example, a grammar containing two branching rules with the same left-hand side, say X → A B and X → C D, and the ill-formed local tree [X A D] that mixes children from the two rules: every parent-child pair it contains is rewarded by one of the licit rules, so pairwise rule-based rewards alone cannot single it out for penalty.
Solution: Impose
(2.1) The Unique Branching Condition (a.k.a. Invertibility): There can be at most one branching rule with a given left-hand side.
This is respected by grammars in Harmonic Normal Form (HNF); all rules must be of the form shown in Fig. 1 (HNF enforces a distinction between bracketed and unbracketed non-terminal symbols).
The filler/role bindings of HNF trees can also be retrieved from an object of Class :
Translation from a CNF to an HNF covering grammar is straightforward.
You can also enter a listing of rules from a file (containing every rule on a separate line).
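Returning to the CNF-to-HNF translation just mentioned, here is a minimal sketch of how such a covering translation can work (our own illustration under assumptions about the rule representation, not the actual class method): one bracketed symbol is introduced per branching rule, so that the Unique Branching Condition holds.

```python
# CNF rules as (LHS, RHS) pairs; RHS is a pair of nonterminals or a single terminal.
def cnf_to_hnf(cnf_rules):
    """For each nonterminal A with branching rules A -> B C, introduce bracketed
    symbols A[1], A[2], ... so that every bracketed symbol has exactly one
    branching rule (the Unique Branching Condition)."""
    hnf_rules = []
    branch_count = {}
    for lhs, rhs in cnf_rules:
        if len(rhs) == 2:                        # branching rule A -> B C
            branch_count[lhs] = branch_count.get(lhs, 0) + 1
            bracketed = f"{lhs}[{branch_count[lhs]}]"
            hnf_rules.append((lhs, (bracketed,)))    # A -> A[i]
            hnf_rules.append((bracketed, rhs))       # A[i] -> B C
        else:                                    # unary/terminal rule unchanged
            hnf_rules.append((lhs, rhs))
    return hnf_rules

cnf = [('S', ('A', 'B')), ('S', ('B', 'A')), ('A', ('a',)), ('B', ('b',))]
for rule in cnf_to_hnf(cnf):
    print(rule)
# S -> S[1], S[1] -> A B, S -> S[2], S[2] -> B A, A -> a, B -> b
```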
###From an HNF Grammar to a Harmonic Context-Free Grammar (HCFG)
Now: how to specify exactly the trees generated by a given HNF grammar (or the language of their terminal strings) using a Harmonic Grammar?
For the proof, see THM vol. 1 pp. 397-398. For the intuition, see the picture in Fig. 2.
The iP[y] Class includes the Harmonic Grammar weighted constraints as well as the rewrite rules in CNF and HNF.
A grammar can be entered into the class in either conventional CFG form (CNF) or in Harmonic Normal Form (HNF); rules entered in CNF will be converted to HNF, and vice versa. However, if rules are to be entered in HNF, a flag must be set to indicate this (regardless of whether the input is supplied as a string of grammar rules or as a file name).
Throughout the rest of the course, the default grammar we will study is the following one, although to simplify even further and minimize irrelevant distractions, we will often omit 'S -> S[1] | S[2]' and instead treat both S[1] and S[2] as legal start symbols (legal at the tree root).
The class can handle recursive rules, but for the sake of creating the Harmonic Grammar constraints, a maximum depth must be set; the default is 6.
If you wish to change the maximum depth while using recursive rules, you can simply set the parameter when calling the method.
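Putting the remarks above together, usage might look roughly like the following. This is purely a hypothetical sketch: the class name `Grammar`, the keyword names `hnf` and `maxDepth`, and the rule-string format are placeholders for the identifiers elided above; consult the course notebook for the real API.

```python
# Hypothetical usage sketch only: 'Grammar', 'hnf', and 'maxDepth' are placeholder
# names for the class, the HNF-input flag, and the maximum-depth parameter
# discussed above; the actual identifiers in the iP[y] notebook may differ.
cnf_rules = """
S -> A B
A -> a
B -> b
"""
g = Grammar(cnf_rules)                  # rules entered as a string, in CNF
# g = Grammar('rules.txt')              # ...or as a file name, one rule per line
# g = Grammar(hnf_rules, hnf=True)      # flag required if rules are already in HNF
# g = Grammar(cnf_rules, maxDepth=8)    # override the default depth bound of 6
```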
###Probabilistic Harmonic Grammars (PHGs) and PCFGs
In the probabilistic version of HG, higher Harmony entails higher probability, according to:
$$p(s) \propto e^{H(s)/T}$$
where $T$, the randomness parameter or temperature, will be set to 1 for now. (More on $T$ in Class 6.)
This distribution can be derived in at least two ways:
Given that Harmonies combine additively and probabilities combine multiplicatively, the only continuous functions that can map Harmony to probability are exponential functions; the choice of $T$ amounts to the choice of which exponential function is deployed.
Although we omit the demonstration here, this distribution is a Maximum Entropy distribution, hence a consequence of the Maximum Entropy induction principle, which essentially states that a learner should extrapolate from observed data to the probability distribution that adds the least information (i.e., has maximal entropy) among those consistent with that data. 'Maxent' models in computational linguistics are probabilistic Harmonic Grammars (construed generally, to include constraints of order higher than 2nd); there, the term 'features' refers to what we call 'constraints' here. Hayes & Wilson 2006 is a seminal work applying Maxent modeling to phonology.
This means that
$$p(s) = \frac{e^{H(s)/T}}{Z}, \qquad Z = \sum_{s'} e^{H(s')/T},$$
where $Z$ is the normalizing constant ensuring that the probabilities, summed over all discrete trees $s'$, equal 1.
Computing $Z$ is a perennial challenge. But note that in probability ratios $Z$ cancels, so it need not be calculated; we have simply:
$$\frac{p(s_1)}{p(s_2)} = e^{[H(s_1) - H(s_2)]/T}$$
or equivalently, in terms of log-probabilities (log to base $e$, i.e., the natural logarithm $\ln$):
$$\ln p(s_1) - \ln p(s_2) = \frac{H(s_1) - H(s_2)}{T}$$
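For instance (our own numbers): if $H(s_1) = -1$ and $H(s_2) = -3$ with $T = 1$, then $p(s_1)/p(s_2) = e^{2} \approx 7.39$, whatever the value of $Z$. A quick check in Python:

```python
import math

def prob_ratio(H1, H2, T=1.0):
    """p(s1)/p(s2) computed directly from the Harmony difference; Z cancels."""
    return math.exp((H1 - H2) / T)

print(prob_ratio(-1.0, -3.0))             # 7.389...
print(math.log(prob_ratio(-1.0, -3.0)))   # 2.0 = [H(s1) - H(s2)] / T
```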
The class can also be used for PCFGs by setting the corresponding flag.
Within the class, differential probability is implemented by adjusting the 1st-order Harmonic Grammar constraint weights differentially for S[1] and S[2]; to achieve the specified ratio $p(\mathrm{S[1]})/p(\mathrm{S[2]})$, these 1st-order constraint weights are adjusted so that their difference equals
$$T \ln \frac{p(\mathrm{S[1]})}{p(\mathrm{S[2]})}$$
(recall that $T = 1$ for now). This is done by increasing the weight of the 1st-order constraint for the higher-probability symbol by this amount.
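For example (our own numbers, not from the course materials): to make S[1] three times as probable as S[2] with $T = 1$, the 1st-order weight for S[1] is raised above that for S[2] by $\ln 3 \approx 1.10$, since then $p(\mathrm{S[1]})/p(\mathrm{S[2]}) = e^{1.10/1} \approx 3$.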
To check this, we can get the Harmony of the two possible grammatical trees in this grammar using the method. Note that trees must be entered in HNF.
It is also possible to get the Harmony of ungrammatical structures (so long as all fillers and roles in the structure are valid).
The probability of a structure can be found with the computeProb() method.
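A usage sketch, hypothetical except for `computeProb()` itself, which is the only name given in the text (we assume a grammar object `g` built as above and a Tree argument written in HNF; the real call pattern may differ):

```python
# Hypothetical: 'g' and the Tree constructor call are placeholders; only
# computeProb() is named in the text. Trees must be written in HNF.
t1 = Tree("S[1] (A a) (B b)")
print(g.computeProb(t1))   # probability of this structure under the grammar
```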
(Note: We are considering only purely discrete output candidates until Class 3)
An OT account with 2 constraints Faith(voi), *Voi/Cod
A corresponding HG account
Probability of error
harmonemes; harmons