Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
Download
436 views
Kernel: Python 3

#Gradient Symbolic Computation

Paul Smolensky and Matt Goldrick with the essential help of Nicholas Becker and Pyeong Whan Cho

LSA 2015 Summer Linguistic Institute, University of Chicago

#CLASS 1 (Monday, July 6, 2015)
#Structure

Perhaps it seems undeniable, but it isn't: in language, mental representations must have discrete structure. Recurring examples in this course:

  • syllables in phonology

  • phrase constituency in syntax

It appears that general knowledge of language must refer to structural positions or roles:

  • obstruents must be voiceless in coda position

  • elements of a chain must be constituents

Calling into question the existence of structure:

  • Extreme Reducto-Empiricist philosophies of science (e.g., low-level exemplar theory)

  • will not figure in this class

  • Physicalism: Knowledge is physically realized in neural properties (e.g., syntaptic efficacies/interconnection weights)

  • will figure prominently in this course (Class 3 onward)

In Gradient Symbolic Computation: the same notional structure exists, but in continuous or gradient rather than inherently discrete form.

  • e.g., a segment [d][\rm d] might occupy the role 0.7rcoda0.7 \cdot r_{\rm coda}

  • or equivalently, the role rcodar_{\rm coda} might be occupied or filled by 0.7[d]0.7 [\rm d]

#Trees and Context-Free Languages

##Filler/role decompositions

We will use two filler/role decompositions for binary trees:

  • "Recursive roles" (useful for production/generation^\dagger of trees)

rxr_x with x{0,1}x \in \{0, 1\}^\ast where '0' = 'left child of'; '1' = 'right child of'

  • "Span roles" (good for comprehension^\dagger via chart-parsing of strings of terminal symbols)

R[i,j,k]R[i,j,k] = 'constituent spanning from terminal-string position ii to position kk with internal break between subconstituents at position jj'

^\dagger perhaps this is why the language processing system in the human mind/brain seems to treat comprehension and production fairly separately

Draw Figures\fbox{Draw Figures}

To use the iP[y] Tree class, simply make a Tree object and pass in a text version of the desired tree. Formatting is crucial here:

  • Begin the string with the root node.

  • Indicate subtrees using parentheses.

  • Separate sibling nodes using spaces.

  • Nodes can contain any characters except parentheses, commas, and percent signs (%). Thus, it is possible to manually enter a tree in HNF form using brackets.

%run ../../code/tree sentence1 = Tree('S1 (NP (the dog) VP (chased NP (the cat)))') print("Recursive roles r:\n" + sentence1.recursiveFRtoString()) print("Span roles R:\n" + sentence1.spanFRtoString())
Recursive roles r: {S1/r; NP/0r; VP/1r; the/00r; dog/10r; chased/01r; NP/11r; the/011r; cat/111r} Span roles R: {S1/025; NP/012; the/01; dog/12; VP/235; chased/23; NP/345; the/34; cat/45}
# print("Recursive roles r:\n" + sentence1.recursiveFRtoString()) # print("Range roles R:\n" + sentence1.spanFRtoString()) def printAllRoles(sentence): print("Recursive roles r:\n" + sentence.recursiveFRtoString()) print("Span roles R:\n" + sentence.spanFRtoString()) printAllRoles(sentence1)
Recursive roles r: {S1/r; NP/0r; VP/1r; the/00r; dog/10r; chased/01r; NP/11r; the/011r; cat/111r} Span roles R: {S1/025; NP/012; the/01; dog/12; VP/235; chased/23; NP/345; the/34; cat/45}

Works for n-ary trees, too:

sentence3 = Tree('S1 (NP (the dog) VP (chased NP (the cat) PP (P (up) NP (the tree))))') printAllRoles(sentence3)
Recursive roles r: {S1/r; NP/0r; VP/1r; the/00r; dog/10r; chased/01r; NP/11r; PP/21r; the/011r; cat/111r; P/021r; NP/121r; up/0021r; the/0121r; tree/1121r} Span roles R: {S1/028; NP/012; the/01; dog/12; VP/2358; chased/23; NP/345; the/34; cat/45; PP/568; P/56; up/56; NP/678; the/67; tree/78}

##Grammars

[The Harmonic Mind Section 10.1: "$\S$10.1" in the Master Tableau]

###Harmonic Grammars

The Grammatical Harmony HG(s)H_{\cal{G}}(\tt{s}) of a discrete symbol structure s\tt{s} is gotten by summing the contributions of all the "soft rules" {RijR_{ij}} that define the Harmonic Grammar G\cal{G}.

(1) RijR_{ij}: If s\tt{s} simultaneously contains the constituents cic_i and cjc_j, then add the numerical quantity H(ci,cj)H(c_i, c_j) to the grammatical Harmony HG(s)H_{\cal{G}}(\tt{s}).

Equivalently, we can speak of "soft constraints":

(2) CijC_{ij}: s\tt{s} must not/may simultaneously contain the constituents cic_i and cjc_j (strength: wijw_{ij}).

where wijH(ci,cj)w_{ij} \equiv | H(c_i, c_j)| , and if H(ci,cj)<0H(c_i, c_j) < 0, "must not" applies in (2), otherwise "may" applies.

A Harmonic Grammar can (and will) be used in several ways:

  1. Specify a set of trees as grammatical: those with the highest value of HG(s)H_{\cal{G}}(\tt{s}) (often designed to be 0; illicit trees have negative Grammatical Harmony)

  2. Specify a well-formedness function over all trees: the well-formedness of s\tt{s} is simply HG(s)H_{\cal{G}}(\tt{s}).

  3. Specify a probabilistic language: the probability of any tree s\tt{s} is proportional to eHG(s)/Te^{H_{\cal{G}}(\tt{s})/T}, where TT is the randomness parameter temperature.

###Harmonic Normal Form (HNF)

As illustrated in (2), Harmonic Grammar -- in its strictest sense -- allows constraints only up to 2nd order, i.e., rewarding/penalizing the co-presence of at most 2 filler/role bindings.\dagger

\dagger We will see in Class 6 that this originates in the fact that (standard) neural network interactions occur between pairs of units/nodes in the network.

Even for trees with as low a branching factor as 2, though, 2nd order constraints do not suffice; consider:

G1{AB C;AD E;FB E}{\cal G}_{1} ≡ \{\tt{A → B \ C; A → D \ E; F → B \ E}\}

and the ill-formed local tree [A B E]\tt{[_{A} \ B \ E]}.

Solution: Impose

(2.1) The Unique Branching Condition (a.k.a. Invertibility): There can be at most one branching rule with a given left-hand side.

This is respected by grammars in Harmonic Normal Form (HNF); all rules must of the form shown in Fig. 1 (HNF has an enforced distinction between bracketed and unbracketed non-terminal symbols).

The filler/role bindings of HNF trees can also be retrieved from an object of Class Tree\tt Tree:

sentence2 = Tree('S (S[1] (A A))') # HNF tree printAllRoles(sentence2)
Recursive roles r: {S/r; S[1]/0r; A/00r; A/10r} Span roles R: {S/02; S[1]/012; A/01; A/12}

Translation from a CNF to an HNF covering grammar is straightforward.

%run ../../code/grammar gram1 = Grammar('S -> NP VP; NP -> the dog|the cat; VP -> chased NP') gram2 = Grammar('A -> B C; A -> D E; F -> B E') def printAllRewriteRules(gram): print('CNF:\n' + gram.cnfRulesToString() + '\n') print('HNF:\n' + gram.hnfRulesToString() + '\n') printAllRewriteRules(gram2)
CNF: {A -> B C; A -> D E; F -> B E} HNF: {A -> A[1]; A -> A[2]; A[1] -> B C; A[2] -> D E; F -> F[1]; F[1] -> B E}

You can also enter a listing of rules from a file (containing every rule on a separate line).

gram2 = Grammar('test-grammar.txt') printAllRewriteRules(gram2)
CNF: {S -> A B; S -> C B; B -> D D} HNF: {S -> S[1]; S -> S[2]; S[1] -> A B; S[2] -> C B; B -> B[1]; B[1] -> D D}
**Homework Exercise 1-1**: Apply the CNF \rightarrow HNF transformation to G1{\cal G}_{1} above and explain how it solves the problem.

###From an HNF Grammar to a Harmonic Context-Free Grammar (HCFG)

Now: how to specify exactly the trees generated by a given HNF grammar (or the language L\cal{L} of their terminal strings) using a Harmonic Grammar?

For the proof, see THM vol. 1 pp. 397-398. For the intuition, see the picture in Fig. 2.

The iP[y] Class Grammar\tt Grammar includes the Harmonic Grammar weighted constraints as well as the rewrite rules in CNF and HNF.

# gram1 = Grammar('S -> NP VP; NP -> the dog|the cat; VP -> chased NP') def printAllRules(gram): printAllRewriteRules(gram) print('HG rules:\n' + gram.hgRulesToString() + '\n') printAllRules(gram1)
CNF: {S -> NP VP; NP -> the dog; NP -> the cat; VP -> chased NP} HNF: {S -> S[1]; S[1] -> NP VP; NP -> NP[1]; NP -> NP[2]; NP[1] -> the dog; NP[2] -> the cat; VP -> VP[1]; VP[1] -> chased NP} HG rules: {[(S/r, S[1]/0r), 2]; [(S[1]/0r, NP/00r), 2]; [(S[1]/0r, VP/10r), 2]; [(NP/00r, NP[1]/000r), 2]; [(NP/00r, NP[2]/000r), 2]; [(VP/10r, VP[1]/010r), 2]; [(NP[1]/000r, the/0000r), 2]; [(NP[1]/000r, dog/1000r), 2]; [(NP[2]/000r, the/0000r), 2]; [(NP[2]/000r, cat/1000r), 2]; [(VP[1]/010r, chased/0010r), 2]; [(VP[1]/010r, NP/1010r), 2]; [(NP/1010r, NP[1]/01010r), 2]; [(NP/1010r, NP[2]/01010r), 2]; [r, -1]; [0r, -3]; [00r, -2]; [10r, -2]; [000r, -3]; [010r, -3]; [0000r, -1]; [1000r, -1]; [0010r, -1]; [1010r, -2]; [01010r, -1]}

It is possible to enter a grammar into Grammar()\tt Grammar() in either conventional CFG form (CNF) OR in Harmonic Normal Form (HNF). Grammar()\tt Grammar() will convert rules entered in CNF to HNF, and vice versa. However, if rules are going to be entered in HNF, a flag must be set to indicate this (regardless of whether input is supplied as a string of grammar rules or as a file name).

Throughout the rest of the course, the default grammar we will study is the one following, G0{\cal G}_{0} -- although to simplify even further, to minimize irrelevant distractions, we will often omit 'S -> S[1] | S[2]' and instead treat both S[1] and S[2] as legal start symbols (legal at the tree root).

gram0 = Grammar('S -> S[1] | S[2]; S[1] -> A A; S[2] -> B B', isHnf=True) printAllRules(gram0)
CNF: {S -> A A; S -> B B} HNF: {S -> S[1]; S -> S[2]; S[1] -> A A; S[2] -> B B} HG rules: {[(S/r, S[1]/0r), 2]; [(S/r, S[2]/0r), 2]; [(S[1]/0r, A/00r), 2]; [(S[1]/0r, A/10r), 2]; [(S[2]/0r, B/00r), 2]; [(S[2]/0r, B/10r), 2]; [r, -1]; [0r, -3]; [00r, -1]; [10r, -1]}

Grammar(){\tt Grammar()} can handle recursive rules, but for the sake of creating the Harmonic Grammar constraints, a maximum depth must be set; the default is 6.

gram4 = Grammar('S -> S') printAllRules(gram4)
CNF: {S -> S} HNF: {S -> S[1]; S[1] -> S} HG rules: {[(S/r, S[1]/0r), 2]; [(S[1]/0r, S/00r), 2]; [(S/00r, S[1]/000r), 2]; [(S[1]/000r, S/0000r), 2]; [(S/0000r, S[1]/00000r), 2]; [r, -1]; [0r, -2]; [00r, -2]; [000r, -2]; [0000r, -2]; [00000r, -1]}

If you wish to change the maximum depth while using recursive rules, you can simply set the parameter maxDepth\tt maxDepth when calling the setHarmonicGrammarRules()\tt setHarmonicGrammarRules() method.

gram4.setHarmonicGrammarRules(maxDepth=10) print(gram4.hgRulesToString() + '\n')
{[(S/r, S[1]/0r), 2]; [(S[1]/0r, S/00r), 2]; [(S/00r, S[1]/000r), 2]; [(S[1]/000r, S/0000r), 2]; [(S/0000r, S[1]/00000r), 2]; [(S[1]/00000r, S/000000r), 2]; [(S/000000r, S[1]/0000000r), 2]; [(S[1]/0000000r, S/00000000r), 2]; [(S/00000000r, S[1]/000000000r), 2]; [r, -1]; [0r, -2]; [00r, -2]; [000r, -2]; [0000r, -2]; [00000r, -2]; [000000r, -2]; [0000000r, -2]; [00000000r, -2]; [000000000r, -1]}

###Probabilistic Harmonic Grammars (PHGs) and PCFGs

In the probabilistic version of HG, higher Harmony entails higher probability, according to:^\dagger

p(s)eHG(s)/Tp({\tt s}) \propto e^{H_{\cal G}({\tt s})/T}

where TT, the randomness parameter or temperature, will be set to 1 for now. (More on TT in Class 6.)

^\dagger This distribution can be derived in at least two ways:

  1. Given that Harmonies combine additively and probabilities combine multiplicatively, the only continuous functions that can map Harmony to probability are exponential functions; choice of TT amounts to choice of which exponential function is deployed.

  2. Alhough we omit the demonstration here, this distribution is a Maximum Entropy distribution, hence a consequence of the Maximum Entropy induction principle, which essentially states that a learner should extrapolate from observed data to the probability distribution that has the least information among those consistent with that data. 'Maxent' models in computational linguistics are probabilistic Harmonic Grammars (construed generally, to include constraints that may be higher than 2nd order); there, the term 'features' refers to what we call 'constraints' here. Hayes & Wilson 2006 is a seminal work applying Maxent modeling to phonology.

This means that

p(s)=Z1eHG(s)/Tp({\tt s}) = Z^{-1} e^{H_{\cal G}({\tt s})/T}

where ZZ is the normalizing constant ensuring that the probabilities, summed over all discrete trees s\tt{s}, equals 1.

Computing ZZ is a perennial challenge. But note that in probability ratios, ZZ cancels so need not be calculated; we have simply:

p(s1)p(s2)=e[HG(s1)HG(s2)]/T\frac{p({\tt s}_{1})}{p({\tt s}_{2})} = e^{[H_{\cal G}({\tt s}_1) - H_{\cal G}({\tt s}_2)]/T}

or equivalently, in terms of log-probabilities (log to base ee, i.e., the natural logarithm ln\ln):

log(p(s1)p(s2))=[HG(s1)HG(s2)]/T\log \left( \frac{p({\tt s}_{1})}{p({\tt s}_{2})} \right) = [H_{\cal G}({\tt s}_1) - H_{\cal G}({\tt s}_2)]/T

The Grammar\tt Grammar class can also be used for PCFGs by setting the isPcfg\tt isPcfg flag to True\tt True.

gram5 = Grammar('S -> 0.6 A A; S -> 0.4 B B', isPcfg=True) print(gram5.hnfRulesToString() + '\n')
{S -> 0.6 S[1]; S -> 0.4 S[2]; S[1] -> 1.0 A A; S[2] -> 1.0 B B}

Within Grammar\tt Grammar, differential probability is implemented by adjusting the 1st-order Harmonic Grammar constraint wieghts differentially for S[1]\tt S[1] and S[2]\tt S[2]; to achieve the specified ratio

p(s1)p(s2)\frac{p({\tt s}_{1})}{p({\tt s}_{2})}

these 1st-order constraint weights are adjusted so that their difference equals

ΔHHG(s1)HG(s2)\Delta H \equiv H_{\cal G}({\tt s}_1) - H_{\cal G}({\tt s}_2)

(recall that T=1T = 1 for now). This is done by increasing by ΔH\Delta H the weight of the 1st-order constraint for higher-probability symbol, S[1]\tt S[1].

To check this, we can get the Harmony of the two possible grammatical trees in this grammar using the getHarmony()\tt getHarmony() method. Note that trees must be entered in HNF.

H1 = gram5.getHarmony('S (S[1] (A A))') H1
/projects/e0c271a6-81f7-42a7-9681-fa02642c85a5/code/grammar.py:434: RuntimeWarning: divide by zero encountered in log harmonyDiff = np.log(rhs[0]) - np.log(1 - rhs[0])
0.40546510810816372
H2 = gram5.getHarmony('S (S[2] (B B))') H2
0.0

It is also possible to get the Harmony of ungrammatical structures (so long as all fillers and roles in the structure are valid).

Hu = gram5.getHarmony('S (S[1] (A B))') Hu
-1.5945348918918363

The probability of a structure can be found with the computeProb() method.

p1 = gram5.computeProb('S (S[1] (A A))') p1
0.10771600362000196
p2 = gram5.computeProb('S (S[2] (B B))') p2
0.071810669080001346
pu = gram5.computeProb('S (S[1] (A B))') pu
0.014577775859028964
print('log probability ratio S[1] to S[2] = ') np.log(p1/pu)
log probability ratio S[1] to S[2] =
2.0
print('Harmony difference S[1] minus S[2] = ') H1 - Hu
Harmony difference S[1] minus S[2] =
2.0
**Homework Exercise 1-2**: Verify the corresponding *probability ratio ~ Harmony difference* relation for (p1,pu)\tt (p1, pu) ~ (H1,Hu)\tt (H1, Hu)
#Devoicing

(Note: We are considering only purely discrete output candidates until Class 3)

An OT account with 2 constraints Faith(voi), *Voi/Cod

A corresponding HG account

Probability of error harmonemes; harrmons