CoCalc -- class-1.ipynb

Download

GSC Shared Software

/ class-docs / class-1_6-july / class-1.ipynb

⁴³⁶ views

Kernel: Python 3

#Gradient Symbolic Computation

Paul Smolensky and Matt Goldrick with the essential help of Nicholas Becker and Pyeong Whan Cho

LSA 2015 Summer Linguistic Institute, University of Chicago

#CLASS 1 (Monday, July 6, 2015)

#Structure

Perhaps it seems undeniable, but it isn't: in language, mental representations must have discrete structure. Recurring examples in this course:

syllables in phonology
phrase constituency in syntax

It appears that general knowledge of language must refer to structural positions or roles:

obstruents must be voiceless in coda position
elements of a chain must be constituents

Calling into question the existence of structure:

Extreme Reducto-Empiricist philosophies of science (e.g., low-level exemplar theory)
will not figure in this class
Physicalism: Knowledge is physically realized in neural properties (e.g., syntaptic efficacies/interconnection weights)
will figure prominently in this course (Class 3 onward)

In Gradient Symbolic Computation: the same notional structure exists, but in continuous or gradient rather than inherently discrete form.

e.g., a segment $[\rm d]$ might occupy the role $0.7 \cdot r_{\rm coda}$
or equivalently, the role $r_{\rm coda}$ might be occupied or filled by $0.7 [\rm d]$

#Trees and Context-Free Languages

##Filler/role decompositions

We will use two filler/role decompositions for binary trees:

"Recursive roles" (useful for production/generation $^\dagger$ of trees)

$r_x$ with $x \in \{0, 1\}^\ast$ where '0' = 'left child of'; '1' = 'right child of'

"Span roles" (good for comprehension $^\dagger$ via chart-parsing of strings of terminal symbols)

$R[i,j,k]$ = 'constituent spanning from terminal-string position $i$ to position $k$ with internal break between subconstituents at position $j$ '

$^\dagger$ perhaps this is why the language processing system in the human mind/brain seems to treat comprehension and production fairly separately

$\fbox{Draw Figures}$

To use the iP[y] Tree class, simply make a Tree object and pass in a text version of the desired tree. Formatting is crucial here:

Begin the string with the root node.
Indicate subtrees using parentheses.
Separate sibling nodes using spaces.
Nodes can contain any characters except parentheses, commas, and percent signs (%). Thus, it is possible to manually enter a tree in HNF form using brackets.

In [4]:

%run ../../code/tree
sentence1 = Tree('S1 (NP (the dog) VP (chased NP (the cat)))')
print("Recursive roles r:\n" + sentence1.recursiveFRtoString())  
print("Span roles R:\n" + sentence1.spanFRtoString())

Out[4]:

Recursive roles r:
{S1/r; 
 NP/0r; 
 VP/1r; 
 the/00r; 
 dog/10r; 
 chased/01r; 
 NP/11r; 
 the/011r; 
 cat/111r}

Span roles R:
{S1/025; 
 NP/012; 
 the/01; 
 dog/12; 
 VP/235; 
 chased/23; 
 NP/345; 
 the/34; 
 cat/45}

In [5]:

# print("Recursive roles r:\n" + sentence1.recursiveFRtoString())  
# print("Range roles R:\n" + sentence1.spanFRtoString())
def printAllRoles(sentence):
    print("Recursive roles r:\n" + sentence.recursiveFRtoString())  
    print("Span roles R:\n" + sentence.spanFRtoString())
printAllRoles(sentence1)

Out[5]:

Recursive roles r:
{S1/r; 
 NP/0r; 
 VP/1r; 
 the/00r; 
 dog/10r; 
 chased/01r; 
 NP/11r; 
 the/011r; 
 cat/111r}

Span roles R:
{S1/025; 
 NP/012; 
 the/01; 
 dog/12; 
 VP/235; 
 chased/23; 
 NP/345; 
 the/34; 
 cat/45}

Works for n-ary trees, too:

In [6]:

sentence3 = Tree('S1 (NP (the dog) VP (chased NP (the cat) PP (P (up) NP (the tree))))')
printAllRoles(sentence3)

Out[6]:

Recursive roles r:
{S1/r; 
 NP/0r; 
 VP/1r; 
 the/00r; 
 dog/10r; 
 chased/01r; 
 NP/11r; 
 PP/21r; 
 the/011r; 
 cat/111r; 
 P/021r; 
 NP/121r; 
 up/0021r; 
 the/0121r; 
 tree/1121r}

Span roles R:
{S1/028; 
 NP/012; 
 the/01; 
 dog/12; 
 VP/2358; 
 chased/23; 
 NP/345; 
 the/34; 
 cat/45; 
 PP/568; 
 P/56; 
 up/56; 
 NP/678; 
 the/67; 
 tree/78}

##Grammars

[The Harmonic Mind Section 10.1: "$\S$10.1" in the Master Tableau]

###Harmonic Grammars

The Grammatical Harmony $H_{\cal{G}}(\tt{s})$ of a discrete symbol structure $\tt{s}$ is gotten by summing the contributions of all the "soft rules" { $R_{ij}$ } that define the Harmonic Grammar $\cal{G}$ .

(1) $R_{ij}$ : If $\tt{s}$ simultaneously contains the constituents $c_i$ and $c_j$ , then add the numerical quantity $H(c_i, c_j)$ to the grammatical Harmony $H_{\cal{G}}(\tt{s})$ .

Equivalently, we can speak of "soft constraints":

(2) $C_{ij}$ : $\tt{s}$ must not/may simultaneously contain the constituents $c_i$ and $c_j$ (strength: $w_{ij}$ ).

where $w_{ij} \equiv | H(c_i, c_j)|$ , and if $H(c_i, c_j) < 0$ , "must not" applies in (2), otherwise "may" applies.

A Harmonic Grammar can (and will) be used in several ways:

Specify a set of trees as grammatical: those with the highest value of $H_{\cal{G}}(\tt{s})$ (often designed to be 0; illicit trees have negative Grammatical Harmony)
Specify a well-formedness function over all trees: the well-formedness of $\tt{s}$ is simply $H_{\cal{G}}(\tt{s})$ .
Specify a probabilistic language: the probability of any tree $\tt{s}$ is proportional to $e^{H_{\cal{G}}(\tt{s})/T}$ , where $T$ is the randomness parameter temperature.

###Harmonic Normal Form (HNF)

As illustrated in (2), Harmonic Grammar -- in its strictest sense -- allows constraints only up to 2nd order, i.e., rewarding/penalizing the co-presence of at most 2 filler/role bindings. $\dagger$

$\dagger$ We will see in Class 6 that this originates in the fact that (standard) neural network interactions occur between pairs of units/nodes in the network.

Even for trees with as low a branching factor as 2, though, 2nd order constraints do not suffice; consider:

${\cal G}_{1} ≡ \{\tt{A → B \ C; A → D \ E; F → B \ E}\}$

and the ill-formed local tree $\tt{[_{A} \ B \ E]}$ .

Solution: Impose

(2.1) The Unique Branching Condition (a.k.a. Invertibility): There can be at most one branching rule with a given left-hand side.

This is respected by grammars in Harmonic Normal Form (HNF); all rules must of the form shown in Fig. 1 (HNF has an enforced distinction between bracketed and unbracketed non-terminal symbols).

The filler/role bindings of HNF trees can also be retrieved from an object of Class $\tt Tree$ :

In [7]:

sentence2 = Tree('S (S[1] (A A))') # HNF tree 
printAllRoles(sentence2)

Out[7]:

Recursive roles r:
{S/r; 
 S[1]/0r; 
 A/00r; 
 A/10r}

Span roles R:
{S/02; 
 S[1]/012; 
 A/01; 
 A/12}

Translation from a CNF to an HNF covering grammar is straightforward.

In [8]:

%run ../../code/grammar
gram1 = Grammar('S -> NP VP; NP -> the dog|the cat; VP -> chased NP')
gram2 = Grammar('A -> B C; A -> D E; F -> B E')
def printAllRewriteRules(gram):
    print('CNF:\n' + gram.cnfRulesToString() + '\n')
    print('HNF:\n' + gram.hnfRulesToString() + '\n')
printAllRewriteRules(gram2)

Out[8]:

CNF:
{A -> B C; 
 A -> D E; 
 F -> B E}

HNF:
{A -> A[1]; 
 A -> A[2]; 
 A[1] -> B C; 
 A[2] -> D E; 
 F -> F[1]; 
 F[1] -> B E}

You can also enter a listing of rules from a file (containing every rule on a separate line).

In [9]:

gram2 = Grammar('test-grammar.txt')
printAllRewriteRules(gram2)

Out[9]:

CNF:
{S -> A B; 
 S -> C B; 
 B -> D D}

HNF:
{S -> S[1]; 
 S -> S[2]; 
 S[1] -> A B; 
 S[2] -> C B; 
 B -> B[1]; 
 B[1] -> D D}

**Homework Exercise 1-1**: Apply the CNF

\rightarrow

HNF transformation to

{\cal G}_{1}

above and explain how it solves the problem.

###From an HNF Grammar to a Harmonic Context-Free Grammar (HCFG)

Now: how to specify exactly the trees generated by a given HNF grammar (or the language $\cal{L}$ of their terminal strings) using a Harmonic Grammar?

For the proof, see THM vol. 1 pp. 397-398. For the intuition, see the picture in Fig. 2.

The iP[y] Class $\tt Grammar$ includes the Harmonic Grammar weighted constraints as well as the rewrite rules in CNF and HNF.

In [10]:

# gram1 = Grammar('S -> NP VP; NP -> the dog|the cat; VP -> chased NP')
def printAllRules(gram):
    printAllRewriteRules(gram)
    print('HG rules:\n' + gram.hgRulesToString() + '\n')
printAllRules(gram1)

Out[10]:

CNF:
{S -> NP VP; 
 NP -> the dog; 
 NP -> the cat; 
 VP -> chased NP}

HNF:
{S -> S[1]; 
 S[1] -> NP VP; 
 NP -> NP[1]; 
 NP -> NP[2]; 
 NP[1] -> the dog; 
 NP[2] -> the cat; 
 VP -> VP[1]; 
 VP[1] -> chased NP}

HG rules:
{[(S/r, S[1]/0r), 2];
 [(S[1]/0r, NP/00r), 2];
 [(S[1]/0r, VP/10r), 2];
 [(NP/00r, NP[1]/000r), 2];
 [(NP/00r, NP[2]/000r), 2];
 [(VP/10r, VP[1]/010r), 2];
 [(NP[1]/000r, the/0000r), 2];
 [(NP[1]/000r, dog/1000r), 2];
 [(NP[2]/000r, the/0000r), 2];
 [(NP[2]/000r, cat/1000r), 2];
 [(VP[1]/010r, chased/0010r), 2];
 [(VP[1]/010r, NP/1010r), 2];
 [(NP/1010r, NP[1]/01010r), 2];
 [(NP/1010r, NP[2]/01010r), 2];
 [r, -1]; 
 [0r, -3]; 
 [00r, -2]; 
 [10r, -2]; 
 [000r, -3]; 
 [010r, -3]; 
 [0000r, -1]; 
 [1000r, -1]; 
 [0010r, -1]; 
 [1010r, -2]; 
 [01010r, -1]}

It is possible to enter a grammar into $\tt Grammar()$ in either conventional CFG form (CNF) OR in Harmonic Normal Form (HNF). $\tt Grammar()$ will convert rules entered in CNF to HNF, and vice versa. However, if rules are going to be entered in HNF, a flag must be set to indicate this (regardless of whether input is supplied as a string of grammar rules or as a file name).

Throughout the rest of the course, the default grammar we will study is the one following, ${\cal G}_{0}$ -- although to simplify even further, to minimize irrelevant distractions, we will often omit 'S -> S[1] | S[2]' and instead treat both S[1] and S[2] as legal start symbols (legal at the tree root).

In [11]:

gram0 = Grammar('S -> S[1] | S[2]; S[1] -> A A; S[2] -> B B', isHnf=True)
printAllRules(gram0)

Out[11]:

CNF:
{S -> A A; 
 S -> B B}

HNF:
{S -> S[1]; 
 S -> S[2]; 
 S[1] -> A A; 
 S[2] -> B B}

HG rules:
{[(S/r, S[1]/0r), 2];
 [(S/r, S[2]/0r), 2];
 [(S[1]/0r, A/00r), 2];
 [(S[1]/0r, A/10r), 2];
 [(S[2]/0r, B/00r), 2];
 [(S[2]/0r, B/10r), 2];
 [r, -1]; 
 [0r, -3]; 
 [00r, -1]; 
 [10r, -1]}

${\tt Grammar()}$ can handle recursive rules, but for the sake of creating the Harmonic Grammar constraints, a maximum depth must be set; the default is 6.

In [12]:

gram4 = Grammar('S -> S')
printAllRules(gram4)

Out[12]:

CNF:
{S -> S}

HNF:
{S -> S[1]; 
 S[1] -> S}

HG rules:
{[(S/r, S[1]/0r), 2];
 [(S[1]/0r, S/00r), 2];
 [(S/00r, S[1]/000r), 2];
 [(S[1]/000r, S/0000r), 2];
 [(S/0000r, S[1]/00000r), 2];
 [r, -1]; 
 [0r, -2]; 
 [00r, -2]; 
 [000r, -2]; 
 [0000r, -2]; 
 [00000r, -1]}

If you wish to change the maximum depth while using recursive rules, you can simply set the parameter $\tt maxDepth$ when calling the $\tt setHarmonicGrammarRules()$ method.

In [13]:

gram4.setHarmonicGrammarRules(maxDepth=10)
print(gram4.hgRulesToString() + '\n')

Out[13]:

{[(S/r, S[1]/0r), 2];
 [(S[1]/0r, S/00r), 2];
 [(S/00r, S[1]/000r), 2];
 [(S[1]/000r, S/0000r), 2];
 [(S/0000r, S[1]/00000r), 2];
 [(S[1]/00000r, S/000000r), 2];
 [(S/000000r, S[1]/0000000r), 2];
 [(S[1]/0000000r, S/00000000r), 2];
 [(S/00000000r, S[1]/000000000r), 2];
 [r, -1]; 
 [0r, -2]; 
 [00r, -2]; 
 [000r, -2]; 
 [0000r, -2]; 
 [00000r, -2]; 
 [000000r, -2]; 
 [0000000r, -2]; 
 [00000000r, -2]; 
 [000000000r, -1]}

###Probabilistic Harmonic Grammars (PHGs) and PCFGs

In the probabilistic version of HG, higher Harmony entails higher probability, according to: $^\dagger$

p({\tt s}) \propto e^{H_{\cal G}({\tt s})/T}

where $T$ , the randomness parameter or temperature, will be set to 1 for now. (More on $T$ in Class 6.)

$^\dagger$ This distribution can be derived in at least two ways:
Given that Harmonies combine additively and probabilities combine multiplicatively, the only continuous functions that can map Harmony to probability are exponential functions; choice of $T$ amounts to choice of which exponential function is deployed.
Alhough we omit the demonstration here, this distribution is a Maximum Entropy distribution, hence a consequence of the Maximum Entropy induction principle, which essentially states that a learner should extrapolate from observed data to the probability distribution that has the least information among those consistent with that data. 'Maxent' models in computational linguistics are probabilistic Harmonic Grammars (construed generally, to include constraints that may be higher than 2nd order); there, the term 'features' refers to what we call 'constraints' here. Hayes & Wilson 2006 is a seminal work applying Maxent modeling to phonology.

This means that

p({\tt s}) = Z^{-1} e^{H_{\cal G}({\tt s})/T}

where $Z$ is the normalizing constant ensuring that the probabilities, summed over all discrete trees $\tt{s}$ , equals 1.

Computing $Z$ is a perennial challenge. But note that in probability ratios, $Z$ cancels so need not be calculated; we have simply:

\frac{p({\tt s}_{1})}{p({\tt s}_{2})} = e^{[H_{\cal G}({\tt s}_1) - H_{\cal G}({\tt s}_2)]/T}

or equivalently, in terms of log-probabilities (log to base $e$ , i.e., the natural logarithm $\ln$ ):

\log \left( \frac{p({\tt s}_{1})}{p({\tt s}_{2})} \right) = [H_{\cal G}({\tt s}_1) - H_{\cal G}({\tt s}_2)]/T

The $\tt Grammar$ class can also be used for PCFGs by setting the $\tt isPcfg$ flag to $\tt True$ .

In [14]:

gram5 = Grammar('S -> 0.6 A A; S -> 0.4 B B', isPcfg=True)
print(gram5.hnfRulesToString() + '\n')

Out[14]:

{S -> 0.6 S[1]; 
 S -> 0.4 S[2]; 
 S[1] -> 1.0 A A; 
 S[2] -> 1.0 B B}

Within $\tt Grammar$ , differential probability is implemented by adjusting the 1st-order Harmonic Grammar constraint wieghts differentially for $\tt S[1]$ and $\tt S[2]$ ; to achieve the specified ratio

\frac{p({\tt s}_{1})}{p({\tt s}_{2})}

these 1st-order constraint weights are adjusted so that their difference equals

\Delta H \equiv H_{\cal G}({\tt s}_1) - H_{\cal G}({\tt s}_2)

(recall that $T = 1$ for now). This is done by increasing by $\Delta H$ the weight of the 1st-order constraint for higher-probability symbol, $\tt S[1]$ .

To check this, we can get the Harmony of the two possible grammatical trees in this grammar using the $\tt getHarmony()$ method. Note that trees must be entered in HNF.

In [15]:

H1 = gram5.getHarmony('S (S[1] (A A))')
H1

Out[15]:

/projects/e0c271a6-81f7-42a7-9681-fa02642c85a5/code/grammar.py:434: RuntimeWarning: divide by zero encountered in log
  harmonyDiff = np.log(rhs[0]) - np.log(1 - rhs[0])

0.40546510810816372

In [16]:

H2 = gram5.getHarmony('S (S[2] (B B))')
H2

Out[16]:

0.0

It is also possible to get the Harmony of ungrammatical structures (so long as all fillers and roles in the structure are valid).

In [17]:

Hu = gram5.getHarmony('S (S[1] (A B))')
Hu

Out[17]:

-1.5945348918918363

The probability of a structure can be found with the computeProb() method.

In [18]:

p1 = gram5.computeProb('S (S[1] (A A))')
p1

Out[18]:

0.10771600362000196

In [19]:

p2 = gram5.computeProb('S (S[2] (B B))')
p2

Out[19]:

0.071810669080001346

In [20]:

pu = gram5.computeProb('S (S[1] (A B))')
pu

Out[20]:

0.014577775859028964

In [21]:

print('log probability ratio S[1] to S[2] = ')
np.log(p1/pu)

Out[21]:

log probability ratio S[1] to S[2] = 

2.0

In [22]:

print('Harmony difference S[1] minus S[2] = ')
H1 - Hu

Out[22]:

Harmony difference S[1] minus S[2] = 

2.0

**Homework Exercise 1-2**: Verify the corresponding *probability ratio ~ Harmony difference* relation for

\tt (p1, pu)

\tt (H1, Hu)

#Devoicing

(Note: We are considering only purely discrete output candidates until Class 3)

An OT account with 2 constraints Faith(voi), *Voi/Cod

A corresponding HG account

Probability of error harmonemes; harrmons

Product

Resources

Company