�t�W�{�@s�dZddlZddlZddlZddlZddlmZddl	m
Z
ddlmZGdd�de
�ZdS)zw
GSC Software: Grammar class
Author: Nick Becker
Department of Cognitive Science, Johns Hopkins University
Summer 2015
�N)�OrderedDict)�product)�Treec@s�eZdZdZed�Zdddd�Zdd�Zdd	�Zd
d�Z	dd
�Z
dd�Zdd�Zdd�Z
ddd�Zddd�Zdd�Zdd�Zddddd d!�Zd"d#�Zd$d%�Zd&d'�Zdd(d)�Zd*d+�Zd,d-�Zd.d/�Zdd0d1�Zd2d3�Zd4d5�Zd6d7�Zd8d9�Zd:d;�Zd<d=�Z d>d?�Z!d@dA�Z"dBS)C�GrammarzQ
    The last grammar class you will ever need. Currently under development.
    �InfFc	Cs�||_g}tjj|�r[t|d�}x|D]}|j|�q7W|j�n|jd�}tj	d�}xct
t|��D]O}||j�||<tj
|||�dkr�tjd||d�q�W|s|j|�|_|j�\|_|_n*|j|�|_|j�\|_|_|jr]|j�\|_|_d|_d|_d|_d|_d|_dS)a�
        Create a grammar object from a grammar specification.

        If grammar rules are being passed in a file, they must be written in the form
            A -> B C | D E
            F -> G H
        ... etc. Separate rules must be written on separate lines.

        If grammar rules are being passed as an argument, they must be written in the form
            "A -> B C | D E; F -> G H"
        ... etc. Separate rules must be separated by a semicolon (;).

        In both cases, rules with multiple right-hand sides can use a vertical bar ('|') to
        separate different right-hand sides, or every rule can just be written on a separate line.
        All sister symbols must be separated by a space (' '). The first node of the first
        rule will be treated as the root node internally. For now, all rules must be binary
        branching at most; this is enforced within the program.

        By default, this function accepts rules in CNF and converts them to HNF.
        If you want to enter HNF rules using the standard bracketing practice, you must set the
        'isHnf' parameter to True.

        PCFG functionality is still under development. For now, best not to use this functionality.
        Some features (e.g. 'toString' methods) may not work if you do.

        This function is based on one originally written by Colin Wilson.
        �r�;z.+->.+Nz
Error: input z is not formatted correctly.F)�isPcfg�os�path�isfile�open�append�close�split�re�compile�range�len�strip�match�sys�exit�createRules�cnfRules�cnfToHnf�hnfRules�
branchSyms�hnfToCnf�setRemainingProbabilities�
hgRulesAreSet�networkInfoIsSet�allGridpointsAreSet�zIsSet�allHarmoniesAreSet)	�self�rulesr	ZisHnf�listOfRules�f�lineZruleForm�i�r+�9/projects/18c77389-a5c3-49de-946a-7593b53d3fb2/grammar.py�__init__s2	

					zGrammar.__init__cCsd}tjd�}t�}x�|D]�}|jd�}|dj�}|djd�}x�|D]�}	|	j�}
t|
�}|jr�tj||
d�dkr�t|
d�|
d<|d8}||kr�t	j
dtt|
��d	�|j|g�j
|
�qdWq%W|S)
z`
        Create the internal representation of the CFG passed as parameter listOfRules.
        �z
^0*1?\.?\d+?$z->r��|NzSeriously? Do you really need z-ary trees?)rrrrrrr	r�floatrr�str�
setdefaultr)r%r'�maxN�flr&�ruleZ	splitRule�lhs�wholeRhs�rhsZrhsListZ	nChildrenr+r+r,rhs$	

%
!zGrammar.createRulesc
Cs�t�}g}tjd�}d}x^|jj�D]M}tj||�dkrXd}d}x |j|D]}t|�}|dkrl|dt|�d}	|	|kr�|j|	�|j	r t
|dt�r |j|g�j|d|	g�|j|	g�j|dd��n?|j|g�j|	g�|j|	g�j|dd��|d7}ql||j|�qlWq4W|s�t
jd	d
d�||fS)a
        Convert rules in "Conventional Normal Form" to Harmonic Normal Form. (In other
        words, add intermediate bracketed symbols.) Must be called after self.cnfRules has
        been created.

        This function is based on one originally written by Colin Wilson.
        z	.*\[\d+\]TNFr/�[�]rzOError: HNF-style bracketing detected. If you meant to enter your rules in HNF, zZyou must pass parameter 'True' to setRules(); otherwise, you should avoid using HNF-style z#bracketing structure in your nodes.)rrrr�keysrrr2rr	�
isinstancer1r3rr)
r%rr�b�	goodInputr7ZbracketIndexr9�degreeZnewSymr+r+r,r�s4	
#&#
zGrammar.cnfToHnfcCs�t�}g}tjd�}d}x�|jj�D]}tj||�dkr4|j|�|jd�d}x8|j|D])}|j|g�j|dd��q�Wd}q4W|s�t	j
dd�||fS)	z�
        Convert rules in Harmonic Normal Form to "Conventional Normal Form." (In other
        words, remove intermediate bracketed symbols.) Must be called after self.hnfRules
        has been created.
        z	.*\[\d+\]FNr:rTzQError: HNF not detected. If you did not intend to use HNF, do not pass parameter z'True' to setRules().)rrrrr<rrrr3rr)r%rrr>r?r7ZnewLhsr9r+r+r,r�s	
'
	zGrammar.hnfToCnfc	Cs�|js|j�t|�}|j�}i}xx|D]p}|d|jkrjtjd|dd�|d|jkr�tjd|dd�|d||d<q8Wg}x�|jD]�}||kr3|jrtjd|ddd	|j	d
d|j
��n#tjd|dd|j
��dgt|j�}d||jj||�<||7}q�Wt
jt
j|��S)
z�
        Use grammar to convert a tree in HNF (input as text) to a state based on the current
        network settings.
        rzError: Invalid filler (z).r/zError: Invalid role (zError: Role z* not in tree. Check that 
(a) you entered zQyour tree in HNF, 
(b) null elements in the tree are explicitly represented with zthe null symbol (z)), and 
(c) your tree is licensed by the zfollowing grammar:

zLyour tree in HNF, and 
(b) your tree is licensed by the following grammar:

)r!�setNetworkInforZ
getFRbindings�fillerNamesrr�	roleNames�padWithNulls�
nullSymbol�hnfRulesToStringr�index�np�	transpose�array)	r%ZinputStringZcurrentTreeZ
frBindingsZbyRole�binding�state�roleZthisRoler+r+r,�hnfTreeToState�s0	

	3zGrammar.hnfTreeToStatecCsYtj|�}|dd�}|jdd�}|jdd�}|jdd�}|S)zS
        Convert state stored as a column vector to a string of 1s and 0s.
        r/� ��.�
�����)rH�	array_str�replace)r%rL�returnStringr+r+r,�
stateToString�szGrammar.stateToStringcCs�|j|jg}xn|D]f}x]|j�D]O}||}g}g}d}x�|D]�}t|dt�r�|ddks�|ddkr�tjdt||d�d�q�|j|d�qU|j|j	|��qUWt
|�dkr9dt|�}	|	t
|�}x"|D]}
||
jd|�qWt
jdt
|�|t|��dkr,tjd|d�q,WqW|d|dfS)a
        Fix rules such that if this is a PCFG, all rules that can expand to more than
        one right-hand side are given an equal probability to expand to each of those right-hand
        sides (basically, divide up the remaining probability). Make sense?
        rr/zError: Invalid probability (z).g{�G�z�?zAError: Probabilities do not sum to 1 (check rules beginning with �))rrr<r=r1rrr2rrGr�sum�insertrH�abs)r%ZrulesSetr&r7r8ZcurrentProbValuesZ
noProbIndicesZprobToDistributer9ZprobToDividerGr+r+r,r	s*


 &
-z!Grammar.setRemainingProbabilitiescCs�x�|jD]�}|ddkr
|d|jkr
|ddd|dd}|j|jj|�|d7<|j|jj|�|d7<q
WdS)z�
        Adds values from self.biasAdjustments to self.biasVector to influence probability.
        Here, the probability difference is simply added to the most likely structure.
        r/r�/N)�biasAdjustments�INF�biasVector_byFiller�
allFRbindingsrG�biasVector_byRole)r%�
adjustmentZ	frBindingr+r+r,�adjustBiases(s
#!zGrammar.adjustBiasesr/cCs4|js|j�tj|j|�|�|jS)z�
        Compute probability of tree (input as text), using all gridpoints as the sample
        space and T = 1 by default.
        )r#�computeZrH�exp�
getHarmony�z)r%�tree�Tr+r+r,�computeProb4s	
zGrammar.computeProbcCsZ|js|j�tjt|jj���}tj||�j�|_	d|_
dS)z=
        Computes Z = sum_i (exp(H(gridpoint_i)/T)), with T = 1 by default
        TN)r$�setAllHarmoniesrHrJ�list�allHarmonies�valuesrerYrgr#)r%rirmr+r+r,rd>s
	
zGrammar.computeZc	CsB|js|j�t|j�}t|j�}g}xJtt|j��D]3}dgt|j�}d||<|j|�qMWtt|dt|j���}t	j
d||t|�f�}xbtt|��D]N}t	jt	jt	j
||�||df��}||dd�|f<q�W||_d|_dS)zE
        Generate all gridpoints. (Their Harmony is evaluated separately, in setAllHarmonies.)
        rr/�repeat�shapeNT)r!rArrCrBrrrlrrH�zerosrI�reshaperJ�
allGridpointsr")	r%ZnRZnFZ	allPointsr*ZcurrentPointZallGridsListZallGridsMatZgridColr+r+r,�setAllGridpointsNs 	

!"2	zGrammar.setAllGridpointscCs�|js|j�|jjd}i}x\t|�D]N}|j|jdd�|f�}|j|jdd�|f�||<q6W||_d|_dS)z�
        Store the harmony for every gridpoint because, why not? This may take a while to run,
        but will only need to be run once per grammar.
        r/NT)	r"rtrsrprrWrfrmr$)r%ZnGridsrmr*ZstateKeyr+r+r,rkks	
"*	zGrammar.setAllHarmonies�T�_c	Cs�||_||_||_|jr0|j}n	|j}|j|�df}|r~||_|j||j�|_|j}g}g}g|_	|j
||||�\|_|_|j
|j|j�\|_|_d|_dS)zl
        Create Harmonic Grammar rules based on this grammar's CFG rules (HNF or CNF,
        depending on useHnf).
        rTN)�maxDepth�useHnf�needNullFillerrr�getRootNoderErDZnullPaddedRulesr]�
expandHGrules�	hgWeights�hgBiases�sortHGrulesr )	r%rwrxZaddNullFillersrE�ruleSet�startr|r}r+r+r,�setHarmonicGrammarRules|s"								$$zGrammar.setHarmonicGrammarRulescCs�d}xG|j�D]9}x0||D]$}t|�}||kr$|}q$WqWxa|j�D]S}||jkr]x;||D]/}x&t|�|kr�|j|j�q�Wq}Wq]W|S)aG
        "Symmetrizes" the CFG grammar in ruleSet by padding projections with null symbols.
        For example, given the grammar
            S -> A; S -> B B,
        this function creates
            S -> A _; S -> B B
        Note that after this function is run, all parent nodes have the same number of children.
        r)r<rrrrE)r%rrEr4r7r9ZcurrentNr+r+r,rD�s
zGrammar.padWithNullsc
Cst|d�|jkr�|d|j�kr�x�||dD]r}d}|jr�|dd�}tj|d�tjd|d�}n|dd�}|ddkr�|j|d|dt|�g�n)|j|d|dt|�dg�d|d}x�|D]�}	|jj|	|g|g�|j|d|df|	|fdg�|j|	|f|||�\}}t	t
|d�d�|dd�}qWq>Wq�|j|d|ddg�n|j|d|ddg�||fS)	z�
        Recursive function to find Harmonic Grammar rules. Stops when all possible
        paths through the CFG are explored, or the maximum depth is reached.

        Right now, biases are added to roles only, not filler/role bindings.
        r/rNr�0r.rSrS)rrwr<r	rH�logrr]r{r2�int)
r%�parentrr|r}r9ZharmonyDiff�tempZ
childLevelZchildSymbolr+r+r,r{�s(	+()
*$3!zGrammar.expandHGrulescCs!d}x�|r�d}x�tt|�d�D]j}t||dd�t||ddd�kr,||}||d||<|||d<d}q,Wq	Wg}t�}xo|D]g}dj|dd|dd|dd|ddg�}	|	|kr�|j|�|j|	�q�Wd}x�|r�d}xztt|�d�D]b}t||d�t||dd�krK||}||d||<|||d<d}qKWq(Wg}
t�}xP|D]H}|d|df|kr�|
j|�|j|d|df�q�W||
fS)z�
        This is pretty sloppy. So it will remain until we come up with a more
        clever data structure to store the HG rules (low priority right now).
        TFr/rrP)rr�set�joinr�add)r%r|r}ZneedSwappedr*r�ZhgWeightsNoDuplicates�seen�weightZstringWeightZhgBiasesNoDuplicatesZbiasPairr+r+r,r~�s@	4
	
;
	,
	

zGrammar.sortHGrulescCsc|js|j�g}g}x{tt|j��D]d}|j|d|krj|j|j|d�|j|d|kr5|j|j|d�q5W|jr�|j|kr�|j|j�g}xTtt|��D]@}x7tt|��D]#}|j||d||�q�Wq�Wtj	t|�t|�f�}x�tt|j
��D]�}|j|j
|ddd|j
|dd�}|j|j
|ddd|j
|dd�}	|j
|d|||	f<|j
|d||	|f<qYWtj	t|��}
tj	t|��}x�tt|j��D]�}|j|d}|j|d}
|j|d}xott|��D][}||dt|��|kr�||
|<||t|
�d�|
kr�|||<q�WqCW||_||_
||_|
|_||_||_d|_|rC|
|_n	||_|jr_|j�dS)zp
        Set the role names, filler names, weight matrix, and bias vector for
        this HNF grammar.
        rr/r\r.NT)r r�rrr}rryrErHrqr|rGrCrB�weightMatrixr_rar`r!�defaultBiasVectorr	rc)r%�biasByFillerrBrCr*r`�jr�Zindex1�index2r_raZ
currentFillerZcurrentRoleZcurrentBiasr+r+r,rAsX	
%!55 
!									zGrammar.setNetworkInfocCs�|js|j�t|t�r4|j|�}n|}tjtjtj|�|j�|�}tjtj|�|j	�}d||S)zD
        Calculate harmony of state input as column vector.
        g�?)
r!rAr=r2rNrH�dotrIr�r�)r%rLZstateVectorZhWeightZhBiasr+r+r,rfJs	
*zGrammar.getHarmonycCst|j��dS)zQ
        Given a dictionary of rules, find the first possible root node.
        r)rlr<)r%rr+r+r,rz[szGrammar.getRootNodecCs~g}g}|j�}g}xY|D]Q}xH||D]<}x3|D]+}||krC||krC|j|�qCWq6Wq%W|S)zI
        Given a dictionary of rules, find all terminal symbols.
        )r<r)r%rZ	terminalsZrhSidesr9r7�noder+r+r,�getTerminalNodesas

zGrammar.getTerminalNodescCs/|js|j�|j|j|j|jfS)zu
        Get the information needed to create a weight matrix for use in neural network
        computation.
        )r!rArCrBr�r�)r%r�r+r+r,�getNetworkInfoss	
zGrammar.getNetworkInfocCs�||jk}||jk}|rL|rLtjd|d|d�ne|sjtjd|d�nG|s�tjd|d�n)|j|jj|�|jj|�fSdS)z9
        Get specific weight from weight matrix.
        zError: 'z' and 'z%' are not valid filler/role bindings.z%' is not a valid filler/role binding.N)r`rrr�rG)r%Zbinding1Zbinding2Zbinding1_isValidZbinding2_isValidr+r+r,�	getWeight}s zGrammar.getWeightcCsN||jk}|s-tjd|d�n|j|jj|�dfSdS)z5
        Get specific bias from bias vector.
        zError: 'z%' is not a valid filler/role binding.rN)r`rrr�rG)r%rKZbindingIsValidr+r+r,�getBias�szGrammar.getBiascCsBd}x.|jj�D]}|t|j|�7}qWd}d}x�|jj�D]�}x�|j|D]�}|dkr�|d7}||d7}xttt|��D]`}|jr�|dkr�|t||�7}n|||7}|t|�dkr�|d7}q�W||dkr"|d7}|d7}qgWqSW|d7}|S)z9
        Gets a pretty string for the CNF rules.
        r�{rOz -> r/z; 
�})rr<rrr	r2)r%�nRulesr7�nStringifiedrVr9r*r+r+r,�cnfRulesToString�s*


zGrammar.cnfRulesToStringcCsBd}x.|jj�D]}|t|j|�7}qWd}d}x�|jj�D]�}x�|j|D]�}|dkr�|d7}||d7}xttt|��D]`}|jr�|dkr�|t||�7}n|||7}|t|�dkr�|d7}q�W||dkr"|d7}|d7}qgWqSW|d7}|S)z9
        Gets a pretty string for the HNF rules.
        rr�rOz -> r/z; 
r�)rr<rrr	r2)r%r�r7r�rVr9r*r+r+r,rF�s*


zGrammar.hnfRulesToStringcCs�|js|j�d}x�tt|j��D]�}|dkrK|d7}|d|j|ddd|j|ddd|j|ddd|j|dddt|j|d�d	7}|t|j�dkr/|d
7}q/W|d7}|S)
z:
        Gets a pretty string for the HG weights.
        r�rrOz[(r\r/z, z), r;z; 
r�rS)r r�rrr|r2)r%rVr*r+r+r,�hgWeightsToString�s	

|
zGrammar.hgWeightsToStringcCs�|js|j�d}x�tt|j��D]u}|dkrK|d7}|d|j|ddt|j|d�d7}|t|j�dkr/|d7}q/W|d	7}|S)
z9
        Gets a pretty string for the HG biases.
        r�rrOr:z, r/r;z; 
r�)r r�rrr}r2)r%rVr*r+r+r,�hgBiasesToString�s	

6
zGrammar.hgBiasesToStringcCsW|js|j�d}x�tt|j��D]�}|dkrK|d7}|d|j|ddd|j|ddd|j|ddd|j|dddt|j|d�d	7}q/Wxvtt|j��D]_}|d
|j|ddt|j|d�d7}|t|j�dkr�|d7}q�W|d
7}|jrSg}xA|jD]6}|ddkrj|d|j	krj|j
|�qjW|d7}x�tt|��D]�}|dkr�|d7}|d||ddd||dddt||d�d7}|t|�dkr�|d7}q�W|d
7}|S)z[
        Concatenates the HG weights and biases and gets a pretty string for them.
        r�rrOz[(r\r/z, z), z];
z [r;z; 
r�z;
{r:rS)r!rArrr|r2r}r	r]r^r)r%rVr*ZadjustmentsToPrintrbr+r+r,�hgRulesToString�s8	

�6
	#

E
zGrammar.hgRulesToStringcCsTd}xGtt|j��D]0}||j|dt|j|�d7}qW|S)z;
        Gets a pretty string for the bias vector.
        rPz, rR)rrr`r2r�)r%rVr*r+r+r,�biasVectorToString s.zGrammar.biasVectorToStringN)#�__name__�
__module__�__qualname__�__doc__r1r^r-rrrrNrWrrcrjrdrtrkr�rDr{r~rArfrzr�r�r�r�r�rFr�r�r�r�r+r+r+r,rs>M7%
&2?
&r)r��os.pathr
rr�numpyrH�collectionsr�	itertoolsrrhr�objectrr+r+r+r,�<module>s