# Gradient Symbolic Computation & Incremental Processing
Pyeong Whan Cho ([email protected])
Department of Cognitive Science, Johns Hopkins University
## Example 1: Bifurcation in the GSC network
We investigate a simple artificial language that was designed to reveal two important computational problems arising in incremental processing. To aid understanding of the model, we introduce several concepts from dynamical systems theory without formal definitions.
The plots suggest that the model converged to a state. We call such a state an equilibrium point (or fixed point); once the state is set to an equilibrium point, the system does not change and stays there forever. Moreover, in this example, the state seems to be attracting. We can test whether the state is attracting by adding small random noise to the current state and then allowing the model to update its state, as sketched below.
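The test amounts to the following. This is a minimal sketch with a toy quadratic harmony and Euler-discretized gradient-ascent dynamics; the harmony function, weights, and step sizes here are our own assumptions, not the actual GSC network.

```python
# Minimal sketch of the perturbation test (toy model, not the GSC network).
import numpy as np

rng = np.random.default_rng(0)

def run(a, W, b, n_steps=5000, dt=0.01, T=0.0):
    """Euler-discretized gradient ascent on H(a) = a'Wa/2 + b'a,
    with optional Gaussian noise at temperature T."""
    for _ in range(n_steps):
        a = a + dt * (W @ a + b)                   # follow the harmony gradient
        if T > 0:
            a = a + np.sqrt(2 * T * dt) * rng.standard_normal(a.shape)
    return a

# A toy 2-unit network whose harmony surface has a single peak.
W = np.array([[-1.0, 0.3], [0.3, -1.0]])   # negative definite -> one maximum
b = np.array([0.5, 0.2])

a_star = run(np.zeros(2), W, b)                    # converge to an equilibrium
a_pert = a_star + 0.05 * rng.standard_normal(2)    # add small random noise
a_back = run(a_pert, W, b)                         # let the model update again

print(np.allclose(a_star, a_back, atol=1e-4))      # True -> state is attracting
```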
The model converged to the equilibrium point again. This type of equilibrium point is called an attractor (or sink) (see the lecture slides). For an attractor, we can consider the set of states that will move to the attractor; this set is called its basin of attraction. In the GSC model, attractors correspond to the tops of the humps in the harmony landscape. When q = 0, the GSC network has a single global optimum; the harmony surface has a single peak. Thus, even if very large noise is added to the current state, the system will converge to the equilibrium point in the long run.
We interpret the end state as a blend of four different sentences. Notice that in each role, not all fillers are equally activated. This is because some f/r bindings (e.g., A/R[0,1,2]) do not receive support from other bindings, while others (e.g., S[1]/R[0,1,2]) receive support from their mothers or their daughters (e.g., A/R[0,1] and B/R[1,2]).
Now imagine what will happen if q increases; remember that a non-zero q introduces competition among fillers in each role. You can see that ungrammatical f/r bindings cannot win the competition (when no noise is added [T = 0]) because they are less activated than grammatical f/r bindings. For example, in R[0,1,2], A, B, C, D, and _ cannot be winners. In a highly noisy situation, this does not hold. In a mildly noisy situation, not every grid point (pole) will be chosen. Now let us increase q gradually and slowly; the formulation below shows where q enters the dynamics.
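The standard decomposition in GSC work (the notation here is our assumption; this model's implementation may differ in detail) writes the total harmony as a grammar term plus a quantization term scaled by q, with noisy gradient-ascent dynamics at temperature T:

$$
H(\mathbf{a}) \;=\; H_G(\mathbf{a}) \;+\; q\,Q(\mathbf{a}),
\qquad
d\mathbf{a} \;=\; \nabla H(\mathbf{a})\,dt \;+\; \sqrt{2T}\,d\mathbf{B}_t
$$

Here H_G encodes the grammar's soft constraints and Q is maximized at the grid points (poles) where exactly one filler is fully active in each role; increasing q therefore strengthens the within-role competition described above.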
With this change, the state has moved to a new equilibrium point.
When q becomes greater than a certain value, there is a qualitative change in the harmony landscape. Now the landscape has multiple (in this example, four) basins of attraction, each of which is associated with a sentence. So even with a very small precision error, the state quickly moves away from the balanced blend state and toward one of the grammatical grid points. This change is called a bifurcation, and the parameter (in the current example, q) whose change introduces this kind of qualitative change is called a bifurcation parameter. It is possible to analyze bifurcations more accurately by using specialized software (e.g., COCO).
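A crude numerical stand-in for such an analysis (tools like COCO do this properly via numerical continuation) is to sweep the parameter, settle the noiseless dynamics from many random initial states, and count the distinct end points. The 1-D harmony below is a textbook pitchfork toy, not the GSC landscape:

```python
# Crude bifurcation scan on a toy 1-D harmony H(a) = -a**4/4 + q*a**2/2,
# which has one attractor for q <= 0 and two for q > 0 (pitchfork at q = 0).
import numpy as np

rng = np.random.default_rng(1)

def settle(a, q, n_steps=20000, dt=0.01):
    for _ in range(n_steps):
        a = a + dt * (-a**3 + q * a)   # gradient ascent on the toy harmony
    return a

for q in [-0.5, 0.5, 1.0]:
    ends = [settle(a0, q) for a0 in rng.uniform(-2, 2, size=20)]
    n_attractors = len(np.unique(np.round(ends, 2)))
    print(f"q = {q:+.1f}: {n_attractors} attractor(s)")
```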
Now let us investigate what happens when the model receives external input. We will present the sentence 'A B' sequentially, one word at a time, and check whether the system can build the target constituent structure, {S[1]/R[0,1,2], A/R[0,1], B/R[1,2]}. Notice that this is not an easy task: the second word is consistent with both S[1] and S[3], so the model needs memory to keep what it has processed from the first word input. Word input will be modeled as external input supporting a target constituent, as in the sketch below. The harmony landscape changes whenever a new word input is presented to the model.
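One simple way to realize such input (this is our sketch of the idea, with an illustrative subset and ordering of units, not the model's actual encoding) is to add an external-input vector to the harmony gradient, with a positive entry on the unit for the supported binding:

```python
# Sketch: word input as an external-input term e added to the harmony gradient.
import numpy as np

# A few f/r binding units (illustrative subset and ordering).
units = ["A/R[0,1]", "D/R[0,1]", "S[1]/R[0,1,2]", "S[3]/R[0,1,2]"]

def word_input(binding, strength=1.0):
    """Return an external-input vector supporting one f/r binding."""
    e = np.zeros(len(units))
    e[units.index(binding)] = strength
    return e

def grad(a, W, b, e):
    # dH/da for H(a) = a'Wa/2 + (b + e)'a: the external input e raises the
    # harmony of states in which the supported binding is strongly active.
    return W @ a + b + e

e_A = word_input("A/R[0,1]")   # presenting the first word, A, over span [0,1]
```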
The system converges to an equilibrium point that is a blend of four sentences. Because A/R[0,1] receives external input, that constituent is more strongly activated than D/R[0,1]. In the role R[0,1,2], S[1] and S[2] are more strongly activated than S[3] and S[4] because they receive a supporting signal from A/R[0,1]. We can say the model prefers S1 and S2 to S3 and S4. If we increase q at this state with no noise (T = 0), S[3] and S[4] cannot win the competition in the role R[0,1,2].
Now we can remove the first word and then provide the second word input to the model. Again, q_rate and T are fixed to 0.
With the new word input, the system converged to a new equilibrium point. As in the previous case, the equilibrium point is a blend of four sentences. This time, however, S1 and S3 are more strongly weighted than S2 and S4; S[1] and S[3] are more strongly activated than S[2] and S[4] in the role R[0,1,2] because they receive a supporting signal from B/R[1,2], which itself receives external input. Notice that, in a sense, the model has lost what it had processed: given the first word, the model should prefer S[1] to S[3] in R[0,1,2], but it did not.
One solution would be to provide cumulative external input; in other words, when the second word is presented, we can present both words A and B. But it is a question how a system can keep the input in this way. We propose that by controlling q appropriately, the model can correctly build the target structure without losing the information from the first word.
Before presenting the second word, we will increase q.
The model now disprefers the structures that are impossible given the current input (A/R[0,1]), namely S3 and S4, more strongly. When q becomes greater than a certain value (estimating it requires a more careful bifurcation analysis), a bifurcation happens; in the current setting, there are multiple attractors. To show this, we set the initial state to a particular grid point corresponding to an ideal S3 representation and then run the model.
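Past the bifurcation, the end state depends on where the state starts. A 1-D double-well harmony, with attractors at -1 and +1 standing in for the ideal S3 and S1 states (a toy stand-in, not the model itself), shows the same initial-condition dependence:

```python
# Sketch: with multiple attractors, the end state depends on the initial state.
# H(a) = -(a**2 - 1)**2 / 4 has two attractors, a = -1 and a = +1.
import numpy as np

def settle(a, n_steps=10000, dt=0.01):
    for _ in range(n_steps):
        a = a + dt * (-a * (a**2 - 1))   # gradient of the double-well harmony
    return a

print(settle(0.2))    # starts in the +1 basin   -> converges to +1
print(settle(-0.9))   # starts near the other pole -> converges to -1
```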
Now the two possible sentences are separated from the other two impossible sentences.
The model converged to a different equilibrium point (in this case, an attractor) when the initial state was set to the grid point corresponding to the ideal representation of S3. At this level of q, the harmony landscape has multiple humps (basins of attraction) separated by valleys. Now let us check whether the previous end state is an attractor.
When the model was perturbed with small noise, it seemed to return to the previous end state. Thus, the backed-up state seems to be an attractor. (Caution: this is not an exact investigation. Near a saddle point, the rate of change can be very small, so the system may merely appear to stay at a point.)
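This caution can be made concrete with a toy 2-D harmony that has a saddle at the origin (our illustration, not the GSC landscape): started close to the saddle's attracting direction, the state hovers almost motionless for a long time before escaping along the unstable direction, so a snapshot can be mistaken for convergence.

```python
# Sketch: near a saddle the dynamics crawl, so a state can look stationary.
# Toy harmony H(a) = -a[0]**2/2 + a[1]**2/2 has a saddle at the origin.
import numpy as np

def grad(a):
    return np.array([-a[0], a[1]])   # attracting along a0, repelling along a1

for t_end in [1, 10, 20]:
    a, dt = np.array([0.5, 1e-8]), 0.01
    for _ in range(int(t_end / dt)):
        a = a + dt * grad(a)
    print(f"t = {t_end:2d}: a = {a}, |grad| = {np.linalg.norm(grad(a)):.1e}")
# Around t = 10 the state sits near the origin with a tiny gradient (it looks
# converged), but by t = 20 it has escaped along the unstable direction.
```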
Now let us investigate what happens when large noise is added.
This time, the model converged to a new equilibrium point corresponding to S2. The topology of the attractors is presented below: the figure shows the topology of the harmony landscape when q = 35. (The figure below was created by a more careful bifurcation analysis of a model with local representation. Due to precision issues, the equilibrium points from the above simulation may differ slightly from those in the figure.)
<img src="images/gramAmbi-inputA-gamma35.png", height=300px>
Now set q_rate, T, and T_decay_rate to 0 again and present the second word alone to the model.
The model correctly built the target representation of the sentence from the sequentially presented words. This is because, after the first word has been processed with increasing q, the state is likely to be in the basin of the blend of S1 and S2 (or of S1 or S2 individually if T is not 0), which is separated from the basins of S3 and S4 by deep harmony barriers. The new word input B/R[1,2] changes the harmony landscape so that S3 becomes as good as S1. However, the state is close to the basin of S1, so the model converges to S1.
<img src="images/gramAmbi-inputB-gamma35.png", height=300px>
The investigation of the model's behavior suggests that, in principle, the model can build a target constituent structure from sequentially presented input. Finally, we run the model in a more reasonable way below: q will be set to 0 at the beginning and then increased over time at a fixed rate, and the temperature is fixed to a small value (T = 0.005). Due to this noise, the model will not converge to a blend state at which S1 and S2 are perfectly balanced; however, the state will still be close to S1 and S2. When the second word is presented, the region close to S2 becomes unstable: that region goes down in the harmony space.
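As a final sketch, the schedule itself can be written as a simple loop: q starts at 0 and grows at q_rate while the temperature is held at T = 0.005. The double-well toy again stands in for the real network, with the pole at +1 playing the role of S1; the point to notice is that once q has grown, a later conflicting input can no longer flip the committed state, which is the memory behavior described above.

```python
# Sketch of the incremental run: q grows at q_rate while small noise (T) is on.
import numpy as np

rng = np.random.default_rng(2)

def run_word(a, e, q, q_rate=0.05, T=0.005, n_steps=4000, dt=0.01):
    """Process one word: noisy gradient ascent on a toy double-well harmony
    whose depth scales with q, plus external input e for the current word."""
    for _ in range(n_steps):
        drift = -q * a * (a**2 - 1) + e              # toy harmony gradient
        a = a + dt * drift + np.sqrt(2 * T * dt) * rng.standard_normal()
        q = q + q_rate * dt                          # q grows at a fixed rate
    return a, q

a, q = 0.0, 0.0
for e in [+0.3, -0.3]:   # the second 'word' conflicts with the first
    a, q = run_word(a, e, q)
    print(f"after word with input {e:+.1f}: a = {a:+.2f}, q = {q:.1f}")
# The first word commits the state to the +1 pole; because the harmony
# barrier has deepened as q grew, the conflicting second input only nudges
# the state, and the commitment survives.
```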