Path: blob/master/src/java.base/share/classes/jdk/internal/icu/text/BidiBase.java
/*
 * Copyright (c) 2009, 2021, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/*
 *******************************************************************************
 * Copyright (C) 2001-2014, International Business Machines
 * Corporation and others.
 * All Rights Reserved.
 *******************************************************************************
 */

/* FOOD FOR THOUGHT: currently the reordering modes are a mixture of
 * algorithm for direct BiDi, algorithm for inverse Bidi and the bizarre
 * concept of RUNS_ONLY which is a double operation.
 * It could be advantageous to divide this into 3 concepts:
 * a) Operation: direct / inverse / RUNS_ONLY
 * b) Direct algorithm: default / NUMBERS_SPECIAL / GROUP_NUMBERS_WITH_L
 * c) Inverse algorithm: default / INVERSE_LIKE_DIRECT / NUMBERS_SPECIAL
 * This would allow combinations not possible today like RUNS_ONLY with
 * NUMBERS_SPECIAL.
 * Also allow to set INSERT_MARKS for the direct step of RUNS_ONLY and
 * REMOVE_CONTROLS for the inverse step.
 * Not all combinations would be supported, and probably not all do make sense.
 * This would need to document which ones are supported and what are the
 * fallbacks for unsupported combinations.
 */

package jdk.internal.icu.text;

import java.lang.reflect.Array;
import java.text.AttributedCharacterIterator;
import java.text.Bidi;
import java.util.Arrays;
import jdk.internal.access.JavaAWTFontAccess;
import jdk.internal.access.SharedSecrets;
import jdk.internal.icu.lang.UCharacter;
import jdk.internal.icu.impl.UBiDiProps;

/**
 *
 * <h2>Bidi algorithm for ICU</h2>
 *
 * This is an implementation of the Unicode Bidirectional Algorithm. The
 * algorithm is defined in the
 * <a href="http://www.unicode.org/reports/tr9/">Unicode Standard Annex #9:
 * Unicode Bidirectional Algorithm</a>.
 * <p>
 *
 * Note: Libraries that perform a bidirectional algorithm and reorder strings
 * accordingly are sometimes called "Storage Layout Engines".
 * ICU's Bidi and
 * shaping (ArabicShaping) classes can be used at the core of such "Storage
 * Layout Engines".
 *
 * <h3>General remarks about the API:</h3>
 *
 * The "limit" of a sequence of characters is the position just after
 * their last character, i.e., one more than that position.
 * <p>
 *
 * Some of the API methods provide access to "runs". Such a
 * "run" is defined as a sequence of characters that are at the same
 * embedding level after performing the Bidi algorithm.
 *
 * <h3>Basic concept: paragraph</h3>
 * A piece of text can be divided into several paragraphs by characters
 * with the Bidi class <code>Block Separator</code>. For handling of
 * paragraphs, see:
 * <ul>
 * <li>{@link #countParagraphs}
 * <li>{@link #getParaLevel}
 * <li>{@link #getParagraph}
 * <li>{@link #getParagraphByIndex}
 * </ul>
 *
 * <h3>Basic concept: text direction</h3>
 * The direction of a piece of text may be:
 * <ul>
 * <li>{@link #LTR}
 * <li>{@link #RTL}
 * <li>{@link #MIXED}
 * <li>{@link #NEUTRAL}
 * </ul>
 *
 * <h3>Basic concept: levels</h3>
 *
 * Levels in this API represent embedding levels according to the Unicode
 * Bidirectional Algorithm.
 * Their low-order bit (even/odd value) indicates the visual direction.<p>
 *
 * Levels can be abstract values when used for the
 * <code>paraLevel</code> and <code>embeddingLevels</code>
 * arguments of <code>setPara()</code>; there:
 * <ul>
 * <li>the high-order bit of an <code>embeddingLevels[]</code>
 * value indicates whether the using application is
 * specifying the level of a character to <i>override</i> whatever the
 * Bidi implementation would resolve it to.</li>
 * <li><code>paraLevel</code> can be set to the
 * pseudo-level values <code>LEVEL_DEFAULT_LTR</code>
 * and <code>LEVEL_DEFAULT_RTL</code>.</li>
 * </ul>
 *
 * <p>The related constants are not real, valid level values.
 * <code>DEFAULT_XXX</code> can be used to specify
 * a default for the paragraph level for
 * when the
 * <code>setPara()</code> method
 * shall determine it but there is no
 * strongly typed character in the input.<p>
 *
 * Note that the value for <code>LEVEL_DEFAULT_LTR</code> is even
 * and the one for <code>LEVEL_DEFAULT_RTL</code> is odd,
 * just like with normal LTR and RTL level values -
 * these special values are designed that way. Also, the implementation
 * assumes that MAX_EXPLICIT_LEVEL is odd.
 *
 * <p><b>See Also:</b>
 * <ul>
 * <li>{@link #LEVEL_DEFAULT_LTR}
 * <li>{@link #LEVEL_DEFAULT_RTL}
 * <li>{@link #LEVEL_OVERRIDE}
 * <li>{@link #MAX_EXPLICIT_LEVEL}
 * <li>{@link #setPara}
 * </ul>
 *
 * <h3>Basic concept: Reordering Mode</h3>
 * Reordering mode values indicate which variant of the Bidi algorithm to
 * use.
 *
 * <p><b>See Also:</b>
 * <ul>
 * <li>{@link #setReorderingMode}
 * <li>{@link #REORDER_DEFAULT}
 * <li>{@link #REORDER_NUMBERS_SPECIAL}
 * <li>{@link #REORDER_GROUP_NUMBERS_WITH_R}
 * <li>{@link #REORDER_RUNS_ONLY}
 * <li>{@link #REORDER_INVERSE_NUMBERS_AS_L}
 * <li>{@link #REORDER_INVERSE_LIKE_DIRECT}
 * <li>{@link #REORDER_INVERSE_FOR_NUMBERS_SPECIAL}
 * </ul>
 *
 * <h3>Basic concept: Reordering Options</h3>
 * Reordering options can be applied during Bidi text transformations.
 *
 * <p><b>See Also:</b>
 * <ul>
 * <li>{@link #setReorderingOptions}
 * <li>{@link #OPTION_DEFAULT}
 * <li>{@link #OPTION_INSERT_MARKS}
 * <li>{@link #OPTION_REMOVE_CONTROLS}
 * <li>{@link #OPTION_STREAMING}
 * </ul>
 *
 *
 * @author Simon Montagu, Matitiahu Allouche (ported from C code written by Markus W. Scherer)
 * @stable ICU 3.8
 *
 *
 * <h4> Sample code for the ICU Bidi API </h4>
 *
 * <h5>Rendering a paragraph with the ICU Bidi API</h5>
 *
 * This is (hypothetical) sample code that illustrates how the ICU Bidi API
 * could be used to render a paragraph of text.
 * Rendering code depends highly on
 * the graphics system, therefore this sample code must make a lot of
 * assumptions, which may or may not match any existing graphics system's
 * properties.
 *
 * <p>
 * The basic assumptions are:
 * </p>
 * <ul>
 * <li>Rendering is done from left to right on a horizontal line.</li>
 * <li>A run of single-style, unidirectional text can be rendered at once.
 * </li>
 * <li>Such a run of text is passed to the graphics system with characters
 * (code units) in logical order.</li>
 * <li>The line-breaking algorithm is very complicated and Locale-dependent -
 * and therefore its implementation is omitted from this sample code.</li>
 * </ul>
 *
 * <pre>{@code
 *
 * package com.ibm.icu.dev.test.bidi;
 *
 * import com.ibm.icu.text.Bidi;
 * import com.ibm.icu.text.BidiRun;
 *
 * public class Sample {
 *
 *     static final int styleNormal = 0;
 *     static final int styleSelected = 1;
 *     static final int styleBold = 2;
 *     static final int styleItalics = 4;
 *     static final int styleSuper = 8;
 *     static final int styleSub = 16;
 *
 *     static class StyleRun {
 *         int limit;
 *         int style;
 *
 *         public StyleRun(int limit, int style) {
 *             this.limit = limit;
 *             this.style = style;
 *         }
 *     }
 *
 *     static class Bounds {
 *         int start;
 *         int limit;
 *
 *         public Bounds(int start, int limit) {
 *             this.start = start;
 *             this.limit = limit;
 *         }
 *     }
 *
 *     static int getTextWidth(String text, int start, int limit,
 *                             StyleRun[] styleRuns, int styleRunCount) {
 *         // simplistic way to compute the width
 *         return limit - start;
 *     }
 *
 *     // set limit and StyleRun limit for a line
 *     // from text[start] and from styleRuns[styleRunStart]
 *     // using Bidi.getLogicalRun(...)
 *     // returns line width
 *     static int getLineBreak(String text, Bounds line, Bidi para,
 *                             StyleRun styleRuns[], Bounds styleRun) {
 *         // dummy return
 *         return 0;
 *     }
 *
 *     // render runs on a line sequentially, always from left
 *     // to right
 *
 *     // prepare rendering a new line
 *     static void startLine(byte textDirection, int lineWidth) {
 *         System.out.println();
 *     }
 *
 *     // render a run of text and advance to the right by the run width
 *     // the text[start..limit-1] is always in logical order
 *     static void renderRun(String text, int start, int limit,
 *                           byte textDirection, int style) {
 *     }
 *
 *     // We could compute a cross-product
 *     // from the style runs with the directional runs
 *     // and then reorder it.
 *     // Instead, here we iterate over each run type
 *     // and render the intersections -
 *     // with shortcuts in simple (and common) cases.
 *     // renderParagraph() is the main function.
 *
 *     // render a directional run with
 *     // (possibly) multiple style runs intersecting with it
 *     static void renderDirectionalRun(String text, int start, int limit,
 *                                      byte direction, StyleRun styleRuns[],
 *                                      int styleRunCount) {
 *         int i;
 *
 *         // iterate over style runs
 *         if (direction == Bidi.LTR) {
 *             int styleLimit;
 *             for (i = 0; i < styleRunCount; ++i) {
 *                 styleLimit = styleRuns[i].limit;
 *                 if (start < styleLimit) {
 *                     if (styleLimit > limit) {
 *                         styleLimit = limit;
 *                     }
 *                     renderRun(text, start, styleLimit,
 *                               direction, styleRuns[i].style);
 *                     if (styleLimit == limit) {
 *                         break;
 *                     }
 *                     start = styleLimit;
 *                 }
 *             }
 *         } else {
 *             int styleStart;
 *
 *             for (i = styleRunCount-1; i >= 0; --i) {
 *                 if (i > 0) {
 *                     styleStart = styleRuns[i-1].limit;
 *                 } else {
 *                     styleStart = 0;
 *                 }
 *                 if (limit >= styleStart) {
 *                     if (styleStart < start) {
 *                         styleStart = start;
 *                     }
 *                     renderRun(text, styleStart, limit, direction,
 *                               styleRuns[i].style);
 *                     if (styleStart == start) {
 *                         break;
 *                     }
 *                     limit = styleStart;
 *                 }
 *             }
 *         }
 *     }
 *
 *     // the line object represents text[start..limit-1]
 *     static void renderLine(Bidi line, String text, int start, int limit,
 *                            StyleRun styleRuns[], int styleRunCount) {
 *         byte direction =
 *             line.getDirection();
 *         if (direction != Bidi.MIXED) {
 *             // unidirectional
 *             if (styleRunCount <= 1) {
 *                 renderRun(text, start, limit, direction, styleRuns[0].style);
 *             } else {
 *                 renderDirectionalRun(text, start, limit, direction,
 *                                      styleRuns, styleRunCount);
 *             }
 *         } else {
 *             // mixed-directional
 *             int count, i;
 *             BidiRun run;
 *
 *             try {
 *                 count = line.countRuns();
 *             } catch (IllegalStateException e) {
 *                 e.printStackTrace();
 *                 return;
 *             }
 *             if (styleRunCount <= 1) {
 *                 int style = styleRuns[0].style;
 *
 *                 // iterate over directional runs
 *                 for (i = 0; i < count; ++i) {
 *                     run = line.getVisualRun(i);
 *                     renderRun(text, run.getStart(), run.getLimit(),
 *                               run.getDirection(), style);
 *                 }
 *             } else {
 *                 // iterate over both directional and style runs
 *                 for (i = 0; i < count; ++i) {
 *                     run = line.getVisualRun(i);
 *                     renderDirectionalRun(text, run.getStart(),
 *                                          run.getLimit(), run.getDirection(),
 *                                          styleRuns, styleRunCount);
 *                 }
 *             }
 *         }
 *     }
 *
 *     static void renderParagraph(String text, byte textDirection,
 *                                 StyleRun styleRuns[], int styleRunCount,
 *                                 int lineWidth) {
 *         int length = text.length();
 *         Bidi para = new Bidi();
 *         try {
 *             para.setPara(text,
 *                          textDirection != 0 ?
 *                              Bidi.LEVEL_DEFAULT_RTL
 *                              : Bidi.LEVEL_DEFAULT_LTR,
 *                          null);
 *         } catch (Exception e) {
 *             e.printStackTrace();
 *             return;
 *         }
 *         byte paraLevel = (byte)(1 & para.getParaLevel());
 *         StyleRun styleRun = new StyleRun(length, styleNormal);
 *
 *         if (styleRuns == null || styleRunCount <= 0) {
 *             styleRuns = new StyleRun[1];
 *             styleRunCount = 1;
 *             styleRuns[0] = styleRun;
 *         }
 *         // assume styleRuns[styleRunCount-1].limit>=length
 *
 *         int width = getTextWidth(text, 0, length, styleRuns, styleRunCount);
 *         if (width <= lineWidth) {
 *             // everything fits onto one line
 *
 *             // prepare rendering a new line from either left or right
 *             startLine(paraLevel, width);
 *
 *             renderLine(para, text, 0, length, styleRuns, styleRunCount);
 *         } else {
 *             // we need to render several lines
 *             Bidi line = new Bidi(length, 0);
 *             int start = 0, limit;
 *             int styleRunStart = 0, styleRunLimit;
 *
 *             for (;;) {
 *                 limit = length;
 *                 styleRunLimit = styleRunCount;
 *                 width = getLineBreak(text, new Bounds(start, limit),
 *                                      para, styleRuns,
 *                                      new Bounds(styleRunStart, styleRunLimit));
 *                 try {
 *                     line = para.setLine(start, limit);
 *                 } catch (Exception e) {
 *                     e.printStackTrace();
 *                     return;
 *                 }
 *                 // prepare rendering a new line
 *                 // from either left or right
 *                 startLine(paraLevel, width);
 *
 *                 if (styleRunStart > 0) {
 *                     int newRunCount = styleRuns.length - styleRunStart;
 *                     StyleRun[] newRuns = new StyleRun[newRunCount];
 *                     System.arraycopy(styleRuns, styleRunStart, newRuns, 0,
 *                                      newRunCount);
 *                     renderLine(line, text, start, limit, newRuns,
 *                                styleRunLimit - styleRunStart);
 *                 } else {
 *                     renderLine(line, text, start, limit, styleRuns,
 *                                styleRunLimit - styleRunStart);
 *                 }
 *                 if (limit == length) {
 *                     break;
 *                 }
 *                 start = limit;
 *                 styleRunStart = styleRunLimit - 1;
 *                 if (start >= styleRuns[styleRunStart].limit) {
 *                     ++styleRunStart;
 *                 }
 *             }
 *         }
 *     }
 *
 *     public static void main(String[] args)
 *     {
 *         renderParagraph("Some Latin text...", Bidi.LTR, null, 0, 80);
 *         renderParagraph("Some Hebrew text...", Bidi.RTL, null, 0, 60);
 *     }
 * }
 *
 * }</pre>
 */

/*
 * General implementation notes:
 *
 * Throughout the implementation, there are comments like (W2) that refer to
 * rules of the BiDi algorithm, in this example to the second rule of the
 * resolution of weak types.
 *
 * For handling surrogate pairs, where two UChar's form one "abstract" (or UTF-32)
 * character according to UTF-16, the second UChar gets the directional property of
 * the entire character assigned, while the first one gets a BN, a boundary
 * neutral, type, which is ignored by most of the algorithm according to
 * rule (X9) and the implementation suggestions of the BiDi algorithm.
 *
 * Later, adjustWSLevels() will set the level for each BN to that of the
 * following character (UChar), which results in surrogate pairs getting the
 * same level on each of their surrogates.
 *
 * In a UTF-8 implementation, the same thing could be done: the last byte of
 * a multi-byte sequence would get the "real" property, while all previous
 * bytes of that sequence would get BN.
 *
 * It is not possible to assign all those parts of a character the same real
 * property because this would fail in the resolution of weak types with rules
 * that look at immediately surrounding types.
 *
 * As a related topic, this implementation does not remove Boundary Neutral
 * types from the input, but ignores them wherever this is relevant.
 * For example, the loop for the resolution of the weak types reads
 * types until it finds a non-BN.
 * Also, explicit embedding codes are neither changed into BN nor removed.
 * They are only treated the same way real BNs are.
 * As stated before, adjustWSLevels() takes care of them at the end.
 * For the purpose of conformance, the levels of all these codes
 * do not matter.
 *
 * Note that this implementation modifies the
 * dirProps
 * after the initial setup, when applying X5c (replace FSI by LRI or RLI),
 * X6, N0 (replace paired brackets by L or R).
 *
 * In this implementation, the resolution of weak types (W1 to W6),
 * neutrals (N1 and N2), and the assignment of the resolved level (In)
 * are all done in one single loop, in resolveImplicitLevels().
 * Changes of dirProp values are done on the fly, without writing
 * them back to the dirProps array.
 *
 *
 * This implementation contains code that allows bypassing steps of the
 * algorithm that are not needed on the specific paragraph
 * in order to speed up the most common cases considerably,
 * like text that is entirely LTR, or RTL text without numbers.
 *
 * Most of this is done by setting a bit for each directional property
 * in a flags variable and later checking whether there are
 * any LTR characters or any RTL characters, or both, whether
 * there are any explicit embedding codes, etc.
 *
 * If the (Xn) steps are performed, then the flags are re-evaluated,
 * because they will then not contain the embedding codes any more
 * and will be adjusted for override codes, so that subsequently
 * more bypassing may be possible than what the initial flags suggested.
 *
 * If the text is not mixed-directional, then the
 * algorithm steps for the weak type resolution are not performed,
 * and all levels are set to the paragraph level.
 *
 * If there are no explicit embedding codes, then the (Xn) steps
 * are not performed.
 *
 * If embedding levels are supplied as a parameter, then all
 * explicit embedding codes are ignored, and the (Xn) steps
 * are not performed.
 *
 * White Space types could get the level of the run they belong to,
 * and are checked with a test of (flags&MASK_EMBEDDING) to
 * consider if the paragraph direction should be considered in
 * the flags variable.
 *
 * If there are no White Space types in the paragraph, then
 * (L1) is not necessary in
 * adjustWSLevels().
 */

// Original filename in ICU4J: Bidi.java
public class BidiBase {

    static class Point {
        int pos;    /* position in text */
        int flag;   /* flag for LRM/RLM, before/after */
    }

    static class InsertPoints {
        int size;
        int confirmed;
        Point[] points = new Point[0];
    }

    static class Opening {
        int position;       /* position of opening bracket */
        int match;          /* matching char or -position of closing bracket */
        int contextPos;     /* position of last strong char found before opening */
        short flags;        /* bits for L or R/AL found within the pair */
        byte contextDir;    /* L or R according to last strong char before opening */
    }

    static class IsoRun {
        int contextPos;     /* position of char determining context */
        short start;        /* index of first opening entry for this run */
        short limit;        /* index after last opening entry for this run */
        byte level;         /* level of this run */
        byte lastStrong;    /* bidi class of last strong char found in this run */
        byte lastBase;      /* bidi class of last base char found in this run */
        byte contextDir;    /* L or R to use as context for following openings */
    }

    static class BracketData {
        Opening[] openings = new Opening[SIMPLE_PARAS_COUNT];
        int isoRunLast;     /* index of last used entry */
        /* array of nested isolated sequence entries; can never exceed UBIDI_MAX_EXPLICIT_LEVEL
           + 1 for index 0, + 1 for before the first isolated sequence */
        IsoRun[] isoRuns = new IsoRun[MAX_EXPLICIT_LEVEL+2];
        boolean isNumbersSpecial;   /* reordering mode for NUMBERS_SPECIAL */
    }

    static class Isolate {
        int startON;
        int start1;
        short stateImp;
        short state;
    }

    /** Paragraph level setting<p>
     *
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode Bidirectional
     * Algorithm.
     * If no strong directional character is present,
     * then set the paragraph level to 0 (left-to-right).<p>
     *
     * If this value is used in conjunction with reordering modes
     * <code>REORDER_INVERSE_LIKE_DIRECT</code> or
     * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
     * is assumed to be visual LTR, and the text after reordering is required
     * to be the corresponding logical string with appropriate contextual
     * direction. The direction of the result string will be RTL if either
     * the rightmost or leftmost strong character of the source text is RTL
     * or Arabic Letter; the direction will be LTR otherwise.<p>
     *
     * If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
     * be added at the beginning of the result string to ensure round trip
     * (that the result string, when reordered back to visual, will produce
     * the original source text).
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     * @stable ICU 3.8
     */
    public static final byte LEVEL_DEFAULT_LTR = (byte)0x7e;

    /** Paragraph level setting<p>
     *
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode Bidirectional
     * Algorithm. If no strong directional character is present,
     * then set the paragraph level to 1 (right-to-left).<p>
     *
     * If this value is used in conjunction with reordering modes
     * <code>REORDER_INVERSE_LIKE_DIRECT</code> or
     * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
     * is assumed to be visual LTR, and the text after reordering is required
     * to be the corresponding logical string with appropriate contextual
     * direction.
     * The direction of the result string will be RTL if either
     * the rightmost or leftmost strong character of the source text is RTL
     * or Arabic Letter, or if the text contains no strong character;
     * the direction will be LTR otherwise.<p>
     *
     * If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
     * be added at the beginning of the result string to ensure round trip
     * (that the result string, when reordered back to visual, will produce
     * the original source text).
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     * @stable ICU 3.8
     */
    public static final byte LEVEL_DEFAULT_RTL = (byte)0x7f;

    /**
     * Maximum explicit embedding level.
     * (The maximum resolved level can be up to <code>MAX_EXPLICIT_LEVEL+1</code>).
     * @stable ICU 3.8
     */
    public static final byte MAX_EXPLICIT_LEVEL = 125;

    /**
     * Bit flag for level input.
     * Overrides directional properties.
     * @stable ICU 3.8
     */
    public static final byte LEVEL_OVERRIDE = (byte)0x80;

    /**
     * Special value which can be returned by the mapping methods when a
     * logical index has no corresponding visual index or vice-versa. This may
     * happen for the logical-to-visual mapping of a Bidi control when option
     * <code>OPTION_REMOVE_CONTROLS</code> is
     * specified.
     * This can also happen for the visual-to-logical mapping of a
     * Bidi mark (LRM or RLM) inserted by option
     * <code>OPTION_INSERT_MARKS</code>.
     * @see #getVisualIndex
     * @see #getVisualMap
     * @see #getLogicalIndex
     * @see #getLogicalMap
     * @see #OPTION_INSERT_MARKS
     * @see #OPTION_REMOVE_CONTROLS
     * @stable ICU 3.8
     */
    public static final int MAP_NOWHERE = -1;

    /**
     * Left-to-right text.
     * <ul>
     * <li>As return value for <code>getDirection()</code>, it means
     * that the source string contains no right-to-left characters, or
     * that the source string is empty and the paragraph level is even.
     * <li>As return value for <code>getBaseDirection()</code>, it
     * means that the first strong character of the source string has
     * a left-to-right direction.
     * </ul>
     * @stable ICU 3.8
     */
    public static final byte LTR = 0;

    /**
     * Right-to-left text.
     * <ul>
     * <li>As return value for <code>getDirection()</code>, it means
     * that the source string contains no left-to-right characters, or
     * that the source string is empty and the paragraph level is odd.
     * <li>As return value for <code>getBaseDirection()</code>, it
     * means that the first strong character of the source string has
     * a right-to-left direction.
     * </ul>
     * @stable ICU 3.8
     */
    public static final byte RTL = 1;

    /**
     * Mixed-directional text.
     * <p>As return value for <code>getDirection()</code>, it means
     * that the source string contains both left-to-right and
     * right-to-left characters.
     * @stable ICU 3.8
     */
    public static final byte MIXED = 2;

    /**
     * option bit for writeReordered():
     * keep combining characters after their base characters in RTL runs
     *
     * @see #writeReordered
     * @stable ICU 3.8
     */
    public static final short KEEP_BASE_COMBINING = 1;

    /**
     * option bit for writeReordered():
     * replace characters with the "mirrored" property in RTL runs
     * by their mirror-image mappings
     *
     * @see #writeReordered
     * @stable ICU
     * 3.8
     */
    public static final short DO_MIRRORING = 2;

    /**
     * option bit for writeReordered():
     * surround the run with LRMs if necessary;
     * this is part of the approximate "inverse Bidi" algorithm
     *
     * <p>This option does not imply corresponding adjustment of the index
     * mappings.</p>
     *
     * @see #setInverse
     * @see #writeReordered
     * @stable ICU 3.8
     */
    public static final short INSERT_LRM_FOR_NUMERIC = 4;

    /**
     * option bit for writeReordered():
     * remove Bidi control characters
     * (this does not affect INSERT_LRM_FOR_NUMERIC)
     *
     * <p>This option does not imply corresponding adjustment of the index
     * mappings.</p>
     *
     * @see #writeReordered
     * @see #INSERT_LRM_FOR_NUMERIC
     * @stable ICU 3.8
     */
    public static final short REMOVE_BIDI_CONTROLS = 8;

    /**
     * option bit for writeReordered():
     * write the output in reverse order
     *
     * <p>This has the same effect as calling <code>writeReordered()</code>
     * first without this option, and then calling
     * <code>writeReverse()</code> without mirroring.
     * Doing this in the same step is faster and avoids a temporary buffer.
     * An example for using this option is output to a character terminal that
     * is designed for RTL scripts and stores text in reverse order.</p>
     *
     * @see #writeReordered
     * @stable ICU 3.8
     */
    public static final short OUTPUT_REVERSE = 16;

    /** Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode.
     * @see #setReorderingMode
     * @stable ICU 3.8
     */
    private static final short REORDER_DEFAULT = 0;

    /** Reordering mode: Logical to Visual algorithm which handles numbers in
     * a way which mimics the behavior of Windows XP.
     * @see #setReorderingMode
     * @stable ICU 3.8
     */
    private static final short REORDER_NUMBERS_SPECIAL = 1;

    /** Reordering mode: Logical to Visual algorithm grouping numbers with
     * adjacent R characters (reversible algorithm).
     * @see #setReorderingMode
     * @stable ICU
     * 3.8
     */
    private static final short REORDER_GROUP_NUMBERS_WITH_R = 2;

    /** Reordering mode: Reorder runs only to transform a Logical LTR string
     * to the logical RTL string with the same display, or vice-versa.<br>
     * If this mode is set together with option
     * <code>OPTION_INSERT_MARKS</code>, some Bidi controls in the source
     * text may be removed and other controls may be added to produce the
     * minimum combination which has the required display.
     * @see #OPTION_INSERT_MARKS
     * @see #setReorderingMode
     * @stable ICU 3.8
     */
    static final short REORDER_RUNS_ONLY = 3;

    /** Reordering mode: Visual to Logical algorithm which handles numbers
     * like L (same algorithm as selected by <code>setInverse(true)</code>).
     * @see #setInverse
     * @see #setReorderingMode
     * @stable ICU 3.8
     */
    static final short REORDER_INVERSE_NUMBERS_AS_L = 4;

    /** Reordering mode: Visual to Logical algorithm equivalent to the regular
     * Logical to Visual algorithm.
     * @see #setReorderingMode
     * @stable ICU 3.8
     */
    static final short REORDER_INVERSE_LIKE_DIRECT = 5;

    /** Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the
     * <code>REORDER_NUMBERS_SPECIAL</code> Bidi algorithm.
     * @see #setReorderingMode
     * @stable ICU 3.8
     */
    static final short REORDER_INVERSE_FOR_NUMBERS_SPECIAL = 6;

    /* Reordering mode values must be ordered so that all the regular logical to
     * visual modes come first, and all inverse Bidi modes come last.
     */
    private static final short REORDER_LAST_LOGICAL_TO_VISUAL =
            REORDER_NUMBERS_SPECIAL;

    /**
     * Option bit for <code>setReorderingOptions</code>:
     * insert Bidi marks (LRM or RLM) when needed to ensure correct result of
     * a reordering to a Logical order
     *
     * <p>This option must be set or reset before calling
     * <code>setPara</code>.</p>
     *
     * <p>This option is significant only with reordering modes which generate
     * a result with Logical order, specifically:</p>
     * <ul>
     * <li><code>REORDER_RUNS_ONLY</code></li>
     * <li><code>REORDER_INVERSE_NUMBERS_AS_L</code></li>
     * <li><code>REORDER_INVERSE_LIKE_DIRECT</code></li>
     * <li><code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code></li>
     * </ul>
     *
     * <p>If this option is set in conjunction with reordering mode
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code> or with calling
     * <code>setInverse(true)</code>, it implies option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
     * <code>writeReordered()</code>.</p>
     *
     * <p>For other reordering modes, a minimum number of LRM or RLM characters
     * will be added to the source text after reordering it so as to ensure
     * round trip, i.e. when applying the inverse reordering mode on the
     * resulting logical text with removal of Bidi marks
     * (option <code>OPTION_REMOVE_CONTROLS</code> set before calling
     * <code>setPara()</code> or option
     * <code>REMOVE_BIDI_CONTROLS</code> in
     * <code>writeReordered</code>), the result will be identical to the
     * source text in the first transformation.
     *
     * <p>This option will be ignored if specified together with option
     * <code>OPTION_REMOVE_CONTROLS</code>.
     * It inhibits option
     * <code>REMOVE_BIDI_CONTROLS</code> in calls to method
     * <code>writeReordered()</code> and it implies option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
     * <code>writeReordered()</code> if the reordering mode is
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code>.</p>
     *
     * @see #setReorderingMode
     * @see #setReorderingOptions
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #REMOVE_BIDI_CONTROLS
     * @see #OPTION_REMOVE_CONTROLS
     * @see #REORDER_RUNS_ONLY
     * @see #REORDER_INVERSE_NUMBERS_AS_L
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     * @stable ICU 3.8
     */
    static final int OPTION_INSERT_MARKS = 1;

    /**
     * Option bit for <code>setReorderingOptions</code>:
     * remove Bidi control characters
     *
     * <p>This option must be set or reset before calling
     * <code>setPara</code>.</p>
     *
     * <p>This option nullifies option
     * <code>OPTION_INSERT_MARKS</code>. It inhibits option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
     * <code>writeReordered()</code> and it implies option
     * <code>REMOVE_BIDI_CONTROLS</code> in calls to that method.</p>
     *
     * @see #setReorderingMode
     * @see #setReorderingOptions
     * @see #OPTION_INSERT_MARKS
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #REMOVE_BIDI_CONTROLS
     * @stable ICU 3.8
     */
    static final int OPTION_REMOVE_CONTROLS = 2;

    /**
     * Option bit for <code>setReorderingOptions</code>:
     * process the output as part of a stream to be continued
     *
     * <p>This option must be set or reset before calling
     * <code>setPara</code>.</p>
     *
     * <p>This option specifies that the caller is interested in processing
     * a large text object in parts. The results of the successive calls are
     * expected to be concatenated by the caller. Only the call for the last
Only the call for the last913* part will have this option bit off.</p>914*915* <p>When this option bit is on, <code>setPara()</code> may process916* less than the full source text in order to truncate the text at a917* meaningful boundary. The caller should call918* <code>getProcessedLength()</code> immediately after calling919* <code>setPara()</code> in order to determine how much of the source920* text has been processed. Source text beyond that length should be921* resubmitted in following calls to <code>setPara</code>. The922* processed length may be less than the length of the source text if a923* character preceding the last character of the source text constitutes a924* reasonable boundary (like a block separator) for text to be continued.<br>925* If the last character of the source text constitutes a reasonable926* boundary, the whole text will be processed at once.<br>927* If nowhere in the source text there exists928* such a reasonable boundary, the processed length will be zero.<br>929* The caller should check for such an occurrence and do one of the following:930* <ul><li>submit a larger amount of text with a better chance to include931* a reasonable boundary.</li>932* <li>resubmit the same text after turning off option933* <code>OPTION_STREAMING</code>.</li></ul>934* In all cases, this option should be turned off before processing the last935* part of the text.</p>936*937* <p>When the <code>OPTION_STREAMING</code> option is used, it is938* recommended to call <code>orderParagraphsLTR(true)</code> before calling939* <code>setPara()</code> so that later paragraphs may be concatenated to940* previous paragraphs on the right.941* </p>942*943* @see #setReorderingMode944* @see #setReorderingOptions945* @see #getProcessedLength946* @stable ICU 3.8947*/948private static final int OPTION_STREAMING = 4;949950/*951* Comparing the description of the Bidi algorithm with this implementation952* is easier with the same names for the Bidi types in the code as 
there.953* See UCharacterDirection954*/955/* private */ static final byte L = 0;956private static final byte R = 1;957private static final byte EN = 2;958private static final byte ES = 3;959private static final byte ET = 4;960private static final byte AN = 5;961private static final byte CS = 6;962static final byte B = 7;963private static final byte S = 8;964private static final byte WS = 9;965private static final byte ON = 10;966private static final byte LRE = 11;967private static final byte LRO = 12;968private static final byte AL = 13;969private static final byte RLE = 14;970private static final byte RLO = 15;971private static final byte PDF = 16;972private static final byte NSM = 17;973private static final byte BN = 18;974private static final byte FSI = 19;975private static final byte LRI = 20;976private static final byte RLI = 21;977private static final byte PDI = 22;978private static final byte ENL = PDI + 1; /* EN after W7 */979private static final byte ENR = ENL + 1; /* EN not subject to W7 */980981// Number of directional types982private static final int CHAR_DIRECTION_COUNT = 23;983984/**985* Enumerated property Bidi_Paired_Bracket_Type (new in Unicode 6.3).986* Used in987* <a href="http://www.unicode.org/reports/tr9/">Unicode Standard Annex #9:988* Unicode Bidirectional Algorithm</a>.989* Returns UCharacter.BidiPairedBracketType values.990* @stable ICU 52991*/992public static final int BIDI_PAIRED_BRACKET_TYPE = 0x1015;993994/**995* Bidi Paired Bracket Type constants.996*997* @see UProperty#BIDI_PAIRED_BRACKET_TYPE998* @stable ICU 52999*/1000public static interface BidiPairedBracketType {1001/**1002* Not a paired bracket.1003* @stable ICU 521004*/1005public static final int NONE = 0;1006/**1007* Open paired bracket.1008* @stable ICU 521009*/1010public static final int OPEN = 1;1011/**1012* Close paired bracket.1013* @stable ICU 521014*/1015public static final int CLOSE = 2;1016/**1017* @stable ICU 521018*/1019public static final int COUNT = 
3;1020}10211022/* number of paras entries allocated initially */1023static final int SIMPLE_PARAS_COUNT = 10;10241025private static final char CR = '\r';1026private static final char LF = '\n';10271028static final int LRM_BEFORE = 1;1029static final int LRM_AFTER = 2;1030static final int RLM_BEFORE = 4;1031static final int RLM_AFTER = 8;10321033/* flags for Opening.flags */1034static final byte FOUND_L = (byte)DirPropFlag(L);1035static final byte FOUND_R = (byte)DirPropFlag(R);10361037/*1038* The following bit is used for the directional isolate status.1039* Stack entries corresponding to isolate sequences are greater than ISOLATE.1040*/1041static final int ISOLATE = 0x0100;10421043/*1044* reference to parent paragraph object (reference to self if this object is1045* a paragraph object); set to null in a newly opened object; set to a1046* real value after a successful execution of setPara or setLine1047*/1048BidiBase paraBidi;10491050final UBiDiProps bdp;10511052/* character array representing the current text */1053char[] text;10541055/* length of the current text */1056int originalLength;10571058/* if the option OPTION_STREAMING is set, this is the length of1059* text actually processed by <code>setPara</code>, which may be shorter1060* than the original length. Otherwise, it is identical to the original1061* length.1062*/1063public int length;10641065/* if option OPTION_REMOVE_CONTROLS is set, and/or Bidi1066* marks are allowed to be inserted in one of the reordering modes, the1067* length of the result string may be different from the processed length.1068*/1069int resultLength;10701071/* indicators for whether memory may be allocated after construction */1072boolean mayAllocateText;1073boolean mayAllocateRuns;10741075/* arrays with one value per text-character */1076byte[] dirPropsMemory = new byte[1];1077byte[] levelsMemory = new byte[1];1078byte[] dirProps;1079byte[] levels;10801081/* are we performing an approximation of the "inverse Bidi" algorithm? 
*/1082boolean isInverse;10831084/* are we using the basic algorithm or its variation? */1085int reorderingMode;10861087/* bitmask for reordering options */1088int reorderingOptions;10891090/* must block separators receive level 0? */1091boolean orderParagraphsLTR;10921093/* the paragraph level */1094byte paraLevel;10951096/* original paraLevel when contextual */1097/* must be one of DEFAULT_xxx or 0 if not contextual */1098byte defaultParaLevel;10991100/* the following is set in setPara, used in processPropertySeq */11011102ImpTabPair impTabPair; /* reference to levels state table pair */11031104/* the overall paragraph or line directionality*/1105byte direction;11061107/* flags is a bit set for which directional properties are in the text */1108int flags;11091110/* lastArabicPos is index to the last AL in the text, -1 if none */1111int lastArabicPos;11121113/* characters after trailingWSStart are WS and are */1114/* implicitly at the paraLevel (rule (L1)) - levels may not reflect that */1115int trailingWSStart;11161117/* fields for paragraph handling, set in getDirProps() */1118int paraCount;1119int[] paras_limit = new int[SIMPLE_PARAS_COUNT];1120byte[] paras_level = new byte[SIMPLE_PARAS_COUNT];11211122/* fields for line reordering */1123int runCount; /* ==-1: runs not set up yet */1124BidiRun[] runsMemory = new BidiRun[0];1125BidiRun[] runs;11261127/* for non-mixed text, we only need a tiny array of runs (no allocation) */1128BidiRun[] simpleRuns = {new BidiRun()};11291130/* fields for managing isolate sequences */1131Isolate[] isolates;11321133/* maximum or current nesting depth of isolate sequences */1134/* Within resolveExplicitLevels() and checkExplicitLevels(), this is the maximal1135nesting encountered.1136Within resolveImplicitLevels(), this is the index of the current isolates1137stack entry. 
*/1138int isolateCount;11391140/* mapping of runs in logical order to visual order */1141int[] logicalToVisualRunsMap;1142/* flag to indicate that the map has been updated */1143boolean isGoodLogicalToVisualRunsMap;11441145/* for inverse Bidi with insertion of directional marks */1146InsertPoints insertPoints = new InsertPoints();11471148/* for option OPTION_REMOVE_CONTROLS */1149int controlCount;11501151/*1152* Sometimes, bit values are more appropriate1153* to deal with directionality properties.1154* Abbreviations in these method names refer to names1155* used in the Bidi algorithm.1156*/1157static int DirPropFlag(byte dir) {1158return (1 << dir);1159}11601161boolean testDirPropFlagAt(int flag, int index) {1162return ((DirPropFlag(dirProps[index]) & flag) != 0);1163}11641165static final int DirPropFlagMultiRuns = DirPropFlag((byte)31);11661167/* to avoid some conditional statements, use tiny constant arrays */1168static final int DirPropFlagLR[] = { DirPropFlag(L), DirPropFlag(R) };1169static final int DirPropFlagE[] = { DirPropFlag(LRE), DirPropFlag(RLE) };1170static final int DirPropFlagO[] = { DirPropFlag(LRO), DirPropFlag(RLO) };11711172static final int DirPropFlagLR(byte level) { return DirPropFlagLR[level & 1]; }1173static final int DirPropFlagE(byte level) { return DirPropFlagE[level & 1]; }1174static final int DirPropFlagO(byte level) { return DirPropFlagO[level & 1]; }1175static final byte DirFromStrong(byte strong) { return strong == L ? L : R; }1176static final byte NoOverride(byte level) { return (byte)(level & ~LEVEL_OVERRIDE); }11771178/* are there any characters that are LTR or RTL? 
     */
    static final int MASK_LTR =
        DirPropFlag(L)|DirPropFlag(EN)|DirPropFlag(ENL)|DirPropFlag(ENR)|DirPropFlag(AN)|DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(LRI);
    static final int MASK_RTL = DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(RLI);

    static final int MASK_R_AL = DirPropFlag(R)|DirPropFlag(AL);

    /* explicit embedding codes */
    private static final int MASK_EXPLICIT = DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(PDF);
    private static final int MASK_BN_EXPLICIT = DirPropFlag(BN)|MASK_EXPLICIT;

    /* explicit isolate codes */
    private static final int MASK_ISO = DirPropFlag(LRI)|DirPropFlag(RLI)|DirPropFlag(FSI)|DirPropFlag(PDI);

    /* paragraph and segment separators */
    private static final int MASK_B_S = DirPropFlag(B)|DirPropFlag(S);

    /* all types that are counted as White Space or Neutral in some steps */
    static final int MASK_WS = MASK_B_S|DirPropFlag(WS)|MASK_BN_EXPLICIT|MASK_ISO;

    /* types that are neutrals or could become neutrals in (Wn) */
    private static final int MASK_POSSIBLE_N = DirPropFlag(ON)|DirPropFlag(CS)|DirPropFlag(ES)|DirPropFlag(ET)|MASK_WS;

    /*
     * These types may be changed to "e",
     * the embedding type (L or R) of the run,
     * in the Bidi algorithm (N2)
     */
    private static final int MASK_EMBEDDING = DirPropFlag(NSM)|MASK_POSSIBLE_N;

    /*
     * the dirProp's L and R are defined to 0 and 1 values in UCharacterDirection.java
     */
    private static byte GetLRFromLevel(byte level)
    {
        return (byte)(level & 1);
    }

    private static boolean IsDefaultLevel(byte level)
    {
        return ((level & LEVEL_DEFAULT_LTR) == LEVEL_DEFAULT_LTR);
    }

    static boolean IsBidiControlChar(int c)
    {
        /* check for range 0x200c to 0x200f (ZWNJ, ZWJ, LRM, RLM),
           0x202a to 0x202e (LRE, RLE, PDF, LRO, RLO) or
           0x2066 to 0x2069 (LRI, RLI, FSI, PDI) */
        return (((c & 0xfffffffc) == 0x200c) || ((c >= 0x202a) && (c <= 0x202e))
                || ((c >=
0x2066) && (c <= 0x2069)));1227}12281229void verifyValidPara()1230{1231if (!(this == this.paraBidi)) {1232throw new IllegalStateException();1233}1234}12351236void verifyValidParaOrLine()1237{1238BidiBase para = this.paraBidi;1239/* verify Para */1240if (this == para) {1241return;1242}1243/* verify Line */1244if ((para == null) || (para != para.paraBidi)) {1245throw new IllegalStateException();1246}1247}12481249void verifyRange(int index, int start, int limit)1250{1251if (index < start || index >= limit) {1252throw new IllegalArgumentException("Value " + index +1253" is out of range " + start + " to " + limit);1254}1255}12561257/**1258* Allocate a <code>Bidi</code> object with preallocated memory1259* for internal structures.1260* This method provides a <code>Bidi</code> object like the default constructor1261* but it also preallocates memory for internal structures1262* according to the sizings supplied by the caller.<p>1263* The preallocation can be limited to some of the internal memory1264* by setting some values to 0 here. That means that if, e.g.,1265* <code>maxRunCount</code> cannot be reasonably predetermined and should not1266* be set to <code>maxLength</code> (the only failproof value) to avoid1267* wasting memory, then <code>maxRunCount</code> could be set to 0 here1268* and the internal structures that are associated with it will be allocated1269* on demand, just like with the default constructor.1270*1271* @param maxLength is the maximum text or line length that internal memory1272* will be preallocated for. An attempt to associate this object with a1273* longer text will fail, unless this value is 0, which leaves the allocation1274* up to the implementation.1275*1276* @param maxRunCount is the maximum anticipated number of same-level runs1277* that internal memory will be preallocated for. 
An attempt to access
     * visual runs on an object that was not preallocated for as many runs
     * as the text was actually resolved to will fail,
     * unless this value is 0, which leaves the allocation up to the implementation.<br><br>
     * The number of runs depends on the actual text and may be anywhere between
     * 1 and <code>maxLength</code>. It is typically small.
     *
     * @throws IllegalArgumentException if maxLength or maxRunCount is less than 0
     * @stable ICU 3.8
     */
    public BidiBase(int maxLength, int maxRunCount)
    {
        /* check the argument values */
        if (maxLength < 0 || maxRunCount < 0) {
            throw new IllegalArgumentException();
        }

        /* reset the object, all reference variables null, all flags false,
           all sizes 0.
           In fact, we don't need to do anything, since class members are
           initialized as zero when an instance is created.
         */
        /*
        mayAllocateText = false;
        mayAllocateRuns = false;
        orderParagraphsLTR = false;
        paraCount = 0;
        runCount = 0;
        trailingWSStart = 0;
        flags = 0;
        paraLevel = 0;
        defaultParaLevel = 0;
        direction = 0;
        */
        /* get Bidi properties */
        bdp = UBiDiProps.INSTANCE;

        /* allocate memory for arrays as requested */
        if (maxLength > 0) {
            getInitialDirPropsMemory(maxLength);
            getInitialLevelsMemory(maxLength);
        } else {
            mayAllocateText = true;
        }

        if (maxRunCount > 0) {
            // if maxRunCount == 1, use simpleRuns[]
            if (maxRunCount > 1) {
                getInitialRunsMemory(maxRunCount);
            }
        } else {
            mayAllocateRuns = true;
        }
    }

    /*
     * We are allowed to allocate memory if object==null or
     * mayAllocate==true for each array that we need.
     *
     * Assume sizeNeeded>0.
     * If object != null, then assume size > 0.
     */
    private Object getMemory(String label, Object array, Class<?> arrayClass,
            boolean mayAllocate, int sizeNeeded)
    {
        int len = Array.getLength(array);

        /* the existing array has exactly the size needed: reuse it */
        if
(sizeNeeded == len) {1346return array;1347}1348if (!mayAllocate) {1349/* we must not allocate */1350if (sizeNeeded <= len) {1351return array;1352}1353throw new OutOfMemoryError("Failed to allocate memory for "1354+ label);1355}1356/* we may try to grow or shrink */1357/* FOOD FOR THOUGHT: when shrinking it should be possible to avoid1358the allocation altogether and rely on this.length */1359try {1360return Array.newInstance(arrayClass, sizeNeeded);1361} catch (Exception e) {1362throw new OutOfMemoryError("Failed to allocate memory for "1363+ label);1364}1365}13661367/* helper methods for each allocated array */1368private void getDirPropsMemory(boolean mayAllocate, int len)1369{1370Object array = getMemory("DirProps", dirPropsMemory, Byte.TYPE, mayAllocate, len);1371dirPropsMemory = (byte[]) array;1372}13731374void getDirPropsMemory(int len)1375{1376getDirPropsMemory(mayAllocateText, len);1377}13781379private void getLevelsMemory(boolean mayAllocate, int len)1380{1381Object array = getMemory("Levels", levelsMemory, Byte.TYPE, mayAllocate, len);1382levelsMemory = (byte[]) array;1383}13841385void getLevelsMemory(int len)1386{1387getLevelsMemory(mayAllocateText, len);1388}13891390private void getRunsMemory(boolean mayAllocate, int len)1391{1392Object array = getMemory("Runs", runsMemory, BidiRun.class, mayAllocate, len);1393runsMemory = (BidiRun[]) array;1394}13951396void getRunsMemory(int len)1397{1398getRunsMemory(mayAllocateRuns, len);1399}14001401/* additional methods used by constructor - always allow allocation */1402private void getInitialDirPropsMemory(int len)1403{1404getDirPropsMemory(true, len);1405}14061407private void getInitialLevelsMemory(int len)1408{1409getLevelsMemory(true, len);1410}14111412private void getInitialRunsMemory(int len)1413{1414getRunsMemory(true, len);1415}14161417/**1418* Is this <code>Bidi</code> object set to perform the inverse Bidi1419* algorithm?1420* <p>Note: calling this method after setting the reordering mode with1421* 
<code>setReorderingMode</code> will return <code>true</code> if the1422* reordering mode was set to1423* <code>REORDER_INVERSE_NUMBERS_AS_L</code>, <code>false</code>1424* for all other values.</p>1425*1426* @return <code>true</code> if the <code>Bidi</code> object is set to1427* perform the inverse Bidi algorithm by handling numbers as L.1428*1429* @see #setInverse1430* @see #setReorderingMode1431* @see #REORDER_INVERSE_NUMBERS_AS_L1432* @stable ICU 3.81433*/1434public boolean isInverse() {1435return isInverse;1436}14371438/* perform (P2)..(P3) ------------------------------------------------------- */14391440/*1441* Check that there are enough entries in the arrays paras_limit and paras_level1442*/1443private void checkParaCount() {1444int[] saveLimits;1445byte[] saveLevels;1446int count = paraCount;1447if (count <= paras_level.length)1448return;1449int oldLength = paras_level.length;1450saveLimits = paras_limit;1451saveLevels = paras_level;1452try {1453paras_limit = new int[count * 2];1454paras_level = new byte[count * 2];1455} catch (Exception e) {1456throw new OutOfMemoryError("Failed to allocate memory for paras");1457}1458System.arraycopy(saveLimits, 0, paras_limit, 0, oldLength);1459System.arraycopy(saveLevels, 0, paras_level, 0, oldLength);1460}14611462/*1463* Get the directional properties for the text, calculate the flags bit-set, and1464* determine the paragraph level if necessary (in paras_level[i]).1465* FSI initiators are also resolved and their dirProp replaced with LRI or RLI.1466* When encountering an FSI, it is initially replaced with an LRI, which is the1467* default. 
Only if a strong R or AL is found within its scope will the LRI be
     * replaced by an RLI.
     */
    static final int NOT_SEEKING_STRONG = 0;        /* 0: not contextual paraLevel, not after FSI */
    static final int SEEKING_STRONG_FOR_PARA = 1;   /* 1: looking for first strong char in para */
    static final int SEEKING_STRONG_FOR_FSI = 2;    /* 2: looking for first strong after FSI */
    static final int LOOKING_FOR_PDI = 3;           /* 3: found strong after FSI, looking for PDI */

    private void getDirProps()
    {
        int i = 0, i0, i1;
        flags = 0;          /* collect all directionalities in the text */
        int uchar;
        byte dirProp;
        byte defaultParaLevel = 0;   /* initialize to avoid compiler warnings */
        boolean isDefaultLevel = IsDefaultLevel(paraLevel);
        /* for inverse Bidi, the default para level is set to RTL if there is a
           strong R or AL character at either end of the text */
        boolean isDefaultLevelInverse = isDefaultLevel &&
                (reorderingMode == REORDER_INVERSE_LIKE_DIRECT ||
                 reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL);
        lastArabicPos = -1;
        int controlCount = 0;
        boolean removeBidiControls = (reorderingOptions & OPTION_REMOVE_CONTROLS) != 0;

        byte state;
        byte lastStrong = ON;           /* for default level & inverse Bidi */
        /* The following stacks are used to manage isolate sequences. Those
           sequences may be nested, but obviously never more deeply than the
           maximum explicit embedding level.
           stackLast is the index of the last used entry in the stack. A value of -1
           means that there is no open isolate sequence.
           stackLast is reset to -1 on paragraph boundaries.
*/1500/* The following stack contains the position of the initiator of1501each open isolate sequence */1502int[] isolateStartStack= new int[MAX_EXPLICIT_LEVEL+1];1503/* The following stack contains the last known state before1504encountering the initiator of an isolate sequence */1505byte[] previousStateStack = new byte[MAX_EXPLICIT_LEVEL+1];1506int stackLast=-1;15071508if ((reorderingOptions & OPTION_STREAMING) != 0)1509length = 0;1510defaultParaLevel = (byte)(paraLevel & 1);15111512if (isDefaultLevel) {1513paras_level[0] = defaultParaLevel;1514lastStrong = defaultParaLevel;1515state = SEEKING_STRONG_FOR_PARA;1516} else {1517paras_level[0] = paraLevel;1518state = NOT_SEEKING_STRONG;1519}1520/* count paragraphs and determine the paragraph level (P2..P3) */1521/*1522* see comment on constant fields:1523* the LEVEL_DEFAULT_XXX values are designed so that1524* their low-order bit alone yields the intended default1525*/15261527for (i = 0; i < originalLength; /* i is incremented in the loop */) {1528i0 = i; /* index of first code unit */1529uchar = UTF16.charAt(text, 0, originalLength, i);1530i += UTF16.getCharCount(uchar);1531i1 = i - 1; /* index of last code unit, gets the directional property */15321533dirProp = (byte)getCustomizedClass(uchar);1534flags |= DirPropFlag(dirProp);1535dirProps[i1] = dirProp;1536if (i1 > i0) { /* set previous code units' properties to BN */1537flags |= DirPropFlag(BN);1538do {1539dirProps[--i1] = BN;1540} while (i1 > i0);1541}1542if (removeBidiControls && IsBidiControlChar(uchar)) {1543controlCount++;1544}1545if (dirProp == L) {1546if (state == SEEKING_STRONG_FOR_PARA) {1547paras_level[paraCount - 1] = 0;1548state = NOT_SEEKING_STRONG;1549}1550else if (state == SEEKING_STRONG_FOR_FSI) {1551if (stackLast <= MAX_EXPLICIT_LEVEL) {1552/* no need for next statement, already set by default */1553/* dirProps[isolateStartStack[stackLast]] = LRI; */1554flags |= DirPropFlag(LRI);1555}1556state = LOOKING_FOR_PDI;1557}1558lastStrong = 
L;1559continue;1560}1561if (dirProp == R || dirProp == AL) {1562if (state == SEEKING_STRONG_FOR_PARA) {1563paras_level[paraCount - 1] = 1;1564state = NOT_SEEKING_STRONG;1565}1566else if (state == SEEKING_STRONG_FOR_FSI) {1567if (stackLast <= MAX_EXPLICIT_LEVEL) {1568dirProps[isolateStartStack[stackLast]] = RLI;1569flags |= DirPropFlag(RLI);1570}1571state = LOOKING_FOR_PDI;1572}1573lastStrong = R;1574if (dirProp == AL)1575lastArabicPos = i - 1;1576continue;1577}1578if (dirProp >= FSI && dirProp <= RLI) { /* FSI, LRI or RLI */1579stackLast++;1580if (stackLast <= MAX_EXPLICIT_LEVEL) {1581isolateStartStack[stackLast] = i - 1;1582previousStateStack[stackLast] = state;1583}1584if (dirProp == FSI) {1585dirProps[i-1] = LRI; /* default if no strong char */1586state = SEEKING_STRONG_FOR_FSI;1587}1588else1589state = LOOKING_FOR_PDI;1590continue;1591}1592if (dirProp == PDI) {1593if (state == SEEKING_STRONG_FOR_FSI) {1594if (stackLast <= MAX_EXPLICIT_LEVEL) {1595/* no need for next statement, already set by default */1596/* dirProps[isolateStartStack[stackLast]] = LRI; */1597flags |= DirPropFlag(LRI);1598}1599}1600if (stackLast >= 0) {1601if (stackLast <= MAX_EXPLICIT_LEVEL)1602state = previousStateStack[stackLast];1603stackLast--;1604}1605continue;1606}1607if (dirProp == B) {1608if (i < originalLength && uchar == CR && text[i] == LF) /* do nothing on the CR */1609continue;1610paras_limit[paraCount - 1] = i;1611if (isDefaultLevelInverse && lastStrong == R)1612paras_level[paraCount - 1] = 1;1613if ((reorderingOptions & OPTION_STREAMING) != 0) {1614/* When streaming, we only process whole paragraphs1615thus some updates are only done on paragraph boundaries */1616length = i; /* i is index to next character */1617this.controlCount = controlCount;1618}1619if (i < originalLength) { /* B not last char in text */1620paraCount++;1621checkParaCount(); /* check that there is enough memory for a new para entry */1622if (isDefaultLevel) {1623paras_level[paraCount - 1] = 
defaultParaLevel;
                        state = SEEKING_STRONG_FOR_PARA;
                        lastStrong = defaultParaLevel;
                    } else {
                        paras_level[paraCount - 1] = paraLevel;
                        state = NOT_SEEKING_STRONG;
                    }
                    stackLast = -1;
                }
                continue;
            }
        }
        /* Ignore still open isolate sequences with overflow */
        if (stackLast > MAX_EXPLICIT_LEVEL) {
            stackLast = MAX_EXPLICIT_LEVEL;
            state = SEEKING_STRONG_FOR_FSI;   /* to be on the safe side */
        }
        /* Resolve direction of still unresolved open FSI sequences */
        while (stackLast >= 0) {
            if (state == SEEKING_STRONG_FOR_FSI) {
                /* no need for next statement, already set by default */
                /* dirProps[isolateStartStack[stackLast]] = LRI; */
                flags |= DirPropFlag(LRI);
                break;
            }
            state = previousStateStack[stackLast];
            stackLast--;
        }
        /* When streaming, ignore text after the last paragraph separator */
        if ((reorderingOptions & OPTION_STREAMING) != 0) {
            if (length < originalLength)
                paraCount--;
        } else {
            paras_limit[paraCount - 1] = originalLength;
            this.controlCount = controlCount;
        }
        /* For inverse bidi, default para direction is RTL if there is
           a strong R or AL at either end of the paragraph */
        if (isDefaultLevelInverse && lastStrong == R) {
            paras_level[paraCount - 1] = 1;
        }
        if (isDefaultLevel) {
            paraLevel = paras_level[0];
        }
        /* The following is needed to resolve the text direction for default level
           paragraphs containing no strong character */
        for (i = 0; i < paraCount; i++)
            flags |= DirPropFlagLR(paras_level[i]);

        if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
            flags |= DirPropFlag(L);
        }
    }

    /* determine the paragraph level at position index */
    byte GetParaLevelAt(int pindex)
    {
        if (defaultParaLevel == 0 || pindex < paras_limit[0])
            return paraLevel;
        int i;
        for (i = 1; i < paraCount; i++)
            if (pindex < paras_limit[i])
                break;
        if (i >= paraCount)
            i = paraCount - 1;
        return paras_level[i];
    }

    /*
Functions for handling paired brackets ----------------------------------- */16921693/* In the isoRuns array, the first entry is used for text outside of any1694isolate sequence. Higher entries are used for each more deeply nested1695isolate sequence. isoRunLast is the index of the last used entry. The1696openings array is used to note the data of opening brackets not yet1697matched by a closing bracket, or matched but still susceptible to change1698level.1699Each isoRun entry contains the index of the first and1700one-after-last openings entries for pending opening brackets it1701contains. The next openings entry to use is the one-after-last of the1702most deeply nested isoRun entry.1703isoRun entries also contain their current embedding level and the last1704encountered strong character, since these will be needed to resolve1705the level of paired brackets. */17061707private void bracketInit(BracketData bd) {1708bd.isoRunLast = 0;1709bd.isoRuns[0] = new IsoRun();1710bd.isoRuns[0].start = 0;1711bd.isoRuns[0].limit = 0;1712bd.isoRuns[0].level = GetParaLevelAt(0);1713bd.isoRuns[0].lastStrong = bd.isoRuns[0].lastBase = bd.isoRuns[0].contextDir = (byte)(GetParaLevelAt(0) & 1);1714bd.isoRuns[0].contextPos = 0;1715bd.openings = new Opening[SIMPLE_PARAS_COUNT];1716bd.isNumbersSpecial = reorderingMode == REORDER_NUMBERS_SPECIAL ||1717reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL;1718}17191720/* paragraph boundary */1721private void bracketProcessB(BracketData bd, byte level) {1722bd.isoRunLast = 0;1723bd.isoRuns[0].limit = 0;1724bd.isoRuns[0].level = level;1725bd.isoRuns[0].lastStrong = bd.isoRuns[0].lastBase = bd.isoRuns[0].contextDir = (byte)(level & 1);1726bd.isoRuns[0].contextPos = 0;1727}17281729/* LRE, LRO, RLE, RLO, PDF */1730private void bracketProcessBoundary(BracketData bd, int lastCcPos,1731byte contextLevel, byte embeddingLevel) {1732IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];1733if ((DirPropFlag(dirProps[lastCcPos]) & MASK_ISO) != 0) /* after an 
isolate */1734return;1735if (NoOverride(embeddingLevel) > NoOverride(contextLevel)) /* not a PDF */1736contextLevel = embeddingLevel;1737pLastIsoRun.limit = pLastIsoRun.start;1738pLastIsoRun.level = embeddingLevel;1739pLastIsoRun.lastStrong = pLastIsoRun.lastBase = pLastIsoRun.contextDir = (byte)(contextLevel & 1);1740pLastIsoRun.contextPos = lastCcPos;1741}17421743/* LRI or RLI */1744private void bracketProcessLRI_RLI(BracketData bd, byte level) {1745IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];1746short lastLimit;1747pLastIsoRun.lastBase = ON;1748lastLimit = pLastIsoRun.limit;1749bd.isoRunLast++;1750pLastIsoRun = bd.isoRuns[bd.isoRunLast];1751if (pLastIsoRun == null)1752pLastIsoRun = bd.isoRuns[bd.isoRunLast] = new IsoRun();1753pLastIsoRun.start = pLastIsoRun.limit = lastLimit;1754pLastIsoRun.level = level;1755pLastIsoRun.lastStrong = pLastIsoRun.lastBase = pLastIsoRun.contextDir = (byte)(level & 1);1756pLastIsoRun.contextPos = 0;1757}17581759/* PDI */1760private void bracketProcessPDI(BracketData bd) {1761IsoRun pLastIsoRun;1762bd.isoRunLast--;1763pLastIsoRun = bd.isoRuns[bd.isoRunLast];1764pLastIsoRun.lastBase = ON;1765}17661767/* newly found opening bracket: create an openings entry */1768private void bracketAddOpening(BracketData bd, char match, int position) {1769IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];1770Opening pOpening;1771if (pLastIsoRun.limit >= bd.openings.length) { /* no available new entry */1772Opening[] saveOpenings = bd.openings;1773int count;1774try {1775count = bd.openings.length;1776bd.openings = new Opening[count * 2];1777} catch (Exception e) {1778throw new OutOfMemoryError("Failed to allocate memory for openings");1779}1780System.arraycopy(saveOpenings, 0, bd.openings, 0, count);1781}1782pOpening = bd.openings[pLastIsoRun.limit];1783if (pOpening == null)1784pOpening = bd.openings[pLastIsoRun.limit]= new Opening();1785pOpening.position = position;1786pOpening.match = match;1787pOpening.contextDir = 
pLastIsoRun.contextDir;1788pOpening.contextPos = pLastIsoRun.contextPos;1789pOpening.flags = 0;1790pLastIsoRun.limit++;1791}17921793/* change N0c1 to N0c2 when a preceding bracket is assigned the embedding level */1794private void fixN0c(BracketData bd, int openingIndex, int newPropPosition, byte newProp) {1795/* This function calls itself recursively */1796IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];1797Opening qOpening;1798int k, openingPosition, closingPosition;1799for (k = openingIndex+1; k < pLastIsoRun.limit; k++) {1800qOpening = bd.openings[k];1801if (qOpening.match >= 0) /* not an N0c match */1802continue;1803if (newPropPosition < qOpening.contextPos)1804break;1805if (newPropPosition >= qOpening.position)1806continue;1807if (newProp == qOpening.contextDir)1808break;1809openingPosition = qOpening.position;1810dirProps[openingPosition] = newProp;1811closingPosition = -(qOpening.match);1812dirProps[closingPosition] = newProp;1813qOpening.match = 0; /* prevent further changes */1814fixN0c(bd, k, openingPosition, newProp);1815fixN0c(bd, k, closingPosition, newProp);1816}1817}18181819/* process closing bracket; return L or R if N0b or N0c, ON if N0d */1820private byte bracketProcessClosing(BracketData bd, int openIdx, int position) {1821IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];1822Opening pOpening, qOpening;1823byte direction;1824boolean stable;1825byte newProp;1826pOpening = bd.openings[openIdx];1827direction = (byte)(pLastIsoRun.level & 1);1828stable = true; /* assume stable until proved otherwise */18291830/* The stable flag is set when brackets are paired and their1831level is resolved and cannot be changed by what will be1832found later in the source string.1833An unstable match can occur only when applying N0c, where1834the resolved level depends on the preceding context, and1835this context may be affected by text occurring later.1836Example: RTL paragraph containing: abc[(latin) HEBREW]1837When the closing parenthesis is encountered, it 
           appears
           that N0c1 must be applied since 'abc' sets an opposite
           direction context and both parentheses receive level 2.
           However, when the closing square bracket is processed,
           N0b applies because of 'HEBREW' being included within the
           brackets, thus the square brackets are treated like R and
           receive level 1. However, this changes the preceding
           context of the opening parenthesis, and it now appears
           that N0c2 must be applied to the parentheses rather than
           N0c1. */

        if ((direction == 0 && (pOpening.flags & FOUND_L) > 0) ||
            (direction == 1 && (pOpening.flags & FOUND_R) > 0)) {   /* N0b */
            newProp = direction;
        }
        else if ((pOpening.flags & (FOUND_L | FOUND_R)) != 0) {     /* N0c */
            /* it is stable if there is no preceding text or in
               conditions too complicated and not worth checking */
            stable = (openIdx == pLastIsoRun.start);
            if (direction != pOpening.contextDir)
                newProp = pOpening.contextDir;                      /* N0c1 */
            else
                newProp = direction;                                /* N0c2 */
        } else {
            /* forget this and any brackets nested within this pair */
            pLastIsoRun.limit = (short)openIdx;
            return ON;                                              /* N0d */
        }
        dirProps[pOpening.position] = newProp;
        dirProps[position] = newProp;
        /* Update nested N0c pairs that may be affected */
        fixN0c(bd, openIdx, pOpening.position, newProp);
        if (stable) {
            pLastIsoRun.limit = (short)openIdx; /* forget any brackets nested within this pair */
            /* remove lower located synonyms if any */
            while (pLastIsoRun.limit > pLastIsoRun.start &&
                   bd.openings[pLastIsoRun.limit - 1].position == pOpening.position)
                pLastIsoRun.limit--;
        } else {
            int k;
            pOpening.match = -position;
            /* neutralize lower located synonyms if any */
            k = openIdx - 1;
            while (k >= pLastIsoRun.start &&
                   bd.openings[k].position == pOpening.position)
                bd.openings[k--].match = 0;
            /* neutralize any unmatched opening between the current pair;
               this will also neutralize higher located synonyms if any */
            for
                (k = openIdx + 1; k < pLastIsoRun.limit; k++) {
                qOpening = bd.openings[k];
                if (qOpening.position >= position)
                    break;
                if (qOpening.match > 0)
                    qOpening.match = 0;
            }
        }
        return newProp;
    }

    /* handle strong characters, digits and candidates for closing brackets */
    private void bracketProcessChar(BracketData bd, int position) {
        IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        byte dirProp, newProp;
        byte level;
        dirProp = dirProps[position];
        if (dirProp == ON) {
            char c, match;
            int idx;
            /* First see if it is a matching closing bracket. Hopefully, this is
               more efficient than checking if it is a closing bracket at all */
            c = text[position];
            for (idx = pLastIsoRun.limit - 1; idx >= pLastIsoRun.start; idx--) {
                if (bd.openings[idx].match != c)
                    continue;
                /* We have a match */
                newProp = bracketProcessClosing(bd, idx, position);
                if (newProp == ON) {        /* N0d */
                    c = 0;                  /* prevent handling as an opening */
                    break;
                }
                pLastIsoRun.lastBase = ON;
                pLastIsoRun.contextDir = newProp;
                pLastIsoRun.contextPos = position;
                level = levels[position];
                if ((level & LEVEL_OVERRIDE) != 0) {    /* X4, X5 */
                    short flag;
                    int i;
                    newProp = (byte)(level & 1);
                    pLastIsoRun.lastStrong = newProp;
                    flag = (short)DirPropFlag(newProp);
                    for (i = pLastIsoRun.start; i < idx; i++)
                        bd.openings[i].flags |= flag;
                    /* matching brackets are not overridden by LRO/RLO */
                    levels[position] &= ~LEVEL_OVERRIDE;
                }
                /* matching brackets are not overridden by LRO/RLO */
                levels[bd.openings[idx].position] &= ~LEVEL_OVERRIDE;
                return;
            }
            /* We get here only if the ON character is not a matching closing
               bracket or it is a case of N0d */
            /* Now see if it is an opening bracket */
            if (c != 0) {
                match = (char)UCharacter.getBidiPairedBracket(c); /* get the matching char */
            } else {
                match = 0;
            }
            if (match != c &&               /* has a matching char */
                UCharacter.getIntPropertyValue(c,
                        BIDI_PAIRED_BRACKET_TYPE) ==
                    /* opening bracket */ BidiPairedBracketType.OPEN) {
                /* special case: process synonyms
                   create an opening entry for each synonym */
                if (match == 0x232A) {      /* RIGHT-POINTING ANGLE BRACKET */
                    bracketAddOpening(bd, (char)0x3009, position);
                }
                else if (match == 0x3009) { /* RIGHT ANGLE BRACKET */
                    bracketAddOpening(bd, (char)0x232A, position);
                }
                bracketAddOpening(bd, match, position);
            }
        }
        level = levels[position];
        if ((level & LEVEL_OVERRIDE) != 0) {    /* X4, X5 */
            newProp = (byte)(level & 1);
            if (dirProp != S && dirProp != WS && dirProp != ON)
                dirProps[position] = newProp;
            pLastIsoRun.lastBase = newProp;
            pLastIsoRun.lastStrong = newProp;
            pLastIsoRun.contextDir = newProp;
            pLastIsoRun.contextPos = position;
        }
        else if (dirProp <= R || dirProp == AL) {
            newProp = DirFromStrong(dirProp);
            pLastIsoRun.lastBase = dirProp;
            pLastIsoRun.lastStrong = dirProp;
            pLastIsoRun.contextDir = newProp;
            pLastIsoRun.contextPos = position;
        }
        else if (dirProp == EN) {
            pLastIsoRun.lastBase = EN;
            if (pLastIsoRun.lastStrong == L) {
                newProp = L;                    /* W7 */
                if (!bd.isNumbersSpecial)
                    dirProps[position] = ENL;
                pLastIsoRun.contextDir = L;
                pLastIsoRun.contextPos = position;
            }
            else {
                newProp = R;                    /* N0 */
                if (pLastIsoRun.lastStrong == AL)
                    dirProps[position] = AN;    /* W2 */
                else
                    dirProps[position] = ENR;
                pLastIsoRun.contextDir = R;
                pLastIsoRun.contextPos = position;
            }
        }
        else if (dirProp == AN) {
            newProp = R;                        /* N0 */
            pLastIsoRun.lastBase = AN;
            pLastIsoRun.contextDir = R;
            pLastIsoRun.contextPos = position;
        }
        else if (dirProp == NSM) {
            /* if the last real char was ON, change NSM to ON so that it
               will stay ON even if the last real char is a bracket which
               may be changed to L or R */
            newProp = pLastIsoRun.lastBase;
            if (newProp == ON)
                dirProps[position] = newProp;
        }
        else {
            newProp =
                    dirProp;
            pLastIsoRun.lastBase = dirProp;
        }
        if (newProp <= R || newProp == AL) {
            int i;
            short flag = (short)DirPropFlag(DirFromStrong(newProp));
            for (i = pLastIsoRun.start; i < pLastIsoRun.limit; i++)
                if (position > bd.openings[i].position)
                    bd.openings[i].flags |= flag;
        }
    }

    /* perform (X1)..(X9) ------------------------------------------------------- */

    /* determine if the text is mixed-directional or single-directional */
    private byte directionFromFlags() {

        /* if the text contains AN and neutrals, then some neutrals may become RTL */
        if (!((flags & MASK_RTL) != 0 ||
              ((flags & DirPropFlag(AN)) != 0 &&
               (flags & MASK_POSSIBLE_N) != 0))) {
            return LTR;
        } else if ((flags & MASK_LTR) == 0) {
            return RTL;
        } else {
            return MIXED;
        }
    }

    /*
     * Resolve the explicit levels as specified by explicit embedding codes.
     * Recalculate the flags to have them reflect the real properties
     * after taking the explicit embeddings into account.
     *
     * The BiDi algorithm is designed to result in the same behavior whether embedding
     * levels are externally specified (from "styled text", supposedly the preferred
     * method) or set by explicit embedding codes (LRx, RLx, PDF, FSI, PDI) in the plain text.
     * That is why (X9) instructs to remove all not-isolate explicit codes (and BN).
     * However, in a real implementation, the removal of these codes and their index
     * positions in the plain text is undesirable since it would result in
     * reallocated, reindexed text.
     * Instead, this implementation leaves the codes in there and just ignores them
     * in the subsequent processing.
     * In order to get the same reordering behavior, positions with a BN or a not-isolate
     * explicit embedding code just get the same level assigned as the last "real"
     * character.
     *
     * Some implementations, not this one, then overwrite some of these
     * directionality properties at "real" same-level-run
     * boundaries by
     * L or R codes so that the resolution of weak types can be performed on the
     * entire paragraph at once instead of having to parse it once more and
     * perform that resolution on same-level-runs.
     * This limits the scope of the implicit rules in effectively
     * the same way as the run limits.
     *
     * Instead, this implementation does not modify these codes, except for
     * paired brackets whose properties (ON) may be replaced by L or R.
     * On one hand, the paragraph has to be scanned for same-level-runs, but
     * on the other hand, this saves another loop to reset these codes,
     * or saves making and modifying a copy of dirProps[].
     *
     *
     * Note that (Pn) and (Xn) changed significantly from version 4 of the BiDi algorithm.
     *
     *
     * Handling the stack of explicit levels (Xn):
     *
     * With the BiDi stack of explicit levels, as pushed with each
     * LRE, RLE, LRO, RLO, LRI, RLI and FSI and popped with each PDF and PDI,
     * the explicit level must never exceed MAX_EXPLICIT_LEVEL.
     *
     * In order to have a correct push-pop semantics even in the case of overflows,
     * overflow counters and a valid isolate counter are used as described in UAX#9
     * section 3.3.2 "Explicit Levels and Directions".
     *
     * This implementation assumes that MAX_EXPLICIT_LEVEL is odd.
     *
     * Returns the direction
     *
     */
    private byte resolveExplicitLevels() {
        int i = 0;
        byte dirProp;
        byte level = GetParaLevelAt(0);
        byte dirct;
        isolateCount = 0;

        /* determine if the text is mixed-directional or single-directional */
        dirct = directionFromFlags();

        /* we may not need to resolve any explicit levels */
        if (dirct != MIXED) {
            /* not mixed directionality: levels don't matter - trailingWSStart will be 0 */
            return dirct;
        }

        if (reorderingMode > REORDER_LAST_LOGICAL_TO_VISUAL) {
            /* inverse BiDi: mixed, but all characters are at the same embedding level */
            /* set all levels to the paragraph
               level */
            int paraIndex, start, limit;
            for (paraIndex = 0; paraIndex < paraCount; paraIndex++) {
                if (paraIndex == 0)
                    start = 0;
                else
                    start = paras_limit[paraIndex - 1];
                limit = paras_limit[paraIndex];
                level = paras_level[paraIndex];
                for (i = start; i < limit; i++)
                    levels[i] = level;
            }
            return dirct;               /* no bracket matching for inverse BiDi */
        }
        if ((flags & (MASK_EXPLICIT | MASK_ISO)) == 0) {
            /* no embeddings, set all levels to the paragraph level */
            /* we still have to perform bracket matching */
            int paraIndex, start, limit;
            BracketData bracketData = new BracketData();
            bracketInit(bracketData);
            for (paraIndex = 0; paraIndex < paraCount; paraIndex++) {
                if (paraIndex == 0)
                    start = 0;
                else
                    start = paras_limit[paraIndex-1];
                limit = paras_limit[paraIndex];
                level = paras_level[paraIndex];
                for (i = start; i < limit; i++) {
                    levels[i] = level;
                    dirProp = dirProps[i];
                    if (dirProp == BN)
                        continue;
                    if (dirProp == B) {
                        if ((i + 1) < length) {
                            if (text[i] == CR && text[i + 1] == LF)
                                continue;   /* skip CR when followed by LF */
                            bracketProcessB(bracketData, level);
                        }
                        continue;
                    }
                    bracketProcessChar(bracketData, i);
                }
            }
            return dirct;
        }
        /* continue to perform (Xn) */

        /* (X1) level is set for all codes, embeddingLevel keeps track of the push/pop operations */
        /* both variables may carry the LEVEL_OVERRIDE flag to indicate the override status */
        byte embeddingLevel = level, newLevel;
        byte previousLevel = level; /* previous level for regular (not CC) characters */
        int lastCcPos = 0;          /* index of last effective LRx,RLx, PDx */

        /* The following stack remembers the embedding level and the ISOLATE flag of level runs.
           stackLast points to its current entry.
         */
        short[] stack = new short[MAX_EXPLICIT_LEVEL + 2];  /* we never push anything >= MAX_EXPLICIT_LEVEL
                                                               but we need one more entry as base */
        int stackLast = 0;
        int overflowIsolateCount = 0;
        int overflowEmbeddingCount = 0;
        int validIsolateCount = 0;
        BracketData bracketData = new BracketData();
        bracketInit(bracketData);
        stack[0] = level;       /* initialize base entry to para level, no override, no isolate */

        /* recalculate the flags */
        flags = 0;

        for (i = 0; i < length; i++) {
            dirProp = dirProps[i];
            switch (dirProp) {
            case LRE:
            case RLE:
            case LRO:
            case RLO:
                /* (X2, X3, X4, X5) */
                flags |= DirPropFlag(BN);
                levels[i] = previousLevel;
                if (dirProp == LRE || dirProp == LRO) {
                    /* least greater even level */
                    newLevel = (byte)((embeddingLevel+2) & ~(LEVEL_OVERRIDE | 1));
                } else {
                    /* least greater odd level */
                    newLevel = (byte)((NoOverride(embeddingLevel) + 1) | 1);
                }
                if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0 &&
                                                      overflowEmbeddingCount == 0) {
                    lastCcPos = i;
                    embeddingLevel = newLevel;
                    if (dirProp == LRO || dirProp == RLO)
                        embeddingLevel |= LEVEL_OVERRIDE;
                    stackLast++;
                    stack[stackLast] = embeddingLevel;
                    /* we don't need to set LEVEL_OVERRIDE off for LRE and RLE
                       since this has already been done for newLevel which is
                       the source for embeddingLevel.
                     */
                } else {
                    if (overflowIsolateCount == 0)
                        overflowEmbeddingCount++;
                }
                break;
            case PDF:
                /* (X7) */
                flags |= DirPropFlag(BN);
                levels[i] = previousLevel;
                /* handle all the overflow cases first */
                if (overflowIsolateCount > 0) {
                    break;
                }
                if (overflowEmbeddingCount > 0) {
                    overflowEmbeddingCount--;
                    break;
                }
                if (stackLast > 0 && stack[stackLast] < ISOLATE) {   /* not an isolate entry */
                    lastCcPos = i;
                    stackLast--;
                    embeddingLevel = (byte)stack[stackLast];
                }
                break;
            case LRI:
            case RLI:
                flags |= DirPropFlag(ON) |
                        DirPropFlagLR(embeddingLevel);
                levels[i] = NoOverride(embeddingLevel);
                if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) {
                    bracketProcessBoundary(bracketData, lastCcPos,
                                           previousLevel, embeddingLevel);
                    flags |= DirPropFlagMultiRuns;
                }
                previousLevel = embeddingLevel;
                /* (X5a, X5b) */
                if (dirProp == LRI)
                    /* least greater even level */
                    newLevel = (byte)((embeddingLevel+2) & ~(LEVEL_OVERRIDE | 1));
                else
                    /* least greater odd level */
                    newLevel = (byte)((NoOverride(embeddingLevel) + 1) | 1);
                if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0
                                                   && overflowEmbeddingCount == 0) {
                    flags |= DirPropFlag(dirProp);
                    lastCcPos = i;
                    validIsolateCount++;
                    if (validIsolateCount > isolateCount)
                        isolateCount = validIsolateCount;
                    embeddingLevel = newLevel;
                    /* we can increment stackLast without checking because newLevel
                       will exceed UBIDI_MAX_EXPLICIT_LEVEL before stackLast overflows */
                    stackLast++;
                    stack[stackLast] = (short)(embeddingLevel + ISOLATE);
                    bracketProcessLRI_RLI(bracketData, embeddingLevel);
                } else {
                    /* make it WS so that it is handled by adjustWSLevels() */
                    dirProps[i] = WS;
                    overflowIsolateCount++;
                }
                break;
            case PDI:
                if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) {
                    bracketProcessBoundary(bracketData, lastCcPos,
                                           previousLevel, embeddingLevel);
                    flags |= DirPropFlagMultiRuns;
                }
                /* (X6a) */
                if (overflowIsolateCount > 0) {
                    overflowIsolateCount--;
                    /* make it WS so that it is handled by adjustWSLevels() */
                    dirProps[i] = WS;
                }
                else if (validIsolateCount > 0) {
                    flags |= DirPropFlag(PDI);
                    lastCcPos = i;
                    overflowEmbeddingCount = 0;
                    while (stack[stackLast] < ISOLATE)  /* pop embedding entries */
                        stackLast--;                    /* until the last isolate entry */
                    stackLast--;                        /* pop also the last isolate entry */
                    validIsolateCount--;
                    bracketProcessPDI(bracketData);
                } else
                    /* make it WS so that it is handled by
                       adjustWSLevels() */
                    dirProps[i] = WS;
                embeddingLevel = (byte)(stack[stackLast] & ~ISOLATE);
                flags |= DirPropFlag(ON) | DirPropFlagLR(embeddingLevel);
                previousLevel = embeddingLevel;
                levels[i] = NoOverride(embeddingLevel);
                break;
            case B:
                flags |= DirPropFlag(B);
                levels[i] = GetParaLevelAt(i);
                if ((i + 1) < length) {
                    if (text[i] == CR && text[i + 1] == LF)
                        break;          /* skip CR when followed by LF */
                    overflowEmbeddingCount = overflowIsolateCount = 0;
                    validIsolateCount = 0;
                    stackLast = 0;
                    previousLevel = embeddingLevel = GetParaLevelAt(i + 1);
                    stack[0] = embeddingLevel;   /* initialize base entry to para level, no override, no isolate */
                    bracketProcessB(bracketData, embeddingLevel);
                }
                break;
            case BN:
                /* BN, LRE, RLE, and PDF are supposed to be removed (X9) */
                /* they will get their levels set correctly in adjustWSLevels() */
                levels[i] = previousLevel;
                flags |= DirPropFlag(BN);
                break;
            default:
                /* all other types are normal characters and get the "real" level */
                if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) {
                    bracketProcessBoundary(bracketData, lastCcPos,
                                           previousLevel, embeddingLevel);
                    flags |= DirPropFlagMultiRuns;
                    if ((embeddingLevel & LEVEL_OVERRIDE) != 0)
                        flags |= DirPropFlagO(embeddingLevel);
                    else
                        flags |= DirPropFlagE(embeddingLevel);
                }
                previousLevel = embeddingLevel;
                levels[i] = embeddingLevel;
                bracketProcessChar(bracketData, i);
                /* the dirProp may have been changed in bracketProcessChar() */
                flags |= DirPropFlag(dirProps[i]);
                break;
            }
        }
        if ((flags & MASK_EMBEDDING) != 0) {
            flags |= DirPropFlagLR(paraLevel);
        }
        if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
            flags |= DirPropFlag(L);
        }
        /* again, determine if the text is mixed-directional or single-directional */
        dirct = directionFromFlags();

        return dirct;
    }

    /*
     * Use a pre-specified embedding levels array:
     *
     * Adjust the directional properties for overrides (->LEVEL_OVERRIDE),
     * ignore all explicit codes (X9),
     * and check all the preset levels.
     *
     * Recalculate the flags to have them reflect the real properties
     * after taking the explicit embeddings into account.
     */
    private byte checkExplicitLevels() {
        byte dirProp;
        int i;
        int isolateCount = 0;

        this.flags = 0;     /* collect all directionalities in the text */
        byte level;
        this.isolateCount = 0;

        for (i = 0; i < length; ++i) {
            if (levels[i] == 0) {
                levels[i] = paraLevel;
            }

            // for backward compatibility
            if (MAX_EXPLICIT_LEVEL < (levels[i]&0x7f)) {
                if ((levels[i] & LEVEL_OVERRIDE) != 0) {
                    levels[i] = (byte)(paraLevel|LEVEL_OVERRIDE);
                } else {
                    levels[i] = paraLevel;
                }
            }

            level = levels[i];
            dirProp = dirProps[i];
            if (dirProp == LRI || dirProp == RLI) {
                isolateCount++;
                if (isolateCount > this.isolateCount)
                    this.isolateCount = isolateCount;
            }
            else if (dirProp == PDI) {
                isolateCount--;
            } else if (dirProp == B) {
                isolateCount = 0;
            }
            if ((level & LEVEL_OVERRIDE) != 0) {
                /* keep the override flag in levels[i] but adjust the flags */
                level &= ~LEVEL_OVERRIDE;     /* make the range check below simpler */
                flags |= DirPropFlagO(level);
            } else {
                /* set the flags */
                flags |= DirPropFlagE(level) | DirPropFlag(dirProp);
            }
            if ((level < GetParaLevelAt(i) &&
                    !((0 == level) && (dirProp == B))) ||
                    (MAX_EXPLICIT_LEVEL < level)) {
                /* level out of bounds */
                throw new IllegalArgumentException("level " + level +
                                                   " out of bounds at " + i);
            }
        }
        if ((flags & MASK_EMBEDDING) != 0) {
            flags |= DirPropFlagLR(paraLevel);
        }
        /* determine if the text is mixed-directional or single-directional */
        return directionFromFlags();
    }

    /*********************************************************************/
    /* The Properties state machine table
    */
    /*********************************************************************/
    /* */
    /* All table cells are 8 bits: */
    /*      bits 0..4: next state */
    /*      bits 5..7: action to perform (if > 0) */
    /* */
    /* Cells may be of format "n" where n represents the next state */
    /* (except for the rightmost column). */
    /* Cells may also be of format "_(x,y)" where x represents an action */
    /* to perform and y represents the next state. */
    /* */
    /*********************************************************************/
    /* Definitions and type for properties state tables */
    /*********************************************************************/
    private static final int IMPTABPROPS_COLUMNS = 16;
    private static final int IMPTABPROPS_RES = IMPTABPROPS_COLUMNS - 1;
    private static short GetStateProps(short cell) {
        return (short)(cell & 0x1f);
    }
    private static short GetActionProps(short cell) {
        return (short)(cell >> 5);
    }

    private static final short groupProp[] =          /* dirProp regrouped */
    {
    /*  L   R   EN  ES  ET  AN  CS  B   S   WS  ON  LRE LRO AL  RLE RLO PDF NSM BN  FSI LRI RLI PDI ENL ENR */
        0,  1,  2,  7,  8,  3,  9,  6,  5,  4,  4,  10, 10, 12, 10, 10, 10, 11, 10, 4,  4,  4,  4,  13, 14
    };
    private static final short _L  = 0;
    private static final short _R  = 1;
    private static final short _EN = 2;
    private static final short _AN = 3;
    private static final short _ON = 4;
    private static final short _S  = 5;
    private static final short _B  = 6; /* reduced dirProp */

    /*********************************************************************/
    /* */
    /* PROPERTIES STATE TABLE */
    /* */
    /* In table impTabProps, */
    /*      - the ON column regroups ON and WS, FSI, RLI, LRI and PDI */
    /*      - the BN column regroups BN, LRE, RLE, LRO, RLO, PDF */
    /*      - the Res column is the reduced property assigned to a run */
    /* */
    /* Action 1: process current run1, init new run1 */
    /*        2: init new run2 */
    /*        3: process run1,
                process run2, init new run1 */
    /*        4: process run1, set run1=run2, init new run2 */
    /* */
    /* Notes: */
    /*  1) This table is used in resolveImplicitLevels(). */
    /*  2) This table triggers actions when there is a change in the Bidi */
    /*     property of incoming characters (action 1). */
    /*  3) Most such property sequences are processed immediately (in */
    /*     fact, passed to processPropertySeq()). */
    /*  4) However, numbers are assembled as one sequence. This means */
    /*     that undefined situations (like CS following digits, until */
    /*     it is known if the next char will be a digit) are held until */
    /*     following chars define them. */
    /*     Example: digits followed by CS, then comes another CS or ON; */
    /*              the digits will be processed, then the CS assigned */
    /*              as the start of an ON sequence (action 3). */
    /*  5) There are cases where more than one sequence must be */
    /*     processed, for instance digits followed by CS followed by L: */
    /*     the digits must be processed as one sequence, and the CS */
    /*     must be processed as an ON sequence, all this before starting */
    /*     assembling chars for the opening L sequence.
    */
    /* */
    /* */
    private static final short impTabProps[][] =
    {
/*                         L,      R,     EN,     AN,     ON,      S,      B,     ES,     ET,     CS,     BN,    NSM,     AL,    ENL,    ENR,    Res */
/* 0 Init        */ {      1,      2,      4,      5,      7,     15,     17,      7,      9,      7,      0,      7,      3,     18,     21,    _ON },
/* 1 L           */ {      1,   32+2,   32+4,   32+5,   32+7,  32+15,  32+17,   32+7,   32+9,   32+7,      1,      1,   32+3,  32+18,  32+21,     _L },
/* 2 R           */ {   32+1,      2,   32+4,   32+5,   32+7,  32+15,  32+17,   32+7,   32+9,   32+7,      2,      2,   32+3,  32+18,  32+21,     _R },
/* 3 AL          */ {   32+1,   32+2,   32+6,   32+6,   32+8,  32+16,  32+17,   32+8,   32+8,   32+8,      3,      3,      3,  32+18,  32+21,     _R },
/* 4 EN          */ {   32+1,   32+2,      4,   32+5,   32+7,  32+15,  32+17,  64+10,     11,  64+10,      4,      4,   32+3,     18,     21,    _EN },
/* 5 AN          */ {   32+1,   32+2,   32+4,      5,   32+7,  32+15,  32+17,   32+7,   32+9,  64+12,      5,      5,   32+3,  32+18,  32+21,    _AN },
/* 6 AL:EN/AN    */ {   32+1,   32+2,      6,      6,   32+8,  32+16,  32+17,   32+8,   32+8,  64+13,      6,      6,   32+3,     18,     21,    _AN },
/* 7 ON          */ {   32+1,   32+2,   32+4,   32+5,      7,  32+15,  32+17,      7,  64+14,      7,      7,      7,   32+3,  32+18,  32+21,    _ON },
/* 8 AL:ON       */ {   32+1,   32+2,   32+6,   32+6,      8,  32+16,  32+17,      8,      8,      8,      8,      8,   32+3,  32+18,  32+21,    _ON },
/* 9 ET          */ {   32+1,   32+2,      4,   32+5,      7,  32+15,  32+17,      7,      9,      7,      9,      9,   32+3,     18,     21,    _ON },
/*10 EN+ES/CS    */ {   96+1,   96+2,      4,   96+5,  128+7,  96+15,  96+17,  128+7, 128+14,  128+7,     10,  128+7,   96+3,     18,     21,    _EN },
/*11 EN+ET       */ {   32+1,   32+2,      4,   32+5,   32+7,  32+15,  32+17,   32+7,     11,   32+7,     11,     11,   32+3,     18,     21,    _EN },
/*12 AN+CS       */ {   96+1,   96+2,   96+4,      5,  128+7,  96+15,  96+17,  128+7, 128+14,  128+7,     12,  128+7,   96+3,  96+18,  96+21,    _AN },
/*13 AL:EN/AN+CS */ {   96+1,   96+2,      6,      6,  128+8,  96+16,  96+17,  128+8,  128+8,  128+8,     13,  128+8,   96+3,     18,     21,    _AN },
/*14 ON+ET       */ {   32+1,   32+2,  128+4,   32+5,      7,  32+15,  32+17,      7,     14,      7,     14,     14,   32+3, 128+18, 128+21,    _ON },
/*15 S           */ {   32+1,   32+2,   32+4,   32+5,   32+7,     15,  32+17,   32+7,   32+9,   32+7,     15,   32+7,   32+3,  32+18,  32+21,     _S },
/*16 AL:S        */ {   32+1,   32+2,   32+6,   32+6,   32+8,     16,  32+17,   32+8,   32+8,   32+8,     16,   32+8,   32+3,  32+18,  32+21,     _S },
/*17 B           */ {   32+1,   32+2,   32+4,   32+5,   32+7,  32+15,     17,
                        32+7,   32+9,   32+7,     17,   32+7,   32+3,  32+18,  32+21,     _B },
/*18 ENL         */ {   32+1,   32+2,     18,   32+5,   32+7,  32+15,  32+17,  64+19,     20,  64+19,     18,     18,   32+3,     18,     21,     _L },
/*19 ENL+ES/CS   */ {   96+1,   96+2,     18,   96+5,  128+7,  96+15,  96+17,  128+7, 128+14,  128+7,     19,  128+7,   96+3,     18,     21,     _L },
/*20 ENL+ET      */ {   32+1,   32+2,     18,   32+5,   32+7,  32+15,  32+17,   32+7,     20,   32+7,     20,     20,   32+3,     18,     21,     _L },
/*21 ENR         */ {   32+1,   32+2,     21,   32+5,   32+7,  32+15,  32+17,  64+22,     23,  64+22,     21,     21,   32+3,     18,     21,    _AN },
/*22 ENR+ES/CS   */ {   96+1,   96+2,     21,   96+5,  128+7,  96+15,  96+17,  128+7, 128+14,  128+7,     22,  128+7,   96+3,     18,     21,    _AN },
/*23 ENR+ET      */ {   32+1,   32+2,     21,   32+5,   32+7,  32+15,  32+17,   32+7,     23,   32+7,     23,     23,   32+3,     18,     21,    _AN }
    };

    /*********************************************************************/
    /* The levels state machine tables */
    /*********************************************************************/
    /* */
    /* All table cells are 8 bits: */
    /*      bits 0..3: next state */
    /*      bits 4..7: action to perform (if > 0) */
    /* */
    /* Cells may be of format "n" where n represents the next state */
    /* (except for the rightmost column). */
    /* Cells may also be of format "_(x,y)" where x represents an action */
    /* to perform and y represents the next state.
    */
    /* */
    /* This format limits each table to 16 states each and to 15 actions. */
    /* */
    /*********************************************************************/
    /* Definitions and type for levels state tables */
    /*********************************************************************/
    private static final int IMPTABLEVELS_COLUMNS = _B + 2;
    private static final int IMPTABLEVELS_RES = IMPTABLEVELS_COLUMNS - 1;
    private static short GetState(byte cell) { return (short)(cell & 0x0f); }
    private static short GetAction(byte cell) { return (short)(cell >> 4); }

    private static class ImpTabPair {
        byte[][][] imptab;
        short[][] impact;

        ImpTabPair(byte[][] table1, byte[][] table2,
                   short[] act1, short[] act2) {
            imptab = new byte[][][] {table1, table2};
            impact = new short[][] {act1, act2};
        }
    }

    /*********************************************************************/
    /* */
    /* LEVELS STATE TABLES */
    /* */
    /* In all levels state tables, */
    /*      - state 0 is the initial state */
    /*      - the Res column is the increment to add to the text level */
    /*        for this property sequence. */
    /* */
    /* The impact arrays for each table of a pair map the local action */
    /* numbers of the table to the total list of actions. For instance, */
    /* action 2 in a given table corresponds to the action number which */
    /* appears in entry [2] of the impact array for that table. */
    /* The first entry of all impact arrays must be 0. */
    /* */
    /* Action 1: init conditional sequence */
    /*        2: prepend conditional sequence to current sequence */
    /*        3: set ON sequence to new level - 1 */
    /*        4: init EN/AN/ON sequence */
    /*        5: fix EN/AN/ON sequence followed by R */
    /*        6: set previous level sequence to level 2 */
    /* */
    /* Notes: */
    /*  1) These tables are used in processPropertySeq(). The input */
    /*     is property sequences as determined by resolveImplicitLevels.
    */
    /*  2) Most such property sequences are processed immediately */
    /*     (levels are assigned). */
    /*  3) However, some sequences cannot be assigned a final level till */
    /*     one or more following sequences are received. For instance, */
    /*     ON following an R sequence within an even-level paragraph. */
    /*     If the following sequence is R, the ON sequence will be */
    /*     assigned basic run level+1, and so will the R sequence. */
    /*  4) S is generally handled like ON, since its level will be fixed */
    /*     to paragraph level in adjustWSLevels(). */
    /* */

    private static final byte impTabL_DEFAULT[][] = /* Even paragraph level */
        /*  In this table, conditional sequences receive the lower possible level
            until proven otherwise.
        */
    {
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 : init       */ {    0,    1,    0,    2,    0,    0,    0,    0 },
        /* 1 : R          */ {    0,    1,    3,    3, 0x14, 0x14,    0,    1 },
        /* 2 : AN         */ {    0,    1,    0,    2, 0x15, 0x15,    0,    2 },
        /* 3 : R+EN/AN    */ {    0,    1,    3,    3, 0x14, 0x14,    0,    2 },
        /* 4 : R+ON       */ {    0, 0x21, 0x33, 0x33,    4,    4,    0,    0 },
        /* 5 : AN+ON      */ {    0, 0x21,    0, 0x32,    5,    5,    0,    0 }
    };

    private static final byte impTabR_DEFAULT[][] = /* Odd paragraph level */
        /*  In this table, conditional sequences receive the lower possible level
            until proven otherwise.
        */
    {
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 : init       */ {    1,    0,    2,    2,    0,    0,    0,    0 },
        /* 1 : L          */ {    1,    0,    1,    3, 0x14, 0x14,    0,    1 },
        /* 2 : EN/AN      */ {    1,    0,    2,    2,    0,    0,    0,    1 },
        /* 3 : L+AN       */ {    1,    0,    1,    3,    5,    5,    0,    1 },
        /* 4 : L+ON       */ { 0x21,    0, 0x21,    3,    4,    4,    0,    0 },
        /* 5 : L+AN+ON    */ {    1,    0,    1,    3,    5,    5,    0,    0 }
    };

    private static final short[] impAct0 = {0,1,2,3,4};

    private static final ImpTabPair impTab_DEFAULT = new ImpTabPair(
            impTabL_DEFAULT, impTabR_DEFAULT, impAct0, impAct0);

    private static final byte impTabL_NUMBERS_SPECIAL[][] = { /* Even paragraph level */
        /* In this table, conditional sequences receive the lower
           possible level until proven otherwise.
        */
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 : init       */ {    0,    2, 0x11, 0x11,    0,    0,    0,    0 },
        /* 1 : L+EN/AN    */ {    0, 0x42,    1,    1,    0,    0,    0,    0 },
        /* 2 : R          */ {    0,    2,    4,    4, 0x13, 0x13,    0,    1 },
        /* 3 : R+ON       */ {    0, 0x22, 0x34, 0x34,    3,    3,    0,    0 },
        /* 4 : R+EN/AN    */ {    0,    2,    4,    4, 0x13, 0x13,    0,    2 }
    };
    private static final ImpTabPair impTab_NUMBERS_SPECIAL = new ImpTabPair(
            impTabL_NUMBERS_SPECIAL, impTabR_DEFAULT, impAct0, impAct0);

    private static final byte impTabL_GROUP_NUMBERS_WITH_R[][] = {
        /* In this table, EN/AN+ON sequences receive levels as if associated with R
           until proven that there is L or sor/eor on both sides. AN is handled like EN.
        */
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 init         */ {    0,    3, 0x11, 0x11,    0,    0,    0,    0 },
        /* 1 EN/AN        */ { 0x20,    3,    1,    1,    2, 0x20, 0x20,    2 },
        /* 2 EN/AN+ON     */ { 0x20,    3,    1,    1,    2, 0x20, 0x20,    1 },
        /* 3 R            */ {    0,    3,    5,    5, 0x14,    0,    0,    1 },
        /* 4 R+ON         */ { 0x20,    3,    5,    5,    4, 0x20, 0x20,    1 },
        /* 5 R+EN/AN      */ {    0,    3,    5,    5, 0x14,    0,    0,    2 }
    };
    private static final byte impTabR_GROUP_NUMBERS_WITH_R[][] = {
        /* In this table, EN/AN+ON sequences receive levels as if associated with R
           until proven that there is L on both sides.
           AN is handled like EN.
        */
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 init         */ {    2,    0,    1,    1,    0,    0,    0,    0 },
        /* 1 EN/AN        */ {    2,    0,    1,    1,    0,    0,    0,    1 },
        /* 2 L            */ {    2,    0, 0x14, 0x14, 0x13,    0,    0,    1 },
        /* 3 L+ON         */ { 0x22,    0,    4,    4,    3,    0,    0,    0 },
        /* 4 L+EN/AN      */ { 0x22,    0,    4,    4,    3,    0,    0,    1 }
    };
    private static final ImpTabPair impTab_GROUP_NUMBERS_WITH_R = new
            ImpTabPair(impTabL_GROUP_NUMBERS_WITH_R,
                       impTabR_GROUP_NUMBERS_WITH_R, impAct0, impAct0);

    private static final byte impTabL_INVERSE_NUMBERS_AS_L[][] = {
        /* This table is identical to the Default LTR table except that EN and AN
           are handled like L.
        */
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 : init       */ {    0,    1,    0,    0,    0,    0,    0,    0 },
        /* 1 : R          */ {    0,    1,    0,    0, 0x14, 0x14,    0,    1 },
        /* 2 : AN         */ {    0,    1,    0,    0, 0x15, 0x15,    0,    2 },
        /* 3 : R+EN/AN    */ {    0,    1,    0,    0, 0x14, 0x14,    0,    2 },
        /* 4 : R+ON       */ { 0x20,    1, 0x20, 0x20,    4,    4, 0x20,    1 },
        /* 5 : AN+ON      */ { 0x20,    1, 0x20, 0x20,    5,    5, 0x20,    1 }
    };
    private static final byte impTabR_INVERSE_NUMBERS_AS_L[][] = {
        /* This table is identical to the Default RTL table except that EN and AN
           are handled like L.
        */
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 : init       */ {    1,    0,    1,    1,    0,    0,    0,    0 },
        /* 1 : L          */ {    1,    0,    1,    1, 0x14, 0x14,    0,    1 },
        /* 2 : EN/AN      */ {    1,    0,    1,    1,    0,    0,    0,    1 },
        /* 3 : L+AN       */ {    1,    0,    1,    1,    5,    5,    0,    1 },
        /* 4 : L+ON       */ { 0x21,    0, 0x21, 0x21,    4,    4,    0,    0 },
        /* 5 : L+AN+ON    */ {    1,    0,    1,    1,    5,    5,    0,    0 }
    };
    private static final ImpTabPair impTab_INVERSE_NUMBERS_AS_L = new ImpTabPair
            (impTabL_INVERSE_NUMBERS_AS_L, impTabR_INVERSE_NUMBERS_AS_L,
             impAct0, impAct0);

    private static final byte impTabR_INVERSE_LIKE_DIRECT[][] = {  /* Odd paragraph level */
        /*  In this table, conditional sequences receive the lower possible level
            until proven otherwise.
        */
        /*                        L,    R,   EN,   AN,   ON,    S,    B,  Res */
        /* 0 : init       */ {    1,    0,    2,    2,    0,    0,    0,    0 },
        /* 1 : L          */ {    1,    0,    1,    2,
                               0x13, 0x13, 0, 1 },
        /* 2 : EN/AN      */ { 1, 0, 2, 2, 0, 0, 0, 1 },
        /* 3 : L+ON       */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 0 },
        /* 4 : L+ON+AN    */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 3 },
        /* 5 : L+AN+ON    */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 2 },
        /* 6 : L+ON+EN    */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 1 }
    };
    private static final short[] impAct1 = {0,1,13,14};
    private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT = new ImpTabPair(
            impTabL_DEFAULT, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);

    private static final byte impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
        /* The case handled in this table is (visually):  R EN L
        */
        /*                     L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ { 0, 0x63, 0, 1, 0, 0, 0, 0 },
        /* 1 : L+AN       */ { 0, 0x63, 0, 1, 0x12, 0x30, 0, 4 },
        /* 2 : L+AN+ON    */ { 0x20, 0x63, 0x20, 1, 2, 0x30, 0x20, 3 },
        /* 3 : R          */ { 0, 0x63, 0x55, 0x56, 0x14, 0x30, 0, 3 },
        /* 4 : R+ON       */ { 0x30, 0x43, 0x55, 0x56, 4, 0x30, 0x30, 3 },
        /* 5 : R+EN       */ { 0x30, 0x43, 5, 0x56, 0x14, 0x30, 0x30, 4 },
        /* 6 : R+AN       */ { 0x30, 0x43, 0x55, 6, 0x14, 0x30, 0x30, 4 }
    };
    private static final byte impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
        /* The cases handled in this table are (visually):  R EN L
                                                            R L AN L
        */
        /*                     L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ { 0x13, 0, 1, 1, 0, 0, 0, 0 },
        /* 1 : R+EN/AN    */ { 0x23, 0, 1, 1, 2, 0x40, 0, 1 },
        /* 2 : R+EN/AN+ON */ { 0x23, 0, 1, 1, 2, 0x40, 0, 0 },
        /* 3 : L          */ { 3, 0, 3, 0x36, 0x14, 0x40, 0, 1 },
        /* 4 : L+ON       */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 0 },
        /* 5 : L+ON+EN    */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 1 },
        /* 6 : L+AN       */ { 0x53, 0x40, 6, 6, 4, 0x40, 0x40, 3 }
    };
    private static final short[] impAct2 = {0,1,2,5,6,7,8};
    private static final short[] impAct3 = {0,1,9,10,11,12};
    private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT_WITH_MARKS =
            new ImpTabPair(impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS,
                           impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS,
                           impAct2, impAct3);

    private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL = new ImpTabPair(
            impTabL_NUMBERS_SPECIAL, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);

    private static final byte impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS[][] = {
        /* The case handled in this table is (visually):  R EN L
        */
        /*                     L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ { 0, 0x62, 1, 1, 0, 0, 0, 0 },
        /* 1 : L+EN/AN    */ { 0, 0x62, 1, 1, 0, 0x30, 0, 4 },
        /* 2 : R          */ { 0, 0x62, 0x54, 0x54, 0x13, 0x30, 0, 3 },
        /* 3 : R+ON       */ { 0x30, 0x42, 0x54, 0x54, 3, 0x30, 0x30, 3 },
        /* 4 : R+EN/AN    */ { 0x30, 0x42, 4, 4, 0x13, 0x30, 0x30, 4 }
    };
    private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS = new
            ImpTabPair(impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS,
                       impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct2, impAct3);

    private static class LevState {
        byte[][] impTab;                /* level table pointer          */
        short[] impAct;                 /* action map array             */
        int startON;                    /* start of ON sequence         */
        int startL2EN;                  /* start of level 2 sequence    */
        int lastStrongRTL;              /* index of last found R or AL  */
        int runStart;                   /* start position of the run    */
        short state;                    /* current state                */
        byte runLevel;                  /* run level before implicit solving */
    }

    /*------------------------------------------------------------------------*/

    static final int FIRSTALLOC = 10;
    /*
     *  param pos:     position where to insert
     *  param flag:    one of LRM_BEFORE, LRM_AFTER, RLM_BEFORE, RLM_AFTER
     */
    private void addPoint(int pos, int flag)
    {
        Point point = new Point();

        int len = insertPoints.points.length;
        if (len == 0) {
            insertPoints.points = new Point[FIRSTALLOC];
            len = FIRSTALLOC;
        }
        if (insertPoints.size >= len) { /* no room for new point */
            Point[] savePoints = insertPoints.points;
            insertPoints.points = new Point[len * 2];
            System.arraycopy(savePoints, 0, insertPoints.points, 0,
                             len);
        }
        point.pos = pos;
        point.flag = flag;
        insertPoints.points[insertPoints.size] = point;
        insertPoints.size++;
    }

    private void setLevelsOutsideIsolates(int start, int limit, byte level)
    {
        byte dirProp;
        int isolateCount = 0, k;
        for (k = start; k < limit; k++) {
            dirProp = dirProps[k];
            if (dirProp == PDI)
                isolateCount--;
            if (isolateCount == 0) {
                levels[k] = level;
            }
            if (dirProp == LRI || dirProp == RLI)
                isolateCount++;
        }
    }

    /* perform rules (Wn), (Nn), and (In) on a run of the text ------------------ */

    /*
     * This implementation of the (Wn) rules applies all rules in one pass.
     * In order to do so, it needs a look-ahead of typically 1 character
     * (except for W5: sequences of ET) and keeps track of changes
     * in a rule Wp that affect a later Wq (p<q).
     *
     * The (Nn) and (In) rules are also performed in that same single loop,
     * but effectively one iteration behind for white space.
     *
     * Since all implicit rules are performed in one step, it is not necessary
     * to actually store the intermediate directional properties in dirProps[].
     */

    private void processPropertySeq(LevState levState, short _prop,
            int start, int limit) {
        byte cell;
        byte[][] impTab = levState.impTab;
        short[] impAct = levState.impAct;
        short oldStateSeq,actionSeq;
        byte level, addLevel;
        int start0, k;

        start0 = start;                 /* save original start position */
        oldStateSeq = levState.state;
        cell = impTab[oldStateSeq][_prop];
        levState.state = GetState(cell);        /* isolate the new state */
        actionSeq = impAct[GetAction(cell)];    /* isolate the action */
        addLevel = impTab[levState.state][IMPTABLEVELS_RES];

        if (actionSeq != 0) {
            switch (actionSeq) {
            case 1:                     /* init ON seq */
                levState.startON = start0;
                break;

            case 2:                     /* prepend ON seq to current seq */
                start = levState.startON;
                break;

            case 3:                     /* EN/AN after R+ON */
                level =
                        (byte)(levState.runLevel + 1);
                setLevelsOutsideIsolates(levState.startON, start0, level);
                break;

            case 4:                     /* EN/AN before R for NUMBERS_SPECIAL */
                level = (byte)(levState.runLevel + 2);
                setLevelsOutsideIsolates(levState.startON, start0, level);
                break;

            case 5:                     /* L or S after possible relevant EN/AN */
                /* check if we had EN after R/AL */
                if (levState.startL2EN >= 0) {
                    addPoint(levState.startL2EN, LRM_BEFORE);
                }
                levState.startL2EN = -1;  /* not within previous if since could also be -2 */
                /* check if we had any relevant EN/AN after R/AL */
                if ((insertPoints.points.length == 0) ||
                        (insertPoints.size <= insertPoints.confirmed)) {
                    /* nothing, just clean up */
                    levState.lastStrongRTL = -1;
                    /* check if we have a pending conditional segment */
                    level = impTab[oldStateSeq][IMPTABLEVELS_RES];
                    if ((level & 1) != 0 && levState.startON > 0) { /* after ON */
                        start = levState.startON;   /* reset to basic run level */
                    }
                    if (_prop == _S) {              /* add LRM before S */
                        addPoint(start0, LRM_BEFORE);
                        insertPoints.confirmed = insertPoints.size;
                    }
                    break;
                }
                /* reset previous RTL cont to level for LTR text */
                for (k = levState.lastStrongRTL + 1; k < start0; k++) {
                    /* reset odd level, leave runLevel+2 as is */
                    levels[k] = (byte)((levels[k] - 2) & ~1);
                }
                /* mark insert points as confirmed */
                insertPoints.confirmed = insertPoints.size;
                levState.lastStrongRTL = -1;
                if (_prop == _S) {           /* add LRM before S */
                    addPoint(start0, LRM_BEFORE);
                    insertPoints.confirmed = insertPoints.size;
                }
                break;

            case 6:                     /* R/AL after possible relevant EN/AN */
                /* just clean up */
                if (insertPoints.points.length > 0)
                    /* remove all non confirmed insert points */
                    insertPoints.size = insertPoints.confirmed;
                levState.startON = -1;
                levState.startL2EN = -1;
                levState.lastStrongRTL = limit - 1;
                break;

            case 7:                     /* EN/AN after R/AL + possible cont */
                /* check for real AN
                */

                if ((_prop == _AN) && (dirProps[start0] == AN) &&
                        (reorderingMode != REORDER_INVERSE_FOR_NUMBERS_SPECIAL))
                {
                    /* real AN */
                    if (levState.startL2EN == -1) { /* if no relevant EN already found */
                        /* just note the rightmost digit as a strong RTL */
                        levState.lastStrongRTL = limit - 1;
                        break;
                    }
                    if (levState.startL2EN >= 0)  { /* after EN, no AN */
                        addPoint(levState.startL2EN, LRM_BEFORE);
                        levState.startL2EN = -2;
                    }
                    /* note AN */
                    addPoint(start0, LRM_BEFORE);
                    break;
                }
                /* if first EN/AN after R/AL */
                if (levState.startL2EN == -1) {
                    levState.startL2EN = start0;
                }
                break;

            case 8:                     /* note location of latest R/AL */
                levState.lastStrongRTL = limit - 1;
                levState.startON = -1;
                break;

            case 9:                     /* L after R+ON/EN/AN */
                /* include possible adjacent number on the left */
                for (k = start0-1; k >= 0 && ((levels[k] & 1) == 0); k--) {
                }
                if (k >= 0) {
                    addPoint(k, RLM_BEFORE);    /* add RLM before */
                    insertPoints.confirmed = insertPoints.size; /* confirm it */
                }
                levState.startON = start0;
                break;

            case 10:                    /* AN after L */
                /* AN numbers between L text on both sides may be trouble.
                */
                /* tentatively bracket with LRMs; will be confirmed if followed by L */
                addPoint(start0, LRM_BEFORE);   /* add LRM before */
                addPoint(start0, LRM_AFTER);    /* add LRM after  */
                break;

            case 11:                    /* R after L+ON/EN/AN */
                /* false alert, infirm LRMs around previous AN */
                insertPoints.size=insertPoints.confirmed;
                if (_prop == _S) {          /* add RLM before S */
                    addPoint(start0, RLM_BEFORE);
                    insertPoints.confirmed = insertPoints.size;
                }
                break;

            case 12:                    /* L after L+ON/AN */
                level = (byte)(levState.runLevel + addLevel);
                for (k=levState.startON; k < start0; k++) {
                    if (levels[k] < level) {
                        levels[k] = level;
                    }
                }
                insertPoints.confirmed = insertPoints.size; /* confirm inserts */
                levState.startON = start0;
                break;

            case 13:                    /* L after L+ON+EN/AN/ON */
                level = levState.runLevel;
                for (k = start0-1; k >= levState.startON; k--) {
                    if (levels[k] == level+3) {
                        while (levels[k] == level+3) {
                            levels[k--] -= 2;
                        }
                        while (levels[k] == level) {
                            k--;
                        }
                    }
                    if (levels[k] == level+2) {
                        levels[k] = level;
                        continue;
                    }
                    levels[k] = (byte)(level+1);
                }
                break;

            case 14:                    /* R after L+ON+EN/AN/ON */
                level = (byte)(levState.runLevel+1);
                for (k = start0-1; k >= levState.startON; k--) {
                    if (levels[k] > level) {
                        levels[k] -= 2;
                    }
                }
                break;

            default:                    /* we should never get here */
                throw new IllegalStateException("Internal ICU error in processPropertySeq");
            }
        }
        if ((addLevel) != 0 || (start < start0)) {
            level = (byte)(levState.runLevel + addLevel);
            if (start >= levState.runStart) {
                for (k = start; k < limit; k++) {
                    levels[k] = level;
                }
            } else {
                setLevelsOutsideIsolates(start, limit, level);
            }
        }
    }

    private void resolveImplicitLevels(int start, int limit, short sor, short eor)
    {
        byte dirProp;
        LevState levState = new LevState();
        int i, start1, start2;
        short oldStateImp, stateImp, actionImp;
        short
              gprop, resProp, cell;
        boolean inverseRTL;
        short nextStrongProp = R;
        int nextStrongPos = -1;

        /* check for RTL inverse Bidi mode */
        /* FOOD FOR THOUGHT: in case of RTL inverse Bidi, it would make sense to
         * loop on the text characters from end to start.
         * This would need a different properties state table (at least different
         * actions) and different levels state tables (maybe very similar to the
         * corresponding LTR ones).
         */
        inverseRTL=((start<lastArabicPos) && ((GetParaLevelAt(start) & 1)>0) &&
                    (reorderingMode == REORDER_INVERSE_LIKE_DIRECT  ||
                     reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL));
        /* initialize for property and levels state table */
        levState.startL2EN = -1;        /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
        levState.lastStrongRTL = -1;    /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
        levState.runStart = start;
        levState.runLevel = levels[start];
        levState.impTab = impTabPair.imptab[levState.runLevel & 1];
        levState.impAct = impTabPair.impact[levState.runLevel & 1];

        /* The isolates[] entries contain enough information to
           resume the bidi algorithm in the same state as it was
           when it was interrupted by an isolate sequence.
        */
        if (dirProps[start] == PDI) {
            levState.startON = isolates[isolateCount].startON;
            start1 = isolates[isolateCount].start1;
            stateImp = isolates[isolateCount].stateImp;
            levState.state = isolates[isolateCount].state;
            isolateCount--;
        } else {
            levState.startON = -1;
            start1 = start;
            if (dirProps[start] == NSM)
                stateImp = (short)(1 + sor);
            else
                stateImp = 0;
            levState.state = 0;
            processPropertySeq(levState, sor, start, start);
        }
        start2 = start;                 /* to make the Java compiler happy */

        for (i = start; i <= limit; i++) {
            if (i >= limit) {
                int k;
                for (k = limit - 1;
                     k > start &&
                         (DirPropFlag(dirProps[k]) & MASK_BN_EXPLICIT) != 0;
                     k--);
                dirProp = dirProps[k];
                if (dirProp == LRI || dirProp == RLI)
                    break;  /* no forced closing for sequence ending with LRI/RLI */
                gprop = eor;
            } else {
                byte prop, prop1;
                prop = dirProps[i];
                if (prop == B)
                    isolateCount = -1;  /* current isolates stack entry == none */
                if (inverseRTL) {
                    if (prop == AL) {
                        /* AL before EN does not make it AN */
                        prop = R;
                    } else if (prop == EN) {
                        if (nextStrongPos <= i) {
                            /* look for next strong char (L/R/AL) */
                            int j;
                            nextStrongProp = R;     /* set default */
                            nextStrongPos = limit;
                            for (j = i+1; j < limit; j++) {
                                prop1 = dirProps[j];
                                if (prop1 == L || prop1 == R || prop1 == AL) {
                                    nextStrongProp = prop1;
                                    nextStrongPos = j;
                                    break;
                                }
                            }
                        }
                        if (nextStrongProp == AL) {
                            prop = AN;
                        }
                    }
                }
                gprop = groupProp[prop];
            }
            oldStateImp = stateImp;
            cell = impTabProps[oldStateImp][gprop];
            stateImp = GetStateProps(cell);     /* isolate the new state */
            actionImp = GetActionProps(cell);   /* isolate the action */
            if ((i == limit) && (actionImp == 0)) {
                /* there is an unprocessed sequence if its property == eor   */
                actionImp = 1;                  /* process the last sequence */
            }
            if (actionImp != 0) {
                resProp = impTabProps[oldStateImp][IMPTABPROPS_RES];
                switch
                       (actionImp) {
                case 1:             /* process current seq1, init new seq1 */
                    processPropertySeq(levState, resProp, start1, i);
                    start1 = i;
                    break;
                case 2:             /* init new seq2 */
                    start2 = i;
                    break;
                case 3:             /* process seq1, process seq2, init new seq1 */
                    processPropertySeq(levState, resProp, start1, start2);
                    processPropertySeq(levState, _ON, start2, i);
                    start1 = i;
                    break;
                case 4:             /* process seq1, set seq1=seq2, init new seq2 */
                    processPropertySeq(levState, resProp, start1, start2);
                    start1 = start2;
                    start2 = i;
                    break;
                default:            /* we should never get here */
                    throw new IllegalStateException("Internal ICU error in resolveImplicitLevels");
                }
            }
        }

        /* look for the last char not a BN or LRE/RLE/LRO/RLO/PDF */
        for (i = limit - 1;
             i > start &&
                 (DirPropFlag(dirProps[i]) & MASK_BN_EXPLICIT) != 0;
             i--);
        dirProp = dirProps[i];
        if ((dirProp == LRI || dirProp == RLI) && limit < length) {
            isolateCount++;
            if (isolates[isolateCount] == null)
                isolates[isolateCount] = new Isolate();
            isolates[isolateCount].stateImp = stateImp;
            isolates[isolateCount].state = levState.state;
            isolates[isolateCount].start1 = start1;
            isolates[isolateCount].startON = levState.startON;
        }
        else
            processPropertySeq(levState, eor, limit, limit);
    }

    /* perform (L1) and (X9) ---------------------------------------------------- */

    /*
     * Reset the embedding levels for some non-graphic characters (L1).
     * This method also sets appropriate levels for BN, and
     * explicit embedding types that are supposed to have been removed
     * from the paragraph in (X9).
     */
    private void adjustWSLevels() {
        int i;

        if ((flags & MASK_WS) != 0) {
            int flag;
            i = trailingWSStart;
            while (i > 0) {
                /* reset a sequence of WS/BN before eop and B/S to the paragraph paraLevel */
                while (i > 0 && ((flag = DirPropFlag(dirProps[--i])) & MASK_WS) != 0) {
                    if (orderParagraphsLTR && (flag &
                            DirPropFlag(B)) != 0) {
                        levels[i] = 0;
                    } else {
                        levels[i] = GetParaLevelAt(i);
                    }
                }

                /* reset BN to the next character's paraLevel until B/S, which restarts above loop */
                /* here, i+1 is guaranteed to be <length */
                while (i > 0) {
                    flag = DirPropFlag(dirProps[--i]);
                    if ((flag & MASK_BN_EXPLICIT) != 0) {
                        levels[i] = levels[i + 1];
                    } else if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
                        levels[i] = 0;
                        break;
                    } else if ((flag & MASK_B_S) != 0){
                        levels[i] = GetParaLevelAt(i);
                        break;
                    }
                }
            }
        }
    }

    private void setParaSuccess() {
        paraBidi = this;                /* mark successful setPara */
    }

    private int Bidi_Min(int x, int y) {
        return x < y ? x : y;
    }

    private int Bidi_Abs(int x) {
        return x >= 0 ? x : -x;
    }

    void setParaRunsOnly(char[] parmText, byte parmParaLevel) {
        int[] visualMap;
        String visualText;
        int saveLength, saveTrailingWSStart;
        byte[] saveLevels;
        byte saveDirection;
        int i, j, visualStart, logicalStart,
            oldRunCount, runLength, addedRuns, insertRemove,
            start, limit, step, indexOddBit, logicalPos,
            index, index1;
        int saveOptions;

        reorderingMode = REORDER_DEFAULT;
        int parmLength = parmText.length;
        if (parmLength == 0) {
            setPara(parmText, parmParaLevel, null);
            reorderingMode = REORDER_RUNS_ONLY;
            return;
        }
        /* obtain memory for mapping table and visual text */
        saveOptions = reorderingOptions;
        if ((saveOptions & OPTION_INSERT_MARKS) > 0) {
            reorderingOptions &= ~OPTION_INSERT_MARKS;
            reorderingOptions |= OPTION_REMOVE_CONTROLS;
        }
        parmParaLevel &= 1;             /* accept only 0 or 1 */
        setPara(parmText, parmParaLevel, null);
        /* we cannot access directly levels since it is not yet set if
         * direction is not MIXED
         */
        saveLevels = new byte[this.length];
        System.arraycopy(getLevels(), 0, saveLevels, 0, this.length);
        saveTrailingWSStart = trailingWSStart;

        /* FOOD FOR THOUGHT: instead of
         * writing the visual text, we could use
         * the visual map and the dirProps array to drive the second call
         * to setPara (but must make provision for possible removal of
         * Bidi controls). Alternatively, only use the dirProps array via
         * a customized classifier callback.
         */
        visualText = writeReordered(DO_MIRRORING);
        visualMap = getVisualMap();
        this.reorderingOptions = saveOptions;
        saveLength = this.length;
        saveDirection=this.direction;

        this.reorderingMode = REORDER_INVERSE_LIKE_DIRECT;
        parmParaLevel ^= 1;
        setPara(visualText, parmParaLevel, null);
        BidiLine.getRuns(this);
        /* check if some runs must be split, count how many splits */
        addedRuns = 0;
        oldRunCount = this.runCount;
        visualStart = 0;
        for (i = 0; i < oldRunCount; i++, visualStart += runLength) {
            runLength = runs[i].limit - visualStart;
            if (runLength < 2) {
                continue;
            }
            logicalStart = runs[i].start;
            for (j = logicalStart+1; j < logicalStart+runLength; j++) {
                index = visualMap[j];
                index1 = visualMap[j-1];
                if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
                    addedRuns++;
                }
            }
        }
        if (addedRuns > 0) {
            getRunsMemory(oldRunCount + addedRuns);
            if (runCount == 1) {
                /* because we switch from UBiDi.simpleRuns to UBiDi.runs */
                runsMemory[0] = runs[0];
            } else {
                System.arraycopy(runs, 0, runsMemory, 0, runCount);
            }
            runs = runsMemory;
            runCount += addedRuns;
            for (i = oldRunCount; i < runCount; i++) {
                if (runs[i] == null) {
                    runs[i] = new BidiRun(0, 0, (byte)0);
                }
            }
        }
        /* split runs which are not consecutive in source text */
        int newI;
        for (i = oldRunCount-1; i >= 0; i--) {
            newI = i + addedRuns;
            runLength = i==0 ?
                        runs[0].limit :
                        runs[i].limit - runs[i-1].limit;
            logicalStart = runs[i].start;
            indexOddBit = runs[i].level & 1;
            if (runLength < 2) {
                if (addedRuns > 0) {
                    runs[newI].copyFrom(runs[i]);
                }
                logicalPos = visualMap[logicalStart];
                runs[newI].start = logicalPos;
                runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
                continue;
            }
            if (indexOddBit > 0) {
                start = logicalStart;
                limit = logicalStart + runLength - 1;
                step = 1;
            } else {
                start = logicalStart + runLength - 1;
                limit = logicalStart;
                step = -1;
            }
            for (j = start; j != limit; j += step) {
                index = visualMap[j];
                index1 = visualMap[j+step];
                if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
                    logicalPos = Bidi_Min(visualMap[start], index);
                    runs[newI].start = logicalPos;
                    runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
                    runs[newI].limit = runs[i].limit;
                    runs[i].limit -= Bidi_Abs(j - start) + 1;
                    insertRemove = runs[i].insertRemove & (LRM_AFTER|RLM_AFTER);
                    runs[newI].insertRemove = insertRemove;
                    runs[i].insertRemove &= ~insertRemove;
                    start = j + step;
                    addedRuns--;
                    newI--;
                }
            }
            if (addedRuns > 0) {
                runs[newI].copyFrom(runs[i]);
            }
            logicalPos = Bidi_Min(visualMap[start], visualMap[limit]);
            runs[newI].start = logicalPos;
            runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
        }

    cleanup1:
        /* restore initial paraLevel */
        this.paraLevel ^= 1;
    cleanup2:
        /* restore real text */
        this.text = parmText;
        this.length = saveLength;
        this.originalLength = parmLength;
        this.direction=saveDirection;
        this.levels = saveLevels;
        this.trailingWSStart = saveTrailingWSStart;
        if (runCount > 1) {
            this.direction = MIXED;
        }
    cleanup3:
        this.reorderingMode = REORDER_RUNS_ONLY;
    }

    /**
     * Perform the Unicode Bidi algorithm.
     * It is defined in the
     * <a href="http://www.unicode.org/reports/tr9/">Unicode Standard Annex #9:
     * Unicode Bidirectional Algorithm</a>, version 13,
     * also described in The Unicode Standard, Version 4.0 .<p>
     *
     * This method takes a piece of plain text containing one or more paragraphs,
     * with or without externally specified embedding levels from <i>styled</i>
     * text and computes the left-right-directionality of each character.<p>
     *
     * If the entire text is all of the same directionality, then
     * the method may not perform all the steps described by the algorithm,
     * i.e., some levels may not be the same as if all steps were performed.
     * This is not relevant for unidirectional text.<br>
     * For example, in pure LTR text with numbers the numbers would get
     * a resolved level of 2 higher than the surrounding text according to
     * the algorithm. This implementation may set all resolved levels to
     * the same value in such a case.<p>
     *
     * The text can be composed of multiple paragraphs. Occurrence of a block
     * separator in the text terminates a paragraph, and whatever comes next starts
     * a new paragraph. The exception to this rule is when a Carriage Return (CR)
     * is followed by a Line Feed (LF). Both CR and LF are block separators, but
     * in that case, the pair of characters is considered as terminating the
     * preceding paragraph, and a new paragraph will be started by a character
     * coming after the LF.
     *
     * Although the text is passed here as a <code>String</code>, it is
     * stored internally as an array of characters. Therefore the
     * documentation will refer to indexes of the characters in the text.
     *
     * @param text contains the text that the Bidi algorithm will be performed
     *        on.
     *        This text can be retrieved with <code>getText()</code> or
     *        <code>getTextAsString</code>.<br>
     *
     * @param paraLevel specifies the default level for the text;
     *        it is typically 0 (LTR) or 1 (RTL).
     *        If the method shall determine the paragraph level from the text,
     *        then <code>paraLevel</code> can be set to
     *        either <code>LEVEL_DEFAULT_LTR</code>
     *        or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
     *        paragraphs, the paragraph level shall be determined separately for
     *        each paragraph; if a paragraph does not include any strongly typed
     *        character, then the desired default is used (0 for LTR or 1 for RTL).
     *        Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
     *        is also valid, with odd levels indicating RTL.
     *
     * @param embeddingLevels (in) may be used to preset the embedding and override levels,
     *        ignoring characters like LRE and PDF in the text.
     *        A level overrides the directional property of its corresponding
     *        (same index) character if the level has the
     *        <code>LEVEL_OVERRIDE</code> bit set.<br><br>
     *        Except for that bit, it must be
     *        <code>paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL</code>,
     *        with one exception: a level of zero may be specified for a
     *        paragraph separator even if <code>paraLevel>0</code> when multiple
     *        paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
     *        <strong>Caution: </strong>A reference to this array, not a copy
     *        of the levels, will be stored in the <code>Bidi</code> object;
     *        the <code>embeddingLevels</code>
     *        should not be modified to avoid unexpected results on subsequent
     *        Bidi operations.
     *        However, the <code>setPara()</code> and
     *        <code>setLine()</code> methods may modify some or all of the
     *        levels.<br><br>
     *        <strong>Note:</strong> the <code>embeddingLevels</code> array must
     *        have one entry for each character in <code>text</code>.
     *
     * @throws IllegalArgumentException if the values in embeddingLevels are
     *         not within the allowed range
     *
     * @see #LEVEL_DEFAULT_LTR
     * @see #LEVEL_DEFAULT_RTL
     * @see #LEVEL_OVERRIDE
     * @see #MAX_EXPLICIT_LEVEL
     * @stable ICU 3.8
     */
    void setPara(String text, byte paraLevel, byte[] embeddingLevels)
    {
        if (text == null) {
            setPara(new char[0], paraLevel, embeddingLevels);
        } else {
            setPara(text.toCharArray(), paraLevel, embeddingLevels);
        }
    }

    /**
     * Perform the Unicode Bidi algorithm. It is defined in the
     * <a href="http://www.unicode.org/reports/tr9/">Unicode Standard Annex #9:
     * Unicode Bidirectional Algorithm</a>, version 13,
     * also described in The Unicode Standard, Version 4.0 .<p>
     *
     * This method takes a piece of plain text containing one or more paragraphs,
     * with or without externally specified embedding levels from <i>styled</i>
     * text and computes the left-right-directionality of each character.<p>
     *
     * If the entire text is all of the same directionality, then
     * the method may not perform all the steps described by the algorithm,
     * i.e., some levels may not be the same as if all steps were performed.
     * This is not relevant for unidirectional text.<br>
     * For example, in pure LTR text with numbers the numbers would get
     * a resolved level of 2 higher than the surrounding text according to
     * the algorithm. This implementation may set all resolved levels to
     * the same value in such a case.
     *
     * The text can be composed of multiple paragraphs. Occurrence of a block
     * separator in the text terminates a paragraph, and whatever comes next starts
     * a new paragraph.
     * The exception to this rule is when a Carriage Return (CR)
     * is followed by a Line Feed (LF). Both CR and LF are block separators, but
     * in that case, the pair of characters is considered as terminating the
     * preceding paragraph, and a new paragraph will be started by a character
     * coming after the LF.
     *
     * The text is stored internally as an array of characters. Therefore the
     * documentation will refer to indexes of the characters in the text.
     *
     * @param chars contains the text that the Bidi algorithm will be performed
     *        on. This text can be retrieved with <code>getText()</code> or
     *        <code>getTextAsString</code>.<br>
     *
     * @param paraLevel specifies the default level for the text;
     *        it is typically 0 (LTR) or 1 (RTL).
     *        If the method shall determine the paragraph level from the text,
     *        then <code>paraLevel</code> can be set to
     *        either <code>LEVEL_DEFAULT_LTR</code>
     *        or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
     *        paragraphs, the paragraph level shall be determined separately for
     *        each paragraph; if a paragraph does not include any strongly typed
     *        character, then the desired default is used (0 for LTR or 1 for RTL).
     *        Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
     *        is also valid, with odd levels indicating RTL.
     *
     * @param embeddingLevels (in) may be used to preset the embedding and
     *        override levels, ignoring characters like LRE and PDF in the text.
     *        A level overrides the directional property of its corresponding
     *        (same index) character if the level has the
     *        <code>LEVEL_OVERRIDE</code> bit set.<br><br>
     *        Except for that bit, it must be
     *        <code>paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL</code>,
     *        with one exception: a level of zero may be specified for a
     *        paragraph separator even if <code>paraLevel>0</code> when multiple
     *        paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
     *        <strong>Caution:
     *        </strong>A reference to this array, not a copy
     *        of the levels, will be stored in the <code>Bidi</code> object;
     *        the <code>embeddingLevels</code>
     *        should not be modified to avoid unexpected results on subsequent
     *        Bidi operations. However, the <code>setPara()</code> and
     *        <code>setLine()</code> methods may modify some or all of the
     *        levels.<br><br>
     *        <strong>Note:</strong> the <code>embeddingLevels</code> array must
     *        have one entry for each character in <code>chars</code>.
     *
     * @throws IllegalArgumentException if the values in embeddingLevels are
     *         not within the allowed range
     *
     * @see #LEVEL_DEFAULT_LTR
     * @see #LEVEL_DEFAULT_RTL
     * @see #LEVEL_OVERRIDE
     * @see #MAX_EXPLICIT_LEVEL
     * @stable ICU 3.8
     */
    void setPara(char[] chars, byte paraLevel, byte[] embeddingLevels)
    {
        /* check the argument values */
        if (paraLevel < LEVEL_DEFAULT_LTR) {
            verifyRange(paraLevel, 0, MAX_EXPLICIT_LEVEL + 1);
        }
        if (chars == null) {
            chars = new char[0];
        }

        /* special treatment for RUNS_ONLY mode */
        if (reorderingMode == REORDER_RUNS_ONLY) {
            setParaRunsOnly(chars, paraLevel);
            return;
        }

        /* initialize the Bidi object */
        this.paraBidi = null;           /* mark unfinished setPara */
        this.text = chars;
        this.length = this.originalLength = this.resultLength = text.length;
        this.paraLevel = paraLevel;
        this.direction = (byte)(paraLevel & 1);
        this.paraCount = 1;

        /* Allocate zero-length arrays instead of setting to null here; then
         * checks for null in various places can be eliminated.
         */
        dirProps = new byte[0];
        levels = new byte[0];
        runs = new BidiRun[0];
        isGoodLogicalToVisualRunsMap = false;
        insertPoints.size = 0;          /* clean up from last call */
        insertPoints.confirmed = 0;     /* clean up from last call */

        /*
         * Save the original paraLevel if contextual; otherwise, set to 0.
         */
        defaultParaLevel = IsDefaultLevel(paraLevel) ?
                paraLevel : 0;

        if (length == 0) {
            /*
             * For an empty paragraph, create a Bidi object with the paraLevel and
             * the flags and the direction set but without allocating zero-length arrays.
             * There is nothing more to do.
             */
            if (IsDefaultLevel(paraLevel)) {
                this.paraLevel &= 1;
                defaultParaLevel = 0;
            }
            flags = DirPropFlagLR(paraLevel);
            runCount = 0;
            paraCount = 0;
            setParaSuccess();
            return;
        }

        runCount = -1;

        /*
         * Get the directional properties,
         * the flags bit-set, and
         * determine the paragraph level if necessary.
         */
        getDirPropsMemory(length);
        dirProps = dirPropsMemory;
        getDirProps();
        /* the processed length may have changed if OPTION_STREAMING is set */
        trailingWSStart = length;       /* the levels[] will reflect the WS run */

        /* are explicit levels specified? */
        if (embeddingLevels == null) {
            /* no: determine explicit levels according to the (Xn) rules */
            getLevelsMemory(length);
            levels = levelsMemory;
            direction = resolveExplicitLevels();
        } else {
            /* set BN for all explicit codes, check that all levels are 0 or paraLevel..MAX_EXPLICIT_LEVEL */
            levels = embeddingLevels;
            direction = checkExplicitLevels();
        }

        /* allocate isolate memory */
        if (isolateCount > 0) {
            if (isolates == null || isolates.length < isolateCount)
                isolates = new Isolate[isolateCount + 3];   /* keep some reserve */
        }
        isolateCount = -1;              /* current isolates stack entry == none */

        /*
         * The steps after (X9) in the Bidi algorithm are performed only if
         * the paragraph text has mixed directionality!
         */
        switch (direction) {
        case LTR:
            /* all levels are implicitly at paraLevel (important for getLevels()) */
            trailingWSStart = 0;
            break;
        case RTL:
            /* all levels are implicitly at paraLevel (important for getLevels()) */
            trailingWSStart = 0;
            break;
        default:
            /*
             *  Choose the right implicit state table
             */
            switch(reorderingMode)
            {
            case REORDER_DEFAULT:
                this.impTabPair = impTab_DEFAULT;
                break;
            case REORDER_NUMBERS_SPECIAL:
                this.impTabPair = impTab_NUMBERS_SPECIAL;
                break;
            case REORDER_GROUP_NUMBERS_WITH_R:
                this.impTabPair = impTab_GROUP_NUMBERS_WITH_R;
                break;
            case REORDER_RUNS_ONLY:
                /* we should never get here */
                throw new InternalError("Internal ICU error in setPara");
                /* break; */
            case REORDER_INVERSE_NUMBERS_AS_L:
                this.impTabPair = impTab_INVERSE_NUMBERS_AS_L;
                break;
            case REORDER_INVERSE_LIKE_DIRECT:
                if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
                    this.impTabPair = impTab_INVERSE_LIKE_DIRECT_WITH_MARKS;
                } else {
                    this.impTabPair = impTab_INVERSE_LIKE_DIRECT;
                }
                break;
            case REORDER_INVERSE_FOR_NUMBERS_SPECIAL:
                if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
                    this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS;
                } else {
                    this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL;
                }
                break;
            }
            /*
             * If there are no external levels specified and there
             * are no significant explicit level codes in the text,
             * then we can treat the entire paragraph as one run.
             * Otherwise, we need to perform the following rules on runs of
             * the text with the same embedding levels.
             * (X10)
             * "Significant" explicit level codes are ones that actually
             * affect non-BN characters.
             * Examples for "insignificant" ones are empty embeddings
             * LRE-PDF, LRE-RLE-PDF-PDF, etc.
             */
            if (embeddingLevels == null && paraCount <= 1 &&
                (flags & DirPropFlagMultiRuns) == 0) {
                resolveImplicitLevels(0, length,
                        GetLRFromLevel(GetParaLevelAt(0)),
                        GetLRFromLevel(GetParaLevelAt(length - 1)));
            } else {
                /* sor, eor: start and end types of same-level-run */
                int start, limit = 0;
                byte level, nextLevel;
                short sor, eor;

                /* determine the first sor and set eor to it because of the loop body (sor=eor there) */
                level = GetParaLevelAt(0);
                nextLevel = levels[0];
                if (level < nextLevel) {
                    eor = GetLRFromLevel(nextLevel);
                } else {
                    eor = GetLRFromLevel(level);
                }

                do {
                    /* determine start and limit of the run (end points just behind the run) */

                    /* the values for this run's start are the same as for the previous run's end */
                    start = limit;
                    level = nextLevel;
                    if ((start > 0) && (dirProps[start - 1] == B)) {
                        /* except if this is a new paragraph, then set sor = para level */
                        sor = GetLRFromLevel(GetParaLevelAt(start));
                    } else {
                        sor = eor;
                    }

                    /* search for the limit of this run */
                    while ((++limit < length) &&
                           ((levels[limit] == level) ||
                            ((DirPropFlag(dirProps[limit]) & MASK_BN_EXPLICIT) != 0))) {}

                    /* get the correct level of the next run */
                    if (limit < length) {
                        nextLevel = levels[limit];
                    } else {
                        nextLevel = GetParaLevelAt(length - 1);
                    }

                    /* determine eor from max(level, nextLevel); sor is last run's eor */
                    if (NoOverride(level) < NoOverride(nextLevel)) {
                        eor = GetLRFromLevel(nextLevel);
                    } else {
                        eor = GetLRFromLevel(level);
                    }

                    /* if the run consists of overridden directional types, then there
                       are no implicit types to be resolved */
                    if ((level & LEVEL_OVERRIDE) == 0) {
                        resolveImplicitLevels(start, limit, sor,
                                              eor);
                    } else {
                        /* remove the LEVEL_OVERRIDE flags */
                        do {
                            levels[start++] &= ~LEVEL_OVERRIDE;
                        } while (start < limit);
                    }
                } while (limit < length);
            }

            /* reset the embedding levels for some non-graphic characters (L1), (X9) */
            adjustWSLevels();

            break;
        }

        /* add RLM for inverse Bidi with contextual orientation resolving
         * to RTL which would not round-trip otherwise
         */
        if ((defaultParaLevel > 0) &&
            ((reorderingOptions & OPTION_INSERT_MARKS) != 0) &&
            ((reorderingMode == REORDER_INVERSE_LIKE_DIRECT) ||
             (reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL))) {
            int start, last;
            byte level;
            byte dirProp;
            for (int i = 0; i < paraCount; i++) {
                last = paras_limit[i] - 1;
                level = paras_level[i];
                if (level == 0)
                    continue;           /* LTR paragraph */
                start = i == 0 ? 0 : paras_limit[i - 1];
                for (int j = last; j >= start; j--) {
                    dirProp = dirProps[j];
                    if (dirProp == L) {
                        if (j < last) {
                            while (dirProps[last] == B) {
                                last--;
                            }
                        }
                        addPoint(last, RLM_BEFORE);
                        break;
                    }
                    if ((DirPropFlag(dirProp) & MASK_R_AL) != 0) {
                        break;
                    }
                }
            }
        }

        if ((reorderingOptions & OPTION_REMOVE_CONTROLS) != 0) {
            resultLength -= controlCount;
        } else {
            resultLength += insertPoints.size;
        }
        setParaSuccess();
    }

    /**
     * Perform the Unicode Bidi algorithm on a given paragraph, as defined in the
     * <a href="http://www.unicode.org/reports/tr9/">Unicode Standard Annex #9:
     * Unicode Bidirectional Algorithm</a>, version 13,
     * also described in The Unicode Standard, Version 4.0 .<p>
     *
     * This method takes a paragraph of text and computes the
     * left-right-directionality of each character. The text should not
     * contain any Unicode block separators.<p>
     *
     * The RUN_DIRECTION attribute in the text, if present, determines the base
     * direction (left-to-right or right-to-left).
     * If not present, the base
     * direction is computed using the Unicode Bidirectional Algorithm,
     * defaulting to left-to-right if there are no strong directional characters
     * in the text. This attribute, if present, must be applied to all the text
     * in the paragraph.<p>
     *
     * The BIDI_EMBEDDING attribute in the text, if present, represents
     * embedding level information. Negative values from -1 to -62 indicate
     * overrides at the absolute value of the level. Positive values from 1 to
     * 62 indicate embeddings. Where values are zero or not defined, the base
     * embedding level as determined by the base direction is assumed.<p>
     *
     * The NUMERIC_SHAPING attribute in the text, if present, converts European
     * digits to other decimal digits before running the bidi algorithm. This
     * attribute, if present, must be applied to all the text in the paragraph.
     *
     * If the entire text is all of the same directionality, then
     * the method may not perform all the steps described by the algorithm,
     * i.e., some levels may not be the same as if all steps were performed.
     * This is not relevant for unidirectional text.<br>
     * For example, in pure LTR text with numbers the numbers would get
     * a resolved level of 2 higher than the surrounding text according to
     * the algorithm.
     * This implementation may set all resolved levels to
     * the same value in such a case.<p>
     *
     * @param paragraph a paragraph of text with optional character and
     *        paragraph attribute information
     * @stable ICU 3.8
     */
    public void setPara(AttributedCharacterIterator paragraph)
    {
        byte paraLvl;
        char ch = paragraph.first();
        Boolean runDirection =
            (Boolean) paragraph.getAttribute(TextAttributeConstants.RUN_DIRECTION);
        Object shaper = paragraph.getAttribute(TextAttributeConstants.NUMERIC_SHAPING);

        if (runDirection == null) {
            paraLvl = LEVEL_DEFAULT_LTR;
        } else {
            paraLvl = (runDirection.equals(TextAttributeConstants.RUN_DIRECTION_LTR)) ?
                        LTR : RTL;
        }

        byte[] lvls = null;
        int len = paragraph.getEndIndex() - paragraph.getBeginIndex();
        byte[] embeddingLevels = new byte[len];
        char[] txt = new char[len];
        int i = 0;
        while (ch != AttributedCharacterIterator.DONE) {
            txt[i] = ch;
            Integer embedding =
                (Integer) paragraph.getAttribute(TextAttributeConstants.BIDI_EMBEDDING);
            if (embedding != null) {
                byte level = embedding.byteValue();
                if (level == 0) {
                    /* no-op */
                } else if (level < 0) {
                    lvls = embeddingLevels;
                    embeddingLevels[i] = (byte)((0 - level) | LEVEL_OVERRIDE);
                } else {
                    lvls = embeddingLevels;
                    embeddingLevels[i] = level;
                }
            }
            ch = paragraph.next();
            ++i;
        }

        if (shaper != null) {
            NumericShapings.shape(shaper, txt, 0, len);
        }
        setPara(txt, paraLvl, lvls);
    }

    /**
     * Specify whether block separators must be allocated level zero,
     * so that successive paragraphs will progress from left to right.
     * This method must be called before <code>setPara()</code>.
     * Paragraph separators (B) may appear in the text.
     * Setting them to level zero
     * means that all paragraph separators (including one possibly appearing
     * in the last text position) are kept in the reordered text after the text
     * that they follow in the source text.
     * When this feature is not enabled, a paragraph separator at the last
     * position of the text before reordering will go to the first position
     * of the reordered text when the paragraph level is odd.
     *
     * @param ordarParaLTR specifies whether paragraph separators (B) must
     * receive level 0, so that successive paragraphs progress from left to right.
     *
     * @see #setPara
     * @stable ICU 3.8
     */
    public void orderParagraphsLTR(boolean ordarParaLTR) {
        orderParagraphsLTR = ordarParaLTR;
    }

    /**
     * Get the directionality of the text.
     *
     * @return a value of <code>LTR</code>, <code>RTL</code> or <code>MIXED</code>
     *         that indicates if the entire text
     *         represented by this object is unidirectional,
     *         and which direction, or if it is mixed-directional.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #LTR
     * @see #RTL
     * @see #MIXED
     * @stable ICU 3.8
     */
    public byte getDirection()
    {
        verifyValidParaOrLine();
        return direction;
    }

    /**
     * Get the length of the text.
     *
     * @return The length of the text that the <code>Bidi</code> object was
     *         created for.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @stable ICU 3.8
     */
    public int getLength()
    {
        verifyValidParaOrLine();
        return originalLength;
    }

    /* paragraphs API methods ------------------------------------------------- */

    /**
     * Get the paragraph level of the text.
     *
     * @return The paragraph level.
     *         If there are multiple paragraphs, their
     *         level may vary if the required paraLevel is LEVEL_DEFAULT_LTR or
     *         LEVEL_DEFAULT_RTL. In that case, the level of the first paragraph
     *         is returned.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #LEVEL_DEFAULT_LTR
     * @see #LEVEL_DEFAULT_RTL
     * @see #getParagraph
     * @see #getParagraphByIndex
     * @stable ICU 3.8
     */
    public byte getParaLevel()
    {
        verifyValidParaOrLine();
        return paraLevel;
    }

    /**
     * Retrieves the Bidi class for a given code point.
     * <p>If a <code>BidiClassifier</code> is defined and returns a value
     * other than <code>CLASS_DEFAULT</code>, that value is used; otherwise
     * the default class determination mechanism is invoked.</p>
     *
     * @param c The code point to get a Bidi class for.
     *
     * @return The Bidi class for the character <code>c</code> that is in effect
     *         for this <code>Bidi</code> instance.
     *
     * @stable ICU 3.8
     */
    public int getCustomizedClass(int c) {
        int dir;

        dir = bdp.getClass(c);
        if (dir >= CHAR_DIRECTION_COUNT)
            dir = ON;
        return dir;
    }

    /**
     * <code>setLine()</code> returns a <code>Bidi</code> object to
     * contain the reordering information, especially the resolved levels,
     * for all the characters in a line of text.
     * This line of text is
     * specified by referring to a <code>Bidi</code> object representing
     * this information for a piece of text containing one or more paragraphs,
     * and by specifying a range of indexes in this text.<p>
     * In the new line object, the indexes will range from 0 to <code>limit-start-1</code>.<p>
     *
     * This is used after calling <code>setPara()</code>
     * for a piece of text, and after line-breaking on that text.
     * It is not necessary if each paragraph is treated as a single line.<p>
     *
     * After line-breaking, rules (L1) and (L2) for the treatment of
     * trailing WS and for reordering are performed on
     * a <code>Bidi</code> object that represents a line.<p>
     *
     * <strong>Important: </strong>the line <code>Bidi</code> object may
     * reference data within the global text <code>Bidi</code> object.
     * You should not alter the content of the global text object until
     * you are finished using the line object.
     *
     * @param start is the line's first index into the text.
     *
     * @param limit is just behind the line's last index into the text
     *        (its last index +1).
     *
     * @return a <code>Bidi</code> object that will now represent a line of the text.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     * @throws IllegalArgumentException if start and limit are not in the range
     *         <code>0<=start<limit<=getProcessedLength()</code>,
     *         or if the specified line crosses a paragraph boundary
     *
     * @see #setPara
     * @see #getProcessedLength
     * @stable ICU 3.8
     */
    public Bidi setLine(Bidi bidi, BidiBase bidiBase, Bidi newBidi, BidiBase newBidiBase, int start, int limit)
    {
        verifyValidPara();
        verifyRange(start, 0, limit);
        verifyRange(limit, 0, length+1);

        return BidiLine.setLine(this, newBidi, newBidiBase, start, limit);
    }

    /**
     * Get the level for one character.
     *
     * @param charIndex the index of a
     *        character.
     *
     * @return The level for the character at <code>charIndex</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if charIndex is not in the range
     *         <code>0<=charIndex<getProcessedLength()</code>
     *
     * @see #getProcessedLength
     * @stable ICU 3.8
     */
    public byte getLevelAt(int charIndex)
    {
        // for backward compatibility
        if (charIndex < 0 || charIndex >= length) {
            return (byte)getBaseLevel();
        }

        verifyValidParaOrLine();
        verifyRange(charIndex, 0, length);
        return BidiLine.getLevelAt(this, charIndex);
    }

    /**
     * Get an array of levels for each character.<p>
     *
     * Note that this method may allocate memory under some
     * circumstances, unlike <code>getLevelAt()</code>.
     *
     * @return The levels array for the text,
     *         or <code>null</code> if an error occurs.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @stable ICU 3.8
     */
    byte[] getLevels()
    {
        verifyValidParaOrLine();
        if (length <= 0) {
            return new byte[0];
        }
        return BidiLine.getLevels(this);
    }

    /**
     * Get the number of runs.
     * This method may invoke the actual reordering on the
     * <code>Bidi</code> object, after <code>setPara()</code>
     * may have resolved only the levels of the text.
     * Therefore,
     * <code>countRuns()</code> may have to allocate memory,
     * and may throw an exception if it fails to do so.
     *
     * @return The number of runs.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @stable ICU 3.8
     */
    public int countRuns()
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        return runCount;
    }

    /**
     *
     * Get a <code>BidiRun</code> object according to its index. BidiRun methods
     * may be used to retrieve the run's logical start, length and level,
     * which can be even for an LTR run or odd for an RTL run.
     * In an RTL run, the character at the logical start is
     * visually on the right of the displayed run.
     * The length is the number of characters in the run.<p>
     * <code>countRuns()</code> is normally called
     * before the runs are retrieved.
     *
     * <p>
     * Example:
     * <pre>
     *  Bidi bidi = new Bidi();
     *  String text = "abc 123 DEFG xyz";
     *  bidi.setPara(text, Bidi.RTL, null);
     *  int i, count=bidi.countRuns(), logicalStart, visualIndex=0, length;
     *  BidiRun run;
     *  for (i = 0; i < count; ++i) {
     *      run = bidi.getVisualRun(i);
     *      logicalStart = run.getStart();
     *      length = run.getLength();
     *      if (Bidi.LTR == run.getEmbeddingLevel()) {
     *          do { // LTR
     *              show_char(text.charAt(logicalStart++), visualIndex++);
     *          } while (--length > 0);
     *      } else {
     *          logicalStart += length;  // logicalLimit
     *          do { // RTL
     *              show_char(text.charAt(--logicalStart), visualIndex++);
     *          } while (--length > 0);
     *      }
     *  }
     * </pre>
     * <p>
     * Note that in right-to-left runs, code like this places
     * second surrogates before first ones (which is generally a bad idea)
     * and combining characters before base characters.
     * <p>
     * Use of <code>{@link #writeReordered}</code>, optionally with the
     * <code>{@link #KEEP_BASE_COMBINING}</code> option, can be considered in
     * order to avoid these issues.
     *
     * @param runIndex is the number of the run in visual order, in the
     *        range <code>[0..countRuns()-1]</code>.
     *
     * @return a BidiRun object containing the details of the run. The
     *         directionality of the run is
     *         <code>LTR==0</code> or <code>RTL==1</code>,
     *         never <code>MIXED</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>runIndex</code> is not in
     *         the range <code>0<=runIndex<countRuns()</code>
     *
     * @see #countRuns()
     * @see com.ibm.icu.text.BidiRun
     * @see com.ibm.icu.text.BidiRun#getStart()
     * @see com.ibm.icu.text.BidiRun#getLength()
     * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel()
     * @stable ICU 3.8
     */
    BidiRun getVisualRun(int runIndex)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        verifyRange(runIndex, 0, runCount);
        return BidiLine.getVisualRun(this, runIndex);
    }

    /**
     * Get a visual-to-logical index map (array) for the characters in the
     * <code>Bidi</code> (paragraph or line) object.
     * <p>
     * Some values in the map may be <code>MAP_NOWHERE</code> if the
     * corresponding text characters are Bidi marks inserted in the visual
     * output by the option <code>OPTION_INSERT_MARKS</code>.
     * <p>
     * When the visual output is altered by using options of
     * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
     * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
     * <code>REMOVE_BIDI_CONTROLS</code>, the logical positions returned may not
     * be correct.
     * It is advised to use, when possible, reordering options
     * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
     *
     * @return an array of <code>getResultLength()</code>
     *        indexes which will reflect the reordering of the characters.<br><br>
     *        The index map will result in
     *        <code>indexMap[visualIndex]==logicalIndex</code>, where
     *        <code>indexMap</code> represents the returned array.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #getLogicalMap
     * @see #getLogicalIndex
     * @see #getResultLength
     * @see #MAP_NOWHERE
     * @see #OPTION_INSERT_MARKS
     * @see #writeReordered
     * @stable ICU 3.8
     */
    private int[] getVisualMap()
    {
        /* countRuns() checks successful call to setPara/setLine */
        countRuns();
        if (resultLength <= 0) {
            return new int[0];
        }
        return BidiLine.getVisualMap(this);
    }

    /**
     * This is a convenience method that does not use a <code>Bidi</code> object.
     * It is intended to be used for when an application has determined the levels
     * of objects (character sequences) and just needs to have them reordered (L2).
     * This is equivalent to using <code>getVisualMap()</code> on a
     * <code>Bidi</code> object.
     *
     * @param levels is an array of levels that have been determined by
     *        the application.
     *
     * @return an array of <code>levels.length</code>
     *        indexes which will reflect the reordering of the characters.<p>
     *        The index map will result in
     *        <code>indexMap[visualIndex]==logicalIndex</code>, where
     *        <code>indexMap</code> represents the returned array.
     *
     * @stable ICU 3.8
     */
    private static int[] reorderVisual(byte[] levels)
    {
        return BidiLine.reorderVisual(levels);
    }

    /**
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode
     * Bidirectional Algorithm. If no strong directional character is present,
     * the base direction is right-to-left.
     * @stable ICU 3.8
     */
    public static final int DIRECTION_DEFAULT_RIGHT_TO_LEFT = LEVEL_DEFAULT_RTL;

    /**
     * Create Bidi from the given text, embedding, and direction information.
     * The embeddings array may be null. If present, the values represent
     * embedding level information. Negative values from -1 to -61 indicate
     * overrides at the absolute value of the level. Positive values from 1 to
     * 61 indicate embeddings. Where values are zero, the base embedding level
     * as determined by the base direction is assumed.<p>
     *
     * Note: this constructor calls setPara() internally.
     *
     * @param text an array containing the paragraph of text to process.
     * @param textStart the index into the text array of the start of the
     *        paragraph.
     * @param embeddings an array containing embedding values for each character
     *        in the paragraph. This can be null, in which case it is assumed
     *        that there is no external embedding information.
     * @param embStart the index into the embedding array of the start of the
     *        paragraph.
     * @param paragraphLength the length of the paragraph in the text and
     *        embeddings arrays.
     * @param flags a collection of flags that control the algorithm. The
     *        algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
     *        DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
     *        DIRECTION_DEFAULT_RIGHT_TO_LEFT.
     *        Other values are reserved.
     *
     * @throws IllegalArgumentException if the values in embeddings are
     *         not within the allowed range
     *
     * @see #DIRECTION_LEFT_TO_RIGHT
     * @see #DIRECTION_RIGHT_TO_LEFT
     * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
     * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
     * @stable ICU 3.8
     */
    public BidiBase(char[] text,
             int textStart,
             byte[] embeddings,
             int embStart,
             int paragraphLength,
             int flags)
    {
        this(0, 0);
        byte paraLvl;
        switch (flags) {
        case Bidi.DIRECTION_LEFT_TO_RIGHT:
        default:
            paraLvl = LTR;
            break;
        case Bidi.DIRECTION_RIGHT_TO_LEFT:
            paraLvl = RTL;
            break;
        case Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT:
            paraLvl = LEVEL_DEFAULT_LTR;
            break;
        case Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT:
            paraLvl = LEVEL_DEFAULT_RTL;
            break;
        }
        byte[] paraEmbeddings;
        if (embeddings == null) {
            paraEmbeddings = null;
        } else {
            paraEmbeddings = new byte[paragraphLength];
            byte lev;
            for (int i = 0; i < paragraphLength; i++) {
                lev = embeddings[i + embStart];
                if (lev < 0) {
                    lev = (byte)((- lev) | LEVEL_OVERRIDE);
                } else if (lev == 0) {
                    lev = paraLvl;
                    if (paraLvl > MAX_EXPLICIT_LEVEL) {
                        lev &= 1;
                    }
                }
                paraEmbeddings[i] = lev;
            }
        }

        char[] paraText = new char[paragraphLength];
        System.arraycopy(text, textStart, paraText, 0, paragraphLength);
        setPara(paraText, paraLvl, paraEmbeddings);
    }

    /**
     * Return true if the line is not left-to-right or right-to-left.
     * This means
     * it either has mixed runs of left-to-right and right-to-left text, or the
     * base direction differs from the direction of the only run of text.
     *
     * @return true if the line is not left-to-right or right-to-left.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     * @stable ICU 3.8
     */
    public boolean isMixed()
    {
        return (!isLeftToRight() && !isRightToLeft());
    }

    /**
     * Return true if the line is all left-to-right text and the base direction
     * is left-to-right.
     *
     * @return true if the line is all left-to-right text and the base direction
     *         is left-to-right.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     * @stable ICU 3.8
     */
    public boolean isLeftToRight()
    {
        return (getDirection() == LTR && (paraLevel & 1) == 0);
    }

    /**
     * Return true if the line is all right-to-left text, and the base direction
     * is right-to-left
     *
     * @return true if the line is all right-to-left text, and the base
     *         direction is right-to-left
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     * @stable ICU 3.8
     */
    public boolean isRightToLeft()
    {
        return (getDirection() == RTL && (paraLevel & 1) == 1);
    }

    /**
     * Return true if the base direction is left-to-right
     *
     * @return true if the base direction is left-to-right
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @stable ICU 3.8
     */
    public boolean baseIsLeftToRight()
    {
        return (getParaLevel() == LTR);
    }

    /**
     * Return the base level (0 if left-to-right, 1 if right-to-left).
     *
     * @return the base level
     *
     * @throws IllegalStateException if this call is not preceded by a successful call
     *         to <code>setPara</code> or <code>setLine</code>
     *
     * @stable ICU 3.8
     */
    public int getBaseLevel()
    {
        return getParaLevel();
    }

    /**
     * Compute the logical to visual run mapping
     */
    void getLogicalToVisualRunsMap()
    {
        if (isGoodLogicalToVisualRunsMap) {
            return;
        }
        int count = countRuns();
        if ((logicalToVisualRunsMap == null) ||
            (logicalToVisualRunsMap.length < count)) {
            logicalToVisualRunsMap = new int[count];
        }
        int i;
        long[] keys = new long[count];
        for (i = 0; i < count; i++) {
            keys[i] = ((long)(runs[i].start)<<32) + i;
        }
        Arrays.sort(keys);
        for (i = 0; i < count; i++) {
            logicalToVisualRunsMap[i] = (int)(keys[i] & 0x00000000FFFFFFFF);
        }
        isGoodLogicalToVisualRunsMap = true;
    }

    /**
     * Return the level of the nth logical run in this line.
     *
     * @param run the index of the run, between 0 and <code>countRuns()-1</code>
     *
     * @return the level of the run
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>run</code> is not in
     *         the range <code>0<=run<countRuns()</code>
     * @stable ICU 3.8
     */
    public int getRunLevel(int run)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);

        // for backward compatibility
        if (run < 0 || run >= runCount) {
            return getParaLevel();
        }

        getLogicalToVisualRunsMap();
        return runs[logicalToVisualRunsMap[run]].level;
    }

    /**
     * Return the index of the character at the start of the nth logical run in
     * this line, as an offset from the start of the line.
     *
     * @param run the index of the run, between 0 and <code>countRuns()</code>
     *
     * @return the start of the run
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if
     *         <code>run</code> is not in
     *         the range <code>0<=run<countRuns()</code>
     * @stable ICU 3.8
     */
    public int getRunStart(int run)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);

        // for backward compatibility
        if (runCount == 1) {
            return 0;
        } else if (run == runCount) {
            return length;
        }

        getLogicalToVisualRunsMap();
        return runs[logicalToVisualRunsMap[run]].start;
    }

    /**
     * Return the index of the character past the end of the nth logical run in
     * this line, as an offset from the start of the line. For example, this
     * will return the length of the line for the last run on the line.
     *
     * @param run the index of the run, between 0 and <code>countRuns()</code>
     *
     * @return the limit of the run
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>run</code> is not in
     *         the range <code>0<=run<countRuns()</code>
     * @stable ICU 3.8
     */
    public int getRunLimit(int run)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);

        // for backward compatibility
        if (runCount == 1) {
            return length;
        }

        getLogicalToVisualRunsMap();
        int idx = logicalToVisualRunsMap[run];
        int len = idx == 0 ? runs[idx].limit :
                             runs[idx].limit - runs[idx-1].limit;
        return runs[idx].start + len;
    }

    /**
     * Return true if the specified text requires bidi analysis. If this returns
     * false, the text will display left-to-right. Clients can then avoid
     * constructing a Bidi object.
     * Text in the Arabic Presentation Forms area of
     * Unicode is presumed to already be shaped and ordered for display, and so
     * will not cause this method to return true.
     *
     * @param text the text containing the characters to test
     * @param start the start of the range of characters to test
     * @param limit the limit of the range of characters to test
     *
     * @return true if the range of characters requires bidi analysis
     *
     * @stable ICU 3.8
     */
    public static boolean requiresBidi(char[] text,
            int start,
            int limit)
    {
        final int RTLMask = (1 << R |
                             1 << AL |
                             1 << RLE |
                             1 << RLO |
                             1 << AN);

        if (0 > start || start > limit || limit > text.length) {
            throw new IllegalArgumentException("Value start " + start +
                      " is out of range 0 to " + limit + ", or limit " + limit +
                      " is beyond the text length " + text.length);
        }

        for (int i = start; i < limit; ++i) {
            if (Character.isHighSurrogate(text[i]) && i < (limit-1) &&
                Character.isLowSurrogate(text[i+1])) {
                if (((1 << UCharacter.getDirection(Character.codePointAt(text, i))) & RTLMask) != 0) {
                    return true;
                }
            } else if (((1 << UCharacter.getDirection(text[i])) & RTLMask) != 0) {
                return true;
            }
        }

        return false;
    }

    /**
     * Reorder the objects in the array into visual order based on their levels.
     * This is a utility method to use when you have a collection of objects
     * representing runs of text in logical order, each run containing text at a
     * single level.
     * The elements at <code>index</code> from
     * <code>objectStart</code> up to <code>objectStart + count</code> in the
     * objects array will be reordered into visual order assuming
     * each run of text has the level indicated by the corresponding element in
     * the levels array (at <code>index - objectStart + levelStart</code>).
     *
     * @param levels an array representing the bidi level of each object
     * @param levelStart the start position in the levels array
     * @param objects the array of objects to be reordered into visual order
     * @param objectStart the start position in the objects array
     * @param count the number of objects to reorder
     * @stable ICU 3.8
     */
    public static void reorderVisually(byte[] levels,
            int levelStart,
            Object[] objects,
            int objectStart,
            int count)
    {
        // for backward compatibility
        if (0 > levelStart || levels.length <= levelStart) {
            throw new IllegalArgumentException("Value levelStart " +
                      levelStart + " is out of range 0 to " +
                      (levels.length-1));
        }
        if (0 > objectStart || objects.length <= objectStart) {
            throw new IllegalArgumentException("Value objectStart " +
                      objectStart + " is out of range 0 to " +
                      (objects.length-1));
        }
        if (0 > count || objects.length < (objectStart+count)) {
            throw new IllegalArgumentException("Value count " +
                      count + " is less than zero, or objectStart + count" +
                      " is beyond objects length " + objects.length);
        }

        byte[] reorderLevels = new byte[count];
        System.arraycopy(levels, levelStart, reorderLevels, 0, count);
        int[] indexMap = reorderVisual(reorderLevels);
        Object[] temp = new Object[count];
        System.arraycopy(objects, objectStart, temp, 0, count);
        for (int i = 0; i < count; ++i) {
            objects[objectStart + i] = temp[indexMap[i]];
        }
    }

    /**
     * Take a <code>Bidi</code> object containing the reordering
     * information for a piece of text (one or more paragraphs) set by
     * <code>setPara()</code> or for a
     * line of text set by <code>setLine()</code>
     * and return a string containing the reordered text.
     *
     * <p>The text may have been aliased (only a reference was stored
     * without copying the contents), thus it must not have been modified
     * since the <code>setPara()</code> call.</p>
     *
     * This method preserves the integrity of characters with multiple
     * code units and (optionally) combining characters.
     * Characters in RTL runs can be replaced by mirror-image characters
     * in the returned string. Note that "real" mirroring has to be done in a
     * rendering engine by glyph selection and that for many "mirrored"
     * characters there are no Unicode characters as mirror-image equivalents.
     * There are also options to insert or remove Bidi control
     * characters; see the descriptions of the return value and the
     * <code>options</code> parameter, and of the option bit flags.
     *
     * @param options A bit set of options for the reordering that control
     *                how the reordered text is written.
     *                The options include mirroring the characters on a code
     *                point basis and inserting LRM characters, which is used
     *                especially for transforming visually stored text
     *                to logically stored text (although this is still an
     *                imperfect implementation of an "inverse Bidi" algorithm
     *                because it uses the "forward Bidi" algorithm at its core).
     *                The available options are:
     *                <code>DO_MIRRORING</code>,
     *                <code>INSERT_LRM_FOR_NUMERIC</code>,
     *                <code>KEEP_BASE_COMBINING</code>,
     *                <code>OUTPUT_REVERSE</code>,
     *                <code>REMOVE_BIDI_CONTROLS</code>,
     *                <code>STREAMING</code>
     *
     * @return The reordered text.
     *         If the <code>INSERT_LRM_FOR_NUMERIC</code> option is set, then
     *         the length of the returned string could be as large as
     *         <code>getLength()+2*countRuns()</code>.<br>
     *         If the <code>REMOVE_BIDI_CONTROLS</code> option is set, then the
     *         length of the returned string may be less than
     *         <code>getLength()</code>.<br>
     *         If none of these options is set, then the length of the returned
     *         string will be exactly <code>getProcessedLength()</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #DO_MIRRORING
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #KEEP_BASE_COMBINING
     * @see #OUTPUT_REVERSE
     * @see #REMOVE_BIDI_CONTROLS
     * @see #OPTION_STREAMING
     * @see #getProcessedLength
     * @stable ICU 3.8
     */
    public String writeReordered(int options)
    {
        verifyValidParaOrLine();
        if (length == 0) {
            /* nothing to do */
            return "";
        }
        return BidiWriter.writeReordered(this, options);
    }

    /**
     * Display the bidi internal state, used in debugging.
     */
    public String toString() {
        StringBuilder buf = new StringBuilder(getClass().getName());

        buf.append("[dir: ");
        buf.append(direction);
        buf.append(" baselevel: ");
        buf.append(paraLevel);
        buf.append(" length: ");
        buf.append(length);
        buf.append(" runs: ");
        if (levels == null) {
            buf.append("none");
        } else {
            buf.append('[');
            buf.append(levels[0]);
            for (int i = 1; i < levels.length; i++) {
                buf.append(' ');
                buf.append(levels[i]);
            }
            buf.append(']');
        }
        buf.append(" text: [0x");
        buf.append(Integer.toHexString(text[0]));
        for (int i = 1; i < text.length; i++) {
            buf.append(" 0x");
            buf.append(Integer.toHexString(text[i]));
        }
        buf.append("]]");

        return buf.toString();
    }

    /**
     * A class that provides access to constants defined by
     * java.awt.font.TextAttribute without creating a static dependency.
     */
    private static class TextAttributeConstants {
        // Make sure to load the AWT's TextAttribute class before using the constants, if any.
        static {
            try {
                Class.forName("java.awt.font.TextAttribute", true, null);
            } catch (ClassNotFoundException e) {}
        }
        static final
        JavaAWTFontAccess jafa = SharedSecrets.getJavaAWTFontAccess();

        /**
         * TextAttribute instances (or a fake Attribute type if
         * java.awt.font.TextAttribute is not present)
         */
        static final AttributedCharacterIterator.Attribute RUN_DIRECTION =
            getTextAttribute("RUN_DIRECTION");
        static final AttributedCharacterIterator.Attribute NUMERIC_SHAPING =
            getTextAttribute("NUMERIC_SHAPING");
        static final AttributedCharacterIterator.Attribute BIDI_EMBEDDING =
            getTextAttribute("BIDI_EMBEDDING");

        /**
         * TextAttribute.RUN_DIRECTION_LTR
         */
        static final Boolean RUN_DIRECTION_LTR = (jafa == null) ?
            Boolean.FALSE : (Boolean)jafa.getTextAttributeConstant("RUN_DIRECTION_LTR");

        @SuppressWarnings("serial")
        private static AttributedCharacterIterator.Attribute
            getTextAttribute(String name)
        {
            if (jafa == null) {
                // fake attribute
                return new AttributedCharacterIterator.Attribute(name) { };
            } else {
                return (AttributedCharacterIterator.Attribute)jafa.getTextAttributeConstant(name);
            }
        }
    }

    /**
     * A class that provides access to java.awt.font.NumericShaper without
     * creating a static dependency.
     */
    private static class NumericShapings {
        // Make sure to load the AWT's NumericShaper class before calling shape, if any.
        static {
            try {
                Class.forName("java.awt.font.NumericShaper", true, null);
            } catch (ClassNotFoundException e) {}
        }
        static final JavaAWTFontAccess jafa = SharedSecrets.getJavaAWTFontAccess();

        /**
         * Invokes NumericShaping shape(text,start,count) method.
         */
        static void shape(Object shaper, char[] text, int start, int count) {
            if (jafa != null) {
                jafa.shape(shaper, text, start, count);
            }
        }
    }

}