Path: blob/master/src/java.base/share/classes/java/nio/charset/Charset.java
/*
 * Copyright (c) 2000, 2021, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package java.nio.charset;

import jdk.internal.misc.VM;
import sun.nio.cs.ThreadLocalCoders;
import sun.security.action.GetPropertyAction;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.spi.CharsetProvider;
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;


/**
 * A named mapping between sequences of sixteen-bit Unicode <a
 * href="../../lang/Character.html#unicode">code units</a> and sequences of
 * bytes. This class defines methods for creating decoders and encoders and
 * for retrieving the various names associated with a charset. Instances of
 * this class are immutable.
 *
 * <p> This class also defines static methods for testing whether a particular
 * charset is supported, for locating charset instances by name, and for
 * constructing a map that contains every charset for which support is
 * available in the current Java virtual machine. Support for new charsets can
 * be added via the service-provider interface defined in the {@link
 * java.nio.charset.spi.CharsetProvider} class.
 *
 * <p> All of the methods defined in this class are safe for use by multiple
 * concurrent threads.
 *
 *
 * <h2><a id="names">Charset names</a></h2>
 *
 * <p> Charsets are named by strings composed of the following characters:
 *
 * <ul>
 *
 *   <li> The uppercase letters {@code 'A'} through {@code 'Z'}
 *        (<code>'\u0041'</code> through <code>'\u005a'</code>),
 *
 *   <li> The lowercase letters {@code 'a'} through {@code 'z'}
 *        (<code>'\u0061'</code> through <code>'\u007a'</code>),
 *
 *   <li> The digits {@code '0'} through {@code '9'}
 *        (<code>'\u0030'</code> through <code>'\u0039'</code>),
 *
 *   <li> The dash character {@code '-'}
 *        (<code>'\u002d'</code>, <small>HYPHEN-MINUS</small>),
 *
 *   <li> The plus character {@code '+'}
 *        (<code>'\u002b'</code>, <small>PLUS SIGN</small>),
 *
 *   <li> The period character {@code '.'}
 *        (<code>'\u002e'</code>, <small>FULL STOP</small>),
 *
 *   <li> The colon character {@code ':'}
 *        (<code>'\u003a'</code>, <small>COLON</small>), and
 *
 *   <li> The underscore character {@code '_'}
 *        (<code>'\u005f'</code>, <small>LOW LINE</small>).
 *
 * </ul>
 *
 * A charset name must begin with either a letter or a digit. The empty string
 * is not a legal charset name. Charset names are not case-sensitive; that is,
 * case is always ignored when comparing charset names.
 * Charset names generally follow the conventions documented in <a
 * href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC 2278: IANA Charset
 * Registration Procedures</i></a>.
 *
 * <p> Every charset has a <i>canonical name</i> and may also have one or more
 * <i>aliases</i>. The canonical name is returned by the {@link #name() name} method
 * of this class. Canonical names are, by convention, usually in upper case.
 * The aliases of a charset are returned by the {@link #aliases() aliases}
 * method.
 *
 * <p><a id="hn">Some charsets have an <i>historical name</i> that is defined for
 * compatibility with previous versions of the Java platform.</a> A charset's
 * historical name is either its canonical name or one of its aliases. The
 * historical name is returned by the {@code getEncoding()} methods of the
 * {@link java.io.InputStreamReader#getEncoding InputStreamReader} and {@link
 * java.io.OutputStreamWriter#getEncoding OutputStreamWriter} classes.
 *
 * <p><a id="iana"> </a>If a charset listed in the <a
 * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
 * Registry</i></a> is supported by an implementation of the Java platform then
 * its canonical name must be the name listed in the registry. Many charsets
 * are given more than one name in the registry, in which case the registry
 * identifies one of the names as <i>MIME-preferred</i>. If a charset has more
 * than one registry name then its canonical name must be the MIME-preferred
 * name and the other names in the registry must be valid aliases. If a
 * supported charset is not listed in the IANA registry then its canonical name
 * must begin with one of the strings {@code "X-"} or {@code "x-"}.
 *
 * <p> The IANA charset registry does change over time, and so the canonical
 * name and the aliases of a particular charset may also change over time. To
 * ensure compatibility it is recommended that no alias ever be removed from a
 * charset, and that if the canonical name of a charset is changed then its
 * previous canonical name be made into an alias.
 *
 *
 * <h2><a id="standard">Standard charsets</a></h2>
 *
 *
 * <p> Every implementation of the Java platform is required to support the
 * following standard charsets. Consult the release documentation for your
 * implementation to see if any other charsets are supported. The behavior
 * of such optional charsets may differ between implementations.
 *
 * <blockquote><table class="striped" style="width:80%">
 * <caption style="display:none">Description of standard charsets</caption>
 * <thead>
 * <tr><th scope="col" style="text-align:left">Charset</th><th scope="col" style="text-align:left">Description</th></tr>
 * </thead>
 * <tbody>
 * <tr><th scope="row" style="vertical-align:top">{@code US-ASCII}</th>
 *     <td>Seven-bit ASCII, a.k.a. {@code ISO646-US},
 *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 * <tr><th scope="row" style="vertical-align:top"><code>ISO-8859-1 </code></th>
 *     <td>ISO Latin Alphabet No. 1, a.k.a.
 *         {@code ISO-LATIN-1}</td></tr>
 * <tr><th scope="row" style="vertical-align:top">{@code UTF-8}</th>
 *     <td>Eight-bit UCS Transformation Format</td></tr>
 * <tr><th scope="row" style="vertical-align:top">{@code UTF-16BE}</th>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         big-endian byte order</td></tr>
 * <tr><th scope="row" style="vertical-align:top">{@code UTF-16LE}</th>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         little-endian byte order</td></tr>
 * <tr><th scope="row" style="vertical-align:top">{@code UTF-16}</th>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         byte order identified by an optional byte-order mark</td></tr>
 * </tbody>
 * </table></blockquote>
 *
 * <p> The {@code UTF-8} charset is specified by <a
 * href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC 2279</i></a>; the
 * transformation format upon which it is based is specified in
 * Amendment 2 of ISO 10646-1 and is also described in the <a
 * href="http://www.unicode.org/standard/standard.html"><i>Unicode
 * Standard</i></a>.
 *
 * <p> The {@code UTF-16} charsets are specified by <a
 * href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC 2781</i></a>; the
 * transformation formats upon which they are based are specified in
 * Amendment 1 of ISO 10646-1 and are also described in the <a
 * href="http://www.unicode.org/standard/standard.html"><i>Unicode
 * Standard</i></a>.
 *
 * <p> The {@code UTF-16} charsets use sixteen-bit quantities and are
 * therefore sensitive to byte order. In these encodings the byte order of a
 * stream may be indicated by an initial <i>byte-order mark</i> represented by
 * the Unicode character <code>'\uFEFF'</code>. Byte-order marks are handled
 * as follows:
 *
 * <ul>
 *
 *   <li><p> When decoding, the {@code UTF-16BE} and {@code UTF-16LE}
 *   charsets interpret the initial byte-order marks as a <small>ZERO-WIDTH
 *   NON-BREAKING SPACE</small>; when encoding, they do not write
 *   byte-order marks. </p></li>
 *
 *   <li><p> When decoding, the {@code UTF-16} charset interprets the
 *   byte-order mark at the beginning of the input stream to indicate the
 *   byte-order of the stream but defaults to big-endian if there is no
 *   byte-order mark; when encoding, it uses big-endian byte order and writes
 *   a big-endian byte-order mark. </p></li>
 *
 * </ul>
 *
 * In any case, byte order marks occurring after the first element of an
 * input sequence are not omitted since the same code is used to represent
 * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
 *
 * <p> Every instance of the Java virtual machine has a default charset, which
 * may or may not be one of the standard charsets. The default charset is
 * determined during virtual-machine startup and typically depends upon the
 * locale and charset being used by the underlying operating system. </p>
 *
 * <p> The {@link StandardCharsets} class defines constants for each of the
 * standard charsets.
 *
 * <h2>Terminology</h2>
 *
 * <p> The name of this class is taken from the terms used in
 * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC 2278</i></a>.
 * In that document a <i>charset</i> is defined as the combination of
 * one or more coded character sets and a character-encoding scheme.
 * (This definition is confusing; some other software systems define
 * <i>charset</i> as a synonym for <i>coded character set</i>.)
 *
 * <p> A <i>coded character set</i> is a mapping between a set of abstract
 * characters and a set of integers. US-ASCII, ISO 8859-1,
 * JIS X 0201, and Unicode are examples of coded character sets.
 *
 * <p> Some standards have defined a <i>character set</i> to be simply a
 * set of abstract characters without an associated assigned numbering.
 * An alphabet is an example of such a character set.
 * However, the subtle distinction between <i>character set</i> and
 * <i>coded character set</i> is rarely used in practice; the former has
 * become a short form for the latter, including in the Java API
 * specification.
 *
 * <p> A <i>character-encoding scheme</i> is a mapping between one or more
 * coded character sets and a set of octet (eight-bit byte) sequences.
 * UTF-8, UTF-16, ISO 2022, and EUC are examples of
 * character-encoding schemes. Encoding schemes are often associated with
 * a particular coded character set; UTF-8, for example, is used only to
 * encode Unicode. Some schemes, however, are associated with multiple
 * coded character sets; EUC, for example, can be used to encode
 * characters in a variety of Asian coded character sets.
 *
 * <p> When a coded character set is used exclusively with a single
 * character-encoding scheme then the corresponding charset is usually
 * named for the coded character set; otherwise a charset is usually named
 * for the encoding scheme and, possibly, the locale of the coded
 * character sets that it supports. Hence {@code US-ASCII} is both the
 * name of a coded character set and of the charset that encodes it, while
 * {@code EUC-JP} is the name of the charset that encodes the
 * JIS X 0201, JIS X 0208, and JIS X 0212
 * coded character sets for the Japanese language.
 *
 * <p> The native character encoding of the Java programming language is
 * UTF-16. A charset in the Java platform therefore defines a mapping
 * between sequences of sixteen-bit UTF-16 code units (that is, sequences
 * of chars) and sequences of bytes. </p>
 *
 *
 * @author Mark Reinhold
 * @author JSR-51 Expert Group
 * @since 1.4
 *
 * @see CharsetDecoder
 * @see CharsetEncoder
 * @see java.nio.charset.spi.CharsetProvider
 * @see java.lang.Character
 */

public abstract class Charset
    implements Comparable<Charset>
{

    /* -- Static methods -- */

    /**
     * Checks that the given string is a legal charset name. </p>
     *
     * @param  s
     *         A purported charset name
     *
     * @throws  IllegalCharsetNameException
     *          If the given name is not a legal charset name
     */
    private static void checkName(String s) {
        int n = s.length();
        if (n == 0) {
            throw new IllegalCharsetNameException(s);
        }
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') continue;
            if (c >= 'a' && c <= 'z') continue;
            if (c >= '0' && c <= '9') continue;
            if (c == '-' && i != 0) continue;
            if (c == '+' && i != 0) continue;
            if (c == ':' && i != 0) continue;
            if (c == '_' && i != 0) continue;
            if (c == '.' && i != 0) continue;
            throw new IllegalCharsetNameException(s);
        }
    }

    /* The standard set of charsets */
    private static final CharsetProvider standardProvider
        = new sun.nio.cs.StandardCharsets();

    private static final String[] zeroAliases = new String[0];

    // Cache of the most-recently-returned charsets,
    // along with the names that were used to find them
    //
    private static volatile Object[] cache1; // "Level 1" cache
    private static volatile Object[] cache2; // "Level 2" cache

    private static void cache(String charsetName, Charset cs) {
        cache2 = cache1;
        cache1 = new Object[] { charsetName, cs };
    }

    // Creates an iterator that walks over the available providers, ignoring
    // those whose lookup or instantiation causes a security exception to be
    // thrown.
    // Should be invoked with full privileges.
    //
    private static Iterator<CharsetProvider> providers() {
        return new Iterator<>() {
            ClassLoader cl = ClassLoader.getSystemClassLoader();
            ServiceLoader<CharsetProvider> sl =
                ServiceLoader.load(CharsetProvider.class, cl);
            Iterator<CharsetProvider> i = sl.iterator();
            CharsetProvider next = null;

            private boolean getNext() {
                while (next == null) {
                    try {
                        if (!i.hasNext())
                            return false;
                        next = i.next();
                    } catch (ServiceConfigurationError sce) {
                        if (sce.getCause() instanceof SecurityException) {
                            // Ignore security exceptions
                            continue;
                        }
                        throw sce;
                    }
                }
                return true;
            }

            public boolean hasNext() {
                return getNext();
            }

            public CharsetProvider next() {
                if (!getNext())
                    throw new NoSuchElementException();
                CharsetProvider n = next;
                next = null;
                return n;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }

        };
    }

    // Thread-local gate to prevent recursive provider lookups
    private static ThreadLocal<ThreadLocal<?>> gate =
            new ThreadLocal<ThreadLocal<?>>();

    @SuppressWarnings("removal")
    private static Charset lookupViaProviders(final String charsetName) {

        // The runtime startup sequence looks up standard charsets as a
        // consequence of the VM's invocation of System.initializeSystemClass
        // in order to, e.g., set system properties and encode filenames. At
        // that point the application class loader has not been initialized,
        // however, so we can't look for providers because doing so will cause
        // that loader to be prematurely initialized with incomplete
        // information.
        //
        if (!VM.isBooted())
            return null;

        if (gate.get() != null)
            // Avoid recursive provider lookups
            return null;
        try {
            gate.set(gate);

            return AccessController.doPrivileged(
                new PrivilegedAction<>() {
                    public Charset run() {
                        for (Iterator<CharsetProvider> i = providers();
                             i.hasNext();) {
                            CharsetProvider cp = i.next();
                            Charset cs = cp.charsetForName(charsetName);
                            if (cs != null)
                                return cs;
                        }
                        return null;
                    }
                });

        } finally {
            gate.set(null);
        }
    }

    /* The extended set of charsets */
    private static class ExtendedProviderHolder {
        static final CharsetProvider[] extendedProviders = extendedProviders();
        // returns ExtendedProvider, if installed
        @SuppressWarnings("removal")
        private static CharsetProvider[] extendedProviders() {
            return AccessController.doPrivileged(new PrivilegedAction<>() {
                    public CharsetProvider[] run() {
                        CharsetProvider[] cps = new CharsetProvider[1];
                        int n = 0;
                        ServiceLoader<CharsetProvider> sl =
                            ServiceLoader.loadInstalled(CharsetProvider.class);
                        for (CharsetProvider cp : sl) {
                            if (n + 1 > cps.length) {
                                cps = Arrays.copyOf(cps, cps.length << 1);
                            }
                            cps[n++] = cp;
                        }
                        return n == cps.length ? cps : Arrays.copyOf(cps, n);
                    }});
        }
    }

    private static Charset lookupExtendedCharset(String charsetName) {
        if (!VM.isBooted())  // see lookupViaProviders()
            return null;
        CharsetProvider[] ecps = ExtendedProviderHolder.extendedProviders;
        for (CharsetProvider cp : ecps) {
            Charset cs = cp.charsetForName(charsetName);
            if (cs != null)
                return cs;
        }
        return null;
    }

    private static Charset lookup(String charsetName) {
        if (charsetName == null)
            throw new IllegalArgumentException("Null charset name");
        Object[] a;
        if ((a = cache1) != null && charsetName.equals(a[0]))
            return (Charset)a[1];
        // We expect most programs to use one Charset repeatedly.
        // We convey a hint to this effect to the VM by putting the
        // level 1 cache miss code in a separate method.
        return lookup2(charsetName);
    }

    private static Charset lookup2(String charsetName) {
        Object[] a;
        if ((a = cache2) != null && charsetName.equals(a[0])) {
            cache2 = cache1;
            cache1 = a;
            return (Charset)a[1];
        }
        Charset cs;
        if ((cs = standardProvider.charsetForName(charsetName)) != null ||
            (cs = lookupExtendedCharset(charsetName)) != null ||
            (cs = lookupViaProviders(charsetName)) != null)
        {
            cache(charsetName, cs);
            return cs;
        }

        /* Only need to check the name if we didn't find a charset for it */
        checkName(charsetName);
        return null;
    }

    /**
     * Tells whether the named charset is supported.
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  {@code true} if, and only if, support for the named charset
     *          is available in the current Java virtual machine
     *
     * @throws IllegalCharsetNameException
     *         If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given {@code charsetName} is null
     */
    public static boolean isSupported(String charsetName) {
        return (lookup(charsetName) != null);
    }

    /**
     * Returns a charset object for
     * the named charset.
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  A charset object for the named charset
     *
     * @throws  IllegalCharsetNameException
     *          If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given {@code charsetName} is null
     *
     * @throws  UnsupportedCharsetException
     *          If no support for the named charset is available
     *          in this instance of the Java virtual machine
     */
    public static Charset forName(String charsetName) {
        Charset cs = lookup(charsetName);
        if (cs != null)
            return cs;
        throw new UnsupportedCharsetException(charsetName);
    }

    // Fold charsets from the given iterator into the given map, ignoring
    // charsets whose names already have entries in the map.
    //
    private static void put(Iterator<Charset> i, Map<String,Charset> m) {
        while (i.hasNext()) {
            Charset cs = i.next();
            if (!m.containsKey(cs.name()))
                m.put(cs.name(), cs);
        }
    }

    /**
     * Constructs a sorted map from canonical charset names to charset objects.
     *
     * <p> The map returned by this method will have one entry for each charset
     * for which support is available in the current Java virtual machine. If
     * two or more supported charsets have the same canonical name then the
     * resulting map will contain just one of them; which one it will contain
     * is not specified. </p>
     *
     * <p> The invocation of this method, and the subsequent use of the
     * resulting map, may cause time-consuming disk or network I/O operations
     * to occur. This method is provided for applications that need to
     * enumerate all of the available charsets, for example to allow user
     * charset selection. This method is not used by the {@link #forName
     * forName} method, which instead employs an efficient incremental lookup
     * algorithm.
     *
     * <p> This method may return different results at different times if new
     * charset providers are dynamically made available to the current Java
     * virtual machine. In the absence of such changes, the charsets returned
     * by this method are exactly those that can be retrieved via the {@link
     * #forName forName} method. </p>
     *
     * @return An immutable, case-insensitive map from canonical charset names
     *         to charset objects
     */
    @SuppressWarnings("removal")
    public static SortedMap<String,Charset> availableCharsets() {
        return AccessController.doPrivileged(
            new PrivilegedAction<>() {
                public SortedMap<String,Charset> run() {
                    TreeMap<String,Charset> m =
                        new TreeMap<>(
                            String.CASE_INSENSITIVE_ORDER);
                    put(standardProvider.charsets(), m);
                    CharsetProvider[] ecps = ExtendedProviderHolder.extendedProviders;
                    for (CharsetProvider ecp : ecps) {
                        put(ecp.charsets(), m);
                    }
                    for (Iterator<CharsetProvider> i = providers(); i.hasNext();) {
                        CharsetProvider cp = i.next();
                        put(cp.charsets(), m);
                    }
                    return Collections.unmodifiableSortedMap(m);
                }
            });
    }

    private static volatile Charset defaultCharset;

    /**
     * Returns the default charset of this Java virtual machine.
     *
     * <p> The default charset is determined during virtual-machine startup and
     * typically depends upon the locale and charset of the underlying
     * operating system.
     *
     * @return  A charset object for the default charset
     *
     * @since 1.5
     */
    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                String csn = GetPropertyAction
                        .privilegedGetProperty("file.encoding");
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = sun.nio.cs.UTF_8.INSTANCE;
            }
        }
        return defaultCharset;
    }


    /* -- Instance fields and methods -- */

    private final String name;          // tickles a bug in oldjavac
    private final String[] aliases;     // tickles a bug in oldjavac
    private Set<String> aliasSet = null;

    /**
     * Initializes a new charset with the given canonical name and alias
     * set.
     *
     * @param  canonicalName
     *         The canonical name of this charset
     *
     * @param  aliases
     *         An array of this charset's aliases, or null if it has no aliases
     *
     * @throws IllegalCharsetNameException
     *         If the canonical name or any of the aliases are illegal
     */
    protected Charset(String canonicalName, String[] aliases) {
        String[] as = Objects.requireNonNullElse(aliases, zeroAliases);

        // Skip checks for the standard, built-in Charsets we always load
        // during initialization.
        if (canonicalName != "ISO-8859-1"
                && canonicalName != "US-ASCII"
                && canonicalName != "UTF-8") {
            checkName(canonicalName);
            for (int i = 0; i < as.length; i++) {
                checkName(as[i]);
            }
        }
        this.name = canonicalName;
        this.aliases = as;
    }

    /**
     * Returns this charset's canonical name.
     *
     * @return  The canonical name of this charset
     */
    public final String name() {
        return name;
    }

    /**
     * Returns a set containing this charset's aliases.
     *
     * @return  An immutable set of this charset's aliases
     */
    public final Set<String> aliases() {
        if (aliasSet != null)
            return aliasSet;
        int n = aliases.length;
        HashSet<String> hs = new HashSet<>(n);
        for (int i = 0; i < n; i++)
            hs.add(aliases[i]);
        aliasSet = Collections.unmodifiableSet(hs);
        return aliasSet;
    }

    /**
     * Returns this charset's human-readable name for the default locale.
     *
     * <p> The default implementation of this method simply returns this
     * charset's canonical name. Concrete subclasses of this class may
     * override this method in order to provide a localized display name. </p>
     *
     * @return  The display name of this charset in the default locale
     */
    public String displayName() {
        return name;
    }

    /**
     * Tells whether or not this charset is registered in the <a
     * href="http://www.iana.org/assignments/character-sets">IANA Charset
     * Registry</a>.
     *
     * @return  {@code true} if, and only if, this charset is known by its
     *          implementor to be registered with the IANA
     */
    public final boolean isRegistered() {
        return !name.startsWith("X-") && !name.startsWith("x-");
    }

    /**
     * Returns this charset's human-readable name for the given locale.
     *
     * <p> The default implementation of this method simply returns this
     * charset's canonical name. Concrete subclasses of this class may
     * override this method in order to provide a localized display name. </p>
     *
     * @param  locale
     *         The locale for which the display name is to be retrieved
     *
     * @return  The display name of this charset in the given locale
     */
    public String displayName(Locale locale) {
        return name;
    }

    /**
     * Tells whether or not this charset contains the given charset.
     *
     * <p> A charset <i>C</i> is said to <i>contain</i> a charset <i>D</i> if,
     * and only if, every character representable in <i>D</i> is also
     * representable in <i>C</i>.
     * If this relationship holds then it is
     * guaranteed that every string that can be encoded in <i>D</i> can also be
     * encoded in <i>C</i> without performing any replacements.
     *
     * <p> That <i>C</i> contains <i>D</i> does not imply that each character
     * representable in <i>C</i> by a particular byte sequence is represented
     * in <i>D</i> by the same byte sequence, although sometimes this is the
     * case.
     *
     * <p> Every charset contains itself.
     *
     * <p> This method computes an approximation of the containment relation:
     * If it returns {@code true} then the given charset is known to be
     * contained by this charset; if it returns {@code false}, however, then
     * it is not necessarily the case that the given charset is not contained
     * in this charset.
     *
     * @param   cs
     *          The given charset
     *
     * @return  {@code true} if the given charset is contained in this charset
     */
    public abstract boolean contains(Charset cs);

    /**
     * Constructs a new decoder for this charset.
     *
     * @return  A new decoder for this charset
     */
    public abstract CharsetDecoder newDecoder();

    /**
     * Constructs a new encoder for this charset.
     *
     * @return  A new encoder for this charset
     *
     * @throws  UnsupportedOperationException
     *          If this charset does not support encoding
     */
    public abstract CharsetEncoder newEncoder();

    /**
     * Tells whether or not this charset supports encoding.
     *
     * <p> Nearly all charsets support encoding. The primary exceptions are
     * special-purpose <i>auto-detect</i> charsets whose decoders can determine
     * which of several possible encoding schemes is in use by examining the
     * input byte sequence. Such charsets do not support encoding because
     * there is no way to determine which encoding should be used on output.
     * Implementations of such charsets should override this method to return
     * {@code false}. </p>
     *
     * @return  {@code true} if, and only if, this charset supports encoding
     */
    public boolean canEncode() {
        return true;
    }

    /**
     * Convenience method that decodes bytes in this charset into Unicode
     * characters.
     *
     * <p> An invocation of this method upon a charset {@code cs} returns the
     * same result as the expression
     *
     * <pre>
     *     cs.newDecoder()
     *       .onMalformedInput(CodingErrorAction.REPLACE)
     *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
     *       .decode(bb); </pre>
     *
     * except that it is potentially more efficient because it can cache
     * decoders between successive invocations.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string. In order
     * to detect such sequences, use the {@link
     * CharsetDecoder#decode(java.nio.ByteBuffer)} method directly. </p>
     *
     * @param  bb  The byte buffer to be decoded
     *
     * @return  A char buffer containing the decoded characters
     */
    public final CharBuffer decode(ByteBuffer bb) {
        try {
            return ThreadLocalCoders.decoderFor(this)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(bb);
        } catch (CharacterCodingException x) {
            throw new Error(x);         // Can't happen
        }
    }

    /**
     * Convenience method that encodes Unicode characters into bytes in this
     * charset.
     *
     * <p> An invocation of this method upon a charset {@code cs} returns the
     * same result as the expression
     *
     * <pre>
     *     cs.newEncoder()
     *       .onMalformedInput(CodingErrorAction.REPLACE)
     *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
     *       .encode(cb); </pre>
     *
     * except that it is potentially more efficient because it can cache
     * encoders between successive invocations.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement byte array.
     * In order to detect such sequences, use the {@link
     * CharsetEncoder#encode(java.nio.CharBuffer)} method directly. </p>
     *
     * @param  cb  The char buffer to be encoded
     *
     * @return  A byte buffer containing the encoded characters
     */
    public final ByteBuffer encode(CharBuffer cb) {
        try {
            return ThreadLocalCoders.encoderFor(this)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .encode(cb);
        } catch (CharacterCodingException x) {
            throw new Error(x);         // Can't happen
        }
    }

    /**
     * Convenience method that encodes a string into bytes in this charset.
     *
     * <p> An invocation of this method upon a charset {@code cs} returns the
     * same result as the expression
     *
     * <pre>
     *     cs.encode(CharBuffer.wrap(str)); </pre>
     *
     * @param  str  The string to be encoded
     *
     * @return  A byte buffer containing the encoded characters
     */
    public final ByteBuffer encode(String str) {
        return encode(CharBuffer.wrap(str));
    }

    /**
     * Compares this charset to another.
     *
     * <p> Charsets are ordered by their canonical names, without regard to
     * case. </p>
     *
     * @param  that
     *         The charset to which this charset is to be compared
     *
     * @return A negative integer, zero, or a positive integer as this charset
     *         is less than, equal to, or greater than the specified charset
     */
    public final int compareTo(Charset that) {
        return (name().compareToIgnoreCase(that.name()));
    }

    /**
     * Computes a hashcode for this charset.
     *
     * @return  An integer hashcode
     */
    public final int hashCode() {
        return name().hashCode();
    }

    /**
     * Tells whether or not this object is equal to another.
     *
     * <p> Two charsets are equal if, and only if, they have the same canonical
     * names. A charset is never equal to any other type of object. </p>
     *
     * @return  {@code true} if, and only if, this charset is equal to the
     *          given object
     */
    public final boolean equals(Object ob) {
        if (!(ob instanceof Charset))
            return false;
        if (this == ob)
            return true;
        return name.equals(((Charset)ob).name());
    }

    /**
     * Returns a string describing this charset.
     *
     * @return  A string describing this charset
     */
    public final String toString() {
        return name();
    }

}
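
Usage sketch (not part of Charset.java): the class documentation describes case-insensitive name lookup, the replacing convenience methods `encode`/`decode`, and the stricter path through `newEncoder()`. The following is a minimal illustration of those three points from the caller's side. The class name `CharsetUsageSketch` and its helper methods are hypothetical names chosen for this example; only the `java.nio.charset` calls come from the public API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
public class CharsetUsageSketch {

    // Encodes and then decodes text with the convenience methods.
    // Unmappable characters are silently replaced, so the result
    // may differ from the input.
    public static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName); // lookup ignores case
        ByteBuffer bytes = cs.encode(text);
        return cs.decode(bytes).toString();
    }

    // Returns true only if every character of text can be encoded in cs.
    // Uses a fresh encoder configured to REPORT instead of REPLACE.
    public static boolean encodesCleanly(String text, Charset cs) {
        try {
            cs.newEncoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // contains U+00E9, outside US-ASCII

        // Canonical names are compared without regard to case.
        System.out.println(Charset.forName("utf-8").name());

        // ISO-8859-1 can represent U+00E9; US-ASCII cannot.
        System.out.println(roundTrip(text, "ISO-8859-1").equals(text));
        System.out.println(roundTrip(text, "US-ASCII"));

        // The strict path detects what the convenience path hides.
        System.out.println(encodesCleanly(text, StandardCharsets.US_ASCII));
        System.out.println(encodesCleanly(text, StandardCharsets.UTF_8));
    }
}
```

The design point is the split between the two paths: `Charset.encode` and `Charset.decode` never fail on bad input, so code that must detect lossy conversion needs its own `CharsetEncoder` or `CharsetDecoder` with `CodingErrorAction.REPORT`.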