
Unicode alias names and abbreviations

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Names and aliases of Unicode characters

In Unicode, a character can have a unique name. A character can also have one or more alias names. An alias name can be an abbreviation, a C0 or C1 control name, a correction, an alternate name, or a figment. Like a name, an alias is unique across all names and aliases, and therefore identifies exactly one character.

Background

The formal, primary Unicode name is unique among all names, uses only a restricted set of characters and a fixed format, and is guaranteed never to change. It consists only of the characters A–Z (uppercase), 0–9, " " (space), and "-" (hyphen). In addition to this name, a character can have one or more formal (normative) alias names. An alias name follows the same rules as a name: it uses only A–Z, 0–9, space, and hyphen, and never lowercase letters (a–z) or symbols such as % or $. Alias names are also unique across the full name set: taken together, all names and alias names form a single set in which no entry is repeated. Alias names are formally defined in the Unicode Standard. In this sense, an abbreviation alias is also considered a Unicode name.
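The character repertoire described above can be checked mechanically. The following is a minimal sketch, assuming only the rule stated here (A–Z, 0–9, space, hyphen); the Unicode Standard's full rules on name format are stricter, so this is an approximation rather than a conformance test. The function name `uses_name_characters` is hypothetical.

```python
import re

# Simplified rule: only uppercase A-Z, digits 0-9, space and hyphen.
# Assumption: this checks the character repertoire only; the Unicode
# Standard's full rules on name format are stricter than this.
_NAME_CHARS = re.compile(r"[A-Z0-9 -]+")

def uses_name_characters(name: str) -> bool:
    """Return True if `name` is non-empty and uses only name characters."""
    return _NAME_CHARS.fullmatch(name) is not None

if __name__ == "__main__":
    for candidate in ("NO-BREAK SPACE", "NBSP", "VS17", "no-break space", "100%"):
        print(f"{candidate!r:20} {uses_name_characters(candidate)}")
```

Both the primary name NO-BREAK SPACE and its alias NBSP pass, while the lowercase spelling and a string containing % are rejected.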

Reason to add an alias

There are five possible reasons to assign an alias name to a code point. A character can have multiple aliases: for example U+0008 <control-0008> has control alias BACKSPACE and abbreviation alias BS.
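Python's standard `unicodedata` module can illustrate the U+0008 example. To our understanding, since Python 3.3 `unicodedata.lookup()` also accepts formal aliases, while `unicodedata.name()` only ever returns a primary name and finds none for control characters. The set of aliases available depends on the Unicode version bundled with the interpreter. The helpers `resolve` and `name_or_label` are hypothetical, written for this sketch.

```python
import unicodedata

def resolve(name_or_alias: str) -> str:
    """Return "U+XXXX" for a primary name or formal alias (KeyError if unknown)."""
    return f"U+{ord(unicodedata.lookup(name_or_alias)):04X}"

def name_or_label(ch: str) -> str:
    """Return the primary name, or a <control-XXXX> label for control characters."""
    name = unicodedata.name(ch, None)  # None when the code point has no name
    if name is not None:
        return name
    if unicodedata.category(ch) == "Cc":  # C0/C1 controls carry no primary name
        return f"<control-{ord(ch):04X}>"
    raise ValueError(f"no name or control label for U+{ord(ch):04X}")

for alias in ("BACKSPACE", "BS"):
    # Both aliases of U+0008 identify the same code point.
    print(alias, "->", resolve(alias), name_or_label(unicodedata.lookup(alias)))
```

Both lines should report U+0008 with the label <control-0008>, since the control alias and the abbreviation alias resolve to the same unnamed control character.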

1. Abbreviation
Commonly occurring abbreviations (or acronyms) for control codes, format characters, spaces, and variation selectors.
There are 354 such aliases, including 256 aliases for variation selectors (VS1 ... VS256).
For example, U+00A0 NO-BREAK SPACE has alias NBSP.
Presentation: in the code charts, the abbreviation is shown in a dashed box: NBSP.
2. Control
ISO 6429 names for C0 and C1 control functions, and similar commonly occurring names, are added as aliases to the character.
There are 84 such aliases.
For example, U+0008 <control-0008> has alias BACKSPACE.
Presentation: control characters have no primary name; instead they carry a code point label such as <control-0008>. Their alias names, such as BACKSPACE, are used in the chart documentation but never as a primary name. This prevents unintended (automated) replacement of a name by the actual, disruptive control character. For example, if the name BEL appeared inline and were replaced by U+0007 <control-0007>, it would trigger the bell sound.
3. Correction
A correction of a "serious problem" in the primary character name, usually an error such as a misspelling.
There are 35 such aliases.
For example, U+2118 ℘ SCRIPT CAPITAL P is given the alias ※ WEIERSTRASS ELLIPTIC FUNCTION because, as the code chart notes, "actually this has the form of a lowercase calligraphic p, despite its name". The alias supplies the correct name.
Presentation: A corrected name is preceded by symbol ※ (the reference mark).
4. Alternate
A widely used alternate name for a character.
There is one such alias.
Example: U+FEFF ZERO WIDTH NO-BREAK SPACE has alternate BYTE ORDER MARK.
Presentation: listed in the character's description in the code charts.
5. Figment
Documented labels for C1 control code points that were never actually approved in any standard ("figment" in the sense of something feigned or fictional).
There are 3 such aliases.
For example, U+0099 <control-0099> has figment alias SINGLE GRAPHIC CHARACTER INTRODUCER. This name reflects an architectural concept from early drafts of ISO/IEC 10646-1, but it was never approved or standardized.
Presentation: these figment abbreviations are not published in the Standard; the chart informally shows "XXX" for each, which is not a unique or identifying abbreviation.
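The five reasons can be compared side by side with Python's `unicodedata` module. This is a sketch under the assumption that the interpreter's bundled database includes all alias types from the Unicode data file (which, as far as we know, CPython has done since version 3.3); the `EXAMPLES` table and `describe` helper are hypothetical. It also shows that a corrected character keeps its original, erroneous primary name, because names never change.

```python
import unicodedata

# One example alias per reason, taken from the list above.
EXAMPLES = {
    "abbreviation": ("NBSP", 0x00A0),
    "control": ("ALERT", 0x0007),
    "correction": ("WEIERSTRASS ELLIPTIC FUNCTION", 0x2118),
    "alternate": ("BYTE ORDER MARK", 0xFEFF),
    "figment": ("SINGLE GRAPHIC CHARACTER INTRODUCER", 0x0099),
}

def describe(alias: str) -> str:
    """Resolve an alias and report the primary name the character still carries."""
    ch = unicodedata.lookup(alias)  # KeyError if this Python's database lacks it
    # name() never returns aliases: a corrected character keeps its original
    # (erroneous) name, and control characters have no primary name at all.
    primary = unicodedata.name(ch, "<no primary name>")
    return f"{alias} -> U+{ord(ch):04X} {primary}"

for reason, (alias, expected) in EXAMPLES.items():
    assert ord(unicodedata.lookup(alias)) == expected
    print(f"{reason:12} {describe(alias)}")
```

The correction line reports SCRIPT CAPITAL P as the primary name, while the control and figment lines report no primary name.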

List of aliases

Columns: code point; HTML decimal reference; name or <label>; alias (abbreviation and/or full alias name); reason; chart; note.
U+0000 &#0; <control-0000> NUL NULL Control C0 Controls and Basic Latin (pdf)
U+0001 &#1; <control-0001> SOH START OF HEADING Control C0 Controls and Basic Latin (pdf)
U+0002 &#2; <control-0002> STX START OF TEXT Control C0 Controls and Basic Latin (pdf)
U+0003 &#3; <control-0003> ETX END OF TEXT Control C0 Controls and Basic Latin (pdf)
U+0004 &#4; <control-0004> EOT END OF TRANSMISSION Control C0 Controls and Basic Latin (pdf)
U+0005 &#5; <control-0005> ENQ ENQUIRY Control C0 Controls and Basic Latin (pdf)
U+0006 &#6; <control-0006> ACK ACKNOWLEDGE Control C0 Controls and Basic Latin (pdf)
U+0007 &#7; <control-0007> BEL ALERT Control C0 Controls and Basic Latin (pdf)
U+0008 &#8; <control-0008> BS BACKSPACE Control C0 Controls and Basic Latin (pdf)
U+0009 &Tab; &#9; <control-0009> TAB CHARACTER TABULATION Control C0 Controls and Basic Latin (pdf)
HT HORIZONTAL TABULATION Control
U+000A &#10; <control-000A> LF LINE FEED Control C0 Controls and Basic Latin (pdf)
NL NEW LINE Control
EOL END OF LINE Control
U+000B &#11; <control-000B> LINE TABULATION Control C0 Controls and Basic Latin (pdf)
VT VERTICAL TABULATION Control
U+000C &#12; <control-000C> FF FORM FEED Control C0 Controls and Basic Latin (pdf)
U+000D &#13; <control-000D> CR CARRIAGE RETURN Control C0 Controls and Basic Latin (pdf)
U+000E &#14; <control-000E> SO SHIFT OUT Control C0 Controls and Basic Latin (pdf)
LOCKING-SHIFT ONE Control
U+000F &#15; <control-000F> SI SHIFT IN Control C0 Controls and Basic Latin (pdf)
LOCKING-SHIFT ZERO Control
U+0010 &#16; <control-0010> DLE DATA LINK ESCAPE Control C0 Controls and Basic Latin (pdf)
U+0011 &#17; <control-0011> DC1 DEVICE CONTROL ONE Control C0 Controls and Basic Latin (pdf)
U+0012 &#18; <control-0012> DC2 DEVICE CONTROL TWO Control C0 Controls and Basic Latin (pdf)
U+0013 &#19; <control-0013> DC3 DEVICE CONTROL THREE Control C0 Controls and Basic Latin (pdf)
U+0014 &#20; <control-0014> DC4 DEVICE CONTROL FOUR Control C0 Controls and Basic Latin (pdf)
U+0015 &#21; <control-0015> NAK NEGATIVE ACKNOWLEDGE Control C0 Controls and Basic Latin (pdf)
U+0016 &#22; <control-0016> SYN SYNCHRONOUS IDLE Control C0 Controls and Basic Latin (pdf)
U+0017 &#23; <control-0017> ETB END OF TRANSMISSION BLOCK Control C0 Controls and Basic Latin (pdf)
U+0018 &#24; <control-0018> CAN CANCEL Control C0 Controls and Basic Latin (pdf)
U+0019 &#25; <control-0019> EOM END OF MEDIUM Control C0 Controls and Basic Latin (pdf)
EM Abbreviation added in version 15.0
U+001A &#26; <control-001A> SUB SUBSTITUTE Control C0 Controls and Basic Latin (pdf)
U+001B &#27; <control-001B> ESC ESCAPE Control C0 Controls and Basic Latin (pdf)
U+001C &#28; <control-001C> INFORMATION SEPARATOR FOUR Control C0 Controls and Basic Latin (pdf)
FS FILE SEPARATOR Control
U+001D &#29; <control-001D> INFORMATION SEPARATOR THREE Control C0 Controls and Basic Latin (pdf)
GS GROUP SEPARATOR Control
U+001E &#30; <control-001E> INFORMATION SEPARATOR TWO Control C0 Controls and Basic Latin (pdf)
RS RECORD SEPARATOR Control
U+001F &#31; <control-001F> INFORMATION SEPARATOR ONE Control C0 Controls and Basic Latin (pdf)
US UNIT SEPARATOR Control
U+0020 &#32; SPACE SP Abbreviation C0 Controls and Basic Latin (pdf)
U+007F &#127; <control-007F> DEL DELETE Control C0 Controls and Basic Latin (pdf)
U+0080 &#128; <control-0080> PAD PADDING CHARACTER Figment C1 Controls and Latin-1 Supplement (pdf) Aliases are not widely published by Unicode; chart shows non-unique XXX
U+0081 &#129; <control-0081> HOP HIGH OCTET PRESET Figment C1 Controls and Latin-1 Supplement (pdf) Aliases are not widely published by Unicode; chart shows non-unique XXX
U+0082 &#130; <control-0082> BPH BREAK PERMITTED HERE Control C1 Controls and Latin-1 Supplement (pdf)
U+0083 &#131; <control-0083> NBH NO BREAK HERE Control C1 Controls and Latin-1 Supplement (pdf)
U+0084 &#132; <control-0084> IND INDEX Control C1 Controls and Latin-1 Supplement (pdf)
U+0085 &#133; <control-0085> NEL NEXT LINE Control C1 Controls and Latin-1 Supplement (pdf)
U+0086 &#134; <control-0086> SSA START OF SELECTED AREA Control C1 Controls and Latin-1 Supplement (pdf)
U+0087 &#135; <control-0087> ESA END OF SELECTED AREA Control C1 Controls and Latin-1 Supplement (pdf)
U+0088 &#136; <control-0088> CHARACTER TABULATION SET Control C1 Controls and Latin-1 Supplement (pdf)
HTS HORIZONTAL TABULATION SET Control
U+0089 &#137; <control-0089> CHARACTER TABULATION WITH JUSTIFICATION Control C1 Controls and Latin-1 Supplement (pdf)
HTJ HORIZONTAL TABULATION WITH JUSTIFICATION Control
U+008A &#138; <control-008A> LINE TABULATION SET Control C1 Controls and Latin-1 Supplement (pdf)
VTS VERTICAL TABULATION SET Control
U+008B &#139; <control-008B> PARTIAL LINE FORWARD Control C1 Controls and Latin-1 Supplement (pdf)
PLD PARTIAL LINE DOWN Control
U+008C &#140; <control-008C> PARTIAL LINE BACKWARD Control C1 Controls and Latin-1 Supplement (pdf)
PLU PARTIAL LINE UP Control
U+008D &#141; <control-008D> REVERSE LINE FEED Control C1 Controls and Latin-1 Supplement (pdf)
RI REVERSE INDEX Control
U+008E &#142; <control-008E> SINGLE SHIFT TWO Control C1 Controls and Latin-1 Supplement (pdf)
SS2 SINGLE-SHIFT-2 Control
U+008F &#143; <control-008F> SINGLE SHIFT THREE Control C1 Controls and Latin-1 Supplement (pdf)
SS3 SINGLE-SHIFT-3 Control
U+0090 &#144; <control-0090> DCS DEVICE CONTROL STRING Control C1 Controls and Latin-1 Supplement (pdf)
U+0091 &#145; <control-0091> PRIVATE USE ONE Control C1 Controls and Latin-1 Supplement (pdf)
PU1 PRIVATE USE-1 Control
U+0092 &#146; <control-0092> PRIVATE USE TWO Control C1 Controls and Latin-1 Supplement (pdf)
PU2 PRIVATE USE-2 Control
U+0093 &#147; <control-0093> STS SET TRANSMIT STATE Control C1 Controls and Latin-1 Supplement (pdf)
U+0094 &#148; <control-0094> CCH CANCEL CHARACTER Control C1 Controls and Latin-1 Supplement (pdf)
U+0095 &#149; <control-0095> MW MESSAGE WAITING Control C1 Controls and Latin-1 Supplement (pdf)
U+0096 &#150; <control-0096> START OF GUARDED AREA Control C1 Controls and Latin-1 Supplement (pdf)
SPA START OF PROTECTED AREA Control
U+0097 &#151; <control-0097> END OF GUARDED AREA Control C1 Controls and Latin-1 Supplement (pdf)
EPA END OF PROTECTED AREA Control
U+0098 &#152; <control-0098> SOS START OF STRING Control C1 Controls and Latin-1 Supplement (pdf)
U+0099 &#153; <control-0099> SGC SINGLE GRAPHIC CHARACTER INTRODUCER Figment C1 Controls and Latin-1 Supplement (pdf) Aliases are not widely published by Unicode; chart shows non-unique XXX
U+009A &#154; <control-009A> SCI SINGLE CHARACTER INTRODUCER Control C1 Controls and Latin-1 Supplement (pdf)
U+009B &#155; <control-009B> CSI CONTROL SEQUENCE INTRODUCER Control C1 Controls and Latin-1 Supplement (pdf)
U+009C &#156; <control-009C> ST STRING TERMINATOR Control C1 Controls and Latin-1 Supplement (pdf)
U+009D &#157; <control-009D> OSC OPERATING SYSTEM COMMAND Control C1 Controls and Latin-1 Supplement (pdf)
U+009E &#158; <control-009E> PM PRIVACY MESSAGE Control C1 Controls and Latin-1 Supplement (pdf)
U+009F &#159; <control-009F> APC APPLICATION PROGRAM COMMAND Control C1 Controls and Latin-1 Supplement (pdf)
U+00A0 &nbsp; &NonBreakingSpace; &#160; NO-BREAK SPACE NBSP Abbreviation C1 Controls and Latin-1 Supplement (pdf)
U+00AD &shy; &#173; SOFT HYPHEN SHY Abbreviation C1 Controls and Latin-1 Supplement (pdf)
U+01A2 &#418; LATIN CAPITAL LETTER OI LATIN CAPITAL LETTER GHA ※ Correction Latin Extended-B (pdf)
U+01A3 &#419; LATIN SMALL LETTER OI LATIN SMALL LETTER GHA ※ Correction Latin Extended-B (pdf)
U+034F &#847; COMBINING GRAPHEME JOINER CGJ Abbreviation Combining Diacritical Marks (pdf) The name of this character is misleading; it does not actually join graphemes
U+0616 &#1558; ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH ARABIC SMALL HIGH LIGATURE ALEF WITH YEH BARREE ※ Correction Arabic added in version 15.0
U+061C &#1564; ARABIC LETTER MARK ALM Abbreviation Arabic (pdf) See RLM
U+0709 &#1801; SYRIAC SUBLINEAR COLON SKEWED RIGHT SYRIAC SUBLINEAR COLON SKEWED LEFT ※ Correction Syriac (pdf)
U+0CDE &#3294; KANNADA LETTER FA KANNADA LETTER LLLA ※ Correction Kannada (pdf)
U+0E9D &#3741; LAO LETTER FO TAM LAO LETTER FO FON ※ Correction Lao (pdf)
U+0E9F &#3743; LAO LETTER FO SUNG LAO LETTER FO FAY ※ Correction Lao (pdf)
U+0EA3 &#3747; LAO LETTER LO LING LAO LETTER RO ※ Correction Lao (pdf)
U+0EA5 &#3749; LAO LETTER LO LOOT LAO LETTER LO ※ Correction Lao (pdf)
U+0FD0 &#4048; TIBETAN MARK BSKA- SHOG GI MGO RGYAN TIBETAN MARK BKA- SHOG GI MGO RGYAN ※ Correction Tibetan (pdf)
U+11EC &#4588; HANGUL JONGSEONG IEUNG-KIYEOK HANGUL JONGSEONG YESIEUNG-KIYEOK ※ Correction Hangul Jamo (pdf)
U+11ED &#4589; HANGUL JONGSEONG IEUNG-SSANGKIYEOK HANGUL JONGSEONG YESIEUNG-SSANGKIYEOK ※ Correction Hangul Jamo (pdf)
U+11EE &#4590; HANGUL JONGSEONG SSANGIEUNG HANGUL JONGSEONG SSANGYESIEUNG ※ Correction Hangul Jamo (pdf)
U+11EF &#4591; HANGUL JONGSEONG IEUNG-KHIEUKH HANGUL JONGSEONG YESIEUNG-KHIEUKH ※ Correction Hangul Jamo (pdf)
U+180B &#6155; MONGOLIAN FREE VARIATION SELECTOR ONE FVS1 Abbreviation Mongolian (pdf)
U+180C &#6156; MONGOLIAN FREE VARIATION SELECTOR TWO FVS2 Abbreviation Mongolian (pdf)
U+180D &#6157; MONGOLIAN FREE VARIATION SELECTOR THREE FVS3 Abbreviation Mongolian (pdf)
U+180E &#6158; MONGOLIAN VOWEL SEPARATOR MVS Abbreviation Mongolian (pdf)
U+180F &#6159; MONGOLIAN FREE VARIATION SELECTOR FOUR FVS4 Abbreviation Mongolian (pdf)
U+1BBD &#7101; SUNDANESE LETTER BHA SUNDANESE LETTER ARCHAIC I ※ Correction Sundanese (pdf) added in version 15.0
U+200B &NegativeMediumSpace; &NegativeThickSpace; &NegativeThinSpace; &NegativeVeryThinSpace; &ZeroWidthSpace; &#8203; ZERO WIDTH SPACE ZWSP Abbreviation General Punctuation (pdf)
U+200C &zwnj; &#8204; ZERO WIDTH NON-JOINER ZWNJ Abbreviation General Punctuation (pdf)
U+200D &zwj; &#8205; ZERO WIDTH JOINER ZWJ Abbreviation General Punctuation (pdf)
U+200E &lrm; &#8206; LEFT-TO-RIGHT MARK LRM Abbreviation General Punctuation (pdf)
U+200F &rlm; &#8207; RIGHT-TO-LEFT MARK RLM Abbreviation General Punctuation (pdf)
U+202A &#8234; LEFT-TO-RIGHT EMBEDDING LRE Abbreviation General Punctuation (pdf)
U+202B &#8235; RIGHT-TO-LEFT EMBEDDING RLE Abbreviation General Punctuation (pdf)
U+202C &#8236; POP DIRECTIONAL FORMATTING PDF Abbreviation General Punctuation (pdf)
U+202D &#8237; LEFT-TO-RIGHT OVERRIDE LRO Abbreviation General Punctuation (pdf)
U+202E &#8238; RIGHT-TO-LEFT OVERRIDE RLO Abbreviation General Punctuation (pdf)
U+202F &#8239; NARROW NO-BREAK SPACE NNBSP Abbreviation General Punctuation (pdf)
U+205F &MediumSpace; &#8287; MEDIUM MATHEMATICAL SPACE MMSP Abbreviation General Punctuation (pdf)
U+2060 &NoBreak; &#8288; WORD JOINER WJ Abbreviation General Punctuation (pdf)
U+2066 &#8294; LEFT-TO-RIGHT ISOLATE LRI Abbreviation General Punctuation (pdf)
U+2067 &#8295; RIGHT-TO-LEFT ISOLATE RLI Abbreviation General Punctuation (pdf)
U+2068 &#8296; FIRST STRONG ISOLATE FSI Abbreviation General Punctuation (pdf)
U+2069 &#8297; POP DIRECTIONAL ISOLATE PDI Abbreviation General Punctuation (pdf)
U+2118 &weierp; &wp; &#8472; SCRIPT CAPITAL P WEIERSTRASS ELLIPTIC FUNCTION ※ Correction Letterlike Symbols (pdf)
U+2448 &#9288; OCR DASH MICR ON US SYMBOL ※ Correction Optical Character Recognition (pdf)
U+2449 &#9289; OCR CUSTOMER ACCOUNT NUMBER MICR DASH SYMBOL ※ Correction Optical Character Recognition (pdf)
U+2B7A &#11130; LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE ※ Correction Miscellaneous Symbols and Arrows (pdf)
U+2B7C &#11132; RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE ※ Correction Miscellaneous Symbols and Arrows (pdf)
U+A015 &#40981; YI SYLLABLE WU YI SYLLABLE ITERATION MARK ※ Correction Yi Syllables (pdf)
U+AA6E &#43630; MYANMAR LETTER KHAMTI HHA MYANMAR LETTER KHAMTI LLA ※ Correction Myanmar Extended-A (pdf)
U+FE00...U+FE0F &#65024;...&#65039; VARIATION SELECTOR-1...VARIATION SELECTOR-16 VS1...VS16 Abbreviation Variation Selectors (pdf) 16 code points
U+FE18 &#65048; PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET ※ Correction Vertical Forms (pdf)
U+FEFF &#65279; ZERO WIDTH NO-BREAK SPACE BOM BYTE ORDER MARK Alternate Arabic Presentation Forms-B (pdf)
ZWNBSP Abbreviation
U+122D4 &#74452; CUNEIFORM SIGN SHIR TENU CUNEIFORM SIGN NU11 TENU ※ Correction Cuneiform (pdf)
U+122D5 &#74453; CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR ※ Correction Cuneiform (pdf)
U+12327 &#74535; CUNEIFORM SIGN UN GUNU CUNEIFORM SIGN KALAM ※ Correction Cuneiform (pdf)
U+1680B &#92171; BAMUM LETTER PHASE-A MAEMBGBIEE BAMUM LETTER PHASE-A MAEMGBIEE ※ Correction Bamum Supplement (pdf)
U+16E56 &#93782; MEDEFAIDRIN CAPITAL LETTER HP MEDEFAIDRIN CAPITAL LETTER H ※ Correction Medefaidrin (pdf)
U+16E57 &#93783; MEDEFAIDRIN CAPITAL LETTER NY MEDEFAIDRIN CAPITAL LETTER NG ※ Correction Medefaidrin (pdf)
U+16E76 &#93814; MEDEFAIDRIN SMALL LETTER HP MEDEFAIDRIN SMALL LETTER H ※ Correction Medefaidrin (pdf)
U+16E77 &#93815; MEDEFAIDRIN SMALL LETTER NY MEDEFAIDRIN SMALL LETTER NG ※ Correction Medefaidrin (pdf)
U+1B001 &#110593; HIRAGANA LETTER ARCHAIC YE HENTAIGANA LETTER E-1 ※ Correction Kana Supplement (pdf)
U+1D0C5 &#118981; BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS ※ Correction Byzantine Musical Symbols (pdf)
U+1E899 &#125081; MENDE KIKAKUI SYLLABLE M172 MBOO MENDE KIKAKUI SYLLABLE M172 MBO ※ Correction Mende Kikakui (pdf)
U+1E89A &#125082; MENDE KIKAKUI SYLLABLE M174 MBO MENDE KIKAKUI SYLLABLE M174 MBOO ※ Correction Mende Kikakui (pdf)
U+E0100...U+E01EF &#917760;...&#917999; VARIATION SELECTOR-17...VARIATION SELECTOR-256 VS17...VS256 Abbreviation Variation Selectors Supplement (pdf) 240 code points
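The table above is essentially a rendering of the data file NameAliases.txt, in which each alias sits on its own line of the form code point;alias;type, with comments introduced by #. The following is a minimal parsing sketch over a small hand-copied excerpt (not the full file); `parse_aliases` and `row` are hypothetical helpers, and the type hints assume Python 3.9 or later. The `row` output mirrors the table's first two columns, the code point and its HTML decimal reference.

```python
from collections import defaultdict

# A few lines in the semicolon-separated layout of NameAliases.txt:
# code point (hex); alias; type. Lines starting with '#' are comments.
SAMPLE = """\
# sample excerpt (not the full file)
0008;BACKSPACE;control
0008;BS;abbreviation
00A0;NBSP;abbreviation
2118;WEIERSTRASS ELLIPTIC FUNCTION;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""

def parse_aliases(text: str) -> dict[int, list[tuple[str, str]]]:
    """Map each code point to its (alias, type) pairs, in file order."""
    table: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        table[int(code, 16)].append((alias, kind))
    return dict(table)

def row(cp: int, aliases: list[tuple[str, str]]) -> str:
    """Format one row: U+XXXX, HTML decimal reference, then alias (type) pairs."""
    listed = ", ".join(f"{alias} ({kind})" for alias, kind in aliases)
    return f"U+{cp:04X} &#{cp}; {listed}"

for cp, aliases in parse_aliases(SAMPLE).items():
    print(row(cp, aliases))
```

Grouping by code point matches the table's layout, where a character with several aliases (such as U+0008 or U+FEFF) gets one main row plus continuation rows.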

Informal alternative names

The Unicode Standard also uses and publishes alternative names that are not formal and are not listed as normative alias names. These labels need not be unique and may contain characters that are not allowed in formal names. They are used in the Unicode code charts; for example, U+070F SYRIAC ABBREVIATION MARK is labeled SAM.


References

1. "NameAliases.txt". The Unicode Consortium. 2024-04-24. Retrieved 2024-09-11.
2. "The Unicode Standard". The Unicode Consortium.
3. "Unicode 14.0 Character Code Charts: Syriac" (PDF). The Unicode Consortium.