Unicode® 17.0.0

2025 September 9 (Announcement)

STATUS: This is a preliminary draft page for an upcoming release. Some details may be missing or incorrect, and some links may be wrong or broken. During the alpha review period, errors are expected and feedback is not necessary. During the beta review period, feedback about errors on this page will be helpful and appreciated.

This page summarizes the important changes for the Unicode Standard, Version 17.0.0. This version supersedes all previous versions of the Unicode Standard.
A. Summary

Unicode 17.0 adds 4,803 characters, for a total of 159,801 characters. The new additions include four new scripts:
New Data Files for Unicode 17.0
Synchronization

Several other important Unicode specifications have been updated for Version 17.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 17.0:
Some of the changes in Version 17.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51. See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications. See the following resource links for general information about Unicode versions, the Unicode Standard, and other publications of the Unicode Consortium.
B. Technical Overview

Version 17.0 of the Unicode Standard consists of:
The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers of the Unicode Standard.

Core Specification

The core specification for Version 17.0 is available for browsing online as per-chapter web pages. Because the full table of contents for the core specification is provided with interactive links, no separate bookmarks page is provided for this release, and no separate chapter links are provided directly on this summary page.

Anchors for chapters, sections, tables, and figures in the core specification are marked by a "#" in the left margin of the heading or caption. Clicking an anchor provides a custom bookmark to that portion of the text, down to the level of subsections. Section numbering also extends down to the subsection level, so that precise content can be referenced more easily.

The HTML version of the core specification is authoritative. However, for convenience of reference, an archival version of the core specification is also available as a single PDF (13 MB).

Code Charts

Several sets of code charts are available. They serve different purposes:
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Han Radical-Stroke Indices

Several radical-stroke indices are available to assist in looking up Han ideographs in the code charts.
The complete radical-stroke index is a stable part of this release of the Unicode Standard. It will never be updated.

Unicode Standard Annexes

STATUS: During the alpha and beta review periods, links to individual UAXes (or UTSes) point to the proposed update for that document, if any. If no proposed update has been posted for the document, links point to the last published version of the document, for reference.

Links to the individual Unicode Standard Annexes for this version are available in Section I, List of Components, below. The summary list of significant changes in the content of each Unicode Standard Annex for Version 17.0 can be found in Section G, Changes in the Unicode Standard Annexes, below.

Unicode Character Database

STATUS: During the beta review period, the draft UCD data includes data for the complete, planned character repertoire of Unicode 17.0, including all data changes approved by the UTC for Version 17.0.

Data files for Version 17.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Detailed documentation about the data files can be found in UAX #44, Unicode Character Database.

Version References

Version 17.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 17.0.0, (South San Francisco: The Unicode Consortium, 2025. ISBN 978-1-936213-35-1)

The terms “Version 17.0” or “Unicode 17.0” are abbreviations for the full version reference, Version 17.0.0.

The citation and permalink for the latest published version of the Unicode Standard is:

The Unicode Consortium. The Unicode Standard.

A complete specification of the contributory files for Unicode 17.0 is found below in Section I, List of Components. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 17.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 17.0, see the list of current Updates and Errata.

C. Stability Policy Update

No significant updates to the Character Encoding Stability Policies have occurred since the last release of the Unicode Standard.

D. Textual Changes and Character Additions

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

A total of 4,803 characters have been added. Most character additions are in new blocks, but characters have also been added to a number of existing blocks. For details, see the delta code charts.

New Blocks

The following blocks are newly defined in Version 17.0:
E. Conformance Changes

There are no new conformance requirements for the core specification in Unicode 17.0.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 17.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 17.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.
H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.
I. List of Components

This section lists the components of Version 17.0.0 of the Unicode Standard. The version numbering and the role of each component are explained in Versions of The Unicode Standard.

M. Implications for Migration

A significant number of changes in Unicode 17.0 may affect implementations upgrading to Version 17.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Core Specification Changes

The navigation bar for the core specification has been improved. New content has been added for Unicode 17.0, and many other improvements have been made to the text. In particular, many tables have been updated for better display, and representative glyphs for many more Unicode characters in examples and citations are now displayed directly in the text.

Script-related Changes

Four new scripts are encoded in Unicode 17.0. One of these scripts, Tai Yo, has complex layout.

General Character Property Issues
Security and Identifier-related Issues (See UAX #31 and UTS #39.)

The Identifier_Type character property affects which characters are included in the General Security Profile for identifiers, which is a default recommendation for identifiers used in secure contexts. Depending on its Identifier_Type value, a character is either included (Identifier_Status = Allowed) or excluded (Identifier_Status = Restricted). For Unicode 17.0, the Identifier_Type assignments for all existing characters in recommended scripts were reviewed and updated to match the best currently available data on usage. Implementers should note the changes to Identifier_Type for numerous characters, particularly those whose Identifier_Status changed from Allowed to Restricted. See Choosing Identifier_Type Values in UTS #39 for an explanation of the rationale behind these changes.
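Because a change from Allowed to Restricted can cause previously accepted identifiers to be rejected, it may help to see how such a check is typically wired up. The following is a minimal Python sketch, assuming input lines in the semicolon-delimited UCD style used by the UTS #39 data file IdentifierStatus.txt (a code point or range, a status value, and an optional "#" comment), and assuming that any code point not listed as Allowed is treated as Restricted. The function names and the sample lines are hypothetical and are not taken from the actual 17.0 data; a real implementation should load the published file for the Unicode version it targets.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code point range


def parse_allowed_ranges(lines: Iterable[str]) -> List[Range]:
    """Collect code point ranges whose status field is 'Allowed'.

    Assumes UCD-style lines such as '0030..0039 ; Allowed # digits'.
    Blank lines and comment-only lines are skipped.
    """
    ranges: List[Range] = []
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, status = (field.strip() for field in data.split(";")[:2])
        if status != "Allowed":
            continue
        first, _, last = cps.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end))
    return ranges


def is_allowed_identifier(ident: str, allowed: List[Range]) -> bool:
    """Return True only if every character falls in an Allowed range.

    Anything not listed is treated as Restricted (assumed default).
    """
    return bool(ident) and all(
        any(lo <= ord(ch) <= hi for lo, hi in allowed) for ch in ident
    )


# Hypothetical excerpt for illustration only; not real IdentifierStatus.txt data.
SAMPLE = [
    "# hypothetical excerpt",
    "0030..0039    ; Allowed    # digits",
    "0041..005A    ; Allowed    # uppercase Latin",
    "005F          ; Allowed    # low line",
    "0061..007A    ; Allowed    # lowercase Latin",
]

allowed = parse_allowed_ranges(SAMPLE)
print(is_allowed_identifier("user_name1", allowed))  # prints True
print(is_allowed_identifier("user-name", allowed))   # prints False
```

Rebuilding the range list from each new release's data file, rather than hard-coding it, is what lets an implementation pick up the 17.0 status changes automatically.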
Segmentation (See UAX #14.)

UTC 181 approved a significant change to the line-breaking algorithm that introduces a new Line_Break property value, Unambiguous_Hyphen. The need for this change originated in changes to the handling of hyphens in Hebrew that had been approved for Unicode 16.0 (see decision 179-C25) but that proved problematic when implemented in ICU. A temporary fix was made for Unicode 16.0 (see 180-C18 and section 5.6 of L2/24-162). The change for Unicode 17.0 is a more complete fix for those issues. See 181-C53 and section 6.1 of L2/24-224 for complete details.

U+034F COMBINING GRAPHEME JOINER (CGJ) is not frequently used but is essential in certain situations, including in German and in Biblical Hebrew text. Although CGJ was first encoded in Unicode 3.2 in 2002, it has been difficult to specify stable character properties and segmentation rules for it. The issues have now been analyzed, and a detailed history of how the handling of this character in Unicode’s specifications has evolved over the years has been added to UAX #14. See Section 6.3 of L2/24-224 for details.

Numeric Property Issues

One new set of decimal digits is added in Unicode 17.0, for the newly encoded Tolong Siki script. Implementations of numeric values and numeric formatting should take this new set into account.

CJK/Unihan Changes
See UAX #38, Unicode Han Database (Unihan), for further details on these changes.

Standardized Variation Sequences
Changes to Code Charts
Collation-related Changes

The former documentation file, CollationTest.html, has been merged into a new section of UTS #10. The DUCET ordering of Tangut components with respect to Tangut ideographs has been modified. See Table 16, Computing Implicit Weights, in UTS #10 for details.

Emoji Changes

For details about emoji changes, see the Unicode 17.0 emoji charts and Emoji Recently Added, v17.0.
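Table 16 of UTS #10, cited in the collation changes above, gives formulas that derive a pair of collation elements for characters without explicit DUCET entries. To show the general shape of such a computation, the following Python sketch implements the long-standing formula for Han unified ideographs: AAAA = base + (CP >> 15) and BBBB = (CP & 0x7FFF) | 0x8000, rendered as [.AAAA.0020.0002][.BBBB.0000.0000]. The caller is assumed to choose the base (for example FB40 for the core CJK Unified Ideographs block, FB80 for the extension blocks). This is an illustration only and does not reproduce the Tangut rows, which changed in Unicode 17.0; their bases and offsets should be taken directly from Table 16. The function names are hypothetical.

```python
from typing import Tuple


def han_implicit_weights(cp: int, base: int) -> Tuple[int, int]:
    """Return the (AAAA, BBBB) primary weights for a Han code point.

    Assumes the caller has already selected the correct base value
    from UTS #10 for the block containing cp.
    """
    aaaa = base + (cp >> 15)          # high bits of the code point shift the base
    bbbb = (cp & 0x7FFF) | 0x8000     # low 15 bits, with the top bit set
    return aaaa, bbbb


def format_collation_elements(aaaa: int, bbbb: int) -> str:
    """Render the two collation elements in DUCET-style notation."""
    return f"[.{aaaa:04X}.0020.0002][.{bbbb:04X}.0000.0000]"


# U+4E00 lies in the core CJK Unified Ideographs block (assumed base FB40).
print(format_collation_elements(*han_implicit_weights(0x4E00, 0xFB40)))
# prints [.FB40.0020.0002][.CE00.0000.0000]
```

Splitting the code point across two primary weights keeps every implicit weight within 16 bits while preserving code point order among characters that share a base.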