Internationalized domain names

Images: examples of Arabic, Chinese, Greek, Hebrew and Hindi IDNs
An internationalized domain name (IDN) is an Internet domain name that may contain non-ASCII characters. Such domain names could contain letters with diacritics, as required by many European languages, or characters from non-Latin scripts such as Arabic or Chinese. However, the standard for domain names does not allow such characters, and much work has gone into finding a way around this, either by changing the standard or by agreeing on a way to convert internationalized domain names into standard ASCII domain names while preserving the stability of the domain name system.

IDN has, by the standards of the Internet, a long history: it was originally proposed in 1996 (by M. Duerst) and implemented in 1998 (by T. W. Tan et al.). After much debate and many competing proposals, a system called Internationalizing Domain Names in Applications (IDNA) was adopted as the standard and, as of 2005, was in the process of being rolled out.

In IDNA, the term internationalized domain name means specifically any domain name consisting only of labels to which the IDNA ToASCII algorithm can be successfully applied. (For the meaning of 'label' and 'ToASCII', see the section ToASCII and ToUnicode below.)
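Because the definition is phrased in terms of ToASCII succeeding on every label, it can be written as a simple predicate. The following is a minimal sketch, assuming Python 3 and its built-in "idna" codec, which implements the 2003 IDNA rules and raises UnicodeError when ToASCII fails on a label; the helper name is_idna_domain is hypothetical.

```python
def is_idna_domain(name: str) -> bool:
    """Hypothetical helper: True if ToASCII succeeds on every label of `name`.

    Relies on Python's built-in "idna" codec (IDNA 2003, RFC 3490), which
    raises UnicodeError (or a subclass) when any label cannot be converted.
    """
    try:
        name.encode("idna")
    except UnicodeError:
        return False
    return True


print(is_idna_domain("Bücher.ch"))       # True
print(is_idna_domain("x" * 64 + ".ch"))  # False: a 64-character label is too long
```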

Internationalizing domain names in applications

Internationalizing Domain Names in Applications (IDNA) is a mechanism defined in 2003 for handling internationalized domain names containing non-ASCII characters. Such domain names could not be handled by the existing DNS and name resolver infrastructure. Rather than redesigning the existing DNS infrastructure, it was decided that non-ASCII domain names should be converted to a suitable ASCII-based form by web browsers and other user applications; IDNA specifies how this conversion is to be done.

IDNA was designed for maximum backward compatibility with the existing DNS, which was designed for names using only a subset of the ASCII character set.

An IDNA-enabled application is able to convert between the restricted-ASCII and non-ASCII representations of a domain, using the ASCII form in cases where it is needed (such as for DNS lookup), but being able to present the more readable non-ASCII form to users. Applications that do not support IDNA will not be able to handle domain names with non-ASCII characters, but will still be able to access such domains if given the (usually rather cryptic) ASCII equivalent.
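The two representations an IDNA-enabled application juggles can be illustrated with Python's built-in "idna" codec (an IDNA 2003 implementation). This is a sketch only; the domain München.de is an example chosen for illustration. The name is encoded to its ASCII form for lookup and decoded back for display.

```python
# A name as a user might type it (example domain, chosen for illustration).
typed = "München.de"

# ASCII-compatible form, as an IDNA-aware application would pass it to DNS.
lookup_form = typed.encode("idna").decode("ascii")
print(lookup_form)   # xn--mnchen-3ya.de

# Readable form recovered for display. Nameprep's lowercasing is not undone.
for_display = lookup_form.encode("ascii").decode("idna")
print(for_display)   # münchen.de

# The lookup form uses only letters, digits, hyphens and dots, so DNS
# software that knows nothing about IDNA can still carry it unchanged.
print(all(c.isascii() and (c.isalnum() or c in "-.") for c in lookup_form))  # True
```

An application without IDNA support could still reach the same host if given the string xn--mnchen-3ya.de directly.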

ICANN issued guidelines for the use of IDNA in June 2003, and it was already possible to register .jp domains using this system in July 2003. Several other top-level domain registries started accepting registrations in March 2004.

Mozilla 1.4, Netscape 7.1, Opera 7.11 and Safari were among the first applications to support IDNA. A browser plug-in is available for Internet Explorer 6 to provide IDN support. Internet Explorer 7.0 and the URL APIs of Windows Vista provide native support for IDN [1].

ToASCII and ToUnicode

The conversions between ASCII and non-ASCII forms of a domain name are accomplished by algorithms called ToASCII and ToUnicode. These algorithms are not applied to the domain name as a whole, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are www, example and com, and ToASCII or ToUnicode would be applied to each of these three separately.
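Label-by-label processing can be sketched in Python as follows. This assumes the standard library's "idna" codec as the per-label ToASCII step; the function name domain_to_ascii is hypothetical. RFC 3490 also requires four different full-stop characters to be recognized as label separators, which the sketch handles before converting each label.

```python
import re

# RFC 3490 (section 3.1) treats these four characters as label separators:
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP, HALFWIDTH IDEOGRAPHIC FULL STOP.
LABEL_SEPARATORS = re.compile("[\u002e\u3002\uff0e\uff61]")


def domain_to_ascii(domain: str) -> str:
    """Hypothetical helper: apply ToASCII to each label and rejoin with ASCII dots."""
    labels = LABEL_SEPARATORS.split(domain)
    # Encoding a single label with the "idna" codec performs ToASCII on it.
    return ".".join(label.encode("idna").decode("ascii") for label in labels)


print(domain_to_ascii("www.Bücher.ch"))                # www.xn--bcher-kva.ch
print(domain_to_ascii("www\u3002example\u3002com"))    # www.example.com
```

Only the label that contains non-ASCII characters changes; www and ch pass through untouched.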

The details of these two algorithms are complex, and are specified in the RFCs linked at the end of this article. The following gives an overview of their behaviour.

ToASCII leaves any ASCII label unchanged, but will fail if the label is unsuitable for DNS. If given a label containing at least one non-ASCII character, ToASCII applies the Nameprep algorithm (which converts the label to lowercase and performs other normalization), translates the result to ASCII using Punycode, and then prepends the 4-character string "xn--". This string is called the ACE prefix, where ACE means ASCII Compatible Encoding, and it distinguishes Punycode-encoded labels from ordinary ASCII labels. The ToASCII algorithm can fail in a number of ways; for example, the final string could exceed the DNS limit of 63 characters per label. A label on which ToASCII fails cannot be used in an internationalized domain name.
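The steps above can be sketched in Python for a single label. This is a simplified illustration, not the RFC 3490 algorithm in full: Nameprep is approximated by case folding plus NFKC normalization (the real profile in RFC 3491 also deletes some code points, prohibits others and checks bidirectional text), and the STD3 host-name checks are omitted. The names to_ascii and simplified_nameprep are hypothetical; Punycode itself comes from Python's built-in "punycode" codec.

```python
import unicodedata

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit for a single label


def simplified_nameprep(label: str) -> str:
    # Stand-in for Nameprep: case folding followed by NFKC normalization.
    # Mapping-to-nothing, prohibited characters and bidi checks are skipped.
    return unicodedata.normalize("NFKC", label.casefold())


def to_ascii(label: str) -> str:
    """Sketch of ToASCII for one label; raises ValueError on failure."""
    if not label.isascii():
        label = simplified_nameprep(label)
        if not label.isascii():
            # A non-ASCII label must not already carry the ACE prefix.
            if label.lower().startswith(ACE_PREFIX):
                raise ValueError("label already starts with the ACE prefix")
            label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 0 < len(label) <= MAX_LABEL_LENGTH:
        raise ValueError("label is empty or longer than 63 characters")
    return label


print(to_ascii("ch"))      # ch (ASCII labels pass through unchanged)
print(to_ascii("Bücher"))  # xn--bcher-kva
```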

ToUnicode reverses the action of ToASCII, stripping off the ACE prefix and applying the Punycode decode algorithm. It does not reverse the Nameprep processing, since that is merely a normalization and is by nature irreversible. Unlike ToASCII, ToUnicode always succeeds, because it simply returns the original string if decoding would fail. In particular, this means that ToUnicode has no effect on a string that does not begin with the ACE prefix.
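A matching sketch of ToUnicode for one label, again in Python and using the built-in "punycode" codec. The name to_unicode is hypothetical, and the final RFC 3490 step (re-applying ToASCII and comparing the result with the input) is omitted for brevity. The sketch is written by hand because Python's built-in "idna" codec raises an exception on an undecodable label rather than returning the input as the ToUnicode definition requires.

```python
ACE_PREFIX = "xn--"


def to_unicode(label: str) -> str:
    """Sketch of ToUnicode for one label; never raises."""
    # The ACE prefix is matched case-insensitively; other labels pass through.
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        # Strip the prefix and run the Punycode decoder on the remainder.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Decoding failed: hand back the original string unchanged.
        return label


print(to_unicode("xn--bcher-kva"))  # bücher
print(to_unicode("example"))        # example
print(to_unicode("xn--!!!"))        # xn--!!!  (not valid Punycode)
```

Note that the output is bücher, not Bücher: the capital letter removed by Nameprep does not come back.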

Example of IDNA encoding

Main article: Punycode
As an example of how IDNA works, suppose the domain to be encoded is Bücher.ch (“Bücher” is German for “books”, and .ch is the country domain for Switzerland). This has two labels, Bücher and ch. The second label is pure ASCII, and so is left unchanged. The first label is processed by Nameprep to give bücher, and then by Punycode to give bcher-kva, and then has xn-- prepended to give xn--bcher-kva. The final domain suitable for use with the DNS is therefore xn--bcher-kva.ch.
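The same three steps can be traced in Python. As a simplifying assumption, Nameprep is approximated by case folding plus NFKC normalization, which is sufficient for this label; the last line cross-checks the result against Python's built-in IDNA 2003 codec.

```python
import unicodedata

domain = "Bücher.ch"
label, tld = domain.split(".")

# Step 1: Nameprep, approximated here by case folding plus NFKC.
prepped = unicodedata.normalize("NFKC", label.casefold())
# Step 2: Punycode.
punycoded = prepped.encode("punycode").decode("ascii")
# Step 3: prepend the ACE prefix. The ASCII label "ch" is left unchanged.
ace_label = "xn--" + punycoded

print(prepped, punycoded, ace_label)  # bücher bcher-kva xn--bcher-kva
print(f"{ace_label}.{tld}")           # xn--bcher-kva.ch

# Cross-check against Python's built-in "idna" codec.
print(domain.encode("idna"))          # b'xn--bcher-kva.ch'
```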

ASCII spoofing and squatting concerns

Main article: IDN homograph attack
Because IDN allows websites to use full Unicode names, it also makes it much easier to create a spoofed web site that looks exactly like another, including domain name and security certificate, but in fact is controlled by someone attempting to steal private information. These spoofing attacks potentially open users up to phishing attacks.

These attacks are not due to technical deficiencies in either the Unicode or IDNA specifications, but to the fact that different characters in different scripts can look the same, depending on the font used. For example, Unicode character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode character U+0061, Latin small letter a ("a"), which is the lowercase "a" used in English. Characters that look alike in this way may be termed homonyms, homographs, or (less ambiguously) homoglyphs.
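The distinction is easy to see once the characters are inspected as code points rather than as glyphs. A short Python check using the standard unicodedata module:

```python
import unicodedata

cyrillic_a = "\u0430"
latin_a = "\u0061"

# Often rendered identically, but they are different code points.
print(cyrillic_a == latin_a)         # False
print(unicodedata.name(cyrillic_a))  # CYRILLIC SMALL LETTER A
print(unicodedata.name(latin_a))     # LATIN SMALL LETTER A
print(f"U+{ord(cyrillic_a):04X} vs U+{ord(latin_a):04X}")  # U+0430 vs U+0061
```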

Although a computer may display visually identical or very similar glyphs for two different characters, these differences are still significant to the computer when locating web sites or validating certificates. The user assumes a one-to-one correspondence between the visual appearance of a name and the named entity, but when two names appear identical, this correspondence breaks down.

By contrast, the old character set of a to z, 0 to 9 and the hyphen offers few homographs. The closest are l and 1, and 0 and o, and the combination "rn" looks similar to "m" in some fonts; however, most fonts make a noticeable visible distinction between them. This means that even in the worst case, a site like Google would need to register only 8 names to protect itself against homograph attacks: each of the two o's in "google" can be written as o or 0 and the l as l or 1, giving 2 × 2 × 2 = 8 spellings, including the original.
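The count of 8 can be checked by enumerating every spelling reachable through the lookalike pairs the text names. The substitution table and the function name ascii_homographs below are assumptions for illustration; the two-character "rn"/"m" swap is left out, and "google" contains no "m" in any case. The annotations require Python 3.9 or later.

```python
from itertools import product

# Assumed lookalike table, limited to the single-character pairs named
# in the text (o/0 and l/1).
LOOKALIKES = {"o": "o0", "0": "0o", "l": "l1", "1": "1l"}


def ascii_homographs(name: str) -> list[str]:
    """Hypothetical helper: all spellings reachable by lookalike swaps."""
    choices = [LOOKALIKES.get(ch, ch) for ch in name]
    return ["".join(combo) for combo in product(*choices)]


variants = ascii_homographs("google")
print(len(variants))  # 8 (2 choices for each o, 2 for the l; includes "google")
print(variants[:3])   # ['google', 'goog1e', 'go0gle']
```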

In December 2001, two Israeli researchers, Evgeniy Gabrilovich and Alex Gontmakher, published a paper titled "The Homograph Attack",[1] describing an attack that used Unicode URLs to spoof a website URL. To prove the feasibility of this kind of attack, the researchers successfully registered a variant of the domain name "Microsoft.com" that incorporated Russian (Cyrillic) characters.

In general, this kind of attack is known as a homograph spoofing attack. The problem was anticipated before IDN was introduced, and guidelines were issued to registries to try to avoid or reduce it: for example, recommending that registries accept only the Latin alphabet and that of their own country, not all of Unicode. However, this advice was not followed by the operators of a number of major TLDs.

On February 7, 2005, Slashdot reported that this exploit had been disclosed at the hacker conference Shmoocon, with an example available at [2]. On browsers supporting IDNA, the URL [3] (in which the first a of paypal.com is replaced by a Cyrillic а) appeared to lead to paypal.com but instead led to a spoofed PayPal web site that said "Meeow."
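The PayPal case can be reproduced with Python's built-in "idna" codec. The two strings compare unequal, and their ASCII forms, which are what DNS actually resolves, are entirely different names:

```python
spoof = "p\u0430ypal.com"  # first "a" is U+0430 CYRILLIC SMALL LETTER A
genuine = "paypal.com"

print(spoof == genuine)        # False, despite looking the same on screen
print(spoof.encode("idna"))    # b'xn--pypal-4ve.com'
print(genuine.encode("idna"))  # b'paypal.com'
```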

Internet Explorer 7 restricts the display of non-ASCII domain names based on a user-defined list of allowed languages, and provides an anti-phishing filter that checks suspicious web sites against a remote database of known phishing sites.
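A display policy in the spirit of an allowed-language list can be sketched as follows, under the assumption that "language" is approximated by script. Python's standard library has no Unicode Script property lookup, so the hypothetical helper rough_script guesses the script from the first word of each character's Unicode name. This is an illustration of the idea, not how Internet Explorer implements it; the annotations require Python 3.9 or later.

```python
import unicodedata


def rough_script(ch: str) -> str:
    # Heuristic, not the Unicode Script property: take the first word of the
    # character's name ("LATIN", "CYRILLIC", "GREEK", ...). ASCII digits,
    # hyphens and dots are treated as script-neutral ("COMMON").
    if ch.isascii() and not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_form(domain: str, allowed_scripts: set[str]) -> str:
    """Hypothetical policy: show Unicode only if every letter is allowed."""
    scripts = {rough_script(ch) for ch in domain} - {"COMMON"}
    if scripts <= allowed_scripts:
        return domain
    # Otherwise fall back to the ASCII (Punycode) form.
    return domain.encode("idna").decode("ascii")


allowed = {"LATIN"}
print(display_form("bücher.ch", allowed))        # bücher.ch
print(display_form("p\u0430ypal.com", allowed))  # xn--pypal-4ve.com
```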

Because versions of Internet Explorer before 7 do not support IDNs, they are not vulnerable to this kind of attack. However, older versions of Internet Explorer can be made IDN-compatible by browser plug-ins, some of which are vulnerable to spoofing attacks. On July 9, 2005, the IDN-enabling plug-in Quero Toolbar 2.1.0 was released; it implemented several anti-spoofing techniques, such as mixed-script detection and highlighting of characters belonging to different scripts.
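Mixed-script detection with highlighting can be sketched in Python as below. This is an assumed design, not Quero Toolbar's actual code: the script of each character is guessed from its Unicode name (a heuristic, since the standard library has no Script property), and characters outside the label's majority script are wrapped in brackets as a plain-text stand-in for colour highlighting. The function names are hypothetical.

```python
import unicodedata
from collections import Counter


def rough_script(ch: str) -> str:
    # Heuristic script guess from the character name (not the Unicode Script
    # property); ASCII digits, hyphens and dots count as script-neutral.
    if ch.isascii() and not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def highlight_mixed_script(label: str) -> str:
    """Bracket characters whose script differs from the label's majority script."""
    scripts = [rough_script(ch) for ch in label]
    letter_scripts = [s for s in scripts if s != "COMMON"]
    if not letter_scripts:
        return label
    majority = Counter(letter_scripts).most_common(1)[0][0]
    return "".join(
        ch if s in ("COMMON", majority) else f"[{ch}]"
        for ch, s in zip(label, scripts)
    )


print(highlight_mixed_script("p\u0430ypal"))  # p[а]ypal (the bracketed "а" is Cyrillic)
print(highlight_mixed_script("bücher"))       # bücher (single script, nothing flagged)
```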

On February 17, 2005, Mozilla developers announced that they would ship the next versions of their software with IDN support still enabled, but showing URLs in their Punycode form instead. This thwarts attacks that exploit similarities between ASCII and non-ASCII letters while still allowing people to access websites on IDN domains, although it does not necessarily help against lookalikes between, for example, Cyrillic and Greek letters, unless the user knows which Punycode URL corresponds to the intended IDN URL. This was a change from earlier plans to disable IDN entirely for the time being (https://bugzilla.mozilla.org/show_bug.cgi?id=279099#c135).

Since then, Mozilla and Opera have both announced that they will use per-domain whitelists to selectively switch on IDN display for domains run by registries that take appropriate anti-spoofing precautions[4]. (See the article on homograph spoofing attacks for more details.) As of September 9, 2005, the most recent versions of Mozilla Firefox and Internet Explorer display the spoofed PayPal URL as [5], which is unsightly but clearly different from the actual paypal.com. By contrast, the (non-existent) domain [6] will display in the Firefox address bar as [7], because this form of domain is prohibited from registration at the Afilias registry level and therefore does not pose the same risk.
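A per-domain whitelist policy can be sketched in Python as follows. The whitelist contents are hypothetical and purely illustrative, not any browser's real list; the sketch simply shows Unicode for names under a whitelisted top-level domain and Punycode for everything else, using Python's built-in "idna" codec for both conversions.

```python
# Hypothetical whitelist: TLDs whose registries are assumed to enforce
# anti-spoofing rules. Illustrative only, not any browser's real list.
IDN_DISPLAY_WHITELIST = {"ch", "de", "info", "jp"}


def address_bar_form(domain: str) -> str:
    """Unicode for whitelisted TLDs, Punycode for everything else."""
    ascii_form = domain.encode("idna").decode("ascii")
    tld = ascii_form.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in IDN_DISPLAY_WHITELIST:
        return ascii_form.encode("ascii").decode("idna")  # Unicode form
    return ascii_form                                       # Punycode form


print(address_bar_form("Bücher.ch"))        # bücher.ch
print(address_bar_form("p\u0430ypal.com"))  # xn--pypal-4ve.com
```

The decision depends only on the registry, not on the characters in the name, which is why such a scheme relies on registries enforcing their own anti-spoofing rules.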

Safari's approach is to render problematic character sets as Punycode. This behavior can be changed by altering the settings in Safari's system preference files.

History of IDN

  • 12/1996: Martin Duerst's original Internet Draft proposing UTF-5 (the first incarnation of what is known today as ACE). UTF-5 was first defined by Martin Duerst at the University of Zürich in draft-duerst-dns-i18n-00.txt. http://tools.ietf.org/html/draft-duerst-dns-i18n-00 http://archive.minc.org/about/history/ http://www.connect-world.com/articles/old_articles/Dr-TanTinWee.htm
  • 03/1998: Early Research on IDN at National University of Singapore (NUS), Center for Internet Research (formerly Internet Research and Development Unit - IRDU) led by Prof. Tan Tin Wee (IDN Project team - Lim Juay Kwang and Leong Kok Yong) and subsequently continued under a team at Bioinformatrix Pte. Ltd. (BIX Pte. Ltd.) - a NUS spin-off company led by Prof. S. Subbiah.
  • 07/1998: Geneva INET'98 conference with a BoF discussion on iDNS and APNG General Meeting and Working Group meeting.
  • 07/1998: Asia Pacific Networking Group (APNG) iDNS Working Group formed. (APNG is still in existence, http://www.apng.org, and is distinct from a gathering known as APSTAR, http://www.apstar.org.) http://www.apng.org/old/commission/idns/
  • 10/1998: James Seng was recruited to lead further IDN development at BIX Pte. Ltd. by Prof. S. Subbiah.
  • 02/1999: iDNS Testbed launched by BIX Pte. Ltd. under the auspices of APNG, with participation from CNNIC, JPNIC, KRNIC, TWNIC, THNIC, HKNIC and SGNIC, led by James Seng. http://www.minc.org/about/history/idns/idomain/
  • 02/1999: Presentation of Report on IDN at Joint APNG-APTLD meeting, at APRICOT'99
  • 03/1999: Endorsement of the IDN Report at APNG General Meeting 1 March 1999.
  • 06/1999: Grant application by APNG jointly with the Centre for Internet Research (CIR), National University of Singapore, to the International Development Research Center (IDRC), a Canadian Government funded international organisation to work on IDN for IPv6. This APNG Project was funded under the Pan Asia R&D Grant administered on behalf of IDRC by the Canadian Committee on Occupational Health and Safety (CCOHS). Principal Investigator: Tan Tin Wee of National University of Singapore. http://www.apng.org/old/commission/idns/ipv6/
  • 07/1999: Walid R. Tout (WALID Inc.) filed IDNA patent application number US1999000358043, "Method and system for internationalizing domain names", published 2001-01-30. http://www.delphion.com/details?&pn=US06182148__
  • 07/1999: Internet Draft on UTF-5 by James Seng, Martin Duerst and Tan Tin Wee, http://mirrors.isc.org/pub/www.watersprings.org/pub/id/draft-jseng-utf5-00.txt; renewed in 2000, http://www.nic.ad.jp/ja/idn/mdnkit/download/documents/mdnkit-2.4-doc/reference/draft/draft-jseng-utf5-01.txt
  • 08/1999: APTLD and APNG form a working group, chaired by Kilnam Chon, to look into IDN issues. http://www.minc.org/about/history/idns/iname/
  • 10/1999: BIX Pte. Ltd. and the National University of Singapore, together with New York venture capital investors General Atlantic Partners, spun off the IDN effort into two new Singapore companies, i-DNS.net International Inc. and i-Email.net Pte. Ltd., which created the first commercial implementations of IDN solutions for domain names and for email addresses respectively.
  • 11/1999: IETF IDN Birds-of-a-Feather (BoF) session in Washington, initiated by i-DNS.net at the request of IETF officials.
  • 12/1999: i-DNS.net International Pte. Ltd. launched the first commercial IDN. It was in Taiwan, in Chinese characters, under the IDN TLD ".gongsi" (meaning loosely ".com"), with endorsement by the Minister of Communications of Taiwan and some major Taiwanese ISPs, and with reports of over 200,000 names sold in a week in Taiwan, Hong Kong, Singapore, Malaysia, China, Australia and the USA. It required the use of either a plug-in or special DNS hacks.
  • Late 1999: Kilnam Chon initiates Task Force on IDNS which led to formation of MINC, the Multilingual Internet Names Consortium. http://www.minc.org/oldminc/old/meetings/
  • 01/2000: IETF IDN Working Group formed, chaired by James Seng and Marc Blanchet
  • 01/2000: The second-ever commercial IDN launch: IDN TLDs in the Tamil language, corresponding to .com, .net, .org and .edu, launched in India by i-DNS.net International with IT Ministry support. These required the use of either a plug-in or special DNS hacks.
  • 02/2000: Multilingual Internet Names Consortium (MINC) Proposal BoF at IETF Adelaide. http://www.minc.org/oldminc/old//meetings/minc_20000327.html
  • 03/2000: APRICOT 2000 Multilingual DNS session http://www.apricot.net/apricot2000/index2.html
  • 04/2000: WALID Inc. (with IDNA patent pending application 6182148) started Registration & Resolving Multilingual Domain Names.
  • 05/2000: Interoperability Testing WG, MINC meeting. San Francisco, chaired by Bill Manning and Y.Yoneya 12 May 2000. http://www.minc.org/oldminc/old/meetings/sanfrancisco_20000512/testing_SFO.htm
  • 06/2000: Inaugural Launch of the Multilingual Internet Names Consortium (MINC) in Seoul http://www.minc.org to drive the collaborative roll-out of IDN starting from the Asia Pacific. http://www.minc.org/about/history/
  • 07/2000: Joint Engineering Team (JET) initiated in Yokohama to study technical issues, led by JPNIC (K. Konishi)
  • 07/2000: Official formation of CDNC (Chinese Domain Name Consortium), founded by CNNIC, TWNIC, HKNIC and MONIC in May 2000 to resolve issues related to, and to deploy, Han character domain names. http://www.cdnc.org/english/introduction/index.html http://www.cdnc.org/english/news/index.html
  • 03/2001: ICANN Board IDN Working Group formed
  • 07/2001: Japanese Domain Name Association (JDNA) Launch Ceremony (July 13, 2001) in Tokyo, Japan.
  • 07/2001: Urdu Internet Names System (July 28, 2001) in Islamabad, Pakistan, organised jointly by SDNP and MINC. http://urduworkshop.sdnpk.org
  • 07/2001: Presentation on IDN to the Committee Meeting of the Computer Science and Telecommunications Board, National Academies USA (July 11-13, 2001) at the University of California School of Information Management and Systems, Berkeley, CA. http://www.nap.edu/books/0309096405/html/390.html
  • 08/2001: MINC presentation and outreach at the Asia Pacific Advanced Network annual conference, Penang, Malaysia 20th August 2001
  • 10/2001: Joint MINC-CDNC Meeting in Beijing 18-20 October 2001
  • 11/2001: ICANN IDN Committee formed
  • 12/2001: Joint ITU-WIPO Symposium on Multilingual Domain Names organised in association with MINC, 6-7 Dec 2001, International Conference Center, Geneva.
  • 01/2003: Free implementation of StringPrep, Punycode and IDNA released in GNU Libidn.
  • 03/2003: Publication of RFC 3454, RFC 3490, RFC 3491 and RFC 3492
  • 06/2003: Publication of the ICANN IDN Guidelines for registries, adopted by the .cn, .info, .jp, .org and .tw registries.
  • 05/2004: Publication of RFC 3743, Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean
  • 03/2005: First Study Group 17 of ITU-T meeting on Internationalized Domain Names http://www.itu.int/md/T05-SG17-050330/sum/en
  • 05/2005: .IN ccTLD (India) creates expert IDN Working Group to create solutions for 22 official languages
  • 04/2006: ITU Study Group 17 meeting in Korea gave final approval to the Question on Internationalized Domain Names http://www.itu.int/ITU-T/newslog/Multilingual+Internet+Work+Progresses.aspx
  • 06/2006: Workshop on IDN at ICANN meeting at Marrakech, Morocco
  • 11/2006: ICANN GNSO IDN Working Group created to discuss policy implications of IDN TLDs. Ram Mohan elected Chair of the IDN Working Group.
  • 12/2006: ICANN meeting at São Paulo discusses status of lab tests of IDNs within the root.
  • 01/2007: Tamil and Malayalam variant table work completed by India's C-DAC and Afilias
  • 03/2007: ICANN GNSO IDN Working Group completes work, Ram Mohan presents report at ICANN Lisboa meeting. http://gnso.icann.org/correspondence/gnso-idn-wg-outcomes-ram.pdf
  • 10/2007: Eleven IDNA top-level domains were added to the root nameservers in order to evaluate the use of IDNA at the top level of the DNS.[2][3]

DNS registries known to have adopted IDNA

Non-IDNA or non-ICANN registries that support non-ASCII domain names

There are other registries that support non-ASCII domain names. For example, a Singapore company called I-DNS offers, through its own registrar network, generic domain name registrations in various languages, with the country codes at the end of the domain names also transcribed into the same characters as the rest of the name. The company ThaiURL.com in Thailand supports .com registrations via its own modified DNS, ThaiURL.

Because these companies, and other organizations that offer modified DNS systems, do not subject themselves to ICANN's control, they must be regarded as alternate DNS roots. Domains registered with them will therefore not be supported by most Internet Service Providers, and as a result most users will not be able to look up such domains without manually configuring their computers to use the alternate DNS.

At ICANN's December 2006 meeting in São Paulo, IDNs were discussed in depth. ICANN has since continued lab tests of IDNs within the root, with the aim of implementing true IDN top-level domains (IDN.IDN).

See also

References

4. GNSO IDN Working Group Outcomes Paper: [9]

External links

This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.