Talk:Nameprep

This article is rated Stub-class on Wikipedia's content assessment scale.
It is of interest to the following WikiProjects:

Computing

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.
This article has been automatically rated by a bot or other tool as Stub-class because it uses a stub template. Please ensure the assessment is correct before removing the `|auto=` parameter.

Nameprep does NOT map lookalikes

Nameprep does NOT map similar-looking characters. From the RFC:

  Because it is impossible to map similar-looking characters
  without a great deal of context such as knowing the fonts used,
  stringprep does nothing to map similar-looking characters together
  nor to prohibit some characters because they look like others.

— Preceding unsigned comment added by 131.107.0.101 (talk • contribs)
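The RFC behaviour quoted above can be checked with a minimal sketch, assuming CPython, whose `encodings.idna` module implements IDNA 2003 and exposes a module-level `nameprep()` function (an implementation detail rather than a documented public API). The mapping step case-folds, and NFKC folds fullwidth compatibility forms onto ASCII, but a Cyrillic а (U+0430) that looks like a Latin a passes through unchanged. `LATIN`, `CYRILLIC_A` and `FULLWIDTH` are illustrative inputs only.

```python
# Sketch: nameprep (the RFC 3491 stringprep profile) does not fold lookalikes.
# Assumes CPython's encodings.idna module, which provides nameprep().
import unicodedata
from encodings.idna import nameprep

LATIN = "paypal"
# U+0430 CYRILLIC SMALL LETTER A substituted for the Latin 'a'
CYRILLIC_A = "p\u0430ypal"
# Fullwidth forms U+FF50, U+FF41, ... (Unicode compatibility variants of ASCII)
FULLWIDTH = "\uff50\uff41\uff59\uff50\uff41\uff4c"

# The mapping step case-folds, so mixed case collapses to lowercase.
print(nameprep("PayPal"))             # paypal

# NFKC folds fullwidth compatibility characters onto plain ASCII.
print(nameprep(FULLWIDTH) == LATIN)   # True

# The Cyrillic letter is a different character, not a compatibility
# variant, so it survives nameprep unchanged.
prepared = nameprep(CYRILLIC_A)
print(prepared == LATIN)              # False
print(unicodedata.name(prepared[1]))  # CYRILLIC SMALL LETTER A
```

Fullwidth letters are folded because Unicode itself defines them as compatibility variants; the Cyrillic letter is a distinct character that merely resembles the Latin one in many fonts, which is exactly the case the RFC leaves alone.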

Lookalike characters example

What exactly is that example?

it has potentially grave implications for security if not considered by the designers and administrators of systems based on nameprep (the best known example of this being VeriSign's handling of IDNA names in .com and .net)

Readers (including me) are not necessarily familiar with VeriSign or its practices. Can somebody clarify or provide a link? 183.76.109.24 (talk) 05:13, 1 March 2012 (UTC)

lookalike characters

"It does not map lookalike characters to a single character"

What would be "lookalike" in a Unicode string of octets? Classical Kanji variants not found in Unicode 6.0? Are these expected to be used in the "label" of a subdomain or as an approved domain name?

Are we saying that written domain names are submitted in physical script on paper? Does this 'lookalike' problem arise for digital submissions in Unicode? In a UTF-8-encoded email submission to some DNS authority?

cf: http://en.wikipedia.org/wiki/Domain_Name_System#Internationalized_domain_names

cf: Internationalized_domain_name

See the first paragraph of Internationalized_domain_name#ToASCII_and_ToUnicode for clarification.

"Stringprep" should point to this article, if it is retained. It should be merged.

The "issue" here may be the one covered by the article IDN_homograph_attack, to which I will add a "See also" link.

G. Robert Shiplett 11:45, 28 March 2012 (UTC)
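On the question of what "lookalike" means for a string of octets: the two names below differ in code points and therefore in UTF-8 octets, and IDNA's ToASCII turns only the second into an xn-- label, so the DNS sees two unrelated names. The resemblance exists only once the strings are rendered in a font. A minimal sketch, assuming CPython's built-in `idna` codec (IDNA 2003); `LATIN` and `SPOOF` are illustrative names.

```python
# Sketch: a "lookalike" label is a different code point sequence, hence
# different UTF-8 octets and a different ACE (xn--) label in the DNS.
# Assumes CPython's built-in "idna" codec (IDNA 2003 ToASCII).
LATIN = "paypal"
SPOOF = "p\u0430ypal"  # U+0430 CYRILLIC SMALL LETTER A in second position

latin_octets = LATIN.encode("utf-8")
spoof_octets = SPOOF.encode("utf-8")
print(latin_octets)  # b'paypal'
print(spoof_octets)  # b'p\xd0\xb0ypal'

# ToASCII leaves an all-ASCII label alone but Punycode-encodes the other,
# so the two strings become two distinct names.
latin_ace = LATIN.encode("idna")
spoof_ace = SPOOF.encode("idna")
print(latin_ace)                      # b'paypal'
print(spoof_ace.startswith(b"xn--"))  # True
print(latin_ace == spoof_ace)         # False
```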

UNICODE versus glyphs

"There are good reasons for this, such as the fact that fonts differ in which characters are lookalikes, and the fact that any decision on which character to map to will obviously provide a bias towards users of one script, but it has potentially grave implications for security if not considered by the designers and administrators of systems based on nameprep"

This paragraph seems to defeat all of the careful effort that has gone into the articles on Unicode, its planes, code points, and much else on post-Unicode 2.0 character encoding as something distinct from fonts and glyphs.

Perhaps the article should be merged simply to remove this unintentionally misleading paragraph. A user who reads it believing that post-2.0 Unicode is just another code page for character encoding will get entirely the wrong impression. I will start with a "See also" link to Unicode.

G. Robert Shiplett 12:22, 28 March 2012 (UTC)

Explanatory note unclear

This edit introduced an item that reads:

IDN homograph attack or "lookalike" character spoofing based on a URL's appearance as read by a web user or as entered by a web user ( read in a page font, entered in the user's font of choice.) Note: this is not URI ambiguity in encoding. Examples are provided in both of the above articles.

The final sentence, "Examples are provided in both of the above articles", is relatively unclear. Exactly which articles does "both of the above" refer to? International Components for Unicode and Internationalized domain name, the two preceding items in the bulleted list? Or Internationalized domain name and IDN homograph attack, i.e. just one preceding item and then the article referenced in the present item itself? As of this writing, International Components for Unicode contains no such examples, and it contained none when that explanatory note was added.

It is perhaps generally risky to back-reference things mentioned above in Wikipedia, because the article or list in question might change, and less careful editors might not update the back-reference accordingly. When I encountered this back-reference, I dug into the article history expecting that this had happened here, but apparently the back-reference was unclear from the start.

Also, why is it necessary to attack the "URI ambiguity in encoding" straw man at all? Okay, so it's not that. Who said it was?

Finally, as to this part: "as read by a web user or as entered by a web user ( read in a page font, entered in the user's font of choice.)" – why is it necessary to specify that, and to then parenthetically further specify the specification?

—ReadOnlyAccount (talk) 04:08, 13 October 2023 (UTC)