User:לערי ריינהארט/tests/bugzilla:1691

examples

  1. ro:Constantin Brâncusi
  2. ro:Constantin Brancuşi
  3. ro:Constantin Brâncuşi
  4. ro:Constantin Brâncuşi
  5. w:ro:Constantin Brâncusi
  6. w:ro:Constantin Brancuşi
  7. w:ro:Constantin Brâncuşi
  8. w:ro:Constantin Brâncuşi


  1. ro:Wikipedia:Caterogizare/Categorie Orase în Slovacia
  2. ro:Wikipedia:Caterogizare/Categorie Oraşe in Slovacia
  3. ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
  4. ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
  5. w:ro:Wikipedia:Caterogizare/Categorie Orase în Slovacia
  6. w:ro:Wikipedia:Caterogizare/Categorie Oraşe in Slovacia
  7. w:ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia
  8. w:ro:Wikipedia:Caterogizare/Categorie Oraşe în Slovacia

11:56, 2005 Feb 28 (UTC)

  • #1 - #8 work properly at de:, en:, sv: ... everywhere!

explanations

  • Here are different links. Please look at how each link is displayed and which title it targets.
  • If you look at this page and compare #3 and #4 you will not see any difference. The difference shows up only if you edit the page.
    • The difference is that #3 uses â or î directly, while #4 uses the &#nnnn; encoding for these characters too.
  • The examples are using three types of characters:
  1. 7 bit
  2. 8 bit
  3. UTF-8 characters
  • It is very strange that interlanguage links (and InterWiki w:... links, at en: only) accept either 8-bit characters or UTF-8 characters in the link; see links #1 and #2 (and #5 and #6).
  • If you click on link #3 the target will be something else.
  • Only link #4 works.
  • This behaviour is not transparent to users who insert interlanguage links by copy and paste. It is discriminatory to a lot of languages that use combined character types and should be considered a critical error. Users will not be aware that the common method #3 will fail, that the very technical method #4 is required, or that their interlanguage links will be removed sooner or later. Gangleri | Th | T 17:06, 2005 Feb 25 (UTC)
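
The three character types above can be illustrated with a minimal Python sketch. The function names `classify` and `mixes_types` are hypothetical and are not part of MediaWiki or pyWikipediaBot; the sketch only assumes that "8 bit" means code points 128 - 255 and that the third type means everything above 255.

```python
def classify(ch: str) -> str:
    """Classify one character into the three types listed above."""
    code = ord(ch)
    if code < 128:
        return "7 bit"
    if code < 256:
        return "8 bit"
    return "beyond 8 bit"


def mixes_types(title: str) -> bool:
    """True if the title contains both 8-bit and beyond-8-bit characters."""
    kinds = {classify(ch) for ch in title}
    return "8 bit" in kinds and "beyond 8 bit" in kinds


# Titles from examples #1, #2 and #3, written with escapes so the
# code points are unambiguous: \u00e2 is a-circumflex, \u015f is s-cedilla.
link_1 = "Constantin Br\u00e2ncusi"
link_2 = "Constantin Brancu\u015fi"
link_3 = "Constantin Br\u00e2ncu\u015fi"

if __name__ == "__main__":
    for number, title in enumerate((link_1, link_2, link_3), start=1):
        print(f"#{number}:", "mixed" if mixes_types(title) else "not mixed")
```

Under these assumptions only #3 combines both types, which matches the observation that #1 and #2 work while #3 does not.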

additional tests


  • #1, #2 and #4 work properly at sv:, but #3 does not
  • #1 - #4 work properly at de:
  • #5 - #8 will all fail because "w:" is used

see also

test links for pyWikipediaBot-users

  • Notes:
  • in order to document it here, "what you see as documentation" is coded differently from "what is coded in the links"; the usual method is used:
  1. &#nnnn; stands for &#nnnn;
  2. &#xnnnn; stands for &#xnnnn;
  3. % stands for % for %
    an alternative would be % stands for % for %


  • all links have been inserted with the copy and paste method
    if you make a preview you will see links
  1. changed to &#nnnn; encoding and
  2. containing characters in the range 128 - 255
  • you should know that they will fail
  • there are several "workarounds" to fix the links:
  1. using &#nnnn; encoding for all characters > 127
  2. using &#xnnnn; encoding for all characters > 127
  3. using hardcoded %nn for all characters > 127
  4. a mixture of the methods above
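
The first three workarounds can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the `encoding` parameter assumes that %nn encoding is applied to the bytes of the target wiki's character set (UTF-8 by default here).

```python
from urllib.parse import quote


def decimal_refs(title: str) -> str:
    """Workaround 1: &#nnnn; for every character > 127."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in title)


def hex_refs(title: str) -> str:
    """Workaround 2: &#xnnnn; for every character > 127."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):x};" for ch in title)


def percent_encoded(title: str, encoding: str = "utf-8") -> str:
    """Workaround 3: hardcoded %nn for every byte of a character > 127."""
    # ":" and "/" stay readable because they are part of prefixed titles.
    return quote(title, safe=":/", encoding=encoding)


# Hnúšťa, written with escapes so the code points are unambiguous.
title = "Hn\u00fa\u0161\u0165a"

if __name__ == "__main__":
    print(decimal_refs(title))     # Hn&#250;&#353;&#357;a
    print(hex_refs(title))         # Hn&#xfa;&#x161;&#x165;a
    print(percent_encoded(title))  # Hn%C3%BA%C5%A1%C5%A5a
```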

links to items from sk:Category:Slovenské mestá

  • important note:
  • Unicode offers multiple ways to go.
    • "optically" the following characters "seem" to be the same
      • uppercase letters
        • Š Š Š Š
        • Š Š Š Š
        • probably other more or less advanced Unicode or HTML coding
          Š Š (see alanwood.net)
      • lowercase letters
        • š š š š
        • š š š š
        • probably other more or less advanced Unicode or HTML coding
          š š (see alanwood.net)
    • because of "the exact match" for accessing titles with MediaWiki only one is allowed:
      • sk:Hnúšťa fails coded as [[:sk:Hnúšťa]]
      • fails also as [[:sk:Hnúšťa]] coded as [[:sk:Hnúšťa]]
      • works as sk:Hnúšťa coded as [[:sk:Hnúšťa]]


  • sk:Hnúšťa fails coded as [[:sk:Hnúšťa]]
    • fails also as [[:sk:Hnúšťa]] coded as [[:sk:Hnúšťa]]
    • works as sk:Hnúšťa coded as [[:sk:Hnúšťa]]
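
A short Python sketch of the point about equivalent codings. Which codings were used in the lists above cannot be recovered from the displayed page, so the variants below (decimal, hexadecimal and named references) are assumptions chosen for illustration; `decoded_forms` is a hypothetical helper.

```python
import html

# Assumed codings of the same letter: decimal, hexadecimal, named reference.
UPPER_VARIANTS = ["&#352;", "&#x160;", "&Scaron;"]
LOWER_VARIANTS = ["&#353;", "&#x161;", "&scaron;"]


def decoded_forms(variants):
    """Return the set of characters the coded variants resolve to."""
    return {html.unescape(v) for v in variants}


if __name__ == "__main__":
    # Each set has exactly one member: the codings differ, the character does not.
    print(len(decoded_forms(UPPER_VARIANTS)), len(decoded_forms(LOWER_VARIANTS)))
```

A link checker that compares titles only after decoding them this way would treat all of these codings as the same title, whereas an exact match on the coded text would not.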



things to discuss

  • it seems necessary to have an "alias" translation table for pywikipediabot; hopefully only one for Latin-1 type wikis and one for UTF-8 type wikis, and not one for every language

some links to de:

ú


š


š failures


from bugzilla:65#c17

  • Brion:
    • NEVER use &#154; or &#x9a; for s-caron. Numeric character references always refer to Unicode code points, and U+009A is a reserved control character, *not* s-caron. It might appear to work sometimes due to a fluke and crappy workarounds for compatibility with a Windows bug, but should definitely not be relied upon. Use the real Unicode number, &#353;. The same goes for the other characters in the Windows CP1252 extended range (see ISO 8859-1#Windows-1252).
    • For the moment the only named character references that will work in links are the ISO 8859-1 ones (s-caron does not appear in ISO 8859-1). Stick with the numbers for now.
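
Brion's point can be checked with Python's standard library. The sketch below is an illustration only; `fix_cp1252_ref` is a hypothetical helper, not something MediaWiki or pyWikipediaBot provides, and it assumes that a reference in the range 128 - 159 was meant as a Windows CP1252 byte.

```python
import unicodedata


def fix_cp1252_ref(number: int) -> str:
    """Rewrite a numeric reference in the CP1252 extended range (128 - 159)
    to the real Unicode number; leave every other number unchanged."""
    if 0x80 <= number <= 0x9F:
        try:
            number = ord(bytes([number]).decode("cp1252"))
        except UnicodeDecodeError:
            pass  # a few bytes in this range are undefined in CP1252
    return f"&#{number};"


# U+009A is a control character, U+0161 is the real s-caron.
control_category = unicodedata.category("\u009a")
s_caron_name = unicodedata.name("\u0161")

if __name__ == "__main__":
    print(control_category)     # Cc
    print(s_caron_name)         # LATIN SMALL LETTER S WITH CARON
    print(fix_cp1252_ref(154))  # &#353;
```

The same standard library confirms the second bullet: encoding s-caron as ISO 8859-1 (`"\u0161".encode("latin-1")`) raises `UnicodeEncodeError`, because the character is not part of that character set.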

MediaWiki, meta:PyWikipediaBot and en:Category:Diacritics

  • From the example above it can be seen that &<x>acute; is supported by MediaWiki and &<x>scaron; is not.
  • As you can see, scaron is used in the code and titles:
  1. en: Josef_Hir%26scaron%3Bal
  2. en: Edvard_Bene%26scaron%3B
    By the way: why is Edvard Beneš redirected to Edvard Benes? OK! If the en: community wants it this way, that's fine with me ;-)
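
To see what the two coded titles above contain, they can be decoded in two steps with Python's standard library. This is a minimal sketch; `decode_title` is a hypothetical name and says nothing about how MediaWiki itself resolves such titles.

```python
import html
from urllib.parse import unquote


def decode_title(coded: str) -> str:
    """Undo the %nn encoding first, then resolve character references."""
    return html.unescape(unquote(coded))


coded_titles = ["Josef_Hir%26scaron%3Bal", "Edvard_Bene%26scaron%3B"]

if __name__ == "__main__":
    for coded in coded_titles:
        # The intermediate form shows the literal "&scaron;" in the title.
        print(coded, "->", unquote(coded))
```

The intermediate step yields `Josef_Hir&scaron;al` and `Edvard_Bene&scaron;`, so the named reference itself is part of the stored title text.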