User talk:Spitzak

From Wikipedia, the free encyclopedia
Hello Spitzak, and welcome to Wikipedia!
(If you find this note to be too bulky, feel free to remove it whenever you want)

Thank you for your contributions, we are pleased to have you here and I hope you stay. I recommend you have a quick look through the Five Pillars of Wikipedia - this page gives a brief summary of what we value here, and if you want some tutorial on how to edit have a glance at Wikipedia:Welcome. Below is a collection of some other pages that you may find helpful, feel free to read them at your leisure if and when you want. (But of course you don't have to read any of that to contribute!)

If you need help with anything, please feel free to contact me on my talk page. Or alternatively type {{helpme}} here and a user will help you as soon as possible. But remember to sign all your posts on talk/discussion pages with ~~~~, this helps us keep track of who's saying what and when in discussions.

Once again Welcome to Wikipedia, and happy editing! 19:45, 17 October 2006 (UTC)
Getting started
Getting your info out there
Getting more Wikipedia rules
Getting help
Getting along
Getting technical

Image without license

Unspecified source/license for Image:Utf8diagram.png


Thanks for uploading Image:Utf8diagram.png. The image has been identified as not specifying the copyright status of the image, which is required by Wikipedia's policy on images. Even if you created the image yourself, you still need to release it so Wikipedia can use it. If you don't indicate the copyright status of the image on the image's description page, using an appropriate copyright tag, it may be deleted some time in the next seven days. If you made this image yourself, you can use copyright tags like {{PD-self}} (to release all rights), {{self|CC-by-sa-3.0|GFDL}} (to require that you be credited), or any tag here - just go to the image, click edit, and add one of those. If you have uploaded other images, please verify that you have provided copyright information for them as well.

For more information on using images, see the following pages:

This is an automated notice by MifterBot. For assistance on the image use policy, see Wikipedia:Media copyright questions. NOTE: once you correct this, please remove the tag from the image's page. --MifterBot (Talk | Contribs | Owner) 00:29, 5 September 2008 (UTC)

File permission problem with File:Nuke screenshot.png


Thanks for uploading File:Nuke screenshot.png. I noticed that while you provided a valid copyright licensing tag, there is no proof that the creator of the file agreed to license it under the given license.

If you created this media entirely yourself but have previously published it elsewhere (especially online), please either

  • make a note permitting reuse under the CC-BY-SA or another acceptable free license (see this list) at the site of the original publication; or
  • Send an email from an address associated with the original publication to, stating your ownership of the material and your intention to publish it under a free license. You can find a sample permission letter here.

If you did not create it entirely yourself, please ask the person who created the file to take one of the two steps listed above, or if the owner of the file has already given their permission to you via email, please forward that email to

If you believe the media meets the criteria at Wikipedia:Non-free content, use a tag such as {{non-free fair use in|article name}} or one of the other tags listed at Wikipedia:Image copyright tags#Fair use, and add a rationale justifying the file's use on the article or articles where it is included. See Wikipedia:Image copyright tags for the full list of copyright tags that you can use.

If you have uploaded other files, consider checking that you have provided evidence that their copyright owners have agreed to license their works under the tags you supplied, too. You can find a list of files you have uploaded by following this link. Files lacking evidence of permission may be deleted one week after they have been tagged, as described on criteria for speedy deletion. If you have any questions please ask them at the Media copyright questions page. Thank you. MBisanz talk 02:11, 22 July 2009 (UTC)Reply[reply]

MS-DOS 1.0

It used the FAT-12 filesystem on 160 KB single-sided 8-sector 5¼-inch floppies. It was extremely primitive in some respects, yet still a great advance over commonly-used CP/M filesystems, since the exact file length, file modification date and time, etc. were recorded. Subdirectories were added in DOS 2.0, yet the DOS 1 directory entry format remained unchanged until the introduction of LFNs in Windows 95... AnonMoos (talk) 12:29, 2 December 2009 (UTC)


Hi, I reverted your deletions in UTF-16, see edit summary. Probably you have a point in some deletions, but I did not see that in the whole. btw for my understanding, the thing "word" (as a bitlength unit) is not used in Unicode, so that makes it hard to understand for me. -DePiep (talk) 22:48, 21 June 2010 (UTC)

UTF-8's compactness

Hi, I noticed you removed my addition to UTF-8 explaining that UTF-8 is popular in part because it is more compact than UTF-16 and UTF-32. I don't understand why, though, because the current wording (which is the same, or very nearly the same, as what it said before I added this part) suggests that the compactness of UTF-8 for Western European languages is not a significant reason for its popularity, because it cites its ASCII compatibility as the only reason ("for this reason", to me, suggests no other possible reasons), which I have a bit of a hard time believing. You also said "lots of other rejected multibyte encodings are shorter", but I don't understand why that's relevant or even what these encodings are... - furrykef (Talk at me) 00:32, 14 October 2010 (UTC)Reply[reply]

I believe the reason for UTF-8 popularity is the ASCII compatibility, not compactness. An encoding that reused the ASCII bytes as part of larger characters would be more compact, and this is what most alternatives to UTF-8 did. (the other reason is that other multibyte encodings did not map all of Unicode or made the mapping hard to figure out). Comparing it to UTF-16 for size here does not make sense, as the reason it wins over UTF-16 is certainly compatibility, UTF-16 is incompatible with every single possible ASCII string!
I don't see why non-Unicode encodings are relevant here. When we talk about the popularity of UTF-8, I think one would generally assume "as opposed to other Unicode encodings", since, as you said, non-Unicode encodings generally don't cover the Unicode set.
In any case, I think my main issue here is that you seem insistent on citing compatibility as the only reason for UTF-8's popularity over UTF-16. Surely the size factors into it at least a little? If I were to store big heaps of Japanese text, for example, I would use UTF-16 (unless I thought there was a high probability that the files would need to be used with a program that only understands UTF-8). - furrykef (Talk at me) 03:48, 15 October 2010 (UTC)Reply[reply]
@Furrykef: I realize that this comment is extremely old, but for a very brief time I worked at the company that manages 2channel. Uncompressed, as necessary in many databases on such large websites, for performance reasons, SHIFT-JIS is actually more compact than UTF-8, and we preferred it for that reason. So you should use it for large amounts of Japanese text. The more you know... Psiĥedelisto (talk | contribs) please always ping! 22:31, 8 July 2020 (UTC)
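The size claims in this thread are easy to check for a given text. The following is a minimal sketch, assuming Python's built-in codecs; the two sample strings and the helper name encoded_sizes are invented for illustration, and real corpora (with markup, digits and spaces mixed in) will give different ratios.

```python
# Compare encoded sizes of an ASCII-only sample and a Japanese sample.
# Both sample strings are made up for this illustration.
samples = {
    "ascii": "Hello, world!",
    "japanese": "こんにちは世界",
}


def encoded_sizes(text):
    """Return the byte length of text in each encoding (no BOM is added)."""
    encodings = ("utf-8", "utf-16-le", "shift_jis")
    return {enc: len(text.encode(enc)) for enc in encodings}


for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For these samples, UTF-8 is the smallest or tied for ASCII, while for the kana and kanji sample UTF-16 and Shift JIS both use two bytes per character against three for UTF-8. None of this changes the compatibility argument above; it only puts numbers on the size argument.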

License tagging for File:Unicode 2400 Chrome Ubuntu.png

Thanks for uploading File:Unicode 2400 Chrome Ubuntu.png. You don't seem to have indicated the license status of the image. Wikipedia uses a set of image copyright tags to indicate this information; to add a tag to the image, select the appropriate tag from this list, click on this link, then click "Edit this page" and add the tag to the image's description. If there doesn't seem to be a suitable tag, the image is probably not appropriate for use on Wikipedia.

For help in choosing the correct tag, or for any other questions, leave a message on Wikipedia:Media copyright questions. Thank you for your cooperation. --ImageTaggingBot (talk) 07:05, 26 October 2010 (UTC)Reply[reply]

ARF CLI GUI etc

Please respond at: Talk:Abort, Retry, Fail?#Take two. —DragonHawk (talk|hist) 11:38, 24 March 2011 (UTC)Reply[reply]

Still waiting. Please respond. —DragonHawk (talk|hist) 02:15, 4 May 2011 (UTC)Reply[reply]


I blocked and cleaned up after that person who was trying to impersonate you. -- Gogo Dodo (talk) 05:50, 8 April 2011 (UTC)Reply[reply]

I don't understand.

Can you explain your checkin note in the NeWS article? You added "Actually pure PS could not no matter how much helped", but I can't understand what this means in the context of the edit. Maury Markowitz (talk) 11:07, 28 June 2011 (UTC)Reply[reply]

Actually I'm not sure. It may make sense, but it seemed to me that the added text was just useless filler that provided no information. NeWS itself "needs additional software" (ie the operating system, probably other stuff) in order to work, too. I'm guessing you are saying that DPS could be used for windows provided you use something else for creating the windows and handling all the i/o, but when you specify it that way it is a true statement for any library, that it could be *part* of a windowing system. You could argue that DPS is designed for output, but when you do that you have to define X11 as being part of it in which case you might as well claim they are integrated and thus DPS+X11 is capable "without additional software".
Ahhh. OK I do believe this still needs mentioning in the article, but I'll fix the context. Maury Markowitz (talk) 12:12, 29 June 2011 (UTC)Reply[reply]
Actually you're right, in the intro it's not needed at all. Maury Markowitz (talk) 12:14, 29 June 2011 (UTC)Reply[reply]

I don't understand why you deleted the information from the strcat_s section of the strcat page, so I am going to undo it. If you have any problem, please drop a message on my talk page and give your suggestions. Prasannjit Gondchawar (talk) 19:16, 1 October 2011 (UTC)

About deletion of the information from the section strcat

I don't understand why you deleted the information from the strcat_s section of the strcat page, so I am going to undo it. If you have any problem, please drop a message on my talk page and give your suggestions. Prasannjit Gondchawar (talk) 19:17, 1 October 2011 (UTC)

Disambiguation link notification for March 13

Hi. When you recently edited Code page 437, you added links pointing to the disambiguation pages !! and 1/4 (check to confirm | fix with Dab solver). Such links are almost always unintended, since a disambiguation page is merely a list of "Did you mean..." article titles. Read the FAQ • Join us at the DPL WikiProject.

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 10:58, 13 March 2012 (UTC)Reply[reply]

"Seek is O(1) in code units."[edit]

Can you give an algorithm demonstrating that? To find the nth character, do you not have to examine the preceding ones to determine that you indeed have the nth? (Of course, there are other issues here as well: the O(1) algorithm for char* is obviously at risk for buffer overruns, e.g., unless you have a solid upper bound, and can be fooled even then.) -- Elphion (talk) 16:27, 21 April 2012 (UTC)Reply[reply]

Oh, I see: I was misreading "code unit" as "character". But this is not interesting: no one is interested in seeking to the nth code unit unless you already have something like a LUT for the string converting Char(n) into CU(m) (or more broadly, a list of starting points that you are interested in -- beginning of paragraphs, etc. -- i.e., something you get by already having scanned the text). -- Elphion (talk) 16:32, 21 April 2012 (UTC)
I should be clearer: I am "on your side" here -- the argument that UTF8 or UTF16 strings can't be treated as arrays is a red herring, because strings shouldn't be treated as arrays until they have been thoroughly scanned. If one truly needs an array of the characters, it can be built during the scan. But the argument that seeking the nth CU is O(1) is irrelevant to this. -- Elphion (talk) 16:53, 21 April 2012 (UTC)Reply[reply]
The mistake you are making is thinking that there is a need to count "characters" at all. First of all the word "character" is poorly defined in Unicode (it depends on the interpreted normalization and quite a few code points may not be "characters") so it is impossible to count them except by string scanning, in any encoding. I suspect however you mean "Unicode code points" when you say "characters". Or perhaps "UTF-16 code units" (where Unicode code points greater than U+FFFF are 2 units). I hope you can see from even these few examples, where I am unsure what you intend, why talking about "characters" is a bad idea.
In any case this makes as much sense as saying there is a need to find the N'th word or letter 'x' or anything else in O(1) time. There is no need for this, and text processing is quite fast despite the inability to do searches in less than linear time. The problem is that you need to remember *offsets* into strings and it is desirable to turn an offset into a pointer to the character in O(1) time. The obvious solution that any programmer should think of is to use fixed-size units for this "offset", in fact it is such a no-brainer that it seems hard to believe anybody would ever think otherwise. However decades of indoctrination where every man page says "characters" when talking about offsets seems to have turned even experienced programmers into complete morons when they encounter UTF-8.Spitzak (talk) 01:57, 24 April 2012 (UTC)Reply[reply]
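The distinction argued over in this thread can be shown in a few lines. This is a minimal sketch, assuming valid UTF-8 input; the sample string and the helper name code_point_index_to_offset are made up for illustration. An offset remembered in code units (bytes here) is used directly, while finding "the n-th code point" needs a linear scan.

```python
text = "naïve café"            # invented sample with two non-ASCII code points
data = text.encode("utf-8")    # for UTF-8 the code units are the bytes

# A search hands back an offset measured in code units.
offset = data.find("café".encode("utf-8"))

# Using a remembered offset needs no scan of the preceding text:
# this is a direct index into the byte array.
first_unit = data[offset]


def code_point_index_to_offset(buf, n):
    """Linear scan: byte offset of the n-th code point (0-based) in valid UTF-8."""
    seen = 0
    for i, b in enumerate(buf):
        if b & 0xC0 != 0x80:   # not a continuation byte, so a code point starts here
            if seen == n:
                return i
            seen += 1
    raise IndexError(n)


print(offset, chr(first_unit), code_point_index_to_offset(data, 6))
```

The scan is only needed when something insists on a code point count; code that stores the byte offset returned by the search never pays for it.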

[1] Please refrain from personal attacks and from assuming bad faith, especially if there is no on-wiki evidence. Even if you allege to know something important about the real-life identity of that user, Wikipedia does not serve for spreading such rumours. Incnis Mrsi (talk) 06:04, 8 May 2012 (UTC)

Yes sorry, that was stupid. It was obviously a good-faith edit. Spitzak (talk) 22:59, 8 May 2012 (UTC)

UCS2 and UTF-16

Just curious: why is UTF-16 not an extension of UCS2? While it's true that the codepoints assigned to surrogate pairs are no longer available in UTF-16, those values had no character assignments in UCS2, so they did not lose their "original meaning". Are there other codepoints that were sacrificed in the transition? -- Elphion (talk) 21:09, 13 July 2012 (UTC)Reply[reply]
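For reference, the relationship between the two can be sketched as follows. This is a minimal sketch using the standard surrogate-pair arithmetic; the function name to_utf16_units is hypothetical, and it deliberately does not validate its input, so a value in the D800..DFFF range passes straight through as a single unit.

```python
def to_utf16_units(cp):
    """Return the UTF-16 code units (as integers) for one code point."""
    if cp < 0x10000:
        return [cp]                # BMP: one 16-bit unit, the same value UCS-2 uses
    v = cp - 0x10000               # 20 bits are left to split across two units
    high = 0xD800 | (v >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)     # bottom 10 bits go into the low surrogate
    return [high, low]


print([hex(u) for u in to_utf16_units(0x41)])
print([hex(u) for u in to_utf16_units(0x1F600)])
```

Every BMP value comes out unchanged, and only the D800..DFFF units take on a new role as halves of a pair.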


Hi -- not arguing the merits of your change, but pointing out that since the wording was being discussed on talk page, that's where you should have floated the change. Otherwise it's a quick descent into edit warring! -- Elphion (talk) 13:04, 18 September 2012 (UTC)Reply[reply]


You said

You really think "accustomed to ASCII" is why this was confusing? Really? Give me a break

Before EBCDIC and ASCII were developed, variants of BCD were the most common character codes, and they all have non-contiguous alphabets similar to EBCDIC, so EBCDIC probably wouldn't have been confusing then. I do think it was only after programmers became used to ASCII that anyone even gave it any thought. I've posed it as a question on the EBCDIC talk page. Peter Flass (talk) 22:46, 21 September 2013 (UTC)




Please don't remove sourced content. A "nak" is a female "yak" - it's in the yak article. Rklawton (talk) 03:13, 23 October 2013 (UTC)Reply[reply]

About the slashes

About Slash_(punctuation)#Encoding. The facts are clear (including the Unicode mislead), but I think we could get the prose better.

How about the section intro setup like: "Slashes are encoded in Unicode as ... and ...". But the Unicode naming is controversial/disputed.

(then the next paragraph says:) Typographically ... (zoom in on diffs).

One issue is, we should not push both definition and naming issue in one paragraph "encoding". What do you think? -DePiep (talk) 18:26, 17 April 2014 (UTC)Reply[reply]

UTF-8 and ASCII backward compatibility

Hello there! Regarding our recent back-and-forth on UTF-8 and ASCII backward compatibility, please allow me to explain.

In a few words, a text editor which is only ASCII-aware can be used to process 7-bit ASCII subset of the UTF-8 data only, for example, and everything else is pretty much garbled to the end-user using such a text editor. If we take an API client as another example, it can also understand 7-bit ASCII subset only; everything else is garbage to anything speaking only ASCII, and any changes performed outside the 7-bit ASCII subset are going to break UTF-8's multibyte characters.

Hope it makes sense, and of course, I'm more than open to discussing this further. — Dsimic (talk | contribs) 06:58, 22 April 2014 (UTC)Reply[reply]

"processing text" does NOT mean "text editor". What I meant is that code like this:
 printf("<some utf-8 here> %d\n", 10);
will work even if the C compiler and library do not know anything about UTF-8. The only requirement is that bytes with the high bit set are passed unchanged by the compiler, printf, and the output driver (it will help if the output is drawn on a terminal that understands UTF-8, but even if it is redirected to a file that is eventually displayed on a UTF-8 aware editor this works). This is true of virtually every language written to handle ISO-8859-1 or even the older IBM PC code pages.
Claiming the program must know something about the code point boundaries is as false as claiming it is impossible to process English text unless the program includes an English dictionary. — Preceding unsigned comment added by Spitzak (talk | contribs)
You're absolutely right with the above printf() example, and a bit later I'll change the wording in the article so it's covered. — Dsimic (talk | contribs) 15:23, 22 April 2014 (UTC)Reply[reply]
Just saw that you've already reverted my edits. Well, you can have it that way if you insist, though saying that ASCII-aware software "can process UTF-8 data as well" is somewhat misleading, as "processing" isn't well defined and can mean many things. — Dsimic (talk | contribs) 15:28, 22 April 2014 (UTC)Reply[reply]
Well, as always, things aren't that simple. For example, what if we had something like this, which is perfectly reasonable to expect from an ASCII-aware application handling some ISO-8859-1 text:
printf("<some utf-8 here> %s\n", "<some iso-8859-1 here>");
That's happily producing invalid UTF-8 output. Thoughts? — Dsimic (talk | contribs) 15:40, 22 April 2014 (UTC)Reply[reply]
That example will still produce correct UTF-8 both before and after the ISO-8859-1 text. Depending on what is reading the output you will either see error indicators inside that text, or it will be recognized as ISO-8859-1 and rendered correctly.Spitzak (talk) 18:38, 22 April 2014 (UTC)Reply[reply]
How can UTF-8 be recognized as ISO-8859-1? — Dsimic (talk | contribs) 18:46, 22 April 2014 (UTC)Reply[reply]
Because it won't be valid UTF-8 (unless it is ASCII or an extremely unlikely arrangement of two or three letters and symbols in a row). This error can be detected by looking at no more than 4 bytes in order to determine that the first byte must not be the start of a UTF-8 character. There is no need to find the "ends" of the ISO-8859-1 text; this is strictly a one-pass algorithm. The display can then do something with this byte, such as show an error indicator, or guess that it is in ISO-8859-1 (or CP1252). It can then continue interpreting with the next byte, which will allow this to repeat if there is a long sequence of non-UTF-8 in the text. This will resynchronize correctly on the next valid UTF-8 code point. Spitzak (talk) 19:13, 22 April 2014 (UTC)
That's all fine, but the point is that using ASCII-aware software provides forward compatibility with UTF-8 only in case 7-bit ASCII characters are used in combination with untouched multibyte UTF-8 characters. Why would those 128+ single-byte characters have to be ISO-8859-1 or CP1252? Why wouldn't they actually be Windows-1253, for example? Anything beyond 7-bit ASCII is a plain guessing game, if you agree. — Dsimic (talk | contribs) 19:33, 22 April 2014 (UTC)Reply[reply]
Sure, but I fail to see what a "UTF-8 aware" program could do that is any better if it is given a block of bytes that is not UTF-8. Yea it can throw an exception, but IMHO that is *worse*, not better. The trivial operation of copying a block of bytes that is IMPOSSIBLE to confuse with valid data is the correct behavior.Spitzak (talk) 23:50, 22 April 2014 (UTC)Reply[reply]
Well, I'd say that we're pretty much on the same page, and the whole confusion came from the vague definition of "processing". At the same time, the subset of bytes that can't be mistaken (in both directions) is the 7-bit ASCII. If it goes into 8-bit ASCII, backward/forward compatibility breaks due to design of UTF-8. Agreed? — Dsimic (talk | contribs) 02:59, 23 April 2014 (UTC)Reply[reply]
Yes, any program that assigns a meaning to an 8-bit byte will fail to handle UTF-8, mostly because it may change this byte to a different value. The biggest problems are programs that make NEL (0x85) and non-breaking space (0xA0) into whitespace characters. But there are a lot of programs, in particular both compiled and interpreted languages, that assign no meaning to any bytes between quotes in a string constant other than the quotes and backslash, which are ASCII. Spitzak (talk) 03:22, 23 April 2014 (UTC)
... and other than dollar signs and curly brackets – as that's the case for PHP, for example, which is still good. — Dsimic (talk | contribs) 03:42, 23 April 2014 (UTC)Reply[reply]
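The one-pass recovery described in this thread (try to read a UTF-8 sequence; if the leading byte cannot start one, treat that single byte as a legacy character and carry on with the next byte) can be sketched roughly as follows. This is a minimal sketch, not code from any program discussed here; the function name decode_with_fallback and the choice of CP1252 as the guess are assumptions, and as noted above any guess beyond 7-bit ASCII may be the wrong code page.

```python
def decode_with_fallback(data, fallback="cp1252"):
    """Decode bytes as UTF-8, one code point at a time.

    A byte that cannot start a valid UTF-8 sequence is decoded on its own
    with the fallback code page, and decoding resumes at the next byte.
    """
    out = []
    i = 0
    while i < len(data):
        # A UTF-8 sequence is at most 4 bytes, so 4 bytes of lookahead suffice.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                out.append(chunk.decode("utf-8"))
            except UnicodeDecodeError:
                continue
            i += len(chunk)
            break
        else:
            # No prefix was valid UTF-8: guess the legacy code page for one byte.
            out.append(data[i:i + 1].decode(fallback, errors="replace"))
            i += 1
    return "".join(out)


# Mimics the printf example: UTF-8 text with ISO-8859-1 bytes spliced in.
mixed = "<é> ".encode("utf-8") + "naïve".encode("iso-8859-1") + " €".encode("utf-8")
print(decode_with_fallback(mixed))
```

The UTF-8 text on either side of the spliced bytes comes through intact, which is the resynchronization property claimed above. A legacy byte pair that happens to form valid UTF-8 (for example 0xC3 0xA9) would be read as UTF-8, which is the "extremely unlikely arrangement" caveat.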

You are receiving a barnstar!

The Detail Barnstar (Der Detailorden)
For your FLTK change Polluks 12:10, 28 August 2014 (UTC)Reply[reply]

Reversion without explanation

When you revert a good-faith edit (as you did here), it's best to provide a reason for the reversion. Otherwise the editor whose edit you reverted can't learn from their mistake. In this case, your reversions seem pretty low-effort, given that you could have also fixed the other occurrences of the word "octet" since you feel it's so obviously obscure and unused. When reverting good-faith edits, I'd encourage you to improve the page if the edit was prompted by inconsistency (as in this case) or some other easily-fixed problem. Electricmuffin11 (talk) 07:28, 24 September 2014 (UTC)Reply[reply]

November 2014

{{subst:User:BracketBot/inform|diff=635280213|page=Control character|by= by modifying 1 "()"s|debug=(-1, 0, 0, 0)|list=yes|remaining=*[[newline|CR and LF]] used to separate lines of text. The code 127 ([[Delete character|DEL]])) is also a control character. [[Extended ASCII]] sets defined by [[ISO 8859]] added the codes 128

  • character|escape]], <code>ESC</code>, <code>[[\e]]</code> ([[GCC (software)|GCC]] only), <code>^[</code>).
  • normally. For example, the sequence of code 27, followed by the printable characters <nowiki>"[2;10H", would cause a DEC VT-102 terminal to move its [[cursor (computers)|</nowiki>
  • with 31, forcing bits 6 and 7 to zero. For example, pressing "control" and the letter "g" or "G" (code 103 in [[octal]] or 71 in [[decimal|base 10]], which is 01000111 in [[Binary numeral system|


Please, a little lighter on the hammer!

I realize your recent edits to the ANSI article are all in GF, but you deleted/reverted considerable useful formatting and ancillary information while in the process of deleting what you feel is "totally wrong". For instance, you deleted the statement that VT52 clones were made as part of that "totally wrong" edit, which I think you would agree is not "totally wrong". Please, a little lighter on the reverts! Maury Markowitz (talk) 17:23, 22 January 2015 (UTC)

You appear to be eligible to vote in the current Arbitration Committee election. The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to enact binding solutions for disputes between editors, primarily related to serious behavioural issues that the community has been unable to resolve. This includes the ability to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail. If you wish to participate, you are welcome to review the candidates' statements and submit your choices on the voting page. For the Election committee, MediaWiki message delivery (talk) 12:49, 23 November 2015 (UTC)Reply[reply]

April 2016

Hello, I'm BracketBot. I have automatically detected that your edit to String (computer science) may have broken the syntax by modifying 1 "()"s. If you have, don't worry: just edit the page again to fix it. If I misunderstood what happened, or if you have any questions, you can leave a message on my operator's talk page.

List of unpaired brackets remaining on the page:
  • a total order on Σ<sup>*</sup> called [[lexicographical order]]. For example, if Σ = {0, 1} and 0 < 1, then the lexicographical order on Σ<sup>*</sup> includes the relationships ε < 0 < 00 < 000 < ... < 0001 < 001 < 01 < 010 < 011 < 0110 < 01111 < 1 < 10 < 100 < 101 < 111 < 1111 < 11111 ... The lexicographical order is [[total order|total]] if the alphabetical order is, but isn'
  • htm |archivedate=August 15, 2015 }}</ref> since this was the string delimiter in its BASIC language).

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, BracketBot (talk) 22:40, 4 April 2016 (UTC)Reply[reply]

Yea, maybe you need to remove yourself from the ANSI page. You have reverted an improvement.

(Undid revision 737919284 by TiredTendencies (talk) "Replacement text is not better, seems to imply that you must type a zero"

That's because you must type a zero. ??? What is the confusion here? It not only implies you must hit numpad 0, but it suitably states such.

ALT+0 # # # where # # # are 3 numbers from the old DOS alt codes. To get the same alt code character, you must now add a zero in front of the original alt code. Not sure what the confusion here is. Was more re-placing 'citation needed' to proper spots either way. The reversion you completed has now made the CN marks in incorrectly suggestive places as to WHAT exactly needs a citation.

And yes, it seems to imply you must hit the numpad 0 (as I stated with my edit, specifically using the word 'numpad'). Why? Because YOU MUST HIT THE NUMPAD ZERO to access original DOS alt codes at their original alt code numbers! Perhaps it should not imply, perhaps it should simply STATE that you must do so. Because you must.

Might be time to take the ANSI page off your watchlist. Articles belong to nobody and it is clear from reading your talk page you have aggressively reverted this article multiple times for no real reason. You even have a 'give a reason for edit reversion in good faith' listing here. TiredTendencies (talk) 23:36, 5 September 2016 (UTC)Reply[reply]

No, you hit zero first to use the *NEW* codes. If there is no leading zero you get the old cp 437. Your text is implying the opposite.
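The difference between the two readings can be modelled in a few lines. This is a rough sketch of the behavior as described in the reply above, not of Windows itself; the function name alt_code is hypothetical, and the code pages are assumed to be CP437 for the old codes and CP1252 for the new ones (on a real system both depend on the configured locale).

```python
def alt_code(digits):
    """Model Alt+<digits> typed on the numpad, for values 0-255 only.

    No leading zero: the number is looked up in the old OEM code page
    (CP437 assumed). Leading zero: it is looked up in the Windows code
    page (CP1252 assumed).
    """
    codepage = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(codepage)


for keys in ("130", "0130", "233", "0233"):
    print("Alt+" + keys, "->", alt_code(keys))
```

In this model Alt+130 gives é, as in DOS, while Alt+0130 gives the CP1252 low quotation mark; é in the new numbering is Alt+0233.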

ArbCom Elections 2016: Voting now open!


Hello, Spitzak. Voting in the 2016 Arbitration Committee elections is open from Monday, 00:00, 21 November through Sunday, 23:59, 4 December to all unblocked users who have registered an account before Wednesday, 00:00, 28 October 2016 and have made at least 150 mainspace edits before Sunday, 00:00, 1 November 2016.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2016 election, please review the candidates' statements and submit your choices on the voting page. Mdann52 (talk) 22:08, 21 November 2016 (UTC)Reply[reply]


Unicode terminology - BOM vs. Signature, code unit, ...

For the term Unicode Signature, see, chapter 2.13 Special Characters ("Unicode Signature. An initial BOM may also serve as an implicit marker to identify a file as containing Unicode text. ") and the following tables: Table 23-6, Table 23-7

A code point is the abstract form of a character, irrespective of its encoding.

A specific method of encoding code points, irrespective of endianness, is an encoding form.

A code unit is the basic unit of encoding in a given encoding form: one or more code units encode a single code point: 1-4 units in UTF-8, 1-2 units in UTF-16 (if 2 units are needed, they're called a surrogate pair), ...

An encoding form with specific endianness is an encoding scheme.

Strictly speaking, BOM is the single code point (character) U+FEFF, irrespective of its encoding.

The byte sequence that results when you encode the BOM character using a specific encoding scheme is what is loosely also called a BOM. More accurately, such a byte sequence is an encoding of the BOM character.

However, given that these byte sequences are also used to identify encoding schemes to which the concept of byte order doesn't apply - such as UTF-8 and UTF-7, which use bytes as the code units - it is better to call these byte sequences Unicode signatures.

Tabledhote (talk)
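The terms above can be made concrete with a short sketch. It assumes Python's built-in codecs as stand-ins for the encoding schemes; the helper name code_units and the sample code point U+1F600 are chosen only for illustration.

```python
BOM = "\ufeff"   # the single code point U+FEFF, independent of any encoding

# The byte sequence each encoding scheme produces for that one code point.
schemes = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
signatures = {name: BOM.encode(name) for name in schemes}


def code_units(text, form):
    """Number of code units text occupies in the given encoding form."""
    unit_size = {"utf-8": 1, "utf-16": 2, "utf-32": 4}[form]
    # Pick an explicit byte order so Python does not prepend a BOM of its own.
    codec = form if form == "utf-8" else form + "-le"
    return len(text.encode(codec)) // unit_size


for name, sig in signatures.items():
    print(name, sig.hex(" "))
for form in ("utf-8", "utf-16", "utf-32"):
    print(form, code_units("\U0001F600", form))
```

The UTF-8 entry is the three-byte sequence EF BB BF, which carries no byte-order information at all, and that is the case for calling it a signature.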


So fixing the image might be a good idea; reverting to the previous version is not the best solution. DGerman (talk) 22:40, 3 May 2017 (UTC)

Regarding Revert of changes on UTF-8


Your comment is """Apparently not valid UCS-2, those code points are called "invalid" in *all* unicode forms)""", which is not correct. As you can see in , """UCS-2, UTF-8, and UTF-32 can encode these code points in trivial and obvious ways""", and in fact, *this* is the difference between UTF-16 and UCS-2: UTF-16 doesn't encode the surrogate range (<U+D800..U+DFFF>) but does encode non-BMP characters (<U+10000 to U+10FFFF>), whereas UCS-2 encodes the surrogate code points (<U+D800..U+DFFF>) but nothing outside the BMP. Behnam (talk) 22:15, 18 May 2017 (UTC)

Unicode has declared that the code points for surrogate halves are invalid. They are equally unable/able to be encoded in UCS-2 as UTF-16. It is trivial to insert these codes into *both* encodings, which means any "validity" is equal in both. If a high and low surrogate half happen to be next to each other, most of modern Windows will interpret that as a single Unicode code point, therefore the text is UTF-16, not UCS-2 (which would require Windows to interpret it as two invalid code points). Spitzak (talk) 23:18, 18 May 2017 (UTC)
Unicode Surrogate Pair characters are not "invalid code-points", they are "valid code-points" of type Surrogate, but, yes, they are not "Unicode Scalar Values". (See , and .) Anyways, the whole point of my change was to help readers understand why those code-unit/code-points are allowed in Windows and other systems in the first place (because they were based on UCS-2, before UTF-16 existed), where those values are valid code-points. Behnam (talk) 05:10, 19 May 2017 (UTC)
The reason they are allowed in Windows filenames is that it is VASTLY simpler to implement the file system that way. Not for any idea of whether a code point is "valid" in Unicode. Several valid code points like '/' are not allowed in the filenames (I believe, though possibly the Win32 api has a non-path api to name files which would allow these too). A huge problem currently is code that believes these code points will magically not happen because they are "invalid" or whatever, and saying that changing to UCS-2 versus any other encoding somehow changes these code points' validity is IMHO very harmful to getting the idiot savants who write things to stop doing such stupid actions.
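The behaviour discussed in this thread can be sketched in Python. This is only an illustration using Python's own codecs, not a model of how Windows handles filenames; the names `HIGH`, `LOW` and `units_to_bytes` are made up for the example.

```python
HIGH, LOW = 0xD83D, 0xDE00  # a high and a low surrogate code unit


def units_to_bytes(*units):
    """Pack 16-bit code units as little-endian bytes."""
    return b"".join(u.to_bytes(2, "little") for u in units)


# A high surrogate followed by a low surrogate decodes as ONE code point (UTF-16 view).
pair = units_to_bytes(HIGH, LOW).decode("utf-16-le")

# A lone surrogate is refused by the strict encoder...
lone = chr(HIGH)
try:
    lone.encode("utf-16-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...but the same 16-bit unit is trivially storable when the check is bypassed.
raw = lone.encode("utf-16-le", errors="surrogatepass")

print(f"pair -> U+{ord(pair):X}, strict_ok={strict_ok}, raw={raw.hex(' ')}")
```

The point of the sketch is that nothing in the 16-bit storage format prevents an unpaired surrogate from appearing; whether it is rejected is a decision made by the software reading or writing it.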

ArbCom 2017 election voter message

Hello, Spitzak. Voting in the 2017 Arbitration Committee elections is now open until 23.59 on Sunday, 10 December. All users who registered an account before Saturday, 28 October 2017, made at least 150 mainspace edits before Wednesday, 1 November 2017 and are not currently blocked are eligible to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2017 election, please review the candidates and submit your choices on the voting page. MediaWiki message delivery (talk) 18:42, 3 December 2017 (UTC)


Thanks for clearing that up. It was completely opaque before; I guessed one interpretation, trusting that someone would correct it if I guessed wrong! -- Elphion (talk) 06:01, 12 April 2018 (UTC)

Yes, chcp 65001 is a thing

(Moved to Talk:Unicode_in_Microsoft_Windows#Yes, chcp 65001 is a thing)

--Artoria2e5 contrib 16:29, 9 May 2018 (UTC)

Disambiguation link notification for June 5

Hi. Thank you for your recent edits. An automated process has detected that when you recently edited Meta key, you added a link pointing to the disambiguation page Super key (check to confirm | fix with Dab solver). Such links are usually incorrect, since a disambiguation page is merely a list of unrelated topics with similar titles. (Read the FAQ • Join us at the DPL WikiProject.)

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 09:48, 5 June 2018 (UTC)

HP 2000A

I undid your redo on the 2100 article that stated the 2000A was a dual-CPU machine. It was not. The dual-CPU configurations started with the 2000B; it would have been difficult to fit two machines into the A model due to its much larger core. Maury Markowitz (talk) 11:20, 10 July 2018 (UTC)

Was just restoring some old text, I have no idea. It sure sounded like there was one computer that ran BASIC, and another that did all the I/O to the terminals. Spitzak (talk) 18:31, 10 July 2018 (UTC)

ASCII in UTF-8

ASCII is listed separately normally like here: <0.1% — Preceding unsigned comment added by 2003:CA:A72D:2000:9B2:9B82:511:8F06 (talk) 21:16, 6 September 2018 (UTC)Reply[reply]

Ordinal indicator

Understood, now. Thanks! Code Page Guy (talk) 11:22, 19 September 2018 (UTC)

Stop your removal of information from the character set tables

Spitzak, some weeks ago you made bold changes to some character set / code page tables (including ASCII) which were reverted because there is no consensus for them. Some of your proposed changes would have made the tables basically useless, as has been explained to you several times (e.g. Talk:ASCII#New_smaller_table).

I now see that you have nevertheless mass-removed vital information from the tables and applied your changes to character set articles all over Wikipedia without consensus. You have thereby destroyed much of the utility value of these tables.

I firmly ask you to stop this disruptive behaviour immediately. It is extremely annoying to see wasted the precious time and energy of other editors who researched and added the info in the first place and will now have to try and restore the information again. --Matthiaspaul (talk) 14:50, 23 September 2018 (UTC)

You NEVER posted any comments! Do so! I had an example of this up for a month at Talk:ISO/IEC_8859-1, after you rejected the one that had the Unicode removed. I addressed every one of your objections, and in fact ADDED numerous change boxes to the tables (which your blind reversions have deleted!!!). In particular on ASCII I directly asked for an example of information that has been deleted, and you have not said anything. All I can think is that you think a number that is trivially derived by multiplying the table row by 16 and adding the column is "vital" information???
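For reference, the derivation mentioned above can be written out as a short Python sketch. It assumes the usual 16×16 code chart layout, where the row label is the high hex digit and the column label is the low hex digit; the function name `code_value` is illustrative.

```python
def code_value(row, col):
    """Decimal code value of the cell at (row, col) in a 16x16 code chart."""
    return row * 16 + col


# Row 4, column 1 of the ASCII chart holds "A".
value = code_value(0x4, 0x1)
print(value, hex(value), chr(value))

# The inverse: recover the chart position from a code value.
row, col = divmod(0xA3, 16)
print(row, col)
```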

I am going to have to revert the reversions you did where you destroyed the many fixes to the boxes, code entries, and numerical entries in the tables. Thanks a lot. You talk about "constructive" but you are doing the exact opposite.

I very much would appreciate you putting an example of destroyed information here, as I have been very careful to not delete any footnotes, multiple definitions, etc:

Thank you. Spitzak (talk) 19:54, 23 September 2018 (UTC)

I reverted a few but it is probably pointless as I expect you will revert them back. If you could try to preserve the work done to make the change boxes correct it would be nice, there were also a few glyph fixes (in particular in EBCDIC IBM puts slashes, not spaces, into the control character names). I also tried to make it consistent with a/b syntax for alternatives (a (b) was used in many cases but it is wider).

I have put on ISO-8859-1 my proposal for showing the decimal numbers, as you seem to think they are important. Possibly best to discuss this over there.

(edit-conflict) Spitzak, before you make mass-changes to the long established status quo, it is you who must find broad community consensus for them first, not me, who was (mostly) happy with the current presentation.
I already offered you my opinion at Talk:ASCII when the subject was brought up originally a few weeks back - but your proposed changes were so many and in some cases so extreme that I can't address all of them. With so many changes you seem to not like the tables at all - however, "as is" they are the long established standard for almost all character set tables in Wikipedia and the result of years of refinements.
If you would fix the few remaining shortcomings of the existing tables I would actually appreciate this, however, from what you wrote you obviously want something completely different than what the community has put together over the years.
You apparently want a small table optimized for smartphones, with as little information as possible, whereas I want them to be comprehensive and top-most reliable encyclopedic references useful for quick lookup (without mouse hovering) as well as for side-by-side comparisons or actual implementation work. The tables must provide all vital information at once, so that they could be printed out - tooltips or popup menus won't work.
You removed the decimal values from all the tables, which is unacceptable. The tables need to carry hex and decimal (and in some cases also octal) values to be useful.
I see that you are now mass-re-reverting me which means you have started to edit-war over changes for which you do not have community consensus. This is causing large-scale disruption of information which has been researched and put together over many years.
I hereby ask you to self-revert and find a community consensus first. If you continue to force your changes into the character set articles without a broad backing of the community, this will probably have to be escalated.
--Matthiaspaul (talk) 23:19, 23 September 2018 (UTC)
Regarding my reversions, I had to bring the character set tables back to the established status quo as soon as possible, because things become even more complicated to sort out and resync once other editors have edited the pages as well. Since you changed many things at once (and over so many articles) it was sometimes impossible to only roll back the disputed changes, in particular if there were newer edits already.
If you fixed actual bugs in the tables then please reapply them individually (and without the disputed changes).
Right now, I don't understand what you mean by "change boxes".
--Matthiaspaul (talk) 23:19, 23 September 2018 (UTC)
I mean boxes around characters that are different than some common reference character set, one of your primary requests for a replacement (see your comments in ASCII). These were edited in as I could run a diff with the saved text from the reference set.
The decimal numbers look like bloat to me and are trivial to calculate from the table location. In addition they are a continuous source of bad edits by people who think the decimal number and the hex number must be equal, and change one or the other to match. Use of decimal numbers to talk about code positions is almost non-existent in modern documentation as well. I really fail to see why you think this is important, and conversely why you don't see how distracting and ugly these numbers are. I would appreciate some examples where these numbers are actually used and the hex number is not also available (except for Alt codes which is why I left the numbers on DOS code pages).
I am not doing anything with smartphones, this is to compress it enough that a typical table fits on a desktop (they do fit on my 3K monitor but that still is not typical, and it would be really nice if *two* fit the 3K monitor).
I would appreciate a constructive comment, like alternative ways to display the information that does not distract from character identification. As you apparently do not look at samples in the talk pages, I have to resort to posting test edits in the pages so you will comment. My mistake was assuming your silence meant acceptance. Please take a look at ISO-8859-1 for another attempt to delete those hideous decimal numbers.

Spitzak (talk) 23:59, 23 September 2018 (UTC)

ISO-8859-1 has been reverted and re-edited with a few steps to a proposed new version. Please check it out. Also ISO-8859-2 and ISO-8859-3 have been reverted and the box and spelling changes applied but leaving the decimal numbers in. It is easier than I thought to remove the decimal numbers, they don't have to be deleted. That would have made reversion with saving the edits much easier. Spitzak (talk) 04:58, 25 September 2018 (UTC)

I will review your proposals when I have time for it again. Someone broke into the house, so I have other obligations right now.
If you find actual bugs, please reapply the fixes. Still, you should not remove the decimal values or make other potentially controversial changes, because there's no community consensus for this. If you do (as you unfortunately did already despite being told not to do it), this constitutes edit-warring, and it is not covered by WP:BRD, in particular not for mass-changes all over the place. That's why I asked you to show your good will and try to clear yourself of the status of being an edit-warrior by self-reverting those edits ASAP and only reapplying the actual bug-fixes, so that a constructive discussion without deadline can take place (ideally on the article talk page) before other changes can be possibly applied to the articles (but only if community consensus can be found for them). Thanks.
--Matthiaspaul (talk) 09:17, 26 September 2018 (UTC)

ARBIPA sanctions alert


This is a standard message to notify contributors about an administrative ruling in effect. It does not imply that there are any issues with your contributions to date.

You have recently shown interest in India, Pakistan, and Afghanistan. Due to past disruption in this topic area, a more stringent set of rules called discretionary sanctions is in effect: any administrator may impose sanctions on editors who do not strictly follow Wikipedia's policies, or any page-specific restrictions, when making edits related to the topic.

For additional information, please see the guidance on discretionary sanctions and the Arbitration Committee's decision here. If you have any questions, or any doubts regarding what edits are appropriate, you are welcome to discuss them with me or any other editor.

Kautilya3 (talk) 11:53, 23 October 2018 (UTC)

ArbCom 2018 election voter message

Hello, Spitzak. Voting in the 2018 Arbitration Committee elections is now open until 23.59 on Sunday, 3 December. All users who registered an account before Sunday, 28 October 2018, made at least 150 mainspace edits before Thursday, 1 November 2018 and are not currently blocked are eligible to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2018 election, please review the candidates and submit your choices on the voting page. MediaWiki message delivery (talk) 18:42, 19 November 2018 (UTC)

I've cut to new section(s) on this long running battle - just a little concerned that your comment might get lost from being stuck on the end of the old one? In fact I nearly moved your last remark down to the bottom, but of course this is (rightly) against the guidelines. Anyway, do make sure you have a look at the current situation at the bottom of the page. Xcalibur's rules, I think we agree, are totally unusable - I have substituted my own idea of what a set of rules might look like - including the rule itself (in simple, unambiguous terms) followed by notes and examples where necessary. But I really think I did a fair job in getting rid of the rules altogether and I still haven't seen a coherent argument to the contrary! --Soundofmusicals (talk) 03:59, 21 November 2018 (UTC)

EBCDIC again

for (c = 'A'; c <= 'Z'; ++c)

can be replaced with

char alfa[] = "ABCD...WXYZ"; for (i = 0; i <= 25; i++) <refer to alfa[i] instead of c>

No procedure call required. Not much of a change. Apologies for not remembering C very well, I haven't used it in years. Peter Flass (talk) 17:03, 21 November 2018 (UTC)

That requires keeping an extra variable that is unnecessary in ASCII and occupies 27 bytes, one of which is unused. This sort of stuff is exactly what programmers hated back in the day. Spitzak (talk) 19:04, 21 November 2018 (UTC)
:-{ Peter Flass (talk) 19:27, 21 November 2018 (UTC)
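The underlying issue in this thread is that the letters A to Z are contiguous in ASCII but not in EBCDIC, so a loop over the numeric range from 'A' to 'Z' also visits non-letter values. A minimal Python sketch follows, assuming code page 037 (Python's `cp037` codec) as a representative EBCDIC variant; the helper names are illustrative.

```python
import string

LETTERS = string.ascii_uppercase
ascii_codes = LETTERS.encode("ascii")
ebcdic_codes = LETTERS.encode("cp037")  # assumption: CP037 stands in for "EBCDIC"


def is_contiguous(codes):
    """True if each code is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


# Letter pairs where the EBCDIC code jumps by more than one.
gaps = [
    (LETTERS[i], LETTERS[i + 1])
    for i in range(len(LETTERS) - 1)
    if ebcdic_codes[i + 1] - ebcdic_codes[i] != 1
]

# How many values a loop from 'A' to 'Z' would visit in each encoding.
visited_ascii = ascii_codes[-1] - ascii_codes[0] + 1
visited_ebcdic = ebcdic_codes[-1] - ebcdic_codes[0] + 1

print(is_contiguous(ascii_codes), is_contiguous(ebcdic_codes))
print(gaps, visited_ascii, visited_ebcdic)
```

Under this assumption the range loop visits more values than there are letters, which is why the lookup-string workaround (or an equivalent test) is needed on EBCDIC systems.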

December 2018

You currently appear to be engaged in an edit war according to the reverts you have made on Arabic numerals; that means that you are repeatedly changing content back to how you think it should be, when you have seen that other editors disagree. Users are expected to collaborate with others, to avoid editing disruptively, and to try to reach a consensus, rather than repeatedly undoing other users' edits once it is known that there is a disagreement.

Points to note:

  1. Edit warring is disruptive regardless of how many reverts you have made;
  2. Do not edit war even if you believe you are right.

If you find yourself in an editing dispute, use the article's talk page to discuss controversial changes and work towards a version that represents consensus among editors. You can post a request for help at an appropriate noticeboard or seek dispute resolution. In some cases, it may be appropriate to request temporary page protection. If you engage in an edit war, you may be blocked from editing. Kautilya3 (talk) 15:04, 1 December 2018 (UTC)

Alt-keycodes insight

Thanks for your input on pound sign. Your comment (in the edit history) that "on some setups any number equal to 156+n*256 will work, so this is in fact redundant with the 156 example" is interesting, and possibly interesting enough to be included within the article itself, or maybe a more general article such as Windows Alt keycodes - as there are several websites suggesting Alt-6556, and this could be clarified. But can you provide any references to demonstrate that this +n*256 principle is true? And is there a way of determining when (i.e. in which combination of codepage/default locale/language_used_for_non-Unicode_programs/IME/language settings/typeface etc. etc.) this particular combination will produce the pound sign? In my own case (see e.g. Talk:Pound_sign#Explanation_for_Alt_keycode_6556?), I am frequently (but rather randomly: I cannot detect the pattern) unable to enter it via any combination of Alt-156/0163/6556 etc. As demonstrated at the talkpage, it seems to be connected with codepage 850, but as to why, that is still unclear too. Ozaru (talk) 12:32, 15 December 2018 (UTC)
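The arithmetic behind the "156+n*256" remark can be checked with a short Python sketch. It only models the claim that the typed number is reduced modulo 256 and looked up in an OEM code page; it does not reproduce actual Windows behaviour, which (as noted above) varies with settings. The function name `alt_code_char` and the choice of code pages 437 and 850 are assumptions made for illustration.

```python
def alt_code_char(number, codepage="cp437"):
    """Character for a typed Alt number, assuming it is taken modulo 256
    and interpreted as a single byte in the given code page."""
    return bytes([number % 256]).decode(codepage)


print(6556 % 256)                                # 6556 = 25 * 256 + 156
print(alt_code_char(156), alt_code_char(6556))   # same byte, same character
print(alt_code_char(156 + 3 * 256, "cp850"))     # any n gives the same result

# Alt+0163 (leading zero) refers to the Windows "ANSI" code page instead.
print(bytes([163]).decode("cp1252"))
```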

Created a safe wiki for Unicode subsets

Guy Macon is a digital version of a murderer. Safe information for Unicode subsets such as WGL4 is now found at . (talk) 11:24, 12 January 2019 (UTC)

XIX and XVIIII

Re your reversion of my edit to Roman numerals: as attested by all the examples cited in the article and in the talk pages, "XVIIII" has always been an acceptable way to write "19", from the Roman times (check Julius Caesar) through medieval times. Ditto for the other additive notations for digits 4 and 9, such as VIIII, XIIII, XXXX, LXXXX, etc. The subtractive notations "IV", "IX", "XL" have always been regarded as abbreviations of the additive ones, not as the only "right" way to write those numbers; much like we see "Dr." as merely an abbreviation of "Doctor", not as having superseded it. The Romans almost always used the subtractive notation for the same reason that we almost always write "Dr. Smith" instead of "Doctor Smith": because it saved 50% or more in strokes, time, and space. The only exception is IV, that saves only 25% on all three counts; and that is obviously why "IIII" is much more common than "VIIII" or "XXXX".
Note, for example, that the longest additive numeral from 1 to 12 is "VIIII", that takes the space of six "I" letters. If a clockmaker were to use additive notation for all numbers he would have to leave that much space for each numeral. If he uses "IX" for 9 instead, the longest numeral will be "VIII", that takes only five "I"-slots -- quite a bit easier to fit, and wasting much less space overall. But then using "IV" instead of "IIII" makes no difference, except saving a little bit of metal; and "IIII" looks nicer because it visually balances the "VIII" on the other side.
On the other hand, notations like "IIXX" and "IIX" (or "XIIX") are extremely rare, and their contexts indicate either linguistic induction (as in Romans saying "duodevigesimo" = "two-from-twentieth" for "18th", or "duoetvicensimo" = "two-and-twentieth" for "22nd") or simple scribal/stonecutter error. As for the former, it seems that the 18th and 22nd Roman Legions were often written as "XIIX LEGIO" and "IIXX LEGIO", respectively.
There is even an example where a stonecutter was commissioned to chisel "IIXX LEGIO" for "22nd Legion" but "autocorrected" that to "XVIII LEGIO" instead. (See the Talk page for the source.) So, while no one ever mocked Caesar for writing "XVIIII" and the like all over De Bello Gallico, we must conclude that at some time in the late Roman Age there was at least one adult human in the Roman Empire who felt that "IIXX" was definitely "wrong". (Just as we must conclude that there is at least one cow in Scotland that is black on the left side.)
All the best, --Jorge Stolfi (talk) 01:49, 3 May 2019 (UTC)
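The additive and subtractive notations discussed above, and the clock-face width argument, can be sketched in Python. The symbol tables, the function names, and the width model (an "I" counts as one slot, "V" and "X" as two) are assumptions chosen to match the counts given in the post, not historical rules.

```python
ADDITIVE = [(100, "C"), (50, "L"), (10, "X"), (5, "V"), (1, "I")]
SUBTRACTIVE = [(100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

WIDTH = {"I": 1, "V": 2, "X": 2}  # assumed widths, in "I"-slots


def to_roman(n, table):
    """Greedy conversion of n (1..399) using the given symbol table."""
    out = []
    for value, symbol in table:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def width(numeral):
    """Total width of a numeral in "I"-slots (symbols I, V, X only)."""
    return sum(WIDTH[ch] for ch in numeral)


print(to_roman(19, ADDITIVE), to_roman(19, SUBTRACTIVE))

# Clock faces: all-additive numerals versus additive with "IX" for 9.
additive_clock = [to_roman(n, ADDITIVE) for n in range(1, 13)]
mixed_clock = ["IX" if n == 9 else numeral
               for n, numeral in zip(range(1, 13), additive_clock)]
widest_additive = max(additive_clock, key=width)
widest_mixed = max(mixed_clock, key=width)
print(widest_additive, width(widest_additive), widest_mixed, width(widest_mixed))
```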

XVIIII is talked about in another paragraph as an alternative. I notice that paragraph does not say "instead of IXX" so there is precedent for not listing every possible alternative when describing each of them. Spitzak (talk) 17:47, 3 May 2019 (UTC)
The point is that, based on copious evidence from actual texts, Roman and Medieval authors would consider both "XIX" and "XVIIII" as valid alternative ways to write "19"; whereas they apparently regarded "IXX" as "wrong". So I was trying to say "instead of the commonly accepted forms". --Jorge Stolfi (talk) 22:45, 3 May 2019 (UTC)
I agree, it's just that the section that says "XVIIII is sometimes used instead of XIX" does not say "XVIIII is sometimes used instead of XIX or IIXX". Same should be true of this one. If there are N alternatives there is no reason to write N paragraphs, each saying "x is used instead of x1, x2, ...xN". Spitzak (talk) 23:15, 3 May 2019 (UTC)

CD versus CCCC

Just to pick a nit,

  • "CD" is normally written with 3 strokes with same total length as ~5 "I"s, and uses the space of 4 "I"s
  • "CCCC" is 4 strokes with total length ~8 "I"s, and uses the space of 8 "I"s.

So the savings for "CD" are 25% in stroke count (roughly a measure of writing effort/time), ~38% in total stroke length (a measure of chiseling work and metal requirement) and 50% in space. That is why I had not claimed a flat "50% savings" for it.
All the best, --Jorge Stolfi (talk) 07:48, 17 May 2019 (UTC)
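The three percentages quoted above follow directly from the stroke, length and space figures in the list. A small Python check, using only those figures (the function name `saving` is illustrative):

```python
def saving(short, long):
    """Fractional saving of the short form relative to the long form."""
    return 1 - short / long


# (CD, CCCC) figures as given above: stroke count, stroke length, space.
strokes = saving(3, 4)
length = saving(5, 8)
space = saving(4, 8)

print(f"strokes {strokes:.1%}, length {length:.1%}, space {space:.1%}")
```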

Yes, but this "50%" in stroke count is wrong for several of the other samples as well. I'm not sure where this "stroke count" stuff came from; in all examples I have seen, the one with the lower "stroke count" also has fewer characters. For instance you could say "stroke count" is why "IIV" is not used instead of "III", but "IIV" also has just as many characters. About the only interesting fact is that "IIII" is not twice as wide as "IV", which may explain why "IIII" is used more often than other alternatives. Spitzak (talk) 19:35, 17 May 2019 (UTC)

"Common patterns" in Roman numerals

Hi – in your latest edit to Roman numerals you changed:

"but these are arranged in a common pattern that remains constant for each power or "place""

to:

"but a common pattern is used for each of them".

I can live with this, to be honest – in spite of recent accusation of WP:OWN I do try to make a habit of only reverting well-meant edits when important information has been lost, or a misleading or frankly erroneous impression has been created.
On the other hand my original wording in this case was very carefully considered – in the light of persistent misunderstanding of what the paragraph was actually about (it is always safe to assume, I suspect, that the well-intentioned editor is less, rather than more liable to confusion than the average reader). In particular, I was a little apprehensive that my use of the phrase "common pattern" might be misunderstood, especially by a careless reader, or one with a less than perfect command of English.
Obviously, you understood perfectly exactly what I meant, and were able to put it more succinctly, but I do wonder if on reflection you might conclude that the original, while a few words longer, has the advantage of clarity?
Leave this one with you, anyway... ---Soundofmusicals (talk) 01:26, 22 May 2019 (UTC)

Undo to UTF-16

Hi there! Regarding your edit here, the RFC says "SHOULD", which in the language of RFCs explicitly means a recommendation. In section 1.2 of the mentioned RFC, it says 'The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119', and RFC 2119 says '3. SHOULD: This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.' Do you disagree that the "plain English" meaning is closer to recommend? DimeCadmium (talk) 06:32, 30 June 2019 (UTC)

re ®

For Registered trademark symbol: Please please go to the talkpage. You know how we do things at enwiki. -DePiep (talk) 22:28, 17 July 2019 (UTC)

Copyright symbol

Do you have a cite for the "many fonts draw the copyright symbol as a superscript" you recently added in "Copyright symbol"? I've been doing a clean-up and that's the last unsourced bit that needs attention. I attempted to ping you via edit summary, but I'm not sure that shows up as a notification. TJRC (talk) 14:55, 19 July 2019 (UTC)

No, I copied the information from the Enclosed R page; I also copied it to the registered trademark symbol article. It had no citation, though it does appear true in a few fonts such as the serif and sans serif ones in the browser. An explanation as to why Unicode did not reuse it, when they really like to reuse as much as possible, is needed. I deleted uncited and imho dubious text saying that circled C is used instead of copyright because the copyright symbol is missing from some fonts (any such font would be missing the circled C as well!) Spitzak (talk) 20:59, 19 July 2019 (UTC)
The enclosed alphanumerics article only references the entire Unicode standard when talking about the disunification. Spitzak (talk) 21:06, 19 July 2019 (UTC)
Thanks, I'll delete that bit then; it's not all that important to what the symbol is, anyway. Thanks for deleting the Korean bit; that's bothered me for a long time, and I never got around to trimming it. Most of the edit adding it was pretty much nonsense, and in retrospect, I should have cut the entire paragraph rather than just editing out the obvious falsities. TJRC (talk) 22:39, 19 July 2019 (UTC)

Pound sign

I have undone your reversion of reference to # in lead. Article is "pound sign", not "£" or pound sterling. For US readers, "pound sign" means "#". Per WP:LEAD, significant points in the body should be summarised in the lead. If you continue to disagree, please use talk:Pound sign to discuss. --John Maynard Friedman (talk) 11:55, 24 September 2019 (UTC)

To say I am fed up with this article is an understatement - but last time I cut it from my watchlist the result was not good (!) In the meantime I have better things to do than field endless quibbles over precise wording. Current form of the article I can live with - so perhaps we could leave it there - at least until it gets attacked again. --Soundofmusicals (talk) 22:27, 18 November 2019 (UTC)

ArbCom 2019 election voter message

Hello! Voting in the 2019 Arbitration Committee elections is now open until 23:59 on Monday, 2 December 2019. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2019 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:04, 19 November 2019 (UTC)

Sidebar at ordinal indicator

For the currency signs, I was wp:BEBOLD and just commented out the punctuation sidebar. I felt strongly that it was just clutter in those articles and in some, like rouble sign, it had the horrible disruptive effect that you describe: compare the versions before and after I edited it today. Maybe you don't actually need it in ordinal indicator? --John Maynard Friedman (talk) 20:39, 20 November 2019 (UTC)

Google Code-In 2019 is coming - please mentor some documentation tasks!


Google Code-In, Google-organized contest in which the Wikimedia Foundation participates, starts in a few weeks. This contest is about taking high school students into the world of opensource. I'm sending you this message because you recently edited a documentation page at the English Wikipedia.

I would like to ask you to take part in Google Code-In as a mentor. That would mean to prepare at least one task (it can be documentation related, or something else - the other categories are Code, Design, Quality Assurance and Outreach) for the participants, and help the student to complete it. Please sign up at the contest page and send us your Google account address to, so we can invite you in!

From my own experience, Google Code-In can be fun, you can make several new friends, attract new people to your wiki and make them part of your community.

If you have any questions, please let us know at

Thank you!

--User:Martin Urbanec (talk) 21:58, 23 November 2019 (UTC)

Removal of *sidebars from punctuation pages

Hey, I noticed that you removed navboxes from (all?) pages on punctuation on December 11th. Just wondering why? It's been reversed on Full stop with no apparent attempt from you to counter it, so I'm unsure whether it's supposed to be that way or not. - Novelyst (talk) 10:45, 19 December 2019 (UTC)

I was replacing the "sidebar" with a new "navbox" at the bottom. The sidebar is far too big and interferes with images. I didn't revert because I wanted to see if there was consensus the navbox was better.Spitzak (talk) 15:54, 19 December 2019 (UTC)Reply[reply]
Ah, I understand. From what I can see, however, the navbox does not appear to display all punctuation (as did the sidebars) - I see no full stop - or the exact names adjacent. From a more objective standpoint, wasn't the sidebar better in this regard? - Novelyst (talk) 14:30, 20 December 2019 (UTC)Reply[reply]
Full stop is there after the fleuron. The design was copied from the Currency symbols navbox; you can see the names by looking at the popup tooltip or link preview. However it could be altered to have text instead, or more likely both text and symbols. I still feel it would be good to get rid of the sidebar: on several articles it is seriously interfering with legibility because it forces the images out of alignment with text. Also it seems that directions to other pages should be in navboxes.Spitzak (talk) 18:18, 20 December 2019 (UTC)

Broken bar

In your edit to vertical bar, you say that ⇧ Shift+\ produces a solid vertical bar even though the engraving has a broken bar. I suspect that this behaviour is OS dependent. So maybe best you qualify your caption with a note to say which OS? (I have UK extended, not US-int). Your call. --John Maynard Friedman (talk) 11:32, 16 February 2020 (UTC)Reply[reply]


If you are going to revert six edits in a row [2], [3], [4], [5], [6], [7], it would be appreciated if you would provide an explanation for at least one of them. --R'n'B (call me Russ) 00:39, 14 April 2020 (UTC)Reply[reply]

I was trying to fix the page it directed to. The page about straight apostrophe is completely unhelpful in explaining why this redirect is needed.Spitzak (talk) 03:30, 14 April 2020 (UTC)Reply[reply]
I'm not seeing anything there that could not be merged into Apostrophe#Typographic form. I would therefore disagree that Apostrophe is completely unhelpful. It is somewhat helpful and could be made more so. BD2412 T 04:08, 14 April 2020 (UTC)Reply[reply]
Obvious problems with ' (disambiguation) for describing ’:
  • The words "straight version" in the apostrophe
  • The words "straight version" in the single quotes
  • Implication it can be used for more than the closing quote.
  • It certainly is not a "modifier letter left half ring"
  • Wrong for Okina
  • Wrong for Saltillo
  • Wrong for stress
  • Wrong for prime, foot, minute, minute of arc

I agree the interesting information could be put in the Apostrophe section you recommend, but the redirect has to go there and it has to show both meanings (at least for fonts with both ‘ and ’ they intended the character to be used as a quotation mark so it has to show that). Spitzak (talk) 04:17, 14 April 2020 (UTC) PS: adding " (disambiguation)" to links that happen to go to disambiguation pages is counter-productive, as it means if anybody ever edits the disambiguation page into a subject it won't fix these links.Spitzak (talk) 04:22, 14 April 2020 (UTC)Reply[reply]

Yes, a section redirect is possible. With respect to adding "(disambiguation)" to intentional disambiguation links, this is required by WP:INTDABLINK to remove them from the queue of errors needing to be fixed. On the rare occasion that a disambiguation page is converted to a regular article, these are also routinely fixed by disambiguators. We have been at this a very long time. BD2412 T 17:16, 14 April 2020 (UTC)Reply[reply]

Talk:List of typographical symbols

Could you have a glance at Talk:List of typographical symbols to see if you agree with the direction I'm taking? --John Maynard Friedman (talk) 09:58, 25 April 2020 (UTC)Reply[reply]

Common interests/focuses in/on character sets and code pages

It seems that we have very similar focuses/interests (at least recently) on Wikipedia. If you want to propose any online article collaboration on character sets, code pages, or encodings with me, I will consider it. Hkbusfan (talk) 10:36, 26 April 2020 (UTC)Reply[reply]

If you are interested and have time, please see my addition to the talk page on Code page 437. Hkbusfan (talk) 11:13, 26 April 2020 (UTC)Reply[reply]

UTF-1 Revision

Hello. I undid your revision to the UTF-1 article. I've explained my reasoning for doing so on that article's talk page. Thanks. — Preceding unsigned comment added by Maschinengott (talkcontribs) 20:45, 29 April 2020 (UTC)Reply[reply]

Danish keymap and AltGr

The Danish keymap was always under the X-windows section, like the Swedish keymap. It was I who tagged as 'clarification needed, reason=which OS?' until today when I realised that it should have been obvious from the nesting that it is X-windows. So I have reverted your change since (guessing that you are not Danish) neither of us can guess the intentions of the editor who first put it under x-windows. If it bothers you, I know a Danish editor who might be able to clarify if you want? --John Maynard Friedman (talk) 23:09, 13 May 2020 (UTC)Reply[reply]

It appears to exactly match the description of key combinations in the Windows section, and I do know that Linux tends to have a lot more AltGr combinations. I suspect it is the Windows keymap, but it is also redundant as that is already described. So maybe it should just be deleted.Spitzak (talk) 04:38, 14 May 2020 (UTC)Reply[reply]
Yes, I shall delete it as uncited and unclear. --John Maynard Friedman (talk) 09:38, 14 May 2020 (UTC)Reply[reply]
Is it time to continue this discussion at [[template talk:char]]?
How easy would it be to add something like |size= big |style= bold ? But honestly it would be a lot easier if we could use the regular markup. --John Maynard Friedman (talk) 08:03, 19 May 2020 (UTC)

Highlighting small symbols using template:code

I have been using {{tl:code}} to highlight tiny symbols, especially any (like straight apostrophe) that could be confused as markup or just plain overlooked: it makes it clearer that this is the symbol being described, it is not just part of the text. So I wondered why you removed it from guillemet: is it just a subjective taste thing?

(BTW, this template is different from {{tl:mono}}, which I tried in another context and didn't find at all helpful. Compare ' with ' and just plain ' ).--John Maynard Friedman (talk) 09:38, 14 May 2020 (UTC)Reply[reply]

That template also uses a monospace font, so it is pretty bad for a lot of typographical characters, as they tend to shrink, compress, or expand them to be fixed-width. If there is another way to put the character in a box it might be useful.Spitzak (talk) 18:43, 14 May 2020 (UTC)Reply[reply]
Ah, ok, now I understand your edit note. Right now, though, I really believe that we need the highlight so I will revert per WP:BRD pending a better solution because the cure is worse than the disease for most readers. I do understand and sympathise with your point though, so will search for a functionally equivalent template that doesn't have this annoying side effect. --John Maynard Friedman (talk) 18:51, 14 May 2020 (UTC)Reply[reply]

This seems to work; I can't find any way to get the box other than the code command: <code>{{serif|fooiMMM}}</code> -> fooiMMM. I would get rid of all the parentheses, quotes, angle brackets, and other stuff people have used. Even in this example there are now unnecessary parentheses.

Yes, I agree. I was editing typewriter yesterday and they really are redundant. But I had to use {{code}} until we come up with something better.
I've been looking at Template:Navbox punctuation which has the symbols against a pale grey background. The actual formatting is done at Template:Navbox punctuation/set which maybe you could raid for inspiration? Is there anything in template:Semantic markup templates we could use (nothing leaps off the page at me). --John Maynard Friedman (talk) 10:39, 16 May 2020 (UTC)Reply[reply]
I think it might be ok to make template:char and make it do the code+serif hack shown here. Then pages can be edited to use it, and it can be changed later.Spitzak (talk) 18:42, 16 May 2020 (UTC)Reply[reply]
In principle, I agree but what makes me hesitate is not understanding the way wp uses CSS. I also don't know whether there are any usability issues if we override the readers font choice. (As indeed mono does already). But meantime I can't see anyone objecting to you building the char template and there is something definite for people to comment on. --John Maynard Friedman (talk) 20:17, 16 May 2020 (UTC)Reply[reply]
I don't think making a template is any worse than putting the code directly in the page, and the template makes it much easier to fix in the future if it is done wrong.
Another one to look at is template:keycap it draws a box and does not mess with the font.Spitzak (talk) 20:58, 16 May 2020 (UTC)Reply[reply]
A template is definitely the way to go, no question. I knew about {{keypress}} but not about keycap. The output seems similar: ° v °. The only concern I have about just using them 'as is' is the shadow effect that aims to give an "artist's impression" of a raised key. So it is 'just' a question of copying and hacking the code of keycap and we have our {{char}}. Can you do it?
[Sentence struck out because keycap redirects to keypress].
Another advantage of using a template is that maybe someone in the future could develop it to support an argument like "font=" or "style=" or even "size=" or "colour=". But let's get basic version done first! --John Maynard Friedman (talk) 17:03, 17 May 2020 (UTC)Reply[reply]
The Lua code that {{keypress}} uses is at Module:Key. It looks like the box-shadow code is near the top. But I have no idea whether the average editor can create a new Module:Char (the same as Key but without the shadow)? --John Maynard Friedman (talk) 17:15, 17 May 2020 (UTC)

Specification for template:char

Just so as to be clear, compare these

  • ° {{highlight|°}}
  • ° {{code|°}}
  • ° {{keypress|°}}

Do you agree that we don't want any kind of hard box around the character, and that {{code}} is closest? But I notice that on the mobile version, {{code}} has a very thinly drawn box. Even so, it is acceptable to my eye. --John Maynard Friedman (talk) 21:36, 17 May 2020 (UTC)

I like the code one the best. It would be nice if the extra space it adds was not there, though. Another place to look is the "buttons" at the bottom of this text editor, they look pretty good. Sorry I have no idea how to decipher the code behind the keypress template.Spitzak (talk) 16:08, 18 May 2020 (UTC)Reply[reply]
Whoa! I found out the lowest-level code by examining those buttons. Here it is:
<span style="border: 1px solid #ddd; background-color: #f9f9f9; padding: 1px 4px">°</span> -> °
Spitzak (talk) 16:24, 18 May 2020 (UTC)Reply[reply]
Fantastic! So do you want to give creation of {{char}} a try? --John Maynard Friedman (talk) 19:40, 18 May 2020 (UTC)
(BTW, maybe you already did this but I can confirm that it displays as expected on both desktop and mobile wikis). --John Maynard Friedman (talk) 19:53, 18 May 2020 (UTC)Reply[reply]
I made template:char and put a single usage on guillemet. Go for it, feel free to modify and add more uses! Spitzak (talk) 20:11, 18 May 2020 (UTC)
What a Brilliant Idea Barnstar
By Jove, I do believe she's got it! John Maynard Friedman (talk) 20:48, 18 May 2020 (UTC)Reply[reply]
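
For anyone reading this thread later: the inline-styled span quoted above can be produced by a tiny helper, which is roughly what a {{char}}-style template does with its argument. This is only a sketch in Python, not the template's actual wikitext; the function name `char_box` is made up for illustration, and the style values are copied unchanged from the snippet quoted in this thread.

```python
from html import escape

# Style values copied from the span quoted in this thread.
BOX_STYLE = "border: 1px solid #ddd; background-color: #f9f9f9; padding: 1px 4px"


def char_box(symbol: str) -> str:
    """Return an HTML span that draws a light box around a symbol.

    The name and signature are hypothetical; only the style string
    comes from the discussion above.
    """
    # Escape the symbol so characters such as '<' or '&' stay literal text.
    return f'<span style="{BOX_STYLE}">{escape(symbol)}</span>'


if __name__ == "__main__":
    print(char_box("°"))
```

Unlike the {{code}} approach, nothing here sets a font, so the symbol keeps the reader's own typeface.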

Some niggles to resolve

  1. In some articles (for example, Equals sign), the symbol is given in bold because that is the standard for redirect targets. '''{{char|°}}''' doesn't do that but '''{{code|°}}''' does. (° v °).
  2. In some articles, for example, Equals sign again, the symbol is given larger size (viz., '''{{Big|1==}}'''). Actually I don't think it is necessary in that particular case but there are some tiny glyphs that do need it. I don't understand why you would want to deprecate <big>{{char|°}}</big> but I see it works: ° --John Maynard Friedman (talk) 21:15, 18 May 2020 (UTC)Reply[reply]
Sorry, I've re-read the template doc and you do explain why you don't like it. But this behaviour is not unique to your template:
  • The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. ° The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. ° The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.
I think we just have to accept it. It really isn't that offensive. --John Maynard Friedman (talk) 21:58, 18 May 2020 (UTC)Reply[reply]
It looks like <big> does not change the line spacing. I was trying to set font-size in the <span>. So if big works, put it in the template. Bold might be a good idea too, but I think you got bold because <code> thought it was a parser token.Spitzak (talk) 00:16, 19 May 2020 (UTC)


I created a template:char/sandbox trying to make a bold on/off. It hasn't worked (see the test-cases page). I don't know of another template that has conditional styling like we need, that I could learn from. Do you? --John Maynard Friedman (talk) 10:24, 19 May 2020 (UTC)Reply[reply]

The other way round it of course is to create a template:charb that is the same as char but with bold on top? Maybe if someone in the future comes up with a Cunning Plan that resolves the issue, it will be easy to redirect. --John Maynard Friedman (talk) 15:01, 19 May 2020 (UTC)Reply[reply]
I don't see any easier way to show the user a choice other than just having them put the quotes around the character to make it bold. However, why not just make *all* the characters bold? I think the reason equals sign was bold was because of code's syntax highlighting.Spitzak (talk) 16:27, 19 May 2020 (UTC)
The 'usual' (!) way is to put three straight apostrophes on either side of the template call. (Interestingly, a question was raised at template talk:code querying why the template doc says it doesn't work when it does). '''{{char|@}}''' @ doesn't go bold and I was really surprised by {{char|'''@'''}} @ because I thought it would just produce a string like abc@xyz, but it neither does that nor go bold. Well, it wouldn't, the span syntax determines the presentation.
The bold = is because the article is about 'equal to' and if you search with an equals symbol (=) then you will be redirected to that article and 'house style' says that redirect targets must be bold. Which is precisely why I want {{charb}} or {{char|@|bold=}}. I'm warming to the idea of {{charb}}, there are certainly precedents for template variants for awkward cases and having it means we can get the changes underway. (I've already done Bracket). Unless you object, I will go ahead and create it.--John Maynard Friedman (talk) 19:11, 19 May 2020 (UTC)Reply[reply]
Triple quotes inside the template worked for me, but it is probably ok to make a template:charb.Spitzak (talk) 19:23, 19 May 2020 (UTC)Reply[reply]
Yes, I've just seen your sandbox tests and I'm beginning to doubt my sanity! I am convinced that it wasn't bolding, but there is no doubt now that it does, right there in the text I wrote above. I had better go lie down in a darkened room for a while. Sorry for wasting your time chasing fairies. Charb is dead, long live char! --John Maynard Friedman (talk) 19:56, 19 May 2020 (UTC)Reply[reply]

Markup updates

If you have time, would you review my edits to Tilde please? I am trying to find a balance between when to use {{char}} [highlighting] and when to use {{angbr}} [the grapheme notation]. We can do things on screen that can't be done in print media, but I don't want to push my luck with the traditionalists and get it all reverted. --John Maynard Friedman (talk) 15:04, 30 May 2020 (UTC)

Looks pretty good to me, I think the template is working. Not sure if there should be any rules to using angbr instead, I sort of feel using char all the time would work.Spitzak (talk) 18:57, 30 May 2020 (UTC)Reply[reply]
Angle bracket is the recognised notation for graphemes so if we overuse char, we may be challenged for OR or wp:IJUSTLIKEIT. So best not to push our luck, I think. And yes, the template is working, I've written some insanely nested uses of it without a single barf.--John Maynard Friedman (talk) 00:06, 31 May 2020 (UTC)Reply[reply]


Prime (symbol) updated, in case it is not on your watchlist. --John Maynard Friedman (talk) 15:21, 4 June 2020 (UTC)

  • Thousandth of an inch ("thou") is the standard unit of measure used in US (and historically, UK) precision machining. But maybe for a general readership, decimal inches is safer. I won't dispute further but others may. --John Maynard Friedman (talk) 22:54, 4 June 2020 (UTC)Reply[reply]

Quotation mark

Done, a lot of work! A review would be wise as no doubt I have missed some things and misinterpreted others. --John Maynard Friedman (talk) 15:33, 6 June 2020 (UTC)Reply[reply]

Someone doesn't like char

See template talk:char#I have doubts about this template. --John Maynard Friedman (talk) 23:29, 14 June 2020 (UTC)Reply[reply]

Nomination for deletion of Template:Char

Template:Char has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. Psiĥedelisto (talkcontribs) please always ping! 05:59, 6 July 2020 (UTC)

In case you missed it, see this diff. So it is no longer disputed that WP:STATUSQUO means (as it should have meant) that you can re-instate the template as it was, pending an MOS debate.
The concern about screen-readers remains a serious one so it is not ever so obvious what is the best thing to do. The template is useful to sighted and partly-sighted visitors but possibly counter-productive to blind visitors. The answer may boil down to tactics: if we had a convincing argument drafted and nearly ready to deliver for debate, then it would be less likely to be deleted again. The risk is that someone else will make an adversarial proposal in the meantime that is difficult to counter when we don't have a solution to this show-stopper. --John Maynard Friedman (talk) 09:50, 19 July 2020 (UTC)Reply[reply]
@John Maynard Friedman and Spitzak: Indeed, I won't revert an edit that restores it. If you restore it, though, you should have a MOS discussion ready to settle the consensus issue once and for my opinion. I'll also probably remove it from more places as I've done to section sign, pilcrow, and numero sign so far as I come across them, which after some thought and consideration of what VanIsaac wrote me on their talk page, actually seems like the correct play as even I think the template can be useful for small punctuation/combining marks...I guess what y'all need to do is figure out what this template's really for, what problem it solves, what problems it's not meant to solve. Best, Psiĥedelisto (talkcontribs) please always ping! 12:10, 19 July 2020 (UTC)

A barnstar for you!

The Civility Barnstar
I very much appreciate how willing you've been to see changes made to Template:Char, and how you've kept your cool despite how heated that discussion has gotten. This, along with User talk:John Maynard Friedman § Dedicated to you, is my attempt to lower the temperature a bit and reach out with a bit of an olive branch. Psiĥedelisto (talkcontribs) please always ping! 22:29, 8 July 2020 (UTC)Reply[reply]

Trip hazard warning

Anticipating potholes before going to talk:MOS might be wise. The only observation about {{char}} (as it was) that really stopped me in my tracks was the one by SMcCandlish: screen-readers for blind users don't cope well with span style. They gave me a succinct summary which will have to be taken on board.

Broadening their advice a bit, if the revised template uses standard html markup as much as possible, it should be safe to assume that screen readers have been programmed to deal with those, just not CSS or embedded styles. I've been doing some experimenting, so here they are to save you some work:

  • {{keypress}}: © ©
  • {{char}}: © ©
  • Original char: ©
  • {{code}}: © © is monospaced so a squeezed oval
  • {{samp}}: © the © is also monospaced
    with font var: © the [No effect so maybe I've not coded this correctly].
  • {{para}}: |© the= |©=

I'm willing to help in the background. --John Maynard Friedman (talk) 20:47, 17 July 2020 (UTC)Reply[reply]

I have just been reminded of WP:VPT, where maybe someone will have some suggestions or even solutions? --John Maynard Friedman (talk) 10:09, 18 July 2020 (UTC)Reply[reply]

Notice of Dispute resolution noticeboard discussion


This message is being sent to let you know of a discussion at the Wikipedia:Dispute resolution noticeboard regarding a content dispute discussion you may have participated in. Content disputes can hold up article development and make editing difficult for editors. You are not required to participate, but you are both invited and encouraged to help this dispute come to a resolution.

Please join us to help form a consensus. Thank you!

Xcalibur (talk) 07:39, 24 July 2020 (UTC)Reply[reply]

Multiplication sign

I guess that it would be a bit cheeky for me to propose your idea, so you are welcome to open the RFC if you prefer?

I assume what you had in mind was something like this:

What do you think? --John Maynard Friedman (talk) 21:58, 6 August 2020 (UTC)Reply[reply]

Looks good to me but I would make it look like this:
It is not only the Unicode Consortium who calls this a multiplication sign.Spitzak (talk) 22:22, 6 August 2020 (UTC)Reply[reply]
I suppose I was trying to find a way to justify WP attaching the name "multiplication sign" to that particular glyph. In Germany, it is not the multiplication sign (they use dot operator). I suppose we can cite wp:common name (in English). I was trying to learn from currency sign and currency symbol. You are probably right, your version is more likely to achieve consensus. Anyway, it would be civilised to wait until late August to propose it.
(By the way, could you try to use {{rto}} to me, rather than hope I will notice that a change to your talk page is relevant to me).--John Maynard Friedman (talk) 23:50, 6 August 2020 (UTC)Reply[reply]


Why did you remove this? It's not "unnecessary complexity" and it might not be in referenced sources, but it's a simple fact that naturally follows from previous explanations. Even the text says "If the number of significant bits is no more than seven, the first line applies; if no more than 11 bits, the second line applies, and so on." I just quantified it in the table, so readers like me who wanted to understand UTF-8 wouldn't have to count them manually. It seems to me as if you reverted just because I don't have a user page and therefore am less experienced. Eksekk (talk) 10:44, 11 August 2020 (UTC)

It is unnecessary and redundant (same as number of x's), not used to actually implement UTF-8 (comparing to max and min values is much easier), in the wrong column (it is an input so it should be on the left), has way too large a title, and mostly because it makes UTF-8 look more complicated than it is.Spitzak (talk) 16:37, 11 August 2020 (UTC)
  1. Using the same logic, you could remove number of bytes, as it's already shown more in depth by individual byte contents. No one is going to prefer counting x's themselves instead of having a flat out number.
    I have tried to remove the number of bytes but it got reverted, and it actually is that way in the reference.Spitzak (talk) 19:24, 11 August 2020 (UTC)Reply[reply]
  2. It's meant to enhance information, not replace it.
  3. I don't think so, and even if you insist, it could be moved instead of reverted.
  4. Then please think of better one, I tried my best. Maybe "bit count" and reference explaining it? Also a line break can be inserted to split it into two rows, probably the best choice.
    Older versions said "bits"Spitzak (talk) 19:24, 11 August 2020 (UTC)Reply[reply]
    That's too general. The best I can think of is "Significant bit count" split in two lines. Eksekk (talk) 19:34, 11 August 2020 (UTC)
  5. No it doesn't, it makes sense of all bytes beginning with 10s, which threw me off.
    Sorry I don't see how this clarifies anything at all for what the 10 means.Spitzak (talk) 19:24, 11 August 2020 (UTC)Reply[reply]
    Should have posted more clearly. I was confused why there are bytes beginning with 10 and how that encoding actually works, then noticed a thing about 7/11 bits and deduced that these x's must be replaced with actual content. I think direct statement in the table might be helpful for understanding. Eksekk (talk) 19:34, 11 August 2020 (UTC)Reply[reply]
Eksekk (talk) 18:59, 11 August 2020 (UTC)Reply[reply]
Letting you know that I posted on WP:3O. Eksekk (talk) 19:24, 11 August 2020 (UTC)Reply[reply]
Oh and by the way sorry for accusing you of bad faith, I've seen now that you have managed articles about encodings for a long time. Eksekk (talk) 19:35, 11 August 2020 (UTC)Reply[reply]
If you really think this is important, look in the history, there were shorter readable methods. This has been added and removed from the table many times. My best guess is that it be called "bits" and placed after the column for maximum number. Also you will need to figure out if it makes sense to add a "bits" column to the matching tables in the "history" section.Spitzak (talk) 19:38, 11 August 2020 (UTC)Reply[reply]
I stumbled across this at Wikipedia:Third opinion. FWIW, I think that in most contexts adding clearly redundant information makes interpretation more difficult, not easier. This may be due to two things: a cluttering effect and the implication by its inclusion that it is not redundant (the latter making the reader spend the effort of figuring out that it is only redundant information). The more subtle aspects of the encoding, such as whether longer encodings of a value are valid (i.e. the minimum number of "significant bits" for a given byte length), are answered concisely and clearly in the first three columns, but not by the removed addition. So: for improved clarity of presentation, I would support Spitzak's perspective. —Quondum 14:26, 14 August 2020 (UTC)
Thank you, that is exactly what I think but you explained it clearly. It is not clear to the user that information is redundant, so they waste time or are confused trying to figure out what additional information is being provided.Spitzak (talk) 16:22, 14 August 2020 (UTC)Reply[reply]
Would it be worth removing the request from the WP:3O page? It is still there, but I see a third opinion has already been provided. Maidyouneed (talk) 06:28, 19 August 2020 (UTC)
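
To make the two viewpoints in this thread concrete, here is a small sketch (not taken from the article or its sources) of both ways of choosing the UTF-8 length of a code point: comparing against the maximum value of each row, and counting significant bits. The function names are invented for illustration. The 7-bit and 11-bit limits are the ones quoted above; the 16-bit and 21-bit limits for the three- and four-byte rows are the standard continuation of that table.

```python
def utf8_len_by_range(cp: int) -> int:
    """Pick the byte count by comparing against each row's maximum value."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def utf8_len_by_bits(cp: int) -> int:
    """Pick the byte count from the number of significant bits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    bits = cp.bit_length()  # 0 for U+0000, which still needs one byte
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    return 4  # up to 21 bits


if __name__ == "__main__":
    # '$', pound sign, euro sign, and a supplementary-plane letter.
    for cp in (0x24, 0xA3, 0x20AC, 0x10348):
        encoded = chr(cp).encode("utf-8")
        print(f"U+{cp:04X}: {cp.bit_length()} bits, "
              f"{utf8_len_by_range(cp)} bytes, encoded as {encoded.hex(' ')}")
```

Both functions give the same answer for every code point, which is the sense in which the bit count repeats what the range columns already say.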

Numeric input

It is probably for the best if I don't reply at talk:Unicode input, it will only mess up the discussion thread.

Windows certainly does have a track record of using binary values that are in the 'reserved for control codes' range, notably the € sign and curly quotes. It absolutely did not use the correct Unicode code point (maybe it does now? I don't use Windows anymore). When a file with one or more of those code points was sent to a Mac or Linux user, the result was not printable: typically it was just ignored. Peter argues that it doesn't arise: the user will either use Alt+(the decimal equivalent of x22) for straight quotations or Alt+(the decimal equivalent of x201C) for left curly quote, autocorrect doesn't arise. So what happens when (with code page 1252) the user enters Alt+0128 to get a € sign? Will the resulting file contain x80 or x20AC? Why? (or why not?).

We know from our Japanese friend's experience at talk:Pound sign#Explanation for Alt keycode 6556?, Alt+0nnn does not generate the Unicode code point for nnn.

This is why I suggested that you guys first get clear on terminology, because Microsoft has been playing fast and loose with this stuff and you are standing on quicksand. --John Maynard Friedman (talk) 19:25, 22 September 2020 (UTC)
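
As a way of pinning down the terminology, the relationship between the code page 1252 byte and the Unicode code point can be checked with Python's standard codecs. This is only a sketch of the character mapping; it says nothing about what any particular Windows application actually writes to a file after Alt+0128, which is the open question above. The helper name is made up for illustration.

```python
import unicodedata


def cp1252_to_code_point(byte_value: int) -> int:
    """Map one code page 1252 byte to its Unicode code point (hypothetical helper)."""
    return ord(bytes([byte_value]).decode("cp1252"))


if __name__ == "__main__":
    # 128 decimal is 0x80: the euro sign in code page 1252.
    print(hex(cp1252_to_code_point(0x80)))       # code point of the euro sign
    print(hex(cp1252_to_code_point(0x93)))       # left curly double quote

    # The same character stored three ways.
    print("\u20ac".encode("cp1252").hex(" "))    # single byte
    print("\u20ac".encode("utf-8").hex(" "))     # three bytes
    print("\u20ac".encode("utf-16-le").hex(" ")) # two bytes

    # Read as ISO 8859-1 or as a raw code point, 0x80 is a C1 control code,
    # which is why such bytes were unprintable on other systems.
    print(unicodedata.category(b"\x80".decode("latin-1")))
```

So "x80" and "x20AC" are not rival answers at the same level: one is a byte in a legacy encoding, the other is the code point that byte stands for.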

Underscore, underline

I have in mind to do a wp:merge on underscore and underline since they seem to cover more or less the same thing. Before I go to the effort, can you see any redeeming features? I don't want to waste time on a pointless exercise. Do you think it needs a WP:MERGEPROP? Is it at all controversial? --John Maynard Friedman (talk) 18:55, 28 September 2020 (UTC)

I assume the result will be underline? This makes sense: the character on computer keyboards is based on a typewriter key whose purpose was to add underlines to already-typed text. The new article would talk about underlines first, then talk about the character, how it no longer works to underline text, but that many other uses were found for it.Spitzak (talk) 20:13, 28 September 2020 (UTC)
That might run into an ENGVAR issue? Is 'underscore' an American usage? --John Maynard Friedman (talk) 00:00, 29 September 2020 (UTC)Reply[reply]
I think "underscore" is the character "_". If a typewriter user overprints a lot of them on some other letters, the result is an "underline".Spitzak (talk) 01:36, 29 September 2020 (UTC)Reply[reply]
Sorry, unconvinced. One letter can be underlined. The UC calls it 'low line', so I can't even hide behind them. Wiktionary defines 'underscore' by reference to underline, but not conversely. I think it will have to be a MERGEPROP. If I do it. --John Maynard Friedman (talk) 07:47, 29 September 2020 (UTC)Reply[reply]

Google's Ngram viewer has underscore the clear winner: Books Ngram Viewer: underscore, underline, so wp:common name. --John Maynard Friedman (talk) 09:19, 29 September 2020 (UTC)Reply[reply]

You are right it is ok to call both the line under letters and the ASCII character "underscore", and if that is the more common term I would use it for the title.Spitzak (talk) 17:00, 29 September 2020 (UTC)Reply[reply]

Clarification of my 3270 edit

I updated the reference to the 3270 as a thin client because the original text claimed that web clients were thin clients and I thought that referring to X terminal would be less contentious than deleting it entirely. The references to not uploading scripts was an attempt to avoid an edit war. I'm perfectly happy with deleting the sentence as long as it stays deleted and does not get changed to the original.

Stay healthy. Shmuel (Seymour J.) Metz Username:Chatul (talk) 17:31, 29 September 2020 (UTC)Reply[reply]

Alt codes[edit]

Spizak, if you would think in terms of this article being for Mac and Unix users (who have little or no prior knowledge of MSDOS and Windows), it might make it more obvious why Peter and I are intent on clarifying everything. Your replies seem to assume we already know most of it and are being deliberately obtuse. We aren't. We are trying to tease out the details and I in particular am trying to dig out the key underlying principles that are so easy to lose if the article is written exclusively from the pov of an American user. When we have properly explained the experience of our Japanese friend, then we have succeeded. I do realise that MSDOS in particular never took any account of "international" users in its design, so trying to retrofit a logic that was never there in the first place is hard, but we really need to set it in a standards-based context because that is the only scaffolding that we may assume our readers already have.

So thank you for your help thus far; I hope you can accept having your patience tried a little longer. The article is already vastly improved from where it was, mostly because of your contributions. --John Maynard Friedman (talk) 22:00, 6 October 2020 (UTC)

You may have already noticed - but the "rules man" is at it again. I am SO fed up with this -which is probably all this troll really wants. Trying hard to be patient but what the [naughty word expunged]. I suspect you are even more fed up than me, but would appreciate your support in establishing a firm consensus here. --Soundofmusicals (talk) 22:24, 12 October 2020 (UTC)Reply[reply]

For anyone interested, I responded at User talk:Johnuniq#Roman numerals. Johnuniq (talk) 03:34, 13 October 2020 (UTC)Reply[reply]
I am sure you are up to date on this without further prompting from me - but we do need to keep this going! --Soundofmusicals (talk) 01:27, 20 October 2020 (UTC)Reply[reply]

I know this is a horrible bore (no one more fed up than me) but a quick little one line comment from you is probably all we need to get on top of this nonsense! -Soundofmusicals (talk) 21:39, 31 October 2020 (UTC)Reply[reply]

Sharp s

If you can spare a few minutes, I'd welcome you checking my edits to ß, please, as they were fairly extensive changes to the markup. See also talk:ß#Markup, where I've left an explanatory note. --John Maynard Friedman (talk) 18:54, 21 November 2020 (UTC)Reply[reply]

ArbCom 2020 Elections voter message

Hello! Voting in the 2020 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 7 December 2020. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2020 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 01:17, 24 November 2020 (UTC)Reply[reply]

Welcome to Wikipedia. It might not have been your intention, but you recently removed maintenance templates from Wikipedia. When removing maintenance templates, please be sure to either resolve the problem that the template refers to, or give a valid reason for the removal in the edit summary. Please see Help:Maintenance template removal for further information on when maintenance templates should or should not be removed. If this was a mistake, don't worry, as your removal of this template has been reverted. Take a look at the welcome page to learn more about contributing to this encyclopedia, and if you would like to experiment, please use your sandbox. Thank you. TEDickey (talk) 18:20, 16 December 2020 (UTC)

Private Use Plane redirect

Private Use Plane redirects to Plane (Unicode)#Private Use Area planes, not to Private Use Areas

Private use plane is the one that redirects to Private Use Areas.

Note the difference in capitalizations. The hatnote has to match the redirect exactly. Since it doesn't, the article is in Category:Articles with redirect hatnotes needing review. I would think that both should redirect to the same place. I have since changed Private use plane to go to the same place as Private Use Plane and removed that hatnote again. MB 16:13, 20 December 2020 (UTC)Reply[reply]


Thanks for cleaning up the sloppy citations. I intended to use wp:REFILL but real life intervened. I don't think we can use the url=, per WP:COPY, see talk:backslash. --John Maynard Friedman (talk) 21:38, 4 January 2021 (UTC)Reply[reply]


Would you review tilde, please. I have cleaned it up after a well-meaning editor tried to 'improve' it but their knowledge of typography is worse than mine, which takes some doing. I may have been too focused on the detail, would you check that it still hangs together as a whole? --John Maynard Friedman (talk) 17:50, 8 February 2021 (UTC)Reply[reply]

I saw that, probably a good start but even now it is enormously repetitive, describing dead keys over and over again. Also it is completely wrong about ASCII: at least initially they certainly 100% intended the accent characters to be overprinted to produce accented letters. The fact that this was not going to work on most computer hardware was either ignored or not learned until too late. And all the accents quickly acquired other uses and the images grew and moved as the need to use them as overprinted accents disappeared. Spitzak (talk) 19:27, 8 February 2021 (UTC)
I cannot tell a lie, it was I who wrote that backspace & overtype was technically obsolete. No wonder I couldn't find a citation. I've never seen a printer that could do that but maybe teletypes did?
I deleted a load of repetitious waffle about dead keys, don't tell me that there was still more! --John Maynard Friedman (talk) 19:56, 8 February 2021 (UTC)
No, I was referring to the earlier version. You did fix it a lot.Spitzak (talk) 19:58, 8 February 2021 (UTC)Reply[reply]
Do you have a citation that overprinting was in the original spec? The ñ character was encoded individually.
I have just realised that there is another horrible error that I 'corrected' as in made credible: ASCII has no code points for accented letters. I'll have to go back and correct again. --John Maynard Friedman (talk) 20:10, 8 February 2021 (UTC)Reply[reply]
I believe the original reason the accent characters (and the underscore) appeared in character sets was to allow overprinting of accent marks. However it is true that by the time ASCII was being designed, they incorporated these primarily for compatibility with previous sets and the inability to use them as accents was already well established. So this is just the original reason they were there. It is possible users of mechanical typewriters actually started typing the tilde and space to get squiggly dashes and that the main reason by the time computer sets were started was to replicate this. Spitzak (talk) 20:20, 8 February 2021 (UTC)
I think that it is still wrong. The position of the 007E tilde on the vertical axis is a type designer's choice, there is no spec that I know of that says it has to be the same as the horizontal line of an e (or an E). --John Maynard Friedman (talk) 20:40, 8 February 2021 (UTC)
Thanks, you got there first. I had just realised that ñ and Ñ were not in the original ASCII and went back to remove them. The tilde on the old ASCII chart at File:USASCII_code_chart.png looks like the CJK double width one! --John Maynard Friedman (talk) 20:53, 8 February 2021 (UTC)Reply[reply]

Once more with feeling

I've rewritten Tilde#Keyboards so a review would be welcome given that I don't have a US keyboard to check it on. --John Maynard Friedman (talk) 23:36, 1 April 2022 (UTC)Reply[reply]

On a US keyboard the tilde is on a key at the upper-left corner to the left of the '1' key. Unshifted it has the backtick (`) and shifted is the tilde (~).Spitzak (talk) 00:48, 2 April 2022 (UTC)Reply[reply]

Color banding revert

Why was my edit fixing a misspelling reverted?

It said right at the top that the article uses British English.

Frass valley

Hi! You may be interested in Wikipedia:Redirects_for_discussion/Log/2021_June_22#Frass_valley. – Uanfala (talk) 21:19, 22 June 2021 (UTC)Reply[reply]

Html codes in infobox at Full stop

I don't understand why you removed the |html= option from the infobox at full stop. It is useful info and all other punctuation infoboxes have it. Your edit note says nothing. So why? --John Maynard Friedman (talk) 22:15, 28 June 2021 (UTC)Reply[reply]

I think the information is useless since there are extremely few cases where the period itself cannot be used in the document. I also find the html shortcuts being part of the "Unicode" template an extreme distraction. They should be printed somewhere else, and stop putting the numerical versions in, as the user can type &#xNNN; using the hex Unicode number. Spitzak (talk) 22:48, 28 June 2021 (UTC)
You would need to take that view to (a) template talk:infobox punctuation mark and (b) template talk:unichar. I didn't write these templates, they do what they do. Yes, it is optional whether or not to use the |html= option, but not how much of it you get if you do (and for that reason I often don't use it because it can be disproportionate and even undue). I don't care enough about it to push it: the real point I'm making is that your edit note didn't give a clue. --John Maynard Friedman (talk) 07:35, 29 June 2021 (UTC)Reply[reply]
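For readers unfamiliar with the numeric form mentioned above: any character can be written in HTML as a hexadecimal character reference built from its Unicode code point. The following is only a minimal Python sketch of that idea; the helper name `hex_reference` is made up for illustration, while `html.unescape` is the standard-library function that decodes such references.

```python
import html


def hex_reference(ch: str) -> str:
    """Build the hexadecimal HTML character reference for one character.

    Hypothetical helper for illustration only.
    """
    return f"&#x{ord(ch):X};"


ref = hex_reference(".")    # full stop, U+002E
print(ref)                  # &#x2E;
print(html.unescape(ref))   # .
```

This is why a named shortcut is rarely needed: the hex number shown as U+NNNN is enough to construct the reference.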


SNAP!!! Drat, you saw it first. I was just about to correct my error but too late. Thanks anyway. --John Maynard Friedman (talk) 19:41, 28 July 2021 (UTC)Reply[reply]


Not likely to have been a joke, given Lagrange's concern with making things easy for the common Frenchman, as reflected in his comments as recorded in the lectures given at the École Normale. See the material cited on the Lagrange and undecimal pages.

CP437 Turkmen

Remember over a year ago we discussed how CP437 may have been the inspiration for some of the unusual characters in Turkmen? I actually found a potential source for the CP437 and/or latin scripts article: Factors Influencing the Success and Failure of Writing Reforms. Hkbusfan (talk) 10:19, 19 October 2021 (UTC)Reply[reply]

I'm also not quite sure how to cite it (the article can be searched for on Google). Hkbusfan (talk) 10:23, 19 October 2021 (UTC)Reply[reply]
I think the fact that CP437 influenced the design of a written language is pretty interesting, it would be nice to add that (probably to both the CP437 and Turkmen articles). Spitzak (talk) 17:33, 19 October 2021 (UTC)


If you have the time and inclination, would you care to add your 2¢ worth at talk:Won sign#"Most Korean keyboards input 0x5C when the won sign key is pressed", please? I don't know enough about Microsoft code pages to contribute intelligently. --John Maynard Friedman (talk) 19:30, 30 October 2021 (UTC)Reply[reply]

Reverted example of UTF-8 codepage layout

Hi Spitzak, I noticed you recently reverted my edits on UTF-8. I made my changes because when I first read this article I found it difficult to understand how the codepage layout works, and I was hoping future readers could benefit from an example, especially since the codepage is a valuable resource once you understand how to read it. Please consider allowing my edit to stand. Cheers, --RubberDuckDebugger (talk) 21:24, 5 November 2021 (UTC)Reply[reply]

@RubberDuckDebugger: I too have had difficulty making sense of how UTF8 works (UTF16 is even worse), so thought your text might at least make a useful footnote (using template:efn). But I'm afraid your text

For example, cell 9D says +1D. The hexadecimal number 9D in binary is 10011101, and since the 2 highest bits (10) are reserved for marking this as a continuation byte, the remaining 6 bits (011101) have a hexadecimal value of 1D. These characters never occur as the first byte of a multi-byte sequence.

needs another sentence between "value of 1D" and "These characters". But talk:UTF-8 is a better place to discuss this.
Spitzak, I thought "useless verbiage" rather excessive and RDD's response is remarkably restrained. It may be obvious to you but not to everyone. --John Maynard Friedman (talk) 00:00, 6 November 2021 (UTC)Reply[reply]
@John Maynard Friedman: Thanks for chiming in. A footnote sounds like the right place for my example and it looks like a Notes section already exists with similar content. I'll go ahead and move my example down there. --RubberDuckDebugger (talk) 03:03, 6 November 2021 (UTC)Reply[reply]
If this is really felt to be necessary I guess it is ok. I am just worried that the page is getting quite redundant. There are six x's in the table at the start which IMHO make it perfectly clear where the bits are. The character table was originally to show what bytes were allowed and disallowed, it has somehow bloated into quite a monster.Spitzak (talk) 00:42, 7 November 2021 (UTC)Reply[reply]
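The arithmetic in the quoted example (cell 9D shown as +1D) can be checked mechanically. This is only a minimal Python sketch, not anything taken from the article; the function name `continuation_payload` is an assumption made for illustration. It masks off the two marker bits (10) of a continuation byte and keeps the six payload bits.

```python
def continuation_payload(byte: int) -> int:
    """Return the 6 payload bits of a UTF-8 continuation byte (10xxxxxx).

    Hypothetical helper for illustration only.
    """
    # A continuation byte always has its two highest bits set to 10.
    if byte & 0b1100_0000 != 0b1000_0000:
        raise ValueError(f"0x{byte:02X} is not a continuation byte")
    # Keep only the low six bits, which carry the actual data.
    return byte & 0b0011_1111


print(f"{0x9D:08b}")                          # 10011101
print(f"+{continuation_payload(0x9D):02X}")   # +1D
```

The same mask applies to every byte in the range 0x80 to 0xBF, which is why those cells of the layout table are labelled +00 through +3F.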

Your 'home' page

Did you notice that someone has given you a home page? (They probably intended to put it here on your talk page.) --John Maynard Friedman (talk) 19:07, 17 November 2021 (UTC)Reply[reply]

Yep, moved it here: I don't have any idea how to go back to the "no home page" state however.

Nor do I. But in your case, the obvious response is to put a box in the middle saying "This page has intentionally been left blank", á la IBM manuals. :-D --John Maynard Friedman (talk) 21:37, 17 November 2021 (UTC)Reply[reply]


Thanks for your edits to hexadecimal. This article includes much redundant content and my addition of the x' format was already included later.

As to "common":

In an article dated 6 May 2017 " About 95 percent of ATM swipes use COBOL code, Reuters reported in April" 80 percent of in-person transactions. In fact, Reuters calculates that there’s still 220 billion lines of COBOL code currently being used in production today, and that every day, COBOL systems handle $3 trillion in commerce.

The "Second Edition (August 2009)" of the COBOL reference manual, p. 27, "Hexadecimal notation for alphanumeric literals": X"hexadecimal-digits" X'hexadecimal-digits' DGerman (talk) 22:53, 30 October 2021 (UTC)

Bot changing Ordinal indicator

FYI, it is a faulty bot. The {{bot|deny=}} should have been enough to tell it to go away. It has been blocked until the author fixes it. See User talk:Qwerfjkl#bot problem?. --John Maynard Friedman (talk) 21:33, 17 November 2021 (UTC)Reply[reply]

ArbCom 2021 Elections voter message

Hello! Voting in the 2021 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 6 December 2021. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2021 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:04, 23 November 2021 (UTC)Reply[reply]

Your Removal of Information from Windows-1252 and Mac OS Roman

Dear User Spitzak: Page History shows that you recently (November 13 and 15) removed a great amount of visible information from the pages Windows-1252 and Mac OS Roman and numerous other pages. Notably, each cell in each table is now missing its Unicode code point and its former background coloration (indicating character class). The reason you cited in your edit summary was "consistency" (translation: "another article gives less information"); consistency is not generally an adequate reason for data deletion. If your edits were made as part of a group decision with community consensus, please add an audit trail (e.g. a link to the discussion) in the talk page and/or revision history of each affected article. (talk) 22:31, 23 November 2021 (UTC)Reply[reply]

Enormous amounts of information were *ADDED* to the tables, not removed. The Unicode code point, and the NAME of the code point, are in the tooltip. Spitzak (talk) 23:56, 23 November 2021 (UTC)
Yes the "colors" were removed. This allowed color highlighting to be used for *useful* purposes, rather than the boxes or checkerboard overlays that were forced before.Spitzak (talk) 23:59, 23 November 2021 (UTC)Reply[reply]
Just FYI, the tooltip doesn't appear on mobile web access (to MsDos extensions table) using Android+Chrome. I don't know that this especially matters or even how to fix it if it does, short of another table. --John Maynard Friedman (talk) 12:38, 24 November 2021 (UTC)Reply[reply]
People working on the Unicode code charts have experimented with ways of changing the text in a small attached area to the table to show information about the most-recently clicked cell. This has a lot of problems (for me their test always scrolled the table to the top of the window, and it certainly interferes with any ability to click on the cell to go anywhere). But it is being looked into. For now tooltips seem the best way, they can present a lot more data (ie the name of the Unicode code point) and they do not clutter the basic display. I also find it unlikely a mobile user is going to be able to use this information, especially the Alt codes, in any useful way.Spitzak (talk) 17:26, 24 November 2021 (UTC)Reply[reply]
On the original question: this was left up for weeks on the ASCII page with no comments ever posted. I also worked on the Unicode code pages and did get a comment that they wanted the dotted boxes preserved, as well as the cell sizes, both of which I did, and duplicated for these tables.Spitzak (talk) 17:28, 24 November 2021 (UTC)Reply[reply]

Code page tables on Wikipedia no longer show Unicode code point equivalents contrary to the pages' claims

Dear Spitzak: the Wikipedia pages for code pages state "The following table shows (code page). Each character is shown with its Unicode equivalent." and "...shown with its Unicode code point.". This sentence is placed right in front of the tables. However, the Unicode code points are no longer (immediately) visible. Removing the Unicode equivalents from the code tables does not serve any purpose. Worse, it invalidates the Wikipedia pages' current content. In addition, the color-coded character class identifications were removed, which served useful categorical information.

Developers and enthusiasts rely on the Unicode equivalents U+xxxx listed in the code page tables, which are now no longer directly visible. I strongly argue that the Unicode U+xxxx equivalents should be rendered visually in the tables, in addition to the (linked) character graphic and the character class color coding. Removing the U+xxxx is bad for developers who need to map code page character definitions to Unicode, e.g. to update legacy software. There are plenty of non-Wikipedia resources that show code page tables with Unicode code point equivalents. Because of this change, I'm sure very few will continue using these Wikipedia tables as a reference and rather prefer external resources.

Case in point: some of the open source software I contributed to used the CP-12xx and ISO code page tables listed on Wikipedia as a resource to implement converters of legacy code pages to UTF-8. This unfortunate change makes it much harder to contribute additional tables in the future.

I already reverted the Sharp pocket computer character sets Wikipedia page that I've contributed to in the past and will do so again if necessary. For this second case in point: the Unicode equivalents are necessary to document Unicode-equivalent versions of Sharp pocket computer programs for historical documentation purposes, for which Wikipedia is an excellent resource.

This mass change serves no purpose but rather removes critical information. Please revert back to the "old style" tables as soon as possible without removing any information.

--Robert van Engelen 20:52, 9 December 2021 (UTC)Reply[reply]

No I will not revert this, it ADDS VITAL INFORMATION (the unicode code point names). And the colors made it impossible to use colors to indicate actual useful information (like differences and version changes). Any programmer who wants to mass-copy the unicode assignments can hit "edit" and work from there in a much more usable and convenient form than either the old or new table.
These things were INCREDIBLY ugly and did not look like a table used by any other documentation in the world, and makes Wikipedia just look incompetent. I have been trying to fix these for years. Don't give me any nonsense about these changes being undesirable. You are wrong.Spitzak (talk) 21:19, 9 December 2021 (UTC)Reply[reply]
Dear Spitzak: I am fine to talk objectively about content quality improvements that make sense, not throwing unbacked claims at each other as your response is unnecessarily confrontational. I'm reflecting that I am getting complaints about this problem since people rely on these pages now (since these are linked from other external URLs).
I agree that displaying both the names and Unicode code points is useful. However, the names and Unicode code points are not visible in the tables. The old style tables may not be "beautiful" to you specifically, but that is not a technical criterion. It is a subjective assessment. Robert van Engelen 20:52, 9 December 2021 (UTC)
Can you please forward the exact wording of these complaints? Thank you. Technically the new tables are superior, that is not a subjective assessment. They make it possible to highlight interesting information, they do not confuse people with numbers without a U+ prefix, and they provide useful information about the unicode code points so the user does not need to go to another source to look them up. Spitzak (talk) 22:16, 9 December 2021 (UTC)
I think it's your subjective opinion. in my opinion, the new table is not superior at all. Some information you deleted like Unicode points were useful for some person like me... Please listen to and respect other people's opinions. Your attitude toward discussion is too dogmatic.--𝒞𝒽ℯℯ𝓈ℯ𝒹ℴℊ (talk) 00:49, 21 December 2021 (UTC)Reply[reply]

I agree with Robert van Engelen's opinion. The new table is not informative, and a lot of useful information has been deleted. Why you think the new table as "Superior" and some useful information as "Vital"? I don't think so.--𝒞𝒽ℯℯ𝓈ℯ𝒹ℴℊ (talk) 00:45, 21 December 2021 (UTC)Reply[reply]

The Unicode code point numbers and the names are in the tooltips.
What about for mobile users? They usually can't see tool tips.--𝒞𝒽ℯℯ𝓈ℯ𝒹ℴℊ (talk) 06:17, 21 December 2021 (UTC)Reply[reply]
There does need to be a way to view the code points without being able to see the tooltips, yes. This is a non-issue for the Unicode block charts—since the Unicode code point is just the location in the chart (as read from column and row headers). It is absolutely an issue for charts for coded character sets other than Unicode itself. --HarJIT (talk) 18:43, 21 December 2021 (UTC)Reply[reply]
Best I can suggest is to hit "edit" which does put the text in a more useful form for those who are transcribing to some kind of lookup table. I really think the only practical way to do this is to get the text into some other editor which can do macros to translate it to the form needed for programming. Nobody in their right mind would try to transcribe visually, don't be daft. I think the clarifying "U+" prefix and the name of the code point are absolutely vital information and I'm sorry but I cannot see any way to put this into a table that also allows a character to be quickly visually identified and/or compared to another table.
The people who designed the Unicode code point charts also think the inability to see the number and name on mobile is a problem and they made several attempts to fix it. Mostly trying to get a footer to update with the most recently-clicked cell's information. The best example I saw caused the window to scroll vertically on any click. Also it seems to make it difficult to implement a link to the Wikipedia page for the character. I think if a solution is found it is equally important to fix those charts as well. I'm wondering if a popup box identical to how footnotes popup would work, those seem to allow markup and links, which is a big improvement over tooltips or those footer box attempts.Spitzak (talk) 19:31, 21 December 2021 (UTC)Reply[reply]
I should point out that the idea that somebody trying to write software and copying the code point numbers to a lookup table is using a mobile phone is really pretty ridiculous so the issue is somewhat moot. I don't like however that the mobile user cannot see the code point names, they are very informative and interesting.Spitzak (talk) 19:33, 21 December 2021 (UTC)Reply[reply]
Spitzak, dismissing our comments as "ridiculous" is not helping to converge to a solution as a team. Our comments are technical observations (not only opinions.) I should also add that printing the Wikipedia page and taking screenshots won't show the Unicode code points, in addition to the fact that tooltips are not easy or possible to trigger on a tablet or mobile phone. Tooltips are nice, but should not be relied on as the sole mechanism by which crucial information is shared with the community. Furthermore, some Unicode characters do not show up correctly with the "new tables format" font, see for example U+00B0 in the Sharp pocket computer character sets which incorrectly becomes a solid block in the "new tables format" instead of a light gray block that is correct. These observations and our opinions strongly argue for including the U+xxxx code points in the tables and leave the Unicode names to the tooltips, because these are not fixed width. Removing the U+xxxx code points is radical, deletes crucial information and not generally acceptable given the technical objections raised. Robert van Engelen 20:50, 21 December 2021 (UTC)Reply[reply]
B0 shows correctly as a dotted box for me. This may be a problem with the font you are using? Are you sure it is correct on your machine in the old table version?Spitzak (talk) 21:06, 21 December 2021 (UTC)Reply[reply]
Nobody is going to print the Wikipedia page in order to transcribe a Unicode lookup table, that is just as silly as claiming they will do that from a mobile phone. They will hit "edit" and copy and paste the text into their IDE, and edit it there into the form they want. This is true for both versions of the table.Spitzak (talk) 21:09, 21 December 2021 (UTC)Reply[reply]
Your repeated claims earlier that the "new tables" represent the Unicode Consortium's tables is false: just look at the front page that clearly shows the character graphic with its U+xxxx code point. The Unicode Consortium considers the code point information critical. We do too. It makes no sense to hide the Unicode code points from view. Leaving the U+xxxx code points from the tables on Wikipedia invalidates the Wikipedia pages' claims, stating the tables show Unicode code points associated with the code pages, as we all expect the U+xxxx codes to be there. Also, why do you assume that people will "hit edit" to copy the Wikipedia source? It has a lot of stuff that is markup. That makes no sense. They rather will turn away from Wikipedia as a source that is lacking information and find an alternative source online. As for the font issues: symbols show up wrong on MacOS in Safari, Chrome and Firefox. I did not change fonts or anything. The U+00B0 symbol should be a light shade, lighter than U+00B1, but it is darker and almost solid.Why introduce a different font to begin with? That is asking for trouble. Robert van Engelen 02:47, 22 December 2021 (UTC)Reply[reply]
All Unicode tables have Unicode code points in the table: see Unicode Character Code charts Robert van Engelen 02:51, 22 December 2021 (UTC)Reply[reply]
That PDF certainly does have the number, with no U+, in the table. And those are the charts that they copied to make the dotted box around the control characters, and I was told they insisted on the dotted box in order to match the PDF. I think I will have to experiment with getting that in there, but it is going to have to be *much* smaller and nearer the edge, similar to the PDF. What I want is the ability to easily compare adjacent glyphs. At least you seem to admit that the old colors were pointless, I really really wanted to get rid of them. Spitzak (talk) 04:20, 22 December 2021 (UTC)
The table font was changed to be the same font used for any table. The previous tables were a bit inconsistent about this, there were attempts to make it a serif font but in many cases people added markup to the cells to switch it back.Spitzak (talk) 04:22, 22 December 2021 (UTC)Reply[reply]
Nope, putting the number at the bottom of the cell puts this back to unreadable. I have not found any way to get the gap between the glyph and number smaller so the table is a reasonable size. More importantly this does not work for glyphs that are more than one unicode code point, are not in unicode, or when there are alternatives. I also found I have to duplicate the number in the tooltip, which seems wrong, this is because without the "U+xxxx" prefix it is unclear what those all-caps words mean. In addition it is apparent the Unicode PDFs are using this number as a table index, as it is showing it on unassigned and invalid cells, which I think a lot of people think it is as there were continuous edits to 'fix' these to be the cell number rather than the Unicode code point.
I would like an explicit clear indication of exactly how you have used this number. I have used them, but to compare tables or to locate mistakes in them, and in all cases I have had to use "edit" and copy and paste the text to another editor in order to do anything useful. The visible number is absolutely useless for this.
A new idea is to make a huge, collapsible table attached to the bottom of the cell grid. This would have one line per character and the Unicode code point and name, and possibly other information like the Alt code. That would certainly be easier to read. An annoyance is that the information is duplicated, although possibly the tooltips could be removed from the grid. What do you think? Spitzak (talk) 18:59, 22 December 2021 (UTC)
We all agreed that the code points are essential information. You admitted that the code points should be there. There is no need to have a U+ prefix or Unicode names in the table if that is your concern. Just the 4 or 5 hex digit code suffices, just like the Unicode consortium uses in their documentation. If you cannot figure out how to place the code points into the tables, then restoring the original tables will be the only reasonable option I'm afraid. Please do not create more complications by considering collapsible tables. Keep the tables as simple and understandable as they used to be. That's all I'm asking for and others appear to agree with me. Robert van Engelen 21:48, 9 January 2022 (UTC)
As I said before, it is clear they are using the numbers as a table *index*, not as a result in the table. This is in fact what most people expect from a small number printed on the edge of a box. In addition the numbers are not there in hundreds and hundreds of tables of characters on Wikipedia. I have yet to receive an explanation of how this "essential" information is used, as well. What exactly do you do with that number, other than look up the Unicode code point name, something the new tables have done for the reader? Spitzak (talk) 23:06, 9 January 2022 (UTC)
In addition we need some serious consideration of what to do with entries that are not a single Unicode code point (some are up to 4 code points!) or entries with multiple translations to Unicode, and entries that do not correspond to Unicode. Column width must remain constant as much as possible so it is possible to compare two tables.
Can you please type in exactly how you use this "essential information"? You can see a 4-digit number; what do you do with it? Spitzak (talk) 00:02, 10 January 2022 (UTC)
That is quite disappointing to hear from someone who claims to have the best interest in mind for the Wikipedia community. This backtracking by you is unhelpful and frankly not trustworthy. Accuracy and facts are crucial for trust in Wikipedia content. You still need to explain to everyone why you believe you have any strong arguments and facts that support your claim that you can just make this crucial information invisible and much harder to obtain. From what you said earlier, you appear to agree that the Unicode consortium's tables include the code points and that this is for a very good reason. Not only that, you also appeared to agree that for unsorted tables the code points are relevant (not so much as an index as you say for sorted tables, duh). In many cases the code points cannot be inferred from the location in the table, which is true for almost all code pages and foreign/special character sets. Explain to us why you think that the Unicode Consortium and all of us are wrong and you are right. Also, now you realize that "we need some serious consideration of what to do with entries that are not a single Unicode code point"? Good point but a bit late to think about that now, isn't it? They are there for a reason, because some character sets are vintage with some characters that do not match exactly to Unicode characters and therefore have commonly accepted code point alternatives that are close enough. Robert van Engelen 03:40, 10 January 2022 (UTC)
Basically the problem is that unless the column sizes are similar it is impossible to compare these tables. Also the information really bloats the table size so it no longer fits on a mobile device. BTW the reason I consider the number an "index" is precisely because it is in the sorted tables, and not in unsorted tables, in the Unicode documentation, also because it is on unassigned code points in those tables. But in any case it sounds like you really really really insist that this information is "vital" (while refusing to explain how it is used) so I guess it can be added. I am not going to undo the tooltips that provide actual useful information (i.e. the name of the Unicode code point) or the numerous fixes to these tables that I and others did. I did some experiments and it looks like setting the line-height to .5 and the bottom-padding to 0 makes the tables less hideous. The dotted lines have to be removed from the control characters because they seem closely tied to the line-height and I cannot get a layout with them looking ok. I propose to make the text small enough that two numbers separated by a space or slash fit; longer or non-obvious translations would have an ellipsis. Sound ok? At least I was able to allow colors to be used to indicate useful information... Spitzak (talk) 19:17, 10 January 2022 (UTC)
My experiments are visible here: User:Spitzak/sandbox#Character set. There are some numbers on the 'A' and NUL and STX. I was able to keep the cell the same size (though imho it is still a lot bigger than it should be) and the letter centered vertically, both of which make it much more legible imho. I was unable to make the dotted box for control characters really match, but this may look ok, or maybe the dotted boxes can be removed (though that is rather inconsistent with the Unicode tables and with the use of dotted circle to indicate combining characters). I think also it can be set up so that entries without any Unicode equivalent can use the full vertical space. Spitzak (talk) 19:53, 10 January 2022 (UTC)
Updated table can now be seen on code page 437. Spitzak (talk) 22:41, 10 January 2022 (UTC)
The sandbox example character set table has some examples of Unicode code points that look fine to me, yet a bit small to be legible IMHO. At least this helps in a big way and in two ways: the code point is immediately visible and tooltips are easier to trigger. I had a very hard time triggering the code point tooltip with the glyph taking up a large portion of the cell. Having to hover the pointer NEXT to the glyph to trigger the tooltip is unnatural. I bet that very few people will figure out how to trigger the tooltip. Having the code there helps with triggering the tooltip if this is technically feasible to do so. The code points should be clickable as well. This way it's natural to hover over the code to trigger more info. Making some good progress, keep it up. Removing the dotted boxes doesn't bother me, I don't think this info is useful anyway. Some character code tables use smaller fonts for codes like NUL, BEL, etc. Also, I don't know what it will look like when adding multiple code points as is necessary for legacy/vintage character sets that do not map perfectly to Unicode. Worse, calculator and pocket computer character sets include BASIC tokens, so that is another big challenge. I'd leave these tables as they are right now. Robert van Engelen 01:34, 14 January 2022 (UTC)
The font is actually the same size as the old tables. You cannot make the code point number into a clickable link because that would prevent that area from showing the tooltip just like the glyph links. I have tried to improve this in some places by removing useless links to Box Drawing, Arrows, and Geometric Shapes. Having done some experiments, the BASIC keywords are able to be dealt with, though often the font has to be pretty small. This is in fact using a smaller font for the control characters. Spitzak (talk) 02:17, 14 January 2022 (UTC)

Font problems with new character set table

A new problem is observed with the change to a new character set table format. The font does not show characters correctly but rather as undefined on MacOS with Safari, Firefox, Chrome and Windows 10 Edge, Chrome and Firefox. For example, the 1/8 horizontal block characters x6d to x72 are in fact still visible (i.e. glyphs are rendered) in the old page's table but not in the new table where they show up as undefined (outlines). This is not acceptable. Robert van Engelen 14:11, 18 January 2022 (UTC)

The old one seems to be using serif, is that what made it work? The people doing the Unicode tables were strongly opposed to using serif so I left that out. I also believe these problems have been fixed on newer versions of Windows, basically a glyph will either be there or not but it will be the same for any and all fonts, due to corrected fallback font behavior. Spitzak (talk) 21:50, 18 January 2022 (UTC)
Actually I should have looked at the linked articles. The new one is using the newer Unicode Symbols for Legacy Computing block, all of which is missing on my machine as well (I see blocks). The older table attempts to simulate these with other horizontal bar characters; these suffer from being incorrect, and the vertical spacing is not even and depends on the font. The changes were independent of the changes to the table format. There are lots of examples where bitmap images are used instead of missing characters in these tables, maybe this is another place for that. Spitzak (talk) 21:54, 18 January 2022 (UTC)
Spitzak, the tables look OK now so this is important progress, except for the font! The old serif font is more accurate on MacOS and Windows 10 with Safari, Chrome and Firefox. I don't care what Unicode "experts" think is best, they are clearly not typesetting specialists. The problem with the new sans serif font is the inconsistency and lack of details in the glyphs. Inconsistency in the glyph renditions is observed in U+2514 and U+2534 for example, where the thickness of the lines is not the same (see Sharp character sets PC-E500 table.) and U+2591 is still much too dark. Just a few examples. Perhaps I'm picky about this, but the old tables look better in this respect. I don't want the articles to go backwards in quality, but rather go forwards with improvements. Robert van Engelen 17:29, 27 January 2022 (UTC)
I was in favor of serif for the glyphs, mostly to make the letters easier to distinguish. I suspect you have some problem on your machine setup, it seems hard to believe it will choose different fonts for different line-drawing characters, and they all look correct to me on my machine (Chrome on Linux). The people making the Unicode charts absolutely did not like serif, or any attempt to mess with the font. I think the concern was to make sure the glyphs in the table matched the glyphs in running text.
It is fairly easy to add {{serif}} to the table entries to test things out, and {{chset-cell1}} could be changed to do it everywhere, though there may need to be an option to leave the font unchanged for when serif breaks things. Spitzak (talk) 18:58, 27 January 2022 (UTC)
I changed the template to use Serif, let's see if people complain. Spitzak (talk) 19:12, 27 January 2022 (UTC)
With the serif font the glyphs are consistent now and have the right shades and shapes (e.g. the "blocky" ones.) Major improvement! It is highly unusual to use sans serif for math symbols. LaTeX math and MathJax are always serif and that is for a good reason if you need a critical reason why serif is best overall. I've checked with MacOS and Windows Safari, Chrome and Firefox using the standard font settings. Strange why you think I try to mislead you into thinking I use special fonts? It is what it is. Robert van Engelen 20:29, 27 January 2022 (UTC)


I'll let your edit to section sign stand for now, but you really need to make a proposal at template talk:unichar that the decimal form of html code be optional (discarded?) if a mnemonic form is available. Or are you really going to go round every 'special character' article, handcrafting the html codes? --John Maynard Friedman (talk) 11:15, 21 December 2021 (UTC)

Yes I think the template should not show the decimal code if there is a mnemonic. An option to not show the decimal code at all may be nice (i.e. it only shows the mnemonic, and shows nothing, not even "HTML", if there is none). The HTML numeric entry in hex is easily figured out from the Unicode code number. Spitzak (talk) 19:21, 21 December 2021 (UTC)
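
As a rough illustration of the last point, the hexadecimal numeric reference is just the code point written in hex, and a mnemonic can be looked up where one exists. This is only a sketch in Python, not how {{unichar}} is implemented: the helper name html_forms is made up for this example, and the mnemonic lookup assumes Python's html.entities.codepoint2name table, which covers only the HTML 4 named entities.

```python
from html.entities import codepoint2name


def html_forms(code_point):
    """Return the HTML spellings of one code point (hypothetical helper)."""
    forms = {
        # The hex reference is the Unicode code point number itself.
        "hex": f"&#x{code_point:X};",
        "decimal": f"&#{code_point};",
    }
    # Only some code points have a named (mnemonic) entity.
    name = codepoint2name.get(code_point)
    if name is not None:
        forms["mnemonic"] = f"&{name};"
    return forms


# Section sign, U+00A7
print(html_forms(0x00A7))
```

A template following the suggestion above would show the mnemonic when the "mnemonic" key is present and fall back to the hex form otherwise.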

Nomination for deletion of Template:Chset-cell2

Template:Chset-cell2 has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. – Jonesey95 (talk) 03:21, 27 December 2021 (UTC)

Nomination for deletion of Template:Chset-table

Template:Chset-table has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. – Jonesey95 (talk) 03:21, 27 December 2021 (UTC)

Use of unspaced em dashes

Hello Spitzak, I would like to explain my recent edit in Newline. I considered these as dashes to mark an aside within a sentence and adopted the guideline in MOS:EMDASH. According to the Manual of Style, both en dashes and em dashes are acceptable in Wikipedia. One of the two should be chosen and used consistently in an article in this way:

  • Unspaced em dash: «The sequence was used on computer systems that had adopted Teletype machines—typically a Teletype Model 33 ASR—as a console device.»
  • Spaced en dash: «The sequence was used on computer systems that had adopted Teletype machines – typically a Teletype Model 33 ASR – as a console device.»

But not, for example:

  • Spaced em dash: «The sequence was used on computer systems that had adopted Teletype machines — typically a Teletype Model 33 ASR — as a console device.»
  • Spaced hyphen: «The sequence was used on computer systems that had adopted Teletype machines - typically a Teletype Model 33 ASR - as a console device.»

Since the em dash was chosen in the article, I have used it in the form that is conventionally accepted in the Manual of Style. The em dash is already used unspaced in other parts of the article.

Please tell me if you feel that this is not right in the context of the article. Thank you. --Lion-hearted85 (talk) 17:47, 26 January 2022 (UTC)

I did not know those were em dashes; I thought they were en dashes. Change it if it seems correct to you. Spitzak (talk) 17:59, 26 January 2022 (UTC)
Okay, thank you. I will change them, since em dashes are used in other parts of the article. --Lion-hearted85 (talk) 15:52, 27 January 2022 (UTC)

Field support bulletin

I'm rather amazed that we don't have an article on FSBs. I thought perhaps you might fancy writing it as you probably have better access to sources than I do? --John Maynard Friedman (talk) 14:49, 28 January 2022 (UTC)

I think you meant this message for somebody else? I have no idea what a Field Support Bulletin is. Spitzak (talk) 19:11, 28 January 2022 (UTC)
Well so much for my assumption that you had all the backroom gen on computer systems. I thought it was what they called their technical update announcements for field engineers (when such people exist). I just assumed it was an IBM expression but Google search gives everybody but IBM. Never mind, it's not important. --John Maynard Friedman (talk) 20:53, 28 January 2022 (UTC)

ISO 8859-X

Why did you remove all the unicode code points from these pages? That was actually useful information all in one place. Sure, it was duplicated (although not consistently I see) on the individual character’s page. But if you needed to check if an encoding contained a particular character you could search for its codepoint. Now that’s no longer possible. And heaven help anyone looking for the complete encoding mapping - they would need to visit at least 100 or so individual pages. Pcbbc (talk) 07:50, 6 February 2022 (UTC)

The unicode code points, and also the *names* of the unicode code points, are in the tooltips. Spitzak (talk) 19:32, 7 February 2022 (UTC)
Dear Spitzak, I strongly agree with Pcbbc and I have been warning about the loss of Unicode code point information from the tables (see above). Others have agreed with me that the situation is not acceptable. It is now fixed for the calculator character sets, but many other character sets such as the ISO- sets and CP- sets have become useless after removing the Unicode code points from these tables. Hovering over the characters will NOT show the Unicode code point in tooltips. Only hovering over the BACKGROUND does, which is counterintuitive and easily missed. Sometimes the character takes up the whole table cell, making it difficult to hover over the background to trigger the tooltip. Hiding the code points this way is also not helpful for anyone who needs to map an ISO- or CP- table to Unicode and cannot see multiple table cells or an entire row at once. This is not about who is wrong or right. This is about helping Wikipedia users. Robert van Engelen 02:26, 8 February 2022 (UTC)
Any plans to fix the ISO 8859-X and all other CP code pages on Wikipedia? The new tables look fine, but the Unicode code points are still missing! Robert van Engelen (talk) 18:03, 11 March 2022 (UTC)
One idea is to put the Unicode code points only on locations that are different from the base encoding, replacing the yellow color that is used to indicate that now. There are some examples but I have not heard any comments on them. Spitzak (talk) 22:11, 11 March 2022 (UTC)
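
For readers who need the two lookups raised in this thread (does an encoding contain a given character, and what is the complete byte-to-code-point mapping), here is a minimal Python sketch. It relies on the codecs that ship with Python, which may not agree with the Wikipedia tables in every cell (for example in how control codes and unassigned bytes are treated); the function names position_in and full_mapping are invented for this example.

```python
def position_in(char, encoding):
    """Return the byte value of char in a single-byte encoding, or None."""
    try:
        encoded = char.encode(encoding)
    except UnicodeEncodeError:
        return None  # the encoding has no position for this character
    return encoded[0]


def full_mapping(encoding):
    """Map every byte 0x00-0xFF to a Unicode code point, or None if undefined."""
    mapping = {}
    for byte in range(256):
        try:
            mapping[byte] = ord(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            mapping[byte] = None
    return mapping


# The euro sign is in ISO 8859-15 but not in ISO 8859-1.
for name in ("iso-8859-1", "iso-8859-15"):
    print(name, position_in("\u20ac", name))

table = full_mapping("iso-8859-15")
print(f"0xA4 -> U+{table[0xA4]:04X}")
```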

I really wish you had waited to make such massive changes to all of the character tables without more discussion. (Yes, I've been away from WP for a while, but there are several other editors involved in this that are still active.) See my comment at Talk:ISO/IEC_8859-1 about some of the history and rationale behind the "garish" table format that I and others started a few years ago. The current minimalist layout does not convey enough useful information, in my opinion. — Loadmaster (talk) 17:10, 8 May 2022 (UTC)

Dash: numbers < 150

Are you certain that the Japanese, Thai, etc code pages all have a dash at that code point? --John Maynard Friedman (talk) 23:26, 14 February 2022 (UTC)

Yep just clicked on all the 12xx code pages and they all have the dashes at 0x66 and 0x67. Spitzak (talk) 01:03, 15 February 2022 (UTC)
Curses! Foiled again! ;-D --John Maynard Friedman (talk) 11:03, 15 February 2022 (UTC)

In case you haven't noticed, the Caret article has been split without discussion

Caret is now a disambiguation article and the original article has been split into caret (proofreading) and caret (computing). I don't like that it was done without prior discussion but I can't see that it is worth arguing about. But you may be aware of other implications? --John Maynard Friedman (talk) 15:06, 7 March 2022 (UTC)

Seems like an ok idea to me, though it probably should be (character) rather than (computing). And it can be called a "caret" in that article rather than a "circumflex". Spitzak (talk) 19:29, 7 March 2022 (UTC)

ArbCom 2022 Elections voter message

Hello! Voting in the 2022 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 12 December 2022. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2022 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:26, 29 November 2022 (UTC)

Deletion of second similar edit in article on question marks

Here is the relevant text: which are written from right to left, the question mark is mirrored right-to-left from the Latin question mark. In Unicode, two encodings are available: {{Unicode character|061F|ARABIC QUESTION MARK|html=|note=With bi-directional code AL:

I changed the first one because the hyphens were unnecessary. The words "from right to left" are two simple related prepositional phrases. The second change is because the whole first double phrase is used as a simple adjective before a noun, and to indicate that change in usage and syntax, the hyphens are added. It's essentially the same difference as "books in print" vs. "in-print books".

With that justification, I'd like the deletion to be undone, restoring the change. Sdiabhon Sdiamhon (talk) 18:58, 30 January 2023 (UTC)

Sure. I was just confused that you did two edits and they were in exactly the opposite directions (one removed dashes and the second added them). I kind of feel that the correct text would use the same dashes everywhere. Spitzak (talk) 19:14, 30 January 2023 (UTC)

The reason given for your revert was incorrectly given

Your objection to the four word clarification was “no. Technically you cannot put surrogate halves into UCS-2”. The surrogate halves start at D800 and DC00, Spitzak. The subheading preceding this paragraph was

U+0000 to U+D7FF and U+E000 to U+FFFF
  U+D800 to U+DFFF have a special purpose, see below

In other words, the surrogate halves were deliberately excluded. I’m not saying you are wrong, just doing a revert for an inapplicable reason. I am hereby applying for your reconsideration. (talk) 00:41, 12 March 2023 (UTC)

You are right, I did not notice what section this was in. I still feel deleting " as encoded, but unchanged," is correct, as it implies there are some characters that can be "changed" to put into 16 bits, but there are no such characters. Spitzak (talk) 01:17, 12 March 2023 (UTC)

So you do not believe the second sentence is a helpful alternate rewording for the point of the first sentence, which mentions both UTF-16 and UCS-2. That must mean that the point of the second sentence deals with the inherent limits of sixteen bit representations, and has nothing to do with UTF-16.

So here’s the problem with that. If the paragraph’s topic change, from UTF-16 & UCS-2 to the inherent limitations of sixteen bits, is not awkward and is acceptable then just why is that sentence flagged for ‘citation needed.’ It is like needing a citation for ‘three plus five equals eight’. Inherent properties needs tagged for a citation?

Something in this paragraph is incomplete and needs changed. I believe it was closer to a typo than an awkward expression. How can we improve WP in clarifying this paragraph?

I'm very confused, are you talking about ? The point of my edit was that the old text implied the existence of characters where by "changing the encoding" or just "changing" they could be encoded in 16 bits. There are no such characters. So it seems removing this odd bit of text is the right thing to do and I'm rather confused as to what the objection is. Or are you talking about a different patch? Spitzak (talk) 16:14, 13 March 2023 (UTC)

UTF-16 treats two groups of code points differently. One group, larger than sixteen bits, has each code point changed into a pair of surrogate code units. The other group of code points is passed through the encoding process unchanged.

The offending paragraph looked like it contained a potential ‘claim of fact’ similar to Original Research, at least to someone it did, which earned it a flag for ‘citation needed.’ This flag can also be used to draw attention to sentences that may need clarification, or some other kind of improvement. When taken out of context, the particular sentence mentions only UCS-2 and the BMP, and offended the reviewer.

I saw it as a prelude to discussing the larger of UTF-16’s group of code points involving surrogate pairs. Which means the point of the offending sentence is more than a statement of the inherent limitations of sixteen bit representations, more than a simple ‘claim of fact’ of Original Research. So it needs a clarity that would ameliorate the flag for a citation, and improve the understanding of the uninitiated.

To the uninitiated, the first sentence may be difficult to be understood. So a helpful rewording would improve understandability. Being followed by a statement of inherent limitation of sixteen bit code points does not improve the understandability of UTF-16, imho. No respectable contributor would include a potential ‘claim of fact’ of Original Research pertaining solely to UCS-2 in this article. Therefore, I concluded that the contributor had a tiny accident occur. A typo. The offending sentence was most likely missing a word or three. The second sentence was most likely meant to be a helpful rewording of the point of the first sentence. I concluded that “code units numerically equal to the corresponding code points” was better described as ‘the code points passing through the encoding process unchanged,’ but with fewer words.

I am hoping this clears away whatever your confusion was about. (talk) 23:06, 13 March 2023 (UTC)
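
The two groups of code points discussed in this thread can be sketched in Python as follows. This is only an illustration of the standard UTF-16 rules, not text proposed for the article; the function name utf16_code_units is made up for this example.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit code units UTF-16 uses for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # U+D800 to U+DFFF are reserved for surrogates and are not encodable.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0xFFFF:
        # U+0000 to U+D7FF and U+E000 to U+FFFF: one code unit,
        # numerically equal to the code point.
        return [code_point]
    # U+10000 to U+10FFFF: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


for cp in (0x0041, 0x20AC, 0x1F600):
    print(f"U+{cp:04X} ->", [f"{unit:04X}" for unit in utf16_code_units(cp)])
```

In the first branch nothing about the value changes, which is the sense in which those code points pass through "unchanged"; UCS-2 has only that branch and cannot represent anything above U+FFFF.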


The article Popularity of text encodings has been proposed for deletion because of the following concern:

Essay content which only exists due to being split from UTF-8. Not independently notable.

While all constructive contributions to Wikipedia are appreciated, pages may be deleted for any of several reasons.

You may prevent the proposed deletion by removing the {{proposed deletion/dated}} notice, but please explain why in your edit summary or on the article's talk page.

Please consider improving the page to address the issues raised. Removing {{proposed deletion/dated}} will stop the proposed deletion process, but other deletion processes exist. In particular, the speedy deletion process can result in deletion without discussion, and articles for deletion allows discussion to reach consensus for deletion. Chris Cunningham (user:thumperward) (talk) 19:11, 13 March 2023 (UTC)

This was created to stop a huge amount of bloat that people were typing into the UTF-8 article, pretty much discussing what is the second-most popular encoding behind UTF-8. If this information is useless then deleting it makes sense, but something needs to be done to stop people from typing it back into UTF-8. Spitzak (talk) 21:26, 13 March 2023 (UTC)