
(A little light computing entertainment for general readers.)

Seen recently on an on-line bookseller’s news updates:

Paypal has a global problem where for some customers the ‘$’ in their email confirmation is being replaced by ‘24’. For example if you purchase something for $19.99, your paypal email will say you have purchased it for 2499.99.

There is no need to worry however as you have been charged the correct amount. Paypal engineers are working to fix the problem.

Ohhhhh… some people are in for a shock!

Computer geeks–programmers, at least–will instantly spot the problem.

Rather than include the character ‘$’ in the email, they’ve included the character code instead. D’oh!

Characters–letters–are stored as numbers on computer systems, with one particular number representing each character.

A capital ‘A’ is held by the decimal number 65.

Character codes have historically also been represented using octal (base 8) and hexadecimal (base 16) numbers. One advantage is that these bases are themselves powers of two (8 is two cubed, 16 is two to the fourth), so each octal digit maps directly onto exactly three binary digits and each hexadecimal digit onto exactly four.

Hexadecimal digits use the 10 conventional digits, followed by A-F (10-15).

Capital ‘A’ is hexadecimal 41. Capital ‘Z’ is decimal 90, hexadecimal 5A.
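If you’d like to poke at these numbers yourself, here is a minimal sketch in Python (my choice of language for illustration; nothing in the story depends on it). It uses the built-in ord() and chr() to go between characters and their codes. The helper name show_codes is made up for this example.

```python
def show_codes(ch):
    """Return a character's code in decimal, octal, hexadecimal and 7-bit binary."""
    code = ord(ch)  # character -> number
    return {
        "decimal": code,
        "octal": format(code, "o"),
        "hex": format(code, "X"),
        "binary": format(code, "07b"),
    }

for ch in "AZ$":
    print(ch, show_codes(ch))

# Going the other way: number -> character
print(chr(65), chr(0x5A))
```

Reading the binary column four digits at a time from the right gives the hexadecimal digits; three at a time gives the octal digits.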

The character code for ‘$’ is hexadecimal 24 (decimal 36).

No doubt some silly sausage of a programmer has output the hexadecimal code for ‘$’ rather than the character itself.
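Purely as a guess at the kind of slip involved, the mistake might look something like the sketch below. The function names and the example amount are hypothetical; I have no knowledge of what the real code looks like.

```python
def correct_amount(amount, symbol="$"):
    """What was presumably intended: the symbol itself, then the amount."""
    return symbol + amount

def buggy_amount(amount, symbol="$"):
    """Hypothetical slip: the symbol's hexadecimal code is written out instead."""
    return format(ord(symbol), "x") + amount

print(correct_amount("5.00"))
print(buggy_amount("5.00"))
```

With a made-up amount of 5.00, the first line prints $5.00 and the second prints 245.00, which is rather a different bill.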

The standard character code for many years was ASCII – the American Standard Code for Information Interchange. ASCII is still with us, in a sense. ASCII is such a strong part of computing that for the sake of backward compatibility it was subsumed ‘as is’ within today’s Unicode and ISO/IEC 10646 Universal Character Set (UCS) codes.

The ASCII code is 7-bit – at most 128 characters can be represented. (2^7 = 128.) ASCII codes are anglophilic – based on the letters and numerals used in English, along with a few so-called control characters that move the cursor about the text (tab, carriage return, etc.), control simple communication, or indicate ends of files or media.

Unicode, ASCII’s modern counterpart, is much, much larger, allowing the characters of many different languages to be represented, along with many useful symbols such as currency signs, mathematical symbols and units of measure.
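To see ASCII living on inside Unicode, here is a small Python sketch. It assumes UTF-8, one common way of storing Unicode text that isn’t otherwise discussed here, and the helper name describe is made up for the example.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes, fits-in-7-bit-ASCII?) for one character."""
    code = ord(ch)
    return code, ch.encode("utf-8"), code < 128

# "\u20ac" is the euro sign and "\u03c0" is the Greek letter pi,
# written as escapes so the snippet itself stays plain ASCII.
for ch in ["A", "$", "\u20ac", "\u03c0"]:
    code, utf8, is_ascii = describe(ch)
    print(format(code, "04X"), utf8, "ASCII" if is_ascii else "beyond ASCII")
```

The ASCII characters keep their old codes and still fit in a single byte; the others have larger codes and need more than one byte in UTF-8.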

7 bits, one short of an eight-bit byte, can be represented by two hexadecimal digits with each hexadecimal digit representing four bits – a nybble.

(Get it? 8 bits = 1 byte; 4 bits = a nybble, with a nybble (nibble) being half a byte (bite). Yes, this is geek humour at its finest. They’re also the proper terms.)
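For the curious, splitting a character code into its two nybbles takes one shift and one mask. This is a minimal Python sketch; the function name nybbles is my own.

```python
def nybbles(ch):
    """Split a character code into (high nybble, low nybble)."""
    code = ord(ch)
    high = code >> 4     # drop the lower four bits
    low = code & 0x0F    # keep only the lower four bits
    return high, low

for ch in "$AZ":
    high, low = nybbles(ch)
    # Each nybble is exactly one hexadecimal digit.
    print(ch, format(high, "X"), format(low, "X"))

print(2 ** 7)  # number of codes available in 7 bits
```

For any 7-bit ASCII code the high nybble never exceeds 7, since the eighth bit is unused.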

ASCII isn’t just a simple list of characters. It has a few tricks for programming using binary.

Lowercase letters always have a character code 32 (decimal) larger than their uppercase counterpart. To ‘toggle’ the case of a letter, you can ‘flip’ the second-to-largest of the 7 bits. Changing it from 1 to 0 raises the letter to uppercase; changing it from 0 to 1 lowers the case.
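The case trick can be sketched in a few lines of Python. The names below are my own, and the functions only make sense for the plain ASCII letters A to Z and a to z; feed them anything else and you’ll get some other character back.

```python
CASE_BIT = 0x20  # decimal 32, binary 0100000

def toggle_case(ch):
    """Flip the case bit with exclusive-or."""
    return chr(ord(ch) ^ CASE_BIT)

def to_upper(ch):
    """Clear the case bit (1 -> 0)."""
    return chr(ord(ch) & ~CASE_BIT)

def to_lower(ch):
    """Set the case bit (0 -> 1)."""
    return chr(ord(ch) | CASE_BIT)

print(toggle_case("a"), toggle_case("A"))
print(to_upper("z"), to_lower("Z"))
```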

Brackets were bracketed. Left brackets were placed close to their right brackets, so that a simple bit operation on one yields the other. For round brackets (hexadecimal 28 and 29) flipping the smallest binary digit is enough; for angle brackets (3C and 3E) it is the second digit. Square (5B and 5D) and curly (7B and 7D) brackets sit two codes apart and differ in both the second and third digits, so both must be flipped together. (Where a pair differs by a single bit, this also allows searching for either bracket, left or right, by ‘not caring’ about the value of that bit.)
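A quick way to check which bits separate each pair is to exclusive-or the two codes: every 1 in the result marks a bit where they differ. This is a small Python sketch with names of my own choosing.

```python
PAIRS = [("(", ")"), ("<", ">"), ("[", "]"), ("{", "}")]

def differing_bits(left, right):
    """XOR the two codes; each 1 bit marks a position where they differ."""
    return ord(left) ^ ord(right)

for left, right in PAIRS:
    diff = differing_bits(left, right)
    print(left, right, format(diff, "07b"), bin(diff).count("1"), "bit(s) differ")

def is_round_bracket(ch):
    """'Don't care' search: ignore the lowest bit, then compare with '('."""
    return (ord(ch) & ~0x01) == ord("(")

print([c for c in "f(x) * [y]" if is_round_bracket(c)])
```

On a standard ASCII table this reports one differing bit for round and angle brackets, and two for square and curly brackets.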

Similarly, numbers were placed in the ASCII code so that zero lay on hexadecimal 30 (decimal 48), so that the digits can easily be interconverted between their numerical and character values by using the lowest 4 bits of character codes 30 – 39 (i.e. the 0 – 9 portion, paired with a higher-order nybble* of 3).
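The digit trick, sketched in Python under the assumption that we only care about the ten ASCII digit characters. The function names are my own.

```python
def digit_value(ch):
    """Character '0'..'9' -> number 0..9, by keeping the low nybble."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:
        raise ValueError("not an ASCII digit: %r" % ch)
    return code & 0x0F

def digit_char(value):
    """Number 0..9 -> character '0'..'9', by adding a high nybble of 3."""
    if not 0 <= value <= 9:
        raise ValueError("not a single digit: %r" % value)
    return chr(0x30 | value)

print([digit_value(c) for c in "2499"])
print("".join(digit_char(n) for n in [1, 9, 9, 9]))
```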

Footnote

Oh, dear. Now I’m tempted to make weak jokes about higher- and lower-order bytes and nybbles. I’ll spare you, as you’re probably already wincing… If you think of any good ones, add them in the comments.

