Python u'\u00b0' returns u'\xb0'. Why?
I use Python 2.7.10.
While dealing with character encoding, and after reading a lot on Stack Overflow and elsewhere on the subject, I encountered a behaviour that looks strange to me. The Python interpreter input
>>> u'\u00b0'
results in the following output:
u'\xb0'
I can repeat this behaviour using a DOS window, the IDLE console, and the Wing IDE Python shell.
My assumptions (correct me if I am wrong): the "degree symbol" has Unicode code point 0x00b0, UTF-8 encoding 0xc2 0xb0, and Latin-1 code 0xb0. The Python docs say that a string literal with the u prefix is encoded using Unicode.
Question: why is the result converted from a Unicode string literal to a byte escape sequence that matches the Latin-1 encoding, instead of keeping the Unicode escape sequence?
Thanks in advance for any help.
Python uses some rules for determining what to output from repr for each character. The rule for Unicode character codepoints in the 0x0080 to 0x00ff range is to use the sequence \xdd, where dd is the hex code, at least in Python 2. There's no way to change it. In Python 3, all printable characters are displayed without converting them to a hex code.
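A minimal sketch to illustrate that the two escapes are only different spellings of the same one-character Unicode string, so nothing is converted to bytes. The variable names are just for illustration, and the snippet is written so it should run on both Python 2.7 and Python 3.3+:

```python
# Two spellings of the same literal: \u00b0 and \xb0 both mean code point U+00B0.
a = u'\u00b0'
b = u'\xb0'

print(a == b)       # True: same string, only the escape notation differs
print(len(a))       # 1: a single Unicode character, not a byte sequence
print(hex(ord(a)))  # 0xb0: the code point itself

# repr() picks the shortest escape form in Python 2 (u'\xb0');
# Python 3 shows the printable character directly instead.
print(repr(a))
```

The repr output is only a display choice; the stored value is still a Unicode string.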
As for why it looks like the Latin-1 encoding, it's because Unicode started with Latin-1 as its base. All the codepoints up to 0xff match their Latin-1 counterpart.
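As a rough check of that correspondence, the following sketch encodes the degree sign both ways and then decodes all 256 byte values as Latin-1 to compare them with their code points. The names are illustrative only, and the code is meant to run on both Python 2.7 and Python 3:

```python
degree = u'\u00b0'

# bytearray yields integers on both Python 2 and 3, which makes the bytes easy to inspect.
latin1 = bytearray(degree.encode('latin-1'))
utf8 = bytearray(degree.encode('utf-8'))
print([hex(b) for b in latin1])  # ['0xb0']
print([hex(b) for b in utf8])    # ['0xc2', '0xb0']

# Decode every possible byte value as Latin-1 and compare with the code points.
all_bytes = bytearray(range(256))
text = all_bytes.decode('latin-1')
matches = [ord(ch) for ch in text] == list(range(256))
print(matches)  # True: byte value N maps to code point N for 0x00..0xff
```

So the \xb0 in the repr happens to equal the Latin-1 byte value, but only because the first 256 code points were defined to line up with Latin-1.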