unicode - Python u'\u00b0' returns u'\xb0'. Why?

I use Python 2.7.10.

While dealing with character encoding, and after reading a lot of Stack Overflow posts and other material on the subject, I encountered behaviour that looks strange to me. The Python interpreter input

>>> u'\u00b0'

results in the following output:

u'\xb0'

I can reproduce this behaviour using a DOS window, the IDLE console, and the Wing IDE Python shell.

My assumptions (correct me if I'm wrong): the "degree symbol" has the Unicode code point 0x00b0, the UTF-8 encoding 0xc2 0xb0, and the Latin-1 code 0xb0. The Python docs say that a string literal with the u prefix is encoded using Unicode.

Question: why is the result converted from a Unicode string literal into a byte escape sequence that matches the Latin-1 encoding, instead of keeping the Unicode escape sequence?

Thanks in advance for any help.

Python uses some rules for determining what repr outputs for each character. The rule for Unicode code points in the 0x0080 to 0x00ff range is to use the sequence \xdd, where dd is the hex code, at least in Python 2. There's no way to change it. In Python 3, all printable characters are displayed without being converted to a hex code.
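To make it concrete that \xb0 here is only a display form and not a byte, a minimal sketch follows. It is written to run on both Python 2.7 and Python 3.3+, and the variable names are illustrative only. Both spellings of the literal produce the same one-character Unicode string, and the UTF-8 bytes appear only after an explicit encode.

```python
# Two spellings of the same Unicode string: DEGREE SIGN, code point U+00B0.
s = u'\u00b0'

# The \xb0 escape in a unicode literal names the same code point, not a byte.
same_string = (s == u'\xb0')

# One character, whose code point is 0xb0.
length = len(s)
code_point = ord(s)

# Bytes exist only after encoding; UTF-8 needs two bytes for this character.
utf8_bytes = s.encode('utf-8')

print(same_string)
print(length)
print(hex(code_point))
```

So the repr output changes only how the string is shown; the object is still a Unicode string holding a single code point.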

As for why it looks like the Latin-1 encoding, it's because Unicode started with Latin-1 as its base. All the code points up to 0xff match their Latin-1 counterparts.
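The overlap with Latin-1 can be checked directly. The following is a small sketch, again intended to run on both Python 2.7 and Python 3, with illustrative variable names: it decodes every possible byte value as Latin-1 and compares the resulting code points with the byte values.

```python
# All 256 byte values, 0x00 through 0xff, as one byte string.
all_bytes = bytes(bytearray(range(256)))

# Decoding as Latin-1 maps each byte to exactly one Unicode character.
decoded = all_bytes.decode('latin-1')

# Each character's code point equals the byte value it was decoded from.
code_points = [ord(ch) for ch in decoded]
matches = (code_points == list(range(256)))

# Going the other way, the degree sign encodes to the single byte 0xb0.
degree_latin1 = u'\u00b0'.encode('latin-1')

print(matches)
```

This is why the \xb0 in the repr happens to coincide with the Latin-1 byte for the degree sign, even though no encoding step is involved in producing the repr.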
