Wednesday, September 26, 2012

for line in f reads as unicode? python


http://stackoverflow.com/questions/147741/character-reading-from-file-in-python





With codecs.open(), reading Unicode from a file is simple, and each line comes back as a unicode object rather than a byte string:
import codecs
# codecs.open() decodes the bytes as they are read, so every line is unicode
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
    print repr(line)
f.close()
It's also possible to open files in update mode, allowing both reading and writing:
f = codecs.open('test', encoding='utf-8', mode='w+')
f.write(u'\u4500 blah blah blah\n')
f.seek(0)
print repr(f.readline()[:1])
f.close()
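As a rough sketch of the same read loop, the version below uses io.open, which exists in Python 2.6+ and is the same function as the built-in open in Python 3. The helper name read_lines, the temporary file, and the sample text are assumptions made up for illustration, not part of the original answer.

```python
import io
import os
import tempfile

def read_lines(path, encoding='utf-8'):
    """Return the lines of a text file as decoded (unicode) strings."""
    with io.open(path, encoding=encoding) as f:
        return [line for line in f]

# Hypothetical sample file, created only so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, mode='w', encoding='utf-8') as f:
    f.write(u'\u4500 blah blah blah\nsecond line\n')

lines = read_lines(path)
os.remove(path)

for line in lines:
    # Each line is a text (unicode) object, not a byte string.
    print(repr(line))
```

The with block closes the file even if decoding fails partway through, which is the main practical difference from calling f.close() by hand.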
EDIT: I'm assuming that your goal is just to read the file properly into a string in Python. If you're trying to convert from Unicode to an ASCII string, there is no direct way to do so, since many Unicode characters have no ASCII equivalent.
If you're trying to convert to an ASCII string, try one of the following:
  1. Replace the specific Unicode characters with ASCII equivalents, if you only need to handle a few special cases such as this particular example.
  2. Use the unicodedata module's normalize() function and the unicode object's encode() method to convert as best you can to the closest ASCII equivalent (Ref http://techxplorer.com/2006/07/18/converting-unicode-to-ascii-using-python/):
    >>> import unicodedata
    >>> teststr
    u'I don\xe2\x80\x98t like this'
    >>> unicodedata.normalize('NFKD', teststr).encode('ascii', 'ignore')
    'I donat like this'
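The two options above can be combined; the following is a minimal sketch, not a definitive implementation. The REPLACEMENTS table and the to_ascii name are hypothetical and cover only a handful of punctuation marks. Anything that has neither a table entry nor an ASCII base character after NFKD decomposition is silently dropped.

```python
import unicodedata

# Hypothetical mapping for option 1: a few common "smart" punctuation marks.
REPLACEMENTS = {
    u'\u2018': u"'",   # left single quotation mark
    u'\u2019': u"'",   # right single quotation mark
    u'\u201c': u'"',   # left double quotation mark
    u'\u201d': u'"',   # right double quotation mark
    u'\u2013': u'-',   # en dash
}

def to_ascii(text):
    """Best-effort conversion of a unicode string to ASCII-only text."""
    # Option 1: swap known characters for hand-picked ASCII equivalents.
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Option 2: decompose accented letters into base letter + combining mark,
    # then drop everything that still is not ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

result = to_ascii(u'I don\u2019t like caf\xe9s')
print(result)  # I don't like cafes
```

Running the replacement step first matters: the curly apostrophe has no NFKD decomposition, so without the table it would simply vanish from the output.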








