Tuesday, September 18, 2012

Python writing UTF-8 files

Some steps to take when creating UTF-8 files with Python:
1. Use the codecs module to read and write files. Note that "utf-8-sig" writes a byte order mark (BOM) at the start of the file; use "utf-8" if you don't want one:
    import codecs
    f = codecs.open('file.txt', mode="w", encoding="utf-8-sig")
2. Don't mix byte strings (str) and unicode objects. Always prefix text literals with u (e.g. u"hello world").
3. Start your Python script with # -*- coding: utf-8 -*- if the source contains any non-ASCII characters, whether in code or in hardcoded strings.

4. Also read http://lobstertech.com/python_unicode.html and http://www.evanjones.ca/python-utf8.html for more info on using UTF-8/unicode with Python.
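
The steps above can be put together in a short sketch. The helper names write_utf8 and read_utf8, the with_bom parameter and the temporary file path are made up for illustration; only codecs.open and the "utf-8" / "utf-8-sig" encodings come from the steps themselves. Non-ASCII characters are written as escapes so the snippet does not depend on how the script itself is saved.

```python
# -*- coding: utf-8 -*-
# The line above only matters on Python 2 when the source holds non-ASCII characters.
import codecs
import os
import tempfile


def write_utf8(path, text, with_bom=False):
    """Write unicode text to path as UTF-8 (hypothetical helper).

    with_bom=True selects "utf-8-sig", which prepends a byte order mark.
    """
    encoding = "utf-8-sig" if with_bom else "utf-8"
    with codecs.open(path, mode="w", encoding=encoding) as f:
        f.write(text)


def read_utf8(path):
    """Read a UTF-8 file as unicode text; "utf-8-sig" drops a BOM if present."""
    with codecs.open(path, mode="r", encoding="utf-8-sig") as f:
        return f.read()


# Unicode literal with the u prefix: e-acute, o-umlaut and a snowman.
sample = u"h\xe9llo w\xf6rld \u2603"

demo_path = os.path.join(tempfile.mkdtemp(), "file.txt")
write_utf8(demo_path, sample, with_bom=True)

# Inspect the raw bytes to see what actually landed on disk.
with open(demo_path, "rb") as raw_file:
    raw_bytes = raw_file.read()

has_bom = raw_bytes.startswith(codecs.BOM_UTF8)
round_trip = read_utf8(demo_path)
```

Reading with "utf-8-sig" is a convenient default here because it accepts files both with and without a BOM.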

Other sources:
http://docs.python.org/howto/unicode.html
http://www.carlosble.com/2010/12/understanding-python-and-unicode/
