
Handling Unicode filenames in Windows with Python 3.0

You will find this post useful if:
1. You are on Windows
2. You are using Python 3.0 or above
3. You are having problems with os.walk or os.listdir because some of your filenames contain non-ASCII (Unicode) characters

While using os.walk, you encounter an error similar to the one below.
  File "C:\Python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 226: character maps to <undefined>
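
The traceback points at the cp437 codec, which is the code page a Windows console commonly uses, so the error most likely arises when a filename is encoded for output rather than inside os.walk itself. A minimal sketch that reproduces the failure without touching the filesystem, assuming cp437 is the encoding in play (the filename is made up for illustration):

```python
# Sketch: reproduce the encoding failure on its own.
# Assumption: the target encoding is cp437, as the traceback suggests.
name = 'COUNTRY \u2013 ROADS.txt'  # hypothetical filename containing an en dash

try:
    name.encode('cp437')  # cp437 has no mapping for U+2013
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    # Show the offending character as an ASCII-only escape.
    print('cannot encode:', ascii(exc.object[exc.start:exc.end]))
```

If this raises on your machine too, the filename itself is fine; it is the conversion to the console's code page that fails.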

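What would have gotten you going in Python 2.7 but will not work in Python 3.0:
import os
rootdir = ur'D:\COUNTRY\ROADS'  # no trailing backslash: a raw string cannot end with one
for (root, dirnames, filenames) in os.walk(rootdir):
    pass  # do your stuff here
Python 3.0 does not accept the ur prefix and will bluntly report a SyntaxError.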

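What will work (or, to be more precise, worked on my system!):
import os
rootdir = r'D:\COUNTRY\ROADS'  # no trailing backslash: a raw string cannot end with one
for (root, dirnames, filenames) in os.walk(rootdir.encode('utf-8')):
    pass  # do your stuff here; root, dirnames and filenames are bytes objects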
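
If you would rather keep working with ordinary strings, a different approach is to leave rootdir as str and escape only what the console cannot display. This is a sketch of that alternative rather than the fix described above; console_safe and list_files are hypothetical helper names, and falling back to 'ascii' when no console encoding is reported is an assumption.

```python
import os
import sys

def console_safe(name, encoding=None):
    """Return name with characters the target encoding lacks shown as \\uXXXX escapes."""
    if isinstance(name, bytes):
        # os.walk yields bytes when it is given a bytes path
        name = os.fsdecode(name)
    # Assumption: fall back to ASCII when stdout reports no encoding
    encoding = encoding or sys.stdout.encoding or 'ascii'
    return name.encode(encoding, errors='backslashreplace').decode(encoding)

def list_files(rootdir):
    """Walk rootdir and return the console-safe full path of every file."""
    paths = []
    for root, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            paths.append(console_safe(os.path.join(root, filename)))
    return paths
```

Calling print on each entry of list_files(r'D:\COUNTRY\ROADS') should then not raise UnicodeEncodeError, at the cost of showing escapes in place of characters the console cannot render.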

Useful links:
  1. If you are on Python 2.6 and face a similar problem, check out this thread on Stack Overflow.
  2. Python 2.7 Unicode HOWTO (lots of detail and good material on Unicode)
  3. A general discussion of what's changing in Python 3.0 (aka Py3K, Python 3000), especially with respect to Unicode.

