I wrote a Tcl script that converted an *.mht (single-file web page) file written by Microsoft Word into a JavaScript-instrumented "flash cards" page to help me learn Chinese vocabulary.

Thing is, the MHT file sort of changed on me; the script (which uses htmlparse) no longer works.

I've gotten the idea in my head to write out a UTF-8 Unicode *.txt file, and sic Tcl on that.

The trouble is, I can't figure out how to decode the UTF-8 file in Tcl so that I can generate the Unicode HTML markup.

I've glanced about the Web, looking for a simple primer on reading Unicode, and I've looked through the ActiveState libraries, but I can't figure out the right way to handle this.

Can you help me?

Here's the source file, seen in Notepad:

Here's a script to dump that file in hex code, for scrutiny:

Here's what it looks like when you run the script:

A few pointers as to where I should go from here would be much appreciated.
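
One possible pointer, offered as a minimal sketch rather than a definitive answer: a Tcl channel can decode UTF-8 on read if it is configured with `-encoding utf-8`, and `scan` with `%c` then gives the code point of each character, which is enough to emit HTML numeric character references. The proc names `readUtf8File` and `toHtmlEntities` are hypothetical, and the sketch assumes the text stays within the Basic Multilingual Plane, which covers ordinary Chinese vocabulary.

```tcl
# Read a whole UTF-8 text file into a Tcl string.
# readUtf8File is a hypothetical helper name, not part of the original script.
proc readUtf8File {path} {
    set chan [open $path r]
    # Tell the channel to decode the bytes as UTF-8 instead of the system encoding.
    fconfigure $chan -encoding utf-8
    set text [read $chan]
    close $chan
    # Notepad usually writes a byte-order mark (U+FEFF) first; drop it if present.
    if {[string index $text 0] eq "\uFEFF"} {
        set text [string range $text 1 end]
    }
    return $text
}

# Convert a string to HTML text, writing every non-ASCII character as a
# numeric character reference and escaping the HTML special characters.
# toHtmlEntities is likewise a hypothetical name.
proc toHtmlEntities {text} {
    set out ""
    foreach ch [split $text ""] {
        # %c stores the character's code point as a decimal integer.
        scan $ch %c code
        if {$code > 127} {
            append out "&#$code;"
        } else {
            append out [string map {& &amp; < &lt; > &gt;} $ch]
        }
    }
    return $out
}
```

A quick check with a literal string instead of a file:

```tcl
# U+4F60 U+597D is the word "ni hao"
puts [toHtmlEntities "\u4f60\u597d = hello"]
# prints: &#20320;&#22909; = hello
```

Numeric references keep the generated page plain ASCII, so it should display correctly whatever encoding the browser guesses.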

Send those cards and letters to: