I have Mercurial repositories running on Apache with mod_wsgi. All filenames in the repositories are encoded in windows-1251. This encoding is used for historical reasons: the repositories were converted to Mercurial from svn, and windows-1251 is the default Windows encoding for the Russian locale.
Now the programmers want to use the Crucible tool for code review. It can't understand filenames in any encoding other than utf-8, so I need to convert them from windows-1251 to utf-8. Does anyone know how to do this? The Mercurial convert extension doesn't have options to convert encodings.
hgweb.config:
[web]
#encoding = UTF-8
encoding = windows-1251
#allow_archive = gz, zip, bz2
allow_archive = zip
allow_push = *
push_ssl = false
[extensions]
[collections]
/data/mercurial = /data/mercurial
You are right that the convert extension doesn't currently support this in a nice way. That is, you cannot ask it to recode from encoding X to encoding Y. However, you can ask it to rename the files one by one for you! First create a file called rename.py containing a script that writes a rename entry for each file name. Then run it over the file names in the repository.
This creates your file map. You can now use hg convert with that file map to convert the repository into a new repository. In the new repository, it will look like the files have always been saved using UTF-8 file names.
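The script and commands from this answer did not survive in the text above, so here is a minimal sketch of what a rename.py along these lines could look like. It is an assumption, not the answer's original code: the function name filemap_line, the names.txt input file and the old-repo/new-repo paths are placeholders, and it relies on the convert extension's filemap accepting quoted paths in its rename directive.

```python
import sys

def filemap_line(name, src="windows-1251", dst="utf-8"):
    """Build one filemap 'rename' directive for a raw file name (bytes).

    Note: names containing double quotes or backslashes would need
    extra escaping, which this sketch does not attempt.
    """
    recoded = name.decode(src).encode(dst)
    return b'rename "' + name + b'" "' + recoded + b'"'

def main(names_path):
    # names_path holds one file name per line, for example produced by:
    #   hg manifest --all > names.txt
    with open(names_path, "rb") as names:
        for raw in names:
            name = raw.rstrip(b"\r\n")
            # Pure ASCII names are identical in both encodings: skip them.
            if name and not name.isascii():
                sys.stdout.buffer.write(filemap_line(name) + b"\n")

# Assumed usage (repository paths are placeholders):
#   python rename.py names.txt > filemap
#   hg convert --filemap filemap old-repo new-repo
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Working on bytes rather than text keeps the original windows-1251 name intact on the left side of each directive, which is what the converter has to match against the old history.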
Note: The file names are now stored as UTF-8 in the repository. This means that checkouts will look fine on modern Linux machines. Windows, however, does not use UTF-8 file names. The FixUtf-8 extension must be used to make Mercurial convert the UTF-8 file names into UTF-16 on the fly. This will create readable file names on Windows too.
Note: Everybody will have to re-clone the new repository! Changing any part of the history inevitably changes all the changeset hashes too. So to pull this off, you need to either
convert the repository on the server and have everybody re-clone it,
or
have every user run the same conversion on their own clone.
Either way works since the conversion is deterministic, so your users can run it themselves if they have Python available. If they only have a TortoiseHg installation, then it's probably easiest if you convert for them on your server.
I looked at making the convert extension support this more directly and have sent a patch for it to the Mercurial mailing list.
I had the same problem. I needed to convert a bunch of repositories, so I wrote a script that converts all repositories given as a list.
usage:
You can get it from my repository at BitBucket.
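The script itself is not reproduced here, so the following is only a rough sketch of how such a batch conversion might be put together, not the author's code. The helper names build_filemap and convert_repo, the ".filemap" file name and the "-utf8" suffix for the new repository are all assumptions; the hg invocations use the manifest --all command and the convert extension's --filemap option.

```python
import subprocess
import sys
from pathlib import Path

def build_filemap(manifest, src="windows-1251", dst="utf-8"):
    """Turn raw 'hg manifest --all' output (bytes) into filemap content."""
    lines = []
    for name in manifest.split(b"\n"):
        name = name.rstrip(b"\r")
        if not name:
            continue
        recoded = name.decode(src).encode(dst)
        if recoded != name:  # ASCII-only names need no rename
            lines.append(b'rename "' + name + b'" "' + recoded + b'"')
    return b"\n".join(lines) + b"\n"

def convert_repo(repo, suffix="-utf8"):
    """Convert one repository into a sibling '<repo><suffix>' repository."""
    manifest = subprocess.run(
        ["hg", "-R", repo, "manifest", "--all"],
        check=True, capture_output=True,
    ).stdout
    filemap = Path(repo + ".filemap")
    filemap.write_bytes(build_filemap(manifest))
    target = repo + suffix
    # Enable the convert extension just for this call.
    subprocess.run(
        ["hg", "--config", "extensions.convert=", "convert",
         "--filemap", str(filemap), repo, target],
        check=True,
    )
    return target

# Assumed usage: python convert_all.py /data/mercurial/repo1 /data/mercurial/repo2
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        print(convert_repo(path))
```

Each repository gets its own file map because the set of non-ASCII names differs per repository.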
Just an extract from the Mercurial wiki, FYI.
Thus, I suppose, just changing the presentation charset in
encoding =
may do the trick. If this assumption is wrong (it's always possible), try the FixUtf8 extension, and read the "Fixing existing filenames" part of its readme carefully.