Sometimes I need to search for files whose names contain accented characters (diacritics in general), usually with `locate`/`mlocate`. I would like to set it up (maybe in `/etc/updatedb.conf`) so that it lets me search for these special characters using a certain language mapping, for example:
a == âàáäÂÀÂÄ
e == êèéëÊÈÉË
i == îïíÎÏ
o == ôöóÔÖ
u == ûùüÛÜÙ
c == çÇ
n == ñ
So `locate -i liberación` should also find file names containing the string `liberacion` and even `liberaciòn`.
Notes and assumptions
- And maybe others: ÂÃÄÀÁÅÆ ÇÈÉÊËÌÍÎÏ ÐÑÒÓÔÕÖØÙÚÛÜÝÞ ßàáâãäåæç èéêëìíîïðñòóôõö øùúûüýþÿ.
- This is a common situation in European languages such as Spanish, French and German.
- I'm always using a locale 100% UTF-8.
- I would rather not have to use regular expressions.
- A patch might use ASCII transliterations of Unicode, as Unidecode/cUnidecode does. Most of mlocate is written in C.
Related
- Similar question but using `find`
- Miloslav Trmač (`mlocate` developer) says here that the official source code is on pagure.io (and a fork on GitHub).
- I filed an issue on the mlocate repo at Pagure.io to add this feature.
  - Update 2018-02: This can be fixed with this pull request by marcotrevisan. It will add `-t`/`--transliterate` support, using `iconv` to match accented characters.
  - Update 2018-03: `mlocate` with support for `--transliterate` is now included in Ubuntu 18.04 LTS Bionic Beaver (v2 and v3.1).
If we take a look at `updatedb.conf(5)`, we'll find that there is not much we can do with configuration items.
So we are going to write a script around `locate`; at the end we will be able to run something like `my-locate.sh liberacion` or `my-locate.sh liberâciòn` and it will bring us all the possible combinations.
Let's start
First create a simple file as our database anywhere you want it to be, e.g. `~/.mydb`; then add your accented characters to that file.
Then we need a small script which does the job for us; I wrote a simple one.
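A minimal sketch of what such a file and script could look like. The file format (one group per line: the base letter followed by its accented variants, separated by spaces), the `MYDB` variable and the `build_pattern` helper are assumptions made for illustration, not the script from this answer. The idea is to turn the search term into a regular expression with one bracket class per mapped letter and hand it to `locate --regex`:

```sh
#!/bin/sh
# my-locate.sh: hypothetical sketch, not the original script.
# Assumed mapping file (default ~/.mydb), one group per line, e.g.:
#   a â à á ä
#   o ô ö ó ò
MYDB=${MYDB:-$HOME/.mydb}

# Print a regex in which every mapped letter of $1 becomes a bracket class.
build_pattern() {
    query=$1
    while read -r base variants; do
        [ -n "$base" ] || continue
        # Bracket class for this group, e.g. [aâàáä]
        class="[$base$(printf '%s' "$variants" | tr -d ' ')]"
        # Normalize: replace each accented variant with its base letter.
        for v in $variants; do
            query=$(printf '%s\n' "$query" | sed "s/$v/$base/g")
        done
        # Expand: replace the base letter with the whole class.
        query=$(printf '%s\n' "$query" | sed "s/$base/$class/g")
    done < "$MYDB"
    printf '%s\n' "$query"
}

# Run locate only when a search term is given.
if [ "$#" -gt 0 ]; then
    locate -i --regex -- "$(build_pattern "$1")"
fi
```

With a `~/.mydb` containing the two example lines from the comment, both `liberacion` and `liberâciòn` become the pattern `liber[aâàáä]ci[oôöóò]n`. Regex metacharacters in the search term are passed through unescaped, so this sketch is only meant for plain words.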
Now save it somewhere in your PATH with a desired name, e.g. in `~/bin`, which should already be in your PATH environment variable.
After that, simply run something like `my-locate.sh liberacion` to search for all possible combinations; it finds all the accented variants for me.
Now with mlocate 0.26 we have the `-t`/`--transliterate` option (see the man page) on Ubuntu 18.04+, without the need for workarounds.
To try it, create some test files, then update the database and search.
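A minimal sketch of those two steps, assuming the patched mlocate from Ubuntu 18.04 is installed. The scratch directory, the file names and the private database `demo.db` are hypothetical; a private database is used here so that the system-wide one is left alone and no root access is needed:

```sh
# Scratch directory with three spellings of the same name (hypothetical files).
demo_dir=$(mktemp -d)
touch "$demo_dir/liberacion.txt" \
      "$demo_dir/liberación.txt" \
      "$demo_dir/liberaciòn.txt"

# Index only the scratch directory into a private database.
# -l 0 turns off the visibility requirement, so this works as a normal user.
if command -v updatedb >/dev/null 2>&1; then
    updatedb -l 0 -U "$demo_dir" -o "$demo_dir/demo.db"
    # -t needs the transliteration patch; report instead of aborting if it is missing.
    locate -d "$demo_dir/demo.db" -t 'liberación' ||
        echo "no match, or this locate does not support -t" >&2
fi
```

If transliteration works as described, the search should list all three files.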
So now `locate -t liberación` also finds files whose names contain the string `liberacion` and even `liberaciòn`!
Finally, I created an alias in my `.bashrc` :-)
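One way to write such an alias, as a sketch. Shadowing `locate` itself and adding `-i` are assumptions here; pick whatever alias name and flags suit you:

```sh
# In ~/.bashrc: make every interactive `locate` call transliterate (-t)
# and ignore case (-i). Requires a locate that supports -t.
alias locate='locate -t -i'
```

Open a new terminal (or run `source ~/.bashrc`) for the alias to take effect.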