I have the problem described in this Q&A: probably from quite old Linux distros or from Windows, I have several files with broken filenames. ls displays a "?" instead of the broken character. I successfully renamed some of these files, but I don't know whether I've found all of them.

Is there any method to find all affected files?
Assuming you are using UTF-8 encoding (the default in Ubuntu), this script should hopefully identify the broken filenames and rename them for you.

It works by using find in the C locale (ASCII) to locate filenames containing unprintable characters. It then tries to determine whether those characters form valid UTF-8. If they do not, it shows you the filename decoded with each of the encodings listed in the enc array, letting you select the one that looks right and rename the file. latin1 was commonly used on older Linux systems, and windows-1252 is commonly used by Windows nowadays (I think). iconv -l will show you a list of possible encodings.

Try this:
This will locate all non-ASCII characters in file and folder names, and help you to find the guilty culprits :P

Begin with this find/grep command and modify it until it matches only the names you are interested in:

    find . | grep -E '[^a-zA-Z0-9 _./-]'

The pattern must be quoted so the shell cannot expand it, and \s is not valid inside a POSIX bracket expression, so a literal space stands in for it (with - placed last so it is taken literally). The command above finds filenames that contain any character outside that whitelisted set.
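If you want to flag only names containing bytes outside printable ASCII (rather than everything outside a whitelist), a variant, assuming GNU grep, is to force the C locale:

    LC_ALL=C find . | grep -E -a '[^ -~]'

In the C locale, [^ -~] matches any byte below space or above tilde, and -a makes grep treat the input as text even if the raw high bytes would otherwise make it classify the stream as binary.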
Here's another version, extremely simple, that uses the unidecode utility (install it with sudo apt install python3-unidecode), which attempts as best it can to convert Unicode characters to "equivalent" ASCII characters. Obviously this comes with caveats; see https://pypi.org/project/Unidecode/

It first renames directories, then files (changing directory names and file names in the same pass might have unpredictable results, and the directory pass must run depth-first so that a directory's contents are handled before the directory itself is renamed).

(With a few minor changes you can make it just print out what would be changed, instead of making the actual changes.)
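The script in question isn't shown above; here is a minimal sketch of what it might look like, assuming bash and the python3-unidecode package (names containing embedded newlines are not handled):

    #!/bin/bash
    # Sketch only: rename directories (deepest first), then files,
    # replacing each name with its unidecode ASCII transliteration.

    to_ascii() {
        python3 -c 'import sys; from unidecode import unidecode; sys.stdout.write(unidecode(sys.argv[1]))' "$1"
    }

    rename_ascii() {
        local path dir base ascii
        while IFS= read -r path; do
            dir=$(dirname "$path")
            base=$(basename "$path")
            ascii=$(to_ascii "$base")
            if [ "$base" != "$ascii" ]; then
                echo "$path -> $dir/$ascii"
                mv -n -- "$path" "$dir/$ascii"   # -n: never overwrite
            fi
        done
    }

    # Directories first, depth-first so children are handled before parents;
    # then regular files.
    find . -mindepth 1 -depth -type d | rename_ascii
    find . -type f | rename_ascii

Commenting out the mv line turns this into a dry run that only prints the renames it would perform.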