The file command tells me the encoding of a file, plus the line-ending style if it is not LF. But it reports ASCII for both ANSI files and UTF-8 files without a BOM. For UTF-8 files with a BOM it reports UTF-8 Unicode (with BOM).
Am I doing something wrong, or is this the default behavior? And if it is the default behavior, how can I tell whether a file is ANSI or UTF-8 without BOM?
file tries to give you information that is as specific as possible (the opposite extreme would be to always print binary file, which is technically correct but not very useful). ANSI is not one specific encoding, and UTF-8 is a superset of ASCII, so file reports ASCII for both whenever every byte in the file falls within the ASCII range.

You cannot determine whether such a file is encoded in ASCII, ANSI, or UTF-8 (without BOM): what file reports is only a guess. When a file has a BOM, file will guess that it is UTF-encoded (UTF-8, UTF-16, or UTF-32). Without one, all you see is a binary stream of data that could be text in some encoding. I bet that file will also fail to tell ASCII from ISO-8859-1 for such content, because the first 128 code points are the same in both encodings (as in ANSI).
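
To make the ambiguity concrete, here is a minimal Python sketch of this kind of byte-level guessing. It is not how file is actually implemented; the function name classify and the labels it returns are made up for illustration. It only shows that pure 7-bit content looks identical in ASCII, UTF-8, and any ANSI code page, while non-ASCII bytes at least allow a guess.

```python
def classify(data: bytes) -> str:
    """Return a coarse guess at the text encoding of raw bytes.

    Hypothetical helper for illustration only; the labels are not
    the strings that the file command prints.
    """
    # A UTF-8 byte order mark is the only explicit marker we can rely on.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"

    # Only 7-bit bytes: the content is byte-for-byte identical in ASCII,
    # UTF-8 and every ANSI code page, so nothing can tell them apart.
    if all(byte < 0x80 for byte in data):
        return "ascii (equally valid as UTF-8 or ANSI)"

    # Non-ASCII bytes are present: check whether they form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "some 8-bit encoding (possibly an ANSI code page)"
    return "utf-8 without BOM"


if __name__ == "__main__":
    samples = {
        "plain text": "hello".encode("ascii"),
        "UTF-8, no BOM": "héllo".encode("utf-8"),
        "Windows-1252": "héllo".encode("cp1252"),
        "UTF-8 with BOM": b"\xef\xbb\xbf" + "héllo".encode("utf-8"),
    }
    for name, raw in samples.items():
        print(f"{name}: {classify(raw)}")
```

So a file saved as ANSI or as UTF-8 without BOM can only be told apart once it contains at least one non-ASCII character, and even then the result is a guess, not a certainty.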