As seen elsewhere, docx, xlsx and pptx files are ZIPs. When uploading them to my web application, file (via libmagic and python-magic) detects them as being ZIP.
I store the contents of the file as a blob in the database, but naturally I don't want to trust the user about what kind of file this is. So I would like to rely on file for the type and automatically generate a filename during download.
I know one can modify /etc/magic, but the format (magic(5)) is way too complicated for me. I found a bug report on the issue at Debian bugs, but since it's from 2008 it doesn't seem to be fixed any time soon.
I guess my only other alternative is to indeed trust the user (but still store the contents as a blob) and only check the file extension based on the file name. This way I can disallow some extensions and allow others. And when the user re-downloads his file, he can have it in whatever way he uploaded it. But this solution is insecure if the file is shared with others, since you can simply rename the file to allow uploading it.
Any ideas?
Lastly, I found a list of magic numbers for docx etc., but I'm unable to convert these into the magic(5) format.
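Since python-magic is already in the picture, one way to sidestep magic(5) might be to look inside the ZIP container from Python. The sketch below is only a heuristic under stated assumptions: the function name sniff_ooxml and the marker table are made up for illustration, and it assumes Office files carry a [Content_Types].xml entry plus the conventional word/, xl/ or ppt/ folder, which is typical but not guaranteed for every producer.

```python
import io
import zipfile

# Conventional top-level folders in Office Open XML packages (an assumption:
# this is the usual layout, not something the format strictly requires).
OOXML_MARKERS = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def sniff_ooxml(data: bytes):
    """Return 'docx', 'xlsx' or 'pptx' for a likely OOXML blob, else None."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return None
    try:
        with zipfile.ZipFile(buf) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        # Looked like a ZIP at first glance but the directory is corrupt.
        return None
    if "[Content_Types].xml" not in names:
        return None
    for prefix, extension in OOXML_MARKERS.items():
        if any(name.startswith(prefix) for name in names):
            return extension
    return None
```

The returned extension could then be used to build the download filename, falling back to whatever file reports when the function returns None.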
You can use
in /etc/magic to identify the general file type based on the information you supplied.
(However, this might not be universal: PK\x03\x04\x00\x14\x08\x08 has been observed at the start of LibreOffice-generated XLSX files.)
Later versions of Ubuntu have a go at correctly identifying .docx, .pptx, and .xlsx files. Digging around in the source code for the file utility, I found the ~/file-5.09/magic/Magdir/msooxml file, which does the identification. You can get a copy of the file and add it to your /etc/magic file.
Including a copy of the file that has been updated to v1.5, but leaving v1.2 here for posterity.
Including a copy here as the link above can go out of date as the file package is updated.
Versions of file prior to 5.13 truncate the MIME type to 64 characters. So, using the content of msooxml, the MIME type from the file -bi command becomes "mime application/vnd.openxmlformats-officedocument.wordprocessingml.d; charset=binary" (the full type ends in wordprocessingml.document).
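If upgrading file is not an option, the truncated value can be mapped back to the full type in application code. This is a minimal sketch, not part of file or python-magic: the function name restore_truncated_mime is hypothetical, the list covers only the three common OOXML types, and it expects the bare MIME type with any "; charset=..." suffix already stripped.

```python
# Full MIME types for the three common Office Open XML formats.
FULL_OOXML_TYPES = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
)

# Length at which older versions of file cut the MIME type off.
TRUNCATED_LENGTH = 64


def restore_truncated_mime(mime: str) -> str:
    """Expand a 64-character truncated OOXML MIME type; pass others through."""
    if len(mime) != TRUNCATED_LENGTH:
        return mime
    matches = [full for full in FULL_OOXML_TYPES if full.startswith(mime)]
    # Only expand when the prefix identifies exactly one known type.
    return matches[0] if len(matches) == 1 else mime
```

Restricting the expansion to values of exactly 64 characters keeps short, unrelated types such as application/zip from being rewritten by accident.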
If you use docx files produced by LibreOffice, you can add the content below to /etc/magic: