As seen elsewhere, docx, xlsx and pptx files are ZIP archives. When they are uploaded to my web application, file (via libmagic and python-magic) detects them as ZIP.
I store the contents of the file as a blob in the database, but naturally I don't want to trust the user about what type of file it is. So I would like to rely on file for that and automatically generate a filename during download.
I know one can modify /etc/magic, but the format (magic(5)) is way too complicated for me. I found a bug report on the issue at Debian bugs, but since it's from 2008 it doesn't look like it will be fixed any time soon.
I guess my only other alternative is to indeed trust the user (but still store the contents as a blob) and only check the file extension taken from the file name. This way I can disallow some extensions and allow others, and when the user re-downloads his file, he gets it back under the name he uploaded it with. But this solution is insecure if the file is shared with others, since anyone can simply rename a file to get it past the extension check.
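To make that fallback concrete, this is roughly the extension check I have in mind. It is only a sketch: the names ALLOWED_EXTENSIONS and extension_allowed are made up for illustration, and the allowlist contents are an assumption.

```python
import os

# Hypothetical allowlist; the real set of permitted extensions is up to the application.
ALLOWED_EXTENSIONS = {"docx", "xlsx", "pptx"}


def extension_allowed(filename: str) -> bool:
    """Return True if the file name ends in an allowed extension.

    Only the name is inspected, so a renamed file passes this check.
    """
    # splitext returns e.g. ("report", ".DOCX"); normalise to "docx".
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext in ALLOWED_EXTENSIONS
```

This only looks at the last extension, so report.docx.exe is rejected, but it says nothing about what the bytes actually are.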
Any ideas?
Lastly, I found a list of magic numbers for docx etc., but I'm unable to convert these into the magic(5) format.
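The only workaround I can think of that avoids magic(5) altogether is to open the upload as a ZIP and look at what is inside. Below is a rough sketch using only the standard zipfile module. It assumes the conventional part names that Office normally writes (word/document.xml, xl/workbook.xml, ppt/presentation.xml) next to [Content_Types].xml. The function name guess_ooxml_extension and the mapping are my own, and I'm not sure how robust this is against unusual or hand-crafted files.

```python
import io
import zipfile

# Assumed mapping from the conventional main part of each package type to an
# extension. These are the names Office normally uses; other producers may differ.
OOXML_MAIN_PARTS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def guess_ooxml_extension(data: bytes):
    """Return 'docx', 'xlsx', 'pptx', 'zip' for other ZIPs, or None if not a ZIP."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return None

    # Office packages carry a content-type manifest at the archive root.
    if "[Content_Types].xml" not in names:
        return "zip"

    for part, ext in OOXML_MAIN_PARTS.items():
        if part in names:
            return ext
    return "zip"
```

The idea would be to call this on the stored blob when file reports ZIP, and use the result to generate the download filename instead of trusting the uploaded name.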