On a remote server I have a file which is the compressed output of mongodump, say a file called mongodb.tar.gz. Inside mongodb.tar.gz there is a directory structure like this:
dump/dbname/
dump/dbname/blogs.bson
dump/dbname/blogs.metadata.json
dump/dbname/editors_choice.bson
dump/dbname/editors_choice.metadata.json
...
Is there any way to restore this dump without downloading and uncompressing the entire file locally?
I mean something like:
curl http://remoteserver/mongodb.tar.gz | gunzip | mongorestore -d dbname
You can only pipe compressed files which contain a single collection.
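You could do something like this (a sketch: the URL and the collection name are placeholders, and it assumes the remote file is a single gzipped .bson file rather than the whole tarball):

curl http://remoteserver/blogs.bson.gz | gunzip -c | mongorestore --db dbname --collection blogs -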
The -c gunzip option is needed so that it writes to stdout, and the final - tells mongorestore to read its input from stdin. Tested with version 3.0.7 (doesn't work with v2.6.4).
At the moment, this is not possible, at least not without writing something yourself. The feature has been requested as SERVER-4345 and SERVER-5190 but there are several issues with an immediate implementation based on how the current tools work (i.e. it is not simple to do).
Although this is only a partial answer, you could use FUSE to mount the .tar.gz file after downloading it.
Seeking a direct answer to the other part, I asked question 730494.
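For example, with archivemount (one FUSE tool that can mount tar.gz archives; the answer doesn't name a specific tool, and the mount point is a placeholder):

archivemount mongodb.tar.gz /mnt/dump
mongorestore -d dbname /mnt/dump/dump/dbname
fusermount -u /mnt/dump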
Well, I did it, and it wasn't pretty. What I did was first extract only the metadata files from the tarball, since they can't be piped directly into the mongorestore command, which accepts only BSON on stdin.
After extracting the metadata I ran two restores: first a normal mongorestore with the folder as a parameter, to restore the metadata.
Then, in the second restore, I read the names of the BSON files from a file I had created earlier, and for each file I extracted it from the tarball to stdout and piped the result to mongorestore. Yes, it was messy, but hey, it works! A condensed sketch of the idea follows.
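Roughly, the two-pass approach looks like this (a sketch, not the repo's exact script; the db and file names are illustrative, and --wildcards/-O assume GNU tar):

# pass 1: extract only the metadata files and restore them
tar -xzf mongodb.tar.gz --wildcards 'dump/dbname/*.metadata.json'
mongorestore -d dbname dump/dbname
# pass 2: stream each BSON file straight out of the tarball into mongorestore
tar -tzf mongodb.tar.gz | grep '\.bson$' > bson_files.txt
while read -r f; do
  coll=$(basename "$f" .bson)
  tar -xzOf mongodb.tar.gz "$f" | mongorestore -d dbname -c "$coll" -
done < bson_files.txt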
To see the abomination in its full glory here's the repo: https://github.com/datascienceproject2019-codescoop/codescoop-models
And here's the script https://github.com/datascienceproject2019-codescoop/codescoop-models/blob/master/commands.sh
The restore script is in a different file since piping to docker exec is too difficult: https://github.com/datascienceproject2019-codescoop/codescoop-models/blob/master/gh_mongo_scripts/restore.sh
I used Mongo 4.0.6
EDIT: Buuut it is a lot slower to use streams than to just read from the extracted files. So I probably did this all for nothing, since temporarily extracting 26 GB of extra files isn't that big of a deal. Oh well.