I have a huge MySQL backup file (from mysqldump) with the tables in alphabetical order. My restore failed and I want to pick up where I left off with the next table in the backup file. (I have corrected the problem, this isn't really a question about MySQL restores, etc.)
What I would like to do is take my backup file, e.g. backup.sql, and trim off the beginning of the file until I see this line:
-- Table structure for table `mytable`
Then everything after that will end up in my result file, say backup-secondhalf.sql. This is somewhat complicated by the fact that the file is bzip2-compressed, but that shouldn't be too big of a deal.
I think I can do it like this:
$ bunzip2 -c backup.sql.bz2 | grep --text --byte-offset --only-matching -e '-- Table structure for table `mytable`' -m 1
This will give me the byte-offset in the file that I want to trim up to. Then:
$ bunzip2 -c backup.sql.bz2 | dd skip=[number from above] | bzip2 -c > backup-secondhalf.sql.bz2
Unfortunately, this requires me to run bunzip2 on the file twice and read through all those bytes twice.
Is there a way to do this all at once?
I'm not sure my sed-fu is strong enough to do a "delete all lines until regular expression, then let the rest of the file through" expression.
This is on Debian Linux, so I have GNU tools available.
Explanation:
Address range construction: start with the regex, end with the last line of input ($), then the command to apply to every line in that range.
Edit: depending on how you dumped the database you may have very long lines. GNU sed can handle them up to the amount of available memory.
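For illustration, an address range of that kind could be written as below. This is only a sketch, assuming GNU sed and a marker line that matches what mysqldump actually wrote; trim_before is a hypothetical helper name, and the printf sample stands in for the real dump.

```shell
# Hypothetical helper: print everything from the first line matching the
# regex in $1 through the end of input. Reads stdin, writes stdout.
trim_before() {
    # -n   suppress sed's default printing of every line
    # /$1/ start address: first line matching the regex
    # $    end address: last line of input
    # p    print each line inside that range
    sed -n "/$1/,\$p"
}

# Small demonstration on sample data instead of the real dump.
printf '%s\n' \
    'DROP TABLE IF EXISTS `a`;' \
    '-- Table structure for table `mytable`' \
    'CREATE TABLE `mytable` (id INT);' |
    trim_before '-- Table structure for table `mytable`'

# Against the compressed dump from the question, the pipeline might be:
# bunzip2 -c backup.sql.bz2 | trim_before '-- Table structure for table `mytable`' | bzip2 -c > backup-secondhalf.sql.bz2
```

The demonstration prints the marker line and everything after it. Because sed works as a filter here, the dump is decompressed and read only once.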
NOTE: Not an actual answer
Since I was motivated to get this solved now, I went ahead and used grep to find the offset in the file I wanted; it worked great.
Running dd unfortunately requires that you set ibs=1, which basically means no buffering, and performance is terrible. While waiting for dd to complete, I spent time writing my own custom-built C program to skip the bytes. After having done that, I see that tail could have done it for me just as easily.
I say "this doesn't answer my question" because it still requires two passes through the file: one to find the offset of the thing I'm looking for and another to trim the file.
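The two-pass approach with tail might look roughly like this. It is a sketch, assuming GNU grep for --byte-offset; skip_bytes is a hypothetical helper name, and the sample text stands in for the decompressed dump.

```shell
# Hypothetical helper: skip the first $1 bytes of stdin and copy the rest.
# grep --byte-offset reports a 0-based offset, while tail -c +N starts
# output at byte N counting from 1, hence the +1.
skip_bytes() {
    tail -c "+$(( $1 + 1 ))"
}

# Sample data standing in for the decompressed dump.
sample='DROP TABLE IF EXISTS `a`;
-- Table structure for table `mytable`
CREATE TABLE `mytable` (id INT);'

# First pass: find the byte offset of the marker.
offset=$(printf '%s\n' "$sample" |
    grep --text --byte-offset --only-matching -m 1 \
        -e '-- Table structure for table `mytable`' |
    cut -d: -f1)

# Second pass: drop everything before that offset.
printf '%s\n' "$sample" | skip_bytes "$offset"
```

Unlike dd with ibs=1, tail reads its input in large blocks, but the data still has to be decompressed once for each pass.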
If I were to go back to my custom program, I could implement a KMP during the "read-only" phase of the program and then switch-over to "read+write everything" after that.
I wonder if something like that would do the trick:
Basically, it starts printing once the pattern has been seen. You can also pipe the output directly to bzip2/gzip, like:
perl chop.pl input_sql.bz2 | bzip2 > out.sql.bz2
You would need libio-compress-perl on Debian.