Ping a Specific Port

Question

Christopher Schultz

Asked: 2021-12-01 11:09:04 +0800 CST2021-12-01 11:09:04 +0800 CST 2021-12-01 11:09:04 +0800 CST

Best way to remove text from the beginning of a huge file

772

I have a huge MySQL backup file (from mysqldump) with the tables in alphabetical order. My restore failed and I want to pick up where I left off with the next table in the backup file. (I have corrected the problem, this isn't really a question about MySQL restores, etc.)

What I would like to do is take my backup file, e.g. backup.sql and trim-off the beginning of the file until I see this line:

-- Table structure for `mytable`

Then everything after that will end up in my result file, say backup-secondhalf.sql. This is somewhat complicated by the fact that the file is bzip2-compressed, but that shouldn't be too big of a deal.

I think I can do it like this:

$ bunzip2 -c backup.sql.bz2 | grep --text --byte-offset --only-matching -e '--Table structure for table `mytable`' -m 1

This will give me the byte-offset in the file that I want to trim up to. Then:

$ bunzip2 -c backup.sql.bz2 | dd skip=[number from above] | bzip2 -c > backup-secondhalf.sql.bz2

Unfortunately, this requires me to run bunzip2 on the file twice and read-through all those bytes twice.

Is there a way to do this all at once?

I'm not sure my sed-fu is strong enough to do a "delete all lines until regular expression, then let the rest of the file through" expression.

This is on Debian Linux, so I have GNU tools available.

3 Answers

Voted

Mark Wagner · Answer 1 · 2021-12-01T15:54:33+08:00

Mark Wagner

2021-12-01T15:54:33+08:002021-12-01T15:54:33+08:00

bunzip2 -c backup.sql.bz2 | \
  sed -n '/-- Table structure for `mytable`/,$p'

Explanation:

-n suppress automatic printing of pattern space

Address range construction: Start with regex

/-- Table structure for  `mytable`/

End with

$ Match the last line.

Command

p Print the current pattern space.

Edit: depending on how you dumped the database you may have very long lines. GNU sed can handle them up to the amount of available memory.

8

Christopher Schultz · Answer 2 · 2021-12-01T18:31:28+08:00

Christopher Schultz

2021-12-01T18:31:28+08:002021-12-01T18:31:28+08:00

NOTE: Not an actual answer

Since I was motivated to get this solved now, I went ahead and used grep to find the offset in the file I wanted; it worked great.

Running dd unfortunately requires that you set ibs=1 which basically means no buffering, and performance is terrible. While waiting for dd to complete, I spent time writing my own custom-built C program to skip the bytes. After having done that, I see that tail could have done it for me just as easily:

$ bunzip2 -c restore.sql.bz2 | tail -c +[offset] | bzip2 -c > restore-trimmed.sql.bz2

I say "this doesn't answer my question" because it still requires two passes through the file: one to find the offset of the thing I'm looking for and another to trim the file.

If I were to go back to my custom program, I could implement a KMP during the "read-only" phase of the program and then switch-over to "read+write everything" after that.

2

mestia · Answer 3 · 2021-12-01T11:41:14+08:00

mestia

2021-12-01T11:41:14+08:002021-12-01T11:41:14+08:00

I wonder if something like that would do the trick:

use strict;
use warnings;
use feature 'say';

use IO::Uncompress::Bunzip2 '$Bunzip2Error';

my $file = $ARGV[0] // die "need a file";

my $zh = IO::Uncompress::Bunzip2->new( $file, {
    AutoClose   => 1,
    Transparent => 1,
} ) or die "IO::Uncompress::Bunzip2 failed: $Bunzip2Error\n";

my $trigger = undef;
while ( <$zh> ) {
    chomp;
    $trigger = 1 if $_ eq '-- Dumping data for table `experiments`';
    say if $trigger;
}

So basically it starts printing stuff after the pattern, one can also pipe it directly to bzip2/gzip, like perl chop.pl input_sql.bz2 | bzip2 > out.sql.bz2 You would need libio-compress-perl on Debian.

0

Best way to remove text from the beginning of a huge file

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?