How can I use docker without sudo?

Question

pa4080

Asked: 2017-09-30 07:29:33 +0800 CST2017-09-30 07:29:33 +0800 CST 2017-09-30 07:29:33 +0800 CST

Grep -E, Sed -E - low performance when '[x]{1,9999}' is used, but why?

772

When grep or sed are used with the option --extended-regexp and the pattern {1,9999} is a part of the regexp that is used, the performance of these commands become low. To be more clear, below are applied few tests.^[1] ^[2]

The relative performance of grep -E, egrep and sed -E is almost equal, so only the test that were made with grep -E are provided.

Test 1

$ time grep -E '[0-9]{1,99}' < /dev/null

real    0m0.002s

Test 2

$ time grep -E '[0-9]{1,9999}' < /dev/null

> real    0m0.494s

Test 3

$ time grep -E '[0123456789]{1,9999}' < /dev/null

> real    21m43.947s

Test 4

$ time grep -E '[0123456789]+' < /dev/null
$ time grep -E '[0123456789]*' < /dev/null
$ time grep -E '[0123456789]{1,}' < /dev/null
$ time grep -P '[0123456789]{1,9999}' < /dev/null

real    0m0.002s

What is the reason of this significant difference of the performance?

1 Answers

Voted

Stéphane Chazelas · Answer 1 · 2017-09-30T12:58:40+08:00

Note that it's not the matching that takes time, but the building of the RE. You'll find that it uses quite a lot of RAM as well:

$ valgrind grep -Eo '[0-9]{1,9999}' < /dev/null
==6518== HEAP SUMMARY:
==6518==     in use at exit: 1,603,530,656 bytes in 60,013 blocks
==6518==   total heap usage: 123,613 allocs, 63,600 frees, 1,612,381,621 bytes allocated
$ valgrind grep -Eo '[0-9]{1,99}' < /dev/null
==6578==     in use at exit: 242,028 bytes in 613 blocks
==6578==   total heap usage: 1,459 allocs, 846 frees, 362,387 bytes allocated
$ valgrind grep -Eo '[0-9]{1,999}' < /dev/null
==6594== HEAP SUMMARY:
==6594==     in use at exit: 16,429,496 bytes in 6,013 blocks
==6594==   total heap usage: 12,586 allocs, 6,573 frees, 17,378,572 bytes allocated

The number of allocs seems roughly proportional to the number of iterations, but the memory allocated seems to grow exponentially.

That's down to how GNU regexps are implemented. If you compile GNU grep with CPPFLAGS=-DDEBUG ./configure && make, and run those commands, you'll see the exponential effect in action. Going deeper than that would mean going through a lot of theory on DFA and dive into the gnulib regexp implementation.

Here, you can use PCREs instead that doesn't seem to have the same problem: grep -Po '[0-9]{1,65535}' (the maximum, though you can always do things like [0-9](?:[0-9]{0,10000}){100} for 1 to 1,000,001 repetitions) doesn't take more time nor memory than grep -Po '[0-9]{1,2}'.

Grep -E, Sed -E - low performance when '[x]{1,9999}' is used, but why?

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?