I have been running the script below on a Red Hat server, where it works fine and finishes the job. The file I feed it contains about half a million lines (approximately 500,000 lines), which is why (to finish faster) I added an '&' at the end of the while-loop block.
But now I have set up a desktop with 8 GB of RAM running Ubuntu 18.04, and running the same code there finishes only a few thousand lines and then hangs. I read a bit about it and raised the stack limit to unlimited as well, but it still hung after 80,000 lines or so. Any suggestions on how I can optimize the code or tune my PC parameters to always finish the job?
while read -r CID60
do
    {
        OLT=$(echo "$CID60" | cut -d"|" -f5)
        ONID=${OLT}:$(echo "$CID60" | cut -d, -f2 | sed 's/ //g ; s/).*|//')
        echo $ONID,$(echo "$CID60" | cut -d"|" -f3) >> $localpath/CID_$logfile.csv
    } &
done < $localpath/$CID7360
Input:
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN46| Unlocked|12-654-0330|Down|202-00_MSRFKH00OL6|P282017856.C881 ( local, R1.S1.LT7.PON8.ONT81.C1.P1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN52| Unlocked|12-664-1186|Up|202-00_MSRFKH00OL6|P282012623.C2028 ( network, R1.S1.LT7.PON8.ONT75.SERV1 )|
Output:
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
The output of interest is the 5th column (the fields are separated with a pipe, |) concatenated with part of the last column, and then the third column.
A pure sed solution:
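A minimal sketch of such a sed command, assuming GNU sed (-E for extended regular expressions) and placeholder file names infile and out.csv:

# capture field 3 (\1), field 5 (\2) and the name after the comma (\3),
# then print them in the prescribed order with the prescribed separators
sed -E 's/[^|]*\|[^|]*\|([^|]*)\|[^|]*\|([^|]*)\|[^,]*, ([^ ]*) \).*/\2:\3,\1/' infile > out.csv

A single sed process reads the whole file, so there is no per-line process creation as in the original loop.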
Another approach uses GNU parallel, where a bash function does the filtering on chunks of the file:

-k : keep the order, so the first/last line of the input will also be the first/last line of the output
--pipepart : splits the file on the fly
--block -1 : into 1 chunk per CPU thread
-a input.txt : the file to split
doit : the command (or bash function) to call
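A minimal sketch of the invocation, assuming doit wraps one of the stdin-to-stdout filters from this page (here an awk filter; input.txt and out.csv are placeholder names):

doit() {
    # any filter that reads stdin and writes stdout works here
    awk -F '|' '{ split($6, a, ","); gsub(/[ )]/, "", a[2]); print $5 ":" a[2] "," $3 }'
}
export -f doit    # make the function visible to the shells that parallel spawns

parallel -k --pipepart --block -1 -a input.txt doit > out.csv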
Speedwise, the parallel version (yellow) outperforms the tr version (black) at around 200 MB on my system (seconds vs. MB):

[graph: run time in seconds against input size in MB]

Oneliners by me and other persons as well as some scripts tested
If the order of the items and the separators could be different from what you specify in the question, I thought a one-liner with tr and cut (like the ones shown later in this answer) would do it, but in a comment you wrote that you need exactly the specified format.
I added a solution with awk, which is approximately on par with PerlDuck's solution with perl. See the end of this answer.

Test of oneliners and small scripts
The test was done on my computer with Lubuntu 18.04.1 LTS, 2*2 processors and 4 GiB RAM.
I made a huge infile by 'doubling 20 times' from your demo input (1,572,864 lines), so there is some margin over your 500,000 lines.

Oneliner with cut and sed:
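A minimal sketch of such a pipeline, with infile and out.csv as placeholder names: cut narrows each line to fields 3, 5 and 6, and sed reorders them and trims the last one:

cut -d '|' -f 3,5,6 infile |
    sed -E 's/([^|]*)\|([^|]*)\|[^,]*, ([^ ]*) .*/\2:\3,\1/' > out.csv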
Timing:
We might expect that a pure sed solution would be faster, but I think that the reordering of the data slows it down, so that the cut and sed solution is faster. Both solutions work without any problem on my computer.

Oneliner with cut and sed: the same one-liner as shown above.
A pure sed oneliner by xenoid: the sed solution sketched near the top of this page.

A python script using a regex with non-greedy matches by xenoid: see the Python section at the end.

A perl oneliner by PerlDuck is faster than the previous oneliners: see the Perl solution below.

Oneliner with tr and cut with a tr -s command:
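A minimal sketch of the idea, with placeholder file names; note that the output keeps the pipe separator and a different field order, i.e. not the prescribed format:

# squeeze spaces and pipes into single pipes, then pick the three fields
tr -s ' |' '|' < infile | cut -d '|' -f 3,5,9 > out.csv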
I used tr to convert the spaces in the input file to pipe characters, and then cut could do it all without sed. As you can see, tr is much faster than sed. The tr -s command removes double pipes in the input, which is a good idea, particularly if there can be repeated spaces or pipes in the input file, and it does not cost much.

Oneliner with tr and cut without the tr -s command, fastest so far:
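A minimal sketch with placeholder names; without squeezing, the doubled separators shift the field numbers:

tr ' ' '|' < infile | cut -d '|' -f 4,6,10 > out.csv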
Oneliner with awk, fast but not the fastest:
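A minimal sketch of such an awk one-liner, producing the prescribed format (infile and out.csv are placeholder names):

# split field 6 at the comma, strip spaces and the closing parenthesis,
# then print field5:name,field3
awk -F '|' '{ split($6, a, ","); gsub(/[ )]/, "", a[2]); print $5 ":" a[2] "," $3 }' infile > out.csv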
awk with parallel, implemented according to Ole Tange, reduces the real time from 5 s to 2 s: the invocation is the one sketched in Ole Tange's answer above.

We can expect that the advantage with parallel will increase with bigger input files, as described by the diagram in Ole Tange's answer to this question.

Speed summary: the 'real' time according to time, rounded to 1 decimal.

Finally, I note that the oneliners with sed, python, perl, awk and {parallel & awk} create an output file with the prescribed format.

Perl solution
This script doesn't do anything in parallel but is quite fast regardless. Save it as filter.pl (or whatever name you prefer) and make it executable.

I copied your sample data until I got 1,572,864 lines and then ran it as follows:
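For example (the file names are assumptions):

time ./filter.pl infile > out.csv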
If you prefer one-liners, do:
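A sketch of an equivalent perl one-liner (file names are assumptions):

# -F'\|' autosplits each line on literal pipes into @F; $F[5] holds the
# parenthesised part, from which the name after the comma is extracted
perl -F'\|' -lane '($t = $F[5]) =~ s/.*, *(\S+) *\).*/$1/; print "$F[4]:$t,$F[2]"' infile > out.csv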
Python (works with both Python 2 and Python 3)

Using a regex with non-greedy matches is 4x faster (it avoids backtracking?) and puts python on par with the cut/sed method (Python 2 being a bit faster than Python 3).
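A minimal sketch of the non-greedy approach, runnable with either python2 or python3 (file names are assumptions):

python3 -c '
import re, sys
# non-greedy .*? keeps each group inside its own pipe-delimited field
pat = re.compile(r".*?\|.*?\|(.*?)\|.*?\|(.*?)\|.*?, (.*?) \)")
for line in sys.stdin:
    m = pat.match(line)
    if m:
        print("%s:%s,%s" % (m.group(2), m.group(3), m.group(1)))
' < infile > out.csv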