I have a text file made from the output of the repository management tool aptly, which lists my published repositories, and I need to extract information from it.
The file format is as follows:
Published repositories:
* test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
* test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
* test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...
The last line of the output ends in a new line.
The "Published repositories:" line is not required.
For each of the lines starting ' *' I need to remove extraneous information, leaving only snapshot names. There is no way to do this in aptly. The desired output for the first of these lines is:
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
The square brackets are not essential either, so a solution that retains or removes them is fine. I'd prefer a sed or awk solution, but anything that works would be highly appreciated.
Two answers in one
I've posted two answers here: a Bash script, and a one-liner using grep, sed and cut.
How the Bash script looks in operation
I've turned off gnome-terminal line wrap to make input and output files easier to read.
The actual Bash script
Remember to make the script executable with
chmod a+x script.sh
One-liner with common utilities
One-liners are popular in the Linux community and there are some excellent awk and perl answers posted in this Q&A. Here is an example using common utilities most experienced command line users are familiar with:
- grep ^" \*" aptfilein - the grep command selects lines containing a search string. The caret (^) denotes that the string must start at the beginning of the line. The backslash (\) denotes that the asterisk/splat (*) is to be taken literally and not act as a regular-expression special character. In summary, this grep command selects all lines beginning with " *" (a space followed by an asterisk) in the file aptfilein.
- sed is a "stream editor" that edits lines coming in, changes them and passes them out. There are three sed changes here: 's/ \* //;s/ /: /;s/^/ /'. The changes are between quotes (') and separated by a semicolon (;) delimiter. They are broken down in the next three points.
  - s/ \* // - searches for the first occurrence of " * " and changes it to null. This erases the " * " at the beginning of each line.
  - s/ /: / - searches for the first space and changes it into a colon (:) followed by a space. This is necessary to change our first field into a key. For example, test_repo_one/xenial becomes test_repo_one/xenial:.
  - s/^/ / - tells sed to insert a space at the beginning of each line.
- cut -d':' -f1,3,6,9,12 --output-delimiter='' - uses the cut command to select key fields 1, 3, 6, 9 and 12. The key fields are delimited by a colon, as the argument -d':' stipulates. Normally output fields are delimited the same way, but this is overridden to null using the --output-delimiter='' parameter.
Note: The one-liner is faster than the Bash script, because Bash is slower at string processing.
A Perl approach:
Explanation
- perl -lne: read the input file line by line (-n), remove trailing newlines (-l) and run the script given by -e on each line. The -l also adds a newline to each print call.
- next unless /^\s*\*\s*(\S+)/;: find the name of the repo, that is, the first stretch of non-whitespace characters (\S+) on a line that starts with 0 or more whitespace characters (^\s*), then a * (\*), and 0 or more whitespace characters again. The longest stretch of non-whitespace after that is what we want. If this line doesn't match this regex, the next will move us on to the next line.
- $n=$1: save what was captured by the match above (the (\S+) in parentheses, $1) as $n.
- @k=(/\{.+?:\s*\[(.+?)\]/g): find all cases where we have a {, any other characters and then a :, followed by whitespace and a [, and capture anything between the [ and the ]. Save all matching strings in the array @k.
- print "$n @k": finally, print the name of the repo, $n, and the array @k from above.
If you prefer to have the square brackets included, you can use:
My awk approach:
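The awk command is not shown here. As a rough sketch of how this can be done in awk (my own illustration, not necessarily the poster's code), the following relies on whitespace field splitting: every field of the form "{component:" is immediately followed by the snapshot field, from which the trailing colon is stripped. The sample file is a shortened stand-in.

```shell
# Stand-in input: header line plus one shortened repository entry.
awk_sample=$(mktemp)
cat > "$awk_sample" <<'EOF'
Published repositories:
 * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
EOF

awk_out=$(awk '$1 == "*" {
    out = $2                          # repository name, e.g. test_repo_one/xenial
    for (i = 3; i < NF; i++)
        if ($i ~ /^[{].*:$/) {        # component field such as "{main:"
            snap = $(i + 1)           # next field, e.g. "[xenial-main_20190311]:"
            sub(/:$/, "", snap)       # drop the trailing colon
            out = out " " snap
        }
    print out
}' "$awk_sample")
printf '%s\n' "$awk_out"
rm -f "$awk_sample"
```

Because only lines whose first field is "*" are processed, the "Published repositories:" header is skipped without a separate filter.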