I simply need to get the match from a regular expression:
$ cat myfile.txt | SOMETHING_HERE "/(\w).+/"
The output has to be only what was matched inside the parentheses.
Don't think I can use grep because it matches the whole line.
Please let me know how to do this.
Use the `-o` option in grep, which prints only the matched part of each line.
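For instance, a minimal sketch (the input string is a made-up example, not taken from the question):

```bash
# Hypothetical input; -o makes grep print only the text that matched.
match=$(echo 'foobarbaz' | grep -o 'b[aeiou]r')
echo "$match"   # prints: bar
```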
Two things:
The `-o` option, so that only the matches are printed (instead of the whole line).
The `-P` option, to use Perl regular expressions, which include useful elements like look-ahead `(?= )` and look-behind `(?<= )`. Those look for parts of the text but don't actually match and print them.
If you want only the part inside the parentheses to be matched, do the following:
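A command consistent with the breakdown given in this answer could look like the sketch below. It assumes GNU grep built with PCRE support (`-P`), and it pipes the sample string in instead of reading a file; the variable name is only for illustration.

```bash
# Assumption: GNU grep with -P (PCRE) available.
# (?<=\/\()  look-behind: "/(" must precede the match, but is not printed
# \w         the single word character that is actually matched
# (?=\).+\/) look-ahead: ")", some characters and "/" must follow, not printed
inner=$(printf '%s\n' '/(a)5667/' | grep -oP '(?<=\/\()\w(?=\).+\/)')
echo "$inner"   # prints: a
```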
If the file contains the string `/(a)5667/`, grep will print `a`, because:
`/(` is found by `\/\(`, but because it is in a look-behind `(?<= )`, it is not reported.
`a` is matched by `\w` and is thus printed (because of `-o`).
`)5667/` is found by `\).+\/`, but because it is in a look-ahead `(?= )`, it is not reported.
Because you tagged your question as bash in addition to shell, there is another solution besides grep:
Bash has had its own regular expression engine since version 3.0, using the `=~` operator, just like Perl.
Now, given the following code:
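A minimal sketch of such code, assuming a sample string that contains `<Lane>8</Lane>` (the variable names and the surrounding text of the sample are assumptions):

```bash
#!/bin/bash
# Hypothetical sample data; only the <Lane>8</Lane> part is taken from the answer.
data='test <Lane>8</Lane>'
# Keep the pattern in a variable so that < and > need no escaping inside [[ ]].
re='<Lane>([[:digit:]]+)</Lane>'
if [[ $data =~ $re ]]; then
    whole_match=${BASH_REMATCH[0]}   # same value as $BASH_REMATCH
    first_group=${BASH_REMATCH[1]}   # text captured by the first ( ) group
    echo "$whole_match"
    echo "$first_group"
fi
```

Storing the regex in a variable is a design choice here: an unquoted `<` or `>` inside `[[ ]]` would otherwise be parsed as a comparison operator.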
Note that this must be run with `bash` and not just `sh` in order to get all extensions.
`$BASH_REMATCH` will give the whole string as matched by the whole regular expression, so `<Lane>8</Lane>`.
`${BASH_REMATCH[1]}` will give the part matched by the 1st group, thus only `8`.
If you want only what is in the parentheses, you need something that supports capturing sub-matches (named or numbered capturing groups). I don't think grep or egrep can do this, but perl and sed can. For example, with perl:
If a file called foo has a line in it that is as follows:
And you do:
The letter a is returned. That might not be what you want, though. If you tell us what you are trying to match, you might get better help. `$1` is whatever was captured in the first set of parentheses, `$2` would be the second set, and so on.
Assuming the file contains:
And you want the character(s) between `>` and `</`, you can use either:
grep
grep -oP '.*\K(?<=>)\w+(?=<\/)' file
sed
sed -nE 's:^.*>(\w+)</.*$:\1:p' file
awk
awk '{print(gensub("^.*>(\\w+)</.*$","\\1","g"))}' file
perl
perl -nle 'print $1 if />(\w+)<\//' file
All will print the string "xyz".
If you want to capture the digits of this line:
grep
grep -oP '.*\K(?<=>)[0-9]+(?=<\/)' file
sed
sed -E 's:^.*>([0-9]+)</.*$:\1:' file
awk
awk '{print(gensub(".*>([0-9]+)</.*","\\1","g"))}' file
perl
perl -nle 'print $1 if />([0-9]+)<\//' file
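One difference between the two sed commands above is worth noting: without `-n` and the `p` flag, sed also prints lines that do not match, unchanged. A small sketch, assuming a hypothetical two-line input and GNU sed:

```bash
# Hypothetical input: one line without digits between tags, one with.
input='no digits here
<Lane>8</Lane>'
# Without -n, non-matching lines pass through unchanged.
without_n=$(printf '%s\n' "$input" | sed -E 's:^.*>([0-9]+)</.*$:\1:')
# With -n and the p flag, only lines where the substitution succeeded are printed.
with_n=$(printf '%s\n' "$input" | sed -nE 's:^.*>([0-9]+)</.*$:\1:p')
printf '%s\n' "$without_n"
printf '%s\n' "$with_n"
```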
This will accomplish what you are requesting, but I don't think it is what you really want. I put the `.*` in the front of the regex to eat up anything before the match, but that is a greedy operation, so this only matches the penultimate `\w` character in the string.
Note that you need to escape the parens and the `+`.
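A sed command that fits this description might look like the sketch below. The exact command is an assumption reconstructed from the description (leading `.*`, escaped parens and `+`), and it relies on GNU sed, which accepts `\w` and `\+` in basic regular expressions.

```bash
# The greedy .* swallows as much as it can, leaving one \w for the group
# and one more character for .\+, so the second-to-last character is captured.
penultimate=$(echo 'abcde' | sed -n 's/^.*\(\w\).\+$/\1/p')
echo "$penultimate"   # prints: d
```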