I need to construct a regular expression that matches a group of words that begins and ends with the same word. For example, the input "the life of the free" should output "the life of the", and the input "and he was and he his the same" should output "he was and he". The two occurrences of the word must be at most 10 characters apart.
With PCRE, one can do:
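A command matching this description could look like the following. This is only a sketch, assuming GNU grep built with PCRE support; the variable name pattern and the sample input are for illustration only:

```shell
# Sketch only: needs GNU grep with PCRE support (-P).
# "pattern" is a hypothetical variable name, used here for readability.
pattern='\b(\w+)\b.{1,10}\b\1\b'

# Feed the first example from the question through grep.
printf '%s\n' 'the life of the free' | grep -Po "$pattern"
# prints: the life of the
```

On the second example from the question, this prints "and he was and" rather than "he was and he": grep reports the leftmost match, and the two occurrences of "and" are only 8 characters apart.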
-P enables Perl-style regular expressions using PCRE.
-o prints only the matched text.
\b marks a word boundary.
(\w+) captures a run of word characters as a group.
.{1,10} matches at least 1 and at most 10 characters.
\1 refers back to the text captured by the group.

Try grep with an extended regular expression:
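A command consistent with the explanation that follows might be the one below. It is a sketch, assuming GNU grep, since \b and back-references in -E patterns are GNU extensions rather than POSIX features; the variable name ere_pattern is hypothetical:

```shell
# Sketch only: relies on GNU grep extensions (\b and \1 in -E patterns).
# "ere_pattern" is a hypothetical variable name, used here for readability.
ere_pattern='(\b[[:alnum:]]+\b)[[:alnum:][:blank:]]{1,10}\b\1\b'

# Feed the first example from the question through grep.
printf '%s\n' 'the life of the free' | grep -Eo "$ere_pattern"
```

In this sketch the text between the two occurrences may only contain letters, digits, spaces and tabs, so punctuation between the words prevents a match.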
-E means extended regexp.
-o means print only the matched portion of the line.
\b matches a word boundary.
The character class [[:alnum:]] means all alphabetic (uppercase and lowercase) and numeric characters.
[[:blank:]] means space or tab.
+ means one or more occurrences of the previous item.
{1,10} means the previous item can occur between 1 and 10 times.
\1 matches the same text as the first group (the expression between the first pair of parentheses), i.e. \b[[:alnum:]]+\b.