The `grep` command prints a whole line when the line contains a string that matches an expression, which is not handy for searching for specified content.
For instance, I have vocabulary files with this formatting:
**word**
1. Definition:
2. Usage
3. Others
I'd like to retrieve all the words in these files to make a word list, but
grep '\*\*[^*]*\*\*'
returns the bulk of the content.
How can I use `grep` to catch only the word?
This can be done with `grep` using Perl regex (`-P`), both for a single word and for several words.
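A minimal sketch of how such a command might look, assuming GNU grep built with PCRE support (`-P`) and entries that each sit on their own line. The function names and the sample entries are made up for illustration:

```shell
# Hypothetical helper: print only the text between a ** pair.
# -o prints just the matched part, and \K discards the leading ** from it;
# the look-ahead (?=\*\*) requires the closing ** without printing it.
extract_words() {
    grep -oP '\*\*\K[^*]+(?=\*\*)' "$@"
}

# Same idea with a look-behind instead of \K.
extract_words_lookbehind() {
    grep -oP '(?<=\*\*)[^*]+(?=\*\*)' "$@"
}

# Made-up sample in the question's format; [^*]+ also covers multi-word entries.
printf '%s\n' '**apple**' '1. Definition:' '2. Usage' '3. Others' \
    '**ice cream**' '1. Definition:' | extract_words
```

On this sample the call prints `apple` and `ice cream`, one per line. Lines holding several `**` pairs would need a stricter pattern, since the text between two neighbouring pairs is also enclosed by stars.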
Several tools can be used to extract the words; one of them is `sed`.
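A minimal sketch of such a `sed` invocation, assuming each word sits at the start of its own line. The function names and sample entries are hypothetical:

```shell
# Print only the lines that start with ** (-n turns off automatic printing,
# p prints the lines selected by the address).
starred_lines() {
    sed -n '/^\*\*/p' "$@"
}

# Extended form: strip every * from the selected lines before printing them.
starred_words() {
    sed -n '/^\*\*/{s/\*//g;p;}' "$@"
}

# Made-up sample in the question's format.
printf '%s\n' '**apple**' '1. Definition:' '2. Usage' '**pear**' | starred_lines
printf '%s\n' '**apple**' '1. Definition:' '2. Usage' '**pear**' | starred_words
```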
This command will match every line in your file that starts with `**` and print it. The other lines will be deleted from the output. If you also want to remove the stars, you can extend the command with a substitution that, in addition, removes all `*` characters from the line before it is printed.

This is one of those questions where it is helpful to have a test input file and examples of the desired output.
Input File
The test input file is one I copied from the Internet and modified to encase search words within `**` pairs.

Using grep
Using `grep`, it's fairly straightforward to get a word list. If you want to remove the `**` encasing the words, add a pipe to `sed`.

Saving index of words to a file
If you want to save your `grep` and `sed` output, use the `>` file redirection operator.

Note: the original answer posted yesterday was enhanced with a new post today from muru on a separate Q&A: Use specified quantifier in grep to retrieve satisfied vocabulary
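The steps in this answer can be sketched end to end as follows. The sample entries and the temporary file names are made up, and `grep -o` is assumed to be available (GNU and BSD grep both provide it):

```shell
# Made-up sample input in the question's format.
vocab="$(mktemp)"
wordlist="$(mktemp)"
trap 'rm -f "$vocab" "$wordlist"' EXIT
printf '%s\n' '**apple**' '1. Definition:' '2. Usage' '3. Others' \
    '**pear**' '1. Definition:' > "$vocab"

# -o prints only the matching part of each line, stars included.
grep -o '\*\*[^*]*\*\*' "$vocab"

# Pipe through sed to strip the stars, then redirect the list to a file.
grep -o '\*\*[^*]*\*\*' "$vocab" | sed 's/\*//g' > "$wordlist"
cat "$wordlist"
```

The first command prints `**apple**` and `**pear**`; the saved word list contains `apple` and `pear`.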
This can also be done the `awk` way.
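One way this could look with `awk`, as a sketch rather than the answer's exact command: treat `**` as the field separator and print the second field of every line that starts with `**`. The function name and sample entries are made up.

```shell
# Hypothetical helper: with ** as the field separator, a line such as
# **word** splits into an empty first field and the word as the second.
awk_words() {
    awk -F'[*][*]' '/^[*][*]/ { print $2 }' "$@"
}

# Made-up sample in the question's format.
printf '%s\n' '**apple**' '1. Definition:' '2. Usage' '3. Others' '**pear**' | awk_words
```

Writing the separator as `[*][*]` avoids having to escape the stars inside the `awk` regular expression.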
If you don't mind using additional tools, a very simple solution would be to post-filter the `grep` output with `tr` to delete all occurrences of the character `*`.
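A sketch of that pipeline, assuming the words sit alone on their lines. The function name and sample entries are hypothetical:

```shell
# Hypothetical helper: -x keeps only lines that consist entirely of **word**,
# and tr -d then deletes every * from what grep printed.
words_via_tr() {
    grep -x '\*\*[^*]*\*\*' "$@" | tr -d '*'
}

# Made-up sample; the second line is skipped because of -x.
printf '%s\n' '**apple**' 'see **pear** here' '1. Definition:' | words_via_tr
```

On this sample only `apple` is printed.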
I also recommend that you use the `-x` flag of GNU grep as above to match only whole lines, so as not to accidentally catch `**word**` appearing surrounded by other text on the same line. This may also speed up the pattern matching process, since grep can now discard many potential matches early on.

sed alternative
You can also take advantage of the `p` flag of sed's substitute command to match, replace and print in a single command.

GNU grep
Your particular case is extracting text between two patterns on a line/string. This has been covered in the 2012 question "How to use sed/grep to extract text between two words?". In particular, as anishsane mentioned, you can use look-ahead and look-behind patterns with the Perl-regex flag `-P`. In your particular case, the solution would be a look-behind for the opening `**` combined with a look-ahead for the closing `**`. However, as ghoti mentioned, `-P` is specific to GNU `grep`. Keep that in mind if you are porting your scripts/commands between different *nix systems.

Perl
Instead of trying to use Perl regex, let's just use Perl itself:
This has two advantages. First, it specifies the delimiter for "fields", which means we can deal with individual items separated by `**`. Second, syntactically this is just slightly less confusing than a look-ahead/look-behind pattern.

Python
Of course, there are other ways to do it, and one of them is Python, for instance as a Python 2.7 script.
You could also make it a one-liner and take advantage of stdin redirection:
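A sketch of such a one-liner, assuming `python3` is on the PATH (the answer itself targeted Python 2.7) and that each word sits on its own line. The function name, the temporary file and the sample entries are made up:

```shell
# Hypothetical helper: keep lines starting with **, strip the surrounding
# whitespace and stars, and print one word per line. Input comes from stdin.
py_words() {
    python3 -c 'import sys; print("\n".join(l.strip().strip("*") for l in sys.stdin if l.startswith("**")))'
}

# Made-up sample file, fed to the helper through stdin redirection.
vocab_py="$(mktemp)"
printf '%s\n' '**apple**' '1. Definition:' '2. Usage' '**pear**' > "$vocab_py"
py_words < "$vocab_py"
rm -f "$vocab_py"
```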
Others who prefer regex may want to use the `re` module.