What's the simplest way to extract a substring on the Unix shell (with regex)?
Simple means:
- fewer features
- fewer options
- less to learn
Update
I realized that regex itself conflicts with simplicity, so I chose the simplest one, cut, as the accepted answer. I am sorry for the vague question. I changed the title to describe the current state of this Q&A more precisely.
cut might be useful.

Shell builtins are good for this too; here is a sample script:
That outputs:
And as per Gnudif above, there are always sed/awk/perl for when the going gets really tough.
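The cut and shell-builtin approaches mentioned above can be sketched as follows. This is a minimal illustration on a hypothetical key=value string, not the original sample script. Note that the ${var#...} and ${var%%...} forms take shell glob patterns, not regexes.

```shell
# Hypothetical sample input; any string works.
str="user=alice;id=42;role=admin"

# cut: split on a single-character delimiter and keep field 2.
id_field=$(printf '%s\n' "$str" | cut -d';' -f2)   # id=42

# cut can also select by character position (1-based, inclusive).
first4=$(printf '%s\n' "$str" | cut -c1-4)          # user

# POSIX parameter expansion (builtin, no external process):
#   ${var#pattern}   removes the shortest matching prefix
#   ${var%%pattern}  removes the longest matching suffix
rest=${str#*id=}        # 42;role=admin
id=${rest%%;*}          # 42

printf '%s\n' "$id_field" "$first4" "$id"
```

Parameter expansion avoids spawning a process, which matters inside loops; cut is easier to read when the data is cleanly delimited.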
Unix shells do not traditionally have regex support built in. Bash and Zsh both do, so you can use the =~ operator to compare a string to a regex.

You can get the substrings from the $BASH_REMATCH array in Bash. In Zsh, if the BASH_REMATCH shell option is set, the value is in the $BASH_REMATCH array; otherwise it's in the $MATCH/$match tied pair of variables (one scalar, the other an array). If the RE_MATCH_PCRE option is set, the PCRE engine is used; otherwise the system regexp libraries are used, for an extended regexp syntax match, as in Bash.

So, most simply, if you're using Bash:
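A minimal sketch, assuming Bash 3.0 or later and a made-up version string as input. Index 0 of BASH_REMATCH holds the whole match and indices 1 and up hold the capture groups.

```shell
#!/usr/bin/env bash
# Requires Bash (the =~ operator and BASH_REMATCH are Bash features).
str="Version: 4.2.17 (release)"

# Keep the regex in an unquoted variable: in Bash 3.2 and later, quoting the
# right-hand side of =~ forces a literal string match instead of a regex.
re='([0-9]+)\.([0-9]+)\.([0-9]+)'

if [[ $str =~ $re ]]; then
    full=${BASH_REMATCH[0]}    # whole match: 4.2.17
    major=${BASH_REMATCH[1]}   # first capture group: 4
    minor=${BASH_REMATCH[2]}   # second capture group: 2
    patch=${BASH_REMATCH[3]}   # third capture group: 17
    echo "$full -> $major $minor $patch"
fi
```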
If you're not using Bash or Zsh, it gets more complicated as you need to use external commands.
Consider also /usr/bin/expr. Its : operator matches a pattern against the beginning of a string.
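A short sketch of expr's : operator on a hypothetical string. The basic regex is implicitly anchored at the start of the string; with a \( \) group expr prints the captured text, and without one it prints the number of characters matched.

```shell
str="abc123def"

# Skip leading lowercase letters, then capture the digits that follow.
digits=$(expr "$str" : '[a-z]*\([0-9]*\)')   # 123

# No group: expr reports how many characters matched from the start.
len=$(expr "$str" : '[a-z]*')                 # 3

printf '%s\n' "$digits" "$len"
```

One caveat: expr exits with status 1 when its result is empty or 0, which can abort a script running under set -e.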
grep and sed are probably the tools you want, depending on the structure of the text.
sed should do the trick if you do not know what the substring is but know some pattern around it.
For example, if you want to find a substring of digits that comes right after a "#" sign, you could write something like:
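A minimal sketch of that sed approach, assuming a hypothetical input line; it uses only POSIX sed features.

```shell
line="order #4711 shipped"

# -n suppresses automatic printing; the p flag prints the line only if the
# substitution succeeded, so lines without a match produce no output.
# Because the leading .* is greedy, on a line with several matches this picks
# the digits after the last '#' that is followed by a digit.
num=$(printf '%s\n' "$line" | sed -n 's/.*#\([0-9][0-9]*\).*/\1/p')   # 4711

printf '%s\n' "$num"
```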
grep could do something similar, but the question is what you need to do with the substring and whether we are talking about normal line-end-delimited text.
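For line-oriented text, a sketch with grep might look like this. It assumes a grep that supports -o (GNU and BSD grep do, but -o is not in POSIX); the input lines are made up.

```shell
text='order #4711 shipped
no number here
refund #42 issued'

# -E enables extended regexes; -o prints only the matched part,
# one match per output line. Lines without a match are skipped.
matches=$(printf '%s\n' "$text" | grep -oE '#[0-9]+')   # "#4711" and "#42"

# Drop the leading '#' from each match.
nums=$(printf '%s\n' "$matches" | cut -c2-)

printf '%s\n' "$nums"
```

Unlike the sed version, grep -o reports every match on a line, not just one.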