I have a file that contains a URL. I'm trying to get the URL from that file using a shell script.
In the file, the URL is like this:
('URL', 'http://url.com');
I tried to use the following:
cat file.php | grep 'URL' | awk '{ print $2 }'
It gives the output as:
'http://url.com');
But I need to get only url.com in a variable inside the shell script. How can I accomplish this?
You can do everything with a simple grep:
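The command itself is missing from this copy, but from the description below it would have been something like this (file name taken from the question):

grep -oP "http://\K[^']+" file.php

To capture the result in a shell variable:

url=$(grep -oP "http://\K[^']+" file.php)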
From man grep, -o (--only-matching) prints only the matched parts of matching lines, and -P (--perl-regexp) interprets the pattern as a Perl-compatible regular expression.

The trick is to use \K which, in Perl regex, means discard everything matched to the left of the \K. So, the regular expression looks for strings starting with http:// (which is then discarded because of the \K) followed by as many non-' characters as possible. Combined with -o, this means that only the URL will be printed.

You could also do it in Perl directly:
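A sketch of the Perl one-liner (reconstructed, not the answer's original):

perl -nle 'print $1 if m{http://([^\x27]+)}' file.php

Here \x27 stands for the single quote, which sidesteps quoting problems inside the shell's single quotes.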
Something like this?
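Perhaps a pipeline that splits the line on single quotes (a sketch; the original command was not preserved, and the field number assumes the exact format shown in the question):

grep 'URL' file.php | cut -d"'" -f4

This prints http://url.com.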
or
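Again, a sketch of the missing command, with a sed step appended:

grep 'URL' file.php | cut -d"'" -f4 | sed 's|http://||'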
To strip out http://.
Try this:
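The command was lost here; a sed substitution of this shape would fit (the pattern assumes the question's line format):

sed -n "s|.*'http://\([^']*\)'.*|\1|p" file.php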
Revisiting this again, and trying to use nothing but a Bash shell, another one-line solution is:
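Possibly something like this reconstruction, using only parameter substitution inside the loop:

while read -r line; do t=${line#*//}; echo "${t%%\'*}"; done < file.in > file.out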
Where file.in contains the 'dirty' URL list and file.out will contain the 'clean' URL list. There are no external dependencies and there is no need to spawn any new processes or subshells. The original explanation and a more flexible script follow. There is a good summary of the method here, see example 10-10. This is pattern-based parameter substitution in Bash.
Expanding on the idea:
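A step-by-step sketch of the same substitutions (the variable names here are illustrative, not from the original answer):

src="('URL', 'http://url.com');"
t=${src#*//}        # strip everything through "//"          ->  url.com');
url=${t%%\'*}       # strip from the first remaining quote   ->  url.com
echo "$url"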
Result:
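url.com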
No need to call any external programs. Furthermore, the following bash script, get_urls.sh, permits you to read a file directly or from stdin.

If all the lines contain a URL:
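A sketch of what get_urls.sh might have looked like for that case (it reads the file named as the first argument, or stdin if none is given):

#!/usr/bin/env bash
while read -r line; do
    t=${line#*//}
    echo "${t%%\'*}"
done < "${1:-/dev/stdin}"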
If only some lines contain a URL:
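The same script with a guard (again a sketch; the ^define test assumes the file's lines are PHP define() statements):

#!/usr/bin/env bash
while read -r line; do
    [[ $line =~ ^define ]] || continue   # skip lines without a define(...) statement
    t=${line#*//}
    echo "${t%%\'*}"
done < "${1:-/dev/stdin}"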
Depending on the other lines you may need to change the ^define regex.

Simple:
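Presumably something like:

grep -o "http://[^']*" file.php

which prints http://url.com.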
and if you need to remove the 'http://', then:
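Perhaps:

grep -o "http://[^']*" file.php | sed 's|^http://||'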
So:
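Putting it together into the variable the question asks for (a sketch):

url=$(grep -o "http://[^']*" file.php | sed 's|^http://||')
echo "$url"    # url.com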
If you need a certain part of the URL, you need to refine your terminology: a URL is all of the following, sometimes more:
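For instance (an illustrative breakdown, not the original answer's example):

scheme://user:password@host.example.com:port/path/to/resource?query=string#fragment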
For me, the other grep answers given return string information after the link. This worked for me to pull out only the url:
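A common grep of this shape would do that (a guess at the lost command; the character class is an assumption):

grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" file.php

For the question's input this prints http://url.com and nothing after it.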