Question

Lynob

Asked: 2014-07-26 01:20:12 +0800 CST

Remove text that I don't want

I have a big HTML file on my desktop that looks like this:

src="http://images.alaablubnan.com/images/Balls/20.jpg"
alt="http://images.alaablubnan.com/images/Balls/20.jpg"/></a></td><td><a
href="http://images.alaablubnan.com/images/Balls/32.jpg"
target="_blank"><img
src="http://images.alaablubnan.com/images/Balls/32.jpg"
alt="http://images.alaablubnan.com/images/Balls/32.jpg"/></a></td><td><a
href="http://images.alaablubnan.com/images/Balls/30.jpg"
target="_blank"><img
src="http://images.alaablubnan.com/images/Balls/30.jpg"
alt="http://images.alaablubnan.com/images/Balls/30.jpg"/></a></td></tr><tr><td><table><tr><td>webpage/url</td><td>http://www.playlebanon.com/webservices/website/lotto/PopUps/HistoryDetail.aspx?t=1405536730503&FromDraw=1&ToDraw=1213&Draw=0</td></tr></table></td><td>2</td><td>complete
lotto results</td><td>complete lotto results</td><td>2</td><td><a
href="http://www.playlebanon.com/webservices/website/lotto/PopUps/HistoryDetail.

If possible, I want to:

get all the .jpg file names and remove all the HTML code (the files are 1.jpg, 2.jpg ... up to 42.jpg)
remove the .jpg extension
put exactly 7 numbers on each row, then start a new line

2 Answers

terdon · Answer 1 · 2014-07-26T02:17:37+08:00

This is not actually a particularly good job for sed but here goes:

sed -nr 's#.*/([^"]+).jpg.*#\1#p' file

The above will get you a list of numbers, one per line.

Now, it is actually possible to get all these on the same line with 7 numbers per line using sed but it is really not worth the effort. Just use standard *nix tools instead:

$ echo $(sed -nr 's#.*/([^"]+).jpg.*#\1#p' file | tr $'\n' ' ') | fold -sw 21
20 20 32 32 32 30 30 
30

Or, if you want to remove duplicates:

echo $(sed -nr 's#.*/([^"]+).jpg.*#\1#p' file | sort -u | tr $'\n' ' ')
20 30 32

Explanation

The sed command uses a few tricks:

-n: don't print any lines by default.
-r: enable extended regular expressions. This lets us use ( ) to capture groups without needing to escape the parentheses, and + for "one or more".
s#from#to# : while the standard substitution operator in sed and other, similar tools is s/from/to/, you can use a non-standard delimiter so that you can include / in the pattern. In this case I am using # but you could use something else, like s|from|to|, as well.
s#.*/([^"]+).jpg.*#\1#p : this matches everything from the beginning of the line up to a / and then captures the longest stretch of non-" characters before .jpg. This is the filename minus its extension. (Strictly, the . before jpg matches any character; write \. to match only a literal dot.) The filename is captured in the parentheses and the whole line (because of the .* on either side) is replaced with the captured pattern (\1). The p at the end means that sed prints only the lines where the substitution was successful.
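
To see the pieces above working together, here is a minimal sketch that runs the same substitution on a small throwaway file. The three sample lines are only an assumption modelled on the question's excerpt, the dot is escaped as \. as a slight tightening, and -E is used as the portable spelling of GNU sed's -r.

```shell
# Hypothetical sample input: three lines shaped like the question's HTML.
sample=$(mktemp)
cat > "$sample" <<'EOF'
src="http://images.alaablubnan.com/images/Balls/20.jpg"
target="_blank"><img
href="http://images.alaablubnan.com/images/Balls/32.jpg"
EOF

# -n: print nothing by default; -E: extended regex (GNU sed also accepts -r).
# The greedy .*/ runs up to the last slash before the file name, ([^"]+)
# captures the name, and the p flag prints only lines that were substituted.
numbers=$(sed -nE 's#.*/([^"]+)\.jpg.*#\1#p' "$sample")
echo "$numbers"

rm -f "$sample"
```

The middle line contains no .jpg name, so the substitution fails there and -n keeps it out of the output; only 20 and 32 are printed, one per line.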

Personally though, I would have done all of this with perl in the first place:

$ perl -e '@k=grep(s/.*\/([^"]+).jpg.*/$1/s,<>); print "@k[0..6]\n@k[7..$#k]\n"' file 
20 20 32 32 32 30 30
30

Or, for a larger file:

$ perl -e '@k=grep(s/.*\/([^"]+).jpg.*/$1/s,<>); for($i=0;$i<=$#k;$i+=7){$j=$i+6>$#k?$#k:$i+6; print "@k[$i..$j]\n"}' file 
20 20 32 32 32 30 30
30

Or grep even:

$ echo $(grep -oP '[^/]+(?=.jpg)' file | tr $'\n' ' ' ) | fold -w 21
20 20 32 32 32 30 30 
30

Or, stealing @Oli's clever xargs idea:

$ grep -oP '[^/]+(?=.jpg)' file |  xargs -n7 echo
20 20 32 32 32 30 30
30

Oli · Answer 2 · 2014-07-26T01:56:41+08:00

I assume you're trying to scrape some sort of result. In this example there are only three balls and we can extract them by searching for Balls/<one-or-many-digits> and grouping (the \(..\) construct) around the number and then replacing the whole lot with that group (the \1 is a reference to the first group).

$ sed -n 's/.*Balls\/\([0-9]\+\).*/\1/gp' htmlfile | uniq | xargs -n7 echo
20 32 30

sed goes through the file line by line. I'm asking it to match and replace everything on the line (which is why each end is capped with .*, "any amount of anything") with whatever it matched in the group. The -n option and the p flag are used together so that nothing is printed unless the line matched. The g flag asks sed to replace every match on a line rather than only the first; because this pattern already spans the whole line, it can match only once per line, so g makes no difference here.

It's a fairly complicated example if you're new to regular expressions.

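I'm passing it through uniq because there's a lot of duplication going on there. Note that uniq only collapses adjacent duplicate lines, which is enough here because the repeats sit next to each other.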

And I'm using | xargs -n7 echo on the end to group 7 arguments together and pass them all onto echo. There aren't 7 balls here so it's only showing 3.
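
The uniq and xargs steps can be tried on their own. This is a small sketch that assumes the eight numbers shown in the first answer's output as input, rather than the real HTML file.

```shell
# Hypothetical input: eight extracted numbers, one per line.
nums=$(printf '%s\n' 20 20 32 32 32 30 30 30)

# uniq collapses adjacent repeats only, so the original order is preserved.
deduped=$(printf '%s\n' "$nums" | uniq | xargs -n7 echo)
echo "$deduped"

# Without uniq, xargs runs a fresh echo after every 7 arguments,
# which produces one output row per group of 7.
grouped=$(printf '%s\n' "$nums" | xargs -n7 echo)
echo "$grouped"
```

If the same number could reappear later in the file, sort -u (as in the first answer) removes those repeats too, at the cost of losing the original order.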

It probably slows it down but you can have a slightly more readable expression if you use the -r extended syntax for sed:

sed -nr 's/.*Balls\/([0-9]+).*/\1/gp' htmlfile | ...

This does the same thing, just without some of the confusing-looking escaping.
