I want to write a Python program to download the contents of a web page, and then download the contents of the web pages that the first page links to.
For example, this is the main web page: http://www.adobe.com/support/security/, and these are the pages I want to download: http://www.adobe.com/support/security/bulletins/apsb13-23.html and http://www.adobe.com/support/security/bulletins/apsb13-22.html
There is a condition I want to meet: it should download only the web pages under bulletins, not those under advisories (e.g. http://www.adobe.com/support/security/advisories/apsa13-02.html).
#!/usr/bin/env python
import urllib
import re
import sys
import os

# fetch the main security page
page = urllib.urlopen("http://www.adobe.com/support/security/")
page = page.read()

# write every href found on the page to a file called 'content'
fileHandle = open('content', 'w')
links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page)
for link in links:
    sys.stdout = fileHandle
    print('%s' % (link[0]))
    sys.stdout = sys.__stdout__
fileHandle.close()

# keep only the links under /support/security/bulletins/
os.system("grep -i '/support/security/bulletins/' content >> content1")
I've already extracted the links to the bulletins into content1, but I don't know how to download the contents of those web pages, using content1 as input.
The content1 file is as shown below:

/support/security/bulletins/apsb13-23.html
/support/security/bulletins/apsb13-23.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-21.html
/support/security/bulletins/apsb13-21.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-15.html
/support/security/bulletins/apsb13-15.html
/support/security/bulletins/apsb13-07.html
If I understood your question, the following script should be what you want:
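A minimal sketch of that idea, reading the relative paths from content1, turning them into absolute URLs, and saving each page to disk (the bulletins output directory and the per-file naming are assumptions, not part of your original setup):

#!/usr/bin/env python
import os
import urllib
import urlparse

base_url = "http://www.adobe.com/support/security/"
out_dir = 'bulletins'  # assumed output directory for the downloaded pages
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)

# content1 holds one relative path per line, e.g. /support/security/bulletins/apsb13-23.html
with open('content1') as f:
    paths = set(line.strip() for line in f if line.strip())  # set() drops duplicate links

for path in paths:
    url = urlparse.urljoin(base_url, path)                    # build the absolute URL
    filename = os.path.join(out_dir, os.path.basename(path))  # e.g. bulletins/apsb13-23.html
    print('downloading %s -> %s' % (url, filename))
    urllib.urlretrieve(url, filename)                         # save the page to disk

If you would rather skip the grep step entirely, you can apply the same check ('/support/security/bulletins/' in the href) directly to the links you extract with re.findall and download them in the same loop.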
Probably this question is for Stack Overflow!
But anyway, you can look at HTTrack for this; it does a similar kind of operation, and moreover it's open source.