Some parts of Wikipedia appear differently when you're logged in. I would like to wget user pages so that they appear as if I were logged in.
Is there a way I can wget user pages like this one:
http://en.wikipedia.org/wiki/User:A
This is the login page:
http://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Login&campaign=ACP3
### The easy way: log in with your browser, and give the cookies to wget
Easiest method: in general, you need to provide wget or curl with the (logged-in) cookies from a particular website for them to fetch pages as if you were logged in.
If you are using Firefox, it's easy to do via the cookie.txt add-on. Install the add-on, and:
1. Click on the plugin and save the `cookies.txt` file (you can change the filename/destination).
2. Open up a terminal and use `wget` with the `--load-cookies=FILENAME` option, as in the sketch below (the rough curl equivalent is `curl --cookie cookies.txt ...`).
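A minimal sketch, assuming the exported file was saved as cookies.txt in the current directory and using the user page URL from the question:

```
# Fetch the Wikipedia user page with the cookies exported from the browser,
# so it is served as it would be for the logged-in user:
wget --load-cookies=cookies.txt "http://en.wikipedia.org/wiki/User:A"

# Rough curl equivalent, writing the result to a file:
curl --cookie cookies.txt "http://en.wikipedia.org/wiki/User:A" -o User_A.html
```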
(I will try to update this answer for Chrome/Chromium users)
### The hard way: use curl (preferably) or wget to manage the entire session
The gist is that you use curl with the `--cookie-jar` option, or wget with the `--save-cookies --keep-session-cookies` options, along with an HTTP(S) POST to log in to the site, save the login cookies, and then use them to simulate a browser.
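A minimal sketch of that session handling; the login URL and the form field names (user, pass) are placeholders, and many sites (MediaWiki included) also require a hidden login token, so a bare POST like this will not always be enough:

```
# Log in with a POST and store the session cookies in a cookie jar:
curl --data "user=me&pass=secret" --cookie-jar cookies.txt "https://example.com/login.php"

# Reuse the saved cookies to fetch pages as the logged-in user:
curl --cookie cookies.txt "https://example.com/members-only-page"

# The wget equivalent:
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data "user=me&pass=secret" "https://example.com/login.php"
wget --load-cookies cookies.txt "https://example.com/members-only-page"
```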
Another easy solution that worked for me without installing anything extra:
This will give you a command that you can paste directly into your shell and that includes all your cookie credentials, e.g.:
You can then modify the URL in the command to fetch whatever you want.
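For illustration only, such a pasted command might look roughly like this; the cookie names and values here are made up:

```
# A hypothetical command with the browser's session cookies passed as a header:
curl "http://en.wikipedia.org/wiki/User:A" \
     -H "Cookie: enwikiSession=abc123; enwikiUserID=12345; enwikiUserName=A" \
     -H "User-Agent: Mozilla/5.0" \
     --compressed
```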
With cURL it's really easy to handle cookies in both directions.
curl www.target-url.com -c cookie.txt
This will save a file named cookie.txt. But you need to log in first, so you need to use --data with arguments like:
curl -X POST --data "var1=1&var2=2" www.target-url.com/login.php -c cookie.txt
Once you have the logged-in cookie, you can send it with:
curl www.target-url.com/?user-page.php -b cookie.txt
Just use -c (--cookie-jar) to save and -b (--cookie) to send.
Note 1: Using the cURL CLI is a lot easier than PHP, and maybe faster ;)
To save the final content you can easily add `> filename.html` to your cURL command and save the full HTML code, as in the example below.
Note 2 about "full": you cannot render JavaScript with cURL; you only get the source code.
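Putting the pieces of this answer together (the URLs and form variables are the same placeholders as above):

```
# Log in and store the cookie, then fetch the user page and save its HTML to a file:
curl -X POST --data "var1=1&var2=2" "www.target-url.com/login.php" -c cookie.txt
curl "www.target-url.com/?user-page.php" -b cookie.txt > filename.html
```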
For those still interested in this question, there's a very useful Chrome extension called CurlWGet that allows you to generate a wget/curl request with authentication measures, etc., with one click. Install the extension and enjoy!
Have a look at cliget for Firefox.
When you're about to download, on the final download dialog you get the option to copy the download as curl command line to the clipboard.
The blog post Wget with Firefox Cookies shows how to access the sqlite data file in which Firefox stores its cookies. That way one doesn't need to manually export the cookies for use with wget. A comment suggests that it doesn't work with session cookies, but it worked fine for the sites I tried it with.
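A sketch of that approach, assuming the default Firefox profile location on Linux and the standard moz_cookies table layout (Firefox should be closed so the database isn't locked, and recent versions may keep unwritten cookies in a WAL file):

```
# Dump Firefox's cookies into Netscape cookies.txt format, then hand them to wget.
# The second column (subdomain flag) is simplified to TRUE for every cookie here.
{
  echo "# Netscape HTTP Cookie File"
  sqlite3 -separator $'\t' ~/.mozilla/firefox/*default*/cookies.sqlite \
    "SELECT host, 'TRUE', path,
            CASE isSecure WHEN 0 THEN 'FALSE' ELSE 'TRUE' END,
            expiry, name, value
     FROM moz_cookies"
} > cookies.txt

wget --load-cookies cookies.txt "http://en.wikipedia.org/wiki/User:A"
```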
Have you tried this?
Try something like:
See also this link: How to download this webpage with wget?
For more complicated website-based logins you should also consider using a Python script and a module that imitates a browser, like http://wwwsearch.sourceforge.net/mechanize/, instead of curl or wget. That way session cookies are handled automatically, you can follow links and fill in login forms, and so "script" yourself through the login process as if you were using your web browser.