FREE THOUGHT · FREE SOFTWARE · FREE WORLD

Home » Linux » Wget Trick to Download from Restrictive Sites

Wget Trick to Download from Restrictive Sites

Wget Trick to Download from Restrictive Sites

September 6th, 2007

Contents

  1. Update
  2. Using alias
  3. Using custom .wgetrc
  4. From the command line

Before
wget 403 Forbidden After trick
wget bypassing restrictions
I am often logged in to my servers via SSH, and I need to download a file like a WordPress plugin. I've noticed many sites now employ a means of blocking robots like wget from accessing their files. Most of the time they use .htaccess to do this. So a permanent workaround has wget mimick a normal browser.


Update

function wgets()
{
  wget --referer="http://www.google.com" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
  --header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
  --header="Accept-Language: en-us,en;q=0.5" \
  --header="Accept-Encoding: gzip,deflate" \
  --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
  --header="Keep-Alive: 300" "$@"
}

Using alias

Add this to your .bash_profile or other shell startup script, or just type it at the prompt. Now just run wget from the command line as usual, i.e. wget -dnv http://www.askapache.com/sitemap.xml.

alias wget='wget --referer="http://www.google.com" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" --header="Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" --header="Accept-Language: en-us,en;q=0.5" --header="Accept-Encoding: gzip,deflate"
--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" --header="Keep-Alive: 300"'

Using custom .wgetrc

Alternatively, you could instead just create or modify your $HOME/.wgetrc file like this. Or download and rename to .wgetrc.wgetrc. Now just run wget from the command line as usual, i.e. wget -dnv http://www.askapache.com/sitemap.xml.

###
### Sample Wget initialization file .wgetrc by http://www.askapache.com
###
##
## Local settings (for a user to set in his $HOME/.wgetrc).  It is
## *highly* undesirable to put these settings in the global file, since
## they are potentially dangerous to "normal" users.
##
## Even when setting up your own ~/.wgetrc, you should know what you
## are doing before doing so.
##
 
header = Accept-Language: en-us,en;q=0.5
header = Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
header = Accept-Encoding: gzip,deflate
header = Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
header = Keep-Alive: 300
user_agent = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
referer = http://www.google.com

From the command line

wget --referer="http://www.google.com" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" --header="Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" --header="Accept-Language: en-us,en;q=0.5" --header="Accept-Encoding: gzip,deflate"
--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" --header="Keep-Alive: 300" -dnv http://www.askapache.com/sitemap.xml

http://www.askapache.com/linux-unix/wget-header-trick.html#comments

Add Comment!

Leave a Reply

Your email address will not be published.


It's very simple - you read the protocol and write the code. -Bill Joy

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.
This site is not supported or endorsed by The Apache Software Foundation (ASF). All software and documentation produced by The ASF is licensed. "Apache" is a trademark of The ASF. NCSA HTTPd.
UNIX ® is a registered Trademark of The Open Group. POSIX ® is a registered Trademark of The IEEE.

Site Map | Contact Webmaster | Glossary | License and Disclaimer | Terms of Service |

↑ TOP
Main