Difference between revisions of "Wget"

From Alessandro's Wiki
 

Latest revision as of 08:00, 18 October 2013

Software for downloading web pages from the command line.


  • Limit the download bandwidth (e.g. to 60 KB/s):
wget --limit-rate=60k http://site
  • Resume a download (continue fetching an incomplete file):
wget -c http://www.yourdomain.com/bigfile.bin
  • Download to a specified file name:
wget http://www.example.com -O index.html
  • Mirror an entire website:
wget -m -p -l 0 -E -k http://site
  • Alternatively, without creating a host-named directory (-nH):
wget -k -l 0 -m -nH -r http://site
  • Or, waiting 2 seconds between requests:
wget --mirror -w 2 -p --html-extension --convert-links http://site
  • Another variant, logging to wget.log and keeping file names Windows-safe:
wget -o wget.log --html-extension --restrict-file-names=windows --convert-links --recursive --level=inf --page-requisites --wait=0 --quota=inf --reject="*_form, *@*, sitemap, RSS" http://site
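  • -k / --convert-links: after downloading, rewrites links in the saved pages so they work for local viewing (links to downloaded files become relative)
  • -l 0 / --level=0: unlimited recursion depth (same as --level=inf)
  • -E / --adjust-extension: appends .html to saved HTML files that lack it (--html-extension is the older name for this option)
  • -K / --backup-converted: keeps the original of each converted file with an .orig suffix
  • -p / --page-requisites: also downloads the images, stylesheets and other files needed to display each page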
  • Exclude directories from a recursive download:
--exclude-directories="directory1,dir2"
  • For HTTPS sites whose certificate cannot be verified (e.g. self-signed), skip the certificate check:
--no-check-certificate
  • Send HTTP authentication credentials (e.g. to a site protected by Apache):
--http-user=USER --http-password=PASSWD
  • Send authentication via a POST request (the backslash escapes & from the shell):
--post-data=authid=USER\&authpw=PASSWD
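
The last four items are option fragments rather than full commands. A minimal sketch of how they might look as complete invocations is given below. The dry_run helper is hypothetical and only prints each command instead of running it, and the paths /private/ and /login are assumptions added for illustration; USER, PASSWD and http://site are the placeholders used above.

```shell
#!/bin/sh
# Sketch only: dry_run is a hypothetical helper that prints the wget
# command line it would run, so nothing is downloaded.
# To run for real, replace its body with:  wget "$@"
dry_run() {
    printf '%s\n' "wget $*"
}

# Recursive download that skips two directories
dry_run --recursive --exclude-directories="directory1,dir2" http://site

# HTTPS site with a certificate that cannot be verified
dry_run --no-check-certificate https://site

# HTTP authentication (the /private/ path is an assumed example)
dry_run --http-user=USER --http-password=PASSWD http://site/private/

# POST login (the /login path is an assumed example); single quotes
# protect & from the shell just as the backslash does
dry_run --post-data='authid=USER&authpw=PASSWD' http://site/login
```

Quoting the whole --post-data value is often easier to read than escaping each & when the form has several fields.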