Wget

From Vectivus

Overview

The "wget" tool allows you to download (and upload) content directly from the command line.

Examples

Mirror a website using a Mac user-agent string, ignoring the directives in robots.txt, and not following links up the path hierarchy:

wget -e robots=off --user-agent='Mozilla/5.0 (Macintosh; U; MacOS X 10_12_3; en-US; Source2 HTML/1484790260; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.10 Safari/537.36' -r -np 'http://sub.domain.tld/folder/'

Download a list of files, waiting a random interval of between 0.25 and 0.75 seconds between requests and using cookies previously saved to a file:

wget --random-wait --wait 0.5 --load-cookies=/path/to/cookiefile --input-file=/path/to/downloadlist
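The 0.25 to 0.75 second range follows from multiplying the "--wait" value (0.5 here) by 0.5 and by 1.5. A minimal sketch of that arithmetic, assuming awk is available; random_wait_range is a hypothetical helper name and not part of wget:

```shell
# Print the lower and upper bound (in seconds) of the delay that
# --random-wait picks from for a given --wait value.
# random_wait_range is a hypothetical helper, not a wget feature.
random_wait_range() {
    awk -v w="$1" 'BEGIN { printf "%.2f %.2f\n", 0.5 * w, 1.5 * w }'
}

random_wait_range 0.5   # prints: 0.25 0.75
```

This only computes the bounds; wget itself chooses the actual delay for each request.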

Capture the cookies needed to reuse a session, after authenticating as a user and getting prompted for the matching password:

wget --save-cookies=/path/to/newcookiefile --keep-session-cookies --user={username} --ask-password http://sub.domain.tld/location

Useful Options

-e robots=off
Ignore the directives in the robots.txt file (i.e. don't skip the paths it tells crawlers to skip)
--no-check-certificate
Don't check certificate validity (great for expired or self-signed certificates)
-np
Don't follow parent links (links to higher levels of the folder structure).
-r
Follow links recursively
--random-wait
Wait a random interval of between 0.5 and 1.5 times the "--wait" value between requests
--spider
Check that the target(s) exist and are reachable, without downloading anything
--user-agent={TEXT}
Specify a custom user agent for Wget to present
--wait={NUMBER_OF_SECONDS}
Wait the specified number of seconds between each request
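As an illustration of "--spider", the sketch below wraps it in a small function that reports whether a URL is reachable, based on wget's exit status. check_url is a hypothetical helper name, and the URL in the usage comment is a placeholder:

```shell
# Report whether a URL is reachable without saving anything to disk.
# check_url is a hypothetical helper name, not part of wget.
check_url() {
    # --spider checks the target without downloading it; -q suppresses output.
    # wget exits with a non-zero status if the check fails.
    if wget --spider -q "$1"; then
        echo "ok $1"
    else
        echo "unreachable $1"
    fi
}

# Usage (placeholder URL):
#   check_url 'http://sub.domain.tld/location'
```

Relying on the exit status rather than parsing wget's output keeps the check independent of the message wording.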

Important Defaults

  • If a user agent isn't specified, wget will self-identify as "Wget/Version" (e.g. "Wget/1.12")
  • Recursive downloads do not follow links to other hosts by default, but they will follow links to parent levels unless "-np" is given