
============== WGET ==============

Basic wget Commands:

To download a file from the Internet, type:

wget http://www.example.com/downloads.zip

If you are downloading a large file, for example an ISO image, this could take some time. If your Internet connection goes down, then what do you do? You would have to start the download all over again. If you are downloading a 700MB ISO image on a slow connection, this could be very annoying! To get around this problem, you can use the -c parameter, which continues the download after any disruption. eg:

wget -c http://www.example.com/linux.iso

I have come across some websites that do not allow you to download any files using a download manager. To get around this, use:

wget -U mozilla http://www.example.com/image.jpg

This will pass wget off as being a Mozilla web browser by changing its user-agent string.

Downloading Entire Sites:

Wget is also able to download an entire website. But because this can put a heavy load on the server, wget will obey the robots.txt file.

wget -r -p http://www.example.com

The -r parameter makes the download recursive, and the -p parameter tells wget to include all page requisites, including images, so that the downloaded HTML files will look as they should. So what if you don't want wget to obey the robots.txt file? You can simply add -e robots=off to the command like this:

wget -r -p -e robots=off http://www.example.com

As many sites will not let you download the entire site, they will check your browser's identity. To get around this, use -U mozilla as explained above:

wget -r -p -e robots=off -U mozilla http://www.example.com

Many website owners will not like the fact that you are downloading their entire site. If the server sees that you are downloading a large number of files, it may automatically add you to its blacklist. The way around this is to wait a few seconds between downloads. You can do this with wget by including --wait=X (where X is the number of seconds). You can also use the --random-wait parameter to let wget choose a random number of seconds to wait. To include this in the command:

wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com
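A fixed delay works the same way. This sketch only echoes the command it would run, so nothing is actually downloaded; the 5-second delay and the URL are placeholders:

```shell
# Hypothetical polite mirror with a fixed 5-second pause between requests.
# The command is echoed as a dry run rather than executed.
WAIT=5
URL=http://www.example.com
echo wget --wait=$WAIT -r -p -e robots=off -U mozilla $URL
```

Drop the echo to actually run it, and raise the delay if the site is slow to respond.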

Other Useful wget Parameters:

--limit-rate=20k : Limits the rate at which wget downloads files (here, 20KB/s).

-b : Runs wget in the background, so the download continues after you log out. Very useful if you are connecting to your home PC via SSH.

-o $HOME/wget_log.txt : Logs the output of the wget command to a text file within your home directory. Useful if you are running wget in the background, as you can check the log for any errors that may appear.
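These parameters combine naturally for a long unattended download over SSH. The sketch below echoes the command as a dry run so nothing is fetched; the ISO URL is a placeholder:

```shell
# Hypothetical unattended download: rate-limited, backgrounded, logged,
# and resumable with -c. Echoed as a dry run rather than executed.
LOG=$HOME/wget_log.txt
echo wget --limit-rate=20k -b -o $LOG -c http://www.example.com/linux.iso
```

Drop the echo to run it for real, then log out; later you can check $HOME/wget_log.txt to see whether the download finished cleanly.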
