httrack

11/5/13, 4:08 PM

NAME SYNOPSIS DESCRIPTION EXAMPLES OPTIONS FILES ENVIRONMENT DIAGNOSTICS LIMITS BUGS COPYRIGHT AVAILABILITY AUTHOR SEE ALSO

NAME
httrack - offline browser : copy websites to a local directory

SYNOPSIS
httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ] [ -%O, --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ] [ -g, --get-files ] [ -i, --continue ] [ -Y, --mirrorlinks ] [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ] [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ] [ -mN, --max-files[=N] ] [ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ] [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ] [ -%mN, --max-mms-time[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout ] [ -RN, --retries[=N] ] [ -JN, --min-rate[=N] ] [ -HN, --host-control[=N] ] [ -%P, --extended-parsing[=N] ] [ -n, --near ] [ -t, --test ] [ -%L, --list ] [ -%S, --urllist ] [ -NN, --structure[=N] ] [ -%D, --cached-delayed-type-check ] [ -%M, --mime-html ] [ -LN, --long-names[=N] ] [ -KN, --keep-links[=N] ] [ -x, --replace-external ] [ -%x, --disable-passwords ] [ -%q, --include-query-string ] [ -o, --generate-errors ] [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ] [ -j, --parse-java[=N] ] [ -sN, --robots[=N] ] [ -%h, --http-10 ] [ -%k, --keep-alive ] [ -%B, --tolerant ] [ -%s, --updatehack ] [ -%u, --urlhack ] [ -%A, --assume ] [ -@iN, --protocol[=N] ] [ -%w, --disable-module ] [ -F, --user-agent ] [ -%R, --referer ] [ -%E, --from ] [ -%F, --footer ] [ -%l, --language ] [ -C, --cache[=N] ] [ -k, --store-all-in-cache ] [ -%n, --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ] [ -q, --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ] [ -v, --verbose ] [ -f, --file-log ] [ -f2, --single-log ] [ -I, --index ] [ -%i, --build-top-index ] [ -%I, --search-index ] [
http://www.httrack.com/html/httrack.man.html Page 1 of 10

-pN, --priority[=N] ] [ -S, --stay-on-same-dir ] [ -D, --can-go-down ] [ -U, --can-go-up ] [ -B, --can-go-up-and-down ] [ -a, --stay-on-same-address ] [ -d, --stay-on-same-domain ] [ -l, --stay-on-same-tld ] [ -e, --go-everywhere ] [ -%H, --debug-headers ] [ -%!, --disable-security-limits ] [ -V, --userdef-cmd ] [ -%U, --user ] [ -%W, --callback ] [ -K, --keep-links[=N] ]

DESCRIPTION
httrack allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.

EXAMPLES
httrack www.someweb.com/bob/
    mirror site www.someweb.com/bob/ and only this site
httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*
    mirror the two sites together (with shared links) and accept any .jpg files on .com sites
httrack www.someweb.com/bob/bobby.html +* -r6
    means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web
httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
    runs the spider on www.someweb.com/bob/bobby.html using a proxy
httrack --update
    updates a mirror in the current folder
httrack
    will bring you to the interactive mode
httrack --continue
    continues a mirror in the current folder

OPTIONS
General options:
-O      path for mirror/logfiles+cache (-O path mirror[,path cache and logfiles]) (--path <param>)
-%O     chroot path to, must be root (-%O root path) (--chroot <param>)
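As a small illustration of the -O syntax, the sketch below only prints the command line it would run, so it can be tried without touching the network. The function name mirror_command and the /tmp directories are made up for this example; only the "-O path mirror[,path cache and logfiles]" form is taken from this page.

```sh
# Hypothetical helper (not part of httrack): prints an httrack command line
# that uses -O, with an optional separate directory for cache and log files.
mirror_command() {
    url=$1            # site to mirror
    mirror_dir=$2     # where the mirrored files go
    cache_dir=${3:-}  # optional: where cache and log files go
    if [ -n "$cache_dir" ]; then
        printf 'httrack %s -O %s,%s\n' "$url" "$mirror_dir" "$cache_dir"
    else
        printf 'httrack %s -O %s\n' "$url" "$mirror_dir"
    fi
}

# Dry run: show both forms for the first site used in EXAMPLES.
mirror_command www.someweb.com/bob/ /tmp/bob-mirror
mirror_command www.someweb.com/bob/ /tmp/bob-mirror /tmp/bob-cache
```

Once the printed line looks right, it can be pasted into a shell to start the actual mirror.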

Action options:
-w      *mirror web sites (--mirror)
-W      mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
-g      just get files (saved in the current directory) (--get-files)
-i      continue an interrupted mirror using the cache (--continue)
-Y      mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)

Proxy options:
-P      proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
-%f     *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
-%b     use this local hostname to make/send requests (-%b hostname) (--bind <param>)

Limits options:
-rN     set the mirror depth to N (* r9999) (--depth[=N])
-%eN    set the external links depth to N (* %e0) (--ext-depth[=N])
-mN     maximum file length for a non-html file (--max-files[=N])
-mN,N2  maximum file length for non html (N) and html (N2)
-MN     maximum overall size that can be uploaded/scanned (--max-size[=N])
-EN     maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
-AN     maximum transfer rate in bytes/seconds (1000=1KB/s max) (--max-rate[=N])
-%cN    maximum number of connections/seconds (*%c10) (--connection-per-second[=N])
-GN     pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
-%mN    maximum mms stream download time in seconds (60=1 minute, 3600=1 hour) (--max-mms-time[=N])

Flow control:
-cN     number of multiple connections (*c8) (--sockets[=N])
-TN     timeout, number of seconds after a non-responding link is shutdown (--timeout)
-RN     number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
-JN     traffic jam control, minimum transfer rate (bytes/seconds) tolerated for a link (--min-rate[=N])
-HN     host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])

Links options:
-%P     *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
-n      get non-html files near an html file (ex: an image located outside) (--near)

-t      test all URLs (even forbidden ones) (--test)
-%L     <file> add all URL located in this text file (one URL per line) (--list <param>)
-%S     <file> add all scan rules located in this text file (one scan rule per line) (--urllist <param>)

Build options:
-NN     structure type (0 *original structure, 1+: see below) (--structure[=N])
        or user defined structure (-N "%h%p/%n%q.%t")
-%N     delayed type check, don't make any link test but wait for files download to start instead (experimental) (%N0 don't use, %N1 use for unknown extensions, * %N2 always use)
-%D     cached delayed type check, don't wait for remote type during updates, to speed them up (%D0 wait, * %D1 don't wait) (--cached-delayed-type-check)
-%M     generate a RFC MIME-encapsulated full-archive (.mht) (--mime-html)
-LN     long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])
-KN     keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links, K5 transparent proxy link) (--keep-links[=N])
-x      replace external html links by error pages (--replace-external)
-%x     do not include any password for external password protected websites (%x0 include) (--disable-passwords)
-%q     *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)
-o      *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)
-X      *purge old files after update (X0 keep old files) (--purge-old[=N])
-%p     preserve html files as is (identical to -K4 -%F "" ) (--preserve)

Spider options:
-bN     accept cookies in cookies.txt (0=do not accept, * 1=accept) (--cookies[=N])
-u      check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])
-j      *parse Java Classes (j0 don't parse, bitmask: |1 parse default, |2 don't parse .class |4 don't parse .js |8 don't be aggressive) (--parse-java[=N])
-sN     follow robots.txt and meta robots tags (0=never, 1=sometimes, * 2=always, 3=always (even strict rules)) (--robots[=N])
-%h     force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
-%k     use keep-alive if possible, greatly reducing latency for small files and test requests (%k0 don't use) (--keep-alive)
-%B     tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)

-%s     update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
-%u     url hacks: various hacks to limit duplicate URLs (strip //, www.foo.com==foo.com..) (--urlhack)
-%A     assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)
        can also be used to force a specific file type: --assume foo.cgi=text/html
-@iN    internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])
-%w     disable a specific external mime module (-%w htsswf -%w htsjava) (--disable-module <param>)

Browser ID:
-F      user-agent field sent in HTTP headers (-F "user-agent name") (--user-agent <param>)
-%R     default referer field sent in HTTP headers (--referer <param>)
-%E     from email address sent in HTTP headers (--from <param>)
-%F     footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]") (--footer <param>)
-%l     preferred language (-%l "fr, en, jp, *") (--language <param>)

Log, index, cache:
-C      create/use a cache for updates and retries (C0 no cache, C1 cache has priority, * C2 test update before) (--cache[=N])
-k      store all files in cache (not useful if files on disk) (--store-all-in-cache)
-%n     do not re-download locally erased files (--do-not-recatch)
-%v     display on screen filenames downloaded (in realtime) - * %v1 short version - %v2 full animation (--display)
-Q      no log - quiet mode (--do-not-log)
-q      no questions - quiet mode (--quiet)
-z      log - extra infos (--extra-log)
-Z      log - debug (--debug-log)
-v      log on screen (--verbose)
-f      *log in files (--file-log)
-f2     one single log file (--single-log)
-I      *make an index (I0 don't make) (--index)
-%i     make a top index for a project folder (* %i0 don't make) (--build-top-index)
-%I     make a searchable index for this mirror (* %I0 don't make) (--search-index)

Expert options:
-pN     priority mode: (* p3) (--priority[=N])

-p0     just scan, don't save anything (for checking links)
-p1     save only html files
-p2     save only non html files
-p3     *save all files
-p7     get html files before, then treat other files
-S      stay on the same directory (--stay-on-same-dir)
-D      *can only go down into subdirs (--can-go-down)
-U      can only go to upper directories (--can-go-up)
-B      can both go up&down into the directory structure (--can-go-up-and-down)
-a      *stay on the same address (--stay-on-same-address)
-d      stay on the same principal domain (--stay-on-same-domain)
-l      stay on the same TLD (eg: .com) (--stay-on-same-tld)
-e      go everywhere on the web (--go-everywhere)
-%H     debug HTTP headers in logfile (--debug-headers)

Guru options: (do NOT use if possible)
-#X     *use optimized engine (limited memory boundary checks) (--fast-engine)
-#0     filter test (-#0 *.gif www.bar.com/foo.gif ) (--debug-testfilters <param>)
-#1     simplify test (-#1 ./foo/bar/../foobar)
-#2     type test (-#2 /foo/bar.php)
-#C     cache list (-#C *.com/spider*.gif) (--debug-cache <param>)
-#R     cache repair (damaged cache) (--repair-cache)
-#d     debug parser (--debug-parsing)
-#E     extract new.zip cache meta-data in meta.zip
-#f     always flush log files (--advanced-flushlogs)
-#FN    maximum number of filters (--advanced-maxfilters[=N])
-#h     version info (--version)
-#K     scan stdin (debug) (--debug-scanstdin)
-#L     maximum number of links (-#L1000000) (--advanced-maxlinks)
-#p     display ugly progress information (--advanced-progressinfo)
-#P     catch URL (--catch-url)
-#R     old FTP routines (debug) (--repair-cache)
-#T     generate transfer ops. log every minute (--debug-xfrstats)
-#u     wait time (--advanced-wait)
-#Z     generate transfer rate statistics every minute (--debug-ratestats)
-#!     execute a shell command (-#! "echo hello") (--exec <param>)

Dangerous options: (do NOT use unless you exactly know what you are doing)
-%!     bypass built-in security limits aimed to avoid bandwidth abuses (bandwidth, simultaneous connections) (--disable-security-limits)
        IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS
        USE IT WITH EXTREME CARE

Command-line specific options:

-V      execute system command after each file ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
-%U     run the engine with another id when called as root (-%U smith) (--user <param>)
-%W     use an external library function as a wrapper (-%W myfoo.so[,myparameters]) (--callback <param>)

Details: Option N
-N0     Site-structure (default)
-N1     HTML in web/, images/other files in web/images/
-N2     HTML in web/HTML, images/other in web/images
-N3     HTML in web/, images/other in web/
-N4     HTML in web/, images/other in web/xxx, where xxx is the file extension (all gif will be placed onto web/gif, for example)
-N5     Images/other in web/xxx and HTML in web/HTML
-N99    All files in web/, with random names (gadget !)
-N100   Site-structure, without www.domain.xxx/
-N101   Identical to N1 except that "web" is replaced by the site's name
-N102   Identical to N2 except that "web" is replaced by the site's name
-N103   Identical to N3 except that "web" is replaced by the site's name
-N104   Identical to N4 except that "web" is replaced by the site's name
-N105   Identical to N5 except that "web" is replaced by the site's name
-N199   Identical to N99 except that "web" is replaced by the site's name
-N1001  Identical to N1 except that there is no "web" directory
-N1002  Identical to N2 except that there is no "web" directory
-N1003  Identical to N3 except that there is no "web" directory (option set for g option)
-N1004  Identical to N4 except that there is no "web" directory
-N1005  Identical to N5 except that there is no "web" directory
-N1099  Identical to N99 except that there is no "web" directory

Details: User-defined option N
%n      Name of file without file type (ex: image)
%N      Name of file, including file type (ex: image.gif)
%t      File type (ex: gif)
%p      Path [without ending /] (ex: /someimages)
%h      Host name (ex: www.someweb.com)
%M      URL MD5 (128 bits, 32 ascii bytes)
%Q      query string MD5 (128 bits, 32 ascii bytes)
%k      full query string
%r      protocol name (ex: http)
%q      small query string MD5 (16 bits, 4 ascii bytes)
%s?     Short name version (ex: %sN)
%[param]        param variable in query string
%[param:before:after:empty:notfound]    advanced variable extraction
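To make the user-defined structure fields more concrete, here is a rough sketch of how a template could expand for one file. It is an illustration only, not httrack's own code: the function expand_structure is hypothetical, it handles just %h, %p, %n, %N and %t, and it leaves out the MD5 and query-string fields, which would need the engine's own hashing.

```sh
# Hypothetical helper (not part of httrack): expands a few -N fields
# for one file whose URL pieces are passed in by the caller.
#   $1 = template      $2 = host (%h)         $3 = path without ending / (%p)
#   $4 = file name without type (%n)          $5 = file type (%t)
expand_structure() {
    # %N is replaced before %n; sed is case-sensitive, so the two
    # fields never collide.
    printf '%s\n' "$1" | sed \
        -e "s|%h|$2|g" \
        -e "s|%p|$3|g" \
        -e "s|%N|$4.$5|g" \
        -e "s|%n|$4|g" \
        -e "s|%t|$5|g"
}

# Uses the example values given in the field list above.
expand_structure '%h%p/%n.%t' www.someweb.com /someimages image gif
# prints: www.someweb.com/someimages/image.gif
```

With the same inputs, a template such as '%t/%N' would group files by type (gif/image.gif), which is similar in spirit to the predefined -N4 layout.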

Details: User-defined option N and advanced variable extraction
%[param:before:after:empty:notfound]
param     : parameter name
before    : string to prepend if the parameter was found
after     : string to append if the parameter was found
notfound  : string replacement if the parameter could not be found
empty     : string replacement if the parameter was empty
all fields, except the first one (the parameter name), can be empty

Details: Option K
-K0     foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
-K      -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
-K3     -> /folder/foo.cgi?q=45 (absolute URI)
-K4     -> foo.cgi?q=45 (original URL)
-K5     -> http://www.foobar.com/folder/foo4B54.html?q=45 (transparent proxy URL)

Shortcuts:
--mirror <URLs>         *make a mirror of site(s) (default)
--get <URLs>            get the files indicated, do not seek other URLs (-qg)
--list <text file>      add all URL located in this text file (-%L)
--mirrorlinks <URLs>    mirror all links in 1st level pages (-Y)
--testlinks <URLs>      test links in pages (-r1p0C0I0t)
--spider <URLs>         spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite <URLs>       identical to --spider
--skeleton <URLs>       make a mirror, but gets only html files (-p1)
--update                update a mirror, without confirmation (-iC2)
--continue              continue a mirror, without confirmation (-iC1)
--catchurl              create a temporary proxy to capture an URL or a form post URL
--clean                 erase cache & log files
--http10                force http/1.0 requests (-%h)

Details: Option %W: External callbacks prototypes
see htsdefines.h
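The short forms shown in parentheses in the Shortcuts list are several single options written together. The sketch below splits such a bundle so each part can be looked up in OPTIONS. It assumes that one option is a letter, optionally prefixed by %, # or @, followed by its digits, which is how the equivalences in the list read; expand_bundle is a hypothetical helper and does not call httrack.

```sh
# Hypothetical helper (not part of httrack): splits a bundled short-option
# string such as -r1p0C0I0t into one option per word.
expand_bundle() {
    # Strip the leading "-", then put "-" in front of every
    # [prefix]letter[digits] group and drop the trailing space.
    printf '%s\n' "${1#-}" | sed \
        -e 's/\([%#@]\{0,1\}[A-Za-z][0-9]*\)/-\1 /g' \
        -e 's/ $//'
}

expand_bundle -r1p0C0I0t   # prints: -r1 -p0 -C0 -I0 -t
expand_bundle -iC2         # prints: -i -C2
```

Read this way, --testlinks is depth 1 (-r1), scan only (-p0), no cache (-C0), no index (-I0) and test all URLs (-t). The split is purely textual: it would also break a multi-character option such as -f2 into "-f2" correctly, but it knows nothing about which letters httrack actually accepts.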

FILES
/etc/httrack.conf
    The system wide configuration file.

ENVIRONMENT
HOME
    Is used if /etc/httrack.conf contains the line: path ~/websites/#

DIAGNOSTICS
Errors/Warnings are reported to hts-log.txt by default, or to stderr if the -v option was specified.

LIMITS
These are the principal limits of HTTrack at the moment. Note that we have not heard of any other utility that would have solved them.
- Several scripts generating complex filenames may not find them (ex: img.src='image'+a+Mobj.dst+'.gif')
- Some java classes may not find some files on them (class included)
- Cgi-bin links may not work properly in some cases (parameters needed). To avoid them: use filters like -*cgi-bin*

BUGS
Please report bugs to <bugs@httrack.com>. Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of httrack you are using. Do not forget to detail options used, OS version, and any other information you deem necessary.

COPYRIGHT
Copyright (C) Xavier Roche and other contributors

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

AVAILABILITY
The most recent released version of httrack can be found at: http://www.httrack.com

AUTHOR
Xavier Roche <roche@httrack.com>

SEE ALSO
The HTML documentation (available online at http://www.httrack.com/html/ ) contains more detailed information. Please also refer to the httrack FAQ (available online at http://www.httrack.com/html/faq.html )
