Norobots RFC
Koster
INTERNET DRAFT                                                WebCrawler
Category: Informational                                    November 1996
Dec 4, 1996                                         Expires June 4, 1997
<draft-koster-robots-00.txt>
Table of Contents
1. Abstract
2. Introduction
3. The Specification
3.1 Access method
3.2 File Format Description
3.2.1 The User-agent line
3.2.2 The Allow and Disallow lines
3.3 Formal Syntax
3.4 Expiration
4. Examples
5. Implementor's Notes
5.1 Backwards Compatibility
5.2 Interoperability
6. Security Considerations
7. Acknowledgements
8. References
9. Author's Address
1. Abstract
2. Introduction
3. The Specification
The instructions must be accessible via HTTP [2] from the site that
the instructions are to be applied to, as a resource of Internet
Media Type [3] "text/plain" under a standard relative path on the
server: "/robots.txt".
Some examples of URLs [4] for sites, and the URLs of the corresponding
"/robots.txt" resources:
Site URL                          /robots.txt URL
http://www.foo.com/welcome.html   http://www.foo.com/robots.txt
http://www.bar.com:8001/          http://www.bar.com:8001/robots.txt
If the server response indicates the resource does not exist (HTTP
Status Code 404), the robot can assume no instructions are
available, and that access to the site is not restricted by
/robots.txt.
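The mapping from a site URL to its "/robots.txt" URL, and the handling
of the 404 case, can be sketched as follows. This is a minimal
illustration and not part of the specification: the function names
robots_url and unrestricted_when_missing are hypothetical, and only the
404 case described above is modelled.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_url: str) -> str:
    """Return the /robots.txt URL for the server that hosts site_url."""
    parts = urlsplit(site_url)
    # Keep the scheme and host:port; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def unrestricted_when_missing(status_code: int) -> bool:
    """True when the response says /robots.txt does not exist (HTTP 404)."""
    return status_code == 404


print(robots_url("http://www.foo.com/welcome.html"))
print(robots_url("http://www.bar.com:8001/"))
```

Note that the port is part of the site identity here, so
http://www.bar.com:8001/ has its own /robots.txt, separate from any
file served on the default port.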
User-agent: webcrawler
User-agent: infoseek
Allow: /tmp/ok.html
Disallow: /tmp
Disallow: /user/foo
GET / HTTP/1.0
User-agent: FigTree/0.1 Robot libwww-perl/5.04
User-agent: figtree
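The FigTree example above can be sketched in code. The sketch assumes
that the robot's name token is the product name before the first "/" in
its HTTP User-agent header, that a User-agent line applies when its
value contains that token as a case-insensitive substring, and that "*"
marks a default record. The helper names name_token and
agent_line_applies are hypothetical.

```python
def name_token(user_agent_header: str) -> str:
    """Extract the product name of the first product token, e.g. 'FigTree'."""
    first_product = user_agent_header.split()[0]
    return first_product.split("/", 1)[0]


def agent_line_applies(line_value: str, robot_token: str) -> bool:
    """Decide whether a 'User-agent:' line value applies to this robot."""
    if line_value == "*":
        return True  # assumed default record, applies to any robot
    # Assumed rule: case-insensitive substring comparison.
    return robot_token.lower() in line_value.lower()


token = name_token("FigTree/0.1 Robot libwww-perl/5.04")
print(token, agent_line_applies("figtree", token))
```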
The /robots.txt URL is always allowed, and must not appear in the
Allow/Disallow rules.
robotstxt    = *blankcomment
             | *blankcomment record *( 1*commentblank 1*record )
               *blankcomment
blankcomment = 1*(blank | commentline)
commentblank = *commentline blank *(blankcomment)
blank        = *space CRLF
CRLF         = CR LF
record       = *commentline agentline *(commentline | agentline)
               1*ruleline *(commentline | ruleline)
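The record structure in the grammar above (one or more agent lines
followed by one or more rule lines, with records separated by blank
lines) can be sketched as a small line-oriented parser. This is a
lenient illustration, not a strict implementation of the grammar: it
assumes "#" starts a comment, that field names are case-insensitive,
and that any line ending is accepted. The function name parse_records
is hypothetical.

```python
def parse_records(text: str):
    """Group robots.txt lines into (agents, rules) records."""
    records = []
    agents, rules = [], []

    def close_record():
        nonlocal agents, rules
        if agents and rules:  # a record needs both agent and rule lines
            records.append((agents, rules))
        agents, rules = [], []

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            if raw.strip().startswith("#"):
                continue  # comment-only lines may appear inside a record
            close_record()  # a blank line ends the current record
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # an agent line after rule lines starts a new record
                close_record()
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    close_record()
    return records


example = """User-agent: webcrawler
User-agent: infoseek
Allow: /tmp/ok.html
Disallow: /tmp
Disallow: /user/foo
"""
print(parse_records(example))
```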
The syntax for "token" is taken from RFC 1945 [2], reproduced here for
convenience:
The syntax for "path" is defined in RFC 1808 [6], reproduced here for
convenience:
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
           "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
           "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
hialpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
           "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
           "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
3.4 Expiration
4. Examples
http://www.fict.org/
http://www.fict.org/index.html
http://www.fict.org/robots.txt
http://www.fict.org/server.html
http://www.fict.org/services/fast.html
http://www.fict.org/services/slow.html
http://www.fict.org/orgo.gif
http://www.fict.org/org/about.html
http://www.fict.org/org/plans.html
http://www.fict.org/%7Ejim/jim.html
http://www.fict.org/%7Emak/mak.html
The site's /robots.txt may have specific rules for robots that
send an HTTP User-agent "UnhipBot/0.1", "WebCrawler/3.0", and
User-agent: unhipbot
Disallow: /

User-agent: webcrawler
User-agent: excite
Disallow:

User-agent: *
Disallow: /org/plans.html
Allow: /org/
Allow: /serv
Allow: /~mak
Disallow: /
The following matrix shows which robots are allowed to access URLs:
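One way to derive such a matrix from the rules above is sketched below.
The sketch rests on stated assumptions rather than on text quoted here:
rules are tried in the order they appear and the first matching path
prefix decides, a URL with no matching rule is allowed, a rule with an
empty path matches nothing, and %xx escapes other than %2F are decoded
before comparison. The names RECORDS, PATHS, decode and is_allowed are
hypothetical.

```python
import re

# Rules per record, in file order, as (kind, path prefix) pairs.
RECORDS = {
    "unhipbot": [("disallow", "/")],
    "webcrawler & excite": [("disallow", "")],
    "other": [
        ("disallow", "/org/plans.html"),
        ("allow", "/org/"),
        ("allow", "/serv"),
        ("allow", "/~mak"),
        ("disallow", "/"),
    ],
}

PATHS = [
    "/", "/index.html", "/robots.txt", "/server.html",
    "/services/fast.html", "/services/slow.html", "/orgo.gif",
    "/org/about.html", "/org/plans.html",
    "/%7Ejim/jim.html", "/%7Emak/mak.html",
]


def decode(path: str) -> str:
    """Decode %xx escapes, leaving an encoded slash (%2F) untouched."""
    def repl(match):
        hexpair = match.group(1).upper()
        return "%2F" if hexpair == "2F" else chr(int(hexpair, 16))
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)


def is_allowed(rules, path: str) -> bool:
    """Apply the first rule whose path is a prefix of the URL path."""
    if path == "/robots.txt":
        return True  # the /robots.txt URL is always allowed
    target = decode(path)
    for kind, prefix in rules:
        if prefix and target.startswith(decode(prefix)):
            return kind == "allow"
    return True  # assumed default when no rule matches


for path in PATHS:
    row = ["Yes" if is_allowed(rules, path) else "No"
           for rules in RECORDS.values()]
    print(f"{path:<22}" + "  ".join(f"{cell:<3}" for cell in row))
```

Rule order matters under these assumptions: "Disallow: /org/plans.html"
must come before "Allow: /org/" to take effect, and the final
"Disallow: /" acts as a catch-all for everything not allowed earlier.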
5.2 Interoperability
6. Security Considerations
There are a few risks in the method described here, which may affect
either origin server or robot.
7. Acknowledgements
The author would like to thank the subscribers to the robots mailing
list for their contributions to this specification.
8. References
[5] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, UDEL, August 1982.
9. Author's Address
Martijn Koster
WebCrawler
America Online
690 Fifth Street
San Francisco
CA 94107
Phone: 415-3565431
EMail: m.koster@webcrawler.com