
Writing HTML documents

Documents must consist of the following parts, in the given order:

1. Optionally, a single U+FEFF BYTE ORDER MARK (BOM) character.
2. Any number of comments and ASCII whitespace.
3. A DOCTYPE.
4. Any number of comments and ASCII whitespace.
5. The document element, in the form of an html element.
6. Any number of comments and ASCII whitespace.
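As an illustration of the ordering above, here is a minimal sketch of a document with a comment in each of the places where comments and whitespace are allowed. The title, the lang value, and the paragraph text are placeholders chosen for this example; the optional BOM is omitted.

```html
<!-- Part 2: comments and whitespace may come before the DOCTYPE. -->
<!DOCTYPE html>
<!-- Part 4: comments and whitespace may sit between the DOCTYPE and the html element. -->
<html lang="en">
  <head>
    <title>Sample page</title>
  </head>
  <body>
    <p>Hello.</p>
  </body>
</html>
<!-- Part 6: comments and whitespace may also follow the html element. -->
```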
ASCII whitespace
• SPACE
• CHARACTER TABULATION (tab)
• LINE FEED (LF)
Line feed means to advance downward to the next line. It has since been
repurposed and renamed: used as "newline", it terminates lines (it is commonly
mistaken for a character that separates lines).
• FORM FEED (FF)
Form feed means to advance downward to the next "page". It was commonly used as
a page separator, and is now also used as a section separator.
• CARRIAGE RETURN (CR)
Carriage return means to return to the beginning of the current line without
advancing downward.
Void Elements

• Void elements can't have any contents (since there's no end tag, no content
can be put between the start tag and the end tag).
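A short sketch of void elements in use may help. The file name and alt text are placeholders for this example; each void element is written as a start tag only.

```html
<p>First line<br>second line</p>
<img src="photo.jpg" alt="A placeholder image">
<hr>
```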
The DOCTYPE

A DOCTYPE is a required preamble.

A DOCTYPE must consist of the following components, in this order:
1. A string that is an ASCII case-insensitive match for the string "<!DOCTYPE".
2. One or more ASCII whitespace.
3. A string that is an ASCII case-insensitive match for the string "html".
4. Optionally, a DOCTYPE legacy string.
5. Zero or more ASCII whitespace.
6. A GREATER-THAN SIGN character (>).

Note: In other words, a DOCTYPE is written as <!DOCTYPE html>, case-insensitively.
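The lines below sketch a few spellings that should satisfy these rules. Each line is an alternative, not part of one document. The last line uses the legacy string form SYSTEM "about:legacy-compat"; that form comes from the HTML Standard rather than from the text above.

```html
<!DOCTYPE html>
<!doctype html>
<!DocType HTML >
<!DOCTYPE html SYSTEM "about:legacy-compat">
```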


Elements
There are six different kinds of elements:
1. Void elements
area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
2. The template element
template
3. Raw text elements
script, style
4. Escapable raw text elements
textarea, title
5. Foreign elements
Elements from the MathML namespace and the SVG namespace.
6. Normal elements
All other allowed HTML elements are normal elements.
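One example of each kind, as a rough sketch; the contents and attribute values are invented for illustration. The comments on how text is treated inside raw text and escapable raw text elements reflect the HTML Standard's parsing rules, which the list above does not spell out.

```html
<!-- 1. Void element: start tag only -->
<br>

<!-- 2. The template element -->
<template><li>row</li></template>

<!-- 3. Raw text element: the ">" below is plain text, not markup -->
<style>p > em { color: red; }</style>

<!-- 4. Escapable raw text element: &lt; is decoded to "<", but tags are not parsed -->
<textarea>1 &lt; 2</textarea>

<!-- 5. Foreign elements (SVG namespace) -->
<svg width="10" height="10"><circle r="5"/></svg>

<!-- 6. Normal element -->
<p>Plain paragraph</p>
```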
Start tags
Start tags must have the following format:
1. The first character of a start tag must be a LESS-THAN SIGN character (<).
2. The next few characters of a start tag must be the element's tag name.
3. If there are to be any attributes in the next step, there must first be one or more ASCII whitespace.
4. Then, the start tag may have a number of attributes, the syntax for which is described below.
Attributes must be separated from each other by one or more ASCII whitespace.
5. After the attributes, or after the tag name if there are no attributes, there may be one or more
ASCII whitespace. (Some attributes are required to be followed by a space. See the attributes
section below.)
6. Then, if the element is one of the void elements, or if the element is a foreign element, then there
may be a single SOLIDUS character (/), which on foreign elements marks the start tag as self-
closing. On void elements, it does not mark the start tag as self-closing but instead is unnecessary
and has no effect of any kind. For such void elements, it should be used only with caution —
especially since, if directly preceded by an unquoted attribute value, it becomes part of the
attribute value rather than being discarded by the parser.
7. Finally, start tags must be closed by a GREATER-THAN SIGN character (>).
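The start tags below sketch the cases covered by these steps, including the trailing-slash pitfall from step 6. The attribute names and values are placeholders for this example.

```html
<!-- Tag name only -->
<p>

<!-- Attributes separated by whitespace, with optional whitespace before ">" -->
<input type="checkbox" checked >

<!-- Void element: the slash is permitted but has no effect -->
<br/>

<!-- Foreign element: the slash marks the start tag as self-closing -->
<svg><circle r="5"/></svg>

<!-- Pitfall: the slash directly follows an unquoted value, so the value is "yes/" -->
<input value=yes/>

<!-- With whitespace before the slash, the value is "yes" -->
<input value=yes />
```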
End tags

End tags must have the following format:

1. The first character of an end tag must be a LESS-THAN SIGN character (<).
2. The second character of an end tag must be a SOLIDUS character (/).
3. The next few characters of an end tag must be the element's tag name.
4. After the tag name, there may be one or more ASCII whitespace.
5. Finally, end tags must be closed by a GREATER-THAN SIGN character (>).
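A brief sketch of the two shapes these steps allow. The only optional part is whitespace between the tag name and the closing ">"; the steps leave no room for whitespace after "<" or after "/".

```html
<p>Text</p>
<p>Text</p   >
```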
Attributes

• Attributes for an element are expressed inside the element's start tag.
• Attributes have a name and a value. Attribute names must consist of one or
more characters other than controls, SPACE, QUOTATION MARK ("), APOSTROPHE ('),
GREATER-THAN SIGN (>), SOLIDUS (/), EQUALS SIGN (=), and noncharacters. In the
HTML syntax, attribute names, even those for foreign elements, may be written
with any mix of ASCII lower and ASCII upper alphas.
• Attributes can be specified in four different ways:
• Empty attribute syntax
Just the attribute name. The value is implicitly the empty string.
• Unquoted attribute value syntax
The attribute name, followed by zero or more ASCII whitespace, followed by a single EQUALS
SIGN character, followed by zero or more ASCII whitespace, followed by the attribute value, which, in
addition to the requirements given above for attribute values, must not contain any literal ASCII
whitespace, any QUOTATION MARK characters ("), APOSTROPHE characters ('), EQUALS SIGN
characters (=), LESS-THAN SIGN characters (<), GREATER-THAN SIGN characters (>), or GRAVE
ACCENT characters (`), and must not be the empty string.
• Single-quoted attribute value syntax
The attribute name, followed by zero or more ASCII whitespace, followed by a single EQUALS SIGN
character, followed by zero or more ASCII whitespace, followed by a single APOSTROPHE character
('), followed by the attribute value, which, in addition to the requirements given above for attribute
values, must not contain any literal APOSTROPHE characters ('), and finally followed by a second
single APOSTROPHE character (').
• Double-quoted attribute value syntax
The attribute name, followed by zero or more ASCII whitespace, followed by a single EQUALS SIGN
character, followed by zero or more ASCII whitespace, followed by a single QUOTATION MARK
character ("), followed by the attribute value, which, in addition to the requirements given above for
attribute values, must not contain any literal QUOTATION MARK characters ("), and finally followed by
a second single QUOTATION MARK character (").
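The four syntaxes side by side, as a small sketch. The attribute values are illustrative only.

```html
<!-- Empty attribute syntax: the value is implicitly the empty string -->
<input disabled>

<!-- Unquoted attribute value syntax -->
<input value=yes>

<!-- Single-quoted attribute value syntax -->
<input type='checkbox'>

<!-- Double-quoted attribute value syntax: whitespace is allowed inside the quotes -->
<input name="be evil">
```

Quoting is the safer default when a value might contain whitespace or any of the characters that the unquoted syntax forbids.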
Comments

Comments must have the following format:


1. The string "<!--".
2. Optionally, text, with the additional restriction that the text
must not start with the string ">", nor start with the string "->",
nor contain the strings "<!--", "-->", or "--!>", nor end with
the string "<!-".
3. The string "-->".
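A few comments that should satisfy this format, as a sketch. By the same rules, <!--> and <!---> are not valid comments, because their text would start with ">" and "->" respectively.

```html
<!-- A simple comment -->

<!---->

<!-- Single hyphens - and double hyphens -- are allowed inside the text -->

<!--
  A comment may span
  several lines.
-->
```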
The HTML Document Tree
<body>

  <div id="content">
    <h1>Heading here</h1>
    <p>Lorem ipsum dolor sit amet.</p>
    <p>Lorem ipsum dolor <em>sit</em> amet.</p>
    <hr>
  </div>

  <div id="nav">
    <ul>
      <li>item 1</li>
      <li>item 2</li>
      <li>item 3</li>
    </ul>
  </div>

</body>
• A diagram of this HTML document tree would show <body> at the top, with the two <div> elements branching beneath it.
Parent and Child

• A parent is an element that is directly above and connected to an element in the document tree. In the document
tree above, the <div> with id "nav" is the parent of the <ul>.

• A child is an element that is directly below and connected to an element in the document tree. In the document
tree above, the <ul> is a child of the <div> with id "nav".
Sibling
• A sibling is an element that shares the same parent with another element.
• In the document tree above, the <li> elements are siblings, as they all share the same parent: the <ul>.
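To tie the three terms back to the sample markup, here is the "nav" branch again with each relationship noted in a comment. This is only an annotated sketch of the same fragment.

```html
<div id="nav">          <!-- parent of the <ul> -->
  <ul>                  <!-- child of the <div>, and parent of each <li> -->
    <li>item 1</li>     <!-- the three <li> elements are siblings -->
    <li>item 2</li>
    <li>item 3</li>
  </ul>
</div>
```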
