HTML Entity

* A "noparse block" (my term) is a block that is parsed according to totally different logic. Handling these blocks is the first thing the current parser does after preprocessing, in the "strip" method. The only thing that ends one of these blocks is the matching close tag.
Notes:
nowiki, pre and html-comment are always available.
html is available if $wgRawHtml is true in LocalSettings.php.
math is available if the math extension is installed.
Other tags may be available if installed and present in parser->mTagHooks.
* Magic links are words that may appear within <wiki-text> that are automatically converted to external links without any special markup being required by the person writing the page.
Note that all character literals on this page are case sensitive (i.e. upper-case characters in the definitions on this page MUST be written in upper case in the markup).
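The noparse-block handling described above can be sketched roughly as follows. This is a minimal Python illustration, not the parser's actual "strip" method: the function names `strip_noparse` and `unstrip`, the placeholder format, and the fixed tag list are all assumptions made for the example.

```python
import re

# Tags treated as noparse blocks in this sketch (assumed fixed list; the real
# set depends on configuration and on installed extensions).
NOPARSE_TAGS = ("nowiki", "pre")

# HTML comments, or an opening tag up to its matching close tag.
# DOTALL lets a block span several lines.
_NOPARSE_RE = re.compile(
    r"<!--.*?-->|<(%s)>.*?</\1>" % "|".join(NOPARSE_TAGS),
    re.DOTALL,
)


def strip_noparse(text):
    """Replace each noparse block with a placeholder; return (text, blocks)."""
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        # Hypothetical marker: control characters assumed absent from the input.
        return "\x7fNOPARSE-%d\x7f" % (len(blocks) - 1)

    return _NOPARSE_RE.sub(stash, text), blocks


def unstrip(text, blocks):
    """Put the stashed blocks back once the rest of the text has been parsed."""
    return re.sub(r"\x7fNOPARSE-(\d+)\x7f",
                  lambda m: blocks[int(m.group(1))], text)
```

Because the stashed text never reaches the later parsing stages, apostrophes or brackets inside a block cannot be mistaken for markup.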
HTML entity
The parser recognises validly constructed HTML entities and leaves them alone.
<html-entity> ::= "&" <html-entity-name> ";"
| "&#" <decimal-number> ";"
| "&#x" <hex-number> ";"
<html-entity-name> ::= Sanitizer::$wgHtmlEntities (case sensitive)
(* "Aacute" | "aacute" | ... *)
Rendering
* The whole sequence is output literally.
* The current parser does very complicated things with escaping and de-escaping, so there may be places where something more complicated needs to happen.
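The <html-entity> production can be checked with a short Python sketch. The function name `is_html_entity` is hypothetical, and Python's `html.entities.name2codepoint` table is used only as a stand-in for Sanitizer::$wgHtmlEntities; the two lists are not guaranteed to be identical.

```python
import re
from html.entities import name2codepoint

# "&" name ";"  |  "&#" decimal ";"  |  "&#x" hex ";"
# The "x" is lower case only, since literals on this page are case sensitive.
ENTITY_RE = re.compile(r"&(?:([A-Za-z][A-Za-z0-9]*)|#[0-9]+|#x[0-9A-Fa-f]+);")


def is_html_entity(candidate):
    """Return True if the whole string is a validly constructed entity."""
    match = ENTITY_RE.fullmatch(candidate)
    if match is None:
        return False
    name = match.group(1)
    # Numeric forms have no name; named forms must be in the (stand-in) table.
    return name is None or name in name2codepoint
```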
HTML unsafe symbol

These "unsafe" symbols are turned into HTML entities if they haven't matched part of a valid
HTML entity above. It's probably not too efficient having single-character level matching
rules...perhaps should be combined with "text".
<html-unsafe-symbol> ::= <unescaped-ampersand> | <unescaped-less-than> |
    <unescaped-greater-than>
<unescaped-ampersand> ::= "&"
<unescaped-less-than> ::= "<"
<unescaped-greater-than> ::= ">"
Rendering
* <unescaped-ampersand> → &amp;
* <unescaped-less-than> → &lt;
* <unescaped-greater-than> → &gt;
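A rough Python sketch of how the two rules interact: valid entities pass through untouched, and any remaining "&", "<" or ">" is escaped. The name `escape_unsafe` is hypothetical, and `html.entities.name2codepoint` again stands in for the Sanitizer's entity list (an assumption).

```python
import re
from html.entities import name2codepoint

# Try the entity forms first; fall back to a single unsafe character.
_TOKEN_RE = re.compile(
    r"&([A-Za-z][A-Za-z0-9]*);|&#[0-9]+;|&#x[0-9A-Fa-f]+;|[&<>]"
)
_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def escape_unsafe(text):
    """Escape &, < and > unless they are part of a valid HTML entity."""

    def replace(match):
        token = match.group(0)
        if len(token) == 1:
            return _ESCAPES[token]
        name = match.group(1)
        if name is not None and name not in name2codepoint:
            # Looks like an entity but the name is unknown: only the
            # ampersand is unsafe, the rest is ordinary text.
            return "&amp;" + token[1:]
        return token

    return _TOKEN_RE.sub(replace, text)
```

Matching the entity alternatives before the single-character class is what keeps an already-escaped `&amp;` from being escaped twice.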
Text
Harmless-characters mean characters that couldn't be anything else. I'm not sure how useful
this is as a distinction, but perhaps it will help speed things up?
A "random character" is any character which hasn't matched anything else.
<harmless-characters> ::= /[A-Za-z0-9] etc

<random-character> ::= ? any character ... ?
Rendering
Both types are written literally.
This section from the "fundamental elements" section...time to mangle!
<character> ::= <whitespace-char> | <non-whitespace-char> | <html-entity>
<whitespace> ::= <whitespace-char> [<whitespace>] | EOF
<newlines> ::= <newline> [<newlines>]
<space-tabs> ::= <space-tab> [<space-tabs>]
<whitespace-char> ::= <space-tab> | <newline>
<space-tab> ::= <space> | TAB
<spaces> ::= <space> [<spaces>]
<space> ::= " "
<newline> ::= CR LF | LF CR | CR | LF
<BOL> ::= <newline> | BOF
<EOL> ::= <newline> | EOF
<non-whitespace-char> ::= <letter> | <decimal-digit> | <symbol>
<letter> ::= <ucase-letter> | <lcase-letter>
<ucase-letter> ::= "A" | "B" | ... | "Y" | "Z"
<lcase-letter> ::= "a" | "b" | ... | "y" | "z"
<symbol> ::= <html-unsafe-symbol> | <underscore> | "." | "," | ...
<underscore> ::= "_"
<decimal-number> ::= <decimal-digit> [<decimal-number>]
<decimal-digit> ::= "0" | "1" | ... | "8" | "9"
<hex-number> ::= <hex-digit> [<hex-number>]
<hex-digit> ::= <decimal-digit>
    | "A" | "B" | "C" | "D" | "E" | "F"
    | "a" | "b" | "c" | "d" | "e" | "f"
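As a small illustration of the <newline> production, the following Python sketch splits text on any of the four newline forms. The name `split_lines` is hypothetical. The two-character forms are tried first so that CR LF counts as one newline rather than two; mixed sequences such as LF CR LF remain ambiguous in the grammar and are resolved here simply by leftmost match.

```python
import re

# <newline> ::= CR LF | LF CR | CR | LF
# The two-character alternatives come first so they win over single CR or LF.
NEWLINE_RE = re.compile(r"\r\n|\n\r|\r|\n")


def split_lines(text):
    """Split text into lines on any <newline> form."""
    return NEWLINE_RE.split(text)
```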
Formatting
Bold/italics is the biggest problem with switching to a consume-parse-render parser. It will
not be possible to describe the current, extremely esoteric rules in simple (E)BNF. The best
we can hope for is to store tokens representing the apostrophe clumps and do a second
pass to make more sense of them. It would be very useful to define a second, unambiguous
set of formatting syntax (most likely // and **), and encourage people to use those wherever
apostrophes and bold/italics meet.
Some rules for parsing bold/italics as recognised by the current parser. These must be
implemented (Brion said so). In increasing order of complexity:
1. ''italics'', '''bold''', '''''bold-italics'''''
italics, bold, bold-italics
2. '''''bold-italics''' just italics'' normal
bold-italics just italics normal
3. Some text about l''''Arc de triomphe'''.
Some text about l'Arc de triomphe.
4. However: '''bold is l''''Arc de triomphe'''.
However: bold is l'Arc de triomphe.
Optimistic view:
<formatting> ::= <bold-italic-toggle> | <bold-toggle> | <italic-toggle>
<bold-italic-toggle> ::= "'''''"
<bold-toggle> ::= "'''"
<italic-toggle> ::= "''"
Reality:
<formatting> ::= <apostrophe-jungle>
<apostrophe-jungle> ::= "''" { "'" }
Rendering
* Once the parser has decided which way the toggles go:
  * bold-toggle-on -> <b>
  * bold-toggle-off -> </b>
  * italic-toggle-on -> <i>
  * italic-toggle-off -> </i>
  * bold-italic-toggle-on -> <i><b>
  * bold-italic-toggle-off -> </b></i>
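The "store tokens representing the apostrophe clumps" idea can be sketched as a first pass in Python. The function name `tokenize_apostrophes` and the token shapes are hypothetical; the sketch only finds the clumps and leaves every decision about bold or italics to a later pass.

```python
import re

# <apostrophe-jungle> ::= "''" { "'" }  i.e. two or more apostrophes in a row.
APOSTROPHE_RUN_RE = re.compile(r"'{2,}")


def tokenize_apostrophes(line):
    """Split a line into ("text", str) and ("quotes", run_length) tokens."""
    tokens = []
    pos = 0
    for match in APOSTROPHE_RUN_RE.finditer(line):
        if match.start() > pos:
            tokens.append(("text", line[pos:match.start()]))
        tokens.append(("quotes", len(match.group(0))))
        pos = match.end()
    if pos < len(line):
        tokens.append(("text", line[pos:]))
    return tokens
```

A single apostrophe never forms a token of its own, so it stays inside the surrounding text.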
Determining the behaviour of apostrophes

The following describes the behaviour of repeated apostrophes. "Bold" means "toggle bold",
rather than "turn bold on". "Bold, italics" means "Toggle bold and italics independently",
rather than "turn bold and italics on" or "toggle bold and italics the same way".
* One ( ' ): Always a single apostrophe.
  e.g. ( hello ' blah ) → hello ' blah
* Two ( '' ): Always italics on or off.
  e.g. ( hello '' blah ) → hello blah
* Three ( ''' ):
  1. Bold (default)
     e.g. ( hello ''' blah ) → hello blah
  2. Apostrophe, italics
     * If there is otherwise an odd number of both bold and italics:
       1. If the preceding characters are <space><non-space> (and there are no earlier such sequences)
          e.g. ( hello l'''amour'' l'''ouest''' blah ) → hello l'amour louest blah
       2. Else if the preceding characters are <non-space><non-space> (and there are no earlier such sequences)
          e.g. ( hello mon'''amour'' blah ) → hello mon'amour blah
       3. Else (the preceding character is <space>) (and there are no earlier such sequences)
          e.g. ( hello '''amour'' '''blah '''blah ) → hello 'amour blah blah
  3. Italics, apostrophe (never)
* Four ( '''' ):
  1. Bold, apostrophe (never)
  2. Apostrophe, bold (default, if either bold or italics ends up balanced)
     e.g. ( hello ''''amour''' now ''italics unbalanced, but that's ok ) → hello 'amour now italics unbalanced, but that's ok
     e.g. ( hello ''''amour''' now, '''bold unbalanced, but that's ok ) → hello 'amour now, bold unbalanced, but that's ok
  3. Apostrophe, apostrophe, italics
     * If the default treatment leads to an odd number of bold and italics then this can meet condition 1 under the second case of three apostrophes, above.
       e.g. ( hello ''''amour''' now '''''bold and italics unbalanced, so invoke this special case ) → hello ''amour now bold and italics unbalanced, so invoke this special case
* Five ( ''''' ):
  1. Bold, italics; or italics, bold (default, the two cases are equivalent)
     e.g. ( hello ''''' blah ) → hello blah
  2. Italics, apostrophe, italics (never)
* More than five:
  1. Apostrophes, bold+italics (default)
     e.g. ( hello '''''''''' blah ) → hello ''''' blah
     e.g. ( hello '''bold '''''''''' blah ) → hello bold ''''' blah
  2. Bold+italics, apostrophes (never)
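The default cases in the list above reduce to a small function of the run length. The Python sketch below covers only those defaults; the rebalancing special cases for three and four apostrophes need whole-line context and are deliberately left out. The name `default_interpretation` and the return shape are hypothetical.

```python
def default_interpretation(run_length):
    """Return (literal_apostrophes, toggles) for a run of apostrophes.

    Only the default reading is modelled; the "odd number of both bold and
    italics" rebalancing is not handled here.
    """
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    if run_length == 1:
        return (1, ())                    # a plain apostrophe
    if run_length == 2:
        return (0, ("italic",))           # toggle italics
    if run_length == 3:
        return (0, ("bold",))             # toggle bold
    if run_length == 4:
        return (1, ("bold",))             # apostrophe, then toggle bold
    # Five or more: surplus apostrophes are literal, the last five toggle both.
    return (run_length - 5, ("bold", "italic"))
```

For a run of ten apostrophes this yields five literal apostrophes followed by a bold and italic toggle, matching the "more than five" example.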
Inline HTML
The parser recognises and cleans a large number of HTML tags, as defined in Sanitizer.php.
A decision has to be made here on whether to attempt to parse these things as a matched
set, or whether to leave that to a later pass.
A loose definition assuming they are treated individually:
<InlineHTML> ::= <InlineHTML-Open> | <InlineHTML-Close> |
    <InlineHTML-OpenClose> | <HTMLComment>
<InlineHTML-Open> ::= "<" <InlineHTMLtagname> [<extra-characters>] ">"
<InlineHTML-Close> ::= "</" <InlineHTMLtagname> [<extra-characters>] ">"
<InlineHTML-OpenClose> ::= "<" <InlineHTMLtagname> [<extra-characters>] "/>"
<extra-characters> ::= <word-boundary-char> {characters - ">"}
<word-boundary-char> ::= " " | "-" | ":" | " " | "\"" | "/" | "*" |
    "#" | "!" | "$" | "%" | ...
Remarks
The range of "word-boundary-char" seems to be an artefact of the regular expression:
if( preg_match( '!^(/?)(\\w+)([^>]*?)(/{0,1}>)([^<]*)$!', $x, $regs ) ) {
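To see why the tag name ends at the first non-word character, the quoted pattern can be tried out in Python. This is a transliteration for experimentation only: the name `split_tag_piece` is hypothetical, and the sketch assumes the pattern is applied to the text that follows a "<". `re.ASCII` is used so that `\w` matches ASCII word characters only, which is an assumption about how the PHP pattern behaves.

```python
import re

# Same pattern as the PHP one quoted above, without the "!" delimiters.
TAG_PIECE_RE = re.compile(r"^(/?)(\w+)([^>]*?)(/{0,1}>)([^<]*)$", re.ASCII)


def split_tag_piece(piece):
    """Split the text after a "<" into (slash, name, params, brace, rest).

    Returns None when the piece does not look like a tag.
    """
    match = TAG_PIECE_RE.match(piece)
    return match.groups() if match else None
```

Since `\w+` is greedy, whatever follows the tag name must start with a character outside `\w`, which is exactly the open-ended <word-boundary-char> set.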
The list of "tags that must be closed":
* block elements: p, span, table, div
* lists: ol, ul, dl
* paragraph formatting: h1, h2, h3, h4, h5, h6, cite, center, blockquote, caption, pre
* character formatting: b, del, i, ins, u, font, big, small, sub, sup, code, em, s, strike, strong, tt, var
* Ruby: rt, rb, rp, ruby
Tags that can appear singly, and possibly paired:
br, hr, li, dt, dd
Tags that must not be paired:
br, hr
Tags that can be nested (source code is dubious on this):
table, tr, td, th, div, blockquote, ol, ul, dl, font, big, small, sub, sup, span
Tags that can only appear inside a table:
td, th, tr
Tags that make lists:
ul, ol
And tags that can appear inside lists:
li
The significance of these groupings is shown as follows:
A <blockquote> B <span>C </blockquote> D </span> E
Here, blockquote and span are both "nesting" tags. When the close-blockquote tag is found inside the span block, it is escaped.
This doesn't work:
<span>Some text [[Image:foo.jpg|close </span>it.]]
But this does:
<b>Some text [[Image:foo.jpg|close </b>it.]]
Rendering
* Tags that have to be paired are forced closed according to some sort of logic.
* <extra-characters> are "sanitized": all but pre-approved attributes and styles on a whitelist are stripped.
* Tags are then written out literally: <InlineHTMLTagname> " " <sanitized-attributes> ">" etc.
* HTML comments are completely discarded, with some whitespace massaging (sanitizer.php): to avoid leaving blank lines, when a comment is both preceded and followed by a newline (ignoring spaces), trim leading and trailing spaces and one of the newlines.
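The comment-discarding rule can be sketched in Python as below. This follows only the description given here, not the actual sanitizer code: the function name `remove_html_comments` is hypothetical, and leaving an unterminated comment untouched is an assumption.

```python
def remove_html_comments(text):
    """Drop <!-- ... --> comments, avoiding leftover blank lines."""
    while True:
        start = text.find("<!--")
        if start == -1:
            break
        end = text.find("-->", start + 4)
        if end == -1:
            break  # assumption: an unterminated comment is left as it is
        end += 3

        # Widen the span over the spaces on either side of the comment.
        left = start
        while left > 0 and text[left - 1] == " ":
            left -= 1
        right = end
        while right < len(text) and text[right] == " ":
            right += 1

        on_own_line = (
            left > 0 and text[left - 1] == "\n"
            and right < len(text) and text[right] == "\n"
        )
        if on_own_line:
            # Remove the spaces, the comment and the trailing newline.
            text = text[:left] + text[right + 1:]
        else:
            # Remove only the comment itself.
            text = text[:start] + text[end:]
    return text
```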
Non-breaking spaces
This is pretty trivial and used basically to improve the appearance of punctuation in French, which
always places a space before certain punctuation, and places spaces inside guillemets. Other
languages use these characters, but without the spaces. Currently performed directly in the parse()
method.
<nbsp-before> ::= [any character] <space> ("»" | "?" | ":" | ";" | "!" | "%")
<nbsp-after> ::= "«" <space>
Rendering
* In both cases, the space is converted to a &#160; string.
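A minimal Python sketch of the two non-breaking-space rules. The name `french_spaces` is hypothetical. The sketch requires a preceding character on the same line for <nbsp-before>; that is an assumption, since the brackets in the grammar could also be read as making that character optional.

```python
import re

NBSP = "&#160;"


def french_spaces(text):
    """Replace the spaces covered by <nbsp-before> and <nbsp-after>."""
    # <nbsp-before>: a character, a space, then one of » ? : ; ! %
    # The lookahead keeps the punctuation available for the next match.
    text = re.sub(r"(.) (?=[»?:;!%])", lambda m: m.group(1) + NBSP, text)
    # <nbsp-after>: an opening guillemet followed by a space
    return text.replace("« ", "«" + NBSP)
```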
Behaviour switches
Not to be confused with magic links. These seem to be able to be used virtually anywhere: a table of
contents in an image caption even works. See Help:Magic words#Behaviour switches.
<behaviour-switch> ::= <behaviourswitch-toc> |
    <behaviourswitch-forcetoc> | <behaviourswitch-notoc> |
    <behaviourswitch-noeditsection> | <behaviourswitch-nogallery>
<behaviourswitch-toc> ::= mw("toc")
<behaviourswitch-forcetoc> ::= mw("forcetoc")
<behaviourswitch-notoc> ::= mw("notoc")
<behaviourswitch-noeditsection> ::= mw("noeditsection")
<behaviourswitch-nogallery> ::= mw("nogallery")
/* defaults, i->case insensitive, s->case sensitive */
mw("toc") ::= "__TOC__"i
mw("forcetoc") ::= "__FORCETOC__"i
mw("notoc") ::= "__NOTOC__"i
mw("noeditsection") ::= "__NOEDITSECTION__"i
mw("nogallery") ::= "__NOGALLERY__"i
Notes:
* These are the "default" strings to be matched. They can be modified in languages/messages/MessagesXx.php, where Xx is the language.
* Each magic word can have more than one string associated with it.
* The magic words are case insensitive by default, but this can be changed in the file.
* Plenty of other "magic words" exist, including "magic variables" (e.g. {{CURRENTMONTH}}), which will be handled by the preprocessor. However, it looks like all sorts of other "magic words" exist and are processed in different places.
Semantics
* behaviourswitch-toc: a miniature contents page will be rendered and inserted at the first instance of this token.
* behaviourswitch-forcetoc: a contents box will be rendered even if the normal criteria (typically, 4 sections) have not been met. Irrelevant if behaviourswitch-toc is present.
* behaviourswitch-notoc: no miniature contents pages will be rendered. Only takes effect if neither behaviourswitch-toc nor behaviourswitch-forcetoc is present.
* behaviourswitch-noeditsection: no edit links are to be displayed for any sections.
* behaviourswitch-nogallery: unclear. According to the code (parser::stripNoGallery): if the string (not case-sensitive) occurs in the HTML, do not add TOC. Perhaps it only has an effect in certain namespaces.
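The precedence among the three TOC switches can be written down as a short Python sketch. The function names `find_switches` and `should_show_toc` are hypothetical, only the default strings are matched, and the threshold of 4 sections is the "typical" value mentioned above rather than a fixed constant.

```python
import re

# Default strings only; a real installation may define more per language.
SWITCHES = {
    "toc": "__TOC__",
    "forcetoc": "__FORCETOC__",
    "notoc": "__NOTOC__",
    "noeditsection": "__NOEDITSECTION__",
    "nogallery": "__NOGALLERY__",
}


def find_switches(text):
    """Return the names of the behaviour switches present (case insensitive)."""
    return {
        name
        for name, literal in SWITCHES.items()
        if re.search(re.escape(literal), text, re.IGNORECASE)
    }


def should_show_toc(text, section_count, threshold=4):
    """Decide whether a contents box is rendered, per the semantics above."""
    found = find_switches(text)
    if "toc" in found or "forcetoc" in found:
        return True   # either one overrides notoc and the section threshold
    if "notoc" in found:
        return False
    return section_count >= threshold
```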
Images, media, gallery
Links to images and media should be handled as normal links. It's inline images and media that are
being dealt with here.
Originally from MetaWiki.
Images
ImageInline ::= "[[" , "Image:" , PageName, ".", ImageExtension, ( { <Pipe>, ImageOption, } ) "]]" ;
ImageName ::= PageName, ".", ImageExtension
ImageExtension ::= "jpg" | "jpeg" | "png" | "svg" | "gif" | "bmp" ;
ImageOption ::= ImageModeParameter | ImageSizeParameter | ImageAlignParameter
    | ImageValignParameter | Caption
ImageModeParameter ::= ImageModeManualThumb | ImageModeAutoThumb |
    ImageModeFrame | ImageModeFrameless
ImageModeManualThumb ::= mw("img_manualthumb");
ImageModeAutoThumb ::= mw("img_thumbnail");
ImageModeFrame ::= mw("img_frame");
ImageModeFrameless ::= mw("img_frameless");
/* Default settings: */
mw("img_manualthumb") ::= "thumbnail=", ImageName | "thumb=", ImageName
mw("img_thumbnail") ::= "thumbnail" | "thumb";
mw("img_frame") ::= "framed" | "enframed" | "frame";
mw("img_frameless") ::= "frameless";
ImageOtherParameter ::= ImageParamPage | ImageParamUpright | ImageParamBorder
ImageParamPage ::= mw("img_page")
ImageParamUpright ::= mw("img_upright")
ImageParamBorder ::= mw("img_border")
mw("img_page") ::= "page=$1" | "page $1" ??? (where is this used?)
mw("img_upright") ::= "upright" [, ["=",] PositiveInteger]
mw("img_border") ::= "border"
ImageSizeParameter ::= mw("img_width");
/* Default setting: */
mw("img_width") ::= PositiveNumber "px" ;
ImageAlignParameter ::= ImageAlignLeft | ImageAlignCenter |
    ImageAlignRight | ImageAlignNone
ImageAlignLeft ::= mw("img_left")
ImageAlignCenter ::= mw("img_center")
ImageAlignRight ::= mw("img_right")
ImageAlignNone ::= mw("img_none")
mw("img_left") ::= "left"
mw("img_center") ::= "center" | "centre"
mw("img_right") ::= "right"
mw("img_none") ::= "none"
ImageValignParameter ::= ImageValignBaseline | ImageValignSub |
    ImageValignSuper | ImageValignTop
    | ImageValignTextTop | ImageValignMiddle |
    ImageValignBottom | ImageValignTextBottom
ImageValignBaseline ::= mw("img_baseline")
ImageValignSub ::= mw("img_sub")
ImageValignSuper ::= mw("img_super")
ImageValignTop ::= mw("img_top")
ImageValignTextTop ::= mw("img_text_top")
ImageValignMiddle ::= mw("img_middle")
ImageValignBottom ::= mw("img_bottom")
ImageValignTextBottom ::= mw("img_text_bottom")
/* By default: */
mw("img_baseline") ::= "baseline"
mw("img_sub") ::= "sub"
mw("img_super") ::= "super" | "sup"
mw("img_top") ::= "top"
mw("img_text_top") ::= "text-top"
mw("img_middle") ::= "middle"
mw("img_bottom") ::= "bottom"
mw("img_text_bottom") ::= "text-bottom"
Caption ::= <inline-text>
Semantics
* Renders an image inline using the <img> tag.
* It is not an error to specify multiple alignment parameters; the first specified is the one used.
* It is not an error to specify multiple captions; the last specified is the one used.
* The caption has no effect if ThumbImageParameter is not given.
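The "first alignment wins, last caption wins" rules can be illustrated with a Python sketch that classifies already-split options using the default strings above. The name `parse_image_options` and the result keys are hypothetical. Manual thumbnails, upright, border and page are omitted, `PositiveNumber` is approximated by a run of digits, and trimming whitespace around keywords is an assumption.

```python
import re

MODES = {
    "thumbnail": "thumb", "thumb": "thumb",
    "framed": "frame", "enframed": "frame", "frame": "frame",
    "frameless": "frameless",
}
ALIGNS = {"left": "left", "center": "center", "centre": "center",
          "right": "right", "none": "none"}
VALIGNS = {"baseline", "sub", "super", "sup", "top", "text-top",
           "middle", "bottom", "text-bottom"}
WIDTH_RE = re.compile(r"(\d+)px")


def parse_image_options(options):
    """Classify the pipe-separated options of an inline image."""
    result = {"mode": None, "width": None, "align": None,
              "valign": None, "caption": None}
    for raw in options:
        option = raw.strip()
        width = WIDTH_RE.fullmatch(option)
        if option in MODES:
            result["mode"] = MODES[option]
        elif width:
            result["width"] = int(width.group(1))
        elif option in ALIGNS:
            if result["align"] is None:      # first alignment is the one used
                result["align"] = ALIGNS[option]
        elif option in VALIGNS:
            result["valign"] = option
        else:
            result["caption"] = raw          # last caption is the one used
    return result
```

Anything that is not a recognised keyword falls through to the caption slot, which is why a later unrecognised option silently replaces an earlier caption.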
Media
MediaInline ::= "[[" , "Media:" , PageName "." MediaExtension "]]" ;
MediaExtension ::= "ogg" | "wav" ;
Gallery
GalleryBlock ::= "<gallery>" [ NewLine ] GalleryImage
    { [ NewLine ] GalleryImage } [ NewLine ] "</gallery>" ;
GalleryImage ::= (to be defined: essentially foo.jpg[|caption] )
Remarks:
* The gallery block can technically be used in the middle of a sentence, so it is not a "special block". It doesn't render particularly nicely when you do that, though.
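Since GalleryImage is still "to be defined", the following Python sketch only illustrates the `foo.jpg[|caption]` shape mentioned above. The name `parse_gallery`, the one-image-per-line layout, and the whitespace trimming are all assumptions.

```python
def parse_gallery(block_body):
    """Parse the text between <gallery> and </gallery> into (name, caption)."""
    images = []
    for line in block_body.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between images are skipped
        # Only the first pipe separates the name from the caption.
        name, separator, caption = line.partition("|")
        images.append((name.strip(), caption.strip() if separator else None))
    return images
```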