
MHTML Language Reference

Anderson Santana de Oliveira, Christophe Masson


August 31, 2005

Abstract
MHTML stands for Modular HyperText Markup Language. As its name implies, it is an extension
of HTML and XHTML that adds modularity to those languages as a pre-processing stage,
thereby preserving compatibility with W3C standards. To this end, MHTML extends
the host language with a set of markups that are fully XML-compliant. Internally, MHTML relies
heavily on TOM (available at http://tom.loria.fr) and builds on the work of Ms. Hélène KIRCHNER,
Mr. Claude KIRCHNER and Mr. Anderson SANTANA DE OLIVEIRA.

1 An MHTML file
An MHTML file is a text file containing MHTML-specific language constructs as well as ordinary HTML
or XHTML markup. The name of an MHTML file must always end with the “.mhtml” extension (without the
double quotes). Each MHTML file has “module” as its root element, whose “name” attribute is the
file’s name without the “.mhtml” extension. That is, an MHTML file must always start with
<module name="filename"> and end with </module>. Finally, MHTML writes its output in UTF-8, an
encoding that the majority of web browsers can read.

Depending on whether you want MHTML to produce HTML or XHTML code, a few specifics apply.

1.1 HTML
Apart from the rules above, writing HTML inside an MHTML module is straightforward.
Follow the MHTML syntax described below and write your HTML code just
as you would in an ordinary HTML file, leaving out the DOCTYPE declaration, which MHTML adds
automatically at the end of processing.

1.2 XHTML
As XHTML is less lenient than HTML, writing XHTML-compliant code may require some additional effort
on your part, but the robustness of the result is worth it. Firstly, the MHTML file must comply with
the XML 1.0 standard. That is, it must:
• start with an XML declaration (such as <?xml version="1.0" encoding="UTF-8"?>),
• have a closing tag for each XHTML element,
• respect case (XML is case-sensitive, and XHTML element and attribute names are lowercase),
• have all of its elements properly nested, and
• have “module” as its root element.
Additionally, your code must comply with XHTML 1.0 Transitional (reference available at
http://www.w3.org/TR/xhtml1/) and with the MHTML syntax described below. In XHTML processing mode,
MHTML performs additional well-formedness checks and automatically validates the result against
the XHTML 1.0 Transitional standard. Failing to meet any of the above criteria therefore triggers
an error. Lastly, as in HTML mode, MHTML takes care of the DOCTYPE declaration.
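
As an illustration of the checklist above, a module meant for XHTML processing could look like the
following sketch. The file name page.mhtml and its content are hypothetical. The xmlns attribute is
what the XHTML 1.0 specification expects on the html element; this reference does not say whether
MHTML requires it in the source. No DOCTYPE is written, since MHTML adds it.

// file page.mhtml (hypothetical)

<?xml version="1.0" encoding="UTF-8"?>
<module name="page">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample page</title>
</head>
<body>
<p>Every element is lowercase, closed<br />and properly nested.</p>
</body>
</html>
</module>

// end of file page.mhtml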

2 MHTML syntax
2.1 The basics
At its core, MHTML introduces the following two markups:

2.1.1 The “module” markup


<module name="filename"> MHTML file’s content (...) </module>

As we have already seen, this element identifies an MHTML file, since it is its root element. The
“name” attribute is required, and its value must be the current file’s name (without the “.mhtml”
extension).

2.1.2 The “import” markup


<import [mode="raw"|"wellformed"|"valid"|"local-only"]> moduleName </import>

This is the main element of MHTML and certainly the most frequently used one, as it imports
the content of another MHTML file (hence a module) into the one currently being processed. That is,
when MHTML encounters this markup, it resolves moduleName to the corresponding system filename and
then substitutes the content of that file for the <import> </import> construct.

moduleName designates a file whose path is relative to the MHTML file that declares the import; the
directory separator is the ‘/’ character. E.g. to import a file h1.mhtml located in a “header”
subdirectory, write <import> header/h1.mhtml </import> or <import> header/h1 </import>, as the
“.mhtml” extension is optional.
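
For instance, assuming a hypothetical layout in which home.mhtml sits next to a “header” directory
that contains h1.mhtml, the import could be sketched as follows. The file name home.mhtml and the
heading text are made up for illustration, and naming the imported module “h1” (file name only, no
directory) is an assumption based on the rule for the “name” attribute.

// file home.mhtml (hypothetical)

<module name="home">
<html>
<body>
<import>header/h1</import>
</body></html>
</module>

// end of file home.mhtml

// file header/h1.mhtml (hypothetical content)

<module name="h1">
<h1>Welcome</h1>
</module>

// end of file header/h1.mhtml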

Optionally, the import element accepts a mode attribute that can take the following values:
• “raw” (the default when mode is not explicitly declared): no check is performed on the imported
file. It is the option least likely to fail.
• “wellformed”: the imported file is checked for well-formedness, i.e. it must be a well-formed XML
document. If the file to be imported is not well formed, the process is aborted and an error is
displayed.
• “valid”: the imported file is checked against the XHTML 1.0 standard. If the module is not XHTML
compliant, the process is aborted and an error is displayed.
• “local-only”: the imported file cannot use remote references, i.e. all of its links must point to
local resources.
To specify multiple modes, combine them with the ‘+’ operator (e.g. mode="raw+local-only").
Note that some modes are not compatible with one another; running MHTML
with such a combination displays an error message.
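
A brief sketch of the mode attribute in use. The module names page, menu and footer are
hypothetical; the combined mode is the one quoted above.

// file page.mhtml (hypothetical)

<module name="page">
<html>
<body>
<import mode="wellformed">menu</import>
<import mode="raw+local-only">footer</import>
</body></html>
</module>

// end of file page.mhtml

With these settings, processing would abort if menu.mhtml were not well formed, while footer.mhtml
would only be required to avoid remote references.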

Here is a short example:

// file index.mhtml

<module name="index">
<html>
<body>
<import>world</import>
</body></html>
</module>

// end of file index.mhtml

// file world.mhtml, in the same directory:

<module name="world">
Hello world !
</module>

// end of file world.mhtml

Launching MHTML on index.mhtml produces the following file:


// file index.html

<html><body>
Hello world !
</body></html>

// end of file index.html

2.2 Advanced use: the templates


Additionally, MHTML supports parameterized modules through templates.
Typically, a template consists of two things:

• a definition, which provides the template body as well as a list of expected formal parameters and
their use in the template,
• and an instantiation, which “calls” the template by giving it actual parameters.

Like any other MHTML file, a template definition and a template instantiation each have “module” as
their root element. Templates come with new markups, which fall into two categories:

2.2.1 Template definition markups


Firstly, parameters must be declared before they can be used in the template body. Thus, a template
definition must have a “params” element that takes no attributes and has one or more “param”
elements as its children. Each “param” element corresponds to one parameter with a unique name.

<params>

<param [mode="raw"|"wellformed"|"valid"|"local-only"]
[optional="true"]> parameterName1 </param>

[<param [mode="raw"|"wellformed"|"valid"|"local-only"]
[optional="true"]> parameterName2 </param>]

[<param [mode="raw"|"wellformed"|"valid"|"local-only"]
[optional="true"]> parameterName3 </param>]

</params>

param supports the following attributes:

“mode”, which works like the attribute of the same name on “import” by enforcing a check on the
designated parameter. If no such attribute is provided, it defaults to “raw”.

“optional”, which specifies, when set to “true”, that the parameter need not be given a value
at instantiation time. If it is not given a value, the corresponding “use” markup(s) (see below) are
ignored. Any value other than “true” for this attribute is not taken into account.

Once these parameters are declared, they can be used in the template body via the “use”
element:
<use> parameterName </use>
where parameterName references a previously declared parameter.

When instantiating the template, MHTML removes the “params” element and resolves each
“use” element to its corresponding actual parameter.
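
As a small sketch of these markups, a template with one required and one optional parameter might
be written as follows. The file name card.mhtml and the parameter names text and caption are
hypothetical.

// file card.mhtml (hypothetical)

<module name="card">
<params>
<param mode="wellformed"> text </param>
<param optional="true"> caption </param>
</params>

<div>
<use> text </use>
<use> caption </use>
</div>
</module>

// end of file card.mhtml

According to the description of “optional” above, an instantiation that gives no value for caption
would simply have the corresponding “use” element ignored.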

2.2.2 Template instantiation markups


Now that we have a template body, all that is left is to instantiate it. Like everything in MHTML up
until now, the instantiation takes place in a module and consists of the following elements:

<instantiate>
<importname> templateFilename </importname>
<actualparam fp="paramName"> filename </actualparam>
[<actualparam fp="paramName"> filename </actualparam>]
</instantiate>

At compile time, these elements are replaced by the template definition body loaded from
templateFilename. Each declared parameter that is not optional must be given a value through an
“actualparam” element, whose “fp” (for Formal Parameter) attribute holds the parameter name
paramName and whose content is the filename of the module to load in place of the corresponding
“use” element.

As “actions speak louder than words”, let’s have a look at the following example:

// file template_definition.mhtml

<module name="template_definition">
<params>
<param mode="wellformed"> mycontent </param>
<param> mymenu </param>
<param mode="raw"> mytitle </param>
</params>

<html>
<head>
<title><use>mytitle</use></title>
</head>
<body>
<use> mymenu </use>
<use> mycontent</use>
</body>
</html>
</module>

// end of file template_definition.mhtml

// file instantiation.mhtml

<module name="instantiation">

<instantiate>
<importname> template_definition </importname>
<actualparam fp="mytitle"> title </actualparam>
<actualparam fp="mycontent"> content </actualparam>
<actualparam fp="mymenu"> menu </actualparam>
</instantiate>

</module>

// end of file instantiation.mhtml

Running MHTML on this example (java -jar mhtml.jar instantiation.mhtml index.html) loads
instantiation.mhtml and processes it by instantiating template_definition.mhtml. That is,
for each formal parameter encountered, it looks up the corresponding value and imports the
designated module using the chosen mode.
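
To make the example concrete, the three modules named by the “actualparam” elements could look like
the sketches below. Their contents are invented for illustration; only the file names come from the
instantiation above. Since mycontent is declared with mode="wellformed", content.mhtml has to be
well-formed XML.

// file title.mhtml (hypothetical content)

<module name="title">
My site
</module>

// end of file title.mhtml

// file menu.mhtml (hypothetical content)

<module name="menu">
<ul><li><a href="index.html">Home</a></li></ul>
</module>

// end of file menu.mhtml

// file content.mhtml (hypothetical content)

<module name="content">
<p>Hello world !</p>
</module>

// end of file content.mhtml

Each “use” element of the template would then be replaced by the body of the matching module, in the
same way as the import in the first example.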
