A great deal of information can be gleaned by examining the structure and composition of
a website or webpage. Corroborative intelligence, such as keywords, image files, and meta
tags, can allow an investigator to determine important facts about the intended audience
and true purpose of a website or webpage. The ability to see "behind" a website or webpage
in this manner is a significant advantage to an online investigator. This lesson goes
"behind the scenes" to examine HTML and meta tags and to show how they can progress an
investigation in unexpected ways.
• Learn how to locate images and other files that may otherwise have gone untraced.
• Be able to identify hyperlinks to URLs and email addresses and locate hidden text
within a webpage.
This lesson should take no longer than 60 minutes to complete. If you have any questions
or require assistance, please contact the course instructor at training@toddington.com.
Chapters
Introduction
Markup tags not only tell the Web browser how to display the words and images on the
page, they also contain the information required for hyperlinks to work properly. It is not
only possible, but often highly useful, to directly view a webpage’s HTML source code, as
this can provide valuable insight into the real purpose of the site, as well as establishing its
intended audience. HTML source code can be viewed in raw form through most Web
browsers.
For this lesson, a sample webpage has been created and is located at
https://www.toddington.com/course-materials/example.htm. Go to this URL and confirm that
the page is the same as the one illustrated below.
The HTML source code may be located within different menus, depending on the browser
used to access it. The Mozilla Firefox browser will be recommended throughout this
program as it is compatible with different operating systems, including Windows and
Macintosh.
Selecting this option will open the HTML source code window illustrated below.
Using Firefox, a page’s HTML source code can also be accessed via the dropdown Tools
menu. Within the dropdown menu, select ‘Web Developer’ followed by the ‘Page Source’
option.
In Windows 10, Internet Explorer was replaced by Microsoft’s Edge browser, which is just
as simple to use, with familiar keyboard shortcuts and other Windows-based features.
Before we can view a page’s source code, we will need to enable this option from within the
browser’s ‘Developer settings.’ To access these settings, type “about:flags” in the Edge
search bar. As illustrated below, ensure the ‘Show “View source” and “Inspect element”
in the context menu’ option is selected.
As illustrated below, the F12 Developer Tools page will be displayed; by default, the
Debugger tab will be shown.
The ‘F12 Developer Tools’ can also be accessed from within the ‘More’ menu in the top
right-hand corner of the screen (illustrated below), or by pressing the fn + f12 shortcut
keys on your keyboard.
If you are using an older version of Windows that is bundled with the Internet Explorer
browser, or simply prefer to use this browser over Edge, open the page
https://www.toddington.com/course-materials/example.htm, right click anywhere on the page
and select ‘View source’ (illustrated in the screenshot below).
Similar to the Edge browser, selecting this option will open the F12 Developer Tools
window, with the Debugger window displaying the HTML source code by default. A page’s
source code can also be viewed via the Internet Explorer drop down Tools menu, by
selecting the F12 Developer Tools option; using the keyboard shortcut keys fn + f12 will
also produce the same page.
Using Chrome
Using the Chrome browser, a site’s HTML source code can be retrieved in the same manner
as the other browsers demonstrated in this lesson, by right clicking on the
https://www.toddington.com/course-materials/example.htm page and selecting ‘View Page
Source.’
In order to access a page’s source code via Chrome’s dropdown menu, hover your mouse
over the More tools option provided, and then select Developer tools (illustrated below).
Using Safari
Finally, using Safari, right clicking on the example page and selecting ‘Show Page Source’
will reveal the page’s source code.
By default, the ‘Show Page Source’ option is not enabled in Safari. In order to be able to
view this option, select ‘Safari’ from the toolbar, then select ‘Preferences.’
From the preferences window, select the ‘Advanced’ tab, and ensure the ‘Show Develop
menu in menu bar’ checkbox is ticked. You will now be able to view the ‘page source’
option in the right-click menu.
A page’s source code can also be viewed in Safari by selecting ‘Show Page Source’ from
within the Develop drop down menu, as illustrated below.
Note: The following demonstration will be conducted using the Mozilla Firefox browser.
The HTML source code itself is the same regardless of the browser used to view it.
All HTML source code documents consist of a ‘head’ and a ‘body,’ and are constructed using
‘markup tags’ and ‘meta tags.’
When viewed through a browser, the markup tags would not be visible; only the text
contained between the tags would appear. Markup tags also contain instructions relating
to the font type, colour, size, and spacing of the text contained within the markup tags.
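The separation between markup tags and displayed text can be sketched in a few lines of
Python using the standard library's html.parser module. This is a minimal illustration
only: the sample markup is hypothetical and is not copied from the example page, and the
names VisibleTextExtractor and visible_text are invented for this sketch.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text found between markup tags and discards the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # handle_data receives the text between tags, never the tags themselves.
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html):
    """Return the text content of an HTML fragment as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical markup (an assumption, not the real source of the example page).
sample = '<p align="center"><b><font size="5">A rural railway station</font></b></p>'
print(visible_text(sample))
```

The alignment, bold, and font size instructions are all carried by the tags, so none of
them appear in the extracted text.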
Carefully examine the HTML source code of the example page and note the following
instructions and markup tags in relation to the text displayed on the webpage.
Instruction on how to align the sentence within the document is located here.
The size and style of the text (e.g., bold) is shown here.
The text of the page is identifiable within the body of the source code. In the example
above, the text reads:
Images
Images displayed on Web pages are stored separately from the HTML document and
recalled as the page is “built” within the browser. The example below shows the location of
the train station image, along with instructions to the browser as to the size of the image
and how it should be aligned (in this case, centred).
The ability to identify an image within the HTML source code can be a significant advantage
in an investigation as more detailed information can be obtained about the domain being
investigated; in the case of this example, the domain is www.toddington.com.
By separating the above information into three distinct sections (detailed below), it may be
possible to locate additional information relating to this domain that may be difficult to
otherwise ascertain.
1. Images
This section is the name of a folder (directory) within the toddington.com domain. Because
the source code reveals that a folder called “images” exists, it may now be possible to
access it directly. This folder would theoretically be located at
https://www.toddington.com/images.
Depending on the way in which the site has been configured and administered, it may be
possible to access this folder and locate other data contained therein. Although the
discovery of this folder may be significant, public users may be prohibited from accessing
certain folders within particular domains due to password protection or other restrictions.
2. Station
This section is simply the name of the data, document, or image contained within the folder
named “images.” In this case, the image is named “station.”
3. JPG
This is the file extension and indicates what type of data or information “station” is. In this
case, “station” is a “JPEG” image.
To view this image in its original location, you would enter the following into your browser’s
address field:
www.toddington.com/images/station.jpg
Please note the above example images folder is not accessible on the Toddington International
corporate site.
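The three-part breakdown above can be sketched in Python with the standard library. This
is an illustrative sketch under stated assumptions: the sample markup (including the src
value and the width and height figures) is hypothetical, and ImageCollector and
describe_image are names invented here.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Records the src attribute of every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def describe_image(page_url, src):
    """Resolve src against the page URL and split it into its parts."""
    absolute = urljoin(page_url, src)
    path = urlparse(absolute).path
    folder, filename = posixpath.split(path)
    name, extension = posixpath.splitext(filename)
    return {"url": absolute, "folder": folder, "name": name, "extension": extension}


page_url = "https://www.toddington.com/course-materials/example.htm"
# Hypothetical markup; the real attribute values on the example page may differ.
sample = '<p align="center"><img src="/images/station.jpg" width="400" height="300"></p>'

collector = ImageCollector()
collector.feed(sample)
for src in collector.sources:
    print(describe_image(page_url, src))
```

For the assumed src value, the sketch reports the folder as /images, the name as station,
and the extension as .jpg, matching the manual breakdown.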
It is important to note that embedded within the image on the example page is a line of text
which cannot be located anywhere within the HTML source code (see text highlighted
below).
This is an effective way to include text within a webpage or site while making it far less
likely to be located, crawled, and indexed by search engines and other automated
applications that read only the page’s text. The text is an integral part of the image and,
therefore, is not stored in the HTML source code, but rather within the image file itself.
Hyperlinks
The paragraph shown below contains a line of text encapsulated within markup tags and an
email address contained within angle brackets. Just as <p> indicates a new paragraph,
<a> indicates the use of a hyperlink.
Reading the above paragraph from left to right, the following can be ascertained:
<p> denotes a new paragraph containing the text: “If you’d like more information about
this picture email me!” The text is aligned in the centre and the size is coded as “big.”
Examine the actual webpage to view the interpretation of this code by your Web browser.
The paragraph below is very similar in composition to the example shown above; however,
the markup tags contain a URL (http://www.railtrack.co.uk) rather than an email address.
Note again that the hyperlinked word, in this case the word “here,” is contained within
brackets.
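Locating hyperlinks by eye becomes tedious on larger pages, so the same idea can be
sketched in Python. This is a minimal sketch, not a complete link extractor: the sample
markup and the address someone@example.com are hypothetical stand-ins, and LinkCollector
and split_links are names invented for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits between <a ...> and </a>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def split_links(html):
    """Return (email addresses, web URLs, all links) found in the markup."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    emails = [h[len("mailto:"):] for h, _ in parser.links
              if h.lower().startswith("mailto:")]
    urls = [h for h, _ in parser.links
            if h.lower().startswith(("http://", "https://"))]
    return emails, urls, parser.links


# Hypothetical markup; the email address is a placeholder, not the one on the page.
sample = (
    '<p align="center">If you\'d like more information about this picture '
    '<a href="mailto:someone@example.com">email me!</a></p>'
    '<p>For train times click <a href="http://www.railtrack.co.uk">here</a></p>'
)
emails, urls, links = split_links(sample)
print(emails)
print(urls)
print(links)
```

Keeping the link text alongside each address makes it easier to spot cases where the
visible wording and the actual destination do not match.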
Invisible Content
The final paragraph in the body of the HTML source code is completely invisible on the
example Web page. The text reads:
“The text you are now reading is invisible when viewed through a browser as the text is the
same colour as the background… Not a bad way to hide text that can be easily viewed by
someone who knows where and how to find this hidden message.”
This has been achieved by presenting the text in the same colour as the background of the
webpage, rendering it effectively invisible to anyone unaware of its existence. This is an
excellent method of “hiding” information in plain view on the Web.
You can highlight the example “invisible” text by holding down the left button of your mouse
and dragging the cursor over the page. The hidden text will be revealed, as highlighted in a
pale blue colour below.
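A rough way to flag this kind of hidden text automatically is sketched below in Python.
It assumes, purely for illustration, that the background is set with a bgcolor attribute
on the <body> tag and the text colour with a color attribute on a <font> tag; the lesson
does not state how the example page sets its colours, and pages that use style sheets or
named colours would need a more thorough approach. HiddenTextFinder is a name invented
for this sketch.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Flags text whose <font color> matches the <body bgcolor> value."""

    def __init__(self):
        super().__init__()
        self.background = None
        self._colours = []   # stack of colours from nested <font> tags
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.background = (attrs.get("bgcolor") or "").lower() or None
        elif tag == "font":
            self._colours.append((attrs.get("color") or "").lower())

    def handle_endtag(self, tag):
        if tag == "font" and self._colours:
            self._colours.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self.background is None:
            return
        # Use the innermost <font> tag that actually sets a colour.
        current = next((c for c in reversed(self._colours) if c), None)
        if current == self.background:
            self.hidden.append(text)


# Hypothetical markup (an assumption, not the real source of the example page).
sample = (
    '<body bgcolor="#FFFFFF">'
    '<p><font color="#000000">Visible caption</font></p>'
    '<p><font color="#ffffff">This sentence matches the background</font></p>'
    '</body>'
)
finder = HiddenTextFinder()
finder.feed(sample)
print(finder.hidden)
```

The comparison is a simple string match on lowercased colour values, so "#fff" and
"#ffffff" would be treated as different colours in this sketch.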
At the top of the HTML source code document, the meta tags and related information are
contained within the <head> and </head> tags (illustrated below).
The ‘head’ can be divided into a manageable list of meta tags, each of which contains a
description of its content and purpose. Not all HTML source code is compiled in this simple
format, and students will need to spend time examining the HTML source code of a variety
of websites to gain a better understanding of the more complex types of HTML.
meta NAME=“Description”
This meta tag gives a natural-language overview of the webpage to which it relates (in this
case https://www.toddington.com/course-materials/example.htm) and enables those
search engines that support description tags to return this text in its search result
description.
meta NAME=“KeyWords”
This meta tag shows the keywords that have been chosen to describe this site. Search
engines that support the keywords tag may use it to assist in determining the relevance of
a page to queries containing these keywords, although some major search engines, including
Google, have stated that they disregard it for ranking purposes. The keywords in this
example are: trains, train, railways, railway, stations, rural, United Kingdom, England,
and Britain.
Keywords that accurately describe a page but may not be included in the actual text of the
document are often listed, although there are many instances on the Web of unscrupulous
authors filling a meta tag with irrelevant keywords in an attempt to attract more traffic.
Notice how some of the keywords are listed in both root and plural form.
meta NAME=“ROBOTS”
The final meta tag, “ROBOTS,” relates to the robots exclusion standard and will be covered
later within this module of the course.
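The three meta tags described above can be pulled out of a page's head with a short
Python sketch. The keyword list below is taken from this lesson, but the description text
and the ROBOTS value are hypothetical, as is the exact formatting of the tags; MetaCollector
is a name invented for the illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Builds a dictionary of meta tag names and their content values."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # Names are lowercased so "KeyWords" and "keywords" are treated alike.
                self.meta[name.lower()] = attrs.get("content") or ""


# Hypothetical head section; only the keyword list comes from the lesson text.
sample = (
    "<head>"
    '<meta NAME="Description" CONTENT="A page about a rural railway station">'
    '<meta NAME="KeyWords" CONTENT="trains, train, railways, railway, stations, '
    'rural, United Kingdom, England, Britain">'
    '<meta NAME="ROBOTS" CONTENT="ALL">'
    "</head>"
)
collector = MetaCollector()
collector.feed(sample)
keywords = [k.strip() for k in collector.meta["keywords"].split(",")]
print(collector.meta["description"])
print(keywords)
print(collector.meta["robots"])
```

html.parser lowercases tag and attribute names automatically, which is why the sketch can
look up "name" and "content" even though the sample writes them in capitals.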
The example page examined is a very simple HTML document. Many pages on the Web
incorporate numerous advanced features, including frames (multiple documents being
displayed as one page), embedded multimedia (e.g., Shockwave animation, audio), and
other applications (e.g., Java applets).
Take the time now to examine the HTML source code of a number of pages you
frequently visit and note the differences.
Video Tutorial
This section contains a video tutorial by our training team that demonstrates how to locate
and view a website’s source code. We recommend students take a few minutes to view the
following video before proceeding to the next section: https://youtu.be/Rf-7bu9jGlY.
Knowledge Review
Knowledge Reviews are designed to assist with information retention and do not form part
of the overall grade for this or any other module. There may be more than one correct
answer for some of the questions, and often, there will be no “correct” or “incorrect”
answer.
Students should now complete the Knowledge Review relating to this lesson — HTML & Meta
Tags. This review is located within Lesson 13 of Module 2 on the training site homepage, or
here: Knowledge Review: HTML & Meta Tags.
Often, when conducting a search, the search engine may indicate that the keywords
you searched for are included in the results, but when you view a specific result, the
keywords are not available in the visible page content. In such instances, you will need to
review the ‘page source’ content. As examples, this method can be used to locate unique
Furthermore, websites dealing with illegal content, such as selling counterfeit identifications
(fake IDs) or products (e.g., replica handbags, shoes), may use the keyword section in the
source code to direct traffic to their site, as opposed to overtly disclosing the purpose of
their website on the actual homepage.
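One way to spot this pattern is to compare the keywords meta tag against the text that is
actually displayed on the page. The Python sketch below is a crude illustration under
stated assumptions: the sample page is entirely hypothetical, matching is a simple
case-insensitive substring test (so "train" would also match "trains"), and KeywordAudit
and keywords_missing_from_text are names invented here.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collects the keywords meta tag and the text inside <body>."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.text = []
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self._in_body = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.text.append(data.lower())


def keywords_missing_from_text(html):
    """Return the meta keywords that never appear in the body text."""
    audit = KeywordAudit()
    audit.feed(html)
    audit.close()
    body_text = " ".join(audit.text)
    return [k for k in audit.keywords if k not in body_text]


# Entirely hypothetical page, written for this illustration only.
sample = (
    '<html><head><meta name="keywords" content="handbags, replica, designer"></head>'
    "<body><p>Welcome to our gift shop. Designer items available.</p></body></html>"
)
print(keywords_missing_from_text(sample))
```

Keywords that appear only in the source code are not proof of concealment, but they are a
reasonable prompt for a closer manual review of the page.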
Suggested Reading
• https://www.w3schools.com/html
• https://html.com
• https://www.computerhope.com/issues/ch000746.htm
www.TODDINGTON.com
PAGE 21 of 21 V12.0 DURATION: 60 MINUTES