
101E IRT

Using the Internet as an Investigative Research Tool™
Module 2 | Lesson 13

Lesson 13: HTML & Meta Tags

A huge amount of information can be gleaned as a result of the ability to examine the
structure and compilation of a website or webpage. Corroborative intelligence, such as
keywords, image files, and meta tag files can allow an investigator to determine important
facts about the intended audience and true purpose of a website or webpage. The ability to
see "behind" a website or webpage in this manner is a huge advantage to an online
investigator. This lesson goes "behind the scenes" to examine HTML and meta tags to
discover how this can progress an investigation in unexpected ways.

Upon completion of this lesson, students will:

• Understand the difference between markup tags and meta tags.

• Learn how to locate images and other files that may otherwise have gone untraced.

• Be able to identify hyperlinks to URLs and email addresses and locate hidden text
within a webpage.

This lesson should take no longer than 60 minutes to complete. If you have any questions
or require assistance, please contact the course instructor at training@toddington.com.

Chapters

1. Introduction
2. HTML Meta Tags & Markup Tags
3. Images
4. Hyperlinks
5. Invisible Content
6. Source Code ‘Head’ & Meta Tags
7. Video Tutorial
8. Knowledge Review
9. Notes for Investigators
10. Suggested Reading


PAGE 1 of 21 V12.0 DURATION: 60 MINUTES



Introduction

To better understand how a webpage is created and indexed by search engines, it is
necessary to understand some of the basic elements that make up a Web-based document
or site. The primary building block of a typical webpage is Hypertext Markup Language or
HTML. In simplest terms, HTML uses a set of markup symbols, often referred to as tags,
that tell a Web browser how a particular page or document should be displayed.

Markup tags not only tell the Web browser how to display the words and images on the
page, they also contain the information required for hyperlinks to work properly. It is not
only possible, but often highly useful, to directly view a webpage’s HTML source code, as
this can provide valuable insight into the real purpose of the site, as well as establishing its
intended audience. HTML source code can be viewed in raw form through most Web
browsers.

For this lesson, a sample webpage has been created and is located at
https://www.toddington.com/course-materials/example.htm. Go to this URL and confirm that
the page is the same as the one illustrated below.


The HTML source code may be located within different menus, depending on the browser
used to access it. The Mozilla Firefox browser will be recommended throughout this
program as it is compatible with different operating systems, including Windows and
Macintosh.

Using Mozilla Firefox

In order to access the source code for the
https://www.toddington.com/course-materials/example.htm page, open the page in the
browser window and right-click on the page; select ‘View Page Source’ from the options
provided (illustrated below).


Selecting this option will open the HTML source code window illustrated below.

Using Firefox, a page’s HTML source code can also be accessed via the dropdown Tools
menu. Within the dropdown menu, select ‘Web Developer’ followed by the ‘Page Source’
option.


Using Microsoft’s Edge Browser

In Windows 10, Internet Explorer was replaced by Microsoft’s Edge browser, which is just
as simple to use, with familiar keyboard shortcuts and other Windows-based features.

Before we can view a page’s source code, we will need to enable this option from within the
browser’s ‘Developer settings.’ To access these settings, type “about:flags” in the Edge
address bar. As illustrated below, ensure the ‘Show “View source” and “Inspect element”
in the context menu’ option is selected.

Similar to Firefox above, with the https://www.toddington.com/course-materials/example.htm
page open in an Edge browser window, right-click on the page and select the ‘View source’
option (illustrated below).


As illustrated below, the F12 Developer Tools page will be displayed; by default, the
Debugger tab will be shown.


The ‘F12 Developer Tools’ can also be accessed from within the ‘More’ menu in the top
right-hand corner of the screen (illustrated below), or by pressing the Fn + F12 shortcut
keys on your keyboard.


Using Internet Explorer

If you are using an older version of Windows that is bundled with the Internet Explorer
browser, or simply prefer to use this browser over Edge, open the page
https://www.toddington.com/course-materials/example.htm, right-click anywhere on the page
and select ‘View source’ (illustrated in the screenshot below).

Similar to the Edge browser, selecting this option will open the F12 Developer Tools
window, with the Debugger window displaying the HTML source code by default. A page’s
source code can also be viewed via the Internet Explorer dropdown Tools menu, by
selecting the F12 Developer Tools option; the keyboard shortcut Fn + F12 will produce the
same page.


Using Chrome

Using the Chrome browser, a site’s HTML source code can be retrieved in the same manner
as the other browsers demonstrated in this lesson, by right-clicking on the
https://www.toddington.com/course-materials/example.htm page and selecting ‘View Page
Source.’


In order to access a page’s source code via Chrome’s dropdown menu, hover your mouse
over the More tools option provided, and then select Developer tools (illustrated below).

Using Safari

Finally, using Safari, right-clicking on the example page and selecting ‘Show Page Source’
will reveal the page’s source code.

By default, the ‘Show Page Source’ option is not enabled in Safari. In order to be able to
view this option, select ‘Safari’ from the toolbar, then select ‘Preferences.’


From the preferences window, select the ‘Advanced’ tab, and ensure the ‘Show Develop
menu in menu bar’ checkbox is ticked. You will now be able to view the ‘page source’
option in the right-click menu.


A page’s source code can also be viewed in Safari by selecting ‘Show Page Source’ from
within the Develop dropdown menu, as illustrated below.


Note: The following demonstration will be conducted using the Mozilla Firefox browser.
The source code displayed will not differ from browser to browser.

HTML Meta Tags & Markup Tags

All HTML source code documents consist of a ‘head’ and a ‘body,’ and are constructed using
‘markup tags’ and ‘meta tags.’

A markup tag is easily recognizable as it is encapsulated by triangular brackets and
generally formed in two parts that surround an instruction. Text that appears on the
webpage will usually be contained between two markup tags. For example, the <p>
markup tag represents a paragraph. A paragraph will open with the <p> markup tag and
close with the </p> markup tag. A one-sentence paragraph would appear like this:

<p>This is an example one-sentence paragraph.</p>

When viewed through a browser, the markup tags would not be visible; only the text
contained between the tags would appear. Markup tags also contain instructions relating
to the font type, colour, size, and spacing of the text contained within the markup tags.
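The way a browser drops the tags and keeps only the text between them can be imitated with a few lines of Python. This is a minimal sketch using the standard library's html.parser module; the class and function names are made up for illustration, and the approach is a simplification of what a real browser does.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text found between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text that sits between tags, e.g. inside <p>...</p>.
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


source = "<p>This is an example one-sentence paragraph.</p>"
print(visible_text(source))
```

Running the sketch on the one-sentence paragraph prints the sentence without its <p> and </p> tags.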

Carefully examine the HTML source code of the example page and note the following
instructions and markup tags in relation to the text displayed on the webpage.

Instruction on how to align the sentence within the document is located here.

The specific font-face for this text is displayed here.

The size and type-face of the text (i.e., bold) is shown here.


The text of the page is identifiable within the body of the source code. In the example
above, the text reads:

“This is a picture taken of a rural English train station.”

Images

Images displayed on Web pages are stored separately from the HTML document and
recalled as the page is “built” within the browser. The example below shows the location of
the train station image, along with instructions to the browser as to the size of the image
and how it should be aligned (in this case, centred).

The ability to identify an image within the HTML source code can be a significant advantage
in an investigation as more detailed information can be obtained about the domain being
investigated; in the case of this example, the domain is www.toddington.com.
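When a page contains many images, the image locations can be pulled out of the source code programmatically. The following is a rough sketch only: the SAMPLE markup is a hypothetical stand-in (the real example page may be written differently), and the ImageFinder class is an illustrative name, not part of any course tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageFinder(HTMLParser):
    """Records the full URL and stated size of every <img> tag in a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            self.images.append({
                # urljoin resolves a relative src against the page address.
                "url": urljoin(self.page_url, src),
                "width": attributes.get("width"),
                "height": attributes.get("height"),
            })


PAGE_URL = "https://www.toddington.com/course-materials/example.htm"

# Hypothetical markup; the attribute values are assumptions for illustration.
SAMPLE = '<p align="center"><img src="/images/station.jpg" width="400" height="300"></p>'

finder = ImageFinder(PAGE_URL)
finder.feed(SAMPLE)
image_urls = [image["url"] for image in finder.images]
print(image_urls)
```

Resolving the src value against the page address matters because source code often records only a partial path to the image.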

By separating the above information into three distinct sections (detailed below), it may be
possible to locate additional information relating to this domain that may be difficult to
otherwise ascertain.

1. Images

This section signifies the name of a folder within the toddington.com domain. With this
information, it may now be possible to access the folder called “images” within the
toddington.com domain as we now know that this folder exists. This folder would
theoretically be located at https://www.toddington.com/images.


Depending on the way in which the site has been configured and administered, it may be
possible to access this folder and locate other data contained therein. Although the
discovery of this folder may be significant, public users may be prohibited from accessing
certain folders and files within particular domains due to password protection or other
restrictions.

2. Station

This section is simply the name of the data, document, or image contained within the folder
named “images.” In this case, the image is named “station.”

3. JPG

This is the file extension and indicates what type of data or information “station” is. In this
case, “station” is a “JPEG” image.

To view this image in its original location, you would enter the following into your browser’s
address field:
www.toddington.com/images/station.jpg

Please note the above example images folder is not accessible on the Toddington International
corporate site.
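The three-part breakdown described above (folder, name, extension) can also be done mechanically. The sketch below uses only Python's standard library; split_image_url is a hypothetical helper name, and the function assumes a simple address with a single folder level like the one in this lesson.

```python
import posixpath
from urllib.parse import urlsplit


def split_image_url(url):
    """Break an image address into domain, folder, file name and extension."""
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    folder, filename = posixpath.split(parts.path)
    name, extension = posixpath.splitext(filename)
    return {
        "domain": parts.netloc,
        "folder": folder.strip("/"),
        "name": name,
        "extension": extension.lstrip(".").upper(),
        # Address of the containing folder, which may or may not be accessible.
        "folder_url": f"{parts.scheme}://{parts.netloc}{folder}",
    }


info = split_image_url("www.toddington.com/images/station.jpg")
for key, value in info.items():
    print(f"{key}: {value}")
```

The folder_url value is the address an investigator might try next; whether it can be opened depends on how the site is configured.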

It is important to note that embedded within the image on the example page is a line of text
which cannot be located anywhere within the HTML source code (see text highlighted
below).


This is an excellent way to include text within a webpage or site, and yet prevent it from
being located, crawled, and indexed by search engines and other automated applications.
The text is considered to be an integral part of the image and, therefore, is not embedded
in the website, but rather within the image itself.

Hyperlinks

The paragraph shown below contains a line of text encapsulated within markup tags and an
email address contained within triangular brackets. Just as <p> indicates a new paragraph,
<a> indicates the use of a hyperlink.

Reading the above paragraph from left to right, the following can be ascertained:

<p> denotes a new paragraph containing the text: “If you’d like more information about
this picture email me!” The text is aligned in the centre and the size is coded as “big.”


Within the <a> brackets is “mailto:trains@toddington.com,” followed by “>email</a> me!”

The word “email” sits between the opening <a> tag and the closing </a> tag, indicating
that this is the hyperlinked word on the webpage.

Examine the actual webpage to view the interpretation of this code by your Web browser.

The paragraph below is very similar in composition to the example shown above; however,
the markup tags contain a URL (http://www.railtrack.co.uk) rather than an email address.

Note again that the hyperlinked word, in this case the word “here,” sits between the
opening <a> tag and the closing </a> tag.
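Both kinds of hyperlink can be listed automatically from a page's source code. The sketch below is illustrative only: the SAMPLE markup is a hypothetical reconstruction (the sentence around the “here” link is a placeholder), and LinkFinder is an invented class name.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Lists each hyperlink as (kind, target, linked text)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the <a> tag currently open, if any
        self._text = []     # text seen since that <a> tag opened

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._href.lower().startswith("mailto:"):
                kind, target = "email", self._href[len("mailto:"):]
            else:
                kind, target = "url", self._href
            self.links.append((kind, target, "".join(self._text).strip()))
            self._href = None


# Hypothetical markup modelled on the two paragraphs described in this chapter.
SAMPLE = """
<p align="center"><big>If you'd like more information about this picture
<a href="mailto:trains@toddington.com">email</a> me!</big></p>
<p align="center">Placeholder sentence with a link
<a href="http://www.railtrack.co.uk">here</a>.</p>
"""

finder = LinkFinder()
finder.feed(SAMPLE)
links = finder.links
for kind, target, text in links:
    print(f"{kind}: {target} (linked word: {text})")
```

Listing the target next to the linked word is useful because the visible word (“email” or “here”) reveals nothing about where the link actually leads.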

Invisible Content

The final paragraph in the body of the HTML source code is completely invisible on the
example Web page. The text reads:

“The text you are now reading is invisible when viewed through a browser as the text is the
same colour as the background… Not a bad way to hide text that can be easily viewed by
someone who knows where and how to find this hidden message.”


This has been achieved by presenting the text in the same colour as the background of the
webpage, rendering it effectively invisible to anyone unaware of its existence. This is an
excellent method of “hiding” information in plain view on the Web.

You can highlight the example “invisible” text by holding down the left button of your mouse
and dragging the cursor over the page. The hidden text will be revealed, as highlighted in a
pale blue colour below.
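Text hidden in this way can also be flagged by comparing text colours with the page background in the source code. The following is a simplified sketch under stated assumptions: it only understands the older bgcolor and <font color> attributes, the SAMPLE markup and its wording are hypothetical, and pages styled with CSS would need a different approach.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Flags text whose <font color> matches the <body bgcolor> value."""

    def __init__(self):
        super().__init__()
        self.background = None
        self.hidden = []
        self._font_colours = []   # colours of the <font> tags currently open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body":
            self.background = (attributes.get("bgcolor") or "").lower()
        elif tag == "font":
            self._font_colours.append((attributes.get("color") or "").lower())

    def handle_endtag(self, tag):
        if tag == "font" and self._font_colours:
            self._font_colours.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._font_colours or not self.background:
            return
        # Same colour as the background means the text cannot be seen.
        if self._font_colours[-1] == self.background:
            self.hidden.append(text)


# Hypothetical markup; colours and wording are assumptions for illustration.
SAMPLE = """
<body bgcolor="#FFFFFF">
<p><font color="#000000">This sentence is visible.</font></p>
<p><font color="#FFFFFF">This sentence is hidden.</font></p>
</body>
"""

finder = HiddenTextFinder()
finder.feed(SAMPLE)
hidden_text = finder.hidden
print(hidden_text)
```

A check like this only narrows down where to look; reading the source code directly remains the reliable way to confirm what a page contains.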

Source Code ‘Head’ & Meta Tags

At the top of the HTML source code document, the meta tags and related information are
contained within the <head> and </head> tags (illustrated below).

The ‘head’ can be divided into a manageable list of meta tags, each of which contains a
description of its content and purpose. Not all HTML source code is compiled in this simple
format, and students will need to spend time examining the HTML source code of a variety
of websites to gain a better understanding of the more complex types of HTML.

meta NAME=“Description”

This meta tag gives a natural-language overview of the webpage to which it relates (in this
case https://www.toddington.com/course-materials/example.htm) and enables those
search engines that support description tags to return this text in their search result
descriptions.

meta NAME=“KeyWords”

This meta tag shows the keywords that have been chosen to describe this site and may be
used by search engines that support keyword tags to assist in determining the relevance of
a page to queries containing these keywords. The keywords in this example are: trains,
train, railways, railway, stations, rural, United Kingdom, England, and Britain.

Keywords that accurately describe a page but may not be included in the actual text of the
document are often listed, although there are many instances on the Web of unscrupulous
authors filling a meta tag with irrelevant keywords in an attempt to attract more traffic.
Notice how some of the keywords are listed in both singular and plural form.

meta NAME=“ROBOTS”

The final meta tag “ROBOTS” relates to the robots exclusion standard and will be covered
later within this module of the course.
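The meta tags in a page's ‘head’ can be read out as simple name and content pairs. This is a minimal sketch: the keyword list is taken from this lesson, but the title, description text, and ROBOTS value in SAMPLE_HEAD are hypothetical placeholders, and MetaTagReader is an illustrative class name.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page's source."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases tag and attribute names, so NAME becomes name.
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical head section; only the keyword list comes from the lesson text.
SAMPLE_HEAD = """
<head>
<title>Example Page</title>
<meta NAME="Description" CONTENT="Hypothetical description text.">
<meta NAME="KeyWords" CONTENT="trains, train, railways, railway, stations, rural, United Kingdom, England, Britain">
<meta NAME="ROBOTS" CONTENT="noindex">
</head>
"""

reader = MetaTagReader()
reader.feed(SAMPLE_HEAD)
keywords = [word.strip() for word in reader.meta["keywords"].split(",")]
print(reader.meta["description"])
print(keywords)
print(reader.meta["robots"])
```

Storing the names in lower case means “KeyWords”, “keywords”, and “KEYWORDS” are all treated as the same tag, which helps when comparing pages written by different authors.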


The example page examined is a very simple HTML document. Many pages on the Web
incorporate numerous advanced features, including frames (multiple documents being
displayed as one page), embedded multimedia (e.g., Shockwave animation, audio), and
other applications (e.g., Java applets).

Take the time now to examine the HTML source code of a number of pages you
frequently visit and note the differences.

Video Tutorial

This section contains a video tutorial by our training team that demonstrates how to locate
and view a website’s source code. We recommend students take a few minutes to view the
following video before proceeding to the next section: https://youtu.be/Rf-7bu9jGlY.

Knowledge Review

Knowledge Reviews are designed to assist with information retention and do not form part
of the overall grade for this or any other module. There may be more than one correct
answer for some of the questions, and often, there will be no “correct” or “incorrect”
answer.

Students should now complete the Knowledge Review relating to this lesson — HTML & Meta
Tags. This review is located within Lesson 13 of Module 2 on the training site homepage, or
here: Knowledge Review: HTML & Meta Tags.

Please contact the course instructor at training@toddington.com with any questions
relating to this Knowledge Review or lesson.

Notes for Investigators

Often, when conducting a search, the search engine may indicate that the keywords
you searched for are included in the results, but when you view a specific result, the
keywords do not appear in the visible page content. In such instances, you will need to
review the ‘page source’ content. For example, this method can be used to locate the unique
identification number of an account, the actual URL of an image embedded in a page, or a
photo credit for a photo included in an article; these items will not be apparent in the
normal page view.
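Checking whether a keyword appears in the visible text or only in the page source can be scripted. The sketch below is an illustration under simple assumptions: it treats everything outside the head, script, and style sections as visible, the SAMPLE page is hypothetical, and keyword_location is an invented helper name.

```python
from html.parser import HTMLParser


class _BodyText(HTMLParser):
    """Gathers text outside <head>, <script> and <style> sections."""

    SKIPPED = ("head", "script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def keyword_location(html, keyword):
    """Return 'visible', 'source only' or 'absent' for a keyword."""
    parser = _BodyText()
    parser.feed(html)
    parser.close()
    needle = keyword.lower()
    if needle in " ".join(parser.parts).lower():
        return "visible"
    if needle in html.lower():
        return "source only"
    return "absent"


# Hypothetical page used only to exercise the function.
SAMPLE = """<html><head>
<meta name="keywords" content="trains, railways">
</head><body>
<p>A rural English station.</p>
<img src="/images/station.jpg" alt="">
</body></html>"""

for word in ("station", "railways", "station.jpg", "airport"):
    print(word, "->", keyword_location(SAMPLE, word))
```

A result of “source only” points to the kinds of items mentioned above, such as meta tag keywords or the address of an embedded image.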

Furthermore, websites dealing with illegal content, such as selling counterfeit identifications
(fake IDs) or products (e.g., replica handbags, shoes), may use the keyword section in the
source code to direct traffic to their site, as opposed to overtly disclosing the purpose of
their website on the actual homepage.

Suggested Reading

• https://www.w3schools.com/html

• https://html.com

• https://www.computerhope.com/issues/ch000746.htm

©Copyright 2021 - Toddington International Inc. All Rights Reserved.


Duplication of the materials within this publication without express permission is prohibited.

www.TODDINGTON.com