Automatic Website Comprehensibility Evaluation

Ping Yan, Zhu Zhang, Ray Garcia
University of Arizona, Tucson, AZ 85721
pyan@email.arizona.edu, zhuzhang@email.arizona.edu, rgz@email.arizona.edu

Abstract

The Web provides the average person easy access to a
vast amount of informational content, and such users
are often interested in selecting websites that best
match their learning objectives and comprehension
level. Web content, however, is generally not tagged in
a way that allows easy determination of its instructional
appropriateness or comprehensibility level. Our
research develops an analytical model, using a group of
website features, to automatically determine the
comprehensibility level of a website. These features,
selected from a large pool of quantitatively measured
website features, are shown in empirical studies to be
significantly correlated with website comprehensibility.
The automatically inferred comprehensibility index may
be used to help the average person interested in using
web content for self-directed learning to find content
suited to their comprehension level and to filter out
content with low potential instructional value.

1. Introduction

One of the challenges of using web content for
learning is finding content that is instructional and
within the comprehension level of the user. For example,
a common strategy when attempting to learn about a
particular topic is to go to a web search engine and
type in a keyword. The keyword 'Charles Darwin' returns
over 2 million results, without any indication as to
which of them are of instructional value relative to
one's comprehension level. The user may review each
search result, guess which might satisfy their learning
objectives, then review the website content and attempt
to judge its comprehensibility and instructional value
on their own. This is largely inefficient and distracts
from the primary intention of learning about the topic,
turning effort instead toward finding content to learn
from.
Popular search engines, such as Google, use the
PageRank algorithm to sort and filter search results;
the ranking is based on link analysis of the importance
of the contents within a set [1]. Similarly, the HITS
algorithm (Hypertext Induced Topic Selection) measures
content authority by considering its 'hubness' [2].
Both PageRank and HITS are based on the link structure
of documents. Directories, such as DMOZ (the Open
Directory Project, http://dmoz.org/) and its derivatives,
provide an organic taxonomy of websites but lack
indicators of instructional value and comprehensibility.
Virtual libraries have attempted to create directories
of sources, but they do not cover the vast resources
available on the public Internet and generally do not
adequately address whether content is instructional or
comprehensible. Self-directed learners using the Web
for learning need some form of guidance in narrowing
the choices and focusing their efforts on the most
appropriate instructional content they can find.
To address the needs of self-directed learners
using the Internet for learning, we automate the
screening of Web content by computing a comprehensibility
score for each website of interest. We use the term
comprehensibility here to denote the degree to which a
web page provides direct access to the relevant
information (i.e., substance in the hypertext space)
with minimal distractions. In this study, we
measure website characteristics regarding Information
Value, Information Credibility, Media Instructional
Value, Affective Attention, Organization and Usability.
We then search for predictors among this set of fea-
tures to determine the comprehensibility level of a
website.
From July 2006 through October 2006, we collected
a data corpus consisting of 800 websites that were
manually browsed and evaluated by four professional
librarians. An empirical study based on this data corpus
is used to understand the characteristics of a web page
that make it more suited for learning from a
comprehensibility perspective. We construct the feature
space of websites by aggregating page-level feature
vectors. The aggregation takes into account the website
topology and the user's browsing behavior. A
computational model is created based on statistical
analysis of the data corpus. We can thus predict the
comprehensibility score for a website by examining its
characteristics.
The remainder of this paper is structured as follows.
Section 2 presents an overview of related research. In
Section 3 we discuss the features associated with both
manual and automated comprehensibility evaluation.
Section 4 summarizes our empirical study. In Section 5
we describe our modeling techniques and present the
findings. Section 6 concludes by reviewing our find-
ings and discussing the implications.

2. Related work

Research on website evaluation has actively focused
on assessing website usability, accessibility,
credibility, and overall design. Design guidelines and
evaluation tools, from both website developers' and
website users' perspectives, exist in the literature.
Fogg et al. investigated how different elements of
websites affect people's perception of credibility by
using an online questionnaire [3]. WebSAT [4] is a
static analyzer tool that inspects HTML files for
potential usability problems; it identifies problems
related to, e.g., readability and maintainability
according to its own set of usability rules or IEEE
standards. A free online service, WebXACT
(http://webxact.watchfire.com/), is another analysis
tool that evaluates an individual web page for quality,
accessibility, and privacy issues. The Bobby World Wide
Web Accessibility Tool is mainly used to inspect the
compliance of website designs with Web accessibility
guidelines such as the World Wide Web Consortium's
(W3C) Web Accessibility Initiative (WAI)
(http://www.w3.org/WAI/). These studies provide valuable
insight into accessibility issues, but, as our study
suggests, they do not form a strong indicator of website
comprehensibility. Brajnik [5] surveyed 11 automated
website analysis methods and found that these tools
address only part of the usability problem, such as
download time and validation of HTML syntax. Evaluation
of hypertext comprehensibility-related issues such as
information consistency and information organization is
absent from the literature.
Methods surveyed by Ivory and Hearst [6] entail
analyzing server and other log file data or examining a
web page's conformance with a set of usability, acces-
sibility, or other guidelines. Researchers on the Web-
Tango project [7] considered a number of quantitative
measures of a website such as informational, naviga-
tional, and graphical aspects to rank its quality as poor,
average, or good. They use such quality metrics to help
non-professional designers improve their web site de-
sign.
Readability indexes have been used in education for
many years; commonly used measures include the
Gunning-Fog Index, the Flesch-Kincaid Readability Test,
the SMOG Index, and the Coleman-Liau Index. They
typically measure vocabulary difficulty and sentence
length to predict the difficulty level of a text, often
suggesting an approximate reading age for the text.
A full discussion can be found in [8].
Synthesizing the existing work discussed above, we
see that the assessment of web comprehensibility for
the purpose of facilitating learning processes has not
been widely investigated. In particular, few studies
approach comprehensibility from an analytical
perspective, and existing approaches are therefore
difficult to automate. A recent study [9] related
reading strategies to hypertext-based learning and
cognition. Díaz et al. [10] proposed an evaluation
framework made up of a number of criteria to test the
utility and usability of educational hypermedia systems.
They emphasized the importance of evaluating the
efficiency of educational hypermedia systems, so that
such systems can meet users' learning and teaching
needs, and framed the evaluation approach around a
number of analytical parameters such as readability,
aesthetics, and consistency, as well as content
richness, completeness, and hypertext structure. In
general, they provide a theoretical framework but no
empirical evaluation. Our work uses analytical
approaches to automatically determine a comprehensibility
score for each website in support of learning. The
results are empirically evaluated against expert
judgment.
The most closely related work is our earlier study
[11], in which we reported a preliminary analysis of a
collection of 300 websites. We now report a
substantially expanded study in the following ways.
First, the analytical model is developed by inspecting
and selecting optimal feature subsets from 191 features,
rather than the 19 features investigated in the previous
study, to predict website comprehensibility; this
feature set covers nearly all aspects of a website that
can be determined by parsing and analyzing its content.
Features correlated with website comprehensibility are
thus examined at a finer granularity and more broadly.
Second, in-depth analyses and experiments that test the
model's predictive power under different settings are
conducted. A non-linear modeling technique based on
Support Vector Regression (SVR) is
applied in an attempt to better predict the website's
comprehensibility level. Finally, we validate our anal-
ysis based on a larger data set containing 800 website
data points.

3. Comprehensibility implications

The main interest in comprehensibility is the extrac-
tion of the instructional value of the information con-
tent in the web pages and the related linked web pages.
The instructional value is a measure of the knowledge
contained within the page, including the degree of ap-
plication, analysis, synthesis, inquiry or any other narr-
ative that provokes learning activity.
Comprehensibility is the degree to which a group of
web pages provide direct access to the substance of the
information in the hypertext space without distractions.
A hypertext space is the web page being viewed plus
all related linked pages which are necessary for the
reader to understand the information within the related
web pages. The reader should easily be able to deter-
mine where they are in the hypertext space relative to
where they started. We conceive of Web comprehensibility
as depending on the following aspects of a website:
Information Value, Information Credibility, Media
Instructional Value, Affective Attention, Organization,
and Usability.
• Comprehensibility encompasses the ease of finding
and understanding the concepts presented, assuming that
the reading level of the text is equal to or less than
the reading level of the evaluator. Information Value
(IV) checks readability and, in general, information
richness and completeness. We compute 38 features, such
as the number of words in the title, in meta contents,
and in the body text, and a number of readability
indexes such as Gunning-Fog, SMOG readability, and
Flesch-Kincaid readability. For example, the Gunning-Fog
index is computed according to the formula
(words_per_sentence + percent_complex_words) * 0.4
(a computational sketch appears at the end of this section).
• Comprehensibility also includes a sense of
credibility and trust and access to the source, author,
and date of the information being presented. Information
Credibility (IC) examines knowledge of the information
sources, the authority of the information, and its
correctness. Sixteen features are computed, including
the counts of HTML syntax errors and warnings reported
by HTML Tidy [12] and the number of images indicating
advertisements. We also examine whether copyright
information and a date of last update are present.
• Media Instructional Value (MV) evaluates whether
the use of graphics, icons, animation, or audio enhances
the clarity of the information and is necessary to
communicate the concepts. Twenty-five features are
extracted from the HTML pages regarding graphical
content, such as the counts of images, animated
artworks, and audio or video clips in the web page, and
the maximum height and width of the images.
While the three measures above focus on the
information content of web pages, the remaining three
measures examine the construction of the websites
themselves.
• Comprehensibility is also enhanced by the ease of
focusing attention on the most relevant information of a
website. Consistency and uniformity in the presentation
of the pages on the website should add to
comprehensibility, while arbitrary use of colors, fonts,
backgrounds, and images distracts from the ease of
reading the text. The Affective Attention (AA) rating is
determined by evaluating format, appearance, and
aesthetics. Thirty-five features regarding text and page
formatting are examined, such as the number of words
that are bolded, italicized, or capitalized and the
presence of style sheets.
• Comprehensibility is evidenced by web pages that
allow for ease of linking through and selectively
discovering the meaning relevant to the reader. The use
of short paragraphs, bullet points, tables, or other
summary presentations that allow quick scanning of the
information to find the central ideas should also add to
comprehensibility. Organization Structure (SO) indicates
the effectiveness of navigation (use of lists, tables,
headings, and links) and the consistency of website
content and layout design. Related features include the
maximum crawling depth for a site, the number of
hyperlinks in a page, the counts of page files of
different types (e.g., PHP pages, ASP pages, and TXT
documents), and computed variances of the page-level
features.
• Usability (UA) is established using 15 features.
These features look at the average download time for a
page (indicating whether the web page loads quickly) and
ease of use (by examining the use of forms, framesets,
etc.). The usability and accessibility of a website
contribute to comprehensibility: if a site is not easy
to use or cannot be adjusted for accessibility, its
comprehensibility is diminished.
We examined a large pool of quantitatively computable
features that are intended to provide sufficient
conceptual equivalence to the above heuristics used by
human evaluators when rating the websites. In total, 191
features are computed to quantify these heuristics;
owing to space limitations, full descriptions of the 191
features are not presented in this paper.
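To illustrate how one of these text features can be computed from extracted page text, the following Python sketch computes the Gunning-Fog index named above. It is a minimal illustration, not our feature extractor; the syllable-counting helper is a rough heuristic of our own.

```python
import re

def count_syllables(word):
    # Rough heuristic: count runs of vowels; words with 3+ syllables
    # are treated as "complex" by the Gunning-Fog index.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning-Fog index: 0.4 * (words_per_sentence + percent_complex_words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)

if __name__ == "__main__":
    sample = ("Charles Darwin proposed the theory of evolution by natural "
              "selection. Organisms better adapted to their environment "
              "tend to survive and produce more offspring.")
    print(round(gunning_fog(sample), 2))
```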


4. Data collection

We downloaded the most recent version of the websites
in a pre-compiled list of 800 URLs relevant to Science,
Technology, Engineering, and Mathematics (STEM) topics.
Downloading was restricted to the HTTP and HTTPS
protocols. Instead of creating a complete copy of each
website, we capped the download quota per site at 50
megabytes and limited the breadth-first search to a
maximum depth of 5. We did not save copies of multimedia
files such as images, audio, and video, as we do not
perform multimedia processing in this study. Among the
800 target sites, 69 websites failed to download due to
problems such as inactive hyperlinks, leaving 731
successfully downloaded websites from the given list of
entry pages. In total, around 0.7 million web pages were
downloaded, averaging about 1,000 pages per site.
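As a rough illustration of this crawling policy (HTTP/HTTPS only, a 50 MB per-site quota, a breadth-first depth limit of 5, multimedia skipped), the sketch below uses the requests and BeautifulSoup libraries; the function and constant names are ours, and the crawler actually used in the study may differ in detail.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

MAX_DEPTH = 5                     # BFS depth limit used in the study
QUOTA_BYTES = 50 * 1024 * 1024    # 50 MB per-site download quota
SKIP_EXTENSIONS = (".jpg", ".png", ".gif", ".mp3", ".avi", ".mpg", ".wav")

def crawl_site(entry_url):
    """Breadth-first crawl of one site, respecting depth and size quotas."""
    host = urlparse(entry_url).netloc
    queue = deque([(entry_url, 0)])
    seen, pages, downloaded = {entry_url}, {}, 0

    while queue and downloaded < QUOTA_BYTES:
        url, depth = queue.popleft()
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url.lower().endswith(SKIP_EXTENSIONS):
            continue  # skip multimedia files
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        downloaded += len(resp.content)
        pages[url] = resp.text

        if depth < MAX_DEPTH:
            for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"])
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return pages
```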
Four professional librarians applied their judgment
in the review and evaluation of these websites. The
outcome serves as a good approximation to the gold
standard, because, in their day-to-day work, the libra-
rian evaluators interface with the public to select a
broad range of appropriate websites and pages for
people interested in learning about a topic. The review
process consists of accessing a website page, finding
and reading the central concept, linking it to related
pages as necessary to understand the central concept,
and evaluating the website for adequacy for learning or
instructional purposes. The librarian evaluators rated
25 to 50 websites in a trial period to understand the
evaluation process and the criteria.
The librarians evaluated each website on each of the
six criteria: Information Value, Information
Credibility, Media Instructional Value, Affective
Attention, Organization, and Usability. Finally, an
overall rating was given to each site, indicating the
comprehensibility of the hypertext space presented by
the website. Ratings are scored on a 1-5 Likert scale,
with one indicating the lowest and five the highest
score for each criterion. Each librarian reviewed
approximately four hundred websites over four months,
allocating an average of 10 minutes per website. They
therefore sampled a relatively small set of pages rather
than reviewing every page within a site. The review
duration is sufficient to meet our objective and is
typical of many real settings, where judgments about a
website are often made very quickly by Web users. Each
site was evaluated by at least two evaluators, and the
ratings were averaged to produce the final scores for
the site. Excluding the websites skipped by the
evaluators for various reasons (e.g., page not found),
540 websites are included in our analysis.


5. Analytical modeling

Our major objective in this study is to automatically
determine website comprehensibility by computing a
set of website features. We accomplish this by model-
ing relationships between the website comprehensibili-
ty and the set of page-level and site-level features. We
use regression analysis to construct the mathematical
models that can best predict the website comprehensi-
bility. Regression analysis is the most widely used me-
thod to both describe and predict one variable (depen-
dent variable) as a function of a number of independent
variables (explanatory or predictor variables) from
observed or experimental data [13]. The general form
of our problem is $y = f(x_1, x_2, \ldots, x_n)$,
modeling the comprehensibility $y$ as a function of $n$
computed feature variables $x_1, x_2, \ldots, x_n$ at
the page level or site level.
A feature vector is constructed for each web page
first. The features of the pages within a website are
then aggregated to produce a site-level feature vector
according to the topological structure of the site. The
topology structures are inferred from the linkage struc-
ture of the documents. When there is a hyperlink point-
ing from page $p_i$ to page $p_j$, a directed path
between node $i$ and node $j$ is said to exist. The
pages and the links between them thus form a directed
graph for the website. We mimic the browsing behavior of
a learner by starting from an entry page (the first page
from which a learner starts to navigate the site) and
then picking hyperlinks to jump to other pages. The
probability of a learner visiting a particular page is
approximated by a geometric function of the minimum
number of hops to that page from the entry page. The
minimum number of hops is computed from the constructed
topological graph with a shortest-path algorithm.
Denoting the minimum number of hops between the entry
page of site $i$ and page $j$ as $sp_{ij}$, we assume
that the probability of browsing a page is
$\alpha^{sp_{ij}}$, where $\alpha$ takes a fractional
value so that the probability lies within $[0, 1]$. The
program thus computes the site-level feature vectors by
aggregating the page-level features, weighting the
$j$th page by $\alpha^{sp_{ij}}$. That is, each of the
$n$ site-level features $x_i$ ($i = 1, 2, \ldots, n$)
for a website is a weighted sum of its page-level
features $x_{ij}$ ($j = 1, 2, \ldots, p$) over the $p$
pages of the site, according to Equation (1), shown
below.

$x_i = \dfrac{\sum_{j=1}^{p} x_{ij}\,\alpha^{sp_{ij}}}{\sum_{j=1}^{p} \alpha^{sp_{ij}}}$    (1)

When $\alpha$ goes to zero, the weighting factor
vanishes for every page except the entry page (for which
$sp_{ij} = 0$), so only the entry page is included in
the model and $x_i$ reduces to the entry page's feature
value $x_{ij}$.
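As a concrete illustration of Equation (1), the following sketch computes the minimum hop counts with a breadth-first search and then forms the weighted average of page-level feature vectors; the page names and feature values are made up for the example.

```python
from collections import deque
import numpy as np

def min_hops(entry, links):
    """BFS shortest-path hop counts from the entry page.
    `links` maps each page to the list of pages it links to."""
    hops = {entry: 0}
    queue = deque([entry])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in hops:
                hops[nxt] = hops[page] + 1
                queue.append(nxt)
    return hops

def aggregate_site_features(entry, links, page_features, alpha=0.9):
    """Equation (1): weighted average of page-level feature vectors,
    where page j gets weight alpha ** sp_j (hops from the entry page)."""
    hops = min_hops(entry, links)
    reachable = [p for p in page_features if p in hops]
    weights = np.array([alpha ** hops[p] for p in reachable])
    matrix = np.array([page_features[p] for p in reachable])
    return (weights[:, None] * matrix).sum(axis=0) / weights.sum()

# Tiny usage example with hypothetical pages and 3 features per page.
links = {"index.html": ["a.html", "b.html"], "a.html": ["c.html"]}
page_features = {
    "index.html": [12.0, 3.0, 0.4],
    "a.html": [8.0, 1.0, 0.2],
    "b.html": [20.0, 5.0, 0.9],
    "c.html": [5.0, 0.0, 0.1],
}
print(aggregate_site_features("index.html", links, page_features, alpha=0.9))
```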
We model the relationship between the web com-
prehensibility and the feature vectors by regressing
from the data set that has been evaluated by human
evaluators. The models are inferred with both a linear
and a non-linear regression technique. The effects of
different α values on the predictive power of the linear
model are also discussed.

5.1. Linear regression modeling

The general form of a multiple linear regression model
with $n$ independent variables is given by
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$,
where $\beta_0, \beta_1, \ldots, \beta_n$ are the
regression coefficients to be estimated and
$\varepsilon$ is a random error term. Here $y$ is the
comprehensibility score of a website to be predicted,
and $x_1, x_2, \ldots, x_n$ are the $n$ features
computed for the corresponding website.
A base regression model consists of all 191 feature
variables. There are 540 labeled data points available
from observation. The multiple linear regression model
has the rating values as the dependent variable and the
191 features as independent variables. Four outliers
were removed by eliminating data points with
standardized residuals outside the outlier cutoff (plus
or minus 2.5). The regression produced a model with 191
predictors with an Adjusted R Square of 0.356 (α = 0.9).
A backward selection procedure was then used to search
for the optimal feature subset. It begins with all
predictor variables in the regression equation and then
sequentially removes variables according to a specified
removal criterion (the entry criterion is a significance
of the F value <= .050, and the removal criterion is a
significance of the F value >= .100). In our case,
backward selection works better than the forward
selection and stepwise selection methods because of
dependencies among a few features; for example, the
sizes of image files are approximated by the product of
the height and width of the images. The resulting linear
model contains 81 features, with an Adjusted R Square of
0.437 (α = 1.0).
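The sketch below illustrates backward elimination in Python using statsmodels, dropping the least significant predictor until all remaining p-values fall below a removal threshold. It approximates, rather than reproduces, the F-based entry/removal criteria quoted above, and the data here is synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_select(X, y, removal_p=0.10):
    """Repeatedly drop the predictor with the largest p-value above the
    removal threshold, refitting the OLS model each time."""
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")
        worst = pvalues.idxmax()
        if pvalues[worst] < removal_p:
            break
        features.remove(worst)
    return features, model

# Illustrative data: 540 observations, 20 synthetic features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(540, 20)),
                 columns=[f"f{i}" for i in range(20)])
y = 2.5 + 0.8 * X["f0"] - 0.5 * X["f3"] + rng.normal(scale=0.5, size=540)

selected, final_model = backward_select(X, y)
print(selected)
print(round(final_model.rsquared_adj, 3))
```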



Table 1. Backward linear regression results with varying alpha value

Alpha (α)   Adjusted R Square   Std. Error of Estimation
0.0         0.259               0.88547
0.1         0.272               0.88492
0.2         0.271               0.88220
0.3         0.293               0.88537
0.4         0.303               0.89559
0.5         0.318               0.81282
0.6         0.325               0.79087
0.7         0.343               0.75768
0.8         0.378               0.75281
0.9         0.419*              0.69883
1.0         0.437               0.64872

*ANOVA analysis (model fitness when α = 0.9): Sum of Squares = 233.098, Degree of Freedom = 81, Mean Square = 2.878, F value = 5.746, Sig. = .000

In the aggregation model above, every page in a website
enters our analysis, and all pages contribute to
characterizing the website according to their weighting
factors. However, parsing nearly 1,000 pages per website
is relatively expensive computationally, so only a
subset of the pages is used to lower the cost. First, we
analyze the entry pages only, that is, the page from
which we start when browsing a particular site. When
alpha goes to zero, the weight of every page other than
the entry page goes to zero, so only the entry pages are
considered in this case. The result is shown in the
first row of Table 1 (Adjusted R Square = 0.259).
Second, instead of parsing a single page, pages whose
characteristics suggest they serve as backup entry pages
are also considered (the aggregated features are the
average over this set of pages). For example,
http://abc/index.htm, http://abc/index1.asp, or
http://abc/default.html are all parsed along with
http://abc/index.html (a simple heuristic for flagging
such pages is sketched below). On average, 5 pages per
website are computed. Table 2 shows the linear
regression statistics for this second case. This saves
about 80% of the computation, while the predictive power
of the test, indicated by the Adjusted R Square, is
0.302.
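The following sketch shows one way to flag backup entry pages among a site's downloaded URLs; the filename patterns below are assumptions for illustration, not the exact rules used in the study.

```python
import re
from urllib.parse import urlparse

# Filenames commonly used as site entry points (assumed patterns).
ENTRY_PATTERN = re.compile(r"^(index|default|home|main)\d*\.(html?|php|asp|aspx)$",
                           re.IGNORECASE)

def backup_entry_pages(urls):
    """Return URLs whose path looks like an entry page (e.g. index1.asp)."""
    candidates = []
    for url in urls:
        path = urlparse(url).path
        name = path.rsplit("/", 1)[-1] or "index.html"  # bare "/" implies a default page
        if ENTRY_PATTERN.match(name):
            candidates.append(url)
    return candidates

urls = ["http://abc/index.html", "http://abc/index1.asp",
        "http://abc/default.html", "http://abc/articles/darwin.html"]
print(backup_entry_pages(urls))
# -> ['http://abc/index.html', 'http://abc/index1.asp', 'http://abc/default.html']
```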








Table 2. A linear regression prediction with a subset of pages

R       R Square   Adjusted R Square   Std. Error of Estimation
0.655   0.430      0.302*              0.76880

*ANOVA analysis (model fitness analysis): Sum of Squares = 117.088, Degree of Freedom = 63, Mean Square = 1.985, F value = 3.358, Sig. = .000

The linear model shows how each feature contributes to
the prediction of the website's comprehensibility. The
effects are discussed by feature category, as an
individual feature does not explain the variation of the
dependent variable. By running linear regressions on the
feature sets of the categories discussed in Section 3,
we notice that: 1) features in the MV category, mainly
graphic elements and formatting features, are most
closely correlated with our dependent variable; 2)
text-element features, such as readability indexes and
word counts, also play an important role in the
evaluation. However, the number of features differs
across categories (one category may contain many more
features than another), so comparisons of the predictive
power of each category should be made with caution. The
regression results are shown in Table 3.

Table 3. Linear regression prediction with features by categories (α = 0.9)

Feature Category   R       R Square   Adjusted R Square   Std. Error of Estimation
MV                 0.499   0.249      0.218               0.82070
IV                 0.486   0.236      0.197               0.83163
SO                 0.487   0.237      0.142               0.85970
AA                 0.391   0.153      0.111               0.87498
UA                 0.332   0.110      0.091               0.88471
IC                 0.320   0.102      0.075               0.89275


5.2. Support Vector Regression

Experiments employing support vector regression (SVR)
from the open source package LIBSVM [14] are also
conducted. SVR, based on statistical learning, is a
useful tool for nonlinear regression problems. Nonlinear
relations that may exist between the comprehensibility
score and the feature vectors can thus be captured by
the SVR method.
Our input space contains 536 vectors in 81 dimensions
(the dataset with alpha = 0.9). Features eliminated by
the backward selection procedure in the linear
regression were not included. Each feature is linearly
scaled to the range [0, 1] to prevent attributes with
larger numeric ranges from dominating those with smaller
ranges and to avoid numerical difficulties during the
computation. We employed epsilon-SVR with a Radial Basis
Function (RBF) nonlinear kernel; a detailed technical
discussion of epsilon-SVR can be found in [15].
Parameter selection (model selection) is essential for
obtaining good SVR models. We conducted a grid search
for optimal parameters over the cost parameter (c), the
epsilon in the loss function (p), and the gamma in the
kernel function (g). The resulting parameters are then
used in the regression and are shown beneath Table 4.
V-fold cross-validation is used to evaluate model
fitness: the data set is partitioned into v random
portions, such that v-1 portions are used for SVR
training and the remaining portion is held out as a test
set. Table 4 summarizes the SVR regression results with
10-fold cross-validation.
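For concreteness, the sketch below reproduces the overall procedure with scikit-learn's SVR (which wraps LIBSVM): features are scaled to [0, 1], and C, epsilon, and gamma are grid-searched with 10-fold cross-validation. The parameter grid and the synthetic data are illustrative assumptions; the original experiments used LIBSVM's tools directly.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Illustrative data: 536 sites x 81 selected features (synthetic here).
rng = np.random.default_rng(0)
X = rng.normal(size=(536, 81))
y = 3.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=536)

# Scale each feature to [0, 1], then fit epsilon-SVR with an RBF kernel.
pipeline = make_pipeline(MinMaxScaler(), SVR(kernel="rbf"))

# Grid search over cost (C), epsilon (p), and kernel gamma (g).
grid = GridSearchCV(
    pipeline,
    param_grid={
        "svr__C": [1, 4, 16],
        "svr__epsilon": [0.004, 0.01, 0.1],
        "svr__gamma": [0.002, 0.008, 0.03],
    },
    cv=10,
    scoring="r2",
)
grid.fit(X, y)
print(grid.best_params_)
print(round(grid.best_score_, 3))   # mean 10-fold cross-validated R^2
```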
Comparing the Adjusted R Square of the SVR model with
that of the linear regression (0.419) on the same input
space (α = 0.9), the nonlinear model shows clearly
higher predictive power. This suggests that some
features contribute to comprehensibility nonlinearly.
However, because the solutions generated by SVR models
are difficult to interpret, how different web
characteristics contribute to the comprehensibility of a
website cannot easily be determined from the SVR model.
The linear models can instead be analyzed to shed light
on the relations between comprehensibility and web
characteristics, as shown earlier.

Table 4. Cross validation SVR experiments (n = 536, k = 102)

Number of folds   R Square   Adjusted R Square
10                0.658      0.5774

Parameters: -c 16.0 -g 0.0078125 -p 0.00390625


6. Conclusion and Future Work

Self-directed learners seeking Web content that they can
easily read and understand (i.e., content with some
instructional value, within a comprehensibility level
that satisfies their learning objective) are challenged
to quickly evaluate the websites they find through
typical search engine results, and would therefore
benefit from an automated evaluation of website
comprehensibility. Our research uses an analytical
approach to improving the information retrieval process
for self-directed learners by automatically evaluating
website comprehensibility using web page characteristics
shown to be most indicative of websites rated high on
comprehensibility by professional librarians. We
developed an artifact that quantitatively measures a
large group of page-level and site-level features, and
derived analytical models from our search for an optimal
set of metrics for evaluating website comprehensibility.
The analytical model developed was rigorously evaluated
to see how well its assessment of website
comprehensibility compares with the evaluations made by
librarians. The predictive performance of both a linear
model and a nonlinear model based on SVR is reported.
The linear model is easier to interpret, while the SVR
model achieves about 16% higher predictive power. With
about 60% of the variation in website comprehensibility
explained by the 81 measured Web characteristics under
the SVR model, we consider the developed artifact an
effective and reliable solution to the comprehensibility
prediction problem.
However, the comprehensibility score is not definitive,
but it remains useful when applied appropriately. The
score highlights when a web page may be challenging to
access and comprehend. Re-sorting search results based
on the comprehensibility score would benefit the learner
by presenting websites with higher probable
comprehensibility first. The comprehensibility score is
useful for making a quick initial determination over
large volumes of web pages, or for a specific web page
that a learner wishes to assess.
So far, in this preliminary study, we have only
conducted experiments regressing on the overall rating.
Experiments for each of the sub-category ratings will be
conducted in the near future. Future research will also
explore the possibility of filtering out websites that
address specific audiences for whom the content is not
instructional. For example, evaluating an e-commerce
site for comprehensibility may be of limited value, so
identifying such categories of websites and excluding
them from comprehensibility scoring may be necessary.
Lastly, an interesting application of comprehensibility
scoring would be to include it within a focused crawler
that finds the link path with the highest
comprehensibility for a given topic.

References

[1] Brin, S. and L. Page, "The Anatomy of a Large-Scale
Hypertextual Web Search Engine". WWW7 / Computer Net-
works, 1998. 30: p. 107-117.
[2] Kleinberg, J. "Authoritative sources in a hyperlinked
environment". Ninth Ann. ACM-SIAM Symp. Discrete Algo-
rithms. 1998: ACM Press, New York.
[3] Fogg, B., et al. "What Makes Web Sites Credible? A
Report on a Large Quantitative Study". SIGCHI'01. 2001.
Seattle, WA, USA.
[4] NIST, Web Static Analyzer Tool (WebSAT). 2002.
[5] Brajnik, G. "Automatic web usability evaluation: Where
is the limit?" Proceedings of the 6th Conference on Human
Factors and the Web. 2000. Austin, TX.
[6] Ivory, M.Y. and M.A. Hearst, "State of the art in automat-
ing usability evaluation of user interfaces". ACM Computing
Surveys, 2001. 33(4): p. 470-516.
[7] Ivory, M.Y. and M.A. Hearst, "Improving Web Site De-
sign". IEEE INTERNET COMPUTING, Special Issue on
Usability and the World Wide Web, 2002. 6(2).
[8] DuBay, W.H., The Principles of Readability. 2004.
[9] Salmerón, L., J.J. Cañas, and W. Kintsch, "Reading Strat-
egies and Hypertext Comprehension". Discourse Processes,
2005. 40(3): p. 171-191.
[10] Díaz, P., M.-Á. Sicilia, and I. Aedo. "Evaluation of
Hypermedia Educational Systems: Criteria and Imperfect
Measures". International Conference on Computers in Edu-
cation (ICCE02). 2002.
[11] Ma, J., Z. Zhang, and R. Garcia. "Automatically Deter-
mining Web Site Comprehensibility". The 16th Workshop on
Information Technologies and Systems (WITS 2006). 2006.
Milwaukee, WI, USA.
[12] Raggett, D., HTML Tidy for Linux/x86 released on 1
September 2005, HTML Tidy Project Page:
http://tidy.sourceforge.net/.
[13] Kleinbaum, D.G., L.L. Kupper, and K.E. Muller, Ap-
plied Regression Analysis and Other Multivariate Methods.
Second Edition. 1988: PWS-KENT Publishing Company,
Boston.
[14] Chang, C.C. and C.J. Lin, LIBSVM: a library for sup-
port vector machines. 2001.
[15] Smola, A.J. and B. Schölkopf, "A tutorial on support
vector regression". Statistics and Computing, 2004. 14: p.
199-222.



