
Chapter 1: Content Management Concepts

Content management will be one of the top ten technology trends in 2002:
Determining what an organization actually knows is only half the battle. Getting that
knowledge to the right place at the right time is the other half.
Source: InfoWorld, Jan 8, 2002
1.1 Introduction to Content
Computers have only recently become ubiquitous in the world of
information. Traditionally, computers have been tasked with handling data. As opposed to
data, which is a fairly concrete term, information is a very vague term. Just about any
communication (including data) can be described as information. For the purposes of this
discussion, information will be taken to mean all the common forms of recorded
communication: writing, recorded sound, images, video, and animations.

1.1.1 Information and Content
Content, stated as simply as possible, is information put to use. Content is
whatever information you want to share, in whatever form you want to share it. Although
some content is more appropriate for Content Management Systems (CMS) than others,
all content can be managed in this way. Information is put to use when it is packaged and
presented (published) for a specific purpose. More often than not, content is not a single
"piece" of information, but a conglomeration of pieces of information put together to form
a cohesive whole. A book has content, which is composed of multiple chapters,
paragraphs, and sentences. Newspapers contain content: articles, advertisements, indexes,
and pictures. The newest entry to the media world, the Web, is just the same; sites are
made of articles, advertisements, indexes, and pictures, all organized into a coherent
whole.

1.1.2 Types of Content
For an e-commerce site, content includes:
Web pages and page elements such as text, graphics, controls, multimedia,
  advertisements, and scripts
Applications, middle-tier components, database procedures, and other
  programming logic that enables and supports e-commerce
Database information that directly supports the creation of dynamic Web pages or
  enables the customer to execute business transactions
Downloadable or online viewable files of all types
Content on ancillary support sites in addition to the primary public site

Content which can be easily managed includes:
Basic information
Community discussions
Rapidly changing information
Large amounts of complex content

1.1.3 Content Domain and Components
The content domain is the scope or range of information that is intended to
be captured, managed, and published. The content domain is directly related to the overall
goals of your content management system. In fact, the content domain is the realm of
information that needs to be controlled in order to meet stated goals. Conversely, it can be
asked, "How will the stated goals be met?" The answer is, "By providing content and
functionality." Functionality, which is not covered in this discussion, is the set of features
and abilities provided to an audience for getting to content and for performing transactions
(monetary or information transfer) with an organization. Content is of interest only if it
falls within the stated content domain.
Once a content domain has been established and there is a clear idea of all
of the types of content, the content can then be broken up into its component pieces.
Components divide information into convenient and manageable chunks. They are a set of
discrete objects whose creation, maintenance, and distribution can be automated. They
typically share some common attributes, such as format or length, and they should be able
to "stand on their own." In other words, a component should have meaning in and of
itself, without needing the context of other components to make it meaningful.
1.1.4 The Web Content Lifecycle
Vendor software categories and your particular requirements can
both be understood in relation to the "Web Content Lifecycle." Web content is managed in
two phases:
Production, where content goes "from thought to click."
Delivery, where content actually gets consumed by end-users.
Both phases contain specific attributes that need to be addressed in any Web Content
Management (WCM) plan. The key system attributes for both phases of the Content
Lifecycle are:

Fig 1.1 Comparison of Production and Delivery of Web content
1.2 Introduction to Content Management and CMS
1.2.1 Content Management
Stated as simply as possible, content management is a discipline that
involves the collection, management, and publication of content. Content management
concepts include the following:
Understanding the content domain, from which all of the structural decisions flow
The notion of content components, which allow content processes (collection,
  management, and publication) to be automated
Target publications, which are the end result of any content system
A framework, which unites all of the content into a single system of meta information
In the broader sense, content management is a suite of applications that
allow corporations to effectively manage and deliver large amounts of diverse
information to different media through the most effective and timely means.
Fig 1.2 How content management exactly works

Importance of Content Management:
One of the keys to the success of any e-commerce web site is to present fresh,
  consistent, high quality content to customers. Therefore, effective content
  management processes can establish better customer retention and can lead to
  increased revenue.
Inefficient, broken, or inconsistent content management processes drive
  production costs up. This is due to poorly coordinated efforts, lack of repeatable
  processes, and use of incompatible tools. Online retailers need to find every way
  possible to contain these costs.
Online retailers must move quickly to develop and deploy new promotional
  campaigns to take advantage of current market/product conditions. Not being able
  to respond promptly to these variations can cost the company time, money, and
  market share.
Posting incorrect information, such as errors in product pricing, can lead to
  tremendous customer dissatisfaction, which is then compounded by poor public
  relations experiences.
Publishing inaccurate, misleading or untimely information may also result in legal
Poor testing processes can lead to lower site availability, slow performance, and
  ultimately to fewer site visitors.
1.2.2 Content Management Systems
A content management system (CMS) is a database that organizes and
provides access to all types of digital content: files containing images, graphics,
animation, sound, video or text. It contains information about these files (known as "digital
assets"), and may also contain links to the files themselves in order to allow them to be
located or accessed individually. A content management system is usually used to manage
digital assets during the development of a digital resource, such as a website or
multimedia production. It might be used by staff who digitize images, by authors and editors,
or by those responsible for the management of the content development process (content
managers). Content management systems range from very basic databases to
sophisticated tailor-made applications. These more complex systems can be integrated
with the eventual digital resource in order to enable access to digital assets and to allow
regular updating.
The system itself is definable as a tool or combination of tools that
facilitates the efficient and effective production of the desired web pages using the
managed content.
Possible situations where a CMS is required:
It takes a month to sign off the site's Terms & Conditions because every time any
  one of your organization's lawyers changes a full stop, all the other ones need to
  sign it off.
You realize that your site's visual design isn't working, but it will take a month to
  wrap a new design around the same words.
Your web design agency insists on all content being signed off two months before
  it goes live... and then transcribes it incorrectly.
In a parting gesture, the Web publisher you fired replaced photos of board
  members with sheep.
You can't update one section of the site because another section has a major
  overhaul underway. You can either publish the entire site, with both complete and
  incomplete updates, or hold until both are completed.
You have to work through the night to publish the company's results at market
  opening time because you don't have a secure area to develop them in advance.
You send email promotions about "upgrading" to Windows 2000 to registered Mac
  users.
You're employing an army of skilled web publishers just to update the system
  requirements of your software.

1.3 Summary:
The entire introduction can be summarized as follows:
Content:
Content is, in essence, any type or "unit" of digital information that is used to populate a
page. It can be text, images, graphics, video, sound etc.; in other words, anything that
is likely to be published across an inter-, intra- and/or extranet.
Content Management:
Content Management is effectively the management of the content described above, by
combining rules, processes and/or workflows in such a way that centralized webmasters and
decentralized web authors/editors can create, edit, manage and publish all the content of a
web page in accordance with a given framework or requirements.
Content Management System:
A CMS is a tool that enables a variety of (centralized) technical and (decentralized)
non-technical staff to create, edit, manage and finally publish a variety of content (such as
text, graphics, video etc.), whilst being constrained by a centralized set of rules, processes
and workflows that ensure a coherent, validated website appearance.
Chapter 2: CMS Dissected
"Content management software extends the capabilities of pre-Web document
management tools to make data available, as it is generated, to employees, business
partners and consumers across intranets, extranets and the Internet."
Source: Stephen Phillips, INFORMATION AGE
2.1 Innovative viewpoint of CMS:
Ideally, your CMS is like a supermarket of content. Manufacturers (content
contributors, that is) package their products (content) in containers (components) that they
clearly and consistently label. The manufacturer knows generally what you can use the
product for but not what any particular cook wants to do with it. The supermarket
managers (the CMS administrators) organize, categorize, and display products in a way that
enables shoppers to easily find and select the most appropriate products. This overall
organization lies on top of the organization that the manufacturers of the products impose
inside the individual containers. They organize a box of macaroni and cheese, for
example, into a package of cheese powder and an exact portion of macaroni. The store
displays the box of macaroni and cheese in the pasta section next to the other packaged
pastas. The containers organize their contents, and the store organizes the containers.
A consumer (a publication creator) comes in and selects just the right containers
(components). The consumer reorganizes and blends the particular products into a unique
and tasty dish (the publication). Some of the products are recognizable within the dish,
and some aren't. All the products are out of their original containers and appear as a single
unified whole.
Without the original chunking of the product into standard containers, the consumer can't
count on the amount or composition of the product. Without the further organization of
the containers into an overall storage and management system, the consumer can't find the
product that she needs. Our system of food creation, management and consumption, just
as a CMS does, depends on well-packaged, standalone chunks that you can mix and
match in a variety of ways.
2.2 Necessity of CMS
Traditional tools and methods of building web pages were/are not only
labour intensive but also inefficient and extremely costly. For example, something as
simple as changing a single word in a piece of text on a web page with traditional methods
would have to be done by someone who understood HTML. This process not only
bottlenecked all creation of information and content through IT departments, but it also
prevented more effective use of the IT skills within that department (purchased usually at
considerable cost).
Content management systems are essential for large or even small-scale
projects that involve the capture or creation of digital assets. They also are increasingly
necessary for the creation of any but the most basic websites. Managing the capture or
creation of digital images requires metadata to be recorded that documents the capture,
ownership, location and licensing conditions relating to each image. Even for a few dozen
images, this may add up to hundreds of different pieces of information, the management
of which would not be possible without some automated assistance.
The desire to increase the amount of information being contained in web
pages and the need to include an ever widening circle of groups into the "modern" web
publishing process has exacerbated this situation to the point that many web management
teams are no longer able to cope with the growing demand on their resources. For this
reason, the use of templates that draw on content held in a database is a vital management
tool. Websites that don't use a CMS will become choked and out of date. Most
importantly, in a world where other websites contain more information that changes on
a more regular basis, they will become stale in comparison, and visitors (both internal and
external) will stop coming.

The world of the webmaster or web team being the sole method of getting
information onto a web site is over. It is not so much a case of whether you should
implement a CMS, but more a case of when and which one...
2.3 Required capabilities of CMS
Key requirements may include:
Integrated authoring environment
The CMS must provide a seamless and powerful environment for content creators.
This ensures that authors have easy access to the full range of features provided by the
CMS.
Separation of content and presentation
It is not possible to publish to multiple formats without a strict separation of
content and presentation. Authoring must be style-based, with all formatting applied
during publishing.
Multi-user authoring
The CMS will have many simultaneous users. Features such as record locking
ensure that clashing changes are prevented.
Single-sourcing (content re-use)
A single page (or even paragraph) will often be used in different contexts, or
delivered to different user groups. This is a prerequisite to managing different platforms
(intranet, internet) from the same content source.
Metadata creation
Capturing metadata (creator, subject, keywords, etc.) is critical when managing a
large content repository. This also includes keyword indexes, subject taxonomies and
topic maps.
Powerful linking
Authors will create many cross-links between pages, and these must be stable
against restructuring.
Non-technical authoring
Authors must not be required to use HTML (or other technical knowledge)
when creating pages.
Ease of use & efficiency
For a CMS to be successful, it must be easy to create and maintain content.
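The separation of content and presentation described above can be illustrated with a minimal sketch. It assumes a hypothetical component stored as plain fields (the names `article`, `render_html` and `render_text` are illustrative and not taken from any particular CMS); formatting is applied only at publishing time, so the same source can feed both a web page and a plain-text e-mail.

```python
from html import escape

# Hypothetical content component: pure content plus metadata, no formatting.
article = {
    "title": "Store opening hours",
    "body": ["We are open 9 to 5 on weekdays.", "Closed on public holidays."],
    "keywords": ["hours", "store"],
}


def render_html(component):
    """Apply web formatting at publishing time."""
    paragraphs = "".join(f"<p>{escape(p)}</p>" for p in component["body"])
    return f"<h1>{escape(component['title'])}</h1>{paragraphs}"


def render_text(component):
    """Apply plain-text formatting (for e-mail, say) to the same content."""
    title = component["title"]
    return "\n".join([title, "=" * len(title), *component["body"]])


if __name__ == "__main__":
    print(render_html(article))
    print(render_text(article))
```

Because neither renderer changes the component, adding a further output format means adding one more function rather than re-authoring the content, which is the single-sourcing idea in the list above.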
2.4 Typical Functions (Constituents) of CMS

Fig 2.1 Content management procedure
The functions (constituents) of CMS can be divided into four main categories:
Collection (Authoring, Aggregation, Conversion)
Workflow
Storage
Publishing
A CMS manages the path from authoring through to publishing using a
scheme of workflow and by providing a system for content storage and integration.
Fig 2.2 CMS functional scope and the content life cycle
2.4.1 Collection (Authoring, Aggregation and Conversion)
The collection system is the tools, procedures, and staff that are employed to
gather content and provide editorial processing. When content is collected, it is brought
inside the content management system. The content collection process is one of adding
new components to the existing repository. Content collection can be broken into these
processes:
1) Authoring:
Authoring is the process by which many users can create Web content
within a managed and authorized environment. It is basically the process of creating
content from scratch. Authors almost always work within an editorial framework that
allows them to fit their content into the structures of a target publication. Authors should
also be made aware of the framework that has been developed for the downstream use of
the content. Authors are in the best position to tag their creations with meta information.
So, to whatever extent possible, authors should be encouraged and empowered to
implement the meta information framework within their content.
The role of authoring is performed by graphic artists, videotape production
crews, photographers, technical writers, advertising writers, application developers, Web
page developers, lawyers, human resource personnel, marketers, or anyone else who
produces original material for the Web site. Authored content is often put under version
control through the use of document management systems or source code management
systems.
2) Aggregation:
Aggregation is the process of gathering pre-existing content together for
inclusion in the system. Aggregation is generally a process of format conversion followed
by intensive editorial processing. The conversion changes the formatting of the content,
while the editorial processing serves to segment and tag the content for inclusion in the
repository. Obviously, the closer the original content is editorially (its style and
"elementation" and its componentization and the meta information that has been entered)
to the content management system's framework, the easier the aggregation is.
3) Conversion:
This is the process of changing the elementation scheme (i.e., the tagging
structure) of the content. In this process the structural as well as the format-related codes
must be handled. One conversion problem comes in identifying structural elements
(sidebars or footers, for example) that have only format codes marking them in the source
content. Another problem comes in transforming formatting elements that don't exist in
the target environment.
2.4.2 Workflow
Workflow is the management of steps taken by the content between
authoring and publishing. Typical steps could be link checking and review/signoff by a
manager or legal team. If workflow has existed at all in traditional Web site management,
it has been an off-line affair and not built in to software processes.
The workflow system is the tools, procedures, and staff that you employ to
assure that the entire process of collection, storage, and publication runs effectively and
efficiently, according to well-defined timelines and actions. A workflow system supports
the creation and management of business processes. In the context of a content
management system, the workflow system sets and administers the chain of events around
collecting, "repositing", and publishing.
To be successful, the workflow system should:
Extend over the entire process. Every step of the process, from authoring
  through final deployment of each publication, should be able to be modeled and
  tracked within the same system.
Represent all of the significant parts of the process including:
  o Staff members
  o Standard processes
  o Standard tools and their functions
  o Time and data flow with a variety of transitions and charting
Represent any number of small cycles within larger cycles, with some sort of drill
  down to the appropriate level of detail.
Have a visual interface that shows cycles and players in the process graphically.
Make meta information in the repository available. The workflow system
  should not have to store its own staff members, content types, outlines, and other
  meta information. It should be able to read the data that is stored in the repository,
  and make it available when appropriate in its dialogs and selection screens. For
  example, an editor might select a content type for an article in a workflow screen in
  order to forward it to the next reviewer. The list of content type selections should
  come from the repository, not from the workflow system's own internal data store.
  As an alternative, the workflow's data store (which would need to be some sort of
  open database) could be considered part of the repository that is responsible for
  storing certain meta information.
Provide a conduit to the repository for bottom up meta information. Whether
  or not the workflow system stores meta information, its screens will be a natural
  place for staff to enter meta information. Data such as author, status, and type are
  naturally entered in workflow screens. This data must be able to be transmitted
  into the repository from the workflow system.
Fig 2.3 Workflow management
A CMS must meet the following minimum workflow requirements:
A publisher can assign users to a small number of predefined roles, such as
  "Author," "Editor," "Designer" and "Manager." He/she may modify the predefined
  roles as necessary.
A publisher can formalize a production process into a checklist consisting of a set
  of tasks. He/she can specify dependencies among tasks to guide the order in which
  the production process occurs. The system must provide a simple default checklist,
  which the publisher can substitute or modify as desired.
A publisher can start a new production process based on a checklist. A typical
  process centers on creating, producing and deploying a single item. The publisher
  can assign production tasks (i.e. "Author," "Edit" and "Deploy") in the checklist
  either to roles or to individual users.
Staff users can receive notification of their tasks via e-mail. They can also review
  and execute their assignments from their workspace.
Finally, and most important for many organizations, the system must be flexible
  enough to deviate from the process as needed. Content items may need to be
  reworked and returned to a previous user when an iteration isn't defined, may need
  to be seen by additional personnel for approval, or may need to skip steps if there
  is an acceleration of publishing deadlines. See Figure 2.3.
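The checklist requirements above (tasks, dependencies among tasks, assignment to roles or users, and the ability to send an item back) can be sketched as follows. This is only an illustration under stated assumptions: the task names, the `role:`/`user:` prefixes and the helper functions are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later) rather than on any specific CMS feature.

```python
from graphlib import TopologicalSorter

# Hypothetical default checklist: task -> set of tasks that must finish first.
checklist = {
    "Author": set(),
    "Edit": {"Author"},
    "Design": {"Author"},
    "Approve": {"Edit", "Design"},
    "Deploy": {"Approve"},
}

# Tasks may be assigned either to a role or to an individual user (made-up names).
assignments = {
    "Author": "role:Author",
    "Edit": "role:Editor",
    "Design": "role:Designer",
    "Approve": "user:alice",
    "Deploy": "role:Manager",
}


def production_order(tasks):
    """Return one valid order in which the checklist can be worked through."""
    return list(TopologicalSorter(tasks).static_order())


def ready_tasks(tasks, done):
    """Tasks that are not finished yet and whose dependencies are all finished."""
    return sorted(t for t, deps in tasks.items() if t not in done and deps <= done)


def rework(tasks, done, task):
    """Send an item back: reopen `task` and every task that depends on it."""
    reopened = {task}
    changed = True
    while changed:
        changed = False
        for name, deps in tasks.items():
            if name not in reopened and deps & reopened:
                reopened.add(name)
                changed = True
    return done - reopened


if __name__ == "__main__":
    print(production_order(checklist))
    for name in ready_tasks(checklist, {"Author"}):
        print(name, "->", assignments[name])
```

Keeping the dependencies as plain data means a publisher could substitute or modify the default checklist without touching the ordering logic, and `rework` models the "returned to a previous user" case by reopening downstream tasks.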
2.4.3 Storage:
Storage is the placing of authored content into a repository. Content is
usually stored directly in file systems or version control systems. Beyond this it is also the
versioning of the content, so that access conflicts between multiple authors cannot arise
and so that previous versions can be found and restored if required. It can also mean
breaking down content into structured, meaningful components such as <job title>,
<course> or <description> which are stored as separate elements. These can be stored as
records in a database or as Extensible Markup Language (XML) files.
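As a rough illustration of storing a component as separate XML elements, the sketch below uses Python's standard `xml.etree.ElementTree` module. The `vacancy` component and its field names are assumptions made for the example; note that an XML element name cannot contain a space, so the job title field is written here as `job_title`.

```python
import xml.etree.ElementTree as ET


def component_to_xml(kind, fields):
    """Serialize a component so that each field becomes its own XML element."""
    root = ET.Element(kind)
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_component(xml_text):
    """Read the component back as (kind, fields)."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


# Hypothetical component with made-up field values.
vacancy_xml = component_to_xml(
    "vacancy",
    {"job_title": "Web Editor", "description": "Maintains the intranet news pages."},
)

if __name__ == "__main__":
    print(vacancy_xml)
    print(xml_to_component(vacancy_xml))
```

Because each field is a separate element, a publishing step can pick out just the job title for a listing page and the full description for a detail page.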
The storage system is also the repository of all content and meta information, as well as the
processes and tools employed to access and manage the collected content and meta
information. The repository holds all of the content and meta information of the system.
Repositories perform the following functions:
Store content. The repository may be one or a set of databases of various kinds. It
  can include the file system and network resources of the host computer. If the
  repository is distributed among databases, one database is often in a master
  position, organizing the information in the others. The repository must be able to
  store:
  o Textual content: This content is either flat text, or more often markup. In a
    relational database the markup is usually saved as text within fields. In an
    object database, the markup is broken into all its elements and made
  o Components: The repository must be able to link content into manageable
    components. The better the repository, the greater the ability to create,
    modify, and find components.
  o Binaries and file-based data: Whether in the file system or inside a custom
    data store, the repository needs to be able to effectively manage a range of
    data, media, and executable files.
  o Meta information: The repository must be an effective store of the variety
    of meta information that needs to be collected. Some of this meta
    information is coded into the structure of the repository itself (for example,
    a database table can be created especially to store meta information for a
    particular component type). However meta information is stored, the
    repository must provide for the amount and kind of meta information
    needed to describe your content.
Select content. The repository must allow access and selection of content from
  within itself. The repository should offer fielded querying to find components with
  particular meta information associated with them, as well as full text querying
  against text in the system. In repositories with multiple databases it can be difficult
  to issue a search that queries all databases in a consistent way.
Manage content. The repository must facilitate these management tasks:
  o Security, including read and write access permissions for components
  o User maintenance that interfaces to system user management resources
  o Content status keeping and tracking for staging publications, workflow
    triggers, and maintenance operations
  o Transaction logging and rollback of major changes in individual databases
    or to the repository as a whole
  o Bulk automated processes that run periodically against subsets of the
  o Input/output processes that load in and push out information
Connect to other systems. The repository must be able to communicate over the
  network with a variety of clients. Ideally, the repository should be able to
  communicate with LAN-based Web browsers, Internet-based Web browsers, and
  LAN- or Internet-based non-Web client applications. Internet connectivity to the
  repository enables authoring and other publishing processes to take place from
  multiple locations, a frequent requirement for today's content-intensive Web sites.
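The "select content" function of a repository, fielded querying on meta information plus full text querying, can be sketched with Python's built-in `sqlite3` module. The table layout, column names and sample rows are assumptions for illustration only, and a plain `LIKE` match stands in for a real full-text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE component ("
    "id INTEGER PRIMARY KEY, type TEXT, author TEXT, status TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO component (type, author, status, body) VALUES (?, ?, ?, ?)",
    [
        ("article", "lee", "approved", "New pricing for the spring catalogue."),
        ("article", "kim", "draft", "Store opening hours over the holidays."),
        ("advert", "lee", "approved", "Spring sale starts Monday."),
    ],
)

ALLOWED_FIELDS = {"type", "author", "status"}


def find_by_metadata(conn, **fields):
    """Fielded query: exact match on one or more meta information fields."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound as parameters.
    where = " AND ".join(f"{name} = ?" for name in fields) or "1 = 1"
    rows = conn.execute(
        f"SELECT id FROM component WHERE {where} ORDER BY id",
        tuple(fields.values()),
    )
    return [row[0] for row in rows]


def find_by_text(conn, phrase):
    """Simplified full text query: substring match against the body text."""
    rows = conn.execute(
        "SELECT id FROM component WHERE body LIKE ? ORDER BY id",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(find_by_metadata(conn, author="lee", status="approved"))
    print(find_by_text(conn, "spring"))
```

With everything in one database the two query styles are easy to combine; the difficulty noted above arises when components are spread over several databases that do not share a schema.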
2.4.4 Publishing
Publishing is the process by which stored content is delivered. Traditionally
this has meant "delivered to the Web site as HTML". However, it could also mean as an
e-mail message, as an Adobe PDF file or as Wireless Markup Language (to name but a
few). In the near future multiple delivery mechanisms will be required, particularly as
accessibility legislation starts to bite.
Fig 2.6 Content publishing to PDA and Web browser
Content publishing describes the process by which content is drawn out of
the repository and formatted into Web sites and other publications. To be flexible enough
to produce a wide range of publications, the publishing system must include:
Fig 2.7 Types of Content available for templates
Publication templates. These templates draw content into the appropriate context
  for each particular publication. The templates must instantiate:
  o The formatting syntax and surrounding standard text and media elements
    of the target publication platform
  o The page structure and syntax of the target publication platform
  o Content components and meta information on the target pages
  o Standard text and binary files from the repository onto the target pages
A full programming language. The wider the publications and more open the
  repository, the more complexity there will be in transforming content in the
  repository into a publication. The system needs to have complete programming
  abilities so that this complexity can be managed. The language should provide:
  o All of the standard variable types and control structures of major
    programming languages.
  o Complete access to the repository databases and files.
  o Access to external objects and libraries.
Runtime dependency resolution. When content is added to the repository it
  cannot be determined where and when it will be used in a publication. Therefore,
  the publication system must be able to read and resolve content links when the
  publication is being produced. For example, if component A has a link to
  component B in the repository, but component B is not being published, then A's
  link must be suppressed by the publication system to avoid a bad link in the
  publication.
File and directory creation. The publication system must be able to create the
  appropriate file and directory set for the target publication. Additionally, the
  system must have some mechanism for deploying the built publication to its final
  storage location.
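The template and runtime dependency resolution requirements above can be sketched together. Everything specific here is an assumption for illustration: the `[[id|label]]` link notation, the component ids, and the page template are invented, and values are inserted without HTML escaping to keep the sketch short. The point is only that links are resolved when the publication is built, so a link to an unpublished component is reduced to plain text.

```python
import re
from string import Template

# Hypothetical repository: component id -> fields. Links use a made-up
# [[target_id|label]] notation that is resolved only at publishing time.
repository = {
    "A": {
        "title": "Returns policy",
        "body": "See [[B|shipping costs]] and [[C|contact details]].",
    },
    "B": {"title": "Shipping costs", "body": "Flat rate per order."},
    "C": {"title": "Contact details", "body": "Write to the support desk."},
}

PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)
LINK = re.compile(r"\[\[(\w+)\|([^\]]+)\]\]")


def resolve_links(body, published):
    """Turn links into anchors, suppressing links to unpublished components."""
    def replace(match):
        target, label = match.group(1), match.group(2)
        if target in published:
            return f'<a href="{target}.html">{label}</a>'
        return label  # target is not in this publication: keep the text only

    return LINK.sub(replace, body)


def build_publication(repository, published):
    """Return a mapping of output file name -> page content."""
    pages = {}
    for component_id in sorted(published):
        component = repository[component_id]
        pages[f"{component_id}.html"] = PAGE.substitute(
            title=component["title"],
            body=resolve_links(component["body"], published),
        )
    return pages


if __name__ == "__main__":
    for name, page in build_publication(repository, {"A", "C"}).items():
        print(name, page)
```

The returned mapping of file names to page text is where a real system would go on to create the file and directory set and deploy it to its final storage location.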
2.5 Working of CMS
Fig 2.8 How content management works
Subject experts build content in a separate environment. The server takes
the content, inserts it into the correct template and sends it all, neatly wrapped up, to end
users. But that's just the technology side of CM systems. CM's other aspect is the way it
addresses the workflow. CM streamlines how your design gets approved and onto the
Fig 2.9 Content management work flow

Create a design in whatever tool/environment you are comfortable
with. Once it is tested and ready to go, you pass it to your manager or editor or boss or
whoever okays your design. If it's approved, it's sent on to the server. If not, you get notes
and it is sent back to you, all within the CM environment: no email, no voice mail, no
printouts of your design with red ink and yellow sticky notes all over it. The same process
happens on the content side. The end result is that even though it's easier for content and
design to publish, there are still strict controls as to what makes it to the live server.
2.6 Selection
2.6.1 Choosing a Content Management System
With the multitude of CMS solutions that now exist on the market, it is
imperative that you choose your solution very wisely. Equally, vendors may have another list
of "features" that is not so favorable to your environment. Unless you have a very specific
set of requirements to present to vendors, the likelihood is that neither party will find out
whether the product is a true fit until it is too late.
A few issues to ponder when selecting a CMS:
Workflow and scheduling. Large organizations need a CMS that sends
  automatically triggered e-mails to everyone who needs to see a document
  before it posts. A CMS should also let back-end users choose the posting date
  and time in advance, or IS staff members will eventually end up posting stuff
  in the wee hours of the night.
Database compatibility. The whole point of the Web is to leverage your
  existing data and use it to sell the company along with its products or services.
  Don't accept any solution that demands you restructure existing databases to
  make it easier for a CMS to handle the data.
Multilevel security. Generally, one person per department should have the
  clearance to post content to a staging server. In all cases, the authority to
  actually post content to the live site should rest with one or two people.
Syndication and personalization. To distribute content around the Web,
  you'll need a CMS that maps content objects to XML data types. And if your
  site will deliver custom pages based on user preferences, you'll need a CMS
  that breaks documents down to a granular level so that only relevant material
  gets served.
Offline integration. If your company produces lots of print material, you may
  be a candidate for a system that integrates offline and online publishing. Both
  Openpages' ContentWare and Expressroom I/O hook into
  QuarkXPress so that master documents can ensure consistent offline and
  online content.
The fact that an organization's requirements are what should determine the choice
of CMS is also one of the reasons why there is no such thing as THE content
management solution.
2.6.2 Choosing the Right Content Management Tool
Organizations are turning to a wide range of tools to handle content
management tasks, from document management systems to portals and groupware to
content management solutions for contributor-intensive sites to full lifecycle solutions. To
determine the best choice, organizations must understand their business objectives for the
site and how their site needs may change over time.
For example, if the primary goal of the Web site is to give access to
documents that may be stored in an underlying document management system, then a
document management solution with Web capabilities is the right choice. Or if the
objective is to aggregate multiple data sources, both internal and external, then a
portal or syndication product with strong data interface capabilities is the most likely
2.6 Types of CMS
Enterprise Platforms
Upper Mid-tier Packages
Publishing-Oriented Portals and Application Servers
Departmental / Mid-market Products
Low-Cost Products
Open-Source Packages
It is also meaningful to divide products according to:
Their roots, and
How they address the WCM lifecycle.
2.6.1 Product Roots
Understanding the origins of different packages enables you to see deeper
into their relative strengths and weaknesses. The WCM universe can be divided into 3
different ancestries:
Pure-play Web Content Management Packages
These currently predominate in the marketplace.
Application Servers and Enterprise Portals
Document Management / Workflow Origins
After a sluggish start, established Document Management companies have
moved aggressively into the Web Content Management arena in the past two years. These
companies come out of client-server roots (sometimes using SGML, a precursor to
XML), and therefore bring experience with large document and asset repositories and
complex publishing processes. Indeed, they sometimes store content in its native file
format as opposed to a database. After indexing and gathering metadata, the package then
converts it to HTML only for publishing.
The former DM vendors are especially well suited to reference-oriented
projects or other requirements that call for long, complex, hierarchical documents (like
product manuals). They have been quick to adopt XML, but often don't make much room
for relational data in their models.
2.6.2 Lifecycle Focus
Some packages strive to address the full WCM lifecycle, often through a
"suite" of modules. Others focus principally on the Production end of the cycle, or
alternatively, the Delivery facets of Content Management. There is no inherent advantage
to any category; your needs should drive you one direction or another.
Full-Cycle Packages
These products focus primarily (though not always exclusively) on the
Production phase of the WCM lifecycle. They address everything from Role Management
to Library Services, Workflow, and Indexing, but then "hand off" content to other
software (application servers or web servers) to do the actual publishing and
distribution. This model is increasingly popular. Workflow and Library Services are
trendy right now because that is where potential time- and cost-savings lie. (Although
even full-cycle products increasingly recognize the value of specialization and are
integrating with application servers for publishing.) For a complete WCM solution, you
may want to bundle a Production-oriented product with a Delivery-oriented package.
Delivery Oriented:
These products focus principally on run-time aggregation of content and
other services. The field includes portals, application servers, and combinations of both.
2.7 Features of CMS
Fig 2.8 CMS Feature onion
The 3 core features of CMS are:
So that groups of individuals can work safely on a document and also
recall older versions.
So that content goes through an assessment, review or quality assurance process.
So that content can be stored in a manageable way, separate from web site design
'templates', and then delivered as web pages or re-used in different web pages and
different document types.
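The three purposes listed above (recalling older versions, passing content through
review, and keeping content separate from design templates) can be sketched together in
a few lines. The ManagedDocument class and the render function are hypothetical names
chosen for this illustration only; a real CMS would persist this state in a repository.

```python
class ManagedDocument:
    """Hypothetical document record with a version history and a review step."""

    def __init__(self, text, author):
        self.versions = [(author, text)]  # version 1 is the first entry
        self.approved_version = None      # nothing is published until reviewed

    def save(self, text, author):
        """Store a new version and return its number; older versions are kept."""
        self.versions.append((author, text))
        return len(self.versions)

    def recall(self, number):
        """Return the text of an older (or the current) version."""
        return self.versions[number - 1][1]

    def approve(self, number):
        """Record that a reviewer has signed off on one specific version."""
        if not 1 <= number <= len(self.versions):
            raise ValueError("unknown version")
        self.approved_version = number

    def published_text(self):
        """Only approved content is released for delivery."""
        if self.approved_version is None:
            return None
        return self.recall(self.approved_version)


def render(template, text):
    """Content is stored apart from the design template and merged at delivery."""
    return template.replace("{content}", text)


doc = ManagedDocument("Draft one", "ann")
doc.save("Draft two", "bob")
doc.approve(1)
doc.save("Draft three", "ann")  # later edits do not reach the site unreviewed
page = render("<p>{content}</p>", doc.published_text())
```

The same approved text could be passed to a different template to produce another
document type without touching the stored content.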
2.8 Benefits of CMS
A CMS enables online information to be fresh, consistent and of high quality.
Reduced customer (internal & external) dissatisfaction created by having incorrect
Reduction in legal issues created by displaying incorrect information.
Increased value perception of the information provided.
There is a higher likelihood of a customer re-visiting the site.
Some search engines rank pages that change frequently higher in search results.
A CMS facilitates the re-use of content
The re-use of content across multiple web sites or pages creates an enhanced
productivity value.
The re-use of web output to broadcast over e.g. DTV, Mobile Phones, Kiosks
creates new audiences.
The syndication and re-use of content from other suppliers is made easier.
A CMS ensures enhanced productivity & job satisfaction of the web team
Webmasters can focus on technology and areas such as redesign and functionality.
A more appropriate use of the web team results in lowered production costs.
Enables a quick response to changes on competitors' web sites.
A CMS enables decentralised content creation
This enables global contribution of content and information.
The 'speed to market' of changes and new content is improved by avoiding the IT
Content creators/editors are able to take ownership/responsibility for the
information they provide.
A CMS facilitates centralized workflow, approval processes and rules
Enables decentralized contribution without loss of controlled centralized process.
Provides an effective audit trail that allows production with accountability.
Ensures a controlled flow of content around internal processes.
A CMS provides either a competitive advantage or eliminates a competitive
Increasingly the web site is the window that investors use to evaluate a company.
A dynamic, changing website creates the impression of a forward thinking
It enables a 'speed-boat' response to changes in the competitive environment.
Chapter 3: CMS and XML
XML is a great way to store data in a way your organization can digest and
manage it.
Source: Dell Case Study
3.1 The HTML Problem
HTML is great, but when it comes to running a large-scale Web site it
presents real problems. HTML is needed to express creativity and describe the user
experience, but as a way of describing the data behind a Web site so that it can be reused
and manipulated, HTML doesn't make the grade. However hard the standard setters try, it
simply isn't possible to reverse the trend toward using HTML tags for visual effect.
3.1.1 HTML: Too Rigid and Too Flexible
One aspect of the problem is that HTML is both too rigid and too flexible.
Another is the conflict, inefficiency, and duplication inherent in a page-oriented
publishing paradigm. Let's look at both of these problems. What do we mean by too rigid
and too flexible? It is too flexible in the sense that browsers only loosely enforce the
SGML definition of HTML. They tolerate badly written documents and do their best to
present them as intended. Browser idiosyncrasies and unique features are too numerous
to count, and achieving cross-browser compatibility is a time-consuming affair. If
the content of a page is to be treated as data with any integrity, this kind of looseness
can't be tolerated.
On the other hand, HTML is too rigid in two ways. The first is that there is
no formalized method for extending HTML in specific applications, and in a sense
there are already too many tags in the language. The addition of custom tags by Netscape
and Microsoft to achieve specific effects was widely condemned, while at the same time the
abuse of the existing tags in the service of tightly controlled formatting became the norm.
The second way in which HTML is too rigid is, quite simply, the
impossibility of separating format and content in a meaningful way. It is true that with
CSS you can radically alter the way page elements are presented, but a table is still a table
and a list is still a list. More importantly, a page is still a page, and that is the problem
to be looked at next.
3.1.2 Breaking the Page Paradigm
When you browse a Web site, you experience it as a set of pages. True, with
dynamic HTML those pages might have application-like functionality built into them, but
there will still be a sequence of screens or pages. In the conventional HTML paradigm,
authors prepare pages or they program ASP or CGI scripts that map closely to
pages, perhaps pulling in a set of data from a database on the fly. Databases may be
used for repetitive tabular data on the site, or to store data submitted via a form, but
for everything else, pages are hard-coded.
As highlighted earlier, the data owned by any one part of an organization
will span many pages, while at the same time any one page may incorporate data from
many groups. Two things can happen: either the page structure ends up being modified
to mirror the internal structure of the organization rather than the information
needs of users, or the organization has to build processes to handle the matrix of
ownership. In the second case, it isn't always clear who owns the pages, and they
may not be maintained properly.
The bottom line is that as long as we still work in a page-oriented publishing
paradigm, there will be conflict, inefficiency, and duplication. The Internet will fail to
live up to its promises. Information owners need to be able to maintain their data
easily without combing the site for every instance of data relating to their domain,
and without worrying about formatting issues. Site designers need to be able to
set presentational standards that will be consistently applied across the site without
worrying about the data they are dealing with. In other words, the page paradigm
has to be broken!
3.2 The SQL Problem
The answer to the HTML problems, according to some content
management suppliers, is to break down the site into HTML/ASP templates and a set of
SQL tables. That way you can isolate look and feel, easily manage the data, and enable
yourself to publish far more data. It is fast, highly scalable, robust, and the data is held in a
completely media-neutral format. If the majority of your site is highly consistent in format
(for example, you may have thousands of news articles, classified advertisements, or
product specifications), and you don't have the challenges of providing localized content
to a variety of markets, then SQL might be the right answer. However, in many cases
where this technique is used, much secondary data is incorporated in the templates, and
localization and maintainability are lost.
If you are determined, you can model the structure of most classes of Web
content using SQL. However, the more complex the page type, the more tables, keys,
and joins it requires and the more the performance suffers. What's more, the more difficult
the initial design, the more difficult it is to adapt later. If you want flexibility of design
and ease of reuse, SQL rapidly shows its limitations and XML shows its strength.
3.3 Data-Backed Web Sites vs. Data-Driven Web Sites:
The Template Approach
When XML is applied properly to Web site design, there is a big difference
from the template-driven approach. Templates basically consist of fixed (or slowly
changing) content with slots to be filled from a regularly updated database or even a live
data source. SQL databases are great for storing the things you want to place in the slots,
even if they contain some markup (preferably XML markup). This is what we call a
data-backed site.
XML (in tandem with XSL) puts control in the hands of the data author.
The recursive processing of the data document by the style sheet means that the data
structure is the primary driver of the document assembly process, not a script or a
template. In the system, we still have a kind of template, but it is at a very high level, and
completely empty. There is no fixed content at all. This is what we call a data-driven site.
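The contrast between the two approaches can be sketched in Python. This is only an
illustration: the rule table below is a stand-in for an XSL style sheet, not XSL itself,
and the element names, rule strings, and function names are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET
from string import Template

# Data-backed: a fixed page with slots that are filled from stored values.
PAGE = Template("<h1>$title</h1><p>Price: $price</p>")


def render_data_backed(row):
    """The template decides the page shape; the data only fills the slots."""
    return PAGE.substitute(row)


# Data-driven: one rule per element name, applied recursively over the data
# document, so the structure of the data decides the shape of the output.
RULES = {
    "product": "<div>{children}</div>",
    "name": "<h1>{text}</h1>",
    "price": "<p>Price: {text}</p>",
    "note": "<em>{text}</em>",
}


def render_data_driven(element):
    """Walk the data document and apply the rule that matches each element."""
    children = "".join(render_data_driven(child) for child in element)
    rule = RULES.get(element.tag, "{children}")
    return rule.format(text=element.text or "", children=children)


backed = render_data_backed({"title": "Widget", "price": "9.99"})
data = ET.fromstring(
    "<product><name>Widget</name><note>New</note><price>9.99</price></product>"
)
driven = render_data_driven(data)
```

In the data-driven case the author added a note element and it appeared in the output
in document order, with no change to any page template.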
3.4 Why XML for CMS?

XML has a natural place in Web and Enterprise Content Management.
Here's why:
Fig 2.1 Content in/out using XML abstraction
It is a completely open standard based on common syntax, but infinite semantics.
That means everyone who uses it needs to follow basic rules (that makes it
portable), but you don't have to bend your business to a predefined data model.
Your content will probably have its own unique structure (or "semantics"), and
XML will extend with you.
It enables universal data interchange. Disparate systems and enterprises can share
content via XML without having to expose internal data models or invest in
complex integrations. Therefore, XML is most useful for "data in motion." Of
course, companies are trying to get more value from content precisely by putting it
in motion. By the same token, this means that if your content is going to be
delivered in only one format (e.g. Web), there is no need to invest in XML; it will
add little or no value.
The "eXtensible" approach typically enables more granular control and adaptation.
The holy grail of content management is separating content from site map ("where
it lives") and content from layout ("what it looks like"). This enables you to
repurpose and redeploy the same content to multiple locations, devices, and skins.
XML does precisely this: it tells you what the content is, not where it resides or
how it appears. Databases can accomplish this too, but it is generally easier to
update an XML document or schema, and in any case, XML is better suited to
hierarchical content structures that you typically find in text documents (as
opposed to relational structures that typify catalogs).
Flexible tagging means more sophisticated searches with tag-aware search
engines. XML enables you to more easily assign meaning to your content. Search
engines that can leverage the tagging in your XML repository, as well as the
inherent structure of your content, will generate far superior results compared to
simple keyword queries. Users can search within particular nodes within your
content hierarchy, and enjoy more relevant results that have taken advantage of all
the metatagging you did.

XML has become the "Lingua Franca" for aggregating disparate content elements.
XML makes it substantially easier for Web publishers to assemble atomic bits of
content (in all its varied forms) in an organized way within one site, or indeed, on
a single page. There are two sides to this. First, with respect to accessing source
content from within a CMS, XML can provide a common unification environment
to work within: a single layer between source content and its actual management.
Among other useful features, XML can provide a file system type interface to
database data. Then, on the output side, XML can provide a sole source (and in
fact, a single paradigm) for generating diverse consumable formats, such as
HTML, PDF, WML, and even (with some wrangling) print.
In short, XML can add value across the WCM lifecycle.
3.5 Using XML for CMS
XML can stand behind most electronic information initiatives like content
management. XML enables you to add the structure that you need to content to find it and
deliver it. Suppose, for example, that you're a manufacturer and have a Web site that tells
your distributors all about the products that you provide. By using XML, you can
create a system behind the site that matches what you know about a distributor to all
product information that distributor may want.
In XML parlance, the product information is tagged in such a way that it
can be matched to a distributor's profile. If you create a strong XML framework, it not
only serves this personalization feature, but it can also form the basis of knowing how to
bring new content into the site, how and when to update information, and how to build a
variety of outputs, not just a Web site, from your content. Obviously, as the size and
complexity of your content increases, so does your need for the organization that XML
gives you. See figure 3.2
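A minimal sketch of the manufacturer example, assuming product information is tagged
with attributes that mirror the fields of a distributor's profile. The catalog, the
attribute names (region, line), and the function name are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: each product is tagged with the attributes that a
# distributor profile can be matched against.
CATALOG = """
<catalog>
  <product region="europe" line="pumps"><name>P-100</name></product>
  <product region="asia" line="pumps"><name>P-200</name></product>
  <product region="europe" line="valves"><name>V-10</name></product>
</catalog>
"""


def products_for(profile, catalog_xml):
    """Return names of products whose tags match every field in the profile."""
    root = ET.fromstring(catalog_xml)
    matches = []
    for product in root.iter("product"):
        if all(product.get(key) == value for key, value in profile.items()):
            matches.append(product.findtext("name"))
    return matches


distributor = {"region": "europe", "line": "pumps"}
names = products_for(distributor, CATALOG)
european = products_for({"region": "europe"}, CATALOG)
```

A profile with fewer fields matches more products, so the same tagging serves both
narrow and broad distributors.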
Such a system essentially creates a repository to take existing content in its various
forms (such as Word documents, PDFs, PowerPoint presentations, etc.) and turn it into
an XML format. Once broken down into XML, the document can be componentized,
making it much easier to link to other documents and update. Why is that important?
Consider what happens to a company that produces products that are constantly being
updated. Specifications related to a product's features might be contained in documents
scattered throughout a company's intranet, such as marketing materials. Each time the product is
updated, someone has to go and find all of the related postings, and update them; or, as is
often the case, the information is not updated, resulting in an inconsistent message. By
breaking documents into components, using the XML platform, it is possible to update
one document and have all the related documents, no matter where they are on the
corporate intranet, also be updated.
Data is stored in two ways: either in XML or in a relational database like
Oracle or SQL Server. And even if it's stored in XML, it's still probably kept in a
relational database, in XML. It's going to be a long time before we see relational databases
phased out in any way.
Does it matter if it's stored in XML or directly in a relational database?
Depends on the vendor. XML tends to be more easily re-purposed, i.e. if you
want to move your PR text from a browser to a set-top box to a WAP enabled
device, it's probably going to take your XML savvy designers and developers very little
time to do it.
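The idea of updating one component and having every related document follow can be
sketched as below. The include element, the ref attribute, and the component store are
assumptions made for this illustration, not features of any named product.

```python
import xml.etree.ElementTree as ET

# Hypothetical component store: one shared fragment, kept in a single place.
COMPONENTS = {"spec-load": "Maximum load: 40 kg."}

# Documents reference the component instead of copying its text.
DOCUMENTS = {
    "brochure": "<doc><p>Meet the new lifter.</p><include ref='spec-load'/></doc>",
    "manual": "<doc><p>Read before use.</p><include ref='spec-load'/></doc>",
}


def assemble(name, components):
    """Build a document's text, resolving each include at assembly time."""
    parts = []
    for node in ET.fromstring(DOCUMENTS[name]):
        if node.tag == "include":
            parts.append(components[node.get("ref")])
        else:
            parts.append(node.text or "")
    return " ".join(parts)


before = assemble("manual", COMPONENTS)
COMPONENTS["spec-load"] = "Maximum load: 55 kg."  # one update, made once
after = [assemble(name, COMPONENTS) for name in ("brochure", "manual")]
```

Because the specification lives in one component, nobody has to search the intranet
for stale copies after the product changes.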
By opening the benefits of content management to a growing middle
market, XML delivers on the unrealized promise of the technology. Instead of simply
managing content as a production process or overhead expense, companies can deploy
content strategically in both internal and external applications. The supply chain shown in
Figure 3.2 allows business partners to share content through a common repository. Its
XML capabilities allow for content exchange automatically among cooperating

Fig 2.2 Supply chain for automatic content exchange
Central to the concept of content management, and one of the things that
XML is designed to handle well, is separating content from its presentation. Hence, a
new technology wave based on XML standards is sweeping through the world of content
Chapter 4: Case Study of CMS
Online Communities and Content Management
What are Online Communities?
Communities are groups of people tied together by some common purpose
or kinship. It is no different online. While lots of people talk about creating online
communities, few do so from this sort of understanding. Generally, their concept is to be
"the place" to go for some sort of information. They add a chat or a threaded discussion,
collect user data, and call it a community. However, without the core of common purpose
or some sort of kinship (in the widest sense of sharing some important aspect of life) these
sites will never fulfill their goals. To succeed, an online community needs to fulfill its
members' needs for affiliation and knowledge. Affiliation is the members' desire to belong
to something. Knowledge is the members' desire to know something. The web system
behind the online community needs to support affiliation and knowledge.
What are the components of an online community site?

Fig 6.1 Major components of online community system
The Common Interest Domain
The common interest domain is the boundary around the community. It is
the realm of content and interaction. It is the basis of a community. For all of the
members, there is a reason why they would come together. Specifically, the domain is a
statement of purpose for the community. The statement can be as general as "We love
Barbie dolls" or as specific as "We are all female C++ developers working at atomic
accelerators on software designed to track the trails of sub-nuclear particles." Whether
specific or general, the statement must clearly define the entrance requirement of the
community. It is absolutely the first thing that must be determined about the community
and the rest of the structuring of the system should spring naturally from it. The common
interest domain defines what members will become affiliated to and on what subject they
want knowledge.
The Personalization/Data Gathering System
Personalization, in general (see Personalization and Content Management),
is the process of collecting user data and using it to sub-select content to present to that
user. This is true for the community site too. However, in addition to using member data
to direct content to the member, on the community site the member data can be used to
target other members to the member. This member match up provides for a much greater
sense of affiliation. Members, in the end, want to be affiliated to other members, not to a
Web site. To perform this match, the system must collect member data that falls within the
constraints of the common interest domain but narrows the focus so that members who
share specific interests can be found. Of course, there is more to making friends than
answering questions the same way. So the successful member matching system will need
to be open and configurable enough to let in some subjectivity.
On the more mundane side of matching members to content, as in any
personalization system, the site must have mechanisms for:
Gathering member data
Tagging content
Mapping the type of data gathered to the appropriate tags in the content
Dynamically rendering the selected content within a standardized page
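The four mechanisms listed above can be strung together in a short sketch. The member
fields, the tags, and the page layout are illustrative assumptions; a real site would
draw them from its user management system and content repository.

```python
# Step 1: member data gathered by the site (hypothetical fields).
member = {"name": "Ana", "skill": "beginner", "topic": "gardening"}

# Step 2: content items tagged when they enter the repository.
content = [
    {"title": "Soil basics", "tags": {"gardening", "beginner"}},
    {"title": "Grafting roses", "tags": {"gardening", "expert"}},
    {"title": "Bread at home", "tags": {"baking", "beginner"}},
]

# Step 3: the mapping from gathered data to content tags. Here it is simply
# the list of member fields whose values are used as tags.
TAG_FIELDS = ("skill", "topic")


def select_content(member, content):
    """Keep the items that carry every tag derived from the member's data."""
    wanted = {member[name] for name in TAG_FIELDS if name in member}
    return [item for item in content if wanted <= item["tags"]]


def render_page(member, items):
    """Step 4: place the selected content inside one standardized page layout."""
    lines = ["Welcome, {}".format(member["name"])]
    lines += ["- " + item["title"] for item in items]
    return "\n".join(lines)


page = render_page(member, select_content(member, content))
```

Requiring every derived tag to match is a strict choice made for the sketch; a site
could instead rank items by the number of shared tags to let in more content.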
The Content Management Engine and Knowledgebase
The site must have a viable engine for building the knowledge that
members want to get to. As with any content management system, the site must be able to
collect, reposit, and publish content. Specifically, in the context of a community site, the
content management engine must:
Allow members to actively contribute to the knowledgebase of the site. Not only
does this provide a wide base of knowledge flowing into the site, it also brings
affiliation to its maximum depth. Members are most affiliated when they
contribute as much as they receive from the community. Lots of member
contributions are good only as long as they are pertinent well-structured content.
Thus, it is particularly important in a community site to build a strong, simple
metadata framework that naturally guides members to contribute relevant well-
tagged information.
Have a repository with a fine level of granularity to support maximum
Be able to take feeds from the semi-structured sources that will come out of the
message center.
Allow the repository to grow in a constrained way with content expiring when
needed, missing or scanty information clearly identifiable, and new content areas
able to be presented and pushed out to the members and host as needed.
The Message Center
The message center is the communication hub for the site. It includes any or
all of the following technologies:
Basic email
Chat and hosted forums
Threaded message boards
Net meetings
Net presentations
Member location services
Member classifieds and goods or services exchanges
The exact number and types of technologies used depends on the common
interest domain, the computer savvy of the members, and their degree of affiliation.
Generally, the more affiliation your community can muster, the more members will put
the time and energy into these communication channels. The best sign of a low affiliation
community is one where all of the bulletin boards are empty.
The system behind the community must obviously support these
communication vehicles. In addition, it must harvest from them and successfully transition
their semi-structured, real-time output into more enduring knowledge that can be
delivered along with the rest of the content in the site's knowledgebase.
User Management System
Member data is essential to the system behind the community. In addition
to being the basis for personalization, the system needs this data for a variety of other
purposes such as:
Member bulletins and global emails
Member rights to particular content in the knowledgebase
Member rights to the communication services in the message center
Member rights to submit and modify content
Administration of member fees or other initiation rites
For all of these purposes, the site needs a strong and extendable user data
management system.
The Members
Members join the community for affiliation and knowledge. Again, when
they are fully immersed in the community, they contribute as much of these goods as they
receive. The purpose of the site and its underlying system is to facilitate the exchange of
affiliation and knowledge among the members. Much more tangibly, members come to
the site to:
Find new content of interest
Participate in a communication forum
Find members to interact with
Contribute content
Get updates on content that they have previously stated is of interest
See what is going on
All of these activities consist of uploading and downloading messages, files,
content, and data. The goal of the site's system is to use these mundane upload and
download actions in such a way that they create a sense of place and belonging in the
The Host
The community's host is the organization that is in charge of the site's
infrastructure and maintenance. There are two typical hosts:
A commercial host has the members as a target market for goods or services. This
host is willing to trade the cost of maintaining the site in return for exposure to the
members. In a typical scenario, the host has the original idea for the community,
creates an initial implementation of the site's system, fills the system with enough
content to be viable, and then launches the site and opens it to members. The host
continues to feed content in, administer user data, and create communication
events. The major issue to resolve in this circumstance is what rights the host will
need in order to use member data outside of the community. There is a delicate
balance here between the members leaving because they feel exploited and the
host leaving because they see a lot of cost and little return from the community.
A member host is one or more potential members who decide to create a web
presence. Typically, there is some existing trade or interest organization with a
current membership that organizes and funds the initial system. As with the
commercial host, the member host creates an initial implementation of the site's
system, fills the system with enough content to be viable, and then launches the
site and opens it to members. The key issues here are continued funding of the site
from often cash-strapped organizations and sufficient attention paid to the site
maintenance by what are often volunteer-run organizations.
For both the commercial host and the member host the primary issue is to
make the site truly belong to the members. As time goes on, members should be the major
contributors to the site, with the host having to supply less and less content. In a high
affiliation community, members even plan and execute the communication events (chats,
net meetings, etc.). If the community is successful, it is because the host has created the
system that promotes affiliation and targeted knowledge gathering among a group of
people who naturally gravitate to a clearly stated and well founded common interest.
Web content management has evolved far beyond the management of static
HTML pages. Content management is more than presenting internal or external
communications, more than publishing newsletters or event listings. Content management
today is a complex set of processes, oftentimes involving a geographically distributed
production team from diverse functional areas, multiple process steps, and exceptional
amounts of information regarding publishing requirements and the targeting of content. In
this paper, I have tried to give the reader an insight into the relevance of content
management systems and their abstraction using XML.