
Open Source Web Content Management in Java

Release 1.0 February 2008

Open Source Web Content Management Options in Java


Seth Gottlieb Version 1.0, Workgroup License Copyright 2007 Content Here, Inc.

License Agreement and Disclaimer


Workgroup License

This report is licensed under a "Workgroup License" that allows your company to make this report available to up to ten (10) staff members. It may not be shared with customers or copied, reproduced, altered, or re-transmitted in any form or by any means without prior written consent. Any rankings or scoring information may not be used in promotional materials.

Disclaimer

This report is intended to be an overview of the technologies described and not a recommendation or endorsement of a specific platform or technology strategy. The most appropriate platform for your use depends on the unique requirements, legacy architecture, and technical capabilities of your organization. Content Here, Inc., cannot ensure the accuracy of this information since projects, vendors, and market conditions change rapidly. Content Here, Inc., disclaims all warranties as to the accuracy or completeness of the information in this report and shall have no liability for errors, omissions, or inadequacies in the information presented.
All Rights Reserved. Not for Redistribution.

Copyright 2007 Content Here, Inc. All Rights Reserved. Not for Redistribution.


Acknowledgements
Thanks to the following people for reviewing sections of this report for accuracy:

Elie Auvray (Jahia)
Kevin Cochrane (Alfresco)
Arj Cahn
Alexander Kandzior (OpenCms)
Boris Kraft (Magnolia)
Steven Noels (Daisy)

Jennifer Gottlieb provided copyedit services and general encouragement to help me complete this report. Glenn Barnett customized the XSL style sheets used to format the report.

Cover Art

The photograph used on the cover was taken by Tan Quang Tuan [http://www.flickr.com/photos/e8club/] and published under the Creative Commons Attribution 2.0 License on Flickr.


Table of Contents
1. Introduction
    The Demand for Open Source Java Web Content Management
    The Need for This Report
    Organization and Methodology
2. Open Source WCM Marketplace
    State of the Industry: Web Content Management
        Market Characteristics and Trends
        Core Enterprise Requirements
        Market Summary
    Open Source Market Segmentation
        Community Open Source
        Institutional Open Source
        Commercial Open Source
3. Product Evaluations
    Informational Brochure
        What Makes a Good Informational Brochure Platform?
        Informational Brochure Platform Market Overview
        Apache Lenya 2.0
        Daisy 2.1
        Magnolia 3.5 Enterprise
        OpenCms 7.0.3
        Informational Brochure Platform Summary
    Web Content Management Framework
        What Makes a Good WCM Framework?
        WCM Framework Market Overview
        Alfresco 2.2 WCM
        Hippo CMS 6.05.02
        Jahia Enterprise 5.0
        WCM Framework Market Summary
    Round Up
        Comparing with Commercial Products
        Selecting a CMS and Beyond
Glossary


List of Figures
3.1. Lenya Architecture Diagram: Use Case Framework
3.2. Lenya Screenshot: Edit Menu
3.3. Lenya Screenshot: BXE Editor
3.4. Lenya Screenshot: Kupu Editor
3.5. Lenya Screenshot: Editing Structured Content in BXE
3.6. Lenya Screenshot: Site Tab
3.7. Lenya Screenshot: Lenya Localization
3.8. Lenya Screenshot: Image Dialog
3.9. Lenya Screenshot: Workflow Syntax
3.10. Lenya Screenshot: Edit Permissions
3.11. Daisy Architecture Diagram: Daisy Architecture
3.12. Daisy Repository Server Architecture
3.13. Daisy Screenshot: Defining Field Types
3.14. Daisy Screenshot: Content Actions Menu
3.15. Daisy Screenshot: Link Builder
3.16. Daisy Screenshot: Editing a Navigation Document
3.17. Daisy Screenshot: Editing Image Properties
3.18. Daisy Screenshot: Daisy Diff
3.19. Daisy Screenshot: Defining ACLs
3.20. Daisy Screenshot: Faceted Browsing
3.21. Magnolia Screenshot: Configure Subscribers
3.22. Magnolia Screenshot: Browsing in AdminCentral
3.23. Magnolia Screenshot: Page Layout
3.24. Magnolia Screenshot: Edit Dialog
3.25. Magnolia Screenshot: Localized Edit Dialog
3.26. Magnolia Screenshot: Site Designer
3.27. Magnolia Screenshot: Configure Cache
3.28. OpenCms Screenshot: Editing Structured Content
3.29. OpenCms Screenshot: Configure Search Index
3.30. OpenCms Screenshot: Database Replication Module
3.31. OpenCms Screenshot: OpenCms Workplace Interface
3.32. OpenCms Screenshot: Editing XML Pages
3.33. OpenCms Screenshot: Link Checking
3.34. OpenCms Screenshot: Localizing Content
3.35. OpenCms Screenshot: Direct Edit Interface
3.36. OpenCms Screenshot: Insert Image
3.37. OpenCms Screenshot: OCEE LDAP Connector
3.38. OpenCms Screenshot: Content Tools
3.39. Architecture Diagram: Structured Publishing
3.40. Alfresco Architecture Diagram
3.41. Alfresco Architecture Diagram: Repository Services
3.42. Alfresco Screenshot: Web Content Properties
3.43. Alfresco Screenshot: Browse Site View
3.44. Alfresco Screenshot: Sand Boxes
3.45. Alfresco Screenshot: TinyMCE Formatting Buttons
3.46. Alfresco Screenshot: Image Position Dialog
3.47. Alfresco Screenshot: Workflow Dialog
3.48. Alfresco Screenshot: Managing Permissions
3.49. Alfresco Static Deploy Model Diagram
3.50. High Level Hippo Architecture Diagram
3.51. Hippo Architecture Diagram: Forms Generation Architecture
3.52. Hippo Screenshot: Taxonomy Browser
3.53. Hippo Screenshot: Document Browse Interface
3.54. Hippo Screenshot: Xopus Editor Integration
3.55. Hippo Screenshot: Managing Permissions
3.56. Hippo Screenshot: To Do List
3.57. Hippo Screenshot: JSF Repository Browser Demo
3.58. Jahia Architecture Diagram: Distributed Architecture
3.59. Jahia Code Sample: Content Type Definition
3.60. Jahia Screenshot: In-Context Content Management
3.61. Jahia Screenshot: Forms Based Editing
3.62. Jahia Screenshot: Advanced Search Form
3.63. Jahia Screenshot: Version Differences
3.64. Jahia Screenshot: Workflow Approval Page
3.65. Jahia Screenshot: Field Level Access Control
3.66. Jahia Screenshot: Personal Portal Page


List of Tables
1.1. High Level Summary of Products Reviewed
1.2. Scoring Key
2.1. How Community Projects are Governed
2.2. Commercial Open Source Revenue Sources
3.1. Informational Brochure Strengths and Weaknesses
3.2. Lenya Project Overview
3.3. Lenya 2.0 Summary
3.4. Daisy Project Overview
3.5. Daisy 2.1 Summary
3.6. Magnolia Project Overview
3.7. Magnolia 3.5 Enterprise Summary
3.8. OpenCms Project Overview
3.9. OpenCms 7.0.3 Summary
3.10. Informational Brochure Score Summary
3.11. Informational Brochure Strengths and Weaknesses
3.12. Alfresco Enterprise Project Overview
3.13. Alfresco 2.2 Summary
3.14. Hippo CMS Project Overview
3.15. Hippo 6.05.02 Summary
3.16. Jahia Enterprise Project Overview
3.17. Jahia 5.0.3 Summary
3.18. Informational Brochure Score Summary


Chapter 1. Introduction
The Demand for Open Source Java Web Content Management
Not long ago, companies looking for an open source Java web content management (WCM) system had limited options. While the open source content management system (CMS) community as a whole was thriving, most of the activity was on the PHP and Python stacks. The main Java options were Apache Lenya and OpenCms. If you wanted a simple, widely used technology that your users would like, neither of these options looked very attractive. This state of the market was frustrating for many companies that had standardized on the Java platform and wanted to take advantage of the opportunities afforded by open source content technologies.

The building blocks have been available for a long time. The Java world is rich with frameworks that provide core services like persistence, access control, data validation, and presentation. Many companies have used these components to build custom systems that fit their needs. However, these homegrown systems tend to languish without a continuous commitment to maintenance and enhancement. Adding certain core content management features can be prohibitively complex. For example, adding versioning or localization to a data model that was not originally designed for it can disrupt the whole application. Furthermore, many in-house development teams building these systems do not have the wealth of subject matter experience that a dedicated content technology development team would have. Lessons learned can only be applied in the next release of the application - if there is one.

The state of the market is rapidly changing. More products are emerging and some of the older projects are seeing a resurgence. The momentum behind Java WCM technologies started to surge in early 2006, when open source business applications began to get the attention of enterprise buyers who were having success with infrastructure products like Linux, Apache, and MySQL. Java was a natural requirement for large enterprises that had standardized on the language. At the same time, commercial open source vendors were starting to notch up their offerings and connect with these interested buyers.

Many companies are reporting successful implementations using a new breed of Java WCM technologies. If you were disappointed the last time you looked for a Java web content management platform, it may be time to look again.

Companies that have successfully implemented solutions based on these platforms talk of lower project start-up costs and similar (not greater) integration and maintenance costs. Typically, they have strong development teams or rely on systems integrators to manage the systems for them. These same companies tend to have a history of frustration with commercial software, either because they do not feel that the value is commensurate with the licensing costs (they spend so much time or money doing integration work) or because they feel under-served by technical support and would like to be less dependent on it.

Companies have found the greatest leverage using open source software to power basic informational web sites and to provide content management services to highly dynamic, transactional, or interactive web applications. As you will see in the pages of this report, the Java open source content management marketplace is rich with options in these categories. While the Java products still lag PHP and Python based systems in terms of social media oriented features and community size, they have good support for the more fundamental content management functionality, and several of these products offer the assurance of commercial support packages.

The Need for This Report


Now that there is a growing number of viable open source web content management choices available, a technology decision maker needs to be more informed than ever. Even if the best technology fit is a commercial product, the technology decision maker now needs to be able to defend that choice by demonstrating knowledge of the open source alternatives that were rejected. The answer "we looked at open source and it was all bad" is becoming weaker and weaker as a response to a challenge to consider open source. Alternatively, if an open source system is selected, it pays to have a deeper understanding of how the technology works than you can get away with for commercial software. If the information is out there, there is no one to blame for your ignorance. Just as you don't want to announce your commercial software selection after the vendor has filed for bankruptcy, you don't want to select an open source project that has governance or process issues.

Despite the increasing relevance of open source software, traditional analyst firms have been slow to cover this sector of the market. Their standard evaluation processes do not work as well on open source communities or on businesses with small or non-existent sales and marketing budgets. The buying practices of most customers are equally reliant on sales and marketing efforts by software vendors and are rendered equally ineffective by the open source model.

It is not that open source projects are secretive. In fact, more information is available because coordination and communication usually happen out in the open. It is just that the information is spread thinly across many sources and people. Compilation and interpretation take a lot of work and a different set of skills than those of a typical career analyst. In order to understand an open source application, you need to use it, configure it, and interact with the community (actively and passively). The source code itself also contains valuable information about the development standards and history of the project. It takes time to learn the personalities and group dynamics of the community. Not that it wouldn't be nice to know all this information about commercial software - it certainly would. It's just that commercial software doesn't allow you that access.

Organization and Methodology


This report evaluates seven open source Java WCM systems: Alfresco, Apache Lenya, Daisy, Hippo, Jahia, Magnolia, and OpenCms. You can save yourself the trouble of leafing through the pages to find a "blue ribbon" winner or a magic quadrant. There is no universally superior product. Each has its strengths and weaknesses.

The astute reader will notice that not all of the products reviewed in this report qualify under the Open Source Definition [http://www.opensource.org/docs/osd]. Many of the newer, fast moving entries in the open source marketplace are backed by companies (often venture-funded) that are exploring different models to build viable businesses out of free software. The upside to the customer is that there is more potential for accountability (and continuity) with a company than with a nebulous community. The potential downside is that, if the business fails, the product will probably fail too. Potential buyers of these commercial products should pay close attention to the business model. The business should be viable but the pricing should be proportionate to the value that is provided. For the commercial open source products reviewed in this report, there will be a commentary on the business model and success of the company.

As in my report Content Management Problems and Open Source Solutions [http://contenthere.net/articles/optaros_cmsReport_012206_sgg.pdf], I take the approach of positioning each project in the categories of use where it typically excels. In the above mentioned report, I used the categories Informational Brochure Site, Online Periodical, Collaborative Workspace, and Online Community. The projects described in this report fall into two categories: Informational Brochure and Web Content Management (WCM) Platform. There is also a discussion of the overall web content management marketplace and how open source software fits in.

Many of the products reviewed in this report are commercial open source, meaning that a software company develops the product as part of its business strategy. For these products, I discuss how the company makes money off the software: whether it sells a commercial version of the software that is better than the free version ("tiered product") or whether the revenue comes entirely from selling support services for the free version.

Table 1.1. High Level Summary of Products Reviewed


Platform             Version  Product Type                Started  Primary Use             Also Used
Alfresco WCM         2.2      Commercial: Tiered Product  2005     WCM Framework           Informational Brochure
Apache Lenya         2        Community                   2002     Informational Brochure  Multi-site hosting
Daisy CMS            2.1      Commercial: Support         2003     Informational Brochure  Knowledge Base, Documentation Site
Hippo CMS            6        Commercial: Support         2000     WCM Framework
Jahia Enterprise     5        Commercial: Tiered Product  1998     WCM Framework
Magnolia Enterprise  3.5      Commercial: Tiered Product  2003     Informational Brochure
OpenCms              7        Community/Commercial        1999     Informational Brochure

Also Used (row not recoverable from the source layout): Corporate Intranet

For each of the projects reviewed in this report, I have subscribed to the mailing list and monitored the volume and nature of the activity. I have talked to users of the software. I have built prototypes that involve defining content types, setting permissions, and developing layouts. To ensure factual accuracy, each evaluation has been reviewed by a project committer or company officer.

Within each evaluation, I discuss the architecture and integration potential, usability factors, the community, and how the project seems to be trending. For the business oriented reader, the content contribution and presentation sections describe how the application is used to manage content and what type of visitor facing functionality is possible. For the technical reader, the architecture and development sections describe how the product works behind the scenes and how it can be configured and integrated. Although I do not give an overall rating for each product, I do rate each product along certain common criteria.


Table 1.2. Scoring Key


Score explanations (the score symbols are not reproduced here):

Not available
Below average
Average
Above average
Exceptional


Chapter 2. Open Source WCM Marketplace


State of the Industry: Web Content Management
Web content management has been an interesting market to follow over the past few years. At any point in time there seems to be a lot happening, but if you take the long view, not much has changed. The content management system market remains immature despite years of mergers and acquisitions and the entry of large vendors that tend to consolidate markets. There are no SAPs of web content management, and although IBM, Oracle, and Microsoft have their products, they are lagging rather than dominating the WCM market.

Buyers of WCM technology quickly become disoriented by the number of options - not just the number of products, but also the different ways to acquire a platform. The old question of "build versus buy" seems laughably simplistic when you consider that, in addition to buying a commercial product, a company can share the technology by using open source or rent it from a Software as a Service (SaaS) vendor. But whatever they choose, they will be doing a considerable amount of building because the term "out-of-the-box" is generous with most WCM features.

Furthermore, because of the diversity of requirements and the complexity of implementation, there are no unequivocal successes with any product. No customer can say that they made all the right compromises, achieved their return on investment, and have their users rushing back to their desks to use their CMS. The qualified successes that customers do report are more likely to be attributed to the execution of the implementation than to the merits of the platform used. Many buyers in the market now are replacing technology that failed to meet their expectations, are skeptical of commercial products, and are looking for alternatives.

Market Characteristics and Trends


There are a few characteristics that make the WCM market vulnerable to open source business models: the fact that WCM solutions are really frameworks, the fragmentation of the commercial marketplace, and the rejection of ECM as a strategy for managing web sites.

Platform vs. Business Application


Web content management is not a turn-key solution. Customers have different functional requirements and different content that they want to manage. Most importantly, however, they want their web sites to look and behave differently than other sites on the web. Often some of the uniqueness of an implementation is not necessary and reflects the quirkiness of the customer. The diversity of requirements and business processes also reflects a lack of accepted best practices. Still, the very nature of content management and how intertwined it is with other business processes and organizational structures makes it difficult to impose a onesize-fits-all solution. In fact, many large companies operate multiple specialized WCM systems across their organization. Because company requirements are unique, web content management products are constructed more like toolkits than out-of-the-box business applications. The industry's accepted implementation cost for a web content management system starts at two to three times the license cost of the product. The middle tier web content management products have an average deal size that start in $150,000 plus range. That puts the cost of a WCM project at around $600,000 - and the numbers just go up from there. Of course, most WCM deployments Copyright 2007 Content Here, Inc. All Rights Reserved. Not for Redistribution. Version 1.0, Workgroup License Page 5

Open Source WCM Marketplace

are accompanied by other expensive and difficult-to-predict activities, such as a site re-design or a corporate re-organization, that complicate and delay the implementation work. While companies are used to spending money on the technology, they need to be careful not to divert resources away from the things that matter most: good content and good processes.

When faced with these numbers, technology buyers often feel like they are not getting value out of the platform and may be tempted to try building something on their own. Companies that have gone this route have found mixed results. To someone new to content management, a web content management system looks a lot like any other data management application, and most developers have built plenty of those. But content management is different, and developers usually discover these differences after it is too late to efficiently incorporate them into the design. [For more information on why content management is different from typical data management applications, see sidebar Homebrew CMS]

Homebrew CMS

Before you build the one billion and first CMS, here are some things that typically burn generalist architects in the process:

Versioning. Frequently, the single requirement that kills a custom CMS is versioning, especially if it is added after the initial design. Versioning is hard. It is hard because it makes your data model more complicated. It is hard because it is a concept that most generalist architects haven't implemented before. There are interesting nuances, like how often to create a new version (with every save, or every time it is published?) and whether links should point to a specific version of an asset or just the latest version.

Localization. Localization isn't just about Unicode; it is a whole other dimension of your content repository. While adding versioning doubles the complexity of a data model, versioning combined with localization makes chaos if you are not careful. Does each translation have multiple versions? Or does each version have multiple translations? What language do you fall back to if you don't have a translation of an asset in the requested language? What is the relationship between the URLs of the translated sites? How do your presentation templates handle it when text runs right to left or up and down? Do all of the attributes of an asset need to be translated or can some things (like images) be shared?

Deployment and dependency management. Content, especially web content, is interrelated. Pages reference images and have links to other pages. If you are going to deploy a piece of content to the presentation tier, what will you do if the related assets are not ready for publishing and/or not deployed? Would you even know?

Usability. While the content management market cannot claim to have mastered usability, it has probably spent more time refining user interfaces than you can afford to. Usability is probably the most common reason why companies abandon their home grown CMS.

Access control. Most software systems are designed to manage access control by function, not by data. Most (although definitely not all) content management systems have figured out a manageable system for controlling permissions around data.

Source: Homebrew CMS [http://contenthere.blogspot.com]

Many potential content technology buyers are just looking to augment a custom web application with content management services so that the text and imagery of the application do not have to be deployed as code. These buyers are usually quickly frustrated by the cost and limitations of commercial products. They don't intend to use many of the features that the WCM system offers and, as a development platform for building custom functionality, the system is sub-optimal. In addition, custom code built on a proprietary platform is not portable so there may be a considerable risk and expense of lock-in. The web content management framework section of this report describes these uses of a WCM platform.
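The sidebar's deployment question (would you even know if related assets were not ready?) can be made concrete with a small sketch. The Java below is illustrative only and is not taken from any product covered in this report: the Asset class, its published flag, and the findBlockers method are hypothetical names. It assumes that each asset lists the IDs of the assets it references, and it walks that graph to report any referenced asset that is missing or unpublished.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical asset model: an ID, a published flag, and the IDs it references.
class Asset {
    final String id;
    final boolean published;
    final List<String> references;

    Asset(String id, boolean published, List<String> references) {
        this.id = id;
        this.published = published;
        this.references = references;
    }
}

class DeploymentCheck {
    // Returns the IDs of directly or indirectly referenced assets that are
    // missing from the repository or not yet published.
    static Set<String> findBlockers(String rootId, Map<String, Asset> repository) {
        if (!repository.containsKey(rootId)) {
            throw new IllegalArgumentException("Unknown asset: " + rootId);
        }
        Set<String> blockers = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (!visited.add(id)) {
                continue; // already checked; also protects against circular links
            }
            Asset asset = repository.get(id);
            if (asset == null) {
                blockers.add(id); // referenced but not in the repository
                continue;
            }
            if (!id.equals(rootId) && !asset.published) {
                blockers.add(id); // exists but is not ready for the delivery tier
            }
            for (String reference : asset.references) {
                pending.push(reference);
            }
        }
        return blockers;
    }
}
```

A deployment routine could refuse to publish, or publish with a warning, whenever the returned set is not empty.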

Market Fragmentation
A few years ago there was a count of 1,800 software applications that called themselves content management systems; anecdotal information indicates that the number is growing, not shrinking. New products continue to emerge (I was at a conference in November 2007 and met two people who were starting to build new WCM products) and old products don't seem to die. It is surprising how long a small CMS vendor can survive off the maintenance revenues from a tiny install base. As one longtime content management veteran said, "the WCM market is in dire need of a Darwinian event."

Market fragmentation is rife in the open source world, too (especially in the content management sector), and comes at a great cost: developer resources are spread too thinly across too many projects. But the absence of a "winner" in the commercial market takes away a safe, automatic choice and forces technology decision makers to look at alternatives. Every option appears equally risky from a market share perspective.

The market has appeared to be on the verge of a massive consolidation for years. That it hasn't happened yet means either that it will never happen or that it is due. Interestingly, one of the few products that did disappear was at one time considered one of the safest bets: Microsoft CMS 2002.

Return of the Pure Play


Further confusing the marketplace is what may be called "the demise of the top tier." This happened when the high-end WCM vendors started to compete with the big document management vendors over the ECM market by trying to expand their products to manage all kinds of content. This focus on non-web content management caused these vendors to neglect the web at precisely the wrong time: when Web 2.0 was starting to take off. While pure play WCM vendors were starting to incorporate features like RSS support and refining their user interfaces with AJAX, the top tier WCM vendors were actually making their products harder to use by cramming too much diverse functionality into the same platform. In just four short years the titans of web content management went from being able to define the category to needing to defend their relevance. Both middle tier commercial and open source products that have kept their WCM focus are seeing a competitive advantage.

The industry with the greatest unrest is media and publishing, which used to be served by products like Vignette, (what is now called) FatWire Content Server, and Interwoven TeamSite. Despite being born at C|Net and being an industry leader in media and publishing, Vignette now lumps media companies along with telcos in its client list. Most of Vignette's largest customers are moving off of the product. Not surprisingly, media companies are most actively adopting the WCM framework products described later.

Core Enterprise Requirements


With the return of the pure play comes the need for "Enterprise Grade" web content management and there are certain attributes that an Enterprise Grade WCM system should have. These are the features that enterprise architects and system administrators look for in all of the applications that they evaluate.

LDAP Integration
In a large company with thousands of employees, administering user accounts across many disparate systems is a real challenge. While a marketing brochure web platform may only have a few users, an IT organization would prefer to be able to terminate access to it, along with every other system, in one place: the centralized LDAP directory. Most large enterprises will want to integrate their content management systems into their corporate LDAP user profile repository for authentication and authorization. All of the content management systems reviewed here support that feature.

Authentication is the easy part; the complexity is in authorization (that is, determining what privileges a user has). Privileges are usually determined by the user's roles, which are applied either directly to the user or to a group that the user is a member of. The question becomes where to manage the groups and roles. Because the corporate LDAP is central to the whole company, it may not be easy for a content management initiative to insert the necessary information into this shared resource. For example, say you need an "author" group. The rest of the company might not care enough about this group to add it to the LDAP structure. There may also be collisions with other existing groups. A web author may be very different from a technical documentation author. Performance may also be at risk if the CMS needs to consult an external system every time it calculates whether a user can see an asset. All these issues can be worked out, but it takes collaboration and cooperation from across different departments and business units and that is neither quick nor easy to achieve.

Another issue arises when there are external users that need access to the CMS but do not meet the criteria for an entry in the central LDAP repository. A common design is to have a local profile repository with a fall-back to the LDAP directory. In fact, nearly all systems keep a local store of user profiles. This is necessary because when a profile is removed from the LDAP directory, the CMS still needs to remember that user for ownership information and its audit and version histories.

The design of the LDAP integration may determine where the roles and groups are managed. Some LDAP integration is done by regularly importing records from the central LDAP directory. If this is the case, any role or group assignment configured in the CMS will be overwritten the next time the user repository is refreshed. If the system uses a pluggable authentication architecture, it can consult external directories if the user is not in the local user repository. Once a user is authenticated against the external directory, a local account is created. However, the system should still verify authentication credentials with every login to ensure that the user has not been centrally de-activated.
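The pluggable design described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the implementation of any product reviewed here: ExternalDirectory is a hypothetical stand-in for the corporate LDAP directory (a real system would perform an LDAP bind), and LoginService, addLocalUser, and hasProfile are invented names. The sketch checks local-only accounts first, asks the directory on every login for everyone else, and keeps the local profile even after the directory stops recognizing the user.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the corporate LDAP directory. A real system would
// attempt an LDAP bind here instead of calling an in-memory function.
interface ExternalDirectory {
    boolean verify(String username, String password);
}

class LoginService {
    private final ExternalDirectory directory;
    // Accounts that exist only in the CMS (for example, outside contributors).
    // Plain-text secrets keep the sketch short; store salted hashes in practice.
    private final Map<String, String> localSecrets = new HashMap<>();
    // Profiles created after a successful directory login. They are never
    // deleted, so ownership, audit, and version histories keep resolving.
    private final Set<String> directoryProfiles = new HashSet<>();

    LoginService(ExternalDirectory directory) {
        this.directory = directory;
    }

    void addLocalUser(String username, String secret) {
        localSecrets.put(username, secret);
    }

    boolean hasProfile(String username) {
        return localSecrets.containsKey(username) || directoryProfiles.contains(username);
    }

    boolean login(String username, String password) {
        String localSecret = localSecrets.get(username);
        if (localSecret != null) {
            return localSecret.equals(password); // local-only account
        }
        // Not a local account: ask the directory on every login, even when a
        // profile already exists, so a centrally de-activated user is refused.
        if (!directory.verify(username, password)) {
            return false;
        }
        directoryProfiles.add(username); // first successful login creates the profile
        return true;
    }
}
```

Role and group assignments are deliberately left out of the sketch; where those live is the harder design question discussed above.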

Resiliency
The people who run data centers do not like midnight calls telling them the CMS is down; they have too many other things to worry about. They look for technologies that can be configured on multiple servers so that when one crashes the whole system does not become unavailable. Mission critical systems usually get deployed across geographically dispersed data centers so that a catastrophic event will not bring down the system. Typically, the single point of failure will be the database and that can be solved through database replication technology.

Good systems administrators will also care about resistance to data corruption and the ability to back up and restore the system. Again, the underlying components that support persistence usually are responsible for this. In a content management system, complications arise when content is stored across multiple systems that need to be in sync. Most commonly this will be the file system for binary files and the database for structured text content and metadata. Search indexes are a third place where data are stored but they are not so much of a problem because you can usually re-index the entire repository. Look for technologies that have documented back-up scripts and procedures. Ideally these will not require turning off the system to execute the back-ups. If live back-ups cannot be done, back-ups will be done less frequently, increasing the potential for data loss.

Another feature is de-coupling the authoring and delivery environments. This prevents intensive authoring activity from degrading the performance of the external web site and high traffic loads on the web site from making the authoring environment unresponsive.

Scalability
Even if the initial use of the system is not intensive, large companies tend to avoid applications with limited scalability. If the application turns out to be successful and has the promise of an enterprise-wide deployment, the technology should not stand in the way of realizing a business opportunity. Of course, there is a case for deploying un-scalable prototypes to test the idea and then rebuilding the application if it has business potential. Google has experienced success with this model. Open source certainly fits into this strategy by taking licensing out of the experimentation and start-up costs. This report, however, makes the assumption that the buyer has a lower R&D budget than Google and is looking for a sustainable solution.

Versioning and History


Regardless of whether prior versions are ever consulted or rolled back to, versioning is assumed to be a necessary requirement for any enterprise grade content management solution. While all products in this report have versioning, there is some variability in implementation that may be worth considering. For example, some CMSs save a version only when a user requests it. Other systems save versions with every publish; some do it on every save. Versions with every save can be overkill, especially if the system automatically does a periodic save to prevent data loss if connectivity is severed. Versioning on request may rely too much on the user. Some users think to do it. Some do not. However, the version on request model can usually be automated by a workflow step that saves a new version when the asset is published or passes through some other workflow state.

Sometimes auditing is what people are really looking for when they ask for versioning. Auditing records what was done to the asset over time and by whom. While auditing is good for accountability and understanding what happened, auditing cannot roll back a change. Another good use for auditing is as a measurement of how volatile and (sometimes) important a piece of content is. You can assume that a piece of content that has been updated a lot over time is valued within the organization. However, you cannot assume that a piece of content that does not have an active edit history is not valuable; it may contain information that does not change or it just may have been done right the first time.
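The difference between the three versioning policies, and between versioning and auditing, may be easier to see in code. The following Java is a rough sketch under stated assumptions: VersionPolicy, VersionedAsset, and all of their methods are hypothetical names invented for illustration, and none of the products in this report is claimed to work this way. Every action is written to the audit log, but only the stored snapshots can be used to roll back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policies matching the three approaches described above.
enum VersionPolicy { ON_SAVE, ON_PUBLISH, ON_REQUEST }

class VersionedAsset {
    private final VersionPolicy policy;
    private final List<String> versions = new ArrayList<>(); // restorable snapshots
    private final List<String> auditLog = new ArrayList<>(); // who did what; holds no content
    private String body = "";

    VersionedAsset(VersionPolicy policy) {
        this.policy = policy;
    }

    void save(String user, String newBody) {
        body = newBody;
        auditLog.add(user + " saved");
        if (policy == VersionPolicy.ON_SAVE) {
            versions.add(body);
        }
    }

    void publish(String user) {
        auditLog.add(user + " published");
        if (policy == VersionPolicy.ON_PUBLISH) {
            versions.add(body);
        }
    }

    // An explicit request creates a snapshot under every policy.
    void requestVersion(String user) {
        auditLog.add(user + " requested a version");
        versions.add(body);
    }

    // Only snapshots can be restored; the audit log records the action itself.
    void rollbackTo(String user, int versionIndex) {
        body = versions.get(versionIndex);
        auditLog.add(user + " rolled back to version " + versionIndex);
    }

    int versionCount() { return versions.size(); }
    int auditCount() { return auditLog.size(); }
    String body() { return body; }
}
```

Under ON_PUBLISH, three saves followed by one publish leave four audit entries but a single restorable version, which is the trade-off the text describes.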

Compatibility and Interoperability


From an IT perspective, the most important aspect of a CMS is often how well it integrates with other systems in the architecture. Content that is managed in the CMS usually needs to be visible in other platforms or publishing channels. Information that is managed by other platforms frequently needs to be displayed by the content management system alongside other managed content. In general, application level integration (through an API) is preferable to data integration by replicating data across repositories. There are several reasons for this. First, data integration is brittle. Content technologies do not promise to keep their database schemas stable the way they commit to their APIs. Second, a CMS may do a lot of processing when it adds or updates an asset, for example: firing an event to update links, clearing cache, and updating search indexes. If you follow the rules and integrate at the API level, the storage mechanism makes less of a difference. The only reason to care would be if you had database management competencies specialized on a particular product (like Oracle).

In fact, compatibility with existing technical skills is often more important than system interoperability. In particular, the projects covered in this report rely heavily on specific open source web application frameworks: Cocoon, Struts, MyFaces, etc. Knowledge of these frameworks is very helpful. Having skill in the Java technology stack is critical if you expect to have any responsibility for managing the platform.

Some Java WCM platforms are certified or known to run on specific servlet containers or application servers. If you happen to run Tomcat or JBoss, you are in good shape. The products in this report either ship with Tomcat or (in the case of the Cocoon based projects) have good documentation on how to deploy the application to a Tomcat container. Theoretically, if the application works on Tomcat, it should work on any certified container. However, judging from the mailing lists, users of more elaborate J2EE server platforms (such as WebSphere, WebLogic, or Sun) tend to have configuration questions that the general community is less prepared to answer.
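The earlier point about preferring API-level integration over writing directly to the repository can be illustrated with a toy example. The Java below is only a sketch: ContentService, its storage map (standing in for database tables), and its render cache are hypothetical and do not reflect the API of any product in this report. It shows how a write that bypasses the API skips the cache clearing that the API performs, leaving stale output in place.

```java
import java.util.HashMap;
import java.util.Map;

class ContentService {
    // Stands in for the database tables that a data-level integration would write to.
    final Map<String, String> storage = new HashMap<>();
    // Rendered output cached by asset ID.
    private final Map<String, String> renderCache = new HashMap<>();

    // API-level update: stores the content and performs the follow-up processing.
    void save(String id, String body) {
        storage.put(id, body);
        renderCache.remove(id); // a real CMS might also update links and search indexes
    }

    // Renders from the cache when possible, otherwise from storage.
    String render(String id) {
        return renderCache.computeIfAbsent(id, key -> "<p>" + storage.get(key) + "</p>");
    }
}
```

Calling save and then render returns fresh output. Writing to storage directly and then calling render returns whatever was cached before the write, because nothing told the service that the content changed.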

Usability
Regardless of how extensive the platform is, unless the users perceive the user interface as being usable and intuitive, the solution will not be regarded as a success. Users will look for ways to avoid using (or misuse) the system and by doing so undermine the business value of the initiative. James Robertson from Step Two Designs astutely observed that enterprise applications need to be simpler and easier to use because users rarely get adequately trained during company-wide deployments (see article More Users = Simpler CMS [http://www.steptwo.com.au/papers/cmb_moreequalssimpler/index.html]). This goes against the long standing trend in enterprise software that values number of features over usability. A revolution of sorts is going on where business users are starting to reject the assertion that enterprise software needs to be more complicated than the consumer tools that they like to use. Project managers are forced to behave more like commercial product managers than like people serving a captive audience that has no choices.

Despite its importance, usability is hard to measure because it is so subjective. This report attempts to address some obvious strengths and weaknesses and common observations about each product's usability. Only your users, however, will be able to tell you if they consider the solution usable.

The basic fact is that content management systems strive to solve a set of hard problems. There is an inherent conflict between the competing needs of the authors of content and the audience. An author just wants to spend time making the asset useful and pleasing to him and tends to focus on the content and layout of the asset. He tends to resent spending the time to add metadata and structure the content in ways that will make the asset easier for others to find and more re-usable. Systems that strictly enforce tasks that are not regarded as important or break the flow of producing content tend to be criticized for usability. Systems that are lax in this area become unmanageable and content becomes unfindable. Striking the right compromise between convenience and compliance is an art.

Using open source technology may provide some advantages in the practice of that art but it will not solve the problem directly. Some open source adopters have found that the user interface is easier to scale down and simplify than the commercial products that they have tried. Others point out that they used the money saved on license costs to refine the UI to the needs and tastes of their users. However, if the choice of an open source platform is made by developers because of an affinity toward open source, or for other technical reasons, there tends to be less interest in user satisfaction than in the technical aspects of the system.

The general assessment on the usability of open source software is mixed. On one hand, many of these applications are written by technical people for technical people and there is a tendency to neglect the sensibilities of a technophobe. However, these characteristics are not appreciably worse than commercial content management software that is also commonly criticized for usability issues. Open source WCM developers tend to be very interested in and excited by Web 2.0 and rich internet clients. Many have taken up the challenge to use technologies such as AJAX to address usability issues.

Market Summary
While companies are losing enthusiasm for managing their web content in "ECM" products that try to centralize all forms of content onto a single platform, the search for "enterprise grade" tools to manage web content has gotten much more intense. In the absence of inexpensive, safe commercial web content management technologies, many of the perceived advantages of commercial software seem to lose their influence over buying decisions.

Companies are starting to add open source software to their selection matrices but are having a hard time evaluating open source alongside commercial products. Selection processes that depend on RFP responses and vendor demos choke on options with small or no sales and marketing bandwidth. Traditional analyst firms are equally confused trying to incorporate open source in their analysis. Unless a selection process is adapted to fully explore open source, the commercial products typically win because of the allure of a polished and well executed demo. Investing in an open source proof of concept typically levels the playing field, but few companies make the investment unless there is a particular motivation such as a senior-level directive to carefully consider open source. This has essentially happened in many of the governments across Europe that have been mandated to use open source software wherever possible.

Anecdotal evidence suggests that there is no correlation between customer success and the licensing model of the application. Open source implementations often fail for the same reasons as commercial software implementations: poor requirements gathering, ineffective scope management and change control, dependencies on other systems, and not enough user training. As discussed later in this report, open source describes a very wide range of products and business models. In many cases two open source products are no more like each other than like a commercial competitor.

Both commercial software and open source software can be oversold. Just as commercial software buyers are misled by claims of features being out-of-the-box, companies that adopt open source because of unreasonable cost expectations tend to fail because they underinvest in other aspects of the project. Companies with reasonable cost expectations and a good understanding of the strengths and limitations of the product they are using tend to fare better. It is not unlike the trend to use off-shore development resources. Companies that blindly pursue a pot of gold in low cost labor tend to abandon the idea after disastrous first experiences. Companies that take the time to understand the model, learn about best practices, and invest in the solution tend to have better outcomes.

Open Source Market Segmentation


Most people who are new to open source see "open source" as a category of software that is distinct from commercial software and has a different set of potential risks and pitfalls. They do not want to use open source software because it is clunky, or it is unsupported, or will cost more to integrate. Or they do want to use open source software because it is more reliable, or free of cost, or has better standards support. It is important to remember that open source is just a licensing model and there are few generalizations that one can make.

The open source market is complex. At a high level, open source breaks down into three distinct sub-categories: Community Open Source, Institutional Open Source, and Commercial Open Source. These three categories are no more like each other than they are like commercially licensed software. Community, Institutional, and Commercial projects are managed differently and provide different customer experiences; it is important to understand what makes them unique.

Community Open Source


Community open source roughly aligns with the popular, if naive, image of how open source works. A loosely organized, organically grown community of developers and users works collaboratively toward a greater good by pursuing their personal interests. The main distinguishing characteristics of community based open source projects are that participation is open, voluntary, and personally motivated and there is some kind of meritocracy that rewards the best contributors with leadership and control. Anyone can join a community and people usually do so in a personal capacity. Consequently, most community open source contributors are independent consultants, small consulting companies, and in-house developers that work with the community in order to be more effective in their jobs (either with or without their employers' support).

Many of the open source projects that you may be familiar with are community based, especially in the non-Java world. Plone, TYPO3, Drupal, and Joomla! are all community based projects. In the Java world there are the InfoGlue, MMBase, and Lenya projects. OpenCms is somewhat of an edge case that will be discussed later.

Who's in Charge?

Table 2.1. How Community Projects are Governed


Apache Lenya: Project led by a group of independent consultants according to the Apache Software governance model.

OpenCms: Project is run by Alkacon Software that employs all the committers. Alkacon owns the copyright of the software and runs the OpenCms.org web site. OpenCms straddles the fence between a community project and a commercial open source project.

How to Evaluate Community Open Source


In addition to the obvious (compatibility with your requirements), the strength and tone of the community are important factors to consider in a community open source project. The community will determine the future of the project: whether it will grow and how it will evolve. If you do the implementation yourself, the community will be an important resource for you. If you are working through a systems integrator, the community will help them work efficiently and keep their interest in the platform.

There has been much study on the dynamics of communities at both the micro and macro level. Despite all the theory, in practice it is hard to consciously try to build a community and even harder to predict whether a community will grow or decline. Successful communities typically have strong leadership, a mechanism for building social bonds, and a motivating factor (which is nearly always the ability to make money). Turnover is actually a good thing because it infuses the community with new ideas and energy.

Leadership
Leadership may be the most critical and rare ingredient in an open source community. Usually there is one member that is able to set a vision and make decisions. Community members need to trust the leader's motivations and judgement. They need to feel like they are heard and understand the rationale when their ideas are rejected. There may be multiple leaders but each needs to have his own area of control. Good leaders provide leadership opportunities for star contributors. For example, there may be a rotating role of "release manager" who presides over all the activities related to the release of a version of the software.

Having one such critical member of the community is always a risk. What happens when the leader moves on? The best communities have a strong culture of meritocracy and a pipeline of new leaders. It has been said that the single best test of strength for a community is its ability to endure a change in leadership. Most of the big projects have not been tested in this way.

The larger projects usually form a legal entity (called a "foundation" or an "association") to institutionalize leadership and make the will of an individual subservient to the organization. This introduces formal governance practices like a board of directors and transparent decision making processes. In addition to ensuring continuity, the establishment of a legal entity creates something that other organizations can interface and partner with. If this happens, the community project may transform into an institutional project (described next) where large corporations contribute to the project similar to a joint venture.

While forming a legal entity distributes the privilege and responsibility of making decisions across a group of people, the need for strong individual leadership does not go away. The leader still needs to encourage and facilitate the activity of the board of directors and step in when dynamics get dysfunctional. There are some decisions that are hard to make by committee. For example, good user interface design is not a democratic process. There needs to be a visionary that keeps things clean and consistent. Maintaining this vision sometimes requires dictatorial behavior that only a trusted leader can pull off. Good examples of self-styled "benevolent dictators" are Linus Torvalds, whose efforts kept the Linux Kernel clean and stable, Dries Buytaert, who kept the Drupal core thin and extensible with modules, and Alexander Limi, who keeps the Plone user interface simple and compliant with accessibility standards. Projects that try to vote on every UI decision usually trend toward bloat and entropy.

Social Interaction
Programmers are not generally known for their social skills but social interaction is an important aspect of community based open source. Social interaction forms a foundation of trust that facilitates communication. Digital communication (like email, IRC, and message boards) is the mainstay of geographically distributed development teams but the most successful projects have face to face events that put a human face behind the email address or IRC handle. Some projects have local user groups. Others have global conferences, activities, and programming sprints. It is not unusual to see people that no longer use the software listen in on the IRC channel or attend events for purely social reasons.

Social interaction is also an important part of forming a culture and social bonds that keep participants engaged and compel them to help each other out. These social dynamics also keep members in line. A member of one of the major PHP based WCM platforms noted that, since they encouraged users to put their real portraits on their forum profiles, members have become more friendly and helpful. When people participate on a personal level, they tend to be more accountable.

Social activity also creates the opportunity for non-technical users of the application to get involved. Building and serving a non-technical community is a plateau that only a few of the open source content management projects have achieved. It is an important milestone because it allows for user input to be contributed directly in the users' own words rather than as interpreted through a technical developer who filters the information through his own biases.

Economic Opportunity
The social aspects of a community are important but people can't afford to invest time and energy in a project if there is no potential to make money. Developers, by nature, are attracted to hot technologies to build marketable skills. Projects using dated technologies have a hard time attracting new people. Freelance consultants seek out projects that are widely used and need systems integration work. Fortunately, in the WCM space, all platforms (commercial, open source, and SaaS) need extensive systems integration work.

Some open source platforms create opportunities for companies and individuals to sell products and services that enhance the core. For example, the Joomla! community had a very lively marketplace for themes and modules. Most of the money on the Joomla! platform was made by people selling themes (site branding modules). When the project leadership revoked the GPL exception that made add-on modules exempt from GPL licensing, there was an uproar. As with most things, commitments must be kept.

What to Expect in a Community Open Source Project


If you are considering a community open source platform, you should expect to engage with the community or team up with a systems integrator that is already plugged in. The systems integrators for community projects tend to be smaller because there are no partner programs that attract the big consultancies. The companies will be small, so look for options. If you cannot find multiple consultancies with good reputations that you could see yourself working with, you should be concerned.

Institutional Open Source


Institutional open source is a mechanism for different companies to jointly build a platform that each party can benefit from. As opposed to closed "walled garden" ventures, institutional open source projects allow the outside world to use and contribute to the software. The difference between an institutional project and community based projects is that most of the work is done by developers who are contributed by established software companies. For example, IBM has dedicated teams of developers to work on the Apache HTTPD project and the Eclipse IDE (Integrated Development Environment), among others.

Most institutional projects are infrastructure or frameworks that can be leveraged by proprietary software that adds additional value. Back to the case of IBM, Eclipse serves as the basis for their WebSphere IDE and a number of other tools that support their flagship products. Open sourcing these components reduces IBM's cost to support their infrastructure. Even more importantly, open sourcing diminishes the market value of the component to concentrate value in the proprietary products that IBM actively sells. Sun Microsystems, on the other hand, chose to build their own IDE. While Eclipse is becoming the de facto Java IDE and is also commonly used to program in other languages, Sun's product, NetBeans, is becoming largely irrelevant.

Institutional projects can start out as corporate initiatives, like Eclipse, or they can come from community projects that form organizations and start to build commercial interest, like the Linux kernel. There are few examples of institutional projects in the content management world. Probably the best example is not a full WCM but a content repository: Apache JackRabbit. JackRabbit is a reference implementation of the JSR (Java Specification Request) 170 specification for a Java Content Repository (JCR). Most of the developers are employed by commercial software companies (like Day Software) that wanted the specification to succeed (which it did). Next, JackRabbit will become the reference implementation for JSR 283, which enhances the JCR specification.

There are a couple of community open source WCM platforms that could be on the cusp of attracting institutional-style involvement. For example, the Plone community has a well formed foundation that owns the trademark and all rights to the code. Google hired one of the two Plone founders and he spends most of his time working on the Plone platform to make it a suitable intranet solution for large scale enterprises like Google. If other companies get involved with Plone in this way and work collaboratively with Google through the Plone Foundation, it could become an institutional project.

The community Java WCM platforms covered in this report do not have the size or organizational structure to become institutional projects but the Java frameworks used in these products certainly do. One of the major commercial open source products, Magnolia, is built on top of JackRabbit.


Commercial Open Source


Commercial open source has been a very active sector of the WCM marketplace over the past few years. These are commercial-style software companies that distribute their software using an open source licensing model. The advantage of using open source licensing is market penetration. In a crowded and fragmented industry like web content management, companies need to be innovative to get their products noticed. Open source licensing gets buyers' attention with the prospect of saving money and also reduces the cost of sales because the buyer downloads the software and does much of the education/qualification work that a pre-sales engineer would do. By the time the customer approaches the software company to make a purchase, the sale is pretty much made. In the software industry, where sales and marketing represent an extremely large share of operating costs, this savings is substantial. Having regional sales offices all over the world whose sole purpose is to show the product to people is expensive. Buyers in geographically remote regions are basically forced into using open source software because they do not have access to commercial software sales organizations.

Apart from the sales and distribution aspects, commercial open source software companies tend to behave like traditional software companies. They have an in-house engineering team that builds the product. Decisions are made by traditional corporate reporting structures with roles like director of engineering, product management, CTO, and CEO. Depending on the company, customers can expect interaction and levels of service that are typical for the industry. The fact that the source code is available is not so special because one can always decompile the Java classes to see what is going on (however tedious and illegal that may be) and many software vendors offer code escrow programs to mitigate the risk of the vendor going out of business or discontinuing the product.

If a commercial open source WCM platform costs the same and feels the same as typical commercial software, is it just commercial software? Perhaps, but not necessarily so. Most of these products have fresh new architectures that leverage widely used open source components. This not only lowers the cost of developing the software, it also makes the technology potentially more familiar to developers. For example, a Cocoon developer would feel immediately at home in the Daisy CMS code base. A Struts or Jetspeed developer would feel the same way when working with Jahia.

What to Look for in a Commercial Open Source Product


In commercial open source, the success of the software is inextricably linked to the success of the software company behind it. Yes, theoretically if the software company fails, the software can live on (just like when a commercial software product is bought by another vendor) but no customer wants to go through the uncertainty and isolation of being dependent on a vendor that goes under. Just like with any vendor, a customer needs to think about the viability and sustainability of the business when evaluating the product. The first thing to consider is how the vendor makes money - not just that they are generating revenue but also whether what they expect from you is reasonable for the value they are providing.

There has been much talk in the blogosphere about open source business models as companies and venture firms grapple with the question of how to make money providing free software. A number of strategies are being actively pursued by software companies. It is important to understand these strategies and know which of them your vendor is practicing because it will form the basis of your relationship with the vendor.

Commercial open source tends to make open source software more accessible to the average buyer who is used to entering into a relationship with a vendor and expects corporate accountability. Just like with traditional commercial software, commercial open source has support and maintenance packages and account management staff to sell them. There are essentially three ways to make money off open source software: selling a commercially licensed "enterprise" version in addition to the free "community" version; selling support and training; and selling integration services. Commercial open source companies try to stay out of the professional services business because it creates a competitive relationship with other systems integrators that could be out implementing the software and creating new support customers and market share. While it may carry an open source license, a product that is developed and implemented by a single systems integration company has the same prospects as a commercial "consultingware" product: the install base will be small and there will be no energy in the community.

Show Me the Money

Table 2.2. Commercial Open Source Revenue Sources


Alfresco: Tiered Product, Support, and Training
Daisy: Support and Training
Hippo CMS: Support and Training
Jahia: Tiered Product, Support, and Training
Magnolia: Tiered Product, Support, and Training

Support and Training


Selling support is perhaps the most popular revenue stream for open source companies. This is how the pioneers of the open source industry like Red Hat and MySQL built their successful businesses. Many software consumers buy support by reflex (as one CIO puts it, "the C in CIO stands for 'chicken'") and there is certainly value to be provided.

The variability in the support model is in what the vendor does when a customer opts out of support. Some software vendors will distribute an "unsupported" version that the company will not touch and a "supported" version that allows a customer to buy an annual support contract. This forces the customer to choose whether to lock into a support relationship before implementing the software. When the unsupported version is not patched, the model becomes a "Tiered Product" model (see next section) where the delineation is along lines of quality rather than features. Unless the supported version carries a non-open source license, the software vendor should not be able to prevent the customer from using the software if they opt out of the support contract.

Most open source software vendors offer a single license and allow their customers to opt in and out of support contracts at will. This forces the software vendor to prove value to their customers by providing good support. It would also enable other companies to provide competing support packages.

The companies covered in this report that use a pure support model are all very small and their support organizations are staffed by senior consultants, not the 24/7 call centers staffed by "support specialists" that one would expect from a large software vendor. On the positive side, your support call (or email) will probably be answered by someone who knows what he is talking about. On the negative side, the initial response time may not be as fast and you may feel a little guilty about taking someone away from their dinner.

Tiered Product
In the tiered product approach, the free "community" version is a functionally trimmed down or otherwise inferior version of the commercial "enterprise" version of the software. The community version typically lacks enterprise-oriented features like LDAP integration, replication and clustering, and conduits to commonly used applications, or is not certified or supportable. The logic is that companies that use these features are getting more value out of the product and should pay for that value.

Companies that use the Tiered Product approach tend to have an awkward relationship with the community version of the software. They want a community to form around the community version and succeed but not at the risk of losing potential enterprise version customers. The language used to describe the two products is interesting as they try to promote the enterprise version without denigrating the community version (at least not too much). Tiered Product approaches work best when a community is able to provide and receive value in the community version and the vendor is able to leverage these non-monetary contributions. If the software vendor treats these non-paying customers like "free riders," the community version becomes little more than lip service and a community is unlikely to form and contribute.

The extended features may also be offered in the form of a set of extensions where the community and the enterprise versions share a common core. Depending on how the licensing works, this may create the opportunity for third-party software vendors to sell competing or complementary extensions.

Another variation of the tiered product model is one in which the free version carries a requirement to display a "powered by" badge on every page. Open source experts (including the Open Source Initiative's Open Source Definition) maintain that these badgeware products are not legitimate open source because they restrict the user from modifying the code to remove the badge.

What to Expect from a Commercial Open Source Product


Your experience with commercial open source will probably be similar to your experience with traditional software companies. You will rely primarily on company employees for support, system documentation, and other resources. Some of these companies are venture-funded startups with ambitious goals and aggressive drive. Others are more like small, comfortable independent software vendors that are focused on stability and more moderate growth.

Commercial open source vendors generally try to leverage their open source orientation and engage with their customers through a community. They will support mailing lists (or other forums) and practice at least some of their engineering process out in the open, be it a publicly available bug tracking system, a view of their source code system, or doing design on a publicly available wiki. Customers respond to this transparency to different degrees. Companies that do engage have the prospect of having more influence over the product. It is worth noting that commercial proprietary software companies are starting to do similar things on their customer extranets and support sites.

Community involvement, in general, is usually minimal. First, paying customers tend to get their support directly from the vendor rather than over the mailing list. Second, the community generally looks to the vendor rather than itself to power the project. One summer intern on the Daisy project vented on the mailing list after his request to review and respond to his project presentation was largely ignored by the community. Most users of the technologies see themselves more as customers than contributors. The one exception is the Magnolia project, where an active community has formed around the Community Edition.

OpenCms is not a commercial project but it is very close to being one. Alkacon does nearly all of the development, with the exception of some database adaptors, yet they have a very active community that feels ownership and responsibility for the project as well as gratitude to Alkacon for their contribution.


Chapter 3. Product Evaluations


Most readers of this report will probably skip the first two chapters and get right to the evaluations. If you are one of those readers, welcome to the report. Evaluations are divided into two categories: systems that are best suited for managing informational brochure type sites, and systems that are better for managing content in highly interactive web applications. For each of these categories, there is a description of features that are important and that can be used to measure the effectiveness of the product in that category.

Informational Brochure
Despite Web 2.0 trends that urge companies to have their web site be more than a static brochure, there is still a need for simple tools to allow non-technical users to manage a basic corporate web site. Still, it is short-sighted to implement a WCM system without thinking of features that will enable a bi-directional conversation with the visitor. At the simplest level, features like web forms and RSS are desirable. Most importantly, however, the system needs to be easy to use so that employees from across the organization can efficiently publish fresh content. For now, it may be enough to have a few people in the marketing department manage the web site. In the future, however, the corporate site will need to connect customers more directly with the employees that understand the products and the vision. Customers will be less satisfied with a formal press release and instead look for CEO or employee blogs. They will want to explore content through faceted navigation that makes sense to them. They will want to throw away paper product manuals and be able to read them online.

The products described in this section have the basic features to efficiently run simple informational web sites such as a basic marketing web site, a corporate intranet, and a customer extranet. Many of these systems also have the capability to do more interactive functionality, but their strength is in managing informational resources.

What Makes a Good Informational Brochure Platform?


The Informational Brochure category is defined by a set of attributes that a CMS should excel in. These system characteristics are grouped into three areas: content contribution, management and administration, and presentation. There is also a discussion about the support and community resources that are available for the product.

Content Contribution
The Content Contribution section describes the work environment used to manage content. The key areas to look at are how content is organized and may be found, the content editing interfaces, localization features, and workflow.

Navigating the Repository


Adding content to the system must be simple enough for a non-technical user with minimal training to immediately understand the user interface and the process. Systems with an "insite" or "browse-to-edit" editing model, which allows a user to edit the web site as he browses, have an inherent advantage over systems that have different interfaces for management and presentation. However, these products may become awkward to use for sites that intensively re-use content because the content repository may not map one-to-one with what is shown on the site. For example, it might not be clear that editing a content component in one area of the site will affect other pages. In a high content reuse scenario, the author composes content in a presentation-neutral way and lets the presentation tier worry about content placement. The problem is that most content authors think of everything in terms of Microsoft Word, which has no concept of re-use - only copy and paste. The web CMS user interface has the challenge of resolving the conflict between word processor expectations and the value of managing reusable content components. Some interfaces maintain this balance more successfully than others.

There is also a tension between organizing content within a site explicitly by placing content assets within a hierarchical site map or implicitly by tagging assets and using query-based navigation. The former appeals to content managers that want full control over the visitor's experience. The latter is preferable when large volumes of content are in play or when a personalized experience is desired. It is possible to achieve a mixture of the two concepts where certain structural components of the site (such as landing pages) are explicitly organized and dynamic lists of content based on taxonomy or other query-based rules are also presented to the user.

In larger web sites, not all the content that a visitor sees originates from within the CMS. Product information and documentation may be authored in other systems and then imported or synchronized into the WCM repository. Although this is often done manually, the stronger platforms will have interfaces to import and exchange content.
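
The implicit, query-based style of organization described above can be illustrated with a minimal sketch. The class and method names (TaggedAsset, listingFor) are hypothetical and are not taken from any product in this report; a real platform would run the query against its repository rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical content asset that carries tags instead of a fixed place in a site map. */
final class TaggedAsset {
    final String path;
    final Set<String> tags;

    TaggedAsset(String path, String... tags) {
        this.path = path;
        this.tags = new HashSet<>(Arrays.asList(tags));
    }

    /** Implicit organization: the listing is computed from every asset that carries the tag. */
    static List<String> listingFor(String tag, List<TaggedAsset> repository) {
        List<String> paths = new ArrayList<>();
        for (TaggedAsset asset : repository) {
            if (asset.tags.contains(tag)) {
                paths.add(asset.path);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<TaggedAsset> repository = Arrays.asList(
                new TaggedAsset("/news/launch", "news", "products"),
                new TaggedAsset("/docs/manual", "products"),
                new TaggedAsset("/news/award", "news"));
        // A "Products" landing page could render this list without anyone placing the items by hand.
        System.out.println(listingFor("products", repository));
    }
}
```

In this sketch, a newly tagged asset appears in every matching listing automatically, which is the trade-off against the explicit site map: less manual placement, but less direct control over where each item shows up.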

Content Entry and WYSIWYG


Because of the Microsoft Word orientation of most business users, one should never underestimate the importance of a WYSIWYG editor. In fact, for many business users, the WYSIWYG editor is the CMS. Most commercial and open source WCM products OEM third-party rich text (or WYSIWYG) editors. The most popular ones are the open source FCKeditor and TinyMCE editors, which are written in JavaScript and are highly configurable through properties files. Many commercial CMS use these same editors. If you do not like the editor that comes with the CMS, others can usually be plugged in. The big differentiation is how the editor is integrated with the CMS - particularly how a user is able to embed image references and links to other pages. A well-integrated editor will allow a user to browse the content repository for images and pages. If the user is not able to find the right image in the repository, he should be able to upload one without leaving the page he is editing. The better integrations will also keep track of what the user has linked to and manage dependencies by warning a user when moving or un-publishing an asset will sever a link created in a rich text area.
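
The dependency tracking described above amounts to keeping a reverse index from each asset to the pages that link to it. The following is a minimal sketch under that assumption; LinkRegistry and its methods are hypothetical names for illustration and do not reflect how any particular product implements link management.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical link registry; not taken from any product in this report. */
final class LinkRegistry {
    // Target asset path -> pages whose rich text links to that asset.
    private final Map<String, Set<String>> inbound = new HashMap<>();

    /** Record that sourcePage links to targetAsset, e.g. when a rich text field is saved. */
    void recordLink(String sourcePage, String targetAsset) {
        inbound.computeIfAbsent(targetAsset, key -> new HashSet<>()).add(sourcePage);
    }

    /** Pages that would contain a severed link if targetAsset were moved or un-published. */
    Set<String> pagesAffectedBy(String targetAsset) {
        Set<String> sources = inbound.get(targetAsset);
        return sources == null ? new TreeSet<String>() : new TreeSet<String>(sources);
    }

    public static void main(String[] args) {
        LinkRegistry registry = new LinkRegistry();
        registry.recordLink("/about/team", "/images/logo.png");
        registry.recordLink("/products/index", "/images/logo.png");

        Set<String> affected = registry.pagesAffectedBy("/images/logo.png");
        if (!affected.isEmpty()) {
            System.out.println("Warning: un-publishing would sever links on " + affected);
        }
    }
}
```

A CMS that maintains such an index can show the warning before the user confirms the move or un-publish action, rather than discovering broken links after the fact.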

Localization
While localization used to be a low priority requirement, many companies now see localization as central to their business strategy. Even for companies in the United States that do not have international aspirations, the growth of non-English speaking populations represents an important business opportunity. Fortunately, all of the products in this report originated in Europe where multilingualism is the norm. Each of these products has support for extended character sets and strategies for maintaining a site in multiple languages.

The most primitive of these strategies is to use the multiple site capability of the system to treat the different translations as independent sites. This approach may be desirable for a marketing brochure site if the marketing organization of a company is broken down into independent regional business units. In this case, content reuse across the sites is manual and there is no centralized control over the content, how it is organized, or the branding of the sites.

If localized sites are to be centrally coordinated and content is to be shared, more sophisticated localization functionality is needed. In this case, the system should be aware of the fact that two assets are really different translations of the same content. Maintaining these relationships will enable features like automatically triggering a translation workflow when the primary language version of the asset is updated. Support for this advanced localization functionality varies between the products and each product has its own strategies and best practices. The key is to find a product that aligns with the way your business is organized.
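
One way to picture the translation relationship is for each translation to remember which version of the primary-language text it was based on. The sketch below assumes that simple version-counter model; LocalizedAsset and its methods are hypothetical and each product covered here has its own approach.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of one content asset and its translations; names are illustrative only. */
final class LocalizedAsset {
    private int primaryVersion = 1;
    // Locale -> version of the primary text that the translation was based on.
    private final Map<String, Integer> translatedFrom = new LinkedHashMap<>();

    /** Register a translation that is current with the primary version at this moment. */
    void addTranslation(String locale) {
        translatedFrom.put(locale, primaryVersion);
    }

    /** Editing the primary-language version leaves every existing translation behind. */
    void updatePrimary() {
        primaryVersion++;
    }

    /** Locales whose translation predates the current primary version. */
    List<String> localesNeedingTranslation() {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : translatedFrom.entrySet()) {
            if (entry.getValue() < primaryVersion) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        LocalizedAsset pressRelease = new LocalizedAsset();
        pressRelease.addTranslation("fr");
        pressRelease.addTranslation("de");
        pressRelease.updatePrimary();
        // A translation workflow could be started for each locale in this list.
        System.out.println(pressRelease.localesNeedingTranslation());
    }
}
```

Treating translations as independent sites, by contrast, gives the system nothing to compare, which is why that strategy leaves synchronization to manual effort.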

Workflow
Realistically speaking, most informational web sites need only the most simplistic workflow models such as a single approval or even no approval at all. While complex workflows may be desirable on paper, in practice they tend to get in the way and are frequently circumvented. What is perhaps more important in a Web 2.0 world is strong monitoring capabilities so managers can see what was published and respond (if necessary), rather than be a bottleneck to publishing any content asset.

Management and Administration


To be a true empowerment and efficiency tool, a WCM system should enable distributed authoring so that the people who need to publish do not have to go through an intermediary. Removing the webmaster as the bottleneck to publishing may have been the reason for investing in a CMS in the first place. However, unless there is an effective and manageable access control system, it is not possible to open the system up beyond a small circle of highly trusted and heavily trained users.

Presentation
All of the products in this category provide a framework to build what the visitor sees as the web site. In general, this means offering a presentation templating system to render content as formatted pages and a system (commonly called a controller) for mapping URLs to pages. But the presentation tier is usually used for more than simply rendering content. Most sites today have at least some form of interactivity such as a simple search field or a mail form. More advanced sites support personalization or interactive applications that allow the user to interact with the content or other third party data. Products are evaluated along the following criteria.

Layout and Branding


As the term "brochure" implies, creative control over the look of the web site is critical for this category. Web designers need to be able to implement any layout or branding elements in the product's presentation tier. The less additional work needed to turn a static HTML mock-up into a dynamic presentation template the better. Because companies like to update their branding, the ease with which a template can be updated by an HTML designer is also important. While nearly all WCM platforms provide this control, some are more successful at softening the learning curve and reducing the risk of breaking presentation logic.

Generally speaking, the more a designer can do with Cascading Style Sheets (CSS) the better. CSS achieves the core web content management principle of separating display from business logic by pulling display code out of the presentation template. Better CSS can lead to less template code development because the same HTML output can be styled in different ways to achieve different appearances. Allowing the designers to "own" the style sheets creates a clearer delineation of responsibilities and enables a smoother design life-cycle. CSS is the common denominator of all web presentation technologies. Both the CSS code and the know-how are portable from one delivery tier to another.

While all CMS support CSS, some do a better job than others. In general, CMS that try to auto-generate HTML code are a risk unless a developer can have full control over HTML element classes and IDs. The ease with which CSS files can be deployed to the delivery environment is also a differentiator. Some platforms treat CSS files as code; others treat them as deployable content assets.

Search Engine Support


Search Engine Optimization (SEO) is now widely regarded as a critical aspect of managing an externally facing web site. Some studies indicate that 80 percent of web traffic originates from a search against one of the major public search engines (see David Esrati's 80/40 Rule [http://www.thenextwave.biz/tnw/?p=333]), so anything that a web site can do to increase search engine visibility is valuable. While only good content can reliably gain the attention of a search engine, there are a few things that a content management system can do to make the job of a search engine easier. Probably the biggest factor is the ability to generate good, clean XHTML that is easy to parse. Images should have alt attributes describing any information that they contain and the site should be suitably cross-linked. Well-worded "title" and "meta description" tags will help a user understand what the page is about when seeing it in a list of search results. Other important elements of an SEO strategy include having every page navigable by a link (a sitemap that links to major hubs of the site helps too). Most of these factors depend on the design and implementation of the web site. The WCM can help, however, by providing support for metadata management (so, for example, an image in the system can have a description that is used whenever that image is used in a page) and not trying to generate too much messy, non-compliant HTML.

Clean URLs have become a hot topic in web site management. While it is debatable as to how much a search engine cares about the human readability of a URL, it is certainly true that human readable URLs are easier for humans to remember and type (into their blogs especially).

Any web optimization initiative needs good analytics support to help measure the impact of different modifications to the site. While some of the commercial products have built-in web analytics functionality, such functionality does not exist in any of the products reviewed here, but users are free to integrate their own third-party analytics packages. Friendly URLs can make the traffic reports easier to read. For a very well written article on CMS and SEO, read Non-Linear Creations' excellent white paper SEO and CMS: Implementing Best Practices [http://www.nonlinear.ca/seo-cms].
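
As an illustration of what a "clean" or "friendly" URL involves, a system might derive the last segment of a page's address from its title. The sketch below is one possible approach, not the method used by any product in this report; UrlSlug and fromTitle are hypothetical names, and the rules (strip accents, lower-case, hyphenate) are an assumption about what a site would want.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper that turns a page title into a human-readable URL segment. */
final class UrlSlug {
    static String fromTitle(String title) {
        // Decompose accented characters, then drop the combining marks ("é" becomes "e").
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        return ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")   // any run of other characters becomes one hyphen
                .replaceAll("^-+|-+$", "");      // no leading or trailing hyphens
    }

    public static void main(String[] args) {
        System.out.println("/news/" + fromTitle("Open Source Web Content Management in Java"));
    }
}
```

A URL built this way also reads more clearly in a traffic report than an address made of numeric identifiers and query parameters.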

Interactivity
In the world of Web 2.0, even something referred to as a "brochure" cannot get away with being entirely static. Simple interactive features like search, registration forms, and registration-protected content have been table stakes for years. Now, with Web 2.0, your visitors are probably expecting a lot more. Dynamic media such as video and audio content have been shown to have a positive impact on site effectiveness. The ability to effectively manage, reuse, and promote this content is important.

Forward-looking companies are starting to see their marketing web sites as platforms for delivering and deploying applications. To build interactive tools like configurators or ROI calculators, the presentation tier needs characteristics of a development environment. It needs to have decent tools for writing and managing code, the ability to incorporate code that you didn't write (e.g. JavaScript libraries), and allow developers to use familiar or easy to learn skills. There is a tension between providing flexibility to the developer and enforcing clean presentation code with minimal business logic. Technologies that use JSP leave the developers to police themselves to keep complex business logic out of the display templates. XSLT and scripting engines like Velocity rigidly enforce this separation.

Multi-channel publishing
Whether or not you look to the WCM platform to provide blogging functionality, syndication support (RSS, Atom, RDF) is important now and will only become more important in the future. Producing XML views of content is fairly trivial for most WCM technologies. The ability to read in RSS feeds from external sources and incorporate this syndicated content in the web site is less common. Open source technologies have typically been ahead of their commercial peers in recognizing and participating in syndication.

More than ever, companies need to think about the mobile platform. With the popularity of the iPhone and other smart phones, more and more users are getting their information from alternate devices. Robust presentation tiers allow content to be presented in multiple presentation templates that are optimized for the devices that access the content.
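
To show why producing an XML view of content is considered fairly trivial, here is a minimal sketch that renders a list of items as an RSS 2.0 channel using the StAX writer that ships with the Java platform. The RssView class and the {title, link} item layout are assumptions made for the example; a real WCM template would pull the items from its repository and usually add dates and descriptions per item.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical RSS 2.0 view of a list of content items. */
final class RssView {
    /** Renders items (each a {title, link} pair) as a minimal RSS 2.0 channel. */
    static String render(String siteTitle, String siteLink, String[][] items) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("rss");
            xml.writeAttribute("version", "2.0");
            xml.writeStartElement("channel");
            // RSS 2.0 requires title, link, and description on the channel.
            writeElement(xml, "title", siteTitle);
            writeElement(xml, "link", siteLink);
            writeElement(xml, "description", "Latest content from " + siteTitle);
            for (String[] item : items) {
                xml.writeStartElement("item");
                writeElement(xml, "title", item[0]);
                writeElement(xml, "link", item[1]);
                xml.writeEndElement();
            }
            xml.writeEndElement(); // channel
            xml.writeEndElement(); // rss
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not render feed", e);
        }
        return out.toString();
    }

    private static void writeElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text); // the writer escapes characters such as & and <
        xml.writeEndElement();
    }

    public static void main(String[] args) {
        String[][] items = {{"Product launch", "http://www.example.com/news/product-launch"}};
        System.out.println(render("Example News", "http://www.example.com/", items));
    }
}
```

The harder direction, reading external feeds into the site, involves fetching, parsing, caching, and sanitizing someone else's content, which is consistent with that capability being less common.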

Delivery and Support


In many companies, the ownership of externally facing web sites has jumped out of the hands of IT and into the domain of the marketing organization. Most would argue that this is a positive trend - especially when considering the design sensibilities of the average CIO (hint: the "C" does not stand for creative). While this shift has the benefit of moving focus away from the technology to the content (and how it is presented), most marketing organizations are inexperienced owners of enterprise software. They need to be sold to and supported in a different manner. This is one of the reasons why the SaaS model has gained so much traction: marketers want to own the web site, not the technology.

Granted, most open source comes from distinctly technology-oriented origins; it takes a technologically competent organization to even consider open source unless it is made accessible to non-technical buyers. With large sales and marketing budgets and support call centers, commercial software products have a head start serving non-technical customers. But open source has a couple of advantages of its own that may level the playing field. The open source model is based on service-oriented companies using the open source technology as a foundation to deliver (and support) complete solutions. By specializing in an open source platform, a systems integrator can invest more of the project budget in branding work and customizing the administrative user interfaces.

Historically, open source systems integrators have been primarily technically oriented. But as the technology matures and gets easier to use, "agency-style" consultancies will latch on to these projects. This trend has already happened with the non-Java WCM platforms such as eZ Publish, Plone, Drupal, and Typo3. It has started to happen on the Java stack in Europe and is about to spill over into North America with a few pioneer consultancies. Until there is a critical mass of agency-style systems integrators, commercial open source companies - with their own capabilities in professional services, training, and support - have an advantage. These vendors know the importance of having a delivery network and are actively recruiting integration partners. For community-based projects, look for the availability of developer talent that knows the underlying technology if not the platform itself.


Informational Brochure Platform Market Overview


Of the seven products evaluated in this report, Apache Lenya, Daisy CMS, Magnolia, and OpenCms are all commonly used to power informational brochure sites. Yet, while they share some functional overlap, they are all very different systems. Daisy has its roots as a wiki and is frequently used as a platform to write documentation. OpenCms and Apache Lenya are older systems and reflect a purer vision of content management: XML, workflow, access control, and localization. Magnolia is the newest platform in this report and lives by the motto "Simple is Beautiful."

The four projects are also different types of open source projects. Apache Lenya is a pure community-based project owned by the Apache Software Foundation. Contributors are not affiliated in any way other than that they are members of the Apache Software Foundation. OpenCms is officially a community-based project because any person could theoretically be a committer (see Glossary for committer); it just so happens that all of the committers are employed by Alkacon Software and Alkacon owns the copyright to the code base. Alkacon sells commercial-style support packages. Magnolia is a pure commercial open source project with an unsupported, scaled-down open source Community Edition and a more feature-rich, commercially licensed Enterprise Edition.

Each of the platforms has its own strengths and weaknesses (see table below) and users tend to have their own opinions as to which ones are the most intuitive and easy to use. Your best bet is to select a couple of products whose strengths match your needs and prototype with them by doing a basic install and trying to build a few pages or features on each.

Table 3.1. Informational Brochure Strengths and Weaknesses


Platform Apache Lenya Strong Features Versioning Localization XML support Multi-site hosting Daisy CMS Faceted navigation Wiki-like ease of use User input validation Access control Access control Versioning and visual differencing Magnolia Enterprise Ease of use De-coupled delivery tier Standards support (JCR and Portlets) OpenCms Link management User sand boxes Direct edit user interface Workflow Workplace UI is still very complex Content reuse Localization Required XSLT knowledge for site branding Weak Features Small community Product complexity Usability


Apache Lenya 2.0


Abstract
Apache Lenya was an early entrant in the open source Java WCM market and developed credibility and cachet by its distinction as a top level Apache project. "Brand name" installs like NZZ Online and Wired News further established Lenya as "enterprise class." However, Lenya has run into obstacles that it has had difficulty overcoming. Its highly technical developer community has struggled to address usability issues and to flatten the technical learning curve. Infighting within the community has undermined leadership and vision for the project, although that seems to be improving. Lenya's primary strengths are in its localization and administration features and its use of XML. Apache Cocoon developers will appreciate its use of the latest version of the Cocoon framework.

Still, Lenya is in an awkward position in the marketplace. It does not have the feature set required by high end implementations and it is not simple enough for low end implementations. Functionality-wise, Lenya lags behind its peers primarily because the project has taken so long to get out a major release. While there are some interesting concepts that were innovative for the time, quirks that would have been worked out in subsequent releases linger in the product. The Lenya community lacks the user-focused leadership and commitment to usability that distinguish some of the more successful open source WCM projects.


Product Overview
Table 3.2. Lenya Project Overview
Web site: http://lenya.apache.org
Project Inception: 2002
Current Version: 2.0 since January 2008.
Project Type: Community
Licensing Options: Apache 2.0
Geography: Global with concentration in Switzerland.
Common Uses: Brochure site. Used to be used on news sites.
Sample Customers: The University of Zurich [http://www.unicms.unizh.ch/docu/livepubs.html] has several publications running on Lenya. They are working off of a special branch off of the 1.2 code line. Harvard Medical School Countway Library of Medicine [https://www.countway.harvard.edu/lenya/countway/live/index.html] runs on Lenya version 1.2. Committer Andreas Hartmann's company BeCompany GmbH [http://www.becompany.ch] is running on the 2.0 trunk.
Frameworks and Components: Cocoon, ehcache, Jena, Lucene, Websphinx
WYSIWYG Editor: Kupu, BXE
Integration Standards: XML
Java Support: 1.4, 1.5
Databases: No relational database used. Filesystem/Lucene based repository.
Application Servers: Jetty (default), Tomcat (optional)

Project History
Apache Lenya was originally developed in 1999 as the brain child of then-Ph.D. student Michael Wechner to manage content for an academic journal. Later, when working at Swiss-based Neue Zürcher Zeitung (NZZ, one of the world's largest German language newspapers), Wechner proved the viability of the technology for NZZ's publishing and called the application XPS (Extensible Publishing System). In 2002, Wechner founded a systems integration firm called Wyona to implement Wyona CMS, as it came to be called. The vision was to build a "nearly out of the box CMS" on the Cocoon platform. Despite successful implementations, under Wyona's ownership, XPS failed to draw attention and participation from the outside world and develop a community. In 2003, Wyona donated the application as "Lenya" (after Wechner's sons Levi and Vanya) to the Apache Software Foundation, where it was incubated under the Cocoon project. In September 2004, Lenya was promoted to a top level project.

Around the same time, Wechner and Wyona were collaborating with open source WCM developers from other projects to form a new organization called OSCOM (Open Source Content Management). OSCOM was envisioned as a forum for sharing ideas about content management and hosting shared projects like the popular WYSIWYG editor Kupu that is also used by Plone and Infrae Silva. OSCOM held some promising events in the U.S. and in Europe but began to deflate when founding members ran out of time to devote and the organization failed to recruit new active members. Today, OSCOM is little more than a web site.

Although the project continues to develop and just published a major release, Lenya's momentum seems to have slowed. Despite being a top level Apache project, Lenya gets little attention. Other projects have been more successful than Lenya when it comes to addressing usability and technical complexity problems. More than one Java open source WCM project was founded out of disappointment with the Lenya platform. Apache Cocoon, on which Lenya is based, has fallen out of fashion as a general purpose web application development framework. The Java technology stack as a whole has been challenged by lighter weight, efficient technologies such as PHP and Ruby on Rails. The Java community has countered with its own frameworks that answered the call for simplicity and efficiency (such as Hibernate, Spring, and Wicket). Cocoon is now relegated to applications that are totally XML focused.

In January 2008, the project put out the first major release of the platform (2.0) since late 2004. 2.0 was originally named 1.4 and had correspondingly modest improvements over the 1.2 release. However, delays in completing 1.4 caused it to grow in both complexity and size. By the time it was near completion, the pending release was a major rewrite of the platform and the team voted to rename it 2.0. Part of the struggle may have been a rough transition of leadership from a Wyona dominated development team to a community effort. As mentioned earlier in this report, change in leadership is one of the biggest challenges in community projects and the Lenya project has had its share. Many of the Lenya developers who were very active in the project have moved on. Wechner himself is developing a new content management framework called Yanel and Yulup, and Wyona is no longer doing new Lenya work. U.S. systems integrators that specialize in Cocoon have shifted over to other Cocoon-based platforms like Daisy and Hippo. Mailing list traffic is still active but, as original leaders transitioned out and new members joined, there were long periods where the tone tended toward frustration and confrontation.

Most of this turmoil seems to be behind the project. The team has just rewarded several dedicated members with committer status and release 2.0 is finally live. 2.0 introduced many architectural changes and a few user interface improvements, but perhaps the biggest impact is just getting out of the state of limbo that comes with an almost complete new release. It will be interesting to see how fast the team is able to move without that anchor to drag around.

Architecture
The Lenya architecture is pretty much all Cocoon with a simple file system based repository. There were plans to develop a more service based repository on Jakarta Slide (see Glossary for Slide) and then, later, Apache JackRabbit, but those integrations ran into complications and were never implemented. There was also some hesitation about committing the effort of following the emerging JCR standard (see Glossary for JCR). There are occasional discussions on the mailing list about whether to reconsider JCR support. While Version 2.0 introduced some API improvements to the file system based repository, the Lenya repository is not at the level of projects like Hippo, Daisy, and Alfresco as a stand-alone service. There is no remote API to read and write content in the repository.

Since Lenya is basically just a user interface on top of Cocoon, to effectively implement and scale Lenya one needs to be very familiar with Cocoon. The core concept is the notion of "Pipelines" that describe a sequence of logic that gets executed for a request. Pipelines originate from Cocoon's core purpose: to choreograph the execution of a series of XSLTs to allow for the layering of display logic. Pipelines are defined in a file called the "Sitemap." Struts developers reading this may be thinking of the struts-config.xml file that performs a similar function. Sitemap pre-dates struts-config.xml and is considerably more elaborate and powerful. The idea behind Sitemap is "to allow non-programmers to create web sites and web applications built from logic components and XML documents" (from the Cocoon Users Guide [http://cocoon.apache.org/2.1/userdocs/concepts/sitemap.html]). However, it is clear that project founder Stefano Mazzocchi over-estimated the technical abilities of non-technical users (and even many developers).

Prior to 2.0, Lenya had a sitemap for each "use case" called a "use case sitemap." With 2.0, the process has been simplified by automating the wiring of use cases to business logic. In the new system, the developer just writes Java classes to support the business logic and a JX template (see Glossary for JX Template) to display the results.
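To make the pipeline idea more concrete, the following is a minimal Python sketch of the concept only. It is not Cocoon's Sitemap syntax or API. It assumes a sitemap can be modeled as an ordered list of URL patterns, each mapped to a sequence of stages that are applied in order to the content generated for a matching request. All names (`SITEMAP`, `handle`, and the stage functions) are hypothetical and chosen purely for illustration.

```python
import fnmatch
from typing import Callable, List, Optional, Tuple

# A stage takes the document produced so far and returns a new document.
Stage = Callable[[str], str]


def generate(path: str) -> str:
    """Hypothetical generator: produce the raw XML content for a request path."""
    return f"<page path='{path}'>Hello</page>"


def add_navigation(doc: str) -> str:
    """Hypothetical transformer: layer navigation markup onto the content."""
    return f"<nav/>{doc}"


def add_layout(doc: str) -> str:
    """Hypothetical transformer: wrap the document in the site layout."""
    return f"<html><body>{doc}</body></html>"


# The "sitemap": URL patterns checked in order, each with its stage sequence.
SITEMAP: List[Tuple[str, List[Stage]]] = [
    ("*.html", [add_navigation, add_layout]),
    ("*.xml", []),  # raw content, no display transformations
]


def handle(path: str) -> Optional[str]:
    """Run the first pipeline whose pattern matches the request path."""
    for pattern, stages in SITEMAP:
        if fnmatch.fnmatchcase(path, pattern):
            doc = generate(path)
            for stage in stages:
                doc = stage(doc)
            return doc
    return None  # no pipeline matched the request
```

In this sketch, changing how a class of URLs is rendered means editing the stage list for its pattern rather than the stages themselves, which is roughly the kind of layering of display logic that pipelines are meant to provide.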

Figure 3.1. Lenya Architecture Diagram: Use Case Framework

The new system for managing system behavior has been streamlined from the old system of use case sitemaps.

Cocoon supports interactive behavior with a framework called "Flow" for "Control Flow." Like Sitemap and Pipelines, Flow is elegant and sophisticated but has a steep learning curve. Flow supports an advanced concept called "Continuations," where logic execution can pause and wait for more user input. Think of a command line shell script that asks the user a question and waits for the answer. This is distinct from the way most web applications work (where each request/response is stateless and atomic) and has the potential to support more complex interaction between the client and server. Newer Java web application frameworks such as WebWork (now Struts2 [http://struts.apache.org/2.x/]) and RIFE [http://rifers.org/] support Continuations, but Cocoon was doing it "before it was cool." In Cocoon, Flow Scripts are written in JavaScript and interpreted through the Mozilla Rhino [http://www.mozilla.org/rhino] engine. The advantage of this is that scripts can be changed without compiling or restarting the application. The fact that most technologies get along without Continuations supports the opinion that they are not as important as they were originally envisioned. It will be interesting to see if new AJAX oriented programming models will leverage Continuations, or make them obsolete.

While Flow is powerful and elegant, Cocoon falls short of being an ideal platform for building highly transactional web applications because of its complexity. Cocoon has many moving parts that make it difficult to understand and debug and slow to perform. While XSLT processors are continually getting faster, the inherent overhead that comes from text parsing puts a limit on how fast logic can execute. The Cocoon project has largely addressed performance with clustering and caching strategies and Lenya takes advantage of those. In extreme cases, Lenya has been used to generate static HTML pages that are deployed to basic web servers (the "baking" model) and has also been clustered on multiple nodes reading from the same repository file system over NFS.

Lenya represents a single web site as a "publication" and can host multiple publications on an instance. Deployments like the University of Zurich have used this feature extensively to support the school's many departments. Publications store content, code, and user information (unless the LDAP integration is used). Each publication can have multiple languages. Publications are more or less atomic and cannot share content between them. From the basic install, Lenya comes with a starter site, called the "default publication," and the common practice is to use the default publication as a starting place to build a new site. A frequent newbie mistake is to not change the metadata, thus causing your new web site to have the HTML title tag "Default Publication."

By default, content is stored in a subdirectory of the publication directory called "content." With 2.0, that directory can be anywhere on the file system. Every page on the site gets its own subdirectory that contains an XML file for each translation of the document (distinguished by a naming convention on the file) and, potentially, sub-directories for sub-pages in the navigation structure. A best practice is to put the content directory somewhere other than within the publication directory. This ensures that code and content are stored in separate places and deployed separately. It also simplifies back-ups (assuming that you use a source code control system to manage your template code).
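The directory-per-page, file-per-translation layout described above lends itself to simple tooling. The following Python sketch walks a content directory and reports which translations exist for each page and which are missing. The file naming convention used here (`index_<language>.xml`) is an assumption made for illustration, as are the function names; check the convention used by your own Lenya version before relying on anything like this.

```python
import re
from pathlib import Path
from typing import Dict, List, Set

# Assumed naming convention, for illustration only: index_<language>.xml
TRANSLATION_FILE = re.compile(r"^index_([a-z]{2})\.xml$")


def translations_by_page(content_dir: Path) -> Dict[str, Set[str]]:
    """Map each page directory (relative path) to the languages found in it."""
    pages: Dict[str, Set[str]] = {}
    for xml_file in content_dir.rglob("*.xml"):
        match = TRANSLATION_FILE.match(xml_file.name)
        if match:
            page = xml_file.parent.relative_to(content_dir).as_posix()
            pages.setdefault(page, set()).add(match.group(1))
    return pages


def missing_translations(
    pages: Dict[str, Set[str]], languages: List[str]
) -> Dict[str, List[str]]:
    """List, per page, the configured languages that have no translation file."""
    return {
        page: [lang for lang in languages if lang not in found]
        for page, found in pages.items()
        if any(lang not in found for lang in languages)
    }
```

Because the content lives in plain files, a script like this can run against a back-up or a copy of the content directory without touching the running publication.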

Content Contribution
Although not generally known for its usability, Lenya employs some user interface concepts that were rather innovative for their time. By default, Lenya uses a "browse to edit" model. The presentation tier operates in two main modes: an "Authoring" mode, where editing controls are placed on the top of the page and the user is able to see pre-published content, and a "Live" mode that does not have the editing controls and where only published content is visible.


Figure 3.2. Lenya Screenshot: Edit Menu

The Lenya management interface uses a browse to edit model where the edit controls are placed on the top of the page on the staging view of the site. Notice the yellow text indicating a broken link.

The user browses through the Authoring view of the web site and, upon finding a document to edit, selects the edit menu to access the appropriate editing interface. Lenya ships with a number of editor options: probably too many for a project of its size to support. There are two WYSIWYG HTML editor interfaces: Kupu (also the default WYSIWYG editor of Plone) and Bitflux (also known as BXE). Kupu is by far the more stable of the two, but BXE has better integration with the Lenya platform and has more features. From the BXE editor, a user can insert images and links using pop-up dialogs that browse the repository. The BXE editor has a reasonable set of formatting buttons but is missing a spell check feature.


Figure 3.3. Lenya Screenshot: BXE Editor

The Bitflux editor is the most feature rich editor that ships with Lenya, but it is also the least stable.

The Kupu integration puts Lenya specific functions (like metadata and links) on the right side of the page outside of the main button bar. This avoids the need for pop-ups but it is a little clunky to use: in Lenya's implementation, you need to know the URL path of the target page or image. Like the BXE editor, the Kupu editor is also missing a spell checker. However, this should not be difficult to add by following instructions provided by other CMSs that use Kupu (notably Plone).


Figure 3.4. Lenya Screenshot: Kupu Editor

The Kupu WYSIWYG editor in Lenya has integration on the right side.

There are also two non-WYSIWYG editors. The awkward "HTML Forms" editor allows a user to edit a document as a list of HTML elements (like paragraph, table, headline). The One Form editor is just a simple HTML Text Area where a user can manually code HTML or XML. When the document is saved, Lenya checks that the user's submission is a well formed XHTML document. Alternative editors can also be installed. Some customers have configured Lenya to work with the popular XML based editor Xopus and there is some remnant Xopus integration code in the core codebase. Xopus would probably be the best editor for structured content types, although there is a tutorial for setting up BXE to edit structured XML content types.

Lenya has an extensible content model that is defined in "resource types." The primary resource type that comes with Lenya is a simple XHTML page that is stored as an XML file in the file system. While Lenya uses "object" rather than the standard "img" tag for image references, the file is a fairly recognizable HTML file - just without the layout markup, which is defined within XSL stylesheets that are processed by the presentation tier. There is also an ODT type that allows a user to upload and download the asset in Open Document format to be edited in Open Office. The asset is then rendered as HTML at request time.

New resource types are implemented as modules and the process for setting them up is somewhat complicated and lengthy when compared with other products described in this report. In addition to defining the content type in XML Schema format (XSD), a developer needs to declare it as a resource type by adding an XML file that extends the core Cocoon configuration. Then XML files need to be edited for menu items and instructions for the WYSIWYG editor. At the end of the process, the content is edited as an XML document, which is somewhat of a stretch for a WYSIWYG editor. There are instructions on how to configure the BXE editor to control structured content types but nothing for Kupu.

Figure 3.5. Lenya Screenshot: Editing Structured Content in BXE

The BXE WYSIWYG editor can be configured to edit structured content types.

Lenya also stores Dublin Core metadata inside the file using XML tags within the "dc" namespace. However, the editing interface for metadata is on another tab called "Site." Despite a convenient link to edit metadata from the Edit menu that jumps you over to the "Site" tab, this creates a disjointed user experience. Separating the editing interfaces for these two aspects of a document would make more sense if documents could be placed within multiple locations of the site and have different metadata values depending on their location. But this is not the case.

Figure 3.6. Lenya Screenshot: Site Tab

The Site tab provides an interface to organize the site, edit metadata, and edit permissions.


In addition to editing metadata for content, the Site tab performs a number of other functions to organize and manage the web site. For those who are familiar with MediaSurface's commercial product Morello, the distinction between the Site and Authoring tabs is akin to the "Content Contributor" and "Site Planner" interfaces. Within the Site tab a user can cut/copy and paste a page to another location in the navigational tree. Pages in the same folder can be ordered using "move up" and "move down" menu options. Pages can be removed from the live site by selecting the "hide" or delete menu options. Permissions are also controlled here. Theoretically, a manager would work in the Site tab to manage the site while authors and editors spend most of their time in the Authoring view to edit content. Content is managed in a hierarchical tree structure that parallels the path structure of the site. Documents can only be placed in one location of the node hierarchy and there is no mechanism for creating links or pointers in other locations. Localization support is better than average and follows a model of translated copies. Each asset has one or more language versions. The Site tab shows the user which translations exist and which translations are missing. This is a good system for sites that try to keep the different localized versions of the site in sync. Sites that allow the local versions to be more independent typically use different publications but, as mentioned earlier, there is no sharing of code or content between publications.

Figure 3.7. Lenya Screenshot: Lenya Localization

The Site tab shows the user which translations of an asset exist and which are missing. Here you can also see the unique identifier implemented in version 2.0.

Lenya has full versioning support. From the Site tab, a user can see the history of versions, view a particular version, and roll back to a previous version. By default, Lenya stores nine previous versions (plus the current version, making 10). Some of the other systems reviewed in this report have a visual differencing feature; Lenya does not. Deleting assets can also only be done through the Site tab.

Prior to 2.0, binary, file-based content, such as images and PDFs, were called "Assets" and were managed solely in the Site tab. Now, assets are treated like documents and can be managed from the Authoring tab. Wrapping binary assets in documents also provides improved metadata support. Images and other binaries can be added from the File menu as a "Media Document" or from within the image browsing dialog launched from the BXE or Kupu editor. Images can also be automatically resized when they are added to a page.

Figure 3.8. Lenya Screenshot: Image Dialog

Now with Lenya 2.0, users can add images as they edit pages from the WYSIWYG editor.

The Lenya workflow system uses a state-transition model and can be configured to most needs. The basic install comes with a simple one step approval workflow. New workflows are defined in XML documents called "workflow schemas" that describe the behavior of a simple state machine. Transitions are triggered by events and can have conditions. Workflows can also have variables that hold values through their execution. Workflows are associated with document types, so it is difficult to configure Lenya to apply a workflow to all content that lives in a particular branch of the site tree or meets other criteria. It may also be hard to implement advanced workflow concepts like parallel processes. Lenya's workflow system has a nice audit trail feature that records and displays all events that happened on a piece of content. Lenya 2.0 introduced personal inboxes so that workflow tasks may be assigned to groups or individuals rather than all users who have reviewer permissions.
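The state-transition model described above can be illustrated with a small Python sketch. It is a conceptual model only, not Lenya's workflow schema format or API: the class names, state names, event names, and the `is_live` variable are all hypothetical. It shows transitions triggered by events, guarded by a condition on the user's roles, updating workflow variables, and recording an audit trail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Transition:
    source: str
    event: str
    target: str
    # Condition receives the acting user's roles; the default always allows.
    condition: Callable[[Set[str]], bool] = lambda roles: True
    # Variable assignments applied when the transition fires.
    assign: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Workflow:
    state: str
    transitions: List[Transition]
    variables: Dict[str, bool] = field(default_factory=dict)
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def fire(self, event: str, roles: Set[str]) -> bool:
        """Apply the first transition matching the current state and event."""
        for t in self.transitions:
            if t.source == self.state and t.event == event and t.condition(roles):
                self.history.append((self.state, event, t.target))  # audit trail
                self.state = t.target
                self.variables.update(t.assign)
                return True
        return False  # event not allowed in this state for these roles


def one_step_approval() -> Workflow:
    """A hypothetical one-step approval workflow: authoring -> review -> live."""
    def can_review(roles: Set[str]) -> bool:
        return "review" in roles

    return Workflow(
        state="authoring",
        variables={"is_live": False},
        transitions=[
            Transition("authoring", "submit", "review"),
            Transition("review", "reject", "authoring", can_review),
            Transition("review", "publish", "live", can_review, {"is_live": True}),
            Transition("live", "deactivate", "authoring", can_review, {"is_live": False}),
        ],
    )
```

A model like this handles a linear approval chain easily. Parallel processes are harder because the workflow has exactly one current state, which is consistent with the limitation noted above.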


Figure 3.9. Lenya Screenshot: Workflow Syntax

The workflow syntax is simple and functional.

System Administration and Configuration


Lenya supports authentication through LDAP, but user profiles and group assignments are managed locally within the publication. Users and groups are assigned roles at various branches on the site tree and these roles are inherited down the branch unless overridden at a lower level. By default, Lenya ships with the following roles: "visit," "edit," "review," and "admin." A review or admin role is required to approve a version that has been submitted for publishing to the live site. Permissions can be controlled on the externally facing site by using the "AC Live" sub-tab. Here you can restrict who can view a document by user, group, or IP Range.
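The inheritance rule described above (roles flow down a branch unless overridden at a lower level) can be sketched in a few lines of Python. This is an illustration of the rule, not Lenya's access control implementation. The `policies` structure and function names are hypothetical, and the sketch assumes that an assignment at a lower level replaces, rather than adds to, the roles inherited from above.

```python
from typing import Dict, Optional, Set

# Hypothetical policy store: site-tree path -> {user or group -> roles}.
Policies = Dict[str, Dict[str, Set[str]]]


def parent(node: str) -> Optional[str]:
    """Return the parent path, or None when the node is the root."""
    if node == "/":
        return None
    head = node.rsplit("/", 1)[0]
    return head or "/"


def effective_roles(policies: Policies, path: str, principal: str) -> Set[str]:
    """Walk from the node up to the root; the nearest assignment wins."""
    node: Optional[str] = path.rstrip("/") or "/"
    while node is not None:
        assigned = policies.get(node, {})
        if principal in assigned:
            return assigned[principal]  # overrides anything inherited from above
        node = parent(node)
    return set()  # nothing assigned anywhere on the branch
```

For example, a user with "visit" at the root and "edit" on one section would resolve to "edit" for every page under that section and to "visit" everywhere else.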


Figure 3.10. Lenya Screenshot: Edit Permissions

Roles can be inherited down or directly assigned.

Lenya has a basic link management system that displays broken intra-site links in yellow in Authoring view. While it would be more helpful to put in warnings if deleting an asset would lead to broken links, the visual cue is useful when previewing content and doing pre-publish spot checking.
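The difference between highlighting broken links after the fact and warning before a delete comes down to which direction the link index is queried. The Python sketch below illustrates both checks over a simple in-memory map of pages to the intra-site links they contain. It is a hypothetical model with invented function names and is not based on how Lenya stores or resolves links.

```python
from typing import Dict, List, Set, Tuple


def broken_links(links: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose target page does not exist."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in links
    )


def pages_linking_to(links: Dict[str, Set[str]], page: str) -> List[str]:
    """Return pages that would be left with a broken link if `page` were deleted."""
    return sorted(
        source
        for source, targets in links.items()
        if page in targets and source != page
    )
```

The first function corresponds to the kind of check that can drive a visual cue or a broken links report. The second is the check a delete operation would need to run before removing a page in order to warn the user.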

Presentation
Presentation templates are written in XSL and stored on the file system within a publication. To some degree, XSL coding can be avoided with a technique called XHTML templating, where the overall page layout is defined in a well formed XHTML document that is merged with the content by the presentation engine. Also, because Lenya document content is well formed XML, CSS is a powerful and convenient method for styling a site.

While it may be more a reflection of the developer community than the capabilities of the platform, the sites listed on the Lenya live sites gallery would not be considered "high design" sites. The layouts are basic and simple and there are few examples that innovate or stand out from a design perspective. This could be because skinning a Lenya site requires knowledge of XSLT, which most web designers lack. Web designers tend to do better with JSP and other templating languages that are more similar to straight HTML.

More elaborate, dynamic functionality is achieved by writing business logic in Flowscript and display templates in JX Templates. JX Templates is an XML based templating language that has replaced XSP as the official templating language for the Cocoon community. JX Templates supports JSP-like tag libraries to call Java code. As mentioned earlier, Lenya's architecture makes it a less than ideal platform for building highly dynamic, transactional web applications. Developers would be far more productive programming interactive functionality in a platform other than Cocoon, whose strength is in managing layouts and multi-channel publishing. Lenya would be a poor choice for building community, Web 2.0-style applications. However, the basic install does have an example of a blogging publication.


Delivery and Support


During its peak (when it became a top level Apache project), Lenya was actively developed and implemented by Wyona and a couple of other small to medium sized consultancies. Since that time, Wyona has drifted away from Lenya to focus on other technologies. Wechner is still a committer and participates on the mailing list but his presence and control have diminished substantially. Wyona no longer does new Lenya implementations; it just supports its existing client installations. A U.S.-based Cocoon specialist has made Daisy its go-to platform for building simple web sites. Few other U.S.-based Lenya specialists exist; most are in Switzerland. NZZ Online, once a premier reference, has migrated to a commercial Java WCM platform. The committer team consists mainly of independent consultants. Lenya's lack of anchor consultancies and corporate sponsorship may have contributed to the difficulty of getting out a major release.

Another issue that Lenya has been struggling with has been changes in Cocoon. Lenya has been working on the head of the Cocoon 2.1 code base rather than a stable branch. This regularly introduces incompatibilities that must be addressed.

Documentation of Lenya is below average. Fortunately, Cocoon documentation has been improving with the re-launch of the cocoon.apache.org web site. Still, there are lots of gaps. The mailing list is fairly active. There used to be an IRC channel but now it is only used intermittently. There was a plan for everyone to move over to the Cocoon IRC channel but activity on that channel appears to have dried up as well.


Conclusion
Table 3.3. Lenya 2.0 Summary
Contributor Navigation: Intuitive browse to edit model but division between the Authoring tab and the Site tab is awkward.
Content Entry: The more feature rich and better integrated BXE editor is not very stable. A spell checker would be helpful. Input validation is weak.
Link Management: Lenya's link management functionality visually highlights broken links but there are no warnings if deleting a page will create a broken link and there is no broken links report.
Versioning: Full versioning support with a new version created with every save. By default Lenya saves up to 10 versions of each document. Users can roll back to earlier versions but there is no visual differencing functionality.
Content Organization and Reuse: Content is stored in a hierarchical tree structure. Building dynamic query based pages is more complicated than in other systems in this category.
Localization: Localized versions of an asset are managed in parallel. The Site tab shows which translations exist and which translations are missing.
Content Integration: While connecting to external data sources is possible within Cocoon, it is not as easy as on other platforms. The Lenya repository has no remote API to access content.
Workflow: A basic approval workflow comes out-of-the-box. More complex workflows are supported by Lenya's workflow engine.
Layout and Branding: Theming a Lenya site requires knowledge of XSL.
Interactivity: Cocoon is a complex technology platform for building interactive applications. Performance may be an issue.
SEO: XML orientation tends to output clean XHTML. User friendly, human readable URLs are supported.
Books: None
Online Documentation: Sparse. Particularly with the recent release of 2.0.
User Forums: Active mailing list.

Key: Nonexistent; Below Average; Average; Above Average; Exceptional.

While not a widely used or easy to learn platform, Lenya has decent support for some of the classical content management features such as versioning, localization, workflow, and link management. The areas where Lenya lags behind the market are structured content management and reuse. Both are possible on the platform but implementing these features requires complex configuration compared to other products. Now that the long awaited 2.0 release is official, the Lenya team may be able to focus on some of the finer usability issues that have turned prospective adopters away from the platform. Organizational and leadership issues that have historically plagued the project have also largely been addressed and the mailing list is friendlier and more upbeat.


Daisy 2.1

Abstract
Daisy's simple, wiki-inspired editorial interface - combined with powerful through-the-web administration and configuration functionality - makes it a good choice for rapidly building and maintaining simple informational web sites, intranets, and knowledge bases. Workflow, access control, and structured content types take Daisy into applications that are beyond the capabilities of a traditional wiki, and the de-coupled repository creates new opportunities for integration. Daisy provides a powerful faceted navigation system to make content easier to find and organize.

Daisy's lack of support for clustering and for separate management and production environments may be concerning to architects building high availability, high security web sites. Abnormally large web sites will strain Daisy's search subsystem. However, the de-coupled repository can be used for building enterprise grade publishing systems with a separate delivery tier.

Architects considering using the stand-alone Daisy Repository to deliver persistence services for a custom application should also consider Alfresco, Jackrabbit, and pure XML databases. Using the Daisy Repository in this way will constrain your design more than a generic repository would, but the higher level API and additional features (workflow, user management, and plugin framework) may save design and implementation time.


Project Overview
Table 3.4. Daisy Project Overview
Web site: http://cocoondev.org/daisy

Project Inception: 2003

Current Version: 2.1 since September 2007. 2.2 due out in February 2008.

Project Type: Commercial: support based.

Licensing Options: Apache 2.0

Geography: Outerthought is headquartered in Belgium. The install base is global with a concentration in Northern Europe. At least one North American based systems integrator has started to build a Daisy practice.

Common Uses: Brochure, intranet, knowledge base, documentation site

Sample Customers: Redback [http://www.redback.com] uses Daisy for a call center knowledge base. QAD [http://www.qad.com] uses Daisy to deliver a live customized product documentation service. Vlerick Leuven Gent Management School [http://www.vlerick.be/en/] uses Daisy as a back end publishing system.

Frameworks and Components: ActiveMQ, Apache Cocoon, Apache Lucene, JBPM, Java Advanced Imaging (JAI), MX4J, Spring

WYSIWYG Editor: htmlArea

Integration Standards: REST, JMS, XML

Java Support: 1.4, 1.5

Application Servers: Jetty (default), Tomcat (also commonly used)

Databases: MySQL

History
Daisy CMS was first released as an open source project (Apache 2.0 license) in October of 2004 after 18 months of development and customer implementations by Belgian-based Outerthought bvba [http://outerthought.org]. Outerthought still manages the development of the platform and has built a business around support and services. Given Outerthought's small size (only five full time employees), the maturity and install base of Daisy are greater than you would expect. This is largely due to Outerthought's relationship with Schaubroeck [www.schaubroeck.be], a large Belgian e-government services company, and some key U.S. based customers. The last pre-open source version of Daisy (V0.9) was built primarily with investment and cooperation from Schaubroeck. Today the copyright on the Daisy code is shared between Outerthought and Schaubroeck.

Daisy has been used primarily for intranets and internal knowledge bases but the platform is increasingly being considered for and used in externally facing informational web sites. In many ways, Daisy has become the user friendly, easy to implement, Cocoon based WCM platform that Apache Lenya [http://lenya.apache.org] has always wanted to be. Some attribute the different trajectories to the fact that Lenya is managed as an Apache project with an Apache governance model that may be more appropriate for infrastructure components and frameworks than for user facing business applications. A scan of the Apache project portfolio would certainly support this theory. Only a few out of the many Apache projects are targeted as business applications: Lenya, Jetspeed 1 and 2, and (recently) the Roller weblog platform. Daisy is internally developed by a dedicated, user oriented team. The Daisy core team may have its leadership spats and personality conflicts, but they don't happen out in the open like they do in the Apache Lenya community. Daisy also has been able to keep to a regular release schedule, which Lenya has been unable to do.

There are many Daisy-powered web sites online today in Belgium. Most of the sites have been built by Outerthought and Schaubroeck (Schaubroeck gallery [http://www.schaubroeck.be/internet/default.htm], Daisy gallery [http://cocoondev.org/wiki/286-cd.html]). Daisy is less widely used in North America but there are a few small non-profits using Daisy for their public facing web sites. Examples include The Samueli Institute [http://www.siib.org], Provider's Council [http://www.providers.org], and The Minnesota Newspaper Foundation [http://www.minnesotanewspaperfoundation.org/mnf/index.html]. There are a few North American software companies using Daisy for call center knowledge bases and to produce and maintain their product documentation. One of the leading U.S. based Cocoon specialists has switched from Lenya to Daisy as its go-to platform for basic web sites.

Architecture
Daisy consists of two main components: the stand-alone Daisy Repository server, which has an HTTP/XML interface, and a wiki-style front end (based on Apache Cocoon) called the "Daisy Wiki". Starting with release 2.1, the repository server runs in a custom Java container called the Daisy Runtime that is based on the Spring Framework [http://www.springframework.org/]. By default, Daisy Wiki runs on Jetty [http://jetty.mortbay.com/] although customers frequently run it on Apache Tomcat.


Figure 3.11. Daisy Architecture Diagram: Daisy Architecture

Daisy consists of a de-coupled business application (the Daisy Wiki) and a stand-alone repository server that is accessible through an HTTP based API and extended with plugins. The technology neutral API creates the opportunity to integrate with other technologies. Diagram courtesy of Daisy's documentation site.

Daisy Wiki talks to the Daisy Repository server over HTTP, through a publisher component that handles content persistence, search, and retrieval operations. The publisher is implemented as an extension within the Daisy Repository plugin architecture. At the simplest level, the publisher returns an XML document containing all the information necessary for the wiki (or any other client) to render a page. The publisher can also perform other operations like building collections of assets or preparing a difference view between two versions of an asset. A publisher request takes the form of an XML document, containing information about the request, that is sent to the HTTP interface. Some requests, such as a simple query, do not require sending an XML document. All the arguments can be passed through query string parameters. Every request needs to send authentication credentials via basic authentication, but there is no support for HTTPS so care should be taken in network setup. The API is powerful and there is documentation with simple examples but expect to spend some time mastering it. A good way to experiment is to use a tool like GNU Wget [http://www.gnu.org/software/wget/] to post documents to the HTTP server.

This architecture would allow other applications, such as a custom CMS written in any technology (not just Java), to use the Daisy Repository. However, it appears that most implementations pair Daisy Repository with the Daisy Wiki front end. This is a primary reason for Daisy's categorization in this report as an informational brochure oriented system. Architects looking for a pure content repository may consider a standard like the Java Content Repository (See Glossary for JCR). Unlike the JCR specification, which is intentionally abstract (consisting of a hierarchical set of "nodes"), the Daisy Repository has a higher level, more specific API based on documents, users, and collections.

Daisy's Repository, like most wikis, is non-hierarchical. On the one hand, the flat structure rules out options like inheriting access control structures or other metadata down a branch of the tree. On the other hand, the non-hierarchical model enables faceted navigation concepts where assets can appear under more than one category (See Enter Content Here blog post There Is No Folder [http://contenthere.blogspot.com/2006/05/there-isno-folder.html] for a deeper discussion of this trade-off). The repository stores metadata in a relational database (MySQL or PostgreSQL) and the actual content assets as files in the file system. There is also a Java API for local integrations with the Daisy Repository.

The Daisy Repository supports a plugin framework for extending the Repository with custom functionality. Commonly developed plugins include generic Extensions (which is how much of the non-core functionality, such as email notification, is implemented), authentication schemes, link extractors (that keep track of relationships between pages for a "what links where" view of the site), and HTTP handlers (to extend the API). To build a plugin, you implement the plugin interface, package it in a JAR, and deploy it to the Java runtime container of the Daisy Repository server.
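Because the repository's HTTP interface is technology neutral, a client does not need to be written in Java. The following Python sketch shows what a simple query request with basic authentication and query string arguments might look like. It only builds the request and does not send it. The host, port, resource path, parameter name, and query text are assumptions made for illustration and are not taken from Daisy's documentation; consult the API documentation for the real values.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values for illustration only; not taken from Daisy's documentation.
REPOSITORY_URL = "http://localhost:9263"  # assumed host and port
QUERY_PATH = "/repository/query"          # assumed resource path


def build_query_request(query, username, password, base_url=REPOSITORY_URL):
    """Build (but do not send) a GET request whose arguments travel in the query string."""
    params = urlencode({"q": query})  # "q" is an assumed parameter name
    request = Request(f"{base_url}{QUERY_PATH}?{params}", method="GET")
    # HTTP basic authentication: base64("user:password") in the Authorization header.
    # The value is encoded, not encrypted, so it is readable on a plain HTTP connection.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder query text and credentials.
request = build_query_request("select name where true", "testuser", "secret")
print(request.full_url)
print(request.get_header("Authorization"))
```

Since the credentials are sent with every request and are only base64 encoded, this kind of client belongs on a trusted network segment, which is the network setup concern raised above.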


Figure 3.12. Daisy Repository Server Architecture

The Daisy Repository server is based on a plugin framework. Diagram courtesy of the Daisy documentation site.

Daisy comes with a JMS server (ActiveMQ) built in to handle communications between the various components. For example, every time there is a change made to the repository, a message is posted to a message queue to tell client applications (such as Daisy Wiki) to invalidate their local caches. Architects generally like this messaging design because it is asynchronous and scalable, yet still ensures delivery of the message.

Whenever a published asset is updated, the Lucene based full text indexer is notified. Content that is not in a live state is not indexed. This could be a problem for managing pre-published assets. The search system does some post processing of the Lucene results such as sorting and filtering by permissions. The result limit is also applied outside of Lucene. If your Daisy Repository has 1 million documents and you search for everything but limit the results to 10, Daisy still parses through 1 million asset references coming out of Lucene. While only the metadata ("fields") are loaded into memory, the query system is the part of Daisy that tends to struggle when content repositories get overly large. Up to a certain point, this can be handled with configuration. However, when you get into the millions of documents, it gets difficult to allocate enough memory for the system to operate properly. Essentially, you have to give the Daisy Repository server enough memory to hold all of the metadata of the documents in the repository.

For simple bulk operations on documents within the Daisy Repository (like automatically updating a group of documents, much as a SQL update statement would), Daisy comes with a Document Task Manager. Simple, built-in actions like categorizing assets can be called from the Daisy Wiki. More elaborate operations can be scripted in JavaScript. The Document Task Manager performs its updates on one document at a time and can be interrupted and resumed. A success/failure log is viewable from within the Daisy Wiki.

Daisy's ability to support high traffic web sites has not been tested. Cocoon is a resource hungry framework because of all the XML transformation it does. For the most part, Cocoon has addressed performance through caching techniques and by minimizing the number of XSL transforms in the rendering pipeline. There is no documentation for configuring clustered environments with multiple Daisy Wiki servers talking to the same Daisy Repository. However, the architecture looks like it would support this model given the use of JMS to notify clients of content changes and also the stateless communication between the Repository server and its clients. Since Daisy is predominantly used in smaller intranets and corporate web sites, these configurations have not been actively tested.

While the Daisy Wiki is based on Cocoon, you can still get a lot done without much Cocoon experience. However, you should be comfortable using Cocoon if you want to do anything substantial. This is great for people who love working with Cocoon (and the framework certainly has its appeal for those willing to learn it) and want to rapidly develop web sites without writing a lot of custom code. For most Java developers, who are more comfortable with less complex frameworks, getting under the hood can be intimidating. The Daisy team is aware of this hurdle and is open to moving the UI to another technology stack. That is one of the advantages of having such a clean separation between the repository layer and the user interface layer. You could write a front end application on any technology stack.
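To make the earlier point about the result limit concrete, here is a small Python sketch of a search pipeline that filters by permission, sorts, and only then applies the limit. It is a simplified model of the behavior described above, not Daisy's implementation; the Hit class and the readable flag are hypothetical stand-ins for index hits and access control checks.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    name: str
    readable: bool  # hypothetical stand-in for an access control check


def post_process(hits, limit):
    """Filter, sort, and only then truncate; report how many hits were examined."""
    examined = 0
    permitted = []
    for hit in hits:  # every hit coming out of the index is visited
        examined += 1
        if hit.readable:
            permitted.append(hit)
    permitted.sort(key=lambda h: h.name)
    return permitted[:limit], examined


# 100,000 simulated hits, half of them readable by the current user.
hits = [Hit(i, f"doc-{i:07d}", readable=(i % 2 == 0)) for i in range(100_000)]
page, examined = post_process(hits, limit=10)
print(len(page), examined)  # 10 100000
```

The caller receives ten results, but the work and memory are proportional to the full hit list, which is why very large repositories put pressure on the query system.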

Content Contribution
Daisy has an extensible content model and you can define new content types through the administrative user interface. The model is based on "Parts" and "Fields." Parts are the actual content (such as an XML or other text file, or a binary file such as a PDF, image, or MS Word document). Fields are metadata attributes and are based on common Java data types: String, date, datetime, long, double, BigDecimal, boolean. The base content class is called a "Daisy Document" that can be sub-classed into custom content types. The most popular content type that comes out-of-the-box is an "XHTML" document that is essentially a generic web page.

A document type is defined with part types, field types, and links. Both part types and field types are re-usable across different document types. For example, you can set up a title field type and use it in an Article or Page document type. A Daisy Document can have more than one part but cannot have multiple instances of the same part. For example, you couldn't have a collection of image parts in a document. Instead, you would define an image1 part, an image2 part, and so on.

Unfortunately, while Daisy does ship with a WYSIWYG rich text editor (htmlArea), there is no XML editor that would allow you to edit a structured XML document part. Adding this capability would involve creating a Cocoon form and registering it as an extension. While the process is documented [http://cocoondev.org/235-cd.html], this is more Cocoon programming than most would want to tackle if they were just trying to build a simple web site. A more practical approach may be to edit the XML document in a client side XML editor and then upload the file.

Fields can be multi-value and hierarchical. The input field can either be a free text field or a select list where the possible values are either statically defined or populated by a query. Unless you give the user a controlled set of values to pick from, you will have very little control over what they enter. There is no input validation built into Daisy Wiki. You can say that a field is required and you can set a sizing hint (which controls the size of the input box), but you can't enforce a rule like a character limit or only numeric characters.
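The content model described above can be sketched as a few small data structures. The following Python sketch is a conceptual model only; the class and method names are hypothetical and do not correspond to Daisy's Java API. It shows field types shared between document types, the one-instance-per-part rule, and the fact that "required" is the only check available on a field.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldType:
    name: str
    value_type: str  # e.g. "string", "date", "long"
    required: bool = False


@dataclass(frozen=True)
class PartType:
    name: str


@dataclass
class DocumentType:
    name: str
    part_types: list = field(default_factory=list)
    field_types: list = field(default_factory=list)

    def add_part_type(self, part_type):
        # A document cannot hold two instances of the same part,
        # so repeated content needs distinct part types (image1, image2, ...).
        if any(p.name == part_type.name for p in self.part_types):
            raise ValueError(f"part type {part_type.name!r} is already defined")
        self.part_types.append(part_type)

    def missing_required(self, values):
        # "Required" is the only rule modeled here; there is no length or format check.
        return [f.name for f in self.field_types if f.required and not values.get(f.name)]


# One field type re-used by two document types.
title = FieldType("title", "string", required=True)
article = DocumentType("Article", field_types=[title])
page = DocumentType("Page", field_types=[title])

article.add_part_type(PartType("image1"))
article.add_part_type(PartType("image2"))
print(article.missing_required({"title": ""}))  # ['title']
```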

Figure 3.13. Daisy Screenshot: Defining Field Types

Defining fields through the administrative user interface.

In addition to fields and parts, Daisy Documents can also have links. Links are more structured than simple anchor links edited through the WYSIWYG editor and the Daisy Repository manages these dependencies.

All Daisy Documents have one or more variants that can be the same or different. Variants are managed similarly to branches in a source tree. A common usage for variants is for localization, but variants can also be used for content reuse. For example, on the Daisy web site, the product documentation for different releases of the application is managed as variants. Variants go along two axes: branches and languages. Branches are similar to what you would see in a source code control system although there is no merging feature. By default, there is one branch (main) and one language (default) per document. Searching for non-existing variants is a useful way to find content that has not been translated. The URLs have the branch in the path. A little bit of Apache mod_rewrite could turn www.cocoondev.org/fr/about into www.cocoondev.fr. When viewing a document, you can view other variants of the asset using the Variants menu. Doing so adds some query string arguments so you can see the alternative version of the file without losing your current site context (branch/language).
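The idea of searching for non-existing variants can be illustrated with a short Python sketch. It treats each document as a set of (branch, language) pairs and reports the combinations that are missing. The data structure and function name are hypothetical and only model the concept; in Daisy this kind of search is done through the repository's own query facilities.

```python
from itertools import product


def missing_variants(existing, branches, languages):
    """Return {doc_id: sorted list of (branch, language) pairs that do not exist yet}."""
    wanted = set(product(branches, languages))
    report = {}
    for doc_id, variants in existing.items():
        missing = wanted - set(variants)
        if missing:
            report[doc_id] = sorted(missing)
    return report


# Hypothetical repository content: "contact" has no French variant yet.
existing = {
    "about": {("main", "en"), ("main", "fr")},
    "contact": {("main", "en")},
}
report = missing_variants(existing, branches=["main"], languages=["en", "fr"])
print(report)  # {'contact': [('main', 'fr')]}
```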


While the groundwork for localization is there with variants, the Daisy team is hard at work adding features that help editors keep multiple translations in sync. For example, there will be a new feature to add non-translatable content elements that are shared between language variants. There will also be a mechanism to map versions between variants in the repository so that, for example, the system knows that version 3 in French maps to version 5 in English.

Documents are managed within "Collections." The most common use of a collection is to define a group of assets that are used in a particular site. Collections can also be used to organize content for other uses such as Daisy's book feature. Documents can be part of more than one collection. Similar to collections are "Baskets." Each user has a basket that can be populated either individually by selecting documents, like a shopping cart, or by query. Once a basket has been filled with the desired assets, the user can execute group operations such as aggregating the assets and displaying them as a PDF or an HTML sub-site. This is especially useful in Daisy's common use for documentation sites and knowledge bases.

The Daisy Wiki serves as both the management interface and the presentation tier. The label "wiki" is a holdover from earlier days and perhaps does the application a disservice because it does not conform to most people's vision of a wiki. Daisy uses XHTML and a WYSIWYG editor for rich text areas, not wiki syntax. Daisy's extensible content model enables structured content types to be created through the administrative section of the user interface. Forms are automatically generated based on the document type definition.

Figure 3.14. Daisy Screenshot: Content Actions Menu

The actions menu shows which actions are available on a piece of content.


Still, many companies use Daisy as a traditional wiki due to its many wiki-like features. Like a wiki, Daisy has an in-context editing model. You browse around the site and then use the actions menu (if you are logged in) to do things like edit the document and review versions. There are other wiki-esque features; for example, you can create a new document by linking to it from another document. Unlike a wiki, you don't have to worry about CamelCase.

The WYSIWYG editor is based on htmlArea, which is stable but is no longer being developed (current development is being done on a derived project called Xinha [http://xinha.webfactional.com/]). The WYSIWYG editor will appear for all parts of type "Daisy HTML." Being an XML-focused WCM system, Daisy ensures that the user authored HTML is clean and valid XHTML. There is an allowed subset of tags (html, body, br, pre, h1-h5, a, strong, em, sup, sub, tt, del, ul, ol, li, blockquote, img, and table and its sub tags). Text styling and image positioning are done with CSS classes. The out-of-the-box configuration includes lots of buttons for tables, links, images, bullets, etc. The editor can be configured by adding and removing buttons. The semi-HTML-literate contributor will at first be encouraged to see that you can turn the WYSIWYG editor off to hack in their own HTML but then be disappointed to learn that Daisy will strip out their dubious HTML code on the server side when the document is saved. In all, the formatting capabilities strike an appropriate balance of user empowerment, content-layout separation, security, and WWW standards compliance.

The rich text editor is nicely integrated with a link browser that allows users to search for their link target and preview what the page looks like. There is also the option to link to a specific version of the target document or a fragment within the target page. Links built in this way are stored in the metadata of the document and visible in the referrer view of the document.

While the editor doesn't have a spell checker, there are a number of other features that one normally does not see. For example, there is a query button that allows a user to define a SQL-like query and have the results embedded in the page. There is also the ability to include variables and other assets by reference. While these functions are powerful, they would be of most use to more technical users as the GUI is fairly low level.
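The server-side stripping of disallowed markup described above can be illustrated with Python's standard html.parser module. This is a minimal sketch of tag filtering against an allow list, not Daisy's actual clean-up code. The tag list is copied from the text; the table sub tags and the set of permitted attributes are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Tags named in the text; the table sub tags (tbody, tr, td, th) are assumed.
ALLOWED_TAGS = {
    "html", "body", "br", "pre", "h1", "h2", "h3", "h4", "h5", "a", "strong",
    "em", "sup", "sub", "tt", "del", "ul", "ol", "li", "blockquote", "img",
    "table", "tbody", "tr", "td", "th",
}
ALLOWED_ATTRS = {"href", "src", "alt", "class"}  # assumed for illustration
VOID_TAGS = {"br", "img"}


class AllowedTagFilter(HTMLParser):
    """Keep allowed tags and all text; silently drop every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        closing = " /" if tag in VOID_TAGS else ""
        self.parts.append(f"<{tag}{kept}{closing}>")

    def handle_endtag(self, tag):
        # Void tags were already written in self-closing form.
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept (and re-escaped) even when its enclosing tag is dropped.
        self.parts.append(escape(data, quote=False))


def clean(markup):
    parser = AllowedTagFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(clean('<h1>Title</h1><font color="red">Hi</font> <strong>there</strong><br>'))
# <h1>Title</h1>Hi <strong>there</strong><br />
```

Note that this sketch keeps the text inside dropped tags, which is a design choice for the example; a production filter would also need to decide what to do with the contents of elements such as script.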


Figure 3.15. Daisy Screenshot: Link Builder

The WYSIWYG editor is integrated with a nice link builder dialog that allows users to search for and preview target Daisy pages.

The community thinks of Daisy as both a wiki and a traditional web site building platform. They sometimes bristle at not being categorized with other WCM platforms. Other times they do things like submit the platform to WikiMatrix [http://www.wikimatrix.org/].

Out of the box, the Daisy Wiki operates in different modes. Live and Staging modes determine whether to show pre-published content. A user who has multiple roles (such as guest, author, and administrator) can select what role he wants to use when viewing the site. For example, selecting the administrative role exposes an administrative menu that surfaces functionality like defining content types and permissions and making an unpublished asset live.

Daisy does not store content in a hierarchical repository as is the case with most CMS. Like Drupal [http://drupal.org], Daisy stores all the content in one large collection and then uses metadata to create faceted navigation. There is also a navigational element that can be used to create a basic hierarchical tree of content: the "navigation" content asset (called a Navigation doc) contains static references to assets and queries.


Figure 3.16. Daisy Screenshot: Editing a Navigation Document

Editing a Navigation document.

When deleting a document, the user has a choice of "archiving," deleting the variant, or deleting all variants. On the bottom of the page there is a listing of all pages that link to the document. However, there is no pop-up warning when a link is about to be severed, and the broken link does return the Daisy equivalent of a 404 error. The fact that the delete operation is reserved for the "Administrator" role reduces the risk of breaking links.

Images and other binaries are managed like other documents in Daisy: the image file is the "part" and "fields" represent the metadata. Out of the box, Daisy supports some basic image manipulation features, such as resizing, and allows you to add captions and specify positioning within a WYSIWYG text area.


Figure 3.17. Daisy Screenshot: Editing Image Properties

The WYSIWYG editor has an image properties dialog that controls the size and position of the image.

True to its wiki roots, Daisy has always had a good versioning system complete with in-line differencing. New with version 2.1 is a nice graphical diff'ing feature that shows a color coded view of changes between two versions of a document.


Figure 3.18. Daisy Screenshot: Daisy Diff

New with 2.1 is a WYSIWYG version differencing feature.

Daisy 2.0 introduced a new workflow system based on JBPM built into the repository. Three workflows ship with the product: Generic Task (a delegated to-do), Review (simple one-step approval), and Timed Publication. Adding new workflows is done by uploading XML workflow process definition files that can be authored using the Eclipse Process Designer plugin [http://easyeclipse.org/site/plugins/jboss-jbpm.html]. The process for initiating a workflow is a little disjointed from the document editing process. A user creates or edits a document and then creates a new workflow and adds the document to it. However, a basic approval mechanism of "Put this live" is available when a user with publishing access views an asset that is not yet published (or has an updated version that is not yet published). There is a search interface to find assets that are in workflow and in a particular state or assigned to a user.

System Administration and Configuration


Daisy Repository is a JMX enabled application and ships with the JMX console MX4J. Using this interface, a developer can browse through the running application, diagnose problems, and change settings. The JMX console can also be used to start processes and operations such as indexing the site.

Daisy's user management system supports basic users and groups. By default, users can self register and are sent an email to confirm their email address. The more traditional CMS in this category do not support this feature out-of-the-box, although Magnolia has a primitive self registration add-on module available for download. An administrator can also create users and groups. Daisy's user profiles are lean: username, full name, password, and email address. Members can be assigned multiple roles. The default role is the role that they assume when they log into the site. Daisy's authentication system allows the use of multiple "Authentication Schemes," which can be set up to authenticate against an external system such as LDAP. However, each user can only be associated with a single authentication scheme.

Daisy has a rules-based access control system that is managed much like a firewall. The ACL interface allows an administrator to create a rule with a text-based condition like "documentType = 'SimpleDocument'" and to set read, write, delete, and publish permissions for individuals or groups on documents meeting those criteria. Conditions can be based on membership in a collection, a document type, or the value of a content attribute (or "field" in Daisy terminology). Rules are defined in the staging site and then published to the live site.
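A rules-based ACL of this kind can be modeled in a few lines of Python. The sketch below is conceptual and hedged: the Rule class is hypothetical, conditions are expressed as Python functions rather than Daisy's text-based expressions, and the evaluation order (top to bottom, first matching rule wins, deny by default) is an assumption borrowed from simple firewalls rather than a statement of how Daisy resolves its rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

ALL_PERMISSIONS = {"read", "write", "delete", "publish"}


@dataclass
class Rule:
    description: str                 # the condition as an administrator would read it
    matches: Callable[[dict], bool]  # stand-in for evaluating the text-based condition
    grants: Dict[str, Set[str]]      # role name -> permissions granted


def permissions_for(document, role, rules):
    """Assumed semantics: first matching rule wins; no match means no access."""
    for rule in rules:
        if rule.matches(document):
            return rule.grants.get(role, set())
    return set()


rules = [
    Rule(
        "documentType = 'SimpleDocument'",
        lambda doc: doc.get("documentType") == "SimpleDocument",
        {"guest": {"read"}, "author": {"read", "write"}, "administrator": ALL_PERMISSIONS},
    ),
    Rule("everything else", lambda doc: True, {"administrator": ALL_PERMISSIONS}),
]

page = {"documentType": "SimpleDocument"}
image = {"documentType": "Image"}
print(sorted(permissions_for(page, "author", rules)))  # ['read', 'write']
print(sorted(permissions_for(image, "guest", rules)))  # []
```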

Figure 3.19. Daisy Screenshot: Defining ACLs

Authorization rules are managed by defining content filters and then assigning read, write, delete, and publish permissions. There is some work being done to implement a new "fine grained" access control model, though it is not planned for an immediate release. One of the new use cases enabled by fine grained access control is the ability to see that documents exist without having read access to them. The fine-grained access control will be taken to the field level. "Parts" can be set to allow or deny full-text indexing and different access levels on the summaries or full text.

Presentation
The Daisy Wiki can support multiple sites on the same infrastructure and repository. A site is implemented as a view on the Daisy Repository. Each site has its own navigation and a default collection that is used for the on-board full text search box.

Although Daisy has good support for CSS, there are few best practices or tools for styling a Daisy site. As of now, skins are written in a combination of XSLT and CSS. Documentation is a little thin in this area but the systems integrators that specialize in the platform have become very proficient at applying any brand to a Daisy site. But there is a risk in relying on a systems integrator for every re-branding request. There was an interesting discussion on the mailing list about building a "skinning framework" for Daisy. One of the ideas bandied about was a gallery of downloadable themes that can be used as a starting place for building a custom theme. Joomla and Drupal have been very successful with these types of programs. It remains to be seen what becomes of this initiative.

Although it is not enabled in the out-of-the-box install, a very powerful faceted navigation system can be configured. While not as powerful as Endeca, Daisy's faceted navigation delivers a similar experience. A user can search for a term and then see a categorized list of results complete with asset counts.
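The core of faceted navigation, a result list grouped by metadata values with a count for each, can be sketched in Python as follows. This is a conceptual illustration and not Daisy's implementation; the document structure, facet names, and substring matching are hypothetical simplifications.

```python
from collections import Counter


def facet_counts(documents, term, facets):
    """Return the documents matching a term plus, per facet, a count of each value."""
    matching = [d for d in documents if term.lower() in d["text"].lower()]
    counts = {facet: Counter() for facet in facets}
    for doc in matching:
        for facet in facets:
            # Multi-value fields let one document appear under several categories.
            for value in doc.get(facet, []):
                counts[facet][value] += 1
    return matching, counts


# Hypothetical documents with two multi-value metadata fields.
documents = [
    {"text": "Installing the repository server", "category": ["Installation"], "audience": ["Administrator"]},
    {"text": "Repository plugin guide", "category": ["Development"], "audience": ["Developer"]},
    {"text": "Styling a site", "category": ["Development", "Design"], "audience": ["Developer"]},
]

matching, counts = facet_counts(documents, "repository", ["category", "audience"])
for facet, counter in counts.items():
    print(facet, dict(counter))
```

Because the repository is flat and fields can hold multiple values, the same asset can be counted under more than one facet value, which a folder hierarchy cannot express directly.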

Figure 3.20. Daisy Screenshot: Faceted Browsing

Daisy's faceted navigation provides a powerful interface for browsing the nonhierarchical content repository.

The basic Daisy install comes with simple commenting functionality for all document types. The default configuration allows any authenticated user with read access to a document to comment on it. The user can set the visibility of his comment to everyone, editors only, or private. If a user has write access to a document, he can delete comments.

Daisy's on-board search engine supports full text search on both text based content types and popular binary formats such as Microsoft Word. However, the search results are not filtered by access control. A user who lacks access to a result is denied only when he tries to click through to it.

One increasingly popular use of Daisy is for producing documentation. A built-in book application can publish a collection of content as static HTML, PDF, and other formats. Some members of the DITA community are starting to show interest in Daisy for this reason.

Being based on Cocoon and primarily designed for editing and displaying information-rich content, Daisy is not well suited for building interactive applications. A Cocoon expert could probably build any sort of functionality (after all, Daisy itself is built on Cocoon) but most Java programmers would prefer a simpler framework. Furthermore, unlike Lenya, Daisy does not have an optimized use case framework.
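The faceted browsing experience described above (search for a term, then see the hits grouped by category with a count for each) can be illustrated with a small Java sketch. This is not Daisy's implementation or API; the class `FacetSketch`, the `Doc` type, and the facet names "type" and "language" are hypothetical names chosen only for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; FacetSketch and Doc are hypothetical names, not Daisy classes.
final class FacetSketch {

    // A document reduced to searchable text plus named facet values, e.g. "type" -> "guide".
    static final class Doc {
        final String text;
        final Map<String, String> facets;

        Doc(String text, Map<String, String> facets) {
            this.text = text;
            this.facets = facets;
        }
    }

    // For the documents whose text contains the term, counts how many fall under each
    // value of the given facet. A TreeMap keeps the facet values in alphabetical order.
    static Map<String, Integer> countFacet(List<Doc> docs, String term, String facetName) {
        Map<String, Integer> counts = new TreeMap<>();
        String needle = term.toLowerCase();
        for (Doc doc : docs) {
            if (!doc.text.toLowerCase().contains(needle)) {
                continue; // not a search hit
            }
            String value = doc.facets.get(facetName);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("Installing the Daisy wiki", Map.of("type", "guide", "language", "en")),
            new Doc("Daisy repository API", Map.of("type", "reference", "language", "en")),
            new Doc("Daisy installeren", Map.of("type", "guide", "language", "nl")),
            new Doc("Release notes", Map.of("type", "reference", "language", "en")));
        System.out.println(countFacet(docs, "daisy", "type"));
        System.out.println(countFacet(docs, "daisy", "language"));
    }
}
```

A production faceted search would compute these counts inside the search index rather than by scanning documents, but the result shown to the user has the same shape: facet values paired with asset counts.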

Delivery and Support


Outerthought does all of the development on the Daisy platform. Although there are mechanisms for outside developers to contribute patches through the issue ticketing system, few do. While the repository supports a plugin framework, there is no "Daisy Forge" for community contributed extensions. Perhaps if the user community grows that will happen. A larger community would also help with documentation, which is average but less extensive than some of the larger products. Fortunately, Cocoon documentation, on which there are several good books written, fills in the gap in explaining some of the generic concepts of the Daisy architecture.

Outerthought sells commercial style support packages. But being a small company, they would be hard pressed to deliver a 24/7, two-hour turnaround service level agreement. The standard support packages are email based and have a two-business-day turnaround time. Central European business hours may frustrate North American customers who are accustomed to immediate response - especially customers on the West Coast. That said, the low cost of the support packages is in line with the level of service. Most mailing list traffic comes from non-customers because customers tend to interact directly with Outerthought.

Although Outerthought is focused on being in the software business, they do a considerable amount of systems integration work and have yet to establish a formal partner program to encourage SIs to work on the platform. The existence of such a program may have a positive effect on the visibility of the product. However, since the code base is jointly owned by a systems integrator (Schaubroeck), that may not happen any time soon.


Conclusion
Table 3.5. Daisy 2.1 Summary

Contributor Navigation: The wiki oriented user interface is easily understandable for most users. Some users will take a little while to grasp the concept of "variants," but that aspect of the system can be suitably de-emphasized for casual users that do not need that functionality.

Content Entry: The model of "parts" and "fields" is somewhat specialized but still works well for simple XHTML content types. The WYSIWYG editor is well integrated but is missing a spell check feature. User input validation on both fields and parts is noticeably missing.

Link Management: Similar to many wikis, Daisy has a nice link management feature that shows which pages link to an asset. This referrer view is shown on the delete asset page but the user is not warned.

Versioning: Daisy stores a version with every save. The new WYSIWYG differencing feature nicely displays differences between versions. Daisy is one of the few CMS that can link to a version of a document.

Content Organization and Reuse: Daisy improves on the standard wiki model with features like "includes" and "queries" built into the WYSIWYG editor. The faceted navigation is a powerful way to organize the repository.

Localization: "Variants" are well suited for parallel localization strategies. Better relationship management between language variants is coming with the next release.

Content Integration: The Cocoon based wiki delivery tier is not easy to extend to read from other sources. Content from within the Daisy repository, however, would be easy to integrate into other platforms using the REST based API.

Workflow: A basic approval workflow comes out-of-the-box. Complex workflows are supported by the jBPM workflow engine.

Layout and Branding: Theming a Daisy site requires knowledge of XSL.

Interactivity: Cocoon is a complex technology platform for building interactive applications. Performance may be an issue.

SEO: The faceted navigation makes all content accessible to search engines. URLs are not user friendly, but the flat path structure may be favored by some search engines.

Books: None

Online Documentation: The online documentation is actually pretty good. This may be because the Daisy platform that powers the site is often used for documentation.

User Forums: The mailing list is fairly active, but most customers go directly to Daisy for support.

Key: Nonexistent; Below Average; Average; Above Average; Exceptional.


Daisy's wiki approach to content management is unconventional but brings with it powerful benefits. Common limitations of the wiki model such as access control and organization have been solved by Daisy's user interface. While Daisy is a powerful platform for building informational web sites and knowledge resources, it is not an ideal platform for building highly interactive applications, primarily because its Cocoon architecture has a steeper learning curve than competing Java frameworks. Daisy's reliance on a small regional software vendor probably does introduce some risk, which may be mitigated by working with a systems integrator that has a long track record on the product. For North American customers, there are a couple of systems integrators to choose from.


Magnolia 3.5 Enterprise


Abstract
Magnolia is one of the first of a new generation of open source Java WCM projects designed for simplicity and usability. Magnolia's vision is governed by its tag line and motto "Simple is Beautiful." The page editing interface gives the author the experience of "editing the web site" by laying out structured content elements as they would appear on the rendered page. Content is easy to find within the management interface, which is organized according to the navigational hierarchy of the published web site. Many potential customers compare OpenCms and Magnolia. When ease of use is a primary selection criterion, Magnolia is generally favored.

While tight binding between the authoring and delivery views makes finding and editing content more intuitive to users, it comes at the cost of making content less re-usable. Content elements exist only as components of a page and are not easily shared between pages. It is less intuitive to build query driven sites with placeless content in Magnolia.

Magnolia's architecture of running different instances of the application for authoring and public access provides a sandbox to preview the full site before it is published. This design also supports clustered environments with multiple autonomous instances of the content delivery servers. Magnolia's caching functionality makes it suitable for high traffic sites such as the Magnolia powered amgentourofcalifornia.com that has served up to 8 million hits per hour. France24 uses Magnolia with the help of Akamai distribution.

Like other products in this report, Magnolia offers a free Community Edition and a commercially licensed Enterprise Edition. Unlike Alfresco and Jahia, there is actually a community using and developing the Community Edition. While Magnolia will not offer support for the Community Edition, it is the same code base as the core of Enterprise Edition. Much to the frustration of Magnolia, several very large sites (including France24) run on the Community Edition. In addition to the option to buy support, the Enterprise Edition has a more robust feature set aimed at enterprise customers and is certified to work with different application servers and an alternative repository.


Project Overview
Table 3.6. Magnolia Project Overview

Web site: http://www.magnolia.info
Project Inception: 2003
Current Version: 3.5 since December 2007.
Project Type: Commercial: tiered product model.
Licensing Options: Community Edition: GPL. Enterprise Edition: Magnolia Network Agreement [http://www.magnolia.info/mna.html]. The Magnolia Network Agreement is essentially a commercial license that provides access to the source code.
Geography: Magnolia has a primarily European install base although U.S. adoption is growing and Magnolia has opened a New York office.
Common Uses: Brochure sites, intranets, news sites
Sample Customers: France24 [http://www.france24.com/] runs on Magnolia Community. Ministry for Public Administration (Spain) [http://www.map.es] runs on Magnolia Enterprise.
Frameworks and Components: Apache Lucene, Apache JackRabbit (substitutable with Day Software's CRX), ehcache, FreeMarker, OpenWFE, Velocity
WYSIWYG Editor: FCKeditor
Integration Standards: JSR 170, JAAS, WSRP*, LDAP*, RSS
Java Support: 1.4, 1.5
Application Servers: Tomcat (default), Glassfish, JBoss, Web Logic**, Websphere**
Databases: MySQL, Oracle, Microsoft SQL Server, Derby, DB2

* Enterprise Edition only
** Certification requires Enterprise Edition

History
Obinary, now Magnolia International, was founded as a services firm in 2000 - the peak of the Internet bubble. Founders Boris Kraft and Pascal Mangold were experts in WebObjects and built a sizeable practice implementing the Icelandic WebObjects-based CMS Soloweb. Given the amount of integration work required to implement a solution, Kraft and Mangold looked for alternative foundations that would bring down the cost of the offering while maintaining services revenues.

After a disappointing survey of the existing open source Java WCM options available, Obinary started to build their own platform. At that time, JackRabbit - Apache's reference implementation of the Java Content Repository - was still in its unnamed infancy and lived its life as a subproject of Slide. However, Obinary was convinced that this project could eventually bring significant business benefits and chose it to provide the core functionality for the new product. Although incumbent Java WCM projects criticized Magnolia for being just a thin shim on top of JackRabbit, Magnolia was able to provide the simplicity and usability which had eluded the more engineering focused products. Magnolia was released under the open source LGPL in November 2003, and was able to shield its users from many of the changes that JSR170 would go through before it was finally released in June 2005.

While the product gained attention, particularly in Europe, Obinary looked for different business models. A donation based revenue model failed. Obinary decided on a tiered product approach with an unsupported, open source Community Edition and a supported, commercially licensed Enterprise Edition which includes some value added features, as well. The Enterprise Edition requires an annual subscription of 10k USD per server.

Obinary had aspirations in the DMS market and built document management capabilities into the Enterprise product. But the emergence of the well funded and highly visible Alfresco discouraged their hopes of being the open source Java ECM product. While the document management capabilities remain, they are not the focus of the offering.

In 2006, Obinary changed its name to Magnolia International. Currently at 12 employees, Magnolia is experiencing growth and is expanding its team. It has signed more than 30 customers since releasing the first Enterprise Edition in November 2006.

Architecture
Like other products in this category, Magnolia is an end-to-end web content management system. To use Magnolia requires adopting it for both your management tier and your web delivery platform. There are many benefits to this architecture including ease of use and development efficiency, but some architects will feel boxed in having to rely so much on the Magnolia architecture - especially if they have invested heavily in an alternative web application framework such as Struts, MyFaces, Tapestry, or Spring. It is conceivable to use an alternative presentation tier (and customers have experimented with many) because of the standards based repository. However, doing so would probably make the product more difficult to use since the editing environment is so closely tied to the delivery tier.

Magnolia's original claim to fame was that it was the first CMS built from the ground up to work on the Java Content Repository (See Glossary for JCR). The JCR is a relatively new Java standard (JSR 170) that defines a repository for managing content. The JCR is well suited for semi structured content that is hierarchical in nature. Unlike relational databases, JCRs natively support content management-specific functions like versioning, workspaces, and content deployment.

The JCR specification has not yet enjoyed widespread adoption. The biggest proponent is Day Software, whose CTO David Nüscheler is the specification lead. Day also has a number of its own developers working on the reference implementation: Apache JackRabbit. Day also sells a commercial JCR implementation called CRX and JCR adaptors for other repositories such as Documentum, FileNet, Lotus Notes, TeamSite, Sharepoint, OpenText Livelink, and Vignette. Outside of Day, however, use of the JCR has been limited. Only a few CMS companies, such as Alfresco, support the JCR standard with their own repositories, but Oracle 11 now supports the JCR natively. Other products, like Percussion, support the spec at level one (Read Only access) and support the query language for finding assets. There is more JCR interest within internal corporate software engineering departments that are building custom systems and want to reduce their risk by sticking to standards. If the JCR is to become a truly successful standard, it will require corporate architects putting pressure on more software vendors to support the specification.

The JCR specification is in the process of being upgraded with a new Java Community Process (JCP) Java Specification Request (JSR) called JSR 283. JackRabbit will serve as a reference implementation for JSR 283, as well. Magnolia did a good job of keeping up with the evolving JSR 170 spec and improvements proposed in 283 are not expected to pose too much disruption. That is, certainly not as much as the original 170 specification did.

Architects that are new to using the JCR sometimes have the tendency to want to use it for everything. On the developer mailing list, you occasionally see architects getting "talked down" from using the JCR for inherently tabular data like e-commerce catalogs and order history.

By default, Magnolia's content repository is Apache JackRabbit. JackRabbit has a pluggable persistence manager with the default being a file system, which is also the slowest. Other options include relational databases such as Oracle and MySQL. The binary install of Magnolia comes bundled and configured to use JackRabbit with the Java relational database Derby for persistence. Derby is better than the other commonly embedded database Hypersonic but high end implementations will probably want to use MySQL. JackRabbit has a WebDAV interface; however, WebDAV support in Magnolia is slated for a future release. The Enterprise Edition comes with the option to use Day Software's commercial CRX JCR product. The Magnolia administrative interface contains a JCR repository browser that lets you view and edit the content tree as one would with a database administration tool. There is a free JCR browser plugin for Eclipse, as well.

For relational data, Magnolia implementations can use the embedded Derby database or connect to any other JDBC compliant relational database. Magnolia does not use the relational database for much but Derby can be replaced with MySQL if a more powerful relational database management system is needed. This configuration is done at the application server level by specifying a data provider. Magnolia is certified to work on Tomcat as well as the full JEE Java application servers JBoss, WebLogic, and Websphere.
Magnolia was one of the first open source WCM systems with a multi-tier design. In a basic configuration, two instances of the web application are deployed: one is designated as the authoring server and another as the presentation server. The distinction is made with an attribute setting made in the management interface. Content is authored and edited on the authoring server and then deployed to one or more presentation server instances. Each instance of the application is called a "stage." Communication between stages is done over http/https.
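The relationship between the authoring stage and its presentation stages can be sketched in a few lines of Java. This is only a model of the idea and not Magnolia's code: the classes `AuthorStage` and `PublicStage` and the method names are hypothetical, and the transfer that Magnolia performs over http/https is replaced here by an in-memory method call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these are hypothetical classes, not Magnolia's API.
final class StageSketch {

    // A presentation stage holds only the content that has been deployed to it.
    static final class PublicStage {
        final String name;
        final Map<String, String> content = new HashMap<>();

        PublicStage(String name) {
            this.name = name;
        }

        // Stands in for the HTTP/HTTPS request that would carry the content.
        void receive(String path, String body) {
            content.put(path, body);
        }
    }

    // The authoring stage keeps working copies and pushes them out on demand.
    static final class AuthorStage {
        private final Map<String, String> workspace = new HashMap<>();
        private final List<PublicStage> subscribers = new ArrayList<>();

        void subscribe(PublicStage stage) {
            subscribers.add(stage);
        }

        void save(String path, String body) {
            workspace.put(path, body);
        }

        // Deploys the current version of one page to every subscriber and
        // returns the number of stages that received it.
        int activate(String path) {
            String body = workspace.get(path);
            if (body == null) {
                throw new IllegalArgumentException("No such page: " + path);
            }
            for (PublicStage stage : subscribers) {
                stage.receive(path, body);
            }
            return subscribers.size();
        }
    }

    public static void main(String[] args) {
        AuthorStage author = new AuthorStage();
        PublicStage web1 = new PublicStage("web1");
        PublicStage web2 = new PublicStage("web2");
        author.subscribe(web1);
        author.subscribe(web2);
        author.save("/news/launch", "Draft text");
        System.out.println("Deployed to " + author.activate("/news/launch") + " stages");
        System.out.println(web2.content);
    }
}
```

The point of the design is that edits saved on the authoring stage stay invisible to the public until they are explicitly deployed, and that adding delivery capacity means adding another subscriber.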


Figure 3.21. Magnolia Screenshot: Configure Subscribers

In the authoring server, an administrator configures subscribers that the authoring server will publish content into. Developers wishing to extend or customize Magnolia can do so by writing add-on modules. The best practice is to write all customizations in modules so that a new instance can easily be brought online by installing the base system, deploying the modules, and then deploying content and template code. There is a Subversion repository for sharing modules, and a couple have been added, but there is no module forge like other projects have. The newly released version 3.5 introduced a more pluggable architecture that may facilitate the sharing of modules. Right now module sharing is limited to the mailing list. Future releases will likely introduce the Spring Aspect Oriented framework. This will enable code to be more easily shared across custom components.

Content Contribution
Magnolia provides two main interfaces that content contributors can use to navigate the web site. The primary user interface is "AdminCentral," which is used for all site administration, configuration, and development tasks. The "web site" section shows a clean view of the hierarchical site tree. Color coding is used to express workflow in three states: red for unpublished, green for published, and yellow for assets that have changed since they were last published. Context sensitive right-click menus show what actions are available to the user based on his security permissions.


Figure 3.22. Magnolia Screenshot: Browsing in AdminCentral

The main "AdminCentral" user interface is the primary way to navigate to edit content. The second method of navigating the web site is through preview mode. Content contributors can browse around the authoring instance in preview mode, which shows editable regions as well as buttons for editing. This interface is also used to edit page metadata and move paragraphs around on the page.


Figure 3.23. Magnolia Screenshot: Page Layout

Content contributors can browse around the authoring instance in preview mode and edit paragraphs by clicking the edit buttons. This interface is also used to edit page metadata and move paragraphs around on the page. Source: Magnolia documentation site. Magnolia is closely tied to the hierarchical model of the JCR. The primary content type is a generic "page" that is composed of "paragraphs" and metadata. Although pages are not "typed" into content types or classes, a page is defined by a template that determines the structure of the page by creating regions that can hold paragraphs. For example, a template might have a three column layout with the center column containing the main page content and a right column containing sidebar content components. In this example, the user would be able to add paragraphs to the center and right column. The template would also control the display of the paragraphs in the page. Paragraphs are structured content types in their own right and can have their own metadata. Paragraphs are defined by and edited with pop-up "dialogs" that can have multiple tabs. In addition to paragraphs that accept content, there are also paragraphs that hold dynamic page components. The default Magnolia Enterprise installation comes with paragraph types for: Text and image, File download, Link, Anchor, Documents: List, Search by topics, RSS link, RSS icon, Movie Player, Code example, Navigation, Breadcrumb, Mail form, iFrame, 2 Columns, Javascript, Full text search input field, Full text search result, MP3 Player, and Text scroller. New paragraph types are defined by creating custom dialogs through the administrative user interface.
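To make the page, template, and paragraph relationship concrete, here is a minimal Java sketch of the data model as described above. It is an assumption-laden illustration, not Magnolia's object model: the class names `Template`, `Page`, and `Paragraph`, the region names, and the "textImage" paragraph type are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these are hypothetical classes, not Magnolia's API.
final class PageModelSketch {

    // A paragraph is a small structured content item: a type name plus named fields.
    static final class Paragraph {
        final String type;
        final Map<String, String> fields;

        Paragraph(String type, Map<String, String> fields) {
            this.type = type;
            this.fields = fields;
        }
    }

    // A template names the regions of the page that may hold paragraphs.
    static final class Template {
        final String name;
        final List<String> regions;

        Template(String name, List<String> regions) {
            this.name = name;
            this.regions = regions;
        }
    }

    // A page is untyped; its structure comes entirely from its template.
    static final class Page {
        final Template template;
        private final Map<String, List<Paragraph>> content = new LinkedHashMap<>();

        Page(Template template) {
            this.template = template;
            for (String region : template.regions) {
                content.put(region, new ArrayList<>());
            }
        }

        // Paragraphs can only be added to regions that the template defines.
        void add(String region, Paragraph paragraph) {
            List<Paragraph> slot = content.get(region);
            if (slot == null) {
                throw new IllegalArgumentException(
                    "Template " + template.name + " has no region " + region);
            }
            slot.add(paragraph);
        }

        List<Paragraph> paragraphsIn(String region) {
            return content.getOrDefault(region, List.of());
        }
    }

    public static void main(String[] args) {
        Template threeColumn = new Template("threeColumn", List.of("center", "right"));
        Page page = new Page(threeColumn);
        page.add("center", new Paragraph("textImage", Map.of("title", "Welcome")));
        page.add("right", new Paragraph("textImage", Map.of("title", "Related links")));
        System.out.println(page.paragraphsIn("center").size() + " paragraph(s) in center");
    }
}
```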


Figure 3.24. Magnolia Screenshot: Edit Dialog

Each paragraph type has its own edit dialog. Clicking the edit button launches an edit dialog that contains form fields for the structured elements of the paragraph. Paragraphs can also have their own metadata. Image source: Magnolia documentation site.

This model is different than most web content management system designs that create semantically meaningful content types at the page level. In Magnolia, rather than creating a new "event" asset, one would create a page and put in a custom "event info" paragraph that contained elements for information like date, time, location, and description. To create an event calendar page that listed events chronologically, a dynamic template would need to look for all "event info" paragraphs and display them.

The design of Magnolia is not conducive to sharing content across pages but there are workarounds. A common strategy is to create a non-navigable collection of pages that are just containers for global assets. Then paragraph templates can be programmed to pull in paragraphs from these pages. While the strategy works fine, it is a break from the overall browse-to-edit metaphor.

Localization, new in version 3.5, is done at the paragraph level. In the Magnolia localization strategy, a language neutral page has a collection of paragraphs. Each paragraph has its own language specific elements that, in the edit dialog, are organized into tabs. For example, if a paragraph is translatable into English and German, there may be English Title and German Title fields that are on different tabs. When the page is rendered, a locale identifier in the URL path tells Magnolia what language specific paragraph fields to show. It is easy to see how this model would break down in sites that are published in many different languages but it seems adequate for sites that are translated into up to three or four languages. Perhaps companies that work in many languages group the localizations into multiple independent Magnolia instances. However, this design would not effectively enable content sharing between the sites. Another limitation is that the URLs would only be in the primary language that the page was in. For a first attempt, the localization system is not too bad. However, the plan is for a major improvement in release 4.0.
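The paragraph-level localization scheme can be pictured with a short Java sketch: one paragraph carries a field per language, and a locale identifier taken from the URL path selects which one to show. The sketch is hypothetical throughout. The field naming convention ("title_en", "title_de"), the position of the locale in the path, and the fallback to a default language are assumptions made for the example, not documented Magnolia behavior.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; naming conventions and fallback rule are assumptions.
final class LocalizedFieldSketch {

    static final String DEFAULT_LANGUAGE = "en"; // assumed default for the example

    // Treats the first path segment as the locale identifier when it is a
    // supported language, e.g. "/de/products" -> "de". Otherwise uses the default.
    static String languageOf(String urlPath, Set<String> supported) {
        String[] parts = urlPath.split("/");
        // For a path starting with "/", parts[0] is the empty string.
        if (parts.length > 1 && supported.contains(parts[1])) {
            return parts[1];
        }
        return DEFAULT_LANGUAGE;
    }

    // Looks up the language specific field of a paragraph, falling back to the
    // default language when no translation has been entered.
    static String field(Map<String, String> paragraph, String name, String language) {
        String value = paragraph.get(name + "_" + language);
        return value != null ? value : paragraph.get(name + "_" + DEFAULT_LANGUAGE);
    }

    public static void main(String[] args) {
        Map<String, String> paragraph = Map.of(
            "title_en", "Products",
            "title_de", "Produkte");
        Set<String> supported = Set.of("en", "de");
        String language = languageOf("/de/products", supported);
        System.out.println(field(paragraph, "title", language));
    }
}
```

The sketch also shows why the approach strains as languages are added: every translatable field is multiplied by the number of languages inside a single paragraph.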

Figure 3.25. Magnolia Screenshot: Localized Edit Dialog

In Magnolia's new localization scheme, content is localized at the paragraph level. Each paragraph is given a set of fields for each language supported.

Rich text editing services are provided by FCKeditor. By default, users are not able to embed images directly into the text area using an image button. Instead, they need to go to an image tab of the paragraph dialog to upload an image that will be placed by the display template. The image tab of the paragraph dialog may present the user with options such as alignment, text wrapping, and fields for attributes such as caption. This is generally a good strategy because it allows the CMS to be aware of images and gives it more control to manage their display. FCKeditor can be configured to allow other ways of dealing with images, if so desired.

Magnolia has no special link management functionality. Internal links are constructed through a path but internally the relationship is maintained through the unique identifier of the asset (UUID). This means that if the target asset is moved or renamed, the link will still work. If the target page is deleted and then replaced with another page with the same name, the linking mechanism will fall back and use the URI to link to the new page. However, Magnolia does not manage links and is not able to warn a user when deleting an asset will create a broken link. There is also no view of what pages link to a page like in Daisy, Lenya, and OpenCms. Using third party link checking software is advised.

Not all authors like to think of their pages as collections of paragraph components. It is hard to re-factor text heavy pages when you have to open different pop up dialogs to move text from one paragraph to another. Authors who write text heavy pages will have a tendency to put in hard returns to create multiple paragraph-looking text blocks within a single paragraph component. This is probably not a bad thing. Still, this component page model is a nice compromise between structure and user control over layout. Structuring content in this way allows a template developer to re-organize the layout of the pages by editing the template rather than editing content. However, because page components are not easily re-used across pages, the benefit of the structure is somewhat reduced.

Since release 3.0, Magnolia has used the open source Java workflow engine OpenWFE to manage workflow. Magnolia's CTO Boris Kraft is a committer on the OpenWFE project. Like most workflow engines, OpenWFE uses an XML syntax to create "process definitions," although OpenWFE does not use a standards based process definition language, such as BPEL. In the configuration area of the administration interface, there is a place to paste in a process definition. The workflow definition references "commands" which are mapped to classes in the administration interface. The default definition that comes with the basic installation, "activation," has commented-out sections for functions like email notification.
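The link behavior described earlier in this section (resolve by UUID first, then fall back to the path when the original target is gone) can be sketched as follows. This is a reconstruction of the described behavior under stated assumptions, not Magnolia's code; `Repository`, `Link`, and `resolve` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; these are hypothetical classes, not Magnolia's API.
final class LinkSketch {

    // A toy repository that maps each asset's UUID to its current path.
    static final class Repository {
        private final Map<String, String> pathByUuid = new HashMap<>();

        // Creates an asset, or moves/renames it when the UUID already exists.
        void store(String uuid, String path) {
            pathByUuid.put(uuid, path);
        }

        void delete(String uuid) {
            pathByUuid.remove(uuid);
        }

        String pathOf(String uuid) {
            return pathByUuid.get(uuid);
        }

        boolean hasPath(String path) {
            return pathByUuid.containsValue(path);
        }
    }

    // A link remembers both the target's UUID and the path it had when the link was made.
    static final class Link {
        final String targetUuid;
        final String targetPath;

        Link(String targetUuid, String targetPath) {
            this.targetUuid = targetUuid;
            this.targetPath = targetPath;
        }
    }

    // Returns the path the link should point to, or null when the link is broken.
    static String resolve(Repository repo, Link link) {
        String current = repo.pathOf(link.targetUuid);
        if (current != null) {
            return current; // target still exists, even if moved or renamed
        }
        if (repo.hasPath(link.targetPath)) {
            return link.targetPath; // original deleted, a new page took its place
        }
        return null; // nothing to link to; a link checker would report this
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        repo.store("uuid-1", "/about/team");
        Link link = new Link("uuid-1", "/about/team");
        repo.store("uuid-1", "/company/team"); // move the target
        System.out.println(resolve(repo, link));
    }
}
```

Note that nothing in this scheme records which pages link to a target, which is why a deletion cannot produce a warning.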

System Management and Configuration


Magnolia's access control list system is based on the Unix model. Users and groups are granted privileges of read, write, or deny. Access is granted hierarchically on the site tree. Privileges are bundled into "roles." Typically, access is denied at the root of the content tree and then granted at lower branches of the tree. All security management is done through the administrative user interface. The recommended configuration is to assign all users a base role that provides a minimal level of read access on the repository. Then users get additional roles that either restrict or allow more specific access.

The LDAP integration supported in the Enterprise Edition is only for authentication purposes, although Magnolia does provide an abstract class that can be extended to use information in the LDAP directory for authorization logic. Several Community Edition users have built their own implementation of the LDAP connection module.

Importing content would be done through the JCR API for which Magnolia provides a method in its GUI. The JCR API implemented by JackRabbit, for example, has a method called Workspace.importXML() that reads an XML document from an input stream.

The Enterprise Edition comes with some useful features for enterprise deployments. A packager module can export and import entire instances of Magnolia as packages. This is useful for back-up and recovery, replication, and also for managing developer sandboxes. JackRabbit also has a "systemview" method that exports the whole repository as an XML document. On the Wiki, there are some scripts and Ant tasks that can synchronize content and templates between Magnolia instances. These scripts also handle the JSP template code that is stored in the file system.

There are a number of tools available for editing the JCR repository directly. Day offers a free Eclipse plugin that can browse and edit content in both JackRabbit and its CRX repository. However, these tools fall short of the familiarity and power of SQL database management tools. The JCR is a new standard; expect better tool support as the standard matures.

Magnolia's multi-site hosting capabilities are limited. The most common approach is to have root elements for each of the sites and then use virtual hosts and URL rewriting to map different domains to specific branches of the content tree. The hierarchical permissioning system can be used to restrict authoring access to specific "subsites." Features like cache clearing are global across the application, so one site clearing its cache will have a performance impact across other sites. That said, many Magnolia users run multiple sites on the platform. They just create a separate instance for each site.

With the improved deployment mechanism introduced in version 3.5, system back-up has improved. Regular deployments can back up to other instances of the repository. Companies needing a more robust back-up system may consider upgrading the repository from the default JackRabbit JCR implementation to Day's CRX, which provides hot back-up functionality. Otherwise, the best approach is to take the system down and do a dump of the database being used as the persistence layer.
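The "deny at the root, grant on lower branches" pattern from the access control discussion at the start of this section can be illustrated with a small Java sketch. The resolution rule used here, in which the most specific (longest) matching path wins, is an assumption made for the example and is not taken from Magnolia's documentation; `AclSketch`, `Rule`, and `effective` are hypothetical names.

```java
import java.util.List;

// Illustrative sketch only; the "longest matching path wins" rule is an assumption.
final class AclSketch {

    enum Permission { DENY, READ, WRITE }

    // One entry of a role: a branch of the site tree and the privilege granted on it.
    static final class Rule {
        final String path;
        final Permission permission;

        Rule(String path, Permission permission) {
            this.path = path;
            this.permission = permission;
        }
    }

    // True when the rule's branch is the node itself or one of its ancestors.
    static boolean covers(String rulePath, String nodePath) {
        if (rulePath.equals("/")) {
            return true;
        }
        return nodePath.equals(rulePath) || nodePath.startsWith(rulePath + "/");
    }

    // Finds the most specific rule covering the node. With no matching rule, access is denied.
    static Permission effective(List<Rule> rules, String nodePath) {
        Rule best = null;
        for (Rule rule : rules) {
            if (covers(rule.path, nodePath)
                    && (best == null || rule.path.length() > best.path.length())) {
                best = rule;
            }
        }
        return best == null ? Permission.DENY : best.permission;
    }

    public static void main(String[] args) {
        List<Rule> editorRole = List.of(
            new Rule("/", Permission.DENY),
            new Rule("/news", Permission.READ),
            new Rule("/news/sports", Permission.WRITE));
        System.out.println(effective(editorRole, "/news/sports/article-1"));
        System.out.println(effective(editorRole, "/about"));
    }
}
```

The same rule set also shows how a "subsite" can be fenced off for authors in a multi-site setup: grant write access only below that site's root element.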

Presentation
Magnolia has a "frying" style presentation system where pages are generated at request time rather than pre-compiled into HTML to be served by a simple web server. Caching is done at the page level using a custom cache implementation or optionally the popular open source caching framework ehcache. Cache is invalidated when new content is published and (now with 3.5) when template code is changed. The main issue here is that cache clearing is global, so high traffic sites should publish at regular intervals to limit the number of cache clearing events. The centralized caching mechanism is one of the key issues that limit Magnolia's suitability for multi-site hosting. Magnolia does power some fairly high traffic sites, such as France24 [http:// www.france24.com/]. High throughput is achieved through clustering presentation servers and there is support of session federation. In this configuration, the presentation tier is read-only. Interestingly, France24 is using the Community Edition so they must have developed their own version of the clustering mechanism that comes with the Enterprise Edition. One common strategy is to publish different sections of the site to different delivery servers. For example, a highly interactive section may be put on another delivery server so that the computational complexity of those pages does not degrade the performance of the entire site. A future version of Magnolia will introduce a "baking" style model where static HTML files are deployed to a web server farm. As previously mentioned, pages are rendered at request time. Magnolia has its own modelview-controller implementation. URLs reflect the organizational hierarchy of the site. Magnolia does support virtual URLs or URL aliases. 
On the mailing list, some developers have reported success in integrating Magnolia with a Struts based delivery tier, although this would probably make more sense when using the unsupported Community Edition rather than wasting money on the Enterprise Edition by making it unsupportable.

Content presentation templates have traditionally been written in JSP with the help of the JSP Standard Tag Library (JSTL) and custom Magnolia tags that are provided under the namespace "cms". The third party add-on MagTags, distributed by Noodle Open Source under the LGPL, provides convenient helper tags. Velocity and FreeMarker JARs ship with the product but are not, by default, available for use as an alternative templating language. However, it is possible to build alternative "renderers" that leverage these technologies. There is discussion within the Magnolia community about adopting JavaServer Faces, or potentially an AJAX based framework (such as Google's GWT-Ext), as a delivery framework. But for now, the officially sanctioned front end of Magnolia is JSP with Magnolia's tag library. The JSP code files are stored in the file system under the "webapp" directory and pointed to by nodes in the repository.

The Enterprise version comes with a module called "Sitedesigner" that allows templates to be developed and modified through a design environment within the admin interface. This is a convenient feature for companies that do not have web designers at the ready to respond to template change requests. Sitedesigner consists of a generalized parameterized template that allows a business user to edit properties to control the layout. There are no WYSIWYG, drag-and-drop graphical features that a DreamWeaver user may be accustomed to. Instead, Magnolia uses the dialog model with a bunch of property fields. The edit buttons appear right next to the buttons used to edit content.
Sitedesigner template updates are stored at the page level and can be inherited down the tree structure or overridden by child pages.


Figure 3.26. Magnolia Screenshot: Site Designer

Sitedesigner consists of a parameterized template that allows a user to control the look and behavior of pages by setting properties. Although Magnolia has a dynamic delivery tier, the separation of the authoring environment and the delivery environment makes it less suitable for visitor contribution functionality. However, there is a strategy to store visitor submitted content in a different workspace within the JCR and replicate that workspace back to the authoring environment. Magnolia does maintain some add-on modules to deliver community oriented functionality (such as forums and polls) but they are extremely simplistic and not well documented. They should, at best, be considered as a starting point on which to build custom functionality. Modules can be downloaded from the Magnolia Subversion repository which organizes them into Community and Enterprise modules. The Enterprise section of the source code repository is password protected. Module support is relatively new for Magnolia, so users can expect more development in this area. Also in the future, Magnolia plans to introduce more visitor facing interactive features. They are currently working on a marketing module that will have functionality like A/B testing and SEO optimization tools like a Google sitemap. Another option for presentation is through a third party JSR 168 (See Glossary JSR 168) compliant portal product. A Magnolia instance can be wrapped in a JSR 168 portlet and subscribe to updates from the authoring instance. Sold separately, there is a Web Services for Remote Portlets (See Glossary WSRP) module that publishes content out to this standard. The WSRP module is new and has not yet been aggressively sold as a product. Caching configuration is done at the page level based on URL rules. By default, all pages are cached in the delivery tier. Caching is then turned off on a page by page basis for pages that have dynamic or personalized behavior. 
The Magnolia architecture now tolerates co-existence with other security and cache filters. However, setup has been reported to be tricky. Large, high traffic Magnolia implementations tend to deploy across multiple delivery servers which, although expensive from a hardware perspective, is simple enough to do using Magnolia's publish/subscribe model.


Figure 3.27. Magnolia Screenshot: Configure Cache

Cache is configured at the page level through AdminCentral by editing URI nodes.
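
The rule described above (cache every page by default, then switch caching off for individual URLs) can be sketched in a few lines of Java. This is only an illustration of the idea, not Magnolia's cache implementation: the class `UrlCacheRules`, its methods, and the trailing `*` wildcard convention are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of URL-rule-based page caching; not Magnolia's actual API. */
class UrlCacheRules {
    // URL patterns excluded from the cache; a trailing '*' matches any suffix (assumed convention).
    private final List<String> denyPatterns = new ArrayList<>();

    /** Registers a URL pattern whose pages must always be rendered fresh. */
    void deny(String pattern) {
        denyPatterns.add(pattern);
    }

    /** Pages are cacheable by default unless a deny rule matches the request path. */
    boolean isCacheable(String path) {
        for (String pattern : denyPatterns) {
            if (matches(pattern, path)) {
                return false;
            }
        }
        return true;
    }

    private static boolean matches(String pattern, String path) {
        if (pattern.endsWith("*")) {
            // Prefix match: "/account/*" covers everything below /account/.
            return path.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return path.equals(pattern);
    }

    public static void main(String[] args) {
        UrlCacheRules rules = new UrlCacheRules();
        rules.deny("/account/*");    // personalized pages
        rules.deny("/search.html");  // dynamic page
        System.out.println(rules.isCacheable("/news/index.html"));
        System.out.println(rules.isCacheable("/account/profile.html"));
    }
}
```

A deny list keeps the configuration short when most pages are static, which matches the default-on behavior described in the text.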

Delivery and Support


As with many commercial open source projects, you must buy the Enterprise Edition in order to get support for the product. However, unlike other commercial open source projects in this report, there is actually a community behind the community version. The Community Edition and the core of the Enterprise Edition share the same code base. In other words, the Magnolia core carries a dual license: GPL and the commercial Magnolia Network Agreement (MNA). The Enterprise Edition uses the MNA licensed core and adds some MNA licensed code. There are five committers on the Magnolia core who are not employed by Magnolia. The Enterprise Edition costs $12,000 per year per server. Non production servers are half that cost. If you are running WebSphere, your license cost is nearly double that ($22,000 per year for a production server). It would seem the rationale is that if a customer is willing to overpay for their application servers, they can pay more for CMS licensing. This pricing system is new and it will be interesting to see if it works out. Because of the way the Magnolia project is organized, using the Community Edition is actually viable. Some of the largest Magnolia sites (including France24) are running the Community Edition. Some of Magnolia's 30 paying customers that license the Enterprise Edition are really running on the Community Edition. They bought Enterprise for support but do not use the Enterprise Edition features. The Community Edition usually comes out a couple of weeks before the Enterprise Edition. This time is used to certify the platform on the various supported application servers and to ensure compatibility with the extended feature set. When bugs are fixed, they are fixed in the common code base.
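
As a worked example of the pricing above, the following sketch estimates an annual license bill. It uses only the list prices quoted in the text; the class and method names are hypothetical, and because no non-production price is quoted for WebSphere, the WebSphere calculation covers production servers only.

```java
/** Hypothetical cost estimate using the list prices quoted in the text (USD per year). */
class MagnoliaLicenseEstimate {
    static final int PRODUCTION_SERVER = 12000;
    static final int NON_PRODUCTION_SERVER = PRODUCTION_SERVER / 2; // "half that cost"
    static final int WEBSPHERE_PRODUCTION_SERVER = 22000;

    /** Annual cost for servers on the standard price list. */
    static int annualCost(int productionServers, int nonProductionServers) {
        return productionServers * PRODUCTION_SERVER
                + nonProductionServers * NON_PRODUCTION_SERVER;
    }

    /** Annual cost for WebSphere production servers only (no other WebSphere price is quoted). */
    static int annualWebSphereProductionCost(int productionServers) {
        return productionServers * WEBSPHERE_PRODUCTION_SERVER;
    }

    public static void main(String[] args) {
        System.out.println(annualCost(2, 1));                 // 2 x 12000 + 1 x 6000 = 30000
        System.out.println(annualWebSphereProductionCost(2)); // 2 x 22000 = 44000
    }
}
```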


Magnolia is a small company committed to being a dedicated software company rather than a software and services hybrid. The challenge for them is to build the Enterprise Edition license base to increase revenue on licensing and support fees. But it is not easy for the company to convert Community Edition implementations into Enterprise customers because the Community Edition gives most users what they want.

Magnolia is actively growing its partner program. Most of the official integration partners are in Europe, but there is a growing number of North American systems integrators that do Magnolia implementations. However, there is only one official U.S. based partner. The monetary hurdle to be listed as an official partner is minimal ($1,250), but there is a requirement to buy a high tier support package. Consultancies that make this commitment get a 25% commission on sales.

The mailing list for Magnolia is active for a community of its size, and it is your best resource for information other than paid Magnolia support. The documentation is very thin, especially for the 3.x releases. The Community wiki (http://www.magnolia.info/wiki/, recently ported over to Confluence) is frequently better than the official documentation (http://documentation.magnolia.info). There are some Javadoc comments in the code base, and the code is relatively well named and easy to follow.

From a social perspective, most of the action happens in Magnolia's home territory in Switzerland, where there are social gatherings. Magnolia has opened an office in New York where project co-founder Boris Kraft spends some of his time building Magnolia's U.S. presence.


Conclusion
Table 3.7. Magnolia 3.5 Enterprise Summary
Category | Score | Explanation
Contributor Navigation | | The tree-based AdminCentral interface is the content contributor's starting place, but from there he can launch into the preview view for an in-context editing experience.
Content Entry | | The content entry model is very page oriented, which has the advantage of being intuitive for most uses, but is weak for semantically meaningful content types such as an "event." WebDAV would help with uploading binary files. Spell check would be helpful too.
Link Management | | Magnolia is the only product in this category without a strong link management system that discourages users from actions that break links.
Versioning | | Full versioning support with new version created with every save.
Content Organization and Reuse | | The page oriented editing model does not naturally lend itself to high levels of content reuse.
Localization | | While many customers use Magnolia to run multiple language sites, parallel localization is new to Magnolia and this first attempt is rough.
Content Integration | | The standards based JCR repository is well designed for integrating Magnolia content in other applications. Java developers will be familiar with writing JSP/Java code to interact with any relational data source.
Workflow | | A basic approval workflow comes out-of-the-box. More complex workflows are supported by pre-integrated OpenWFE workflow engine.
Layout and Branding | | The JSP based delivery templates are easy for a Java developer to work with and the paragraph model can be easily translated into page components for building flexible pages. The Sitedesigner tool enables non-programmers to control the look of the site.
Interactivity | | The main question is where to put user submitted content since the publishing model is based on a uni-directional flow between the back-end authoring server and the front end delivery servers.
SEO | | XML orientation tends to output clean XHTML. User friendly, human readable URLs are supported.
Books | | None
Online Documentation | | The new documentation site (http://documentation.magnolia.info) is extremely thin and does not cover any of the new features introduced in 3.5.
User Forums | | The developer and user mailing lists are fairly active as the community version of Magnolia is widely used.

Key: Nonexistent; Below Average; Average; Above Average; Exceptional.


Magnolia hit the market at precisely the right time: when buyers were looking for a simple, easy to use Java based WCM product and when the JCR was beginning to emerge as a stable platform to build on. As a result, Magnolia International has been able to rapidly build a compelling product without the need of venture funding. While Magnolia CMS has enjoyed widespread adoption, the company has been trying to figure out a business model that will turn the success of the product into corporate growth.

From a business perspective, Magnolia International is somewhere between Alkacon and Alfresco. Like Alkacon, the free version is, in itself, a useful product that large companies can deploy. Like Alfresco, Magnolia asks its customers to buy the Enterprise Edition to be eligible for support packages. However, Magnolia is not as forceful in pushing users to the Enterprise version, probably because it does not have the same venture capital revenue pressures that Alfresco does.

Although Magnolia (the company) is small, it seems stable and solid and gets high marks from its customers. As an open source project, Magnolia appears to be a safe technology to adopt because of the install base and the development energy behind the underlying technology: the Apache Jackrabbit JCR.


OpenCms 7.0.3
For quite some time, OpenCms has enjoyed the distinction of being considered the most mature and best organized of the community based open source Java web content management systems. With the release of the 6.x series and the recent 7.x release, OpenCms has been able to stay relevant despite an influx of new competition, and remains a viable option for companies looking for comprehensive basic web site management. OpenCms has also avoided Web 2.0 functionality such as user generated content and social media. For a company looking for a solid platform with extensive traditional WCM functionality, OpenCms is an attractive option. The budget conscious will appreciate that, unlike new market entrants, OpenCms is a fully open source product and requires neither commercial licensing fees nor an obligation to buy support contracts. The size of the OpenCms install base makes working with the community a realistic option for getting support. For companies seeking additional support, Germany based Alkacon Software sells commercial-style support packages as well as a bundle of commercial extensions targeted at enterprise installs. Beyond Alkacon, over 100 systems integrators are registered as official OpenCms solution providers. From a usability standpoint, OpenCms is decent, but not exceptional. Customers that are looking for newer, flashier, Web 2.0-style user interfaces tend to select other products such as Magnolia. OpenCms straddles the line between a community and commercial open source product. Whereas Alkacon once subtly positioned itself as a premier provider of OpenCms services and software, it now calls the product Alkacon OpenCms and has taken more overt ownership of it. Since they own the copyright on the source code and the OpenCms name, they are well within their rights. And given the influx of commercial open source products in this category, the move has served OpenCms well.


Project Overview
Table 3.8. OpenCms Project Overview
Web site: http://www.OpenCms.org
Project Inception: 1999
Current Version: 7.0.x since July 2007
Project Type: Community. Commercially supported by Alkacon.
Licensing Options: LGPL with commercial extensions available.
Geography: Global, with a concentration in Europe.
Common Uses: Brochure sites, informational intranets, news sites.
Sample Customers: The public facing web site of The North Face [http://www.thenorthface.com]. The public facing web site of Virgin Money Australia [http://virginmoney.com.au].
Frameworks and Components: Apache Lucene, Digester, EHCache, JTidy, PDFBox
Integration Standards: WebDAV, EJB, XML
Java Support: 1.4, 1.5, 1.6
Application Servers: Tomcat (default), JBoss, Websphere, WebLogic
Databases: MySQL, Oracle, Microsoft SQL Server, Sybase

History
OpenCms was originally developed in 1999 by the interactive agency BKM Online Medien GmbH as a proprietary product called MhtCms. In 2000, the product, then at version 4, was released as OpenCms under the LGPL. The OpenCms core team chose the LGPL because it allows developers to use OpenCms within other applications without the viral effect of the GPL, but prevents the commercial sale of enhancements made to the OpenCms core without contribution back to the community. Third party vendors, such as QBizm, have legally created proprietary extensions to OpenCms that they sell under a commercial license.

In 2002, several of the core developers from the original MhtCms team formed the company Alkacon Software, which has done most of the development on the product and hosts the OpenCms.org web site. Still, OpenCms is an open project and there are external contributors and committers. In particular, the database connectors for PostgreSQL and Microsoft SQL Server are managed by external developers. Alkacon practices a software based business model with most of its revenue coming from consulting, training, and support contracts on the OpenCms platform. Nearly 100 customers currently buy support contracts from Alkacon.

The relationship between Alkacon and the community is interesting. Unlike many community open source projects that form neutral non-profit holding organizations to own the code and govern direction, Alkacon owns the code and runs the project. Increasingly, Alkacon and the OpenCms web site (which Alkacon runs) refer to the product as Alkacon OpenCms. This appears to have the positive effect of reducing the ambiguity around the project, both externally and internally, and clearly positioning Alkacon as the OpenCms supplier. The general feeling within the community is that of loyalty and gratitude, and there has been backlash when other companies have tried to wrest control of the community from Alkacon. Because they own the code, Alkacon could create a dual licensing scheme or change the license entirely.


Architecture
OpenCms is probably the most mature of the open source Java WCM products. On the downside, however, the architecture tends to show its age. While newer Java WCM products are taking full advantage of the latest frameworks and components such as Spring, Hibernate, and the various Ajax libraries, OpenCms is essentially built from the ground up with the exception of a number of third party XML libraries and the Lucene search engine. This is understandable given that most of these frameworks did not exist or, at least, were not mature when OpenCms was originally developed. That said, OpenCms is current with the latest versions of Java and releases of the major application servers and servlet containers. Some customers have reported success integrating OpenCms into applications using these modern frameworks, but this is far from the mainstream. The technical leadership of the project has prioritized adding features and fixing bugs over refactoring the architecture to leverage these frameworks. In hindsight, this may have been the right choice given the shifting popularity of the various frameworks and the fact that some of these features are being incorporated into the core Java platform. Unless you are already familiar with these third party frameworks, their absence makes the application easier to understand by eliminating layers of abstraction and indirection. OpenCms is divided into three major components: the OpenCms core, where all the business logic is executed; the delivery tier, which executes JSP templates to render the site and also runs the "Workplace" (a web based client for contributing content and administering the system); and a database adaptor layer that manages persistence in a SQL compliant database. Alkacon's support packages are available for OpenCms installations running on Tomcat or JBoss application servers, although customers have also reported success on WebSphere and WebLogic. 
Content, template code, and configuration files are stored in a "virtual file system" (VFS) that is backed by a relational database. The introduction of WebDAV in version 7 has finally made the VFS useful as a file system. Prior to WebDAV support, there was an awkward synchronization strategy where a physical file system directory served as a proxy for the VFS that synchronized with the directory on a periodic basis. This was most problematic for developers who wanted to develop their templates on a file system so they could use their favorite IDE, then push them to the VFS where they could be executed and tested. There is also a third party Eclipse plugin that facilitates editing code in Eclipse.

Behind the VFS is a relational database. The database schema is fully UTF8 compliant and makes use of blob fields to store both binary and XML based assets. Metadata properties are stored in a property table. Alkacon will support OpenCms on a number of databases including MySQL, Microsoft SQLServer, Postgres, and Oracle. Alkacon's commercial OpenCms Enterprise Extensions (OCEE) module for repairing a corrupted VFS (VFS Doctor) is compatible with MySQL 4.1 or 5.0; Oracle 8.1.7, 9.x, 10.x or 11.x; PostgreSQL 8.x (only with OpenCms 7.0.2 or newer); and MS SQL Server 2000 and 2005.

The "Workplace," which is used to administer the site as well as edit content, is an ambitious JavaScript application that works well on the recent versions of Internet Explorer and the Mozilla Gecko engine (Firefox and Mozilla browsers); using Opera or Safari browsers is not advised. Even though one could legitimately argue that the Workplace was AJAX from the beginning (before the term "AJAX" existed), some AJAX oriented enhancements introduced in version 7 have made the client more responsive and stable. The property dialogs that pop up to accept user input are extensible to a degree, but you do not want to go tinkering with the user interface code. OpenCms also ships with a command line interface that is suitable for scripting tasks against the repository and executing business logic.


The OpenCms content model is broken down into two high level classes: XML Page for mainly unstructured content, and XML Content for structured content classes. Like several other products evaluated in this report, an XML Content type is defined by an XML Schema (described in an XSD file) and stored in the repository as XML documents. The schema defines the content structure (as in the fields and their data types) and also the form elements used to edit them. OpenCms comes with a pre-defined set of data types (OpenCmsBoolean, OpenCmsColor, OpenCmsDateTime, OpenCmsHtml, OpenCmsLocale, OpenCmsString, and OpenCmsVfsFile) that come with their own basic validation logic. Additional custom validation logic can be defined within the XSD using regex syntax. There are 14 form widgets to choose from. These include default widgets for the basic data types and some extended widgets for compound elements like an image gallery.
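
To make the idea of regex-based field validation concrete, here is a minimal Java sketch. It does not use the OpenCms XSD syntax or any OpenCms API; the class `FieldValidator`, the field names, and the rules are hypothetical and only show how a per-field regular expression can accept or reject a submitted value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of per-field regex validation; not the OpenCms XSD syntax or API. */
class FieldValidator {
    // Field name -> compiled rule; the rules themselves are invented for this example.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    /** Attaches a regular expression to a field of the content type. */
    void addRule(String field, String regex) {
        rules.put(field, Pattern.compile(regex));
    }

    /** A value is valid when the field has no rule or the whole value matches its rule. */
    boolean isValid(String field, String value) {
        Pattern rule = rules.get(field);
        return rule == null || rule.matcher(value).matches();
    }

    public static void main(String[] args) {
        FieldValidator validator = new FieldValidator();
        validator.addRule("PostalCode", "\\d{5}"); // exactly five digits
        validator.addRule("Title", ".{1,80}");     // non-empty, at most 80 characters
        System.out.println(validator.isValid("PostalCode", "02139"));
        System.out.println(validator.isValid("PostalCode", "2139"));
    }
}
```

Matching the whole value (rather than searching for a match inside it) is the stricter choice and is usually what a form validation rule intends.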

Figure 3.28. OpenCms Screenshot: Editing Structured Content

Structured XML Content is edited through a form.

Content is stored in a hierarchical folder structure within the VFS. Version 6 introduced the concept of "siblings" that are like symbolic links except, rather than a target and a reference, siblings behave like true peers. Siblings can also have different values for their metadata attributes. Siblings are distinguished by a small arrow on the asset type icon and there is no way to tell which one is the original and which is the reference.

An Apache Lucene based search service does full text and metadata indexing of XML Content and XML Documents. There are also extractors to full text index binary formats such as Microsoft Office and PDF. Through the Workplace, an administrator can configure multiple search indexes by specifying the section of the directory structure to include, filters, and sort order. Only the default metadata fields are available for inclusion in the search index. Custom attributes within structured content are only indexed as part of the full text index.
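
The sibling concept described above can be pictured as several tree entries that share one content body while keeping their own metadata. The Java sketch below is a hypothetical model of that reading, not OpenCms code; all class names, paths, and property names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of "siblings": peers sharing one content body, each with its own metadata. */
class SiblingDemo {
    /** The shared body that every sibling points to. */
    static class SharedContent {
        String body;

        SharedContent(String body) {
            this.body = body;
        }
    }

    /** One entry in the folder tree; no sibling is marked as the original. */
    static class Sibling {
        final String path;
        final SharedContent content;
        final Map<String, String> metadata = new HashMap<>();

        Sibling(String path, SharedContent content) {
            this.path = path;
            this.content = content;
        }
    }

    public static void main(String[] args) {
        SharedContent article = new SharedContent("Original text");
        Sibling a = new Sibling("/news/launch.html", article);
        Sibling b = new Sibling("/products/launch.html", article);
        a.metadata.put("Title", "Launch news");
        b.metadata.put("Title", "Product launch");

        a.content.body = "Edited text";               // edit through one sibling
        System.out.println(b.content.body);           // the peer sees the same body
        System.out.println(b.metadata.get("Title"));  // but keeps its own metadata
    }
}
```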


Figure 3.29. OpenCms Screenshot: Configure Search Index

OpenCms supports the definition of different search indexes through the Workplace graphical user interface. A search index is defined by the folders to include, filters, what fields to search, and instructions for ordering.

There are extensive and well documented APIs for the OpenCms core, the Workplace, and front-end modules. Surprisingly, however, there is no Web Services or REST style API that ships with the product and there do not appear to be any modules that provide this interface. Most developers write their own XML over HTTP interfaces using the JSP delivery tier or in modules.

Extending OpenCms is done through the addition of modules that are implemented as Java packages and registered through the administrative user interface. Modules can be built to extend both the Workplace and the front end web site. Modules can be exported through the UI. Doing so creates a zip file with the necessary code and configuration information that is read into the system configuration when the module is imported.

One of the areas that OpenCms excels in is hosting multiple web sites on the same instance of the platform. At the root of the VFS is a node called "sites." Out of the box, OpenCms comes with a default site, but more can be added by editing the OpenCms system configuration file. After that, the site can be configured in the Workplace. URL management is done through Apache Virtual Hosts. Within this configuration, editors see the entire content tree in the Workplace. Through project roles, users can be prevented from editing sites that they should not have access to.

OpenCms supports clustering for fail-over and load balancing, but not a multi-tiered architecture or separate instances for staging code or content. Instead, OpenCms creates sandboxes called "projects" that serve as virtual environments to edit and preview content and code. For a multiple tier configuration, Alkacon's commercial Cluster Package provides functionality to manage a cluster and replicate content and code between multiple OpenCms instances. The cluster package also provides database transaction and LDAP support.
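
The multi-site arrangement described above (one folder per site under a "sites" node, selected by host name) can be illustrated with a small Java sketch. In OpenCms the host mapping is handled through Apache Virtual Hosts and the system configuration, so this code is purely conceptual; the class `SiteResolver`, the host names, and the folder names are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of mapping virtual hosts to site roots under a "sites" node. */
class SiteResolver {
    private final Map<String, String> hostToRoot = new HashMap<>();
    private final String defaultRoot;

    SiteResolver(String defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    /** Associates a host name (case-insensitive) with a folder in the shared content tree. */
    void register(String host, String siteRoot) {
        hostToRoot.put(host.toLowerCase(Locale.ROOT), siteRoot);
    }

    /** Translates a request (host plus path) into a path in the shared content tree. */
    String resolve(String host, String requestPath) {
        String root = hostToRoot.get(host.toLowerCase(Locale.ROOT));
        if (root == null) {
            root = defaultRoot; // unknown hosts fall back to the default site
        }
        return root + requestPath;
    }

    public static void main(String[] args) {
        SiteResolver resolver = new SiteResolver("/sites/default");
        resolver.register("www.example.com", "/sites/example");
        System.out.println(resolver.resolve("www.example.com", "/index.html"));
        System.out.println(resolver.resolve("unknown.test", "/index.html"));
    }
}
```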


Figure 3.30. OpenCms Screenshot: Database Replication Module

The Database Replication Module allows an administrator to replicate the repository to a remote server. This is useful for multi-tiered architectures with a content production instance and a delivery instance.

Content Contribution
The primary power user interface for managing both content and the OpenCms application is the "Workplace," which works in two modes: Explorer and Administration. Content editors work in the Explorer mode, which is modeled after Windows Explorer. On the left side is a tree-based structure that contains a folder structure; on the right side is the detail pane. Clicking on the icons launches context sensitive menus that list actions available to the user.

While Workplace is an impressive display of JavaScript coding, there are some usability issues that have the potential to frustrate some users. First off, the application is entirely modal. That is, the user can either be editing an asset or exploring the repository, but not both at the same time. Unfamiliar users clicking to edit an asset assume the window they are taken to is a popup dialog, and close the entire application when trying to exit the edit form. It takes them a while to remember to click on the "X" button within the button bar of the application. Second, navigation of the content tree is based on file names rather than titles, so it is not always easy to know what the content asset is about.


Figure 3.31. OpenCms Screenshot: OpenCms Workplace Interface

Power users and administrators use the Workplace to edit content and administer the site. Creating new content follows a wizard-like process of first determining the type of content, then editing the metadata values, then saving the asset. After that point, the user can edit the asset and the Workplace shows the appropriate interface: either a Microsoft Word-like window for XML Pages or a forms based editor for XML Content.


Figure 3.32. OpenCms Screenshot: Editing XML Pages

XML Pages, or unstructured content, are edited in a simple MS Word style dialog. OpenCms supports advanced WCM concepts such as contributor sandboxes, strong versioning, access control, dependency management, and localization. For contributor sandboxes, OpenCms uses a "projects" metaphor. In other WCM systems, projects would be called "workspaces," "sandboxes," or "stages." Content in the "online" project is what external visitors to the site see. To safely edit an asset, a user checks the content into an offline project. This locks the asset so that it cannot be edited in another project. Depending on the user's privileges, he can either check the asset back to the "online" or live project, or submit it for review so that someone else can check it back in. The configuration of a project controls what content can be checked into the project, who can view and edit content that has been checked into the project, and who can approve content to be checked back in. Behind the scenes, OpenCms creates a collection of tables for every project. Users with sufficient privileges can also lock assets in the central staging project. New with version 7 is the ability to break a lock. Unpublished modifications are marked in the Workplace with a flag icon. Assets can be published individually, recursively through a branch of the directory structure, or the whole project can be published. As of version 6.0, a link checker automatically runs whenever items are published and the user is shown a report of issues if any occurred. version 7 improved link management with the introduction of a "content relationship engine." This allows OpenCms to warn a user if the asset that he is about to delete will create a broken link, and automatically updates links if content is moved or renamed. The relationship engine also publishes dependent assets such as images, when an asset is published. From within the Workplace there is also a feature to check for broken external links. 
This process runs through the content repository to collect external links and then verifies that those external pages are accessible. The relationship engine is exposed through the API.
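
The behavior attributed to the relationship engine above (warn before a delete that would break links, and update links when an asset moves) can be sketched with a simple reverse index of links. This is a hypothetical illustration, not the OpenCms relationship API; the class `LinkRegistry` and its methods are invented, and for brevity it only tracks link targets, not renames of the linking assets themselves.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical link registry illustrating delete warnings and link updates on move. */
class LinkRegistry {
    // Target path -> paths of the assets that link to it.
    private final Map<String, Set<String>> inbound = new HashMap<>();

    /** Records that the asset at 'source' links to the asset at 'target'. */
    void addLink(String source, String target) {
        Set<String> sources = inbound.get(target);
        if (sources == null) {
            sources = new TreeSet<>();
            inbound.put(target, sources);
        }
        sources.add(source);
    }

    /** Assets whose links would break if the target were deleted. */
    Set<String> brokenBy(String target) {
        Set<String> sources = inbound.get(target);
        if (sources == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(sources);
    }

    /** Re-points all inbound links when an asset is moved or renamed. */
    void move(String oldPath, String newPath) {
        Set<String> sources = inbound.remove(oldPath);
        if (sources == null) {
            return; // nothing links to the moved asset
        }
        Set<String> existing = inbound.get(newPath);
        if (existing == null) {
            inbound.put(newPath, sources);
        } else {
            existing.addAll(sources);
        }
    }

    public static void main(String[] args) {
        LinkRegistry registry = new LinkRegistry();
        registry.addLink("/a.html", "/img/logo.png");
        registry.addLink("/b.html", "/img/logo.png");
        System.out.println(registry.brokenBy("/img/logo.png")); // pages to warn about
        registry.move("/img/logo.png", "/images/logo.png");
        System.out.println(registry.brokenBy("/images/logo.png"));
    }
}
```

Registering links by internal path rather than by URL is what makes this kind of bookkeeping possible, which is why linking through the repository browser matters.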


Figure 3.33. OpenCms Screenshot: Link Checking

The OpenCms content relationship engine checks for dependencies and warns a user if deleting an asset will break a link. The OpenCms localization strategy is based on the use of "siblings" mentioned earlier in the architecture section. Siblings behave like symbolic links but they can have different metadata values. With this technique a single asset has multiple values for each of its elements (such as "body" and "title") - one for each language. The display template looks at metadata attributes on the content asset or on the enclosing folder to determine which localized value of the attribute to display. This strategy keeps the different localized versions of content in sync and allows fall-back logic to display the asset in another language if the requested translation does not exist.
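
The fall-back logic mentioned above can be sketched as a lookup that tries the requested language first and then a default language. This is a hypothetical illustration of the idea rather than OpenCms code; the class `LocalizedAsset`, the element names, and the language codes are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-language element values with fall-back; not OpenCms code. */
class LocalizedAsset {
    // Element name (such as "title") -> (language code -> value).
    private final Map<String, Map<String, String>> elements = new HashMap<>();

    /** Stores the value of one element for one language. */
    void set(String element, String language, String value) {
        Map<String, String> values = elements.get(element);
        if (values == null) {
            values = new HashMap<>();
            elements.put(element, values);
        }
        values.put(language, value);
    }

    /** Returns the value in the requested language, else the fall-back language, else null. */
    String get(String element, String language, String fallbackLanguage) {
        Map<String, String> values = elements.get(element);
        if (values == null) {
            return null;
        }
        String value = values.get(language);
        return value != null ? value : values.get(fallbackLanguage);
    }

    public static void main(String[] args) {
        LocalizedAsset asset = new LocalizedAsset();
        asset.set("title", "en", "Welcome");
        asset.set("title", "de", "Willkommen");
        System.out.println(asset.get("title", "de", "en")); // translation exists
        System.out.println(asset.get("title", "fr", "en")); // falls back to English
    }
}
```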

Figure 3.34. OpenCms Screenshot: Localizing Content

The OpenCms localization strategy involves overloading elements with different language versions and then using siblings to place the same content asset in multiple folders.

Up until version 6.0, content authors and editors needed to work in the Workplace. Version 6 introduced an in-context editing feature called "Direct Edit." Direct Edit allows a user to navigate through the rendered site (as a regular user would) and click on icons to edit regions of the page in a fashion similar to Hummingbird's (now OpenText's) RedDot product (OpenCms uses bull's eyes, not red dots). The icons are visible both in detail and list views of content and pull up the full asset editing form. Direct Edit can only be activated on "offline" projects. Direct Edit has had a huge impact on the perceived usability of OpenCms and made it competitive again with other products like Magnolia. Like Magnolia, however, in order to use the browse to edit interface, the user must log in through the Workplace and then launch a page as a starting point.

Figure 3.35. OpenCms Screenshot: Direct Edit Interface

The Direct Edit interface provides browse-to-edit functionality and allows casual users to spend less time in the Workplace.

Depending on the browser, OpenCms has a number of WYSIWYG editors it can present to the user. All of the editors have the option to browse the server to create an intra-site link. Linking pages in this way, as opposed to by URL, allows OpenCms to register these links with the relationship engine so that they can be managed. There is no embedded spell check but there is a convenient button to strip the extraneous formatting from text copied from Microsoft Word. Presumably many OpenCms authors do most of their writing in Word and then copy in their text when it is ready for the web. This is not unlike other products, both commercial and open source.

Image handling is much better with version 7. In addition to WebDAV support that allows a user to drag images into the VFS using Windows Explorer or another WebDAV client, there is also automated image scaling and manipulation functionality built into the product.


Figure 3.36. OpenCms Screenshot: Insert Image

From the WYSIWYG editor image button, a user is able to select an image from the repository and set sizing parameters that will control the automated image scaling functionality.

OpenCms has a feature to import a zip file containing a static HTML site. The OpenCms import tool is somewhat better than its peers because it allows you to use regular expressions to extract the body of the page (stripping out all the layout and branding embedded in the static HTML page) and apply a presentation template. The importer also parses through the HTML and corrects links. Content imported in this way can be managed as actual content, not unstructured HTML files.

In the version 6 series, the workflow feature of OpenCms was just a generic task list that was totally separate from the explorer view. Workflow items were not associated with content or publishing events and were visible only on a different view of the Workplace (the Workflow view). Content approval uses the "project" system, not workflow. Version 7 was supposed to have a major upgrade to the workflow capabilities by integrating a legitimate workflow engine, but the sponsor for that effort backed out. As a result, version 7 was released without a workflow capability. However, the consultancy BearingPoint has published an early version of an add-on module (Workflow2) that provides workflow functionality.
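The regular expression step used by the static HTML import can be sketched in a few lines of Java. The pattern below is an assumption made for illustration and is not the importer's actual configuration; the BodyExtractor class and extractBody method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is an assumption, not the importer's configuration.
final class BodyExtractor {
    // DOTALL lets '.' span line breaks; the reluctant group stops at the first </body>.
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Return the markup between the body tags, or the input unchanged if none is found.
    static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : html;
    }
}
```

In practice the pattern would usually target a narrower region than the whole body, such as a content container, so that navigation and branding markup are left behind.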


Configuration and Administration


User management in version 6 was simplistic and limited. Users can be part of groups that have permissions. OpenCms ships with some default groups with different permissions settings: Users, Project Managers, and Administrators. It is difficult to set permissions on custom groups other than within the context of a project. Within a project, one group is designated as having the "manager" role that allows members to be able to publish assets into the online project. Another group is given the "user" role that allows members to view, add, and edit assets within the project. Version 7 introduced the concept of "organizational units" that allows users to manage a sub-domain of users without being a global administrator. This is useful for sub sites with different roles. Version 7 increased the granularity by which permissions can be applied to make it easier to trim down the workspace and make it easier to use. LDAP integration can be achieved with the OCEE LDAP connector sold by Alkacon. This module synchronizes with an external LDAP directory and is configurable through the Workplace.

Figure 3.37. OpenCms Screenshot: OCEE LDAP Connector

Alkacon's OCEE LDAP Connector provides LDAP support.

Being a mature product, OpenCms is a stable platform with a rich set of administrative features and utilities such as a Content Tools section that allows an administrator to make global changes to content in the repository. There is a diagnostic to check the validity of content in pages, and there are tools to rename elements of existing content for when the content model changes.


Figure 3.38. OpenCms Screenshot: Content Tools

OpenCms provides a number of tools to make global changes on content in the repository.

Presentation
Presentation templates for OpenCms are written in standard JSP using the JSTL plus a custom tag library provided by OpenCms under the "cms" namespace. There are a couple of limitations, such as lack of support for the newer XML style JSP syntax and certain styles of includes. However, template developers with a familiarity with Java and JSP will feel right at home. As with all tag libraries, there is a constant tension between simplicity of the API and simplicity of the code. Providing too many functions makes the library hard to manage; too few means that developers have fewer helper functions and need to write more verbose template code. OpenCms has kept its tag library small and developers frequently resort to using inline Java print statements rather than sticking to the tags. Version 7 has improved matters somewhat by upgrading to Servlet 2.4 and JSP 2.0 and exposing more objects to the JSP expression language.

JSPs are stored in the VFS and can reside in the same folder structure as the content or be packaged as modules in the system directory. Like content, templates are versioned and deployed from offline projects to the online project. Managing display templates in content folders is a little messy. The only advantage is that if you name a template "index.html" it will automatically be used as the index page of the directory.

Since version 6, OpenCms ships with a module that provides a presentation framework called TemplateOne, which is used in the demo site. TemplateOne is flexible and may be a useful starting point for building new sites, but developers can get away with writing simple but less elegant JSPs. If nothing more, TemplateOne is a good introduction to advanced concepts and clever ways to use the OpenCms presentation tier.

For performance reasons, JSP templates are written out to the file system on first request and then served directly by the servlet container. During this process, OpenCms updates paths, renames the files, and stores them in a directory mapped to the project that the content is being rendered out of (for example, online or offline).

Content is associated with display templates through metadata properties of the content assets or the enclosing folder. This is an awkward system because the property value field is free text, so the user needs to correctly name the path of the template. The modal user interface does not help matters. The TemplateOne framework makes it a little easier by providing a pick-list for the template attribute.

Visitor-facing functionality can be developed by using the Module API. Alkacon has an LGPL-licensed set of modules called OAMP (OpenCms Add-on Module Package) available for download. OAMP includes modules for syndicating content over RSS, managing and delivering email newsletters, and a basic web form module. There are also a couple of community contributed modules on the site. Probably the most interesting is the KonaKart module that integrates the popular free Java shopping cart KonaKart with the OpenCms platform. This integration uses OpenCms JSP templates to call the KonaKart SOAP API and display products.

For high traffic sites, there are two main options to increase performance. First, OpenCms ships with a feature called FlexCache that is based on the open source caching framework Ehcache. FlexCache lets you configure cache parameters on each asset, such as whether to cache a different copy for each user (good for personalized sites), whether to cache different copies if the query string parameters are different, and a cache timeout. Cache settings are made in metadata properties of each asset, which can get tedious to manage but gives a lot of control. The Administration section of the Workplace provides an interface to invalidate and manage the cache. For higher traffic sites, there is a Static Export feature that stores and serves generated pages on the file system rather than dynamically generating them every time. This setup requires some configuration with Apache mod_rewrite and mod_proxy and is not quite as high performance as a true "baking" style presentation architecture because it cannot publish to a farm of simple web servers.
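The per-user and per-query-string cache variations described for FlexCache come down to what goes into the cache key. The sketch below shows that idea only; it is not the FlexCache implementation or its configuration syntax, the CacheKeyBuilder class is hypothetical, and the timeout setting is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of cache-key variation; not the FlexCache implementation.
final class CacheKeyBuilder {
    private final boolean varyByUser;
    private final boolean varyByParams;

    CacheKeyBuilder(boolean varyByUser, boolean varyByParams) {
        this.varyByUser = varyByUser;
        this.varyByParams = varyByParams;
    }

    // Build a key for one request; each enabled option adds a variant to the key.
    String keyFor(String path, String user, Map<String, String> params) {
        StringBuilder key = new StringBuilder(path);
        if (varyByUser) {
            key.append("|user=").append(user);
        }
        if (varyByParams) {
            // Sort parameters so that a=1&b=2 and b=2&a=1 share one cache entry.
            key.append("|params=").append(new TreeMap<>(params));
        }
        return key.toString();
    }
}
```

The trade-off is visible in the key: every enabled variation multiplies the number of cached copies of an asset, which is why per-user caching suits personalized pages but wastes memory on shared ones.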

Delivery and Support


OpenCms is widely used in Europe and North America for the corporate web sites of small and medium companies and regional web sites of large companies. While it is difficult to estimate, there are between 1,000 and 2,000 actively used OpenCms implementations. Over 100 sites have been submitted to the reference site page on the OpenCms web site. The mailing list is very active thanks to the broad install base. OpenCms has a "solution provider" directory but the criteria for getting listed are not at all demanding; a prospective provider just has to list one OpenCms site that they built. There is no certification or training program and there are nearly 100 solution providers listed on the OpenCms site.

OpenCms is designed by Java programmers for Java programmers to maintain. OpenCms makes no pretensions that business users can configure and customize the system. OpenCms does not have a WYSIWYG template designer and the documentation assumes a strong foundation in Java and JSP programming. The Javadoc API documentation for OpenCms is quite good. Formal documentation, how-to's, and tutorials are managed and deployed as modules. This is convenient if you are working with OpenCms without internet connectivity, although needing to install a module to read the documentation is a bit of a hassle. Documentation coverage took a while to catch up after the release of 6.0 and is still somewhat spotty. Version 7 put documentation another step back. The OpenCms team has established more user focus as a primary goal for the next few milestones of the project. They have already made significant progress in versions 6 and 7. The book Managing and Customizing OpenCms 6 Websites: Java/JSP XML Content Management, by Matthew Butcher, is probably the best resource for a business user; more technical users, trying to do technical tasks, will find it somewhat light. A book on version 7 is upcoming.

Alkacon has roughly 10 full time employees (all software engineers) and uses freelancers for graphical and user interface design work. Development on OpenCms has largely been driven by the needs of the Alkacon customer base. Usually features can be tied back to a sponsoring customer's requirements. Since OpenCms was first publicly released in 2000, there has been a major release of the platform roughly every two years. The stable release of version 7 came out in July 2007.

While external systems integrators could potentially develop a feature and contribute it to the core, few actually do. This is probably because Alkacon owns the source code and would obtain ownership of the contribution. The one exception is a third party systems integrator that donates database adaptors for different databases. There are several add-on modules available for OpenCms, but they are not organized in one location like other open source projects. The OpenCms.org web site has a module sandbox, but there is not much there. You would have more success doing an internet search for "OpenCms" and the capability that you need, which would probably turn up results from SourceForge and third party developer sites that sell commercial modules.


Conclusion
Table 3.9. OpenCms 7.0.3 Summary
Contributor Navigation: The main "Workplace" packs a lot of functionality and looks complicated to non-technical users. Pages are listed by filename which can be less than descriptive. However, the Direct Edit interface has introduced significant improvement.

Content Entry: OpenCms supports structured and unstructured content assets well. A spell check feature would be useful but Microsoft Word compatibility functionality is a good substitute that authors may prefer. Introduction of WebDAV support is a big win.

Link Management: The new relationship engine tracks dependencies and helps prevent broken links.

Versioning: Full versioning support with new version created with every save.

Content Organization and Reuse: Siblings, while difficult for non-technical users to understand, enable content to be used in multiple locations of the site.

Localization: The sibling framework supports related translations of assets but it makes the user interface complex. Most companies use a naming convention to distinguish between different translations of an asset.

Content Integration: Straightforward JSP templates make it easy for any Java developer to interact with any data source. WebDAV support has made the repository accessible to other technologies. However, a SOAP or REST based API would be helpful.

Workflow: While workflow has been stripped out of version 7, the "project" editing model supports basic approvals.

Layout and Branding: The JSP based delivery templates are easy for a Java developer to work with and the paragraph model can be easily translated into page components for building flexible pages.

Interactivity: The standard Java orientation makes building interactive applications straightforward for Java developers and the module framework facilitates the deployment of the applications. However, the delivery tier (with all of its caching) is optimized for information display.

SEO: XML orientation tends to output clean XHTML. User friendly, human readable URLs are supported.

Books: Managing and Customizing OpenCms 6 Websites: Java/JSP XML Content Management (Paperback), by Matthew Butcher. A new edition that covers version 7 is in the works.

Online Documentation: Much of the documentation is deployed as modules that can be installed on an instance of OpenCms.

User Forums: The mailing list is very active and helpful.

Key: Nonexistent; Below Average; Average; Above Average; Exceptional.


OpenCms is a mature and stable platform with many of the features that one would see in commercial products. OpenCms is particularly strong in the basics (check-in, check-out, versioning, and organizing content), but is not designed for building interactive, Web 2.0 style applications. Usability, once a real weakness for OpenCms, is being steadily refined after a big improvement with the Direct Edit user interface. Reasonable support prices make OpenCms one of the least expensive platforms to operate. OpenCms has a strong user and developer community anchored by Alkacon Software. Most of the active community is in Europe, where Alkacon is headquartered and where several agency style consultancies have built practices on delivering OpenCms powered web sites; there are fewer OpenCms integrators in the United States (28 North/South American solution providers listed on the OpenCms.org web site). Since the bar is low when it comes to being listed, be sure to qualify a prospective OpenCms solution provider.


Informational Brochure Platform Summary


Of the four projects in this category, Magnolia and OpenCms usually get more attention and turn up when a potential customer does a quick survey of the open source Java WCM platforms to consider. Actually, Alfresco always pops up first but, as you will see later, Alfresco is not very well suited for this kind of content management problem. Although Apache Lenya has the distinction of being an Apache Software Foundation top level project, the community has dwindled and progress has slowed. Things may be changing for Lenya as the team has promoted several new committers and has just announced a major release of the software (2.0).

Between Magnolia and OpenCms, users are frequently drawn to Magnolia's ease of use but OpenCms has a richer feature set than Magnolia. In particular, OpenCms is better for multisite hosting and localization and has better support for structured content types. OpenCms has also been making large strides in usability, functionality, and visibility thanks to two very high impact releases (6.0 and 7.0). Magnolia fills an important niche of marketing sites because the page-oriented content model aligns with how many site managers see their web site: a collection of pages that are composed of editable regions. The Sitedesigner tool, while simplistic and lacking drag-and-drop-style functionality, gives a surprising level of control over the branding to the non-technical, non-HTML literate web designer.

For information-dense web sites, Daisy is somewhat of an insider secret. This is, in part, because of marketing confusion around its classification as a wiki. The small size of the Belgium-based Outerthought team and its affiliation with a single systems integration firm may also be holding Daisy back from getting wider visibility. Designers who are looking for complete creative control over the branding and layout but lack the XSL skills to execute their vision on the Cocoon-based technology stack will probably get frustrated by the platform. Adding interactive functionality may feel even more daunting. Building a theming system that abstracts the designer away from the XSL oriented templating layer would make the product more accessible for marketing sites, but would not solve the problem of complexity of the underlying Cocoon platform when it comes to building interactive functionality. For intranets and other knowledge resources where branding and design are not so critical as on a corporate brochure site, Daisy is definitely worth a look. Documentation sites are where the technology really shines.


Table 3.10. Informational Brochure Score Summary


Columns: Category; Apache Lenya; Daisy CMS; Magnolia Enterprise; OpenCms.

Categories: Contributor Navigation; Content Entry; Link Management; Versioning; Content Organization and Reuse; Localization; Content Integration; Workflow; Layout and Branding; Interactivity; SEO; Books; Online Documentation; User Forums.

Scoring Key: Nonexistent; Below Average; Average; Above Average; Exceptional.


Web Content Management Framework


While the Informational Brochure category of products introduces some interactive capabilities to primarily informational resources, the WCM Framework category provides content services for sophisticated web applications. The web content management industry has coined the term "content centric applications" to define a class of web site that is highly interactive and dynamic and also contains a lot of content. Content centric applications are different from the data centric applications that most developers are used to building. Unlike generic data, content is semi-structured, writing content is more creative than data entry, and, because of that, content production usually involves some sort of editorial process.

Of course, there is some overlap between the two types of systems. A data management application may have a comments field where a user can express himself and a semi-structured content type may have a numeric or check-box field. In fact, it is sometimes unclear just how much unstructured or semi-structured information it takes for a data management application to require content management services. Typically, the watershed moment is when features like versioning, workflow, preview, and localization are requested. Unfortunately, these requirements are often recognized after the core architecture is designed.

More often than not, web applications do have a significant amount of content that needs to be managed. Even data centric web sites have pages of semi-structured content like help, terms and conditions, privacy statements, and other "about" pages. There are also global content components such as headers and footers and promotional elements. Usually this content starts out hard-coded into files deployed as part of the web application. The weakness of this strategy is revealed the first time a business user wants to change something and is told that the request has to go into a feature request queue that will be prioritized and handled in some future build of the application (after a rigorous series of regression tests, of course). Companies soon realized that content should have a different lifecycle than code, and also does not comfortably fit into software development, QA, and deployment processes.

Customers look to platforms in this category as an alternative to building their own content management functionality, but want full control to build their own unique custom applications. Sometimes this means retro-fitting a pre-existing application with content services. Other times, the content management framework is designed as a major component of the original architecture. In the peak of the Internet bubble, companies with no concern for cost were using high end commercial frameworks like Interwoven TeamSite and Vignette Story Server for the most trivial services like deploying files or just caching pages. Today, more cost conscious companies are looking to open source to fill these discrete roles within larger architectures.

The industry most actively pursuing this strategy is Media and Publishing. The vendors that have historically served this market (Vignette, FatWire, Interwoven) have all abandoned a media focus in favor of marketing oriented sites. Media companies - frustrated with these high cost, high lock-in, slow moving platforms - pursued alternate strategies. They started to look for inexpensive, standards oriented, pluggable technologies and open source was a logical category to explore. Media companies tend to favor a back-end CMS that can publish content into an independent, de-coupled delivery tier. The value of this strategy is that it enables the owner to swap out the CMS and not lose an investment in personalization and other visitor facing interactivity.

Because of the availability of powerful open source web application development frameworks and components, developing custom visitor facing web applications is less expensive than it used to be. Astute development teams are able to rapidly assemble applications out of reusable components and write minimal amounts of custom code. In many ways, open source is helping Java truly realize its potential as an object oriented programming language. By using open source to freely share code across corporate boundaries and change developers' mindsets, companies are finally experiencing the levels of re-use that object orientation promised.

Companies like these are typically very demanding of the architecture and want full control over every aspect of the system. They have strong technical capabilities and are constantly on the brink of abandoning the third party framework in favor of a custom solution. Companies that hold back from this temptation and strike the right balance between customization and compromise are usually the most successful. These companies are able to assemble, rather than build, custom applications to get the technology they want with less risk and cost than traditional custom software.

What Makes a Good WCM Framework?


In order for a WCM platform to be useful in the CMS Framework category, it must have an open and customizable architecture with a complete and well documented API. The technologies used to extend and integrate the platform should be widely known or, at least, easy or desirable to learn. There are also some specific functional characteristics that are commonly required.

Content Contribution
Typically the platform is only used for its repository and editorial interfaces and not for its delivery functionality. In order to support the high degrees of dynamism and content reuse that are typical for this category of web application, the content managed needs to be highly structured and to have high quality metadata. Of course, there is a trade-off. More structure generally means more user interface complexity and less similarity to the grand-daddy of unstructured authoring tools, Microsoft Word. A good rule of thumb is to impose as much structure as your users can stand. If you exceed the amount of complexity that users are willing to tolerate, they will undermine your best attempts at enforcing high quality data standards. For example, if you insist on turning what a user perceives as a large text area (such as an article body) into a collection of repeating paragraph elements, the average content contributor will probably paste the whole article out of Word into the first paragraph element. All your ideas around pagination and inserting promotional items between paragraphs will be undermined.

Balancing usability and structure is a negotiation between the competing interests of the content contributors who want something like Microsoft Word, and the content consumers who want a dynamic presentation tier that needs structured content to deliver the right information in the right format at the right time. The CMS Framework can help by being generally easy to use (if the contributor is already swearing by the time he opens the content asset for editing, you have lost the battle), being flexible (so you can quickly make adjustments by adding or combining fields), and having a good input validation framework (to help you be firm when you have to). XML based technologies have the advantage of having inherently flexible content models, although similar behavior can be achieved through highly abstracted and normalized relational database schemas.
All of the technologies reviewed in this section allow a system administrator to change the structure of a content type through the user interface or by editing configuration files without having to worry about restructuring the database or updating the content entry forms.

There are a number of techniques that a CMS can use to enforce input validation, and the better platforms work on multiple levels. The first level is data typing support. If the delivery tier needs a date field for its display logic (to do things like list articles in reverse chronological order or search for event assets within a date range), the CMS should be able to designate a date attribute to be of datatype "date." The same goes for numeric fields like "price." If the CMS fully understands the content model, it can automatically perform some input validation.

Custom validation can be done at the client and server side - preferably both. Client side validation is useful because it saves a trip to the server and gives the user more immediate feedback. Server side validation is more reliable and can be more sophisticated (for example, checking to see if a zip code is valid). Of course, AJAX is blurring the line between server and client side logic because it allows the client to call server methods without submitting the whole page.

Input form controls such as radio buttons, check boxes, pick lists, and drop-down lists are helpful because they prevent the user from even trying to enter invalid values. The CMS should have a rich library of form controls that can be configured with the appropriate validation logic. Advanced functionality like dependent select lists (where, for example, the values in the "state" field change based on the country that has been selected) is also useful.

These systems frequently need to handle large volumes of content and a business user needs to be able to find the appropriate assets to edit. The ability to organize and search content within the repository is critical.
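The data typing level of validation described above can be illustrated with a small server-side check. This is a sketch under stated assumptions rather than any reviewed product's validation API: the ArticleValidator class and the field names "publishDate" and "price" are hypothetical, and dates are assumed to arrive in ISO format.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical server-side validator for a content type with typed fields.
final class ArticleValidator {
    // Return a list of error messages; an empty list means the input is valid.
    static List<String> validate(Map<String, String> fields) {
        List<String> errors = new ArrayList<>();

        // A "date" field must parse as an ISO date such as 2008-02-01.
        String date = fields.get("publishDate");
        try {
            LocalDate.parse(date == null ? "" : date);
        } catch (DateTimeParseException e) {
            errors.add("publishDate must be a date in yyyy-MM-dd format");
        }

        // A numeric field must parse as a number and must not be negative.
        String price = fields.get("price");
        try {
            if (new BigDecimal(price == null ? "" : price).signum() < 0) {
                errors.add("price must not be negative");
            }
        } catch (NumberFormatException e) {
            errors.add("price must be a number");
        }
        return errors;
    }
}
```

Returning all errors at once, rather than failing on the first, lets the entry form flag every problem field in a single round trip.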

Development and Configuration Management


Configuration management is an often overlooked aspect of a framework but it is extremely important if the framework is going to be your development platform. Without solid tools and practices, it is difficult to have multiple developers work together on a project without getting in each other's way. The platform should have a way for multiple developers to work on their own instance of the application, or work area on a shared server, and synchronize their changes between their own environments and the integration environment. It is also important to be able to replicate content from the production environment for testing purposes. Technologies that store configuration and code as data in the repository are at risk if they do not have a clean packaging mechanism and a way to version configuration. The ability to use a familiar integrated development environment (IDE) and source code management (SCM) system is also a plus.

Because these platforms are highly customized, the architecture should clearly delineate core functionality from customization and configuration to ensure clean upgrade paths. Because of its object orientation, Java is generally a good architecture for safely overriding default system behavior. Thanks to Inversion of Control (or "dependency injection") frameworks like Spring IoC, and aspect oriented programming, Java can be even more extensible and configurable. Projects that leverage these newer architectures enjoy a distinct advantage here.
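The separation of core behavior from customization that dependency injection enables can be shown without any framework. The sketch below uses plain constructor injection; a container such as Spring would do the wiring from configuration instead. All names here (UrlNamingStrategy, DefaultUrlNaming, SectionPrefixNaming, PublishService) are hypothetical and not taken from any reviewed platform.

```java
import java.util.Locale;

// Hypothetical names throughout; plain constructor injection, no container required.
interface UrlNamingStrategy {
    String urlFor(String title);
}

// Stands in for default behavior shipped with the core platform.
final class DefaultUrlNaming implements UrlNamingStrategy {
    public String urlFor(String title) {
        String slug = title.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
        return "/" + slug + ".html";
    }
}

// A site-specific customization that wraps the default and lives outside the core code.
final class SectionPrefixNaming implements UrlNamingStrategy {
    private final UrlNamingStrategy delegate;
    private final String section;

    SectionPrefixNaming(UrlNamingStrategy delegate, String section) {
        this.delegate = delegate;
        this.section = section;
    }

    public String urlFor(String title) {
        return "/" + section + delegate.urlFor(title);
    }
}

// The core service depends only on the interface; the wiring decides the behavior.
final class PublishService {
    private final UrlNamingStrategy naming;

    PublishService(UrlNamingStrategy naming) {
        this.naming = naming;
    }

    String publish(String title) {
        return naming.urlFor(title);
    }
}
```

Because PublishService never names a concrete strategy, an upgrade of the core classes leaves the customization untouched, which is the clean upgrade path the text argues for.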

Presentation
Many products in this category publish content into a separate presentation tier that is potentially not even on the Java stack. In order to be useful in this way, a platform needs to have functionality to deploy structured content into another application's repository. The most common way of doing this is through XML. An adaptor is built onto the presentation tier application to read in XML and store it in its local repository. This is also a good time to execute logic to clear caches and update search indexes. Getting structured content into an XML format should be easy work for any WCM: deployment is usually the more challenging requirement. Getting the files (XML and associated binaries: images, Flash, PDF, audio and video) onto the server is not so much the problem as knowing which files to push. Ideally, you do not want to re-publish content that hasn't changed. Breaking dependencies by failing to publish linked pages and images is equally problematic.
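Deciding which files to push, without re-publishing unchanged content or leaving dependencies behind, can be sketched as a traversal of the dependency graph. The code below is an illustration under assumptions, not the algorithm of any reviewed product: the PublishPlanner class is hypothetical, and it assumes the CMS can supply the set of changed assets, the set already current on the delivery tier, and a map of each asset's dependencies.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of choosing what to push; not taken from any reviewed product.
final class PublishPlanner {
    // changed: assets modified since the last publish.
    // alreadyPublished: assets that are current on the delivery tier.
    // dependencies: asset mapped to the assets it links to or embeds.
    static Set<String> plan(Set<String> changed,
                            Set<String> alreadyPublished,
                            Map<String, Set<String>> dependencies) {
        Set<String> toPublish = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(changed);
        while (!queue.isEmpty()) {
            String asset = queue.poll();
            if (!toPublish.add(asset)) {
                continue; // already scheduled
            }
            for (String dep : dependencies.getOrDefault(asset, Collections.emptySet())) {
                // Follow a dependency only if the delivery tier does not have it yet.
                if (!alreadyPublished.contains(dep)) {
                    queue.add(dep);
                }
            }
        }
        return toPublish;
    }
}
```

The hard part in a real system is keeping the alreadyPublished set trustworthy, since a stale record leads either to redundant pushes or to broken links.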

Publishing directly to an external database is another alternative, and several commercial products have database adaptors (Interwoven has DataDeploy, Percussion Rhythmyx has Database Publisher). In the absence of these adaptors, typical approaches include using a workflow event or template code to call functions that write to an external database. However, there needs to be some way of notifying the de-coupled presentation tier that data has changed so that it can clear its display caches.

Figure 3.39. Architecture Diagram: Structured Publishing

In the structured publishing pattern, the CMS publishes structured data into a de-coupled delivery tier.

The two use cases to consider when designing these de-coupled architectures are preview and linking. To achieve an accurate preview, the CMS will have to push the new version of the asset to a content staging instance of the delivery tier. How much content is pushed, and when, depends on the requirements. Single page preview is relatively easy to achieve, whereas full site preview (where a user can browse around the site to see where the asset appears) is more challenging. A common short cut is to build "low fidelity" preview templates in the presentation tier that comes with the CMS. The risk of this approach is that the CMS preview templates may fall out of sync with the production templates as the site is updated and re-branded.

As for linking, the issue is that, since the presentation tier owns the URLs, the WYSIWYG editor will have a difficult time constructing link tags (<a>) because the URL of the target is unknown at the time of editing. There needs to be some process that transforms a link target into a URL that will be recognized by the presentation tier. Technologies that have dependency management functionality that scans rich text areas for intra-site links have an advantage because they have the hooks to invoke link re-writing logic.

If the presentation tier of the CMS Framework is used to render the web site, it should be flexible, easy to manage, and leverage widely known and/or easy to learn and manage technologies. Look for standards based technologies (like JSP and XSLT) and widely used templating languages (like Velocity and Freemarker). Even more important, however, is the controller. To support a transactional web application, a CMS Framework should leverage a capable MVC based framework (like Struts, Struts2, Spring MVC, Tapestry, or Wicket) or be able to work in conjunction with one.
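The link re-pointing step discussed above can be sketched in Java. This is only an illustration under stated assumptions: the `cms://asset/<id>` placeholder scheme, the class name, and the id-to-URL map are hypothetical and do not describe how any product in this report stores links. The editor writes a stable placeholder, and a publish-time pass swaps in the URL that the presentation tier will recognize.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: rewrite editor placeholders into delivery-tier URLs at publish time. */
class LinkRewriter {

    // Hypothetical placeholder written by the WYSIWYG editor: href="cms://asset/<id>"
    private static final Pattern CMS_LINK =
        Pattern.compile("href=\"cms://asset/([A-Za-z0-9_-]+)\"");

    /**
     * @param html    rich text containing placeholder links
     * @param urlById asset id -> URL owned by the presentation tier
     */
    static String rewrite(String html, Map<String, String> urlById) {
        Matcher m = CMS_LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = urlById.get(m.group(1));
            // Unknown targets are left untouched so a later check can report them.
            String replacement = (url == null) ? m.group() : "href=\"" + url + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"cms://asset/42\">About us</a>";
        System.out.println(rewrite(html, Map.of("42", "/about/index.html")));
    }
}
```

Leaving unresolved placeholders in place, rather than dropping them, makes broken dependencies visible before the content reaches the production site.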

Delivery and Support


Unlike building an informational brochure where you can get away with just learning the templating language, companies that adopt a content management platform should be prepared to get relatively deep in the technology. Good information and support resources will be critical in achieving this level of familiarity with the platform. Documentation is extremely important although it is rarely very useful; perhaps more useful is the documentation provided by the underlying technologies. For example, there are a number of well written O'Reilly books on the various Java frameworks. Books reflect a market for the technology, and are often a good indicator of interest.

Commercial open source companies usually rely on support and training programs as a primary revenue source. Most of the companies in this report are relatively young and have immature training and support organizations. On the plus side, this frequently means that your trainer or support engineer is a senior employee and intimately knows the product. The down side is that staffing of these departments tends to be lean, so it is not uncommon to have an urgent support call answered by someone on pager duty rather than sitting in a 24/7 call center. The training schedule can be sparse and, because all of the projects are headquartered in Europe, travel may be necessary.

To avoid unnecessary development and painful upgrades, it is a good idea to synchronize your development roadmap with that of the framework. This is easier to do when the product developers are open about their roadmap and actually follow it. Projects should be run as transparently as possible but there is a fine line between too much communication and too little. Too much information can be confusing, especially when it is conflicting. Early ideas are subject to change and need to be qualified. Promotional marketing language that describes the potential of a feature should be balanced with realistic descriptions of its current capabilities.
Typically, people with commercial backgrounds tend to have the hardest time toning down their marketing language and not over-promising. The better platform projects have a technical evangelist that interacts directly and candidly with developers. Developer events, such as user groups, are important for this exchange.

WCM Framework Market Overview


The Java community loves frameworks and it should come as no surprise that there are many Java based technologies that have strong content management capabilities. Architects that are looking for a content management service to integrate with their own custom web application or third party software will be pleased to know that two of the three projects evaluated in this report (Alfresco and Hippo CMS) have a powerful, de-coupled repository with the potential to be accessed in a stand-alone fashion and accessed by a custom web application. Jahia takes a different approach by integrating a front end based on the Apache Jetspeed-2 project. Having a portal based delivery tier provides a standards based framework for building and deploying interactive applications. All three of the products in this category are commercial open source projects developed by software companies. Both Alfresco and Jahia practice a tiered product model, while Hippo revenues are based entirely on support, services, and training.

Table 3.11. Informational Brochure Strengths and Weaknesses


Platform Alfresco Strong Features Repository services Web Scripts JCR support Hippo CMS De-coupled architecture Contributor navigation Link management Repository services Cost Jahia Enterprise Portal integration Web clipping Badgeware "open source" version Clustering Separation of content definition from display Small U.S. footprint About to be re-built on a different web application framework Weak Features User interface Complex and expensive pricing

There are also a number of projects that are not covered in this analysis. Below are brief summaries of the "bubble" projects that just missed the cut and may be considered for future versions of this report.

Cofax (www.cofax.org). Cofax was originally developed by Knight Ridder for their Philly.com web site and was eventually rolled out to the rest of the Knight Ridder web sites. There is not much activity on the project now and, in fact, there has been an initiative at Knight Ridder to replace Cofax. The replacement project was suspended when Knight Ridder was acquired by McClatchy. Philly.com now runs on Clickability's software as a service product: cmPublish. Today there is little news coming out of the Cofax project.

dotCMS (www.dotcms.org). dotCMS is a commercial open source CMS project run by Dotmarketing, a systems integration firm that specializes in higher education, associations, and service organizations. The dotCMS product is built on top of the popular Liferay [http://www.liferay.com/web/guest/home] portal. Dotmarketing has deployed dotCMS for a number of customers. Interestingly, dotCMS also includes CRM and eCommerce.

InfoGlue (www.infoglue.org). InfoGlue has a nice looking web site but very little is happening with the project. However, there are some systems integrators in Asia doing InfoGlue deployments for non-profits and NGOs like the United Nations Viet Nam site.

mmBase (www.mmbase.org). mmBase was originally developed in 1995 by the Dutch Public Broadcasting organization VPRO (www.vpro.nl); the project was open sourced in 2000. VPRO did all the right things to start an open source project: they created a nonprofit foundation to own the code and worked to build a community. There was a time when some large multi-national companies like IBM were building sites on mmBase, but the project has been on the decline. Most of the activity is from a few freelance developers and small systems integrators building various web sites. The project is still actively being developed and release 1.9 is targeted for May 2008. Content Here will continue to watch this one.

Alfresco 2.2 WCM


Abstract
Alfresco is considered a true ECM platform with capabilities in all the disciplines of content management: Document Management, Digital Records Management, Web Content Management, and Digital Asset Management. From an architectural perspective, the product supports all of these areas by building a feature rich and standards compliant repository that is capable of handling large volumes of content. Alfresco's repository supports advanced functionality (such as user sandboxes, event based content rules, and virtualization) historically seen only in upper tier commercial software. From a functionality perspective, the Alfresco product line seems to compete most frequently with Microsoft SharePoint as a low cost platform on which to build collaboration systems. Both product lines have good support for basic document management and collaboration needs, but are awkwardly venturing outside of a document centric world to become a true web publishing platform. Of course, the Alfresco team's Interwoven lineage provides a better understanding of the needs and best practices of web content management. In fact, the product does appear to be aggressively pursuing functional parity with TeamSite. It remains to be seen, however, if these features will make the product too complicated for the target market. Introduced in version 2.0, WCM is still a new offering on the Alfresco platform. Many Alfresco integrators say that the WCM capabilities that came with 2.0 should have been positioned as Alpha or Beta quality - not a production ready product. Starting with release 2.1 and more so with 2.2, Alfresco offers the core functionality to make it production ready as a useful framework. And while Alfresco's architecture is robust and flexible, the user interface still struggles to intuitively support web content management. The immaturity of the WCM functionality has a second risk - that there are few companies using it. 
Best practices for implementing Alfresco WCM have not yet been established and documented. The WCM implementations that do exist have used very different strategies. Because the architecture is so flexible, there are many ways to solve the same basic problems and the different integrators seem to be trying them all. As the pioneers start finding repeatable patterns and sharing their learnings, the community will be able to more efficiently use the platform and take it to new levels.

Project Summary
Table 3.12. Alfresco Enterprise Project Overview
Web site: http://www.alfresco.com
Project Inception: 2005. WCM launched in 2007.
Current Version: 2.2 since February 2008.
Project Type: Commercial: tiered product model.
Licensing Options: Alfresco Community is licensed under the GPL with a FLOSS exception. The Enterprise Bundle has a commercial license.
Geography: Alfresco Software Inc. is headquartered in the UK with some staff distributed across North America. The user community is global with concentrations across Europe and North America.
Common Uses: Repository services for custom web applications.
Sample Customers: Electronic Arts runs their EASports site [http://www.easportsbig.com/] on Alfresco WCM. The Harvard Business School Publishing [http://www.hbsp.harvard.edu] site runs on Alfresco WCM.
Frameworks and Components: Apache MyFaces, ehcache, FreeMarker, Hibernate, jBPM, Lucene, OpenOffice, Rhino, Spring, Velocity
Integration Standards: JSR 168, JSR 170, WebDAV, Common Internet File System (CIFS)
Java Support: 1.4 and 1.5
Application Servers: Tomcat, JBoss, Websphere
Databases: MySQL, Oracle, MS SQL Server

History
Alfresco is a generously funded software company with a commercial enterprise software pedigree. The company was founded by John Newton (co-founder of Documentum) and John Powell (former CEO of Business Objects) and they have rounded out their team with senior people from Novell and Interwoven. The fact that the early team came from Documentum is clearly visible in the product with its early focus on document management, repository services, and access control. Since those early days, the Alfresco team has worked hard to layer in web content management functionality and support for structured content.

Development of Alfresco started in January 2005 and the team has made tremendous progress both in building the software and in building visibility for the company. Alfresco describes itself as the first and leading open source ECM product - a claim that frustrates companies like Nuxeo whose ECM products pre-date Alfresco. While Nuxeo was there first, few can argue with the fact that Alfresco has put open source on the map as a viable alternative to commercial ECM products. Commercial vendors and open source projects alike have adjusted to Alfresco's market disruption. Nuxeo ported its ECM product from Zope to a more familiar Java platform. Commercial vendors are reconsidering their pricing and value propositions.

Alfresco's drive into the WCM space got serious when they recruited Kevin Cochrane and other thought leaders from Interwoven and, in the process, accepted that the Documentum view of WCM was just not good enough. WCM was officially launched with release 2.0 of the Alfresco Community Edition in July 2007; first customers launched in August. As of January 2008, there are roughly 43 paying Enterprise Edition customers using Alfresco WCM (as opposed to 300 customers using the basic ECM platform) and roughly half of these sites are live.

Architecture
Alfresco gets the attention of software architects and Java developers for its standards support and its use of popular open source components and frameworks. The first thing you notice when you download Alfresco is that it is a lot of software. The lib folder is packed with 108 JARs totalling nearly 40 megabytes; that is a lot even by Java standards. What that provides is some of the most modern and elegant open source components and frameworks around. In some ways, you can think of Alfresco as one big supported bundle of best-of-breed open source software projects. Reusing these components is what has enabled Alfresco to develop their product so quickly and stay current with the latest technology and standards.

Figure 3.40. Alfresco Architecture Diagram

Alfresco has a very open service-based architecture that supports a number of standards. Source: Alfresco documentation site.

Alfresco's standards support and openness make it very effective for integration with other systems and use in service oriented architectures. When Alfresco first hit the market, it was positioned as a framework for building any kind of content centric application and the web client was merely an example of what you can do with the platform. Today, many architects still look at Alfresco as an ideal building block for larger architectures. Java, PHP, and Web Services APIs expose most of Alfresco's functionality.

The repository is accessible over WebDAV, Common Internet File System (CIFS), and FTP. CIFS support, which allows a Windows user to map a letter drive to the repository as if it were a Windows file server, is one of the Alfresco team's biggest achievements. Long time Unix users will remember what an impact Samba [http://www.samba.org] had by allowing Windows and Unix to share files over Microsoft's proprietary standard. Alfresco has the only Java implementation of a CIFS client. One could say that CIFS is the user interface that engenders the most pride from the Alfresco team and the most adoration from business users (see commentary on the web client later).

JSR 168 (the Java Portlet Standard. See Glossary for JSR168) and JSR 170 (the Java Content Repository Standard - level 2. See Glossary for JCR) are supported. Business Process Execution Language (BPEL. See Glossary for BPEL) support is provided by Alfresco's inclusion of JBoss's jBPM workflow engine.

The key to the Alfresco architecture is the repository, whose node based hierarchy is similar to the Java Content Repository. Indeed, the Alfresco JCR interface complies with level two of the JCR specification. There are a couple of places where it is difficult to use the JCR calls and you need to resort to the native Alfresco repository API, such as the observation feature (where you can monitor a set of assets and then be notified if there is a change). This is more a function of the JCR's newness than Alfresco's recalcitrance, but internally the Alfresco team is critical of the JCR. Hopefully, they will use their position on the JSR 283 team and their ideas to improve the JCR specification.

Early in 2007, Alfresco created publicity around their JCR benchmark tests and claimed to be the fastest open source JCR implementation (faster than the other: Apache JackRabbit). They had a platinum partner certify the results. However, the JackRabbit configuration was using the default file system persistence rather than the much faster relational database persistence that most non-demo implementations of JackRabbit are configured with. Still, the definition of a benchmark was a great contribution to the community.

The Alfresco Repository is composed of three core services: the Node Service, the Content Service, and the Search Service. Together, these three are called the "Foundation Services."
The Node Service manages the metadata of content objects or "Nodes." Alfresco's definition of Node maps directly to the JCR definition. Every content asset is a node placed in a hierarchical tree. Node metadata information is stored in a relational database (MySQL by default, although most database platforms are supported thanks to a Hibernate object relational database layer). The Node service is used for organizing and browsing content. Every content object in Alfresco is stored in a file: XML for structured content, HTML or native binary formats for everything else. These files are managed by the Content Service which takes care of things like retrieving the proper version of the asset and encapsulates the mechanics of persistence. Currently, metadata is not versioned within Alfresco, only the actual files. The Search Service uses Lucene search indexes that are also stored on the file system and is also used in the on-board search functionality and for listing operations in display templates.

Figure 3.41. Alfresco Architecture Diagram: Repository Services

Alfresco's repository architecture is based on three core services: Node, Content, and Search. Source: Alfresco documentation site.

Additional services may be added to the Alfresco repository by registering them with the Registry Service. All the other repository functionality is built on top of these three services. This includes: content transformation and image manipulation, metadata extraction, templating, classification, versioning, locking, workflow, and permissions.

Alfresco has a modular architecture to allow for plugins called AMPs (Alfresco Module Package). Modules are encapsulated and kept separate from the core execution logic to ensure system stability and clean upgrades. A command line management tool called MMT (Module Management Tool) installs, removes, enables, and disables modules on the system.

The introduction of WCM to the Alfresco architecture forced the company to make some major enhancements to the repository. A key change: the introduction of the Alfresco Versioning Model (AVM). The AVM supports functionality like file-level branching, snapshots, and directory level versioning. There is also the construct of "transparencies" that allow one collection of assets to be "overlaid" over another collection to create a view that is the union of the two collections. Where both collections have the same file, the overlaid version is shown; when the overlaid collection has deleted a file, the file is removed from the view. It is this architecture that enables the sandboxes and snapshots that are explained in the content contribution segment of this evaluation.

The new AVM supports a distributed repository model where multiple repositories can run virtually on a single instance of Alfresco or on multiple Alfresco instances. Content can be replicated between repositories and the process is identical for repositories running on the same instance or for repositories distributed across the network.
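The "transparency" rules described above (union of two collections, the overlaid version wins, deletions in the overlay hide the file) can be shown with a minimal Java sketch. This is not Alfresco's AVM API or implementation; the class, method, and parameter names are hypothetical and the collections are modeled as simple path-to-content maps.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Sketch of an overlay view over a base collection. Names are hypothetical. */
class OverlayView {

    /**
     * @param base    path -> content in the underlying collection
     * @param overlay path -> content for files added or changed in the overlay
     * @param deleted paths the overlay has deleted
     * @return the merged view a user of the overlay would see
     */
    static SortedMap<String, String> resolve(Map<String, String> base,
                                             Map<String, String> overlay,
                                             Set<String> deleted) {
        SortedMap<String, String> view = new TreeMap<>(base);
        view.putAll(overlay);             // overlaid version wins where both have the file
        view.keySet().removeAll(deleted); // files deleted in the overlay drop out of the view
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> staging = Map.of(
            "index.html", "v1", "about.html", "v1", "old.html", "v1");
        Map<String, String> sandbox = Map.of("index.html", "v2", "new.html", "v1");
        System.out.println(resolve(staging, sandbox, Set.of("old.html")));
    }
}
```

Because the view is computed from the base on each call, a file that the overlay has not touched automatically reflects later changes to the base, which is the behavior the sandbox model relies on.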
Replication is based on snapshots that are automatically taken every time content is pushed to the content staging workspace. When replication is initialized, the source repository asks the target repository for a hash of its latest snapshot. The source server then sends over the files that have changed along with the hash of the replicated snapshot (to save the target server the work of computing the hash of its snapshot). This model reduces the amount of traffic over the network and the workload on the target server, which is expected to be busy servicing web traffic. A similar architecture is available for simple file system deployments. In this case, a lightweight daemon is installed on the target server rather than a full blown Alfresco instance. While the repository-to-repository replication functionality was available in version 2.1, version 2.2 provides file system replication and a GUI to manage target servers.

Web projects can be accessed through CIFS (as opposed to the standard ECM repository, which is accessible over CIFS, WebDAV, and FTP) under a mount point separate from that of the general ECM repository. Under the WCM mount point, the user will see two directories: data and versions. Under versions there will be directories for v0 through vn - one for each snapshot taken of the repository. This structure allows you to "time travel" to different read-only views of the web project's repository through Windows Explorer. Other mount points can also be defined based on filtered views of the repository. There are some pre-defined ones that restrict what a user can see based on role. Other mount points can be defined through XML configuration files. Despite the fact that web projects are accessible through CIFS, Alfresco does not generally recommend that business users access web projects in this way. It is considered safer to have them work in standard ECM project spaces and use rules to push content into web projects.

The Foundation Services are exposed through a Java API to create yet more functionality. Developers can also leverage these services through two external interfaces: the Web Services API and the JCR Interface. One of the more exciting integration features is Alfresco's Web Scripts, which allow the creation of a custom REST based API by writing server side JavaScript code.
Web Scripts are a significant part of Alfresco's strategic move away from their SOAP based API to a simpler REST API. Using the REST API that comes out-of-the-box and extending it with custom methods using Web Scripts is a very powerful way of extending Alfresco. It is particularly useful for supporting AJAX calls to provide some additional data driven, client side interactivity. Some systems integrators are using the native REST API extended with Web Scripts to build custom management and delivery applications on top of the repository, eliminating the use of Alfresco's clunky administration interface. As one systems integrator put it: "Alfresco is the ideal development platform for a customer that has mock-ups showing a clear idea of what they want the user interface to look like. Rather than fighting with the UI that comes with the CMS to transform it into their design, we can give them an API and let their web developers build what they want."

Alfresco is a great alternative to building a custom CMS from the ground up. They take care of all the content management specific functionality that most developers are not familiar with building, and leave the rest to a custom software development team. Many architects see the REST API and the introduction of Web Scripts as positioning Alfresco as a content service in a service based architecture, or the back end of any number of Web 2.0 style applications. Best practices are still emerging as to how much application code should be written in this layer. No doubt there will be a healthy debate similar to that between programming in the database with stored procedures or in the application tier; the more programming done in this tier, the more the lock-in.

Architects looking for standards support may consider integrating through the JCR level two compliant API. There is also a rich set of functionality that is not covered in the JCR spec. For example, Alfresco supports the construct of "aspects" to add attributes and functions across different asset types. Adding the "versionable" aspect to an asset makes that asset support versioning; a "searchable" aspect causes the asset to be indexed. This is different from object oriented classing because it is done at the object instance level - the class or type stays the same. Aspects are defined through XML files and manually applied through the user interface or by business rules triggered by Alfresco's event model. For example, a business user can set a rule to add the "categorizable" aspect to content when it is added to a specific folder. Although no compiling is needed for defining most aspects, you need to restart the application server for them to be recognized. Alfresco ships with several native aspects that can be added to assets in the repository through the user interface or set by default in the XML repository configuration file. Defining new aspects is a convenient way to add functionality to the system. A developer could add a "synchronization" aspect to push updates to an asset to another system, for example. The Alfresco repository also has an event model that can trigger the execution of code on events such as update, move, or a change in workflow state.

While the release of the AVM in version 2.1 introduced many new capabilities to the platform, the user interface of web projects took a step back and is just starting to catch up. For instance, although web content was indexed, and the API supports search, there was no search functionality in the web client until the 2.2 release (late January 2008). Also, content within a Web Project cannot be "made multi-lingual" like the rest of Alfresco content.
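The difference between aspects and object oriented classing, described above, can be made concrete with a small Java sketch. It does not use Alfresco's API (where aspects are defined in XML); the `Node` and `Folder` classes and the rule list are hypothetical stand-ins. The point it illustrates is that an aspect is attached to an individual instance, here by a folder rule that fires when content is added, while the asset's type stays the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of instance-level aspects applied by a folder rule. Names are hypothetical. */
class AspectSketch {

    /** A content object: its type is fixed, its aspects can differ per instance. */
    static class Node {
        final String name;
        final String type;
        final Set<String> aspects = new TreeSet<>();

        Node(String name, String type) {
            this.name = name;
            this.type = type;
        }

        boolean has(String aspect) {
            return aspects.contains(aspect);
        }
    }

    /** A folder with an "on add" rule that applies aspects to incoming nodes. */
    static class Folder {
        final List<Node> children = new ArrayList<>();
        final List<String> aspectsAppliedOnAdd = new ArrayList<>();

        void add(Node node) {
            node.aspects.addAll(aspectsAppliedOnAdd); // rule fires on the "add" event
            children.add(node);
        }
    }

    public static void main(String[] args) {
        Folder pressReleases = new Folder();
        pressReleases.aspectsAppliedOnAdd.add("categorizable");

        Node inside = new Node("launch.xml", "article");
        Node outside = new Node("memo.xml", "article");
        pressReleases.add(inside);

        // Same type, different aspects.
        System.out.println(inside.has("categorizable") + " " + outside.has("categorizable"));
        System.out.println(inside.type.equals(outside.type));
    }
}
```

Both nodes remain of type "article"; only the one added to the folder picks up the "categorizable" aspect.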

Figure 3.42. Alfresco Screenshot: Web Content Properties

Web Projects expose a small fraction of the repository functionality supported in the rest of the application.

To manage structured content, Alfresco uses the open source Chiba XForms engine to automatically generate web forms. Defining a new content type is done through a wizard interface that involves uploading a schema definition (.xsd) file and associating workflows and display templates. Content types can be shared between web projects.

One of the more ambitious concepts that came over from the Interwoven engineers is "virtualization." Unlike TeamSite, which proxies over to a web server running the presentation tier, Alfresco provides a container for any "well behaved" (Alfresco's words) Java web application to run in. This allows both code and content to be tested in safe virtual instances running on one instance of Alfresco. Alfresco's virtualization architecture is strictly designed for previewing and staging content - not running a production web site. Still, Alfresco does some clever optimization to reduce the memory footprint of multiple virtual environments. So, for example, JARs used on the delivery application are only loaded once and shared across virtual instances of that application.

For complex de-coupled web applications (such as a site running ATG Commerce), version 2.2 supports remote test servers that are not virtualized within Alfresco. Alfresco can deploy content to these servers and then proxy requests for preview and content staging in essentially the same way that Interwoven TeamSite works. To make this happen, test server instances need to be set up beforehand and registered with Alfresco. Since there is a finite number of test servers, only a finite number of people can preview at the same time.

Although Alfresco does not have many high traffic WCM sites live today, they certainly have thought about how to do it. The recommended high availability, high load configuration is a three tiered architecture with a cluster of application servers running the delivery application. This delivery tier just needs Alfresco's deployment module to receive content from a cluster of Alfresco servers. For dynamic requests against the repository (for Web 2.0-style applications where end users submit content), the presentation tier can call back to the Alfresco cluster over the REST API. Behind the Alfresco repository would be a cluster of MySQL servers.

Content Contribution
As impressed as technologists are with the Alfresco architecture, users are often less enthusiastic about a user interface that splits its attention between web content management and document management. It seems that the Alfresco team is a bit confused about the role of the "Web Client." Originally, it was positioned as a reference application to show what one could build on top of the Alfresco platform and it seemed to get less attention from the engineering team than the programming interfaces and modularity of the system. When Alfresco positions itself as a business application rather than a development framework, it is more likely to defend the web client. Still, when speaking to Alfresco staff, it is easy to tell from their relative enthusiasm for the UI and the architecture that they see the UI as a necessary evil.

It is not surprising that many integrators do as little with the web client as they can. At least one of the early WCM implementations had contributors edit content in DreamWeaver and XML editors against a CIFS drive rather than use the Web Client, and then use Alfresco to deploy these files to the delivery environment. When going this far to work around a CMS, one should consider just using a source code control system to manage HTML files.

As marginal as the web client is for document management and collaboration, it is even less suited for web content management. Web sites are created in special folders called "web projects." With no tree based navigation or in context editing, the web project user interface is way behind pure WCM products in terms of usability. Content assets are listed by their file name, so a user must guess from the file name what the content is about and then figure out which enigmatic icon will execute the desired action on the content. The UI has a "paging style" design where the contents of a folder are shown in pages of 10 assets at a time. The sort columns are limited to very basic attributes: file name, size, modified date, created date, and modifier, so it can be hard to find an asset. The newly enabled search functionality should help here.

Figure 3.43. Alfresco Screenshot: Browse Site View

The browse view of a sandbox allows a user to navigate through the folder structure of web assets.

While the problem of browsing and finding content was not a primary design concern for Alfresco's initial WCM release, handling concurrency between multiple content contributors clearly was. Alfresco followed Interwoven TeamSite's approach of creating user "sandboxes" where users can edit and preview content without interfering with other user sandboxes or the production site. Changes made in a sandbox are only visible in that sandbox until they are checked back into the staging sandbox. Unedited content in a user's sandbox is automatically updated to reflect changes that other users submit to staging. Depending on the user's permissions, the user can either check an update directly in to the staging sandbox or initiate a workflow that will collect the necessary approvals.
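The sandbox behavior described above can be pictured as a per-user layer of edits over a shared staging area. The following is a minimal conceptual sketch, not Alfresco's actual implementation or API; the class name `SandboxSketch` and its methods are hypothetical, and the workflow path is left out so that only direct check-in is shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Conceptual model only: a user sandbox layered over a shared staging area. */
final class SandboxSketch {
    // Shared staging content, keyed by path (the same map is passed to every sandbox).
    private final Map<String, String> staging;
    // This user's uncommitted edits, invisible to other sandboxes.
    private final Map<String, String> localEdits = new HashMap<>();

    SandboxSketch(Map<String, String> staging) {
        this.staging = staging;
    }

    /** Edits stay in the sandbox until they are submitted. */
    void edit(String path, String content) {
        localEdits.put(path, content);
    }

    /** Edited paths resolve locally; unedited paths fall through to staging. */
    String read(String path) {
        return localEdits.containsKey(path) ? localEdits.get(path) : staging.get(path);
    }

    /** Direct check-in: copy local edits into staging and clear the sandbox layer. */
    void submit() {
        staging.putAll(localEdits);
        localEdits.clear();
    }

    public static void main(String[] args) {
        Map<String, String> staging = new HashMap<>();
        staging.put("/index.html", "v1");

        SandboxSketch alice = new SandboxSketch(staging);
        SandboxSketch bob = new SandboxSketch(staging);

        alice.edit("/index.html", "v2-alice");
        System.out.println(bob.read("/index.html")); // prints "v1": the edit is private to alice
        alice.submit();
        System.out.println(bob.read("/index.html")); // prints "v2-alice": unedited content tracks staging
    }
}
```

Because unedited paths are never copied into the sandbox, a user automatically sees what others submit to staging, which matches the behavior described in the text.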

Figure 3.44. Alfresco Screenshot: Sandboxes

Alfresco's sandbox model provides contributors with their own work areas to edit and preview their changes prior to checking back in.

The other Java open source WCM project to employ the sandbox model is OpenCms with its notion of "projects." In the PHP world, TYPO3 has introduced a similar work area concept. However, Alfresco's implementation is more sophisticated thanks to its "virtualization" technology that allows a user to browse through the site as it would appear after the modifications are checked in.

Like most Alfresco interfaces, the content editing experience is wizard based, with control buttons (back, next, finish) in the upper right corner of the page. Although this is awkward, users get into a rhythm of working their way down a form and then scrolling to the top to continue on or finish the wizard. Input validation (based on Apache Commons Validator) is done at submit time and presents the user with a list of validation errors at the top of the form.

As mentioned earlier, the editing forms for structured content types are automatically generated by the Chiba XForms implementation. Although standards based, Alfresco's implementation is less powerful and flexible than Hippo's forms engine. For example, Hippo gives you more control over the layout of the form. Still, there is adequate support to model complex content types, including items with repeating and nested elements, and there is a basic set of form controls including a calendar date selector and a browser widget. Other controls can be added.

TinyMCE is shipped as the default WYSIWYG editor, but developers report success with other editors as well. The TinyMCE configuration comes with custom browse dialogs for adding links and image references, but a surprisingly limited number of formatting buttons are enabled. More can be added by editing a section of the web client configuration file that sends parameters over to the TinyMCE control. Some buttons, like spell check, require the addition of plugins that can be easily installed (see the TinyMCE web site [http://wiki.moxiecode.com/index.php/TinyMCE:Control_reference] for a full list of configuration options). Formatting buttons can be turned on per content type and field. For example, the summary element of a content asset can get fewer buttons than a body element.

Figure 3.45. Alfresco Screenshot: TinyMCE Formatting Buttons

Alfresco ships with a stingy number of formatting buttons enabled.

The image and linking browse controls allow a user to browse and add new targets. The image control provides fields to set the dimensions, position, and alt text of the image.

Figure 3.46. Alfresco Screenshot: Image Position Dialog

The image dialog has fields for sizing and positioning the image and for adding alt text.

In order to use Alfresco's rendering engine, content rules are set to process the presentation templates when the content is saved, creating rendered versions of the source XML content. For example, if you have an article123.xml source file, and rendering templates for a detailed view and a summary view, you may get the files article123.html and articlesummary123.html. The best practice is to store rendered content in a different directory structure than the XML sources. This makes sense because rendered content should be stored as it will be navigated on the external site, not as it is managed. This also enables content re-use because content can be rendered to multiple places. The paths and file names that the content is rendered to are configurable via rules that can use variables and information about the content to determine where to put it. One could use a taxonomy to render content into various folders. Alternatively, Alfresco can be configured not to render the content and instead save it as XML and have a dynamic delivery tier do the rendering when the assets are requested.

Alfresco also touts its "site import" technology that can import an entire static site in a zip archive. This is useful to quickly deploy a site on Alfresco and does enable library services (check-in, check-out, versioning, access control), but it doesn't provide much in the way of the high value content management benefits such as separation of layout and content, content reuse, and business user empowerment. At best, this approach may be considered a way to incrementally replace a static web site with a managed one.

With release 2.1, Alfresco introduced some basic link checking functionality. Users can click a "check links" button from within their workspace, and link checking can also be added as an automated step in the default workflow that comes with the product. This is helpful because the complexity of the UI makes visually checking links and images a bit flaky.
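To make the idea of rule-driven output paths for rendered content more concrete, here is a small sketch of how a pattern with variables might resolve to a rendition path. The `${name}` and `${section}` placeholder syntax, the directory layout, and the class `RenditionPathSketch` are all assumptions made for illustration; Alfresco's own rule syntax is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: resolve an output-path pattern for a rendered asset. */
final class RenditionPathSketch {

    /** Replace each ${key} placeholder in the pattern with its value. */
    static String resolve(String pattern, Map<String, String> vars) {
        String out = pattern;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    /** Strip directory and extension: "/content/articles/article123.xml" -> "article123". */
    static String baseName(String sourcePath) {
        String file = sourcePath.substring(sourcePath.lastIndexOf('/') + 1);
        int dot = file.lastIndexOf('.');
        return dot < 0 ? file : file.substring(0, dot);
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("name", baseName("/content/articles/article123.xml"));
        vars.put("section", "news"); // could come from a taxonomy value on the asset

        // One source file, two renditions, each stored where the site will navigate to it.
        System.out.println(resolve("/site/${section}/${name}.html", vars));
        System.out.println(resolve("/site/${section}/summaries/${name}.html", vars));
    }
}
```

The point of the sketch is the separation the text recommends: the XML source lives under one tree, while each rendition is written to a path derived from metadata about the content.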
A user who is familiar with the "make multi-lingual" feature supported in the rest of Alfresco will be disappointed in the lack of localization support within web projects. Companies tend to use primitive work-arounds, like appending a locale code (en, es, fr) to the end of the file name, to create localized web sites. The only other alternative is to manage different locale web sites in different web projects.

While earlier releases of Alfresco had a simplistic folder-based workflow model, Alfresco now uses JBoss's jBPM for workflow services. jBPM is a popular workflow component, but it is primarily used for choreographing services across applications in service oriented architectures. Still, jBPM has the name recognition and tool support to justify the choice even if it is a little over the top. Workflow processes are defined in a powerful but proprietary XML language called jPDL (JBoss Process Definition Language). The JBoss jBPM Process Designer Eclipse plugin provides a graphical interface for designing workflows. While designing workflows is very point-and-click, it takes a little more effort to wire these workflows into Alfresco application logic. JBoss jBPM also supports the standards-based BPEL for cross-system choreography.

When a new content type is defined through the "web form wizard," it is associated with a workflow. Depending on the workflow, there will be different configuration options that can be selected by the user who submits the content. Workflows can initiate business logic, like checking links, and create manual tasks that are emailed to the user and show up on the user's dashboard.

Figure 3.47. Alfresco Screenshot: Workflow Dialog

The workflow task form combines some task management plus a workflow state machine.

Alfresco allows a content type to be defined with more than one workflow option. When more than one workflow is enabled for a content type, the user can select which workflow to use.

Development, Configuration, and Administration


Development on Alfresco is done at two levels. At the core application level, a developer can edit configuration files, add JARs, and create AMPs. There is a project on the Alfresco Forge that uses Maven to build and deploy Alfresco AMPs. The second level is at the presentation tier, where display code like Freemarker templates for static rendering and JSPs for dynamic rendering is developed and managed directly in the Alfresco repository.

Teams developing on Alfresco have options as to how to manage their environments. Backend Java code is typically managed in a traditional source code control system and tested on local instances of Alfresco. Alfresco could be used as a source code control system but, until the Alfresco team uses Alfresco as its source code repository, it might not be a good idea. There is talk about putting a Subversion interface on top of the Alfresco repository so that Subversion clients like TortoiseSVN and Subclipse (the Subversion client plugin for the Eclipse IDE) can talk to it. In the meantime, developers typically use scripts to check code out of a source code control system, compile it, and FTP the JARs over to the Alfresco repository for deployment. Display code, such as presentation templates, is usually developed on a shared server and managed within the repository.

Alfresco's new deployment GUI described earlier is useful for pushing code from development instances through QA and to production. It is also useful for pushing snapshots of production content back to staging and development environments for testing purposes. The deployment user interface can configure directories to include or exclude, or file name patterns to exclude. One thing that is missing in the deployment GUI is the ability to save deployment definitions for later reuse.

Access control within web projects is limited. Alfresco comes with some pre-packaged roles that should look familiar to a user of its document management functionality: Content Manager, Content Contributor, Content Reviewer, and Content Publisher. A Content Manager has full permissions on the workspace; a Content Contributor can edit and add but not publish; a Publisher can approve content but not edit; and a Reviewer can only read content. More roles can be created by editing configuration files. The real shortcoming of the access control model is that roles are applied at the web project level, not at the sandbox or folder level. This makes it difficult to do things like restrict access to edit a portion of a web project. One work-around would be to add custom roles, but a more practical approach is to use workflow to prevent users from publishing content that they shouldn't be editing. Unless they are approved, their edits will linger harmlessly within their own personal sandboxes. Another strategy would be to separate the web site into multiple web projects; however, this would hinder sharing content across site sections.
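The four pre-packaged roles and the project-level granularity described above can be summarized in a short model. This is a sketch for illustration only, not Alfresco's permission API: the class `WebProjectRolesSketch`, the `Permission` names, and the mapping of "approve" to a `PUBLISH` permission are assumptions, as is granting `READ` to every role.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of project-level roles; names and mapping are assumptions. */
final class WebProjectRolesSketch {
    enum Permission { READ, ADD, EDIT, PUBLISH }

    enum Role {
        CONTENT_MANAGER(EnumSet.allOf(Permission.class)),
        CONTENT_CONTRIBUTOR(EnumSet.of(Permission.READ, Permission.ADD, Permission.EDIT)),
        CONTENT_PUBLISHER(EnumSet.of(Permission.READ, Permission.PUBLISH)),
        CONTENT_REVIEWER(EnumSet.of(Permission.READ));

        private final Set<Permission> granted;

        Role(Set<Permission> granted) {
            this.granted = granted;
        }

        boolean allows(Permission p) {
            return granted.contains(p);
        }
    }

    // One role per user for the whole web project.
    private final Map<String, Role> rolesByUser = new HashMap<>();

    void invite(String user, Role role) {
        rolesByUser.put(user, role);
    }

    /** Note the missing path argument: the check cannot vary by folder or sandbox. */
    boolean isAllowed(String user, Permission permission) {
        Role role = rolesByUser.get(user);
        return role != null && role.allows(permission);
    }

    public static void main(String[] args) {
        WebProjectRolesSketch project = new WebProjectRolesSketch();
        project.invite("ann", Role.CONTENT_CONTRIBUTOR);
        project.invite("raj", Role.CONTENT_PUBLISHER);

        System.out.println(project.isAllowed("ann", Permission.EDIT));    // true
        System.out.println(project.isAllowed("ann", Permission.PUBLISH)); // false
        System.out.println(project.isAllowed("raj", Permission.EDIT));    // false
    }
}
```

Since the check takes no folder or sandbox argument, restricting a contributor to one section of a site is not expressible in this model, which is why the text suggests workflow approval or separate web projects as work-arounds.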

Figure 3.48. Alfresco Screenshot: Managing Permissions

Managing permissions is done by inviting users and groups and assigning them roles.

Alfresco's LDAP support is based on a replication model. Alfresco periodically gets updates from an external LDAP repository. This implementation is problematic if you want to edit group memberships through the Alfresco UI because they will just be overwritten by the next update. If you want to integrate with LDAP, it is best to do all the group assignment directly in the LDAP directory or customize Alfresco to only consult the LDAP directory for authentication.

There is currently no special back-up functionality other than to shut down the system and run standard MySQL and file system backups. This does not pose a problem for most companies since the de-coupled delivery tier would remain operational. However, for global companies working on the same Alfresco instance, having a daily maintenance outage would not be acceptable. There is talk within the Alfresco team about implementing a live backup mechanism, perhaps using the replication functionality. No doubt some customers are experimenting with this approach right now.

Presentation
Alfresco gives you several options when it comes to presentation. Presentation templates written in FreeMarker or XSL are registered with a content type and are executed whenever an asset of that type is saved. Each content type can have multiple presentation templates that each make a "rendition." For example, one template could make a detailed view while another template could make a view to be used in a list of assets; this is good for static delivery. For dynamic delivery, Alfresco allows you to build a Java web application in the web project (WEB-INF directory and all) and Alfresco will serve as a container for that web application to run in. For preview, Alfresco advertises that it can "virtualize" any "well behaved" Java web application using any web application framework. Production environments are not "virtualized." They are real application servers running on production hardware.

Using Alfresco's new deployment user interface, code and content deployment can be separated. The pioneers that built the first Alfresco powered web sites went with a static HTML deploy model where rendered HTML files were deployed to a simple web server. This model is particularly appropriate for sites that have been statically imported into Alfresco. Other models include structured publishing of XML files or publishing a whole web application: content, code, and all. The jury is still out as to whether to use Alfresco as a source control system for a delivery web application.
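The idea of several renditions per content type can be illustrated with the XSLT support that ships with the JDK. The sketch below applies two stylesheets to the same XML asset to produce a detail view and a list-item view. It is not Alfresco code: the article schema, the stylesheets, and the class `RenditionSketch` are invented for the example, and in Alfresco the templates would be registered with the content type rather than invoked by hand.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative only: one XML asset, two XSL templates, two renditions. */
final class RenditionSketch {
    // Hypothetical structured content asset.
    static final String ARTICLE_XML =
        "<article><title>Launch</title><body>Full text of the article.</body></article>";

    // Detail rendition: title and body.
    static final String DETAIL_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/article'>"
      + "<div><h1><xsl:value-of select='title'/></h1><p><xsl:value-of select='body'/></p></div>"
      + "</xsl:template></xsl:stylesheet>";

    // Summary rendition: title only, for use in a list of assets.
    static final String SUMMARY_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/article'>"
      + "<li><xsl:value-of select='title'/></li>"
      + "</xsl:template></xsl:stylesheet>";

    /** Apply one stylesheet to one XML document and return the rendered markup. */
    static String render(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Rendition failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(ARTICLE_XML, DETAIL_XSL));
        System.out.println(render(ARTICLE_XML, SUMMARY_XSL));
    }
}
```

In a static delivery model each result would be written to a file at save time; in a dynamic model the same transformation would run in the delivery tier when the asset is requested.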

Figure 3.49. Alfresco Static Deploy Model Diagram

The simplest delivery model is static deployment where static HTML files are pushed over to a simple web server.

In terms of presentation tier functionality, Alfresco doesn't offer much out of the box, especially in the world of Web 2.0. Despite the company's obvious interest in Web 2.0 (John Newton is an excellent blogger and frequently speaks at conferences about Web 2.0) and its marketing rhetoric, a quick browse through the Alfresco corporate web site reveals a strategy of relying on "best of breed" technologies rather than the ECM vision of one tool to consolidate different content management functionality. Alfresco uses phpBB for forums, GForge for community collaboration, MediaWiki for documentation, Baynote for search, and executives use WordPress for blogs. Interestingly, there is a recipe on the wiki for creating a Facebook application that reads from an Alfresco repository. While the marketing brochure web site may be managed in Alfresco WCM, the only appearance of the Alfresco product on the alfresco.com domain is for downloading PDF documentation and for the partner extranet that contains PowerPoint presentations and other documents in Microsoft Office formats.

Most Alfresco WCM customers tend to use Alfresco to publish into custom presentation tiers. Integrations with technologies like Liferay Portal [http://www.liferay.com] and even OpenCms have been successful. Alfresco, with its robust repository and open architecture, fits nicely behind presentation tiers. Now, with the deployment options available in version 2.2 and Web Scripts, these architectures are even more promising.

Delivery and Support


Alfresco has been iterating on its licensing and business models since the first public release of the software. The company takes a strong moral stance that those who use Alfresco's software should pay for it. In an effort to convert users into paying customers, Alfresco has dabbled in tiered products with different features and badgeware versions. However, its strategy has been getting progressively more open source friendly. The current approach is a single code base for both the Community and Enterprise tiers. The Community Edition is essentially the SVN head and changes daily. The releases of the Enterprise Edition are the tested, certified branches of that code base. Currently, Enterprise Edition 2.2 is about to come out while there is a "lab version" of the Community Edition that is at version 2.9. The functionality of this lab version will probably go to a 3.0 release of Enterprise.

Customers are advised against using the free Community Edition for anything but experimental uses; the Community Edition is neither supported nor patched. When a new major release comes out, the Enterprise Edition and the Community Edition are identical. However, when a bug gets fixed, it is only fixed in the Enterprise Edition. Certified integration partners are forbidden from working on the Community Edition for their clients. It is possible that one day a community will form to support the Community Edition, but as of today none exists, and it is doubtful that the Alfresco team will encourage one.

Alfresco is not a community project. The code base is very tightly managed by Alfresco and is exceptionally clean and well organized. External programmers wishing to contribute to Alfresco are encouraged to publish extensions on the Alfresco Forge [http://forge.alfresco.com/]. As of February 2008, there were 95 hosted projects ranging from language packs to Outlook plugins. The number of projects on the Forge is not growing very rapidly. There were 94 projects in June of 2007. While the establishment of the Forge was certainly a step in the right direction, Alfresco has realized the need to take a more active role in community building. There have been a few very well attended user group meetings, and Alfresco has recently hired someone to be in charge of community development (interestingly, this hire comes not from an open source background, but from Documentum). It is worth noting that the Alfresco team is almost entirely new to open source. The one team member with open source credibility is the VP of Marketing, who worked within Novell's legal team as they defined their Linux strategy. Everyone else comes from purely commercial software companies.

Using the Enterprise Edition requires a subscription to one of Alfresco's maintenance programs (or "networks"), which run around $10,000 per CPU per year for the management instance and half that for each CPU on the delivery tier. This is roughly comparable to MS SharePoint. Customers not wishing to renew their support and maintenance contracts must downgrade to the Community Edition. Support is not included in this fee and runs an extra $12,000.

The documentation and support forums are hit or miss, and Enterprise customers report that paid support is not much better. Munwar Shariff's book Alfresco, Enterprise Content Management Implementation is a useful introduction to the user interface, the architecture, and its customization points. However, it does not cover advanced topics and was written before the WCM and the AVM were available. The best resource is the wiki, but the articles are not as thorough as more formal product documentation would be. There are a few articles on best practices, but not nearly enough. Alfresco delivers training at its offices near London and through partners elsewhere. Customers report that the training is useful.

At a recent user group, there was a general sentiment of frustration that Alfresco was being too aggressive in adding new sophisticated features rather than refining and documenting the pre-existing basic features. The project moves fast and, although the wiki changes daily, it does not keep up with the new ideas and initiatives that the Alfresco team is working on. For example, one customer built the equivalent of web scripts only to find that it was being added to the product. Perhaps a more significant example is the Web Client, which could really use some sustained re-design and re-factoring to be a useful business application. These rapid and unpredictable movements may be for competitive reasons, but to the outsider they look like attention deficit disorder. With some digging, you will find a lot of information on the wiki. However, there is a poor signal-to-noise ratio. The best bet for getting the most out of Alfresco is to go through a systems integrator who may have an inside line on the product.
Alfresco operates a network of SI partners. The network is tiered (Platinum, Gold, and so on) and based on company size and financial (and other) commitments made by the systems integrators to Alfresco - not by the amount of Alfresco work that the systems integrators do. The best way to evaluate SI partners is to look on the forums. The good SIs are the ones that are answering the questions and publishing modules on the Forge.
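For budgeting purposes, the subscription figures quoted earlier in this section (around $10,000 per management CPU per year, half that per delivery CPU, and an extra $12,000 for support) can be turned into a rough estimate. The sketch below is arithmetic only. It assumes the support fee is a flat annual amount, which the pricing description does not actually specify, and the class and method names are hypothetical.

```java
/** Rough estimate from the approximate prices quoted in this report. */
final class SubscriptionEstimateSketch {
    static final int MANAGEMENT_PER_CPU = 10_000;                 // per CPU per year, management instance
    static final int DELIVERY_PER_CPU = MANAGEMENT_PER_CPU / 2;   // "half that" on the delivery tier
    static final int SUPPORT_FLAT = 12_000;                       // assumption: flat fee per year

    static int annualCost(int managementCpus, int deliveryCpus, boolean withSupport) {
        int subscription = managementCpus * MANAGEMENT_PER_CPU + deliveryCpus * DELIVERY_PER_CPU;
        return withSupport ? subscription + SUPPORT_FLAT : subscription;
    }

    public static void main(String[] args) {
        // Example: a 2-CPU management server, four delivery CPUs, with support:
        // 2 * 10,000 + 4 * 5,000 + 12,000 = 52,000
        System.out.println(annualCost(2, 4, true)); // prints 52000
    }
}
```

Actual quotes will depend on the network tier and negotiated terms, so the numbers should be treated as an order-of-magnitude guide.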

Conclusion
Table 3.13. Alfresco 2.2 Summary

Contributor Navigation: While the hierarchically organized content repository can handle large volumes of content, the Web Client is not optimized for managing web content. It is so weak that customers prefer using the CIFS interface to navigate the repository as a simple file system.

Structured Content: Structured content types can be edited through forms generated by the Chiba XForms engine. There is less control over the form generation than in Hippo.

Configuration Management: Developers have the option of using an external source code control system or using the repository for version management. Virtualization is useful for spot testing code. New deployment functionality makes it easier to deploy code and content to the delivery tier.

Customization Layer and API: Alfresco is very clear that its customization layer and its licensing prevent users from re-compiling any of the core code. Customization of Alfresco is done through writing presentation templates, developing modules (AMPs), and adding JARs that override default behavior. The Spring IoC framework allows you to wire in code.

Delivery Tier Flexibility: Alfresco does not come with its own delivery tier. Developers can use its Freemarker or XSLT engine to transform XML content when it is saved for static content delivery. Most customers build their own dynamic delivery tiers that read XML either deployed to a file system or from the Alfresco repository. The PHP and REST APIs are also useful for building dynamic delivery tiers. Alfresco also allows you to virtualize and deploy your presentation tier code.

Widely Used Technologies: Alfresco is a Java programmer's dream. It uses all the technologies that a developer either knows or wants to learn.

Project Transparency: Sometimes communication is open and candid; other times it is more marketing hype. To get the straight dope, work with a good systems integrator, read the wiki, and get to know other customers.

Books: Alfresco Enterprise Content Management by Munwar Shariff. Useful as an introduction to the platform but does not cover WCM or many advanced configuration and extension topics.

Online Doc: The formal documentation has adequate coverage of some of the primary topics. The wiki has a lot of information but it is not particularly well organized. Better search would tie everything together.

User Forums: The online forum is monitored by Alfresco staff. Still, responsiveness is just average.

Key: Nonexistent; Below Average; Average; Above Average; Exceptional.

Alfresco is an ideal platform if you really want to build your own CMS but don't trust yourself to get versioning, deployment, and workflow right the first time. The strength of this product is in the architecture, the APIs, and the repository; certainly not in the user interface. It provides a higher starting point than a naked JCR implementation like JackRabbit, but comes at the cost of some lock-in. One could think of Alfresco as the Zope [http://zope.org] of the Java world: an elegant technology waiting for a nice UI (although Alfresco is more aesthetically pleasing than the Zope Management Interface). In the case of Zope, many companies built elaborate custom systems on the platform; later, Plone came along and became the Zope application for mainstream content management. Because of Alfresco's licensing and release model, it is doubtful that a third party Plone-like application will appear, and the Alfresco team has not shown the interest or commitment to build one of its own. However, since Java is a much more mainstream technology than Python, it is not clear that Alfresco needs to be more than a great development framework.

Alfresco is seeing an impressive amount of traction: just six months after releasing what many describe as an alpha- or beta-quality WCM product, it has several big-name customers going live. In content management circles, Alfresco has built name recognition that preexisting open source products, and many commercial products, will never achieve. As an open source product company, Alfresco is continuing to evolve. It is becoming more actively engaged in the community and has some real evangelists among customers and systems integrators. Alfresco could probably stand to hire more people with open source backgrounds to check commercial software company instincts, which tend toward closed internal communication and processed marketing messaging.

Hippo CMS 6.05.02


Abstract
Hippo CMS is developed by Dutch software company Hippo B.V., which sells support, training, and services for Hippo CMS and its portal project, Hippo Portal. Hippo is probably the most versatile and well managed of the Cocoon-based Java WCM systems. It has a broad European install base with particularly good penetration in the publishing sector. Use in North America is rare to non-existent, although the team is actively building partnerships with U.S.-based systems integrators that they know from involvement with the Apache community.

Hippo's de-coupled architecture consists of a repository and a management user interface. Customers build their own delivery tiers that can read content from the repository over the WebDAV protocol. This architecture makes Hippo very flexible, useful in different deployment scenarios, and easy to use as a content feed for other applications. The fact that content is not managed through the delivery tier makes Hippo acceptable when security is a concern. Although Hippo does not support an in-context editing environment, the management user interface is simple and clean enough for casual users.

Hippo is on the verge of a major re-build that will replace the Jakarta Slide-based repository with a repository built on the Apache JackRabbit implementation of the Java Content Repository (see Glossary for JCR) and replace the management application with a new application based on Apache Wicket. The latter is very good news for those Java developers who like the breadth of features but are justifiably concerned about the complexity of the Cocoon development framework. Fortunately, Hippo has established a sufficient abstraction layer so that customizations made to the current Hippo have a chance of being portable to the new Hippo management application. The prospects for porting newer Hippo implementations will be considerably better than for older implementations, but only experience will tell how easy it will be.

Project Overview
Table 3.14. Hippo CMS Project Overview

Web site: http://www.hippocms.org
Project Inception: 2000
Current Version: 6.05.02 since December 2007.
Project Type: Commercial: support based.
Licensing Options: Apache 2.0
Geography: Hippo B.V. is headquartered in The Netherlands. The install base is currently limited to Europe.
Common Uses: Managing large web sites such as media and publishing sites.
Sample Customers: ABN Amro. VNU/Incisive [http://www.vnunet.com/] uses Hippo for all of its publications. The Dutch Ministry of Finance [http://www.minfin.nl/nl/home] web site is running on Hippo.
Frameworks and Components: Apache Avalon, Apache Batik, Apache Cocoon, Apache FOP, Apache Geronimo, Apache Lucene, Apache Slide, Excalibur Fortress, Hypersonic DB, Jetty, Jgroups, OpenJMS, OSWorkflow, Spring
Integration Standards: WebDAV, DASL, LDAP
Application Servers: Jetty (default), Tomcat (also commonly used), JBoss, Weblogic, Websphere
Databases: Hypersonic (default). MySQL, Postgres, Oracle, and Microsoft SQL Server

History
The Hippo CMS project and its Apache 2.0 licensed code repository received visibility with the initiation of the HippoCMS.org community web site in 2005, after five years of being used for custom implementations by Dutch content management software company Hippo B.V. During that time, the technology was known within the Apache Cocoon community but had very little visibility with the general public. This changed with the release of version 6.02.00 of the platform and the 1.0 release of Hippo Repository, which is based on Jakarta Slide; prior to that release, Hippo sat on top of the XML database X-Hive (recently acquired by EMC). Since moving to a public community model, Hippo B.V. has shifted from a consulting/software company to a pure software company offering Hippo CMS along with closely aligned Portal and Document Management products.

Hippo has an impressive client list in Europe. One of the premier customers is Incisive Media, which publishes several web sites on the platform including VNUNet.com and CRN UK. Hippo has not been sighted in North America, but the company is exploring relationships with North American integration partners and leveraging its relationships within the Apache community.

Architecture
Like Daisy, Hippo has a componentized architecture where the repository server is separate from the management application. Hippo takes the de-coupling one step further by pulling delivery services into a separate tier that can be easily replaced with another delivery application. This separation of concerns makes Hippo attractive for large "architecturally pure" applications, such as large scale digital publishing.

The management tier is built on top of Apache Cocoon and most Hippo implementations use Cocoon in the delivery tier, although that is not a requirement. Hippo offers a generic Java repository client that would allow any Java web application framework to access Hippo Repository. The standards-based repository would also be open to other technology platforms such as PHP or Ruby, although no known sites are configured this way. The use of Cocoon on the front end does make sense, though, because the Hippo architecture is so XML centric and an affinity toward Cocoon is what attracts many developers to the Hippo platform in the first place. However, those who are intimidated by Cocoon will be interested to know that Hippo is exploring alternative web application frameworks for the next major release (called Hippo ECM 1.0), which bundles a new repository based on Apache JackRabbit and a rewrite of the Apache Cocoon-based management application in Apache Wicket.

Figure 3.50. High Level Hippo Architecture Diagram

Hippo has a three-tiered architecture consisting of a management tier, an open repository, and a de-coupled delivery tier.

Today, Hippo Repository is based on Jakarta Slide [http://jakarta.apache.org/slide/] and therefore supports the WebDAV [http://www.webdav.org/] standard: an extension of the HTTP standard that allows users to collaboratively edit and manage files on remote web servers (Slide is the reference implementation of WebDAV. See Glossary for WebDAV). The Hippo team contributed a considerable number of improvements to the Slide project and also built on top of it to make a scalable and full featured repository. In addition to the WebDAV standard, the Hippo repository supports some custom methods such as "Replace" (used for a search and replace feature) and a "Facets" method (used for a faceted browsing feature that is supported by the query syntax, but not exposed through the user interface). While Hippo Repository provides versioning, search, and access control services, it is less functionally rich than Daisy's and Alfresco's repositories.

At the end of the day, every piece of content in the Hippo Repository is just a file. Slide's WebDAV implementation does support the notion of "properties" that can be used for metadata. To use this feature, you need to write custom "extractors" that parse through the file, grab the appropriate data, and store them as properties. Doing so allows you to query the repository using the DAV Searching and Locating (DASL) standard [http://www.webdav.org/dasl/]. Out of the box, Slide comes with extractors to grab the text out of MS Word, Powerpoint, Excel, and PDF. Hippo comes with some useful base extractors that can easily be configured to meet most basic needs. For example, the XMLDatePropertyExtractor can be configured to grab a date element out of an XML document.
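To make the extractor idea concrete, here is a minimal sketch of what "grab an element out of an XML document and store it as a property" amounts to. It uses only the JDK's XML and XPath classes. The class name, the property name, and the XPath expression are all hypothetical; this is not the Slide or Hippo extractor API, whose interfaces are not described in this report.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical stand-in for a property extractor. NOT the Slide or Hippo
 * extractor interface; it only illustrates pulling values out of a stored
 * XML file so they can be kept as queryable properties.
 */
final class XmlPropertyExtractorSketch {

    /** Maps a property name to the XPath expression that locates its value. */
    private final Map<String, String> propertyToXPath;

    XmlPropertyExtractorSketch(Map<String, String> propertyToXPath) {
        this.propertyToXPath = new LinkedHashMap<>(propertyToXPath);
    }

    /** Parses the document and returns the non-empty extracted values. */
    Map<String, String> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> properties = new LinkedHashMap<>();
            for (Map.Entry<String, String> rule : propertyToXPath.entrySet()) {
                // evaluate(String, Object) returns the string value of the match
                String value = xpath.evaluate(rule.getValue(), doc).trim();
                if (!value.isEmpty()) {
                    properties.put(rule.getKey(), value);
                }
            }
            return properties;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not extract properties", e);
        }
    }
}
```

Configured with a single rule such as "publicationDate" mapped to "/article/date", the sketch returns that date as a property for any article that contains the element and returns nothing for documents that do not.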


DASL is an XML syntax that follows the basic form of a SQL query with clauses that define what fields to return ("select"), what types to look in ("from"), the filter conditions ("where"), ordering rules ("order by"), and the number of records to return ("limit"). As query languages go, DASL is somewhat esoteric and unknown when compared to SQL, XQuery, and the new JCR Query syntax. Still, it is a standard and that counts for something. Slide implements DASL natively, but Hippo uses a more powerful Lucene based implementation that is faster and can search a broader collection of content because it searches an index rather than opening each XML asset.

Hippo Repository uses Slide's pluggable persistence model. The default configuration uses a simple file system. Hippo's developer documentation describes MySQL, Oracle, Microsoft SQL Server, and PostgreSQL configurations. Hippo Repository supports replication that mirrors content to other repository instances. Based on mailing list traffic, this configuration is fairly common in the field. In particular, replication is often used to push content from the published area of the repository to a collection of read-only Hippo Repository instances on the delivery tier. There is also an option to create a cluster of repositories reading from the same database.

The next major release of the Hippo Repository (2.0, which will be part of the ECM 1.0 product) is going to be a total rewrite based on Apache JackRabbit [http://jackrabbit.apache.org]. JackRabbit is a more capable content repository and is able to represent content as hierarchical structured nodes. The Hippo team is openly discussing interesting ways to leverage and extend the platform effectively. One area of deep inquiry is how the repository will be organized. The JCR spec is inherently hierarchical, but the Hippo team wants to enable faceted organization of content where assets can appear under multiple different collections.
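The shape of the DASL query described earlier in this section can be sketched as a small builder that assembles a DAV:basicsearch request body. The element names follow the basicsearch grammar as standardized for WebDAV SEARCH, where the "from" clause is expressed as a scope (a collection href and a depth). The class name, the example path, and the choice of properties are assumptions for illustration; the properties that a Hippo repository actually indexes depend on its extractor configuration.

```java
import java.util.List;

/**
 * Illustrative builder for a DAV:basicsearch body. Property names are
 * passed in by the caller and are assumed to live in the DAV: namespace.
 */
final class DaslQuerySketch {

    static String basicSearch(List<String> selectProps, String scopeHref,
                              String whereProp, String whereLiteral,
                              String orderByProp, int limit) {
        StringBuilder xml = new StringBuilder();
        xml.append("<D:searchrequest xmlns:D=\"DAV:\">\n");
        xml.append("  <D:basicsearch>\n");
        // "select": the properties to return for each match
        xml.append("    <D:select><D:prop>");
        for (String prop : selectProps) {
            xml.append("<D:").append(prop).append("/>");
        }
        xml.append("</D:prop></D:select>\n");
        // "from": the collection to search and how deep to go
        xml.append("    <D:from><D:scope><D:href>").append(escape(scopeHref))
           .append("</D:href><D:depth>infinity</D:depth></D:scope></D:from>\n");
        // "where": a single equality test, the simplest possible filter
        xml.append("    <D:where><D:eq><D:prop><D:").append(whereProp)
           .append("/></D:prop><D:literal>").append(escape(whereLiteral))
           .append("</D:literal></D:eq></D:where>\n");
        // "order by": newest or largest first
        xml.append("    <D:orderby><D:order><D:prop><D:").append(orderByProp)
           .append("/></D:prop><D:descending/></D:order></D:orderby>\n");
        // "limit": the maximum number of results
        xml.append("    <D:limit><D:nresults>").append(limit)
           .append("</D:nresults></D:limit>\n");
        xml.append("  </D:basicsearch>\n");
        xml.append("</D:searchrequest>\n");
        return xml.toString();
    }

    /** Escapes the characters that are unsafe in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A client would send the resulting body to the repository with the WebDAV SEARCH method. The verbosity of even this one-condition query is a fair illustration of why DASL feels esoteric next to SQL.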
The DASL query syntax will be replaced by the JCR's own query syntax, which is already starting to enjoy adoption by companies that don't even support the rest of the JCR standard. For example, the commercial WCM product Percussion Rhythmyx uses JCR query syntax to retrieve content from its non-JCR repository.

Hippo has very good support for structured content types. Edit forms are auto-generated using Cocoon's CForms technology (See Glossary for CForms) based on a content definition described in an .XSD file, a layout.xml file that describes the selection and organization of form controls (Hippo provides a comprehensive list of form field widgets) and rules for showing them, and a business_logic.xml file that contains validation rules that can be defined as assert statements or regexp expressions. The business logic syntax also has some built-in rules that can be applied, such as that the field is required, that the entered value needs to be a valid email address, or upper and lower character limits. Each content type can have its own style sheet (CSS) to control layout and styling. There is also a properties file that defines metadata stored outside of the asset as properties that can be used in queries against the repository and shown in the user interface.

By editing these files, a developer can build complex and attractive content entry forms without needing to understand Cocoon's CForms, generally considered difficult to master. Another benefit of this abstraction layer is that it will help customers migrate to ECM 1.0 without needing to port logic written in XSL or other Cocoon code. The one thing missing is AJAX-based form controls. AJAX controls are slowly working their way into the platform, but will not be fully exploited until the ECM 1.0 release, which will also benefit from the JCR's fine grain editing model. The ability to update individual nodes within an XML document will enable features like micro-edits (where you can edit a document one field at a time) for a richer, more dynamic user interface.
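The built-in rule kinds mentioned above (required, valid email address, character limits, and regexp checks) can be pictured with the following sketch. It is written in plain Java for illustration only; in Hippo these rules are declared in business_logic.xml, whose syntax is not reproduced in this report, so every name below is hypothetical. The email pattern is deliberately loose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical per-field rule set mirroring the rule kinds named in the text. */
final class FieldRulesSketch {

    // Loose illustrative pattern: something@something.something
    private static final Pattern EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    private final boolean required;
    private final boolean email;
    private final int minLength;
    private final int maxLength;
    private final Pattern regexp; // optional custom rule, may be null

    FieldRulesSketch(boolean required, boolean email,
                     int minLength, int maxLength, Pattern regexp) {
        this.required = required;
        this.email = email;
        this.minLength = minLength;
        this.maxLength = maxLength;
        this.regexp = regexp;
    }

    /** Returns one message per violated rule; an empty list means valid. */
    List<String> validate(String fieldName, String value) {
        List<String> errors = new ArrayList<>();
        String v = value == null ? "" : value.trim();
        if (v.isEmpty()) {
            if (required) {
                errors.add(fieldName + " is required");
            }
            return errors; // the remaining rules only apply to non-empty input
        }
        if (v.length() < minLength) {
            errors.add(fieldName + " must have at least " + minLength + " characters");
        }
        if (v.length() > maxLength) {
            errors.add(fieldName + " must have at most " + maxLength + " characters");
        }
        if (email && !EMAIL.matcher(v).matches()) {
            errors.add(fieldName + " must be a valid email address");
        }
        if (regexp != null && !regexp.matcher(v).matches()) {
            errors.add(fieldName + " does not match the expected format");
        }
        return errors;
    }
}
```

The appeal of Hippo's declarative approach is that a developer gets this behavior by editing an XML file rather than by writing and compiling code like the above.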


Figure 3.51. Hippo Architecture Diagram: Forms Generation Architecture

Diagram: Hippo forms generation. Source: Hippo Architecture documentation.

Typical Hippo implementations publish to multiple front end instances (usually based on Cocoon). Many Hippo customers use a single instance of Hippo to publish content to several different web sites that may share content. Hippo provides a skeleton and a sample web site based on Cocoon. There is also a code sample of a simple Java class reading from the Hippo repository over WebDAV. A Java client library encapsulates communication with the repository for integration with other Java platforms.

Like most Cocoon applications, Hippo web sites use elaborate caching techniques to achieve performance. The caching system, called eventcache, uses JMS to receive notifications of which cached objects to invalidate. Hippo Repository bundles the open source JMS server OpenJMS [http://openjms.sourceforge.net/]. Cocoon sites subscribe to this service to listen for invalidation events (delete, add, change, or move). This mechanism has been implemented in the Java adaptor as well so that non-Cocoon Java presentation tiers can also benefit from Hippo's caching system.

The binary distributions of Hippo CMS and Hippo Repository come bundled with the Jetty servlet container that is executed from within an Excalibur Fortress container. This is a common pattern among Cocoon technology projects. Fortress is part of the Apache Avalon Framework project. Avalon as a parent project is officially closed after a couple of years of drift. The sub projects, like Excalibur, live on and are used by other Apache technologies like Cocoon, although not happily. Projects that have the resources and initiative are migrating to trendier technologies like Spring. Nevertheless, Fortress does what it needs to do for most Hippo implementations. Others have successfully deployed the framework on top of Tomcat, but this configuration requires the addition of a JMS provider like ActiveMQ.
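The event-driven cache invalidation described above can be sketched without a JMS dependency by treating each notification as a plain method call. The four event types come from the text; everything else (the class name, keying the cache by repository path, and evicting the parent folder when its listing changes) is an assumption made for illustration and is not taken from Hippo's eventcache code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical delivery-tier cache that is invalidated by repository events. */
final class EventCacheSketch {

    enum EventType { ADD, CHANGE, DELETE, MOVE }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final AtomicInteger loads = new AtomicInteger();

    /** The loader stands in for an expensive fetch-and-render step. */
    EventCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for a path, loading it on a cache miss. */
    String get(String path) {
        return cache.computeIfAbsent(path, p -> {
            loads.incrementAndGet();
            return loader.apply(p);
        });
    }

    /** In a real deployment this would be called from a message listener. */
    void onEvent(EventType type, String path) {
        // Evict the affected item and anything cached beneath it.
        cache.keySet().removeIf(k -> k.equals(path) || k.startsWith(path + "/"));
        // Adding, deleting, or moving an item also changes its folder listing.
        if (type != EventType.CHANGE) {
            int slash = path.lastIndexOf('/');
            String parent = slash > 0 ? path.substring(0, slash) : "/";
            cache.remove(parent);
        }
    }

    /** Number of cache misses so far, useful for observing the effect. */
    int loadCount() {
        return loads.get();
    }
}
```

The design point is that entries never expire on a timer: content stays cached until the repository says it changed, which is what lets a delivery tier cache aggressively without serving stale pages.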

Content Contribution
The Hippo content model recognizes two high level content types: Assets and Documents. Documents are structured content types (XML documents, really) and Assets are binary files. Assets are uploaded into the system using the web based interface. Documents are edited with web forms. It is possible (although not advised) to upload content directly into the repository through WebDAV. Doing so bypasses the business rules and other operations that Hippo CMS would otherwise execute. Primitive WebDAV clients, like Windows Explorer, tend to trample properties and other advanced WebDAV data structures. The Hippo team recommends using the WebDAVPilot plugin for Eclipse when administering the repository, even though it is no longer being developed or supported.

The Hippo Repository is organized in a hierarchical directory structure that can be used to drive the navigation of the site or for internal purposes only. Many Hippo sites use keywords and taxonomy to drive navigation. The "list item" form widget can read an XML document that describes a node tree representing a hierarchical taxonomy managed by either Hippo or externally. However, customizing the search and browsing features in the management interface to traverse the taxonomy rather than the folder structure is less trivial.

Figure 3.52. Hippo Screenshot: Taxonomy Browser

The "list item preview" widget is useful for selecting hierarchical taxonomy terms for a document.

Hippo's decoupled architecture creates a clear distinction between the back end and the front end. All of the content contribution and management is done in the back end management interface. However, some customers have implemented a "surf to edit" feature by placing an edit button on every page of the delivery tier that points to the appropriate page of the management interface.

The management interface makes heavy use of frames and IFrames rather than AJAX technology for an interactive feel. Browser support is limited to Firefox and IE, but most business users find the interface to be clean and simple. It is organized into tabs called "perspectives," which most techies will recognize from the Eclipse IDE. Business users tend to understand the concept of tabs well enough so they don't have to worry about understanding what a "perspective" is. What makes the Hippo perspectives interesting is that they are stateful - that is, when you click on another tab, the UI remembers what you were doing on the tab that you left. This is especially useful in the case of the "Editing" perspective that is used when editing a piece of content (Document or Asset). It doesn't lose the user's changes when he navigates to other perspectives in the UI. The other perspectives are: Dashboard, Search, Documents, and Assets. More perspectives can be added to manage other functions or view data from other systems.

The Dashboard perspective is used for various administrative tasks as well as for displaying custom reports. Developing reports requires a considerable amount of skill and experience. Out of the box comes a "to do" list that lists all content submitted for publishing. The temptation to add notifications and summaries to this perspective should be balanced against the efficiency of a lightweight start-up page. The Dashboard is also where users edit their profile and, if authorized, manage users, groups, and permissions.

The on-board search engine can be accessed through a persistent search box in the upper right corner of the page and through the Search perspective, which provides an advanced search interface and the results. Out of the box, the Search interface presents options to search by boolean expressions, workflow state, date, location, and user. Out of the box, there is no ability to do fielded searches (as in the keyword field contains "food") or restrict by content type (you can only choose among folders, documents, or assets). There is, however, a handy search and replace feature that allows a user to replace text inside documents that were returned by the search. Adequate warnings - and the fact that it only changes returned documents - make it reasonably safe for non-technical users. Search and replace queries cannot be restricted to specific elements within the XML documents and do not work on metadata properties outside of the XML documents.

The Hippo search perspective exposes advanced search functionality. The search functionality is very extensible by adding extractors to the repository that pull content attributes into indexed properties, and editing the appropriate XML files that control the search form and the layout and behavior of the results.

Content contributors spend most of their time working in the documents tab where they can navigate through folder structures of content and select items to edit. The Documents perspective has a context sensitive right column that shows functions and actions available to the user based on the selected asset (folder or document), its publishing state, and the user's permissions. A properties box on the lower right can be configured to edit metadata properties on assets. By default, this only controls the "caption" attribute which is usually identical to the file name (without the extension).

Figure 3.53. Hippo Screenshot: Document Browse Interface

The "Documents" perspective allows users to navigate through folders of structured content and surfaces functions and actions that the user can execute on the selected documents.

Documents are listed with basic information including the name of the document (also called the "caption"), the size, document type, modified date, and a workflow status that is communicated through icons. Although not obvious, the grid can be sorted by clicking on the column headings. Clicking on the "order" column enables sorting arrows to move items up and down on the list and control the order in which the assets are listed on the presentation tier. Developers can customize the browsing interface by editing XML files, but these configurations are not well documented and a working knowledge of Cocoon is required.

One of Hippo's more advanced features is its link management functionality. When a user links to another document or asset, even within a WYSIWYG text area, the reference is stored and managed. When a Document or Asset is highlighted in the browsing interfaces, a panel on the right indicates what documents are linked to it. When a user attempts to delete or move the document or asset, he is warned. By default, there is no functionality to update references when assets are moved. However, the fact that the relationships are managed is a good start for building re-referencing options into the warning dialog. Logic could also be added and triggered during workflow transitions to help prevent broken links. Links are discovered by a process that is run periodically on the repository - but not when the asset is saved. On the negative side, the link management is not instantaneous so references can be broken before Hippo is aware of them. However, this approach has the advantage of evaluating content imported into the system and edited by automated processes such as workflow actions.

When a user selects to edit a document, he is automatically placed into the "Edit" perspective where he is presented with the appropriate edit form. As mentioned earlier, Hippo's form building functionality is particularly powerful with a wide selection of form controls and widgets, including pop-ups for calendars and various browsers and pickers. The default WYSIWYG editor is the basic but stable Xinha [http://xinha.webfactional.com/] Javascript text area control. The spell check feature is disabled by default; turning on spell check requires a simple change to one of the Javascript files and installing GNU Aspell on the server (Aspell can be easily installed by most Linux package managers).

Some Hippo installations use the Xopus XML WYSIWYG editor to edit the whole XML asset at once. Xopus is known to be quirky and some users complain of its performance - especially on underpowered workstations. However, some Hippo implementations use Xopus for its ability to semantically tag text within text areas. Doing so enables the presentation tier to execute advanced display rules on the content. This means that, for example, a business periodical may tag companies mentioned in the article and have the presentation tier list all mentioned companies on the bottom of the article, or link the company name to a search of all the articles that mention that company. Some Hippo implementations use locally installed XML editors like XMetaL and Arbortext, and then upload the files.
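The business periodical example above can be sketched from the presentation tier's point of view. The snippet assumes, purely for illustration, that an editor has wrapped each company name in a company element; the element name and the class are hypothetical and would be defined by each implementation's own content model.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical display rule: list every company tagged in an article. */
final class TaggedCompaniesSketch {

    /** Returns the distinct tagged company names in order of first mention. */
    static Set<String> companiesMentioned(String articleXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(articleXml)));
            // "company" is an assumed element name, not a Hippo convention.
            NodeList tags = doc.getElementsByTagName("company");
            Set<String> names = new LinkedHashSet<>();
            for (int i = 0; i < tags.getLength(); i++) {
                names.add(tags.item(i).getTextContent().trim());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Article is not well-formed XML", e);
        }
    }
}
```

Because the tags carry meaning rather than formatting, the same markup can drive either a list at the bottom of the article or a link to a search for each company, without any change to the content.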

Figure 3.54. Hippo Screenshot: Xopus Editor Integration

The Xopus Editor enables users to leverage advanced XML operations.

Hippo's versioning support comes from Slide (the "V" in WebDAV stands for "versioning"). By default, versions are only created when a user clicks the "save draft" link. However, many Hippo implementations are configured to save a draft with every workflow transition. Prior versions are accessible from the "History" link on the actions column that launches a pop-up that shows all the saved versions and allows a user to revert to a prior version. Reversion is done by creating a new current version identical to the one reverted to. This reversion strategy keeps the full history of all versions and is easily reversed. There is no auditing functionality that records every access or update of a document. Only the last modified date and modified user is shown.

Assets, or binary files, can be added in the Assets perspective, where they are available for re-use, or within the context of a document, where they can be uploaded directly from the WYSIWYG editor or through the image picker control. Like Documents, links to Assets are managed by Hippo and users are warned when they try to move or delete them. Linked documents are also shown on the right side "action" column when an image is selected. This view answers a common need to find out where on the site an image is used in order to manage licensing and copyrights on images. Assets are not versioned and have no workflow capabilities.

While not enabled by default, Hippo has a "Trash Bin" functionality that moves "deleted" content to a different area of the repository, rather than truly deleting it. If this feature is enabled, there should be some archiving system to periodically clean out the trash bin.

Development, Configuration, and Administration


Hippo has some solid best practices about extending the platform. The basic idea is to build your own custom project to wrap the Hippo core in. All code stays safe in an extensions directory that does not get over-written when Hippo is upgraded to a later version. In general, Hippo is a nice technology to work in because it is stable and seldom requires a restart to see changes since most of the work is done in XML configuration files rather than compiled code. However, the error messages can be cryptic and difficult to diagnose for a newbie more accustomed to stepping through code with a debugger.

Developers can easily work on their own instances of Hippo and coordinate through a source code management system. This works well because all the code is managed on the file system rather than in the repository. Content from the repository can be copied backwards from the production environments to QA and development using a plugin called Repository Copy. Hippo also offers a Maven plugin (see Glossary for Maven) to build and remotely deploy Hippo site projects.

Hippo's access control model builds off of Jakarta Slide (which is modeled on a common file system) but extends it with workflow-specific permissions such as "publish," compound operations like "Move," and content management specific operations like "save draft." Users and groups can be granted or denied permissions at the folder level only. Permissions are inherited down the folder structure unless overridden. Permissions can also be set at the perspective level. For example, one could restrict users in the art department to only see the "Assets" perspective. The permission system is powerful, but the user interface to manage it is complex and dangerous for a non-technical user to experiment with. Hippo leverages Slide's native LDAP connectivity.
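The rule that permissions are granted or denied per folder and inherited downward unless overridden can be sketched as a walk from the requested path up to the root, stopping at the first folder that has an explicit rule. This is a simplified model with hypothetical names, not Slide's or Hippo's access control code; in particular, denying by default when no rule is found is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of folder-level permissions with inheritance. */
final class FolderPermissionsSketch {

    // Explicit rules keyed by folder, group, and action; true means granted.
    private final Map<String, Boolean> rules = new HashMap<>();

    void grant(String folder, String group, String action) {
        rules.put(key(folder, group, action), Boolean.TRUE);
    }

    void deny(String folder, String group, String action) {
        rules.put(key(folder, group, action), Boolean.FALSE);
    }

    /** The nearest explicit rule on the way up to the root wins. */
    boolean isAllowed(String path, String group, String action) {
        String folder = path;
        while (true) {
            Boolean rule = rules.get(key(folder, group, action));
            if (rule != null) {
                return rule;
            }
            if (folder.equals("/")) {
                return false; // assumption: no rule anywhere means denied
            }
            int slash = folder.lastIndexOf('/');
            folder = slash > 0 ? folder.substring(0, slash) : "/";
        }
    }

    private static String key(String folder, String group, String action) {
        return folder + "|" + group + "|" + action;
    }
}
```

For example, granting "publish" to an editors group on /content and denying it on /content/legal lets editors publish everywhere under /content except the legal sub tree.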


Figure 3.55. Hippo Screenshot: Managing Permissions

Hippo's access control system builds off of the basic permission set provided by Slide. Permissions are closely tied to workflow. New permissions can be created to restrict specific workflow actions to a certain sub tree of the content folder hierarchy. Out of the box, Hippo supports a basic single approval workflow model. Workflow states are communicated through a simple set of icons (X is unpublished, checkmark is published, checkmark with a star means the asset has been updated since publication). Users can request publication or, if granted sufficient permissions, can publish the document directly. The publication interface gives the user the option of publishing immediately or at some later date. The same interface also gives an option of setting an archival date when the document will be removed from the public web site. When a user requests publication, a user with review permissions gets a review task on his "to do" list.
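The out-of-the-box single approval model can be pictured as a very small state machine. The three states correspond to the three icons described above; the class, its method names, and the decision to clear the review task on publication are assumptions for illustration. More complex workflows in Hippo are configured through workflow descriptors rather than written this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of a single-approval publication workflow. */
final class SingleApprovalWorkflowSketch {

    /** X icon, checkmark icon, and checkmark-with-star icon respectively. */
    enum State { UNPUBLISHED, PUBLISHED, PUBLISHED_WITH_CHANGES }

    private final String documentName;
    private final List<String> reviewerToDoList = new ArrayList<>();
    private State state = State.UNPUBLISHED;
    private boolean publicationRequested;

    SingleApprovalWorkflowSketch(String documentName) {
        this.documentName = documentName;
    }

    /** Editing a published document marks it as updated since publication. */
    void edit() {
        if (state == State.PUBLISHED) {
            state = State.PUBLISHED_WITH_CHANGES;
        }
    }

    /** A contributor without publish rights asks a reviewer to publish. */
    void requestPublication() {
        if (!publicationRequested) {
            publicationRequested = true;
            reviewerToDoList.add("Review " + documentName);
        }
    }

    /** Publishes directly; only users with sufficient permissions may do so. */
    void publish(boolean userMayPublish) {
        if (!userMayPublish) {
            throw new IllegalStateException("User lacks the publish permission");
        }
        state = State.PUBLISHED;
        publicationRequested = false;
        reviewerToDoList.clear();
    }

    State state() {
        return state;
    }

    List<String> reviewerToDoList() {
        return Collections.unmodifiableList(reviewerToDoList);
    }
}
```

Scheduled publication and archival dates would add timed transitions to this picture; they are left out to keep the sketch short.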


Figure 3.56. Hippo Screenshot: To Do List

Users with approver permissions get "to do" tasks when a user requests publication.

Permissions are managed within the Dashboard perspective. Other than that, very little can be done through web-based configuration; just about everything else requires a developer. On the plus side, this has the effect of "locking down" the system and closely managing change. However, it also means that most configuration tasks require developer intervention.

More complex workflows can be configured through Hippo's embedded workflow engine: OSWorkflow [http://www.opensymphony.com/osworkflow/]. OSWorkflow is driven by XML based "workflow descriptors." OpenSymphony does offer a graphical workflow designer like JBoss's process designer, but it is not considered production ready and most developers write the XML descriptors by hand. The workflow component uses a database to manage state information and the Quartz scheduler to schedule tasks and jobs. New workflows are developed as "workflow projects" that contain XML descriptors and Java code. Workflows are assigned by content type in the content type definition.

The publishing event is handled internally by copying content from the working area of the repository to a region that the delivery tier reads from. In a distributed, high availability model, this region of the repository is replicated to other repositories that the presentation tier reads from. Staging presentation environments used for preview read directly from the working area of the repository. Unlike Documents, Assets are not subject to Hippo's workflow and are perpetually in a "published" state. This is convenient as it eliminates the common issue of deciding what to do when publishing a Document with links to unpublished images.

For back-up, there are two options. For a hot back-up, the repository replication service can mirror the live repository to a read only copy. There is also the option to do a dump of the underlying MySQL database. This may be more reliable, but it requires taking the system offline. If the delivery tier reads directly off of this repository, taking it down may not be an option. However, high availability sites will run the delivery tier off of a replica of the master repository, not the master repository itself.

Presentation
Hippo provides some best practices and tools for building delivery tiers, but this is generally considered outside of the core product. One of the more useful tools is the Cocoon Project Wizard, which builds a generic Cocoon site that can be used as a starting place. In addition to building out the general directory structure and configuration files for the Cocoon site, the Cocoon Project Wizard also builds out the navigational menus based on the folder structure in your repository. After using the Cocoon Project Wizard and reviewing some basic samples in the Hippo documentation, a developer is best served by moving over to the Apache Cocoon web site and third party technology books to get up to speed on Cocoon. The Apache Cocoon site was just re-launched and is better organized. The professionally published books are good if somewhat behind the latest version (Cocoon 2.2) and the version that Hippo uses (2.1); the latest English language books are on Cocoon 2.0.

Hippo's other product, Hippo Portal, is also a viable option for the delivery tier. Based on Apache Jetspeed-2 [http://portals.apache.org/jetspeed-2/], Hippo Portal talks to the repository via the Java client adaptor. Jetspeed-2 is generally regarded as having higher performance than the other big open source portal product, Liferay [http://www.liferay.com/], but it doesn't have all the AJAX bells and whistles and comes with fewer portlets out-of-the-box. The Jahia project (evaluated next) also uses Jetspeed-2.

The Hippo team is also actively working with other Java frameworks. For example, there is a JSF (See Glossary for JSF) based repository browser prototype available for download. Hippo also encourages customers to use the Java client library to leverage Hippo's content services in any Java application. One of Hippo's larger customers is exploring the idea of breaking out of the Java stack and building a delivery tier on Ruby on Rails.

Figure 3.57. Hippo Screenshot: JSF Repository Browser Demo

An experimental application written in JSF to explore the Hippo repository.


Delivery and Support


All work on the Hippo platform is done by Hippo B.V., a relatively small company of around 30 employees. Hippo B.V. tries to focus exclusively on being a software company with revenues based on providing training, maintenance, and support services. Hippo has a network of 12 European systems integration partners; there is one known systems integration partner in the U.S., but Hippo is in talks with others. Hippo sells different support packages with varying service level agreements including a 24/7 pager support package. In order to support the North American market (especially the West Coast), Hippo B.V. will need to do some re-organization of its support team, whose regular business hours have no overlap with business hours in the Pacific time zone.

While most paying customers go directly to Hippo B.V. for support, the free support provided on the mailing list is quite good (as long as you accept the time zone offset). Hippo employees are responsive and helpful. Non-employees rarely answer questions. The mailing list just started to be archived on Nabble [http://www.nabble.com] and searching old posts is not very easy. This may be why mailing list subscribers are so understanding and answer the same questions over and over.

The developer documentation is in wiki form (using Confluence) and is adequate for basic tasks. More advanced topics and examples, however, are noticeably thin. All content on the wiki is contributed by Hippo employees. Hippo B.V. has just opened up the wiki to public registration. Opening up the documentation to the general community may improve the state of documentation. User documentation is minimal, consisting only of a single 25-page PDF, and the first two pages explain what the terms CMS and XML mean. Hippo offers both developer and user training.

The team actively leverages blogs to communicate their ideas about the platform. CTO Arj Cahn has some very illuminating posts about ECM 1.0 in his blog [http://blogs.hippo.nl/arje]. They have also been very active in the Apache Cocoon and Slide communities for years. It will be interesting to see if they play a similar role in the Wicket and JackRabbit communities once they move the platform over.

Hippo is in a state of transition as it moves from the Cocoon and Slide based 6.x series to the Wicket and JackRabbit based ECM 1.0. It is difficult to predict how the move to a new architecture will affect existing customers and Hippo B.V.'s ability to support them. The team has put forth an ambitious timeline for this move and progress is being made. In addition to incremental improvements to 6.x, ECM 1.0 is being developed and working code is available to install and use (not as a bundled binary but by building off of their source code control system using Maven).


Conclusion
Table 3.15. Hippo 6.05.02 Summary
Contributor Navigation: The tree based navigation is used by several customers to manage very large news oriented web sites. The search engine and interface are also effective. Bulk search and replace is another useful feature; so is dependency management.

Structured Content: The forms engine framework gives powerful control over form layout and user input validation.

Configuration Management: All code is managed on the file system. Hippo offers plugins to copy content from one repository instance to another.

Customization Layer and API: Hippo's model of wrapping the Hippo core in a custom application effectively keeps Hippo and customer code separate. The Java client library allows other Java applications to easily interact with the repository. Extending the management application requires getting into Cocoon pipelines. ECM 1.0 is supposed to have a pluggable architecture.

Delivery Tier Flexibility: Hippo offers complete freedom on the delivery tier. Delivery tiers written in Java have the advantage of a Java client library that encapsulates connecting to the repository and handles cache invalidation notifications.

Widely Used Technologies: Cocoon is no longer popular as a general purpose web application framework. The Hippo team is moving off of it. Everything else is XML and HTTP, which are ubiquitous.

Project Transparency: Hippo B.V. is very open about its vision and progress.

Books: None

Online Documentation: The wiki has a lot of useful information.

User Forums: The mailing list is very responsive. Most replies come from Hippo B.V. staff. Hippo recently started to archive its mailing list on Nabble. It will take a while for the archive to be a useful search tool. In the meantime, use Google against the Mailman archive page (site:http://lists.hippo.nl/pipermail/hippocms-dev/ ).

Score key: Nonexistent; Below Average; Average; Above Average; Exceptional.

As a platform for managing structured content, Hippo has a lot to offer. The product has a highly configurable, feature-rich user interface that has demonstrated its effectiveness in large content volume scenarios. Versioning, dependency management, and workflow are all nicely handled. The flexibility on the delivery tier allows architects to select the most appropriate technology to support desired visitor facing functionality.

As a technology stack, Hippo's move from Slide and Cocoon to JackRabbit and Wicket introduces a certain degree of risk. It is too early to start using ECM 1.0 and there is no way to tell how hard it will be to migrate to ECM 1.0 when it is ready. While JackRabbit is now a top level project and has proven itself to be stable, reliable, and fast, Wicket is still pretty new. Not many Java developers have used it although there are books on the framework and, like many new things, Wicket has generated a lot of buzz. Mitigating the risk of the migration is that Hippo B.V. has several high profile clients on the platform and Hippo's success as a company depends on keeping them happy. However, one never knows how these major ports will turn out and there are plenty of examples of success and failure. Moving off of the Cocoon framework was a good decision and will bring the many benefits of a modern web application framework and allow Hippo to offer a more modular and more accessible architecture.


Jahia Enterprise 5.0


Abstract
Jahia (pronounced J-Ya) is designed to span across portal, web content, and document management categories with an emphasis on usability and openness to integration. With its live chat interface and a new calendar plugin, Jahia's feature set would appeal to a company looking for an all-in-one collaboration platform. However, the product is also frequently used for externally facing content driven web sites. Jahia practices a tiered product model with a slimmed down free, visible source "badgeware" version called the Community Edition and some other commercially licensed visible source distributions: The Standard, Professional, and Enterprise Editions. Unlike other products in this category (Magnolia and Alfresco), production use of the Community Edition is not discouraged and Jahia offers support packages on the Community Edition. For the last five years, Jahia has been successful in gaining adoption by some large systems integration partners and customers. Architecturally, Jahia rides on the shoulders of many other open source projects including the Apache portal projects Jetspeed2 and Pluto, Lucene, Struts, Slide, and Hibernate. Jahia scales to support high traffic levels through a strategy of caching and clustering. It allows for authoring and publication from a single environment but also provides the ability to deploy a staged architecture of separate contribution and delivery tiers. The development model for customizing Jahia is designed around simplicity, efficiency, and directness, but may seem unconventional to traditional Java and CMS programming models.


Project Overview
Table 3.16. Jahia Enterprise Project Overview

Web site: http://www.jahia.org
Project Inception: 1998
Current Version: 5.0 since September 2006
Project Type: Commercial: tiered product model.
Licensing Options:
  Community Edition: Jahia has a limited functionality community edition licensed under a derivative of the Mozilla Public License with a "badgeware" addition.
  Commercial Editions: The more full-featured Standard, Professional, and Enterprise Editions have what Jahia calls a "Sustainable Source" license that is essentially a visible source commercial software license that provides rebates for code contributions that meet specific requirements.
Geography: Jahia Ltd. is headquartered in Switzerland with a U.S. regional office in Washington D.C. The install base is concentrated in Europe.
Common Uses: Large media sites, corporate intranets, corporate web sites
Sample Customers:
  Vodafone Live (Germany) [http://www.vodafone.de/vodafonelive.html] runs on Jahia.
  The Polytechnic School of Lausanne [http://www.unil.ch/] runs 500 web sites on Jahia.
  Generali Proximité [http://www.generali-proximite.fr/] runs on Jahia.
Frameworks and Components: Apache Pluto, Apache Jetspeed2, Apache Lucene, Apache POI, EHCache, FCKeditor, Hibernate, OpenJMS, Spring Framework, Struts, Zimbra AJAX libraries
Integration Standards: JSR 168, JSR 170 (partial: only for importing and exporting content via XML), LDAP, SOAP style API
Java Support: 1.4, 1.5
Application Servers: Tomcat, JBoss, WebSphere (6.1), Weblogic (8 SP5)
Databases: HyperSonic, MySQL, MS SQL Server, Oracle, PostgreSQL

History
The Jahia product was originally built in 1998 by a venture funded, Swiss-based company (called Xo3) and sold as a closed-source proprietary product. Jahia was designed to address the overlap and integration between portals and web content management. While the two product categories (Portal and WCM) have been converging from their relative starting points, Jahia approached the problem from the middle by building off of components from both sides of the spectrum. After a management buyout in 2002, the product was re-released under an open source strategy. Since then, Jahia has benefited from increased visibility and also the rapid distribution effect of open source. Free downloads lower the hurdle for companies to try their product and reduce the cost of sales.

Jahia Ltd., which owns and maintains the product, has kept focus on being a software company and leaves the services business to integration partners. All development of the core platform is done by the Jahia team of 30 full time staff members, 25 of whom are engineers or architects. The Jahia project has no outside committers, although Jahia Ltd. regularly hires programmers (either on a contract or permanent basis) that are committers on some of the projects that Jahia builds off of. Jahia International staff are also actively involved in other open source projects. For example, Jahia's co-CTOs (Serge Huber and Thomas Draier) are committers on the Apache JackRabbit and Apache Slide projects, respectively.

Jahia Solutions Group has been successful in working with integration consultancies in Europe and has a number of partners in its alliance program including Cap Gemini and Fujitsu. Jahia has been implemented for some large, high profile web sites. One of their leading clients is Vodafone Live in Germany. Jahia has recently started to build momentum in North America after establishing a regional sales office and a couple of R&D centers. Recent U.S. wins include United Nations, Abercrombie and Fitch, Virgin America, and Garmin.

There has been considerable discussion as to just how "open source" Jahia actually is given that its flagship products (Jahia Standard, Professional, and Enterprise Editions) carry an essentially commercial software license and the Community Edition is released under a non-OSI-certified Jahia Common Development and Distribution License (JCDDL). The JCDDL is based on Sun's Common Development and Distribution License (CDDL), which was derived from the Mozilla Public License and approved by the OSI in January 2005. The Mozilla license is fairly permissive about re-distributing bundled works and Sun's version has some modifications to make it more patent friendly.
However, the JCDDL adds a requirement to display a "Powered by Jahia" badge on every page of the sites. Software distributed with this description is often derisively called "badgeware" and may not be acceptable for external web sites that do not want to advertise for Jahia. The "powered by" logo may not be a problem for less visible sites like a corporate intranet. However, unlike Magnolia and Alfresco, the commercial support packages are available for the Jahia Community Edition. With its tarnished open source pedigree, why is Jahia covered in this report? There are a few reasons for inclusion. First, the licensing may change. Second, two of the other commercial open source Java WCM platforms covered in this report (Alfresco and Magnolia) require (or at least strongly encourage) the use of the commercially licensed versions of their platform. Third, customers use open source for different reasons. While this licensing model may turn away an open source purist, using Jahia still provides a couple of open source benefits such as its use of open source libraries and components and shipping the source code with the commercial versions of the product. The "Jahia Sustainable Source License" (JSSL) that the commercial versions are sold under is a commercial license with some interesting nuances. First of all, the source is viewable by anyone, not just Jahia customers. This is more transparent than the typical small commercial independent software vendor and consultingware practice of making source code available to customers to reduce the risk of adopting a proprietary technology. Larger software companies maintain code escrow programs where a customer can access source code (for a price) in the event that the software company folds or ceases to support the product. What is most interesting about Jahia's commercial license is that they give credit off the license fees to customers that underwrite extensions or enhancement of the product. 
The contributions program is closely controlled. In order to qualify for a credit, the enhancement has to be on the Jahia roadmap and it has to be implemented by the Jahia team itself or by another certified Jahia partner working closely with the Jahia engineering team. These policies help Jahia set the direction of the application, control the quality of the contributions, and avoid non-compatible forking of the software. The benefit to the underwriting company (beyond the licensing discount) is getting more of a say in the details of the design and implementation. Having the desired feature integrated in the software also reduces maintenance risk for the customer.

Architecture
Jahia is built on an open source software stack that includes Hibernate and a large collection of Apache projects: Struts, Slide, Jetspeed-2, Pluto, and Lucene. Struts provides a strong MVC framework for the presentation tier that combines content management, presentation, and system administration in one user interface for an in-context editing and management experience. Presentation templates are written in standard JSP with support for JSTL and for the JEXL and Struts EL expression languages. Portal functionality and support for the Java portlet specification (See Glossary for JSR168) comes from the Apache Portals projects Jetspeed2 and Pluto. Pluto gives Jahia the ability to embed any third party portlet that meets the JSR 168 specification. The portal layouts and profile management functionality come from Jetspeed2.

The overall platform is organized and managed in a collection of sub-projects: Enterprise Content Management Server, Document Management Server, Search and Indexing Server, Corporate Portal Server, Collaborative Suite, Business Process Management Server, and Cache Proxy Server. With the exception of the Proxy Server, which is not available on the Standard version, all of these components come with all the various versions of the platform. The distinctions between the different versions are mainly at the discrete feature level.

Jahia's use of Hibernate for database abstraction makes it compatible with most relational database management systems. Although the product ships with an embedded Hypersonic database, it is highly recommended to swap this database out for a more robust RDBMS in any production instance because Jahia is a database intensive application. The content repository leverages Apache Slide for WebDAV support (especially useful for Jahia's document management functionality) and an event model that can fire 30 different types of events when content is moved or edited.
In an upcoming release, Jahia will replace Apache Slide with Apache JackRabbit which, in addition to maintaining WebDAV support, will make Jahia fully JCR compliant (JSR 170 and eventually JSR 283). For now, the JCR API is only supported for importing and exporting content in XML format.

Architects familiar with upper tier WCM products will appreciate Jahia's multi-stage architecture: it can run on a single stand-alone server or on a clustered and tiered architecture. In the tiered architecture, different instances are designated for code development, content production and preview, and live publishing. A publishing framework pushes content from the staging environment to the production environment. For high-traffic sites and maximum availability, Jahia recommends a clustered configuration with one node dedicated to activities like publishing and indexing. In this configuration, Jahia servers broadcast the cache update requests to all the other clustered servers. Nodes can be added to the cluster without a system restart for rapid scaling to high traffic volumes. Communication between nodes in a cluster and other notifications uses multicast/UDP (user datagram protocol) managed by the embedded JGroups framework. The clustered deployment model is only available on the Enterprise Edition.
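The cache update broadcast described above can be pictured with a small in-process sketch. It is not Jahia code and does not use JGroups: the class and method names are hypothetical, and a direct method call stands in for the multicast message that a real cluster would send. It is written in current Java rather than the 1.4/1.5 releases the product targets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-process stand-in for a cluster: each node has a local cache and notifies its peers of changes. */
class CacheInvalidationSketch {

    /** Hypothetical node; a real deployment would send the notification over the network. */
    static class Node {
        private final Map<String, String> localCache = new HashMap<>();
        private final List<Node> peers = new ArrayList<>();

        /** Registers two nodes as peers of each other. */
        void join(Node other) {
            peers.add(other);
            other.peers.add(this);
        }

        void cache(String key, String value) {
            localCache.put(key, value);
        }

        String cached(String key) {
            return localCache.get(key);
        }

        /** Stores the new value locally, then tells every peer to drop its stale copy. */
        void update(String key, String value) {
            localCache.put(key, value);
            for (Node peer : peers) {
                peer.onInvalidate(key);
            }
        }

        void onInvalidate(String key) {
            localCache.remove(key);
        }
    }

    public static void main(String[] args) {
        Node publisher = new Node();
        Node delivery = new Node();
        publisher.join(delivery);

        delivery.cache("page:10", "old markup");
        publisher.update("page:10", "new markup");

        // The delivery node no longer holds the stale entry and would reload it on the next request.
        System.out.println("delivery cache entry: " + delivery.cached("page:10"));
    }
}
```

The sketch invalidates rather than replicates: peers drop the entry and rebuild it on demand, which keeps the broadcast message small. Whether Jahia sends invalidations or full updates is not stated in the text.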


Figure 3.58. Jahia Architecture Diagram: Distributed Architecture

Jahia has a distributed architecture consisting of multiple environments for developing code, creating and previewing content, and production publishing. Source: Jahia documentation.

Interestingly, content types are defined in the same JSP templates used to display the content. While this breaks the conventional wisdom of having lean view code that is easily managed by HTML developers, there are some practical aspects of this design. First, it means that all aspects of defining and displaying content are handled in one place. When you add an attribute to a content type, you usually want to edit it and display it; in a more traditional CMS, you go into the content definition system (either a config file, a database table, an administration interface, or all of the above), build a form to edit the content, then go into the view templates and add code to display the attribute. Here you do it in one place (or two places very near each other). Another reason why this is appealing is that because the work is done in a JSP, nothing needs to be manually recompiled or restarted even though this is a Java based CMS. The changes are activated the next time you load the page.

Figure 3.59. Jahia Code Sample: Content Type Definition

Content types are defined in the JSP code creating an unconventional coupling of structure and display while providing some convenience.

In larger sites, putting so much control in the JSP templates - which should be the domain of front end programmers - could lead to chaos. HTML is busy enough without the help of additional JSP tags, and content definition code could accidentally be changed. Adding new attributes may become so easy that they proliferate, leading to legacy fields that no one knows how to use. This should be addressed with strong change control and governance practices: adding an attribute should be thoroughly thought through, not an off-the-cuff decision. The trend towards CSS driven layouts may mitigate the risk of this poor separation between code and HTML. HTML is getting simpler and the focus of HTML designers is shifting away from the JSP to the CSS file.

Content Contribution
A Jahia instance can support multiple sites that can be independent or share content with each other. When defining a site, the administrator can control which page templates, portlets, and languages will be available to content editors. Once created, a site is organized into a hierarchical structure of "container lists" (ordered collections of containers). A container (also called a "content object") is a structured content type using a content definition within the JSP as described earlier. For example, a "container list" showing a collection of Links (the "containers") to sub-pages defines a navigation bar that will structure the navigation of the web site. When a user creates a new page in this navigation bar, he chooses a template that defines a set of content containers that can contain other editable content types. Containers are made up of editable attributes. Jahia has a full set of data types including various length text fields, multi-value list, date, color, file, and portlet (for JSR 168 compliant portlets). While the administrative user interface does not have controls to set validation rules (because all form handling is done by Struts and the Apache Commons Validator), configuring validation rules is a relatively simple development task. Also, Jahia content types support field level access control.

Figure 3.60. Jahia Screenshot: In-Context Content Management

Jahia practices an in-context management model.

Jahia practices an in-context content management model. This decision is consistent with Jahia's focus on ease of use since most casual users find the in-context model more intuitive thanks to familiarity with tools such as Microsoft Word. While the browsing aspect uses the in-context model, the act of editing is done through a pop-up form. Forms are auto-generated from the content container definition.

Where the in-context model tends to suffer is in the area of content reuse. Jahia addressed that limitation by adding some content picker and filter modules that allow a user to reuse content either explicitly (using the content picker to create an alias or a linked copy of an asset or a whole branch of assets) or by query (using a filter module to define the filter that retrieves assets meeting certain conditions such as the 10 most recent press releases). The content picker module can be used to create linked satellite sub sites that share content with each other.
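Reuse by query amounts to a filter, a sort, and a limit. The sketch below shows that idea for the "10 most recent press releases" case. It is an illustration in plain, current Java and not the Jahia filter module: the Item class, its fields, and the mostRecent method are all hypothetical.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ContentFilterSketch {

    /** Hypothetical stand-in for a content container; not a Jahia class. */
    static final class Item {
        final String type;
        final String title;
        final LocalDate published;

        Item(String type, String title, LocalDate published) {
            this.type = type;
            this.title = title;
            this.published = published;
        }
    }

    /** Returns up to {@code limit} items of the given type, newest first. */
    static List<Item> mostRecent(List<Item> items, String type, int limit) {
        return items.stream()
                .filter(item -> item.type.equals(type))
                .sorted(Comparator.comparing((Item item) -> item.published).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("pressRelease", "Q1 results", LocalDate.of(2007, 4, 2)),
                new Item("news", "Office move", LocalDate.of(2007, 6, 1)),
                new Item("pressRelease", "New partner", LocalDate.of(2007, 9, 15)));

        // Lists the press releases only, most recent first.
        for (Item item : mostRecent(items, "pressRelease", 10)) {
            System.out.println(item.published + " " + item.title);
        }
    }
}
```

Because the result is computed each time it is requested, a new press release shows up on every page that uses the filter without anyone creating an explicit link, which is the practical difference from picker-based reuse.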

Figure 3.61. Jahia Screenshot: Forms Based Editing

Users use pop-up edit forms to edit content components.

Contextual menus show where a piece of content is reused and warn a user if deleting an asset would have unintended consequences on other pages. Content is locked at the container level and lock status is indicated by color coded dots (yellow for locked), which are visible to authenticated contributors as they browse the site. Assets are also locked when they are in a review state of a workflow.

By default, Jahia uses FCKeditor to edit rich text fields but other editors can be plugged in and made available to users. FCKeditor is one of the more full featured WYSIWYG editors and is well maintained for cross platform support. FCKeditor is particularly well integrated into the Jahia editing interface with good browsing functionality for links and image references. When content is saved, Jahia records relationships created through the WYSIWYG editor and manages dependencies.

Jahia has a simple form builder functionality that allows content contributors to create interactive forms as content. Data collected by the forms is available through a reporting interface. The user has the ability to select which form controls to display, whether the field is required, and available values. This functionality is useful for contact forms and simple data collection, and is pretty good for an end user tool.

For managing binary assets such as documents and images, Jahia's integrated Document Management services support a WebDAV interface. Individual assets can be dropped in using Windows Explorer or another WebDAV client. Jahia can automatically explode the contents of an uploaded zip file so these assets can be individually managed. Indexable assets are indexed when they are added or modified on a page but metadata cannot be added until the file is wrapped in a container. Each virtual site has a folder structure that includes a common shared folder and private areas for individuals and groups. Each individual or group folder has private and public folders to determine access to other users. This folder based access control model is adequate for simple uses, but the need to move an asset to change permission has its limitations.

All text based content is indexed by Jahia's Lucene-based, on-board search engine. The base product comes with file extractors such as Apache POI for indexing binary formats such as PDF or the Microsoft Office formats. Simple and advanced search interfaces are powerful enough for most common uses. Although there is a saved search feature, Jahia's out-of-the-box search functionality is not suitable as an ad hoc content reporting tool because there is no field level searching. However, the individual fields are indexed and exposed through Jahia's search API so this is possible with some systems integration effort.

Figure 3.62. Jahia Screenshot: Advanced Search Form

Jahia's advanced search form allows users to define complex full text searches without the need to know a special query syntax. Search definitions can be saved for later use. However, a user cannot restrict the search to match within a single custom field.

Because of its common use in intranets, Jahia emphasizes its document management functionality and supports WebDAV management of binary files and containers that can store metadata attributes. Additional collaboration features like embedded chat and email notifications round out Jahia as a collaboration platform. More can be added by downloading them from the community site. Examples include a discussion forum, a calendaring server, and some RSS feed readers.

Jahia has decent page level versioning that allows an authorized user to revert back to a previous version of an asset or restore a deleted asset. The versioning capability shows differences between two versions and also the consequences of restoring a deleted version.

Figure 3.63. Jahia Screenshot: Version Differences

Jahia's versioning system provides a "difference" view showing the changes between two versions of an asset.

Jahia's localization system is built on the parallel model where each asset can have multiple translations. When you define a site, you select which languages the site will support and therefore the languages that assets can be translated into. When an asset is not translated, Jahia can optionally display the asset in its default language. The management interface of Jahia is maintained in six languages (English, French, German, Italian, Portuguese, and Spanish) and customers are able to translate the UI into other languages by adding Java resource bundles.

The workflow model is simple but well integrated with the localization functionality. Assets are approved or rejected on a per language basis. Workflow state is shown with color-coded dots. Red indicates an editing state; yellow means that the asset has been submitted for approval; green means the asset has been approved and published. Assets cannot be edited when under approval. If a reviewer wants to edit an asset that has been assigned to him for review, he must reject the asset, edit it, and then approve it again.
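The per language workflow rules above can be read as a small state machine. The following sketch models them in plain Java; it is not Jahia's workflow engine, and the State names, the Asset class, and its methods are hypothetical. It assumes that a rejected asset must be submitted again before it can be approved, and that editing a published asset returns it to the editing state; the text does not spell out either point.

```java
import java.util.HashMap;
import java.util.Map;

class WorkflowSketch {

    /** Dot colors from the text: EDITING is red, AWAITING_APPROVAL is yellow, PUBLISHED is green. */
    enum State { EDITING, AWAITING_APPROVAL, PUBLISHED }

    /** Hypothetical asset that tracks one workflow state per language. */
    static class Asset {
        private final Map<String, State> stateByLanguage = new HashMap<>();

        Asset(String... languages) {
            for (String language : languages) {
                stateByLanguage.put(language, State.EDITING);
            }
        }

        State state(String language) {
            return stateByLanguage.get(language);
        }

        /** Editing is refused while the translation is under approval. */
        void edit(String language) {
            if (state(language) == State.AWAITING_APPROVAL) {
                throw new IllegalStateException("locked while under approval: " + language);
            }
            stateByLanguage.put(language, State.EDITING);
        }

        void submit(String language) {
            move(language, State.EDITING, State.AWAITING_APPROVAL);
        }

        void approve(String language) {
            move(language, State.AWAITING_APPROVAL, State.PUBLISHED);
        }

        void reject(String language) {
            move(language, State.AWAITING_APPROVAL, State.EDITING);
        }

        private void move(String language, State from, State to) {
            if (state(language) != from) {
                throw new IllegalStateException(language + " is " + state(language) + ", expected " + from);
            }
            stateByLanguage.put(language, to);
        }
    }

    public static void main(String[] args) {
        Asset pressRelease = new Asset("en", "fr");
        pressRelease.submit("en");
        pressRelease.reject("en");   // the reviewer rejects in order to edit
        pressRelease.edit("en");
        pressRelease.submit("en");
        pressRelease.approve("en");
        System.out.println("en: " + pressRelease.state("en") + ", fr: " + pressRelease.state("fr"));
    }
}
```

Keeping one state per language is what lets the English version go live while the French translation is still being edited.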


Figure 3.64. Jahia Screenshot: Workflow Approval Page

The Jahia workflow approval interface allows a user to approve or reject multiple assets in multiple languages. If this example site had more languages, there would be more approval columns like the British flag shown here.

Workflows can be assigned by section or by individual assets and determine what users or groups get notified when the asset has been submitted for approval.

Development, Configuration, and Administration


Jahia's multi-environment support helps with the management of synchronization between environments for development, QA, and production. As mentioned earlier in this evaluation, defining content types is done within the JSP and that has its advantages and disadvantages. There is a well documented procedure for setting up Eclipse and Maven for managing development environments. The configuration uses Maven to build and deploy code.

The access control functionality built into Jahia is generally strong enough and meets the needs of most intranet uses. Permissions are granted along the lines of user, group, or everyone hierarchically down the content tree similar to filesystem access control. Permissions can also be applied at the field level. This feature is useful when you have categorization fields that you only want particular users to be able to edit.
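Hierarchical permissions of this kind are usually resolved by walking up the content tree until an explicit grant is found. The sketch below illustrates that lookup in plain Java. It is not Jahia's access control code: the ContentNode class, the Permission enum, and the principal names are hypothetical, and resolving a user's group memberships is left out.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class AclSketch {

    enum Permission { READ, WRITE }

    /** Hypothetical node in the content tree; could stand for a page, a container, or a field. */
    static class ContentNode {
        private final ContentNode parent; // null for the root
        private final Map<String, Set<Permission>> grants = new HashMap<>();

        ContentNode(ContentNode parent) {
            this.parent = parent;
        }

        /** Sets the explicit grant for a principal (a user name, a group name, or "everyone"). */
        void grant(String principal, Permission... permissions) {
            Set<Permission> set = EnumSet.noneOf(Permission.class);
            for (Permission permission : permissions) {
                set.add(permission);
            }
            grants.put(principal, set);
        }

        /** The nearest explicit grant wins; nodes without one inherit from their ancestors. */
        boolean allows(String principal, Permission permission) {
            for (ContentNode node = this; node != null; node = node.parent) {
                Set<Permission> set = node.grants.get(principal);
                if (set != null) {
                    return set.contains(permission);
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ContentNode site = new ContentNode(null);
        ContentNode page = new ContentNode(site);
        ContentNode categoryField = new ContentNode(page);

        site.grant("editors", Permission.READ, Permission.WRITE);
        categoryField.grant("editors", Permission.READ); // field level override

        System.out.println("editors may write the page: " + page.allows("editors", Permission.WRITE));
        System.out.println("editors may write the field: " + categoryField.allows("editors", Permission.WRITE));
    }
}
```

Treating a field as one more node under its page is a simplification that lets the same lookup cover the field level case mentioned above.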


Figure 3.65. Jahia Screenshot: Field Level Access Control

Jahia supports field level access control. While the feature is powerful and useful, the administrative interface to make these settings will confuse anyone who has not served as a Unix or Linux system administrator.

A common configuration for corporate customers is LDAP integration that allows Jahia to authenticate users against any LDAP compliant directory. Jahia really only uses LDAP in read only mode for authenticating login credentials. Users only edit their Jahia managed profile information from within the Jahia UI but not their LDAP directory profile information.

Presentation
A Jahia instance can support multiple sites. Each site is defined by a set of templates, users, groups, portlets, a site key, a set of languages, and a host name. Sites can share content with one another and can also be created as derived copies. Because each site gets its own branding and access control, Jahia's multi-site functionality makes it useful for multiple departmental sites on a corporate intranet.

Jahia's presentation functionality is based on the portal model. Each page is a collection of "containers," which are actual portlets or behave like portlets. Using the Apache Pluto portlet container, a Jahia page can host any JSR 168 compliant portlet complete with edit and public views. Jahia also ships with a few native portlets including a web-clipping portlet that can be used to embed any remote web based content or application. This technology works as a proxy to grab pages from the target application and return them within the Jahia page. Transactional applications can also be integrated in this way although the additional layer of rendering may slow performance. The Jahia team is currently working on an Ajax based Netvibes/iGoogle-inspired personal portal interface using the AJAX Google Web Toolkit libraries, but this feature is still in Beta.

Users can create their own portal pages and organize components by dragging and dropping them into position.

Figure 3.66. Jahia Screenshot: Personal Portal Page

Jahia is working on a Netvibes-inspired personal portal framework (currently in Beta).

The use of different engines and portlets makes Jahia a powerful and flexible web application development platform. Jahia includes engines like Search, Advanced Search, Sitemap, XML Import and Export, and Workflow. Developers can add their own engines as well as develop applications that can run as JSR 168 compliant portlets.

The URLs of a Jahia site are based on its underlying MVC (Struts) architecture. At the start of the URL path is the engine name (such as jahia) that corresponds to a Java class named jahia_Engine.java. Then, like with most portals, URLs get ugly. A typical URL may look like /jahia/Jahia/site/mySite/pid/10. Jahia would interpret this URL as using the Jahia engine, the "mySite" site, and page number 10. Of course, what page 10 is about is anyone's guess. While human readable, search engine friendly URLs are not supported, Apache mod_rewrite or a Java URL rewrite filter can turn this URL into something like /Jahia/mySite/page_10.html, which is better but not necessarily descriptive.

Jahia templates are written in standard JSP with tag libraries and scriptlets (when necessary). This lowers the learning curve for most Java developers and provides lots of flexibility, but does not enforce the practice of keeping Java logic out of display code (as technologies like Velocity, FreeMarker, and XSLT do). Because of this, the separation of logic and display must be enforced by voluntarily accepted coding standards and code reviews. Otherwise, Jahia presentation templates could quickly become complex and unwieldy. Jahia uses Java's standard resource bundle framework for localizing labels, messages, image references, and other localized strings so they are not hardcoded into the template. JSP code is packaged into a .jar file and deployed to the Jahia environment.
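The URL rewriting mentioned above comes down to a pattern match and a substitution. The sketch below shows the mapping with java.util.regex. The URL shape is an assumption based on the single example in the text (/jahia/Jahia/site/mySite/pid/10), the class and method names are hypothetical, and a production setup would put the equivalent rule in mod_rewrite or a servlet filter rather than in a standalone class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JahiaUrlSketch {

    // Assumed internal form, taken from the example in the text:
    // /jahia/Jahia/site/<siteKey>/pid/<pageId>
    private static final Pattern INTERNAL =
            Pattern.compile("^/jahia/Jahia/site/([^/]+)/pid/(\\d+)$");

    /** Maps an internal URL path to the friendlier public form, or empty if it does not match. */
    static Optional<String> toPublic(String internalPath) {
        Matcher matcher = INTERNAL.matcher(internalPath);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        String siteKey = matcher.group(1);
        String pageId = matcher.group(2);
        return Optional.of("/Jahia/" + siteKey + "/page_" + pageId + ".html");
    }

    public static void main(String[] args) {
        // Prints /Jahia/mySite/page_10.html
        System.out.println(toPublic("/jahia/Jahia/site/mySite/pid/10").orElse("no match"));
    }
}
```

The rewritten path still carries only the site key and the page number, which is why the result is tidier but no more descriptive than the original.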
Of all of the products evaluated in this report, Jahia has the richest set of community and collaboration functionality. Part of this is because of its common use as a platform for building intranets. Another advantage is its portal based architecture that provides a framework for profile management and a container for building and deploying interactive applications. The portlet exchange could turn into a valuable resource for Jahia customers and provide energy similar to what is enjoyed by the Drupal and Plone projects. However, the non-open source business model will hinder Jahia's popular appeal with non-profits and community oriented sites. Jahia's large corporate customers may be less able to share their intellectual property with potential competitors.

Jahia hosts a developer exchange site where the community can submit portlets, templates, and other code for the community to share. Many submissions are from Jahia Ltd. although there are some third party contributors represented in the catalog, as well. The portlets submitted by Jahia are published under a true open source license (Sun's Common Development and Distribution License [http://www.sun.com/cddl/]). A status rating indicates the contributor's assessment of whether the contribution is stable, unstable, or Beta quality.

Jahia's multi-layered caching system is quite powerful. Pages can be cached for different users and groups and there is support for the ESI (JSR128) [http://www.akamai.com/html/support/esi.html] standard for caching page fragments. Underneath the page layer, data caching services are provided by the Hibernate object relational mapping layer. The largest Jahia site gets 500 hits per second on each of its three clustered nodes.

Delivery and Support


All of the development on the Jahia platform is done by Jahia's engineering team and by SIs contributing sanctioned enhancements. Jahia's "sustainable source" commercial licensing model encourages customers and systems integrators to work collaboratively to enhance and extend the platform. Customers participating in this program get discounts on licensing fees. Platform testing or documentation enhancements may also be considered a contribution.

Because most of the Jahia install base is on the commercial version of the product and pays for commercial style support, there is not much activity on the public mailing list. Most of the questions and answers go directly between customers, Jahia integration partners, and the Jahia support team. There is no real community around the free Community Edition. If you are considering deploying Jahia, expect to purchase a license and support agreement. Jahia sells support packages with different SLAs ranging from $3,490 to $30,000 per year.

It is also advisable to work with a certified integration partner. Unfortunately, Jahia's SI network in North America is still in the early stages of development and there are not many partners to choose from. Jahia is making a concerted effort in this area and it will be interesting to see which consultancies it winds up partnering with. It will also be interesting to see if individuals and small consulting companies try to build a business around the open source Community Edition and whether they can form a usable support network for it.

Documentation on the product is fair, not exceptional. There are no professionally written books on the platform, nor will there likely be in the near future. However, the underlying technologies, such as Spring, Struts, and Hibernate, are very well documented and widely used. The use of simple JSP for presentation templating removes another common obstacle in learning a WCM platform.

Jahia Ltd. has been steadily growing and has shown financial stability and minimal debt. The staff of 30 makes it roughly double the size of most of the commercial software vendors covered in this report, with the major exception of Alfresco, which is heavily funded with venture capital.


Conclusion
Table 3.17. Jahia 5.0.3 Summary
Columns: Category, Score, Explanation. [The Score column contained rating icons that are not reproduced here.]

Category: Contributor Navigation
Explanation: In-context editing is intuitive for most users and Jahia has addressed content reuse issues typically associated with this model.

Category: Structured Content
Explanation: Content objects, called containers, are defined by templates that specify fields and their data types. The rich text editor is particularly well integrated. Access control can be set at the field level. Creating validation logic is more difficult than in other products in this category. Overloading the display templates to define content types may be convenient for smaller sites but may become unwieldy on large complex sites.

Category: Configuration Management
Explanation: Jahia's robust replication model makes it possible to push settings made through the web administration interfaces to QA and production environments.

Category: Customization Layer and API
Explanation: Jahia's event listener framework is a powerful mechanism to wire in custom code. Portlet support provides a container to build and integrate custom applications.

Category: Delivery Tier Flexibility
Explanation: Jahia's use of portal technology is both an advantage and a disadvantage. The portal architecture is flexible and conducive to building new functionality and integrating existing applications, but it is just one way to solve the problem. Jahia customers are somewhat locked into the portal approach. Luckily, it is not a bad approach.

Category: Widely Used Technologies
Explanation: Jahia has done an admirable job of selecting popular technologies and keeping the architecture current.

Category: Project Transparency
Explanation: Jahia's non-standard licensing approach and the vagueness of the "sustainable source" model make it less transparent than the average open source project. However, Jahia is not a full open source product, and the fact that non-customers can see the source code base and that Jahia makes all its documentation public (even in draft form) makes information much more available than with commercial products.

Category: Books
Explanation: None

Category: Online Documentation
Explanation: Wiki and PDF Guides for users, administrators and developers.

Category: User Forums
Explanation: There is a mailing list but most of the support appears to be delivered directly to paying customers.

Key: Nonexistent; Below Average; Average; Above Average; Exceptional.

In many respects, Jahia is a unique product in this space. Most interesting is the approach to combine portal and content management functionality from the beginning rather than to start on one side and then work over to the other. The result is a well designed, well executed hybrid of these two types of applications. As a development platform, the portal model provides great flexibility to build functionality and integrate with other systems. The tight integration with content management services allows Jahia to avoid many of the pitfalls that other content-oriented portals suffer: poor link management and simplistic cache management, for example. However, the portal model is not ideal for every problem and those looking to use different frameworks on the delivery tier will find Jahia limiting.

The other way in which Jahia is unique is in its licensing model. For practical purposes, Jahia is more of a "visible source" commercial software application than a true open source one. Unless you do not mind having a "powered by Jahia" logo on every page, expect to purchase one of the commercial products. However, those companies that use Jahia for their intranets can enjoy using the software for free and still have the option to buy support if they need it.

For most North American customers, Jahia is probably still a new name. The establishment of a U.S. based office and some recent (although not yet public) client wins may change that soon.


WCM Framework Market Summary


Of the three products in this category, most North American buyers will most readily recognize the name "Alfresco" thanks to the media attention that the company has been able to attract. Market timing surely played a role in Alfresco's buzz and the fact that the company started on a global scale with internationally known founders was also important. Hippo and Jahia are not as familiar and will probably never achieve the same brand-name status. They have grown organically from regional origins and are only now starting to branch out onto an international stage and aggressively target North America. Both of these products have a good deal of potential in the North American market. The market is hungry for solid and open platforms on which to build content rich, dynamic sites. Potential buyers that do stumble across these products will be impressed with large scale European reference accounts.

Alfresco and Hippo are the best suited to solve the problem of managing content for decoupled custom delivery tiers. Both products effectively support core content management services but can "stay out of the way" of developers trying to innovate on the presentation side. Some companies like this approach because it gives them so much freedom. Others, with less appetite for custom development, will be disappointed with the amount of development work needed to implement these platforms. Media companies, in particular, have already embraced and will continue to embrace these technologies as they create pluggable architectures with highly customized delivery tiers.

Out of the box, Alfresco provides a more modern and flexible architecture but much less in the way of user facing functionality. Companies that tend to be dissatisfied with anything but complete control over the user interface will appreciate Alfresco's ability to support custom user interfaces. Customers who would prefer to focus their development resources on the delivery tier will get more out of Hippo, whose management user interface is more mature and supports a rich feature set. The primary question with Hippo is how smooth the transition to a new technology stack will be. Another differentiator between these two products is price: Alfresco locks the customer into a relatively expensive commitment to subscribe to the maintenance network, and support packages are extra.

Customers looking for a more end-to-end web application development platform and who are open to adopting a portal style architecture will be interested in Jahia. While Alfresco has often been looked to as the Java analog to the popular Zope-based WCM platform Plone, Jahia is much closer from a functional perspective. Like Plone, Jahia provides a great deal of out-of-the-box functionality, ease of use, and a "portal oriented" delivery framework (although Plone is less of a true portal, it has many portal aspects). If Jahia is able to sort out its open source messaging, it has the chance of building a very broad user base. As Plone's adoption indicates, the demand is certainly there.


Table 3.18. Informational Brochure Score Summary


Columns: Category; Alfresco; Hippo CMS; Jahia

Categories scored: Contributor Navigation; Structured Content; Configuration Management; Customization Layer and API; Delivery Tier Flexibility; Widely Used Technologies; Project Transparency; Books; Online Doc; User Forums

[Per-product scores were shown as rating icons and are not reproduced here.]

Scoring Key: Nonexistent; Below Average; Average; Above Average; Exceptional.


Round Up
In the last five years the open source Java WCM market has grown from a disappointing collection of small niche projects to a set of legitimate options for enterprise buyers. General interest in open source, as well as the marketing efforts of Alfresco and other commercial open source vendors, has brought attention to this sector of the market. The selection is the broadest for customers looking to build basic informational and moderately interactive web sites and architects trying to plug content services into more elaborate web sites and web applications.

Community oriented functionality is generally lagging in terms of what is provided out-of-the-box. It will probably continue to be a weakness on the Java stack because many of the community oriented sites are being built on technologies like PHP and Ruby on Rails. Still, Java WCM technologies may have a role in these lighter weight architectures by providing back-end content services and other basic infrastructure. For example, customers are building highly interactive applications in PHP using Alfresco's PHP API and Web Scripts, and one of the primary Hippo customers is thinking of building its delivery tier on Ruby on Rails.

Despite the temptation, consolidating open source products together as one category of software is a mistake because these products are so different. Of the seven platforms evaluated in this report, six (all but Apache Lenya) can be supported by commercial style support and maintenance agreements. Four (Alfresco, Jahia, Magnolia, and OpenCms) can be purchased as commercial software applications. Alfresco and Jahia operate the most like commercial software companies but Magnolia and OpenCms also sell commercial Enterprise Editions and can deliver a commercial software customer experience. These commercial open source companies encourage their customers to engage in their open source communities, but it is not a requirement and many of their customers do not.

Daisy, Hippo, Apache Lenya, Magnolia, OpenCms and (if you don't mind the badge) Jahia all have free versions of the software that can realistically be used in mission critical production environments. Customers that have the knowledge and bandwidth can potentially save money by self-supporting the software or buying consulting support when needed. Daisy, Hippo, Jahia, and OpenCms offer the option of paid support for their free versions. Magnolia users must convert to the Enterprise Edition to qualify for a support package. However, since the code base is the same, this does not require migrating their implementation to another version of the software.

If your company has the potential to execute a well run software implementation project and self-support the solution, a product like Hippo CMS or Jahia Community offers potential cost savings in the realm of $150,000 in up front licensing and $30,000 per year for maintenance and support. Alfresco is the costliest of the products in this report, but typically competes against the most expensive commercial products. Products in the informational brochure category can save between $30,000 and $60,000 in licensing plus $6,000 to $12,000 in annual maintenance. The savings are smaller because commercial competitors at this level are cheaper. Companies with less aptitude and appetite for owning the technology can still save money as the support and licensing costs are considerably cheaper than the commercial software analogs.

For the customers that would like to delegate more responsibility for maintaining the application, success usually hinges on connecting with the right systems integrator. In the U.S., Alfresco has the most options of qualified integration partners. In Europe, OpenCms has the largest community of independent integrators that bring the product into accounts that they win. Magnolia's market presence is growing rapidly. Daisy and Hippo can field capable implementation teams, but their networks are much smaller. The Lenya community is dwindling to a few independent contractors and small systems integrators that still build sites on the platform.
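The licensing and maintenance figures quoted above can be combined into a rough cumulative estimate. The sketch below is in Python; the dollar amounts come from the text, while the three-year horizon and the function name are assumptions chosen for illustration. It ignores implementation, staffing, and consulting costs, which can easily outweigh licensing.

```python
def cumulative_savings(upfront_licensing, annual_maintenance, years):
    """Licensing avoided up front plus maintenance avoided over `years`."""
    return upfront_licensing + annual_maintenance * years


YEARS = 3  # assumed planning horizon; not a figure from this report

# WCM framework tier (e.g. Hippo CMS or Jahia Community, self-supported)
framework = cumulative_savings(150_000, 30_000, YEARS)

# Informational brochure tier, low and high ends of the quoted ranges
brochure_low = cumulative_savings(30_000, 6_000, YEARS)
brochure_high = cumulative_savings(60_000, 12_000, YEARS)

print(framework)                    # 240000
print(brochure_low, brochure_high)  # 48000 96000
```

Under these assumptions the framework tier avoids roughly $240,000 over three years, against $48,000 to $96,000 for the brochure tier, which matches the observation that savings shrink where the commercial competitors are cheaper.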

Comparing with Commercial Products


As the market matures, open source products are finding their way onto more selection short lists and competing head to head with commercial products. Open source solutions are also displacing commercial products within companies that face an expensive and risky upgrade to their existing platform. What follows are some commercial products that the open source products in this report frequently go up against in software selections.

WCM Frameworks: Open Source vs. Commercial


Alfresco, with its commercial orientation and name recognition, is the product that is the most frequently considered by commercial buyers in the U.S. However, because the WCM product is so new and requires so much custom development, Alfresco usually falls off the list unless the prospective customer has the resources to cost-effectively invest in the platform. Alfresco is frequently seen on selection lists with products like Interwoven TeamSite, Documentum, and Vignette, and usually does quite well because the integration costs of these commercial Enterprise Platforms are also high and the Alfresco technology is much better. Interwoven's and Vignette's neglect of the media and publishing industry has created many opportunities, like Sony and Electronic Arts, that Alfresco is winning. The fact that several Alfresco employees come from these competitors also gives them an advantage because they are able to talk directly to the weaknesses of their commercial competition.

In The Netherlands, Hippo is seeing success against products like Tridion. Customers who have selected Hippo point to Hippo's support of standards and Tridion's aging Microsoft based architecture. Tridion, with its acquisition by localization specialist SDL [http://www.sdl.com], has narrowed its messaging of the product to its localization capabilities. Tridion also has been actively building and promoting a dynamic presentation tier. If the customer has fewer than twenty or thirty large sites it needs to manage and wants to have an independent delivery tier, Hippo has a good chance of getting chosen.

Jahia sees interest from both portal and WCM buyers. From a functional perspective, the product matches up nicely with FatWire's Content Server and Vignette's Portal in that the coupled, dynamic tier serves as a powerful foundation for building interactive applications. Customers looking at historically "baking style" (see Glossary for "Baking and Frying") products that have more recently added dynamic presentation tiers (such as TeamSite's LiveSite, RedDot's Live Server, and Tridion) will be attracted to Jahia's cleaner, better integrated, and more modern architecture.
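The difference between the "baking" and "frying" styles mentioned above can be shown with a small sketch. It is written in Python using the standard library's string.Template as a stand-in for a real templating engine; the class names and page template are hypothetical and do not correspond to any product in this report.

```python
from string import Template

# One presentation template shared by both styles (illustrative only).
PAGE = Template("<h1>$title</h1><p>Hello, $visitor</p>")


class BakingSite:
    """Applies the template once, when content is published."""

    def __init__(self):
        self.published = {}  # path -> finished HTML

    def publish(self, path, content):
        self.published[path] = PAGE.substitute(
            title=content["title"], visitor="reader"
        )

    def serve(self, path, visitor):
        # The visitor is ignored: everyone receives the same stored page.
        return self.published[path]


class FryingSite:
    """Stores structured content and applies the template per request."""

    def __init__(self):
        self.content = {}  # path -> structured content

    def publish(self, path, content):
        self.content[path] = content

    def serve(self, path, visitor):
        # Rendering at request time can use information about the visitor.
        return PAGE.substitute(
            title=self.content[path]["title"], visitor=visitor
        )
```

The baked site does no rendering work per request, which suits high volume delivery, while the fried site can personalize each response at the cost of rendering on every hit. Products that bolt a dynamic tier onto a baking architecture have to reconcile these two models.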

Informational Brochure: Open Source vs. Commercial


The basic interactive, informational brochure market, where Daisy, Magnolia, and OpenCms all compete, seems to be organized around systems integrators. Buyers in this market are not buying a technology so much as a complete web site solution that includes implementation, project management, and (frequently) design services. Systems integrators go to market and compete with a WCM platform that allows them to deliver value to their customers. Open source is an attractive option for a systems integrator because the cost savings on licensing can make them more competitive against integrators proposing commercial offerings.

Daisy, Magnolia, and OpenCms most commonly go up against mid- to low-market commercial products like Ektron that also have strong integration partner networks. Other commercial products in this space include Serena's Collage, Hannon Hill's Cascade Server, PaperThin's CommonSpot, and many others. This market is crowded primarily because there is so much opportunity. Every company, non-profit, and institution needs at least one online informational brochure to explain what it does.

Other than Lenya, which has been in a state of limbo for a couple of years, Daisy has the smallest market presence of the products reviewed in this category. A major reason is its foundation on the complex Cocoon platform, which limits the number of SIs capable of integrating the software. Still, through its affiliation with Schaubroeck, the install base is stable and growing. What is most interesting about Daisy, however, is its reach into wiki and knowledge base applications, which makes it a somewhat unique offering. In this capacity, Daisy competes very favorably with commercial wiki products like Confluence [http://www.atlassian.com/software/confluence/], Traction Software's TeamPage [http://traction.tractionsoftware.com], and MindTouch's Deki Wiki [http://wiki.mindtouch.com/]. With sophisticated access control, faceted navigation, and support for structured content types, Daisy has better potential for managing persistent knowledge resources than traditional wikis, which tend to excel as temporal collaboration spaces.


Selecting a CMS and Beyond


As open source software becomes commercialized and works its way into the mainstream, it will become easier to compare open source products to commercial ones. Many of the projects in this report have made great strides in making their products more accessible to traditional software buyers and see themselves as competing against commercial products more than against other open source products. At a fundamental level, all of the best practices of commercial software selection carry over to open source software: understanding requirements, commitment to process, attention to detail.

However, there are some aspects of the open source model that continue to confuse the commercial buyer: lack of analyst coverage (this report notwithstanding), new concepts in licensing, the potential to interact with a community, and more options when it comes to support. Additionally, most commercial open source companies are young and small and look like the small commercial independent software vendors that large company buyers are generally afraid of. The fact that the code is open source mitigates the risk of a vendor going out of business, but it is still difficult to tell to what degree the software will survive the loss of its primary contributor.

The evaluations in this report characterize the level of activity and involvement within the communities using the software and provide a good starting point. However, experience will vary depending on your geographical location and how well your intended use of the software lines up with that of the community. As part of the selection process, analysis of the open source projects on your short list should include interacting on the forums and user groups and talking with potential systems integrators. You may find someone using the software in your industry for a similar purpose who is willing to share information, and perhaps even collaborate. The dynamic of a community may allow you to connect directly with a peer rather than rely on the software vendor as an intermediary.

As you go through a software selection process, consider the different opportunities that may be afforded to you because the software is open source. Depending on the products under consideration, you may be able to contribute modules or extensions that a community will share the burden of maintaining. You may be able to work with the vendor to sponsor a feature in the core application and reduce the risk of building it on your own. Different projects have different mechanisms for contributions. If you select a commercial open source product, you may very well exercise the option to not participate; many customers do have passive relationships with these projects.

If you plan to contribute in some way, however, it is a good idea to establish a set of guidelines and expectations. Your in-house developer or implementation partner should understand what you hope to get out of the community and what types of investments are acceptable. If it is worthwhile for your company to have a visible role in the community, your representative should be given the proper incentives to budget time accordingly. Few companies actually do this. Most active community participants are not directly encouraged by their employer and engage out of their own interest. This type of participation may be creating value for their companies that is not recognized and may not be fully leveraged. (Conversely, their time investment in the community may come at the expense of more important responsibilities.) That said, the same issues exist with traditional software companies who ask their customers to provide references and go to conferences to share their successes. However, the potential return on time spent on an open source project is usually greater than that of time spent on commercial products.


Glossary
Baking and Frying

"Baking vs. frying" refers to when presentation templates are applied to render pages out of structured content. Baking style rendering systems generate pages when content is published. Frying systems generate pages on the fly when they are requested by the end user. Whether a system bakes or fries content tells a lot about its architecture and what it is good at. Baking systems are great for high volume sites that do not need to personalize content. Frying systems excel when requirements include personalization, access control, and other presentation logic that uses information about the user in order to decide what to show and how.

BPEL

BPEL (or Business Process Execution Language) is an XML language for defining workflows. Workflow engines read in BPEL definitions and use them to drive workflow logic. BPEL is most commonly used in service oriented architectures to orchestrate processes across different de-coupled services. BPEL can also be used within an application to coordinate workflow states, events, and transitions.

CForms

CForms (or Cocoon Forms) is a form handling system for the Cocoon web application framework. Although very powerful, CForms is more complex than the form handling systems of the other general purpose web application frameworks. The core design is that the programmer defines a "model" that describes a form as a set of form control widgets that will be presented to the user. Then the developer writes a template that controls how the form is displayed to the user. While much functionality can be achieved by writing minimal amounts of Java code, the XML that one does have to write can be very complex.

Committer

In most open source projects, only a few trusted developers have rights to check in (or "commit") code updates to the source code repository. The people with this "commit" status are called committers. Non-committers can submit patches to the code base and their submissions are reviewed by committers who either accept or reject them. Depending on the size of the project, the committer team can be small or large. Different governance structures have different ways of selecting committers.

FreeMarker

FreeMarker is a templating language that tries to do a better job of separating business logic from layout than JSP. Unlike JSP, FreeMarker prevents a developer from writing scriptlets or other procedural code in the template. The developer is forced to call Java classes or use an expression language like JEXL. The value of FreeMarker has diminished somewhat with improvements to JSP such as improved tag libraries like the JavaServer Pages Standard Tag Library (JSTL).

Maven

Maven is displacing Apache Ant as the industry standard for scripting automated builds. More sophisticated than Ant, Maven also checks against remote code repositories to pull down the appropriate libraries. When Maven works, it works like magic; when it doesn't work, it is a programmer's worst enemy. If you have spent any time on a mailing list or IRC channel of any Java project, you probably have heard complaints about Maven.

JCR

The JCR is a relatively new Java standard (JSR 170) that defines a repository for managing content. The JCR is well suited for semi structured content that is hierarchical in nature. Unlike relational databases, JCRs natively support content management specific functions like versioning, workspaces, and content deployment. There are various levels of JCR support. The important distinctions are that level one is read only access, level two is read and write and specifies an access control model, and there are some optional features like observation (where you can monitor a set of assets and then be notified if there is a change).

The JCR specification has not yet enjoyed widespread adoption. The biggest proponent is Day Software, whose CTO David Nuescheler is the specification lead. Day also has a number of its own developers working on the reference implementation, Apache Jackrabbit. Day also sells a commercial JCR implementation called CRX as well as JCR adaptors for other repositories like Documentum, FileNet, Lotus Notes, TeamSite, SharePoint, OpenText Livelink, and Vignette. Outside of Day, however, use of the JCR has been limited. The big news for the JCR community is that Oracle now supports the JCR standard with the Oracle 11g XML DB product. There is more JCR interest within internal corporate software engineering departments that are building custom systems and want to reduce risk by sticking to standards. If the JCR is to become a truly successful standard, it will require these corporate architects putting pressure on software vendors like Mark Logic to support the specification.

JSF

JSF, or JavaServer Faces, is a Java standard for a web programming model that is similar to .NET. The basic idea is that there is an event model that triggers "code behind" at the server. JSF implementations (such as the popular Apache MyFaces [http://myfaces.apache.org/]) generate large amounts of HTML to build data-bound HTML controls that have JavaScript to trigger server side methods through an HTTP post. While all this code generation greatly increases developer productivity, it adds complexity under the covers. Tool vendors like the idea of JSF because it allows them to provide a WYSIWYG programming environment similar to Visual Basic, where a developer can drag controls onto a panel and set bindings and properties.

JX Template

JX Template is the official templating language for the Cocoon framework. Easier to understand than its predecessor, XSP (eXtensible Server Pages), JX Templates use an XML based syntax for basic conditional logic and control flow but try to limit the amount of business logic written in the template.

JSR 168

JSR 168 defines a standard interface that a Java portlet can implement in order to be able to run within different portal products. JSR 168 is supported by all of the major Java portal products and the Apache Pluto project provides a portlet container that can run a JSR 168 compliant portlet in any Java web application. The portlet standard is now being improved by JSR 286.

REST

REST (Representational State Transfer) is a slimmed-down, web-services-style architecture. Unlike classical web services that communicate over the SOAP protocol, a REST interface consists of a URL-based API accessible via HTTP GET or POST actions. These methods return XML documents or execute other functions on the service. You could think of the web itself as being REST-based, with the responses being simple HTML (or preferably XHTML) documents. Without all the overhead of building SOAP packages, REST APIs are easier to develop and are generally preferred by pragmatic programmers. Most of the major web service providers (like Amazon) are seeing much greater adoption of their REST interfaces than their SOAP interfaces.

Slide

Jakarta Slide is an Apache project to implement the WebDAV protocol. Slide is actually the reference implementation for the WebDAV standard. Slide is a mature and robust platform capable of high performance and large volumes of content.

Sprint

A sprint is an event where a group of open source developers get together to do some major work on the platform. The duration is anywhere from one day to three or four days. Sprints usually have an established theme that describes the scope of work that will be attempted. Often a company that has an interest in building this functionality will sponsor and host the sprint.

The term "sprint" is shared with various Agile development methodologies, and there is a considerable amount of overlap in process. At the start of a sprint, the leaders communicate a game plan and organize the team to work on various tasks. The duration is a fixed amount of time; the scope is variable. Open source sprints also usually follow the Agile practice of pair programming, where developers work in teams of two on a single computer.

In addition to all the work that gets done, there are important side benefits of sprinting. Information sharing, identifying leaders, and creative problem solving all result when people with different experiences and backgrounds work closely together. Many developers feel they have learned most of what they know from participating in sprints. There are also positive social aspects of sprints, and developers travel from all over the world to participate in exotic locations. The deal is that they (or their employer) pay for their travel but, once they are there, their food and lodging are taken care of by the sprint organizers.
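As a small illustration of the REST entry above, the following Java sketch shows what "a URL-based API accessible via HTTP GET" might look like from the client side. The host cms.example.com, the /api/content path, and the type and q parameters are hypothetical and do not belong to any product covered in this report; the sketch assumes Java 11 or later for java.net.http.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class RestSketch {

    // Build the URL that identifies the resource or operation. The parameters
    // travel in the query string rather than inside a SOAP envelope.
    static URI buildUri(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(query.isEmpty() ? baseUrl : baseUrl + "?" + query);
    }

    // Issue a plain HTTP GET and return the response body as text
    // (for a REST service this would typically be an XML document).
    static String get(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("type", "article");     // hypothetical parameter
        params.put("q", "open source");    // hypothetical parameter
        URI uri = buildUri("http://cms.example.com/api/content", params);
        System.out.println(uri);
        // Against a real service, the document could then be fetched with:
        // System.out.println(get(uri));
    }
}
```

The whole request is described by the URL and the HTTP method, which is why such interfaces can be explored with nothing more than a browser or a command-line HTTP client.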

Velocity

Like FreeMarker, Velocity is a templating language that tries to do a better job than JSP of separating business logic from layout. Unlike JSP, Velocity prevents a developer from writing scriptlets or other procedural code in the template. The developer is forced to call Java classes or use an expression language like JEXL. Velocity is an Apache project and enjoys a wider install base than FreeMarker. Velocity is used in some commercial CMS applications such as Rhythmyx or Clickability's cmPublish product. The value of Velocity has diminished somewhat with improvements to JSP, such as better tag libraries like the JSP Standard Tag Library (JSTL).

WebDAV

WebDAV (Web Distributed Authoring and Versioning) is an extension to the HTTP standard that allows documents to be updated over the web. WebDAV is a very important standard because it is used by many technologies. If you are a Microsoft Windows user and you are connecting to "Web Folders," you are connecting to a WebDAV server. WebDAV is well supported by many desktop applications, including Microsoft Office. Many CMS vendors use WebDAV as a way for users to easily drag files from their hard drive into the repository.

WSRP

WSRP (or Web Services for Remote Portlets) is an OASIS XML standard for how portlets can communicate with back-end services through web services. Unlike JSR 168, WSRP is technology agnostic because it is a communication protocol rather than a programmatic interface. A JSR 168 portlet can talk to a web service over WSRP.

XPDL

XPDL (or XML Process Definition Language) is an XML format that allows graphical modeling tools to store and exchange workflow process definitions.
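The WebDAV entry above describes the protocol as an extension to HTTP. A minimal Java sketch of what that means in practice: listing a folder uses the WebDAV-specific PROPFIND method with a small XML body, while uploading a file is an ordinary HTTP PUT. The server cms.example.com and the /dav/docs/ paths are hypothetical, and the sketch only builds the requests (it assumes Java 11 or later for java.net.http); sending them would require a real WebDAV server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class WebDavSketch {

    // Minimal PROPFIND body asking for each resource's display name and
    // last-modified date, both standard WebDAV properties.
    static final String PROPFIND_BODY =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<D:propfind xmlns:D=\"DAV:\">"
            + "<D:prop><D:displayname/><D:getlastmodified/></D:prop>"
            + "</D:propfind>";

    // Build a request that lists a folder. "Depth: 1" asks for the folder
    // itself plus its immediate children.
    static HttpRequest listFolder(URI folder) {
        return HttpRequest.newBuilder(folder)
                .method("PROPFIND", HttpRequest.BodyPublishers.ofString(PROPFIND_BODY))
                .header("Depth", "1")
                .header("Content-Type", "application/xml; charset=utf-8")
                .build();
    }

    // Build a request that stores a document at the given URL.
    // This is a plain HTTP PUT; no WebDAV-specific method is needed.
    static HttpRequest upload(URI target, byte[] content) {
        return HttpRequest.newBuilder(target)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest list = listFolder(URI.create("http://cms.example.com/dav/docs/"));
        System.out.println(list.method() + " " + list.uri());

        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        HttpRequest put = upload(URI.create("http://cms.example.com/dav/docs/report.txt"), content);
        System.out.println(put.method() + " " + put.uri());
    }
}
```

Desktop clients such as Web Folders issue requests of this general kind behind the scenes when a user browses a repository or drags a file into it.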


Colophon
This book was written in DocBook format using the Oxygen XML Editor. The content was transformed into PDF using XSLT based on Norm Walsh's DocBook XSL examples.
