Release 2.0 September 2009 Seth Gottlieb, Content Here, Inc.
© 2009 All rights reserved. Content Here, Inc.
Open Source Web Content Management in Alfresco
Single Product Review Version 2.0
Copyright © 2009 Content Here, Inc. All Rights Reserved. Not for Redistribution.
License Agreement and Disclaimer
This report is licensed under a "Workgroup License" that allows your company to make this report available to up to ten (10) staff members. It may not be shared with customers or copied, reproduced, altered, or re-transmitted in any form or by any means without prior written consent. Any rankings or scoring information may not be used in promotional materials.
This report is intended to be an overview of the technologies described and not a recommendation or endorsement of a specific platform or technology strategy. The most appropriate platform for your use depends on the unique requirements, legacy architecture, and technical capabilities of your organization. Content Here, Inc., cannot ensure the accuracy of this information since projects, vendors, and market conditions change rapidly. Content Here, Inc., disclaims all warranties as to the accuracy or completeness of the information in this report and shall have no liability for errors, omissions, or inadequacies in the information presented.
Credits and Acknowledgements
Cover design and page layouts by LAC Design
XSLT work by Patrick Liddy
OPEN SOURCE WEB CONTENT MANAGEMENT IN ALFRESCO | VERSION 2.0 | SEPTEMBER, 2009
Alfresco Enterprise 3.1 WCM
Alfresco Enterprise is considered a true Enterprise Content Management (ECM) platform because of its capabilities in all the disciplines of content management: Document Management, Digital Records Management, Digital Asset Management, and Web Content Management. Similar to other ECM products, Alfresco sees its value in bringing together and harmonizing content across an organization, not in excelling in any one area. Also, consistent with its ECM industry peers, the Alfresco solution is expensive. While there is a free "Community Edition," it contains experimental features and neither Alfresco nor its partners will support it. Customers that use the Enterprise Edition can expect to pay between $20,000 and $50,000 in mandatory annual subscription fees. The underlying philosophy behind the product is to provide a powerful content repository and a robust application programming interface (API) for building content-centric business applications. Alfresco delivers very well on this promise. Alfresco's repository supports advanced functionality (such as user sandboxes, event-based content rules, and virtualization) historically seen only in upper-tier commercial software. Recent releases of Alfresco have introduced more elegant and powerful APIs and a new framework, "Surf," for building content-enabled web applications. Alfresco is particularly attractive to architects building unique, custom applications. The resulting solutions tend to employ elaborate architectures and hefty helpings of custom code, which are sometimes overkill for the business problems they are trying to solve. Over the past year, the Alfresco team has been pursuing the knowledge worker collaboration market that was exposed by the success of SharePoint. Indeed, most of the software company's engineering bandwidth is focused on its new "Share" application. As with SharePoint, support for traditional web publishing has been neglected.
While the underlying APIs are available, Alfresco's user interface is clearly not designed for use as a stand-alone, best-of-breed web content management system. In fact, many systems integrators build their own contributor user interfaces using Web Scripts, Alfresco's highly productive server-side scripting interface.
© 2009 Copyright Content Here, Inc. | All Rights Reserved. Not for Redistribution. | www.contenthere.net
Table 1. Alfresco Enterprise Project Overview
Website: http://www.alfresco.com
Project Inception: 2005. WCM launched in 2007.
Current Version: 3.1 since April 2009.
Project Type: Commercial: tiered product model.
Licensing Options: GPL with a FLOSS exception. The Enterprise Edition distribution requires an annual per CPU subscription ($15,000 - $20,000) to use.
Alfresco Software Inc. is headquartered in the UK with some staff distributed across North America. The user community is global with concentrations across Europe and North America.
Common Uses: Repository services for custom web applications.
Sample Customers: Activision uses Alfresco to manage marketing sites for its titles. Two Harvard Business School Publishing sites run on Alfresco WCM. Travelocity uses Alfresco to manage thousands of whitelabel travel sites. British Telecom uses Alfresco WCM to manage configurations and content for co-branded customer sites.
Frameworks and Components: Apache MyFaces, ehcache, FreeMarker, Hibernate, jBPM, Lucene, OpenOffice, Rhino, Spring, Velocity
Integration Standards: JSR 168, JSR 170, WebDAV, Common Internet File System (CIFS), CMIS
Java Support: 1.4 through 1.6
Application Servers: Tomcat, JBoss, Websphere
Databases: MySQL, Oracle, MS SQL Server
Alfresco is a generously funded software company with a commercial enterprise software pedigree. The company was founded by John Newton (co-founder of Documentum) and John Powell (former CEO of Business Objects) and they have rounded out their team with senior people from Novell and Interwoven. The fact that the early team came from Documentum is clearly visible in the product with its early focus on document management, repository services, and access control. Development of Alfresco started in January 2005 and the team has made tremendous progress in both building the software and visibility for the company. Alfresco describes itself as the first and leading open source ECM product — a claim that frustrates companies like Nuxeo whose ECM products pre-date Alfresco. While Nuxeo was there first, few can argue with the fact that Alfresco has put open source on the map as a viable alternative to commercial ECM products. Commercial vendors and open source projects alike have adjusted to Alfresco's market disruption. Nuxeo ported its ECM product from Zope to a more familiar Java/JBoss platform. Commercial vendors are reconsidering their pricing and value propositions. Alfresco has been iterating on its licensing and business models since the first public release of the software. The company has a strong moral stance that those who use Alfresco's software should pay for it. In an effort to convert users into paying customers, Alfresco has dabbled in tiered products with different features and badgeware versions. However, their strategy has been getting progressively more open source friendly. The current approach is a single code base for both the Community and Enterprise tiers. Both products are licensed under the GPL. The Community Edition, also called "Labs," is less rigorously tested and contains experimental features that may or may not make it into the Enterprise Edition. 
Customers are advised against using the free Community Edition for anything but experimental uses; the Community Edition is neither supported nor patched. There are features in Community Edition that will never be incorporated into the Enterprise product. Certified integration partners are forbidden from working on the Community Edition for their clients. It is possible that one day a community will form to support the Community Edition but as of today none exists, and it is doubtful that the Alfresco team will encourage one. Alfresco's drive into the WCM space did not start until the beginning of 2007 when it recruited Kevin Cochrane and other thought leaders from Interwoven. This was a promising sign that the former Documentum team recognized that their limited web content management vision could not compete against the best-of-breed WCM platforms. There seemed to be great promise for Alfresco as a capable web content management system. WCM was officially launched with release 2.0 of the Alfresco Community Edition in July 2007; the first WCM customers launched in August. They tended to use the product either for very simple static websites or for managing and deploying XML files in complex custom architectures. Based on published reports and customer counts, there appear to be between 300 and 350 out of 1,000 paying Enterprise Edition customers currently using Alfresco WCM.
The first couple of releases of Alfresco WCM are now universally acknowledged to have been a rush job. Many of the systems integration partners claimed WCM should have been positioned as Beta quality prior to the 2.2 release when critical functionality (like search) started to be exposed through the user interface. Version 2.2.3 made up for past sins by addressing a majority of the bugs and set the stage for the current 3.x series. Still, the latest release, 3.1, has introduced no significant end-user functionality and has primarily focused on putting in place better testing facilities, refactoring the code base, and addressing performance issues. Over the last year, pure web content management has taken a back seat to SharePoint-style collaboration functionality. The first indication of this was in June 2008 when the WCM team of Kevin Cochrane, Jon Cox, and Britt Park (all Interwoven alumni) left the company. This led to a tighter focus on the document management and collaboration comfort zone that the original team had at the start. The WCM team was replaced by Michael Uzquiano, who was promoted from the role of sales engineer and, to this day, is not listed on the Management Team page of the Alfresco website. Michael spends nearly all of his time working on the "Surf" web application framework, which is designed primarily to support Alfresco's SharePoint killer "Share." Alfresco's web content management functionality still has plenty of room for improvement, but it is unlikely that addressing these deficiencies will take priority over functionality aimed at "Enterprise 2.0" and knowledge worker collaboration. That said, the foundation is there for capable systems integrators to build highly specialized custom web content management solutions on top of the Alfresco platform. There are some interesting examples of companies using Alfresco WCM to manage XML configuration files and content snippets for custom-built web applications.
However, this may not be the best use of such an elaborate and feature rich technology stack.
Alfresco gets the attention of software architects and Java developers for its standards support and its use of popular open source components and frameworks. The first thing you notice when you download Alfresco is that it is a lot of software. The lib folder is packed with 149 JARs totalling nearly 71 megabytes; that is a lot even by Java standards. Alfresco includes some of the most modern and elegant open source components and frameworks around. In some ways, you can think of Alfresco as one big supported bundle of best-of-breed open source software projects. Reusing these components is what has enabled Alfresco to develop its product so quickly and stay current with the latest technology and standards.
Figure 1. Alfresco Architecture Diagram
Alfresco has a very open, service-based architecture that supports a number of standards. Source: Alfresco documentation site.
Alfresco's standards support and openness make it very effective for integration with other systems and for use in service-oriented architectures. When Alfresco first hit the market, it was positioned as a framework for building any kind of content-centric application and the user interface was merely an example of what you could do with the platform. Today, many architects still look at Alfresco as an ideal building block for larger architectures. In some cases, Alfresco WCM has a role as small as simply managing and deploying XML configuration files. Java, PHP, and Web Services APIs expose all of Alfresco's functionality. The repository is accessible over WebDAV, Common Internet File System (CIFS), and FTP. CIFS support (which allows a Windows user to map a letter drive to the repository as if it were a Windows file server) is one of the Alfresco team's biggest achievements. Long-time Unix users will remember what an impact Samba [http://www.samba.org] had by allowing Windows and Unix to share files over Microsoft's proprietary standard. Alfresco has the only Java implementation of a CIFS server. Jahia [http://www.jahia.com] includes the technology for their own content management products. One could say that CIFS is the user interface that engenders the most pride from the Alfresco team and the most adoration from business users (see commentary on the Alfresco Navigator client later). JSR 168 (the Java Portlet Standard), JSR 170 (the Java Content Repository Standard, level 2), and Business Process Execution Language (BPEL) are all supported. Most recently, Alfresco has taken a leadership role in developing and promoting Content Management Interoperability Services (CMIS, pronounced "see-miss"). Alfresco produced the first implementation of the draft specification on the same day it was announced. Alfresco CTO John Newton is an active member of the CMIS specification committee. The key to the Alfresco architecture is the repository, whose node-based hierarchy is similar to the Java Content Repository. Indeed, the Alfresco JCR interface complies with level two of the JCR specification but goes beyond the JCR baseline functionality in a number of ways. One of the more innovative features is its support of "aspects" to add attributes and functions across different asset types. Adding the "versionable" aspect to an asset makes that asset support versioning; a "searchable" aspect causes the asset to be indexed. This is different from object-oriented classing because it is done at the object instance level: the class or type of the object stays the same. Aspects are defined through XML files and applied to content assets either manually through the user interface or by business rules triggered by Alfresco's event model.
For example, a business user can set a rule to add "categorizable" aspect to content when it is added to a specific folder. Although no compiling is needed for defining most aspects, you need to restart the application server for them to be recognized. Defining new aspects is a convenient way to add functionality to the system. A developer could add a "synchronization" aspect to push updates to an asset to another system, for example. The Alfresco repository also has an event model that can trigger the execution of code on events such as update, move, or a change in workflow state. The Alfresco Repository is composed of three core services: the Node Service, the Content Service, and the Search Service. Together, these three are called the "Foundation Services." The Node Service manages the metadata of content objects or "Nodes." Alfresco's definition of Node maps directly to the JCR definition. Every content asset is a node placed in a hierarchical tree. Node metadata information is stored in a relational database (MySQL by default, although most database platforms are supported thanks to a Hibernate object relational database layer). The Node service is used for organizing and browsing content. Every content object in Alfresco is stored in a file: XML for structured content, HTML or native binary formats for everything else. These files are managed by the Content Service which takes care of things like retrieving the proper version of the asset and encapsulates the mechanics of persistence. The Search Service uses Lucene search indexes that are stored on the file system and are also used in the on-board search functionality and for listing operations in display templates.
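The aspect mechanism described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the `Node`, `Folder`, and `add_aspect_rule` names are hypothetical and are not Alfresco's API. It shows the two ideas from the text: an aspect is attached to an individual node without changing its type, and a folder rule can apply an aspect automatically when content is added.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical content asset; its type never changes when aspects are applied."""
    name: str
    node_type: str
    aspects: set = field(default_factory=set)

    def add_aspect(self, aspect: str) -> None:
        self.aspects.add(aspect)

    def has_aspect(self, aspect: str) -> bool:
        return aspect in self.aspects


@dataclass
class Folder:
    """Hypothetical folder whose rules fire whenever a node is added to it."""
    name: str
    children: list = field(default_factory=list)
    on_add_rules: list = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.children.append(node)
        for rule in self.on_add_rules:
            rule(node)


def add_aspect_rule(aspect: str):
    """Build a rule that applies the given aspect to each incoming node."""
    def rule(node: Node) -> None:
        node.add_aspect(aspect)
    return rule


# Two nodes of the same type; only the one dropped into the folder gains the aspect.
press = Folder("press-releases", on_add_rules=[add_aspect_rule("categorizable")])
inside = Node("launch.xml", "article")
outside = Node("about.xml", "article")
press.add(inside)
```

Both nodes remain of type "article" afterwards; only the instance that passed through the folder rule carries the extra aspect, which is the distinction from class-based inheritance that the text draws.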
Figure 2. Alfresco Architecture Diagram: Repository Services
Alfresco's repository architecture is based on three core services: Node, Content, and Search. Source: Alfresco documentation site.
Additional services may be added to the Alfresco repository by registering them with the Registry Service. All the other repository functionality is built on top of these three services. This includes content transformation and image manipulation, metadata extraction, templating, classification, versioning, locking, workflow, and permissions. Alfresco's early focus on the repository has made it functionally rich but also very dense and complicated. One of the primary areas of refactoring is to pull certain services out of the repository layer to make it leaner and cleaner. For example, the Web Scripts engine is being pulled into its own tier. This will make the platform more flexible and maintainable, as well as more efficient for performance purposes. When Alfresco introduced WCM into the architecture, they needed to make some major enhancements to the repository. A key change was the development of the Advanced Versioning Manager (AVM). As with the original Alfresco repository, every asset is managed as a file. In the case of structured content, it is an XML file. The AVM goes beyond the document management (DM) repository with functionality like file-level branching, snapshots, and directory-level versioning. There is also the construct of "transparencies" that allow one collection of assets to be "overlaid" on another collection to create a view that is the union of the two collections. Where both collections have the same file, the overlaid version is shown; when the overlaid collection has deleted a file, the file is removed from the view. It is this architecture that enables the sandboxes and snapshots that are explained in the content contribution segment of this evaluation.
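The overlay rule for transparencies can be made concrete with a minimal sketch. It assumes each collection is a simple mapping from path to content and uses a hypothetical `DELETED` marker for files removed in the overlay; none of these names come from Alfresco itself.

```python
DELETED = object()  # hypothetical tombstone marking a file removed in the overlay


def overlay_view(base: dict, overlay: dict) -> dict:
    """Return the union of two collections, with the overlay taking precedence."""
    view = dict(base)
    for path, content in overlay.items():
        if content is DELETED:
            view.pop(path, None)   # deleted in the overlay: hide it from the view
        else:
            view[path] = content   # present in both, or new: show the overlaid version
    return view


base = {"index.html": "home v1", "about.html": "about v1", "old.html": "legacy"}
overlay = {"index.html": "home v2", "news.html": "news v1", "old.html": DELETED}
view = overlay_view(base, overlay)
```

The base collection is never modified; the view is computed from the two layers, which is what lets several independent overlays share one underlying collection.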
The AVM has a distributed repository model where multiple repositories can run virtually on a single instance of Alfresco or on multiple Alfresco instances. Content can be replicated between repositories and the process is identical for repositories running on the same instance or for repositories distributed across the network. Replication is based on snapshots that are automatically taken every time content is pushed to the content staging workspace. When replication is initialized, the source repository asks the target repository for a hash of its latest snapshot. The source server then sends over the files that have changed along with the hash of the replicated snapshot (to save the target server the work of computing the hash of its snapshot). This model reduces the amount of traffic over the network and the amount of workload on the target server that is expected to be busy servicing web traffic. A similar architecture is available for simple file system deployments. In this case, a lightweight daemon is installed on the target server rather than a full blown Alfresco instance. The AVM sits alongside the original DM and is used for "web projects" (a web project is a special kind of folder that contains a website). The initial vision was that the two sides would eventually merge. However, since the departure of the team that designed and evangelized the AVM, ambitions have been much more measured. A few community insiders even speculated that the AVM would be retired without the passion of its initial creators. That has turned out not to be the case now that the AVM is a critical component of the Share and Surf platforms. The new goal seems to be interoperability between the two repositories. The general pattern is to manage complex content assets in the AVM but publish them out to the DM for rapid access. This is not unlike publishing out to a denormalized database for performance purposes. 
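The snapshot-based replication handshake described above can be sketched roughly as follows. This is an illustration under assumptions, not Alfresco's implementation: the `Source` and `Target` classes, the SHA-256 hashing scheme, and the in-memory snapshot store are all hypothetical. The sketch shows the source looking up the target's latest snapshot by hash, sending only the differences, and passing along the new hash so the target does not have to compute it.

```python
import hashlib


def snapshot_hash(files: dict) -> str:
    """Hash a snapshot (path -> content) deterministically by sorted path."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path].encode())
        digest.update(b"\0")
    return digest.hexdigest()


class Target:
    """Hypothetical delivery-side repository."""

    def __init__(self, files: dict):
        self.files = dict(files)
        self.latest_hash = snapshot_hash(self.files)

    def apply(self, changed: dict, removed: list, new_hash: str) -> None:
        self.files.update(changed)
        for path in removed:
            self.files.pop(path, None)
        self.latest_hash = new_hash  # trust the hash supplied by the source


class Source:
    """Hypothetical authoring-side repository that keeps its snapshots by hash."""

    def __init__(self):
        self.snapshots = {}
        self.latest_hash = None

    def take_snapshot(self, files: dict) -> str:
        h = snapshot_hash(files)
        self.snapshots[h] = dict(files)
        self.latest_hash = h
        return h

    def replicate_to(self, target: Target) -> int:
        # Ask the target which snapshot it holds. If the hash is unknown, this
        # simplified sketch falls back to sending every file (and does not
        # clean up stale files on the target).
        known = self.snapshots.get(target.latest_hash, {})
        latest = self.snapshots[self.latest_hash]
        changed = {p: c for p, c in latest.items() if known.get(p) != c}
        removed = [p for p in known if p not in latest]
        target.apply(changed, removed, self.latest_hash)
        return len(changed) + len(removed)


source = Source()
v0 = {"index.html": "home v1", "about.html": "about v1"}
source.take_snapshot(v0)
target = Target(v0)
source.take_snapshot({"index.html": "home v2", "about.html": "about v1", "news.html": "news"})
sent = source.replicate_to(target)
```

Only the modified and new files cross the wire here; the unchanged `about.html` is never resent, which is the traffic and workload saving the text attributes to this model.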
The AVM does have limitations that Alfresco is not actively addressing. The primary issue is performance. For rapid access, Alfresco recommends using the DM repository. The AVM side is also missing some critical features. For example, permissions can only be granted at the web project level, not on a folder within the web project. Web projects do not support features like a rules engine and localization that are supported on the document management side. For a very good comparison between the AVM and DM repositories, see Jeff Potts' article "Understanding the differences between Alfresco’s repository implementations" [http://ecmarchitect.com/archives/2009/08/31/1038].
Figure 3. Alfresco Screenshot: Web Content Properties
Web Projects expose a small fraction of the repository functionality supported in the rest of the application.
Web projects can be accessed through CIFS (as opposed to the ECM standard repository that is accessible over CIFS, WebDAV, and FTP) under a separate mount point than the general ECM repository. Under the WCM mount point, the user will see two directories: data and versions. Under versions there will be directories for v0 through vn — one for each snapshot taken of the repository. This structure allows you to "time travel" to different read-only views of the web project's repository through Windows Explorer. Other mount points can also be defined based on filtered views of the repository. There are some predefined ones that restrict what a user can see based on role. Other mount points can be defined through XML configuration files. Despite the fact that web projects are accessible through CIFS, Alfresco does not generally recommend business users accessing the web projects in this way. It is considered safer to have them work in standard ECM project spaces and use rules to push content into web projects. Early in 2007, Alfresco created publicity around their JCR benchmark tests and claimed to be the fastest open source JCR implementation (faster than the other: Apache JackRabbit). They had a platinum partner certify the results. However, the JackRabbit configuration was using the default file system persistence rather than the much faster relational database persistence that most non-demo implementations of JackRabbit are configured with. Still, the definition of a benchmark was a great contribution to the overall content management software industry. The Alfresco repository (the WCM version in particular) is probably not as fast as an optimized relational database. Most Alfresco powered high traffic websites deploy rendered content to a farm of web servers or publish XML for access by custom delivery tiers. Repository performance is getting better and is expected to
continue to improve. A big step in the right direction came in 2.2.3 when better testing infrastructure was introduced. The new testing harnesses and other facilities have already exposed performance sapping bugs. The authoring environment of Alfresco's new Share application sits on top of the WCM services so the AVM will benefit from at least some of the investment that is going into Share. Alfresco is working on a trimmed down "web delivery runtime" that leaves out components like the web client and services like workflow, access control, auditing, security, versioning, and validation. This "Headless Alfresco" would be ideal for high-performance clustered environments where a fully functional "management instance" of Alfresco publishes to a cluster of performance optimized runtime instances. The licensing cost of these components has yet to be determined. Presumably they will be less than the $15,000 to $20,000 per CPU per year subscription fees that Enterprise customers pay for the full product.
Figure 4. Alfresco Architecture Diagram: Web Delivery Runtime
The Alfresco Web Delivery Runtime is a trimmed down instance of the Alfresco application that provides repository and other services for custom web applications.
As mentioned earlier, when adding web content management functionality, the Alfresco team put a lot of thought into working with structured content and handling concurrency between multiple content contributors. Many of the ideas came from the WCM team's background at Interwoven. Alfresco followed Interwoven Teamsite's approach of creating user "sandboxes" where users can edit and preview content without interfering with other user sandboxes or the production site. Changes made in a sandbox are only visible in that sandbox until checked back into the staging sandbox. Unedited content in a user's sandbox is automatically updated to reflect changes other users submit to staging. Depending on the user's permissions, he can directly check in an update to the staging sandbox or initiate a workflow that will collect the necessary approvals. Now in 3.1, nobody, not even an administrator, can directly edit in the staging sandbox.
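The sandbox behavior described above can be modeled in a few lines. This is a simplified sketch with hypothetical names (`WebProject`, `edit`, `preview`, `submit`); it leaves out approval workflows and permissions and is not how Alfresco stores sandboxes internally. It shows that a user's edits stay private until submitted, and that unedited content in a sandbox tracks whatever other users submit to staging.

```python
class WebProject:
    """Hypothetical web project with one staging area and per-user sandboxes."""

    def __init__(self, staging: dict):
        self.staging = dict(staging)
        self.sandboxes = {}  # user -> that user's unsubmitted edits

    def edit(self, user: str, path: str, content: str) -> None:
        # Edits land only in the user's sandbox, never directly in staging.
        self.sandboxes.setdefault(user, {})[path] = content

    def preview(self, user: str) -> dict:
        # The user sees staging plus their own pending edits layered on top.
        view = dict(self.staging)
        view.update(self.sandboxes.get(user, {}))
        return view

    def submit(self, user: str) -> None:
        # Checking in moves the user's edits into staging and empties the sandbox.
        self.staging.update(self.sandboxes.pop(user, {}))


project = WebProject({"index.html": "home v1", "about.html": "about v1"})
project.edit("alice", "index.html", "home v2")
project.edit("bob", "about.html", "about v2")
alice_before = project.preview("alice")
project.submit("bob")
alice_after = project.preview("alice")
```

After Bob submits, Alice's preview picks up his change to `about.html` automatically, while her own pending edit to `index.html` is still invisible to staging.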
Figure 5. Alfresco Screenshot: Sand Boxes
Alfresco's sandbox model provides contributors with their own work areas to edit and preview their changes prior to checking back in.
The other Java open source WCM project to employ the sandbox model is OpenCms with its notion of "projects." In the PHP world, TYPO3 has introduced a similar work area concept. However, Alfresco's implementation is more sophisticated thanks to its "virtualization" technology that allows a user to browse through the site as it would appear after the modifications are checked in.
As impressed as technologists are by the Alfresco architecture, users are often less enthusiastic about the user interface that tries to split attention between web content and document management. It seems that the Alfresco team is a bit confused about the role of the "Alfresco Explorer" (formerly known as the "Web Client"). Originally, it was positioned as a reference application to show what one could build on top of the Alfresco platform and it seemed to get less attention from the engineering team than the programming interfaces and modularity of the system. When Alfresco positioned itself as a business application rather than a development framework, it was more likely to defend Alfresco Explorer. Still, when speaking to Alfresco staff, it is easy to tell from the relative enthusiasm between the UI and the architecture that they see the UI as a necessary evil. It is not surprising that many integrators do as little with the Alfresco Explorer client as they can. At least two of the early WCM implementations had contributors edit content in DreamWeaver and XML editors against a CIFS drive rather than use the Web Client, and then use Alfresco to deploy these files to the delivery environment. When going this far to work around a CMS, one should consider just using a source code control system to manage HTML files. Alfresco Explorer is awkward and clunky in just about every way imaginable. Its use of early Java Server Faces technology creates annoying issues like a non-functional browser back button and non-bookmarkable URLs. From an interaction design perspective, there are a number of problems. The primitive tree controls are not good enough for the repository's highly hierarchical structure; wizard style interfaces are inefficient for power users. The content editing wizard control buttons (back, next, finish) are only on the upper right corner of the page. 
Users need to get into a rhythm of working their way down a form and then scrolling to the top to continue on or finish the wizard. The Alfresco Explorer is even less suited for web content management. Web sites are created in special folders called "Web Projects." Users need to navigate to their web projects from the top level of the application or via a shortcut. With no tree-based navigation or in-context editing, the web project user interface is way behind pure WCM products in terms of usability. Content assets are listed by their file name, so a user must guess from the file name what the content is about and then figure out which enigmatic icon will execute the desired action on the content. The UI has a "paging style" design where the contents of a folder are shown in pages of 10 assets at a time. The sort columns are limited to very basic attributes (file name, size, modified date, created date, and modifier), so it can be hard to find an asset.
Figure 6. Alfresco Screenshot: Browsing the Site
Alfresco WCM lists assets as files in a paging style interface that makes it difficult to browse through large folders of content.
Alfresco's Web Projects have a flexible content model where all structured content is stored as XML files in the repository. Most content structures can be supported, including nested elements and links to other assets. However, there is very little control over the form that contributors use to edit the content. Form controls (editing widgets) are listed vertically on a single page. Unlike many form-based web content management systems, the editing user interface has no tabs, so the forms can get very long. The developer only has control over the types of widget used, the name, description (help text), and order of the fields. There is a basic set of form controls, including a calendar date selector and a browser widget, and other controls can be added. TinyMCE is shipped as the default WYSIWYG editor, but developers report success with other editors as well. The TinyMCE configuration comes with custom browse dialogs for adding links and image references, but a surprisingly limited number of formatting buttons are enabled. More can be added by editing a section of the web client configuration file that sends parameters over to the TinyMCE control. Some buttons, like spell check, require the addition of plugins that can be easily installed (see the TinyMCE web site [http://wiki.moxiecode.com/index.php/TinyMCE:Control_reference] for a full list of configuration options). Formatting buttons can be turned on per content type and field. For example, the summary element of a content asset can get fewer buttons than a body element. The image and link browse controls allow a user to browse and add new targets. The image control provides fields to set the dimensions, position, and alt text of the image. However, the browsing controls only list the file names.
One of the biggest limitations of the editing interface is the lack of direct preview. While the sandbox functionality and virtualization set the foundation for elaborate preview functionality, the utility and accuracy are largely implementation specific. In its native state, a structured content asset is only rendered when it is saved. This means that an editor needs to save the asset, leave the editing interface, and then click a preview button that will allow him to navigate through the rendered content in his sandbox. Given that users like to edit, then preview, then edit some more, this behavior is annoying. Many customer implementations, however, employ custom delivery tiers which are often even more awkwardly integrated into the solution. On some of the most problematic implementations, editors have little choice but to publish blindly and check the production site. Ineffective preview is particularly problematic because the editing interface itself does not give any feedback on how the rendered asset will look. For example, one customer implementation defined a home page as a collection of references to teasers of articles to display. Unlike other CMS user interfaces that allow you to search through titles (and other attributes) of content to pick related content, Alfresco forces the user to browse through file names. The editor does not see what content has been associated until he saves the asset, leaves the editing interface, and then clicks the preview button. This user interface also lacks the drag/drop mechanism for collections that is so common in CMS user interfaces.
Figure 7. Alfresco Screenshot: Highly Structured, Related Content Type Example
The form widgets that come with the Alfresco forms builder are not conducive to building references to other content. This screenshot is from a customer implementation of a home page object that references specific content on the site. Here you can see that the editor must know the file name of the content he wishes to associate.
Alfresco has been dragging its feet on improving the Explorer because it sees it as a dead end; the company is developing user interfaces that will eventually replace the Explorer. As noted earlier, the biggest investment is in the Share collaboration application. Share rejects the clunky JSF user interface model in favor of a lightweight, AJAX rich UI. Like SharePoint, Share is a re-imagining of a portal where a user can assemble a page from
lightweight portlet-like "dashlets." Sites can have components like a document library, blog, calendar, discussion, or wiki. All of these components come out of the box and, looking at the list, it is clear that Alfresco's focus is squarely on collaboration and user generated content. From a more traditional web content management perspective, the future seems to be with the Web Studio client (see screenshot below) that provides a drag/drop interface for building a website on the Surf platform. As noted earlier, developing sites using the Surf platform is tedious and complex without Web Studio. While this user interface is critical for Alfresco's growth as a flexible website development platform, it is not slated for inclusion in any of the releases on the Alfresco Enterprise roadmap. As it stands now, the application is fairly buggy and quirky; it is unclear what it would take to get Web Studio to Enterprise Edition quality.
Figure 8. Alfresco Screenshot: Web Studio
The Alfresco Web Studio user interface will be important for building more traditional websites on top of the Alfresco Surf framework. However, it is not currently ready for Alfresco Enterprise and there are no visible plans for including it.
With release 2.1, Alfresco introduced some basic link checking functionality. Users can click a "check links" button from within their workspace and link checking can also be added as an automated step in the default workflow that comes with the product. This is helpful because the complexity of the UI makes visually checking links and images a bit flakey. Localization support is pretty weak in Alfresco WCM. A user that is familiar with the "make multi-lingual" feature in the rest of Alfresco will be disappointed in the lack of localization
support within web projects. Companies tend to use primitive work-arounds, like appending a locale code (en, es, fr) at the end of the file name to create localized web sites. The only other alternative is to manage different locale web sites in different web projects. While earlier releases of Alfresco had a simplistic folder based workflow model, Alfresco WCM employs a more sophisticated workflow model. Under the covers is JBoss's jBPM workflow engine, which allows workflows to be designed and implemented visually using an Eclipse plugin. One inherent limitation of jBPM is its inability to implement multiple choice logic (see Multi-Choice workflow pattern [http://www.workflowpatterns.com/patterns/control/advanced_branching/wcp6.php]). While designing workflows is very point-and-click, it takes a little more effort to wire these workflows into Alfresco application logic. JBoss jBPM also supports the standards-based BPEL for cross system choreography. When a new content type is defined through the "web form wizard," it is associated with one or more workflows. Depending on the workflow, there will be different configuration options that can be selected by the user who submits the content. Workflows can initiate business logic, like checking links, and create manual tasks that are emailed to the user and show up on the user's dashboard.
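To make the Multi-Choice limitation mentioned above concrete, the following is a minimal sketch of the pattern in plain Python. It is not jBPM code, and the branch names and asset fields are hypothetical. An exclusive choice follows exactly one outgoing branch, while a multi-choice follows every branch whose condition holds.

```python
from typing import Callable, Dict, List

Condition = Callable[[dict], bool]


def exclusive_choice(branches: Dict[str, Condition], item: dict) -> List[str]:
    """Exclusive choice (XOR-split): follow only the first branch that matches."""
    for name, condition in branches.items():
        if condition(item):
            return [name]
    return []


def multi_choice(branches: Dict[str, Condition], item: dict) -> List[str]:
    """Multi-Choice (OR-split): follow every branch whose condition matches."""
    return [name for name, condition in branches.items() if condition(item)]


# Hypothetical review branches for a submitted web asset.
REVIEW_BRANCHES: Dict[str, Condition] = {
    "legal_review": lambda item: item.get("mentions_pricing", False),
    "brand_review": lambda item: item.get("has_images", False),
    "translation": lambda item: item.get("locale", "en") != "en",
}

submission = {"mentions_pricing": True, "has_images": True, "locale": "en"}

print(exclusive_choice(REVIEW_BRANCHES, submission))
print(multi_choice(REVIEW_BRANCHES, submission))
```

With this submission, the exclusive choice routes the asset to legal review only, whereas the multi-choice routes it to both legal and brand review in parallel. An engine without the second form has to approximate it with chained decisions or custom code.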
Figure 9. Alfresco Screenshot: Workflow Dialog
The workflow task form combines some task management plus a workflow state machine.
Alfresco allows a content type to be defined with more than one workflow option. When more than one workflow is enabled for a content type, the user can select which workflow to use.
DEVELOPMENT, CONFIGURATION, AND ADMINISTRATION
Figure 10. Surf Based Architecture
Alfresco's architectural vision is for customers to build dynamic applications on top of its repository using the Surf framework. Source: Alfresco.
Today, Surf's primary goal is to support Share but it will probably be used as a key component of all future business applications that Alfresco develops. Surf is built around familiar concepts like pages, page components, and templates that allow the developer to create lightweight, portal-like applications without the overhead of a clunky Java portal framework. A Surf application is deployed as its own stand-alone WAR file. Technically, Surf applications do not need Alfresco but there are a number of areas where Surf relies on Alfresco services: security, profile management, and persistence. Surf can do clever things like virtualized content retrieval where the same Surf instance can connect to different sandboxes for preview and live view of an application. A Surf application can connect to multiple Alfresco repositories and other resources using credentials stored in a local repository called a "credential vault." As general web application frameworks go, however, Surf has the basics but is fairly immature. Today it only has what was needed to develop Share. Surf does not have advanced features like a strong security system, page flow (like Spring Web Flow), and an object relational mapping component (like Hibernate) that would make it useful as a general purpose web application framework. It could also use some re-factoring to simplify development. For example, Surf seems to have only partially committed to the popular philosophy of "convention over configuration" where developers use techniques like naming conventions to avoid maintaining relationships in configuration files. Even a small Surf project will have loads of XML configuration files that need to be managed. Large Surf projects will be swimming in interdependent XML files. The plan is for the Web Studio user interface to automate the creation and alteration of these files, but Web Studio is not part of the supported Alfresco Enterprise product.
Probably the biggest question asked in the Alfresco community is why Alfresco did not select a pre-existing web application framework. Maybe the Alfresco team over-estimated
how close they were to building a robust framework out of their Web Scripts technology. Perhaps they were afraid of choosing the wrong framework and struggling like they did with Apache MyFaces on the Alfresco Navigator user interface. For now, however, many systems integrators are finding success using Alfresco for repository services behind custom applications built in third party web application frameworks like Spring, Struts, Django, and Drupal. From an administrative standpoint, Alfresco web projects do not have much to offer. Access control within web projects is limited. Alfresco comes with some pre-packaged roles that should look familiar to a user of its document management functionality: Content Manager, Content Contributor, Content Reviewer, and Content Publisher. A Content Manager has full permissions on the workspace; a Content Contributor can edit and add but not publish; a Publisher can approve content but not edit; and a Reviewer can only read content. More roles can be created by editing configuration files. The real shortcoming of the access control model is that roles are applied at the web project level, not at the sandbox or folder level. This makes it difficult to do things like restrict access to edit a portion of a web project. The most practical workaround is to use workflow to prevent users from publishing content that they shouldn't be editing. Unless approved, their edits will linger harmlessly within their own personal sandboxes. Another strategy would be to separate the web site into multiple web projects; however, this would hinder sharing content across site sections.
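The project-level scope of these roles can be illustrated with a small model. This is a sketch of the behavior described above, not Alfresco's permission API: the action names, project name, users, and paths are all hypothetical, and the role-to-action mapping simply restates the four pre-packaged roles as the text describes them.

```python
from typing import Dict, Set, Tuple

# Hypothetical action names; the mapping restates the roles described above.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "Content Manager": {"read", "add", "edit", "approve"},
    "Content Contributor": {"read", "add", "edit"},
    "Content Publisher": {"read", "approve"},
    "Content Reviewer": {"read"},
}

# Roles are keyed by (web project, user). There is no folder or sandbox key.
Assignments = Dict[Tuple[str, str], str]


def is_allowed(assignments: Assignments, project: str, user: str,
               action: str, path: str) -> bool:
    """Decide access from the user's project-wide role only.

    The path argument is accepted but deliberately ignored, which is the
    limitation: a role cannot be narrowed to part of a web project.
    """
    role = assignments.get((project, user))
    if role is None:
        return False
    return action in ROLE_ACTIONS[role]


assignments: Assignments = {
    ("corporate-site", "maria"): "Content Contributor",
    ("corporate-site", "li"): "Content Reviewer",
}

# Maria was invited to edit press releases, but the same role also lets her
# edit investor content elsewhere in the same web project.
print(is_allowed(assignments, "corporate-site", "maria", "edit", "/press/release.xml"))
print(is_allowed(assignments, "corporate-site", "maria", "edit", "/investors/report.xml"))
```

Because the lookup key has no path component, the only ways to narrow access are the ones the text names: gate publishing through workflow, or split the site into separate web projects.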
Figure 11. Alfresco Screenshot: Managing Permissions
Managing permissions is done by inviting users and groups and assigning them roles.
Alfresco's LDAP support is based on a replication model. Alfresco periodically gets updates from an external LDAP repository. This implementation is problematic if you want to edit
group memberships through the Alfresco UI because they will just be overwritten by the next update. If you want to integrate with LDAP, it is best to do all the group assignment directly in the LDAP directory or customize Alfresco to only consult the LDAP directory for authentication. There is currently no special back-up functionality other than to shut down the system and run standard MySQL and file system back-ups. You can also do a hot backup (when the system is running) as long as you back up the database before the file system. Otherwise, the database will have records of documents that are not available on the file system. While unusual for enterprise software, Alfresco's lack of a robust backup system does not pose a problem for most companies since the de-coupled delivery tier would remain operational. However, for global companies working on the same Alfresco instance, having a daily maintenance outage would not be acceptable. There has been talk within the Alfresco team about implementing a live back-up mechanism, perhaps using the replication functionality. No doubt some customers are experimenting with this approach right now.
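The ordering constraint for a hot backup can be captured in a short script. This is a minimal sketch under stated assumptions, not an Alfresco-provided tool: the database name and directories are hypothetical, MySQL credentials are assumed to come from an option file, and `--single-transaction` only yields a consistent snapshot for transactional tables such as InnoDB.

```python
import subprocess
from typing import List


def hot_backup_plan(db_name: str, content_dir: str, dest_dir: str) -> List[List[str]]:
    """Return the backup commands in the order they must run.

    The database is dumped first. Files written while the dump runs are still
    picked up by the later archive, so every document recorded in the dump
    has its file in the backup.
    """
    dump_database = [
        "mysqldump",
        "--single-transaction",
        f"--result-file={dest_dir}/{db_name}.sql",
        db_name,
    ]
    archive_content = [
        "tar", "-czf", f"{dest_dir}/content.tar.gz", "-C", content_dir, ".",
    ]
    return [dump_database, archive_content]


def run_plan(plan: List[List[str]]) -> None:
    """Run each step in order and stop at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)


# Hypothetical names and paths, shown only to illustrate the ordering.
plan = hot_backup_plan("alfresco", "/srv/alf_data", "/backup")
for command in plan:
    print(" ".join(command))
```

Calling `run_plan(plan)` would execute the two steps. Because `check=True` raises on a non-zero exit code, a failed database dump prevents the file archive from being taken out of order.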
Alfresco gives you several options when it comes to rendering structured content. Presentation templates written in FreeMarker or XSL are registered with a content type. Each content type can have multiple presentation templates that each make a "rendition." For example, one template could make a detailed view while another template could make a view to be used in a list of assets; this is good for static delivery. In order to use Alfresco's rendering engine, content rules are set to process the presentation templates when the content is saved, creating rendered versions of the source XML content. For example, if you have an article123.xml source file, and rendering templates for a detailed view and a summary view, you may get the files article123.html and articlesummary123.html. You would also probably want to generate an index page with the 10 most recent articles. The best practice is to store rendered files in a different directory structure than the XML sources. This makes sense because rendered content should be stored as it will be navigated on the external site — not as it is managed. This also enables content re-use because content can be rendered to multiple places. The paths and file names that the content is rendered to are configurable via rules that can use variables and information about the content to determine where to put it. One could use a taxonomy to render content into various folders; alternatively, Alfresco can be configured not to render the content and instead save it as XML and have a dynamic delivery tier do the rendering when the assets are requested. The pioneers that built the first Alfresco powered web sites went with a static HTML deploy model where rendered HTML files were deployed to a simple web server. This model is particularly appropriate for sites that have been statically imported into Alfresco and are edited as HTML files rather than structured data. 
Other models include structured publishing of XML files or publishing a whole web application: content, code, and all.
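The rendition naming described earlier (article123.xml producing article123.html and articlesummary123.html in a separate directory tree) can be sketched as a small path-mapping function. The pattern syntax, rendition names, and directories below are hypothetical stand-ins. They use Python's `string.Template` and do not reproduce Alfresco's own rule syntax.

```python
import re
from pathlib import PurePosixPath
from string import Template
from typing import Dict

# Hypothetical output-path patterns, one per presentation template.
RENDITION_PATTERNS: Dict[str, Template] = {
    "detail": Template("/site/articles/${stem}${number}.html"),
    "summary": Template("/site/articles/${stem}summary${number}.html"),
}


def rendition_paths(source_path: str) -> Dict[str, str]:
    """Map one XML source file to the output path of each rendition."""
    name = PurePosixPath(source_path).stem  # "article123"
    # Split the name into a text stem and an optional trailing number.
    match = re.fullmatch(r"(?P<stem>.*?)(?P<number>\d*)", name)
    fields = match.groupdict()
    return {
        rendition: pattern.substitute(fields)
        for rendition, pattern in RENDITION_PATTERNS.items()
    }


# The XML source lives in a managed tree; renditions land in the site tree.
for rendition, path in rendition_paths("/content/xml/article123.xml").items():
    print(rendition, path)
```

Keeping the patterns in data rather than in code mirrors the idea that output locations are configured by rules, and it makes rendering the same source to several places a matter of adding another pattern.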
Figure 12. Alfresco Architecture Diagram: Static Deploy Model
The simplest delivery model is static deployment where static HTML files are pushed over to a simple web server.
The big trend among Alfresco customers and integrators is around building custom active delivery tiers. As mentioned earlier, Surf is emerging as a web application framework and integrators are having success using third party frameworks such as Struts, JBoss Seam, and Django. Alfresco, with its robust repository and open architecture, fits nicely behind presentation tiers. That said, when Alfresco is used in this way, there is a tendency for the resulting solution to get more complex than is warranted. The area where these solutions get particularly complex and messy is in content preview. Unless you are trying to build a truly unique web application, it may be more efficient to select a platform with its own, built-in presentation tier. Eventually, Surf will probably play that role in the Alfresco stack but until Web Studio stabilizes and is incorporated in the Enterprise distribution, it is not there yet. The other limitation with this architecture is that the editorial interfaces (particularly those provided with Alfresco Explorer) are not good enough to make content contributors appreciate the role of Alfresco in the architecture. The cost of developing a custom delivery tier and a custom content contribution interface on top of Alfresco's relatively expensive software costs (see the delivery and support section for details) makes Alfresco an extremely expensive solution. One of the more ambitious concepts that came over from the Interwoven engineers is "virtualization." Unlike TeamSite, which proxies over to a web server running the presentation tier, Alfresco provides a container for any "well behaved" (Alfresco's words) Java web application to run in. This allows both code and content to be tested in safe virtual instances running on one instance of Alfresco. In recent months, however, the emphasis on virtualization has been significantly reduced. Most Alfresco implementations instead use remote test servers.
Alfresco can deploy content to these servers and then
proxy requests for preview and content staging in essentially the same way that Interwoven TeamSite works. To make this happen, test server instances need to be set up beforehand and registered with Alfresco. Since there is a finite number of test servers, only a finite number of people can preview at the same time.
DELIVERY AND SUPPORT
Alfresco is not a community project; the code base is very tightly managed by Alfresco Software, Inc. During its first couple of years, the Alfresco engineering team moved very quickly to build the platform. Features were being defined and added so quickly that it was often difficult to determine what was supported and what was planned. While the development velocity was impressive, hasty design decisions were made and bugs were not addressed. Documentation and support struggled to keep up. All this is fairly typical for a startup software company, but the marketing message positioning Alfresco as a better Documentum at a fraction of the price made the customers expect more. There was quite a lot of frustration with broken or incomplete features, particularly in the WCM side of the application. Over the last year, however, Alfresco has pulled back from its frantic pace of adding new features and focused on improving performance, fixing bugs, and refactoring the code base. This should make everyone's lives easier. While the development has adopted a more controlled and conscientious pace, the product roadmap is still murky and confusing with parallel development paths for Community and Enterprise. It is not always clear what features in Community will get into Enterprise and, if so, when. This frustrates customers who buy the product on planned features, particularly larger customers who feel they should be getting preferential treatment from the vendor. The Alfresco external developer community has been steadily growing. Systems integrators were lured into the partner network with all the visibility the product was getting. External programmers have been contributing extensions on the Alfresco Forge [http://forge.alfresco.com/]. As of August 2009, there were 205 hosted projects ranging from language packs to Outlook plugins. With Share and Web Studio, Forge will become a more central part of the offering.
Users will be able to download and install modules available on Forge from within the administrative UI, much like WordPress administrators can download and install plugins. The Enterprise Edition is distributed as a compiled binary that has been fully tested. In order to use the Enterprise Edition, customers must pay an annual "network fee" between $15,000 and $20,000 per CPU depending on the support service level agreement. Network fees for backup and staging CPUs are cheaper but it is not uncommon for a customer to pay $50,000 per year to use the software. If the customer stops paying the network fee, it must stop using the software. This is unlike commercially licensed software where customers have the option to discontinue their support and maintenance agreements. This, coupled with the fact that most Alfresco based solutions are complemented with a lot of custom software, makes this an expensive platform. The documentation and support forums are hit or miss and Enterprise customers report that paid support is not much better. Munwar Shariff's book Alfresco, Enterprise Content Management Implementation is a useful introduction to the user interface, the architecture, and its customization points. The Alfresco Developer Guide, by Jeff Potts, goes into much greater depth in customizing and extending Alfresco. Although it is based on a
combination of Enterprise 2.2 and Community 3.0, Jeff has been keeping the source code examples up to date on his website ECM Architect [http://ecmarchitect.com/]. The most up to date resource is the wiki, but the articles are not as thorough as more formal product documentation would be. There are a few articles on best practices, but not nearly enough. Alfresco delivers training at its offices near London and through partners elsewhere. Customers report that the training is useful. The best bet for getting the most out of Alfresco is to go through a systems integrator who may have an inside line on the product. Alfresco operates a network of SI partners. The network is tiered (Platinum, Gold, and so on) and based on company size and financial (and other) commitments made by the systems integrators to Alfresco — not by the amount of Alfresco work that the systems integrators do. The best way to evaluate SI partners is to look on the forums. The good SIs are the ones that are answering the questions and publishing modules on the Forge. Paid professional support gets low marks from customers, especially those customers accustomed to working with large commercial enterprise software vendors. The perception is that the support organization is small and inexperienced. Support requests are frequently answered with a naked link to a sparse and barely relevant piece of documentation. This, coupled with the price, has frustrated customers who are forced to maintain their support contracts by the terms of the Enterprise license; they can't discontinue support and maintenance even if they feel like they are not getting value from the service.
Table 2. Alfresco 2.2 Summary
Explanation
While the hierarchically organized content repository can handle large volumes of content, the Web Client is not optimized for managing web content. It is so weak that customers prefer using the CIFS interface to navigate the repository as a simple file system.
Structured content types are edited through auto generated web forms. There is not much control over the generated forms other than the form widgets used and the order of the fields. The layout with the save buttons on the top is awkward on long content forms. Preview from the editing interface is a key missing feature.
Developers have the option of using an external source code control system or using the repository for version management. Virtualization is useful for spot testing code. New deployment functionality makes it easier to deploy code and content to the delivery tier.
Customization Layer and API: Alfresco is very clear that its customization layer and its licensing prevents users from re-compiling any of the core code. Customization of Alfresco is done through writing presentation templates, developing modules (AMPs), and adding jars that override default behavior. The Spring IoC control framework allows you to wire in code.
Delivery Tier Flexibility: Alfresco does not come with its own delivery tier. Developers can use its FreeMarker or XSLT engine to transform XML content when it is saved for static content delivery. Most customers build their own dynamic delivery tiers that either read XML deployed to a file system or from the Alfresco repository. The PHP and REST APIs are also useful for building dynamic delivery tiers. Alfresco also allows you to virtualize and deploy your presentation tier code.
Widely Used Technologies: Alfresco is a Java programmer's dream. It uses all the technologies that a developer either knows or wants to learn.
Sometimes communication is open and candid, other times it is more marketing hype. The version numbers between the Labs and Enterprise versions are often confusing. To get the best access to information, work with a good systems integrator, read the wiki, and get to know other customers.
Alfresco Enterprise Content Management by Munwar Shariff. Useful as an introduction to the platform but does not cover WCM or many advanced configuration and extension topics. A better book for developers is Jeff Potts's Alfresco Developer Guide.
The formal documentation has adequate coverage of some of the primary topics. The wiki has a lot of information but it is not particularly well organized. Better search would tie everything together.
The online forum is monitored by Alfresco staff. Still, responsiveness is just average. Paid support, however, is not much better.
Alfresco is an ideal platform if you really want to build your own CMS but don't trust yourself to get versioning, deployment, and workflow right the first time. The strength of the product is in the architecture, the APIs, and the repository; certainly not in the user interface. Alfresco provides a higher starting point than a naked JCR implementation like JackRabbit, but comes at the cost of some lock-in. Alfresco customer implementations tend to involve elaborate architectures with lots of integration and custom code. Because of the high subscription fees and the amount of customization, the cost of these solutions tends to be higher than average for open source and mid-market commercial software. These solutions often look more elegant in Visio diagrams than they do from the perspective of the business user. When Alfresco WCM replaces home grown solutions or statically managed websites, the Alfresco user interface is a step up. Also customers that are happily using Alfresco for document collaboration are finding success when they expand their use of the product to manage a small, simple, static website. However, when compared to pure WCM platforms, the Alfresco user interface is noticeably lagging. The more mainstream and free-standing a website is, the less appropriate it is for Alfresco WCM. In settings where a strong development team is building a unique content application that would require mangling a traditional WCM product, Alfresco is a more natural fit — particularly when the custom application involves significant document collaboration functionality. The next phase of Alfresco's evolution should be to mature as a business application and deliver a more compelling offering to business buyers. This seems to be the intent with Share. There seems to be less attention to creating a similar solution for web publishing and, given the makeup of the team, web content management will continue to be no more than an aspect of enterprise content management.