What they are, why people are using them, making them useful for
knowledge management
Michael Angeles
michael@studioid.com
http://studioid.com
Thank you.
Today I’m going to talk about weblogs inside my company, their use in
knowledge management, and how my organization is hoping to make them
usable for enterprise knowledge work if the number of blogs in the company
increases significantly.
I’ll talk briefly about our company and the types of people involved in various
forms of web publishing on the intranet.
Then I’ll look more closely at what weblogs are, how people use them, and how
we might develop information systems to make the data that gets published from
these weblogs usable.
Disclaimer
I’d like to try to introduce this presentation in the same way so you know that
not everything I’m talking about has been implemented.
This is also a discussion of how my organization has analyzed and is planning to deal with weblogs.
* I’m going to talk a little about how we’re supporting bloggers presently.
* And I’m going to talk about how we, as the company’s information management organization, are
positioning ourselves to deal with any information growth as a result of blogging.
The disclaimer part is that we have NOT implemented all of our ideas yet, though we have the technology and
resources to implement them. The technical implementation, as you will see, is trivial when compared to the
strategy and resources required to actually pull off some of the ideas we’ve kicked about.
A history of web publishing in my intranet
Before we get into the nitty gritty of weblogs ...
a very brief and incomplete history of Lucent
intranet web publishing
How web publishing has evolved
Whose needs are being met by web-based
publishing
Let’s start with a timeline
But before we get into the nitty gritty of what weblogs are and before I start throwing
out buzzwords
I want to give you some idea of how web publishing has evolved in our intranet
And then I want to look briefly at the different people involved in publishing corporate
information on the intranet and why they need to do it
First there was the command line
[Timeline slide, Pre-web through 2003]
Technologies: Internet protocols (Archie, FTP, telnet); NCSA Mosaic (11/1993)
Company Milestones: (none shown)
IIS Milestones: LINUS (Client-server)
From a thousand miles up and with the benefit of hindsight, we can see where the
company has gone with web publishing on the intranet.
In pre-web days the library organization’s electronic resources were accessed using
LINUS, a shell interface that you accessed by telnetting into our UNIX server. This was
a hierarchical menu interface that dumped you into our databases, which used
command-line search syntax identical to Dialog, a large database aggregator popular
with researchers.
Then came pictures
[Timeline slide, Pre-web through 2003]
Technologies: Internet protocols (Archie, FTP, telnet); NCSA Mosaic (11/1993); Netscape Navigator 1 (12/1994)
Company Milestones: Simple sites proliferate; hand edited and FTPed; Front Page webmasters (1995-96)
IIS Milestones: LINUS (Client-server); Digital Library (1994-1995); produces customized db-driven BU intranet sites (7/96)
Then Tim Berners-Lee wrote the specifications that became HTTP and HTML and the
web was born. Most web pages at this early stage of our intranet are all text, and
almost all sites are probably marked up by hand in vi or emacs. Later, people start to
use WYSIWYG editors like Front Page.
In 1996 my organization begins to hire staff to produce web interfaces for customer
databases and web sites and we begin to get more heavily involved in doing db-driven
web-based information systems for business units.
Then useful data competed for screen space
[Timeline slide, Pre-web through 2003]
Technologies: Internet protocols (Archie, FTP, telnet); NCSA Mosaic (11/1993); Netscape Navigator 1 (12/1994)
Company Milestones: Simple sites proliferate; hand edited and FTPed; Front Page webmasters (1995-96); ONSource, first BU portal (1/1999)
IIS Milestones: LINUS (Client-server); Digital Library (1994-1995); produces customized db-driven BU intranet sites (7/96); indexing process introduced (1998); Business taxonomy development (1998); supports BU portals with indexed content (1999)
Then useful data started to crowd and compete for screen space when the first business unit portals arrived.
ONSource is probably the most successful large-scale web site implementation I’ve seen in the company. It
was the result of an Optical Networking Group team that worked with analysts who came into the
organization to do interviews with Optical Networking knowledge workers to find out what they looked for to
do their jobs, where they looked and how much time they spent looking. After an extensive report was
created describing their prospective users and estimating the amount of money spent per person searching
for information, the functional specifications for this portal started to come together.
Our organization was brought in to develop a metadata schema including an Optical Networking subject
taxonomy and a company taxonomy, which was then expanded to include all of the product and research
areas at Lucent.
IIS then began to modify its applications and indexing processes to incorporate these subject taxonomy terms,
and classified data going through our organization began to feed the portal.
The bubble bursts and standards are born
[Timeline slide, Pre-web through 2003]
Technologies: Internet protocols (Archie, FTP, telnet); NCSA Mosaic (11/1993); Netscape Navigator 1 (12/1994)
Company Milestones: Simple sites proliferate; hand edited and FTPed; Front Page webmasters (1995-96); ONSource, first BU portal (1/1999); Portals close, subdomains removed, migration to MyLucent begins (2000-2001); MyLucent, company portal (7/2001)
IIS Milestones: LINUS (Client-server); Digital Library (1994-1995); produces customized db-driven BU intranet sites (7/96); indexing process introduced (1998); Business taxonomy development (1998); supports BU portals with indexed content (1999); ceases to produce custom sites (2000)
Then a weird thing happened. The bottom fell out when the dot com bubble burst. Telecom was hard hit and
from up high every executive and senior manager was looking for ways to cut costs.
So corporate standards were discussed for a long time and we began getting involved with an initiative to
migrate all of the company’s separate intranet sites into one company portal. I remember hearing about the
long meetings that seemed to go on for months around this topic.
In the end, the Oracle Portal server was selected and is now running the corporate intranet. My group
stopped doing custom-information services involving new web site development.
Then the bottom really falls out
[Timeline slide, Pre-web through 2003]
Technologies: Internet protocols (Archie, FTP, telnet); NCSA Mosaic (11/1993); Netscape Navigator 1 (12/1994)
Company Milestones: Simple sites proliferate; hand edited and FTPed; Front Page webmasters (1995-96); ONSource, first BU portal (1/1999); Portals close, subdomains removed, migration to MyLucent begins (2000-2001); MyLucent, company portal (7/2001); Much of CIO supporting MyLucent is laid off (2003)
IIS Milestones: LINUS (Client-server); InfoView Digital Library (1994-1995); ISG created, produces customized db-driven BU intranet sites (7/96); IIS indexing process introduced (1998); Business taxonomy development (1998); ISG supports BU portals with indexed content (1999); ISG ceases to produce custom sites (2000)
And then another weird thing happened -- the failing economy caught up with our CIO.
The CIO organization has been decimated by forced management procedures (or
layoffs) in the last year. So much of the hard core information systems / development
work is returning to us in IIS again.
And everything old is new again
[Timeline slide, Pre-web through 2003]
Technologies: Internet protocols (Archie, FTP, telnet); NCSA Mosaic (11/1993); Netscape Navigator 1 (12/1994)
Company Milestones: Simple sites proliferate; hand edited and FTPed; Front Page webmasters (1995-96); ONSource, first BU portal (1/1999); Portals close, subdomains removed, migration to MyLucent begins (2000-2001); MyLucent, company portal (7/2001); Much of CIO supporting MyLucent is laid off (2003)
IIS Milestones: LINUS (Client-server); InfoView Digital Library (1994-1995); ISG created, produces customized db-driven BU intranet sites (7/96); IIS indexing process introduced (1998); Business taxonomy development (1998); ISG supports BU portals with indexed content (1999); ISG ceases to produce custom sites (2000)
We are here: Blogs appear; blog-related services (2002)
Which brings us back to where we started really. We’re finding more people needing to
create and share knowledge who are using some form of lightweight web publishing
to do it. But this time the technologies have matured and some of the savvy people
are picking up light CMS in the form of weblogging software.
Really seems like web-publishing chaos
As an aside, the CIO reaction to this chaos has been to start up large projects requiring a good deal of
spending and to mandate the use of standard processes and technologies in the enterprise. From my
perspective, it seems that these processes and technologies have not always been coordinated with
user processes and needs. As a result, there might be a backlash of users backtracking to simpler
methods. We’re starting to see this in the re-emergence of personal publishing (such as with weblogs) and
with an increase in requests for information services of my organization.
There’s a story to be told in that diversity/chaos
Looking at our timeline, I think our Intranet story can best be explained in terms of the needs of the different
user types within the company,
and specifically by observing how these needs have been satisfied (or not satisfied) using various technologies
over time.
Looking back at that timeline from a high vantage point, it seems like IT infrastructure for web publishing is
complete chaos. To some degree that's true. Until recently, all IT implementations within the company have
been executed from within the individual business units rather than directed from above. Slowly that's changing
again! We'll look more at that issue when we talk about information ecology.
The issue that we're going to focus on first is this diversity of needs.
Who’s who in intranet web-publishing
Knowledge workers
Researchers, engineers, sales force
Executives
Officers, upper managers
We can generalize about who the major players in the Intranet web publishing picture are, reducing those involved
to a few key user types.
Knowledge workers
Communities of Practice
Chief Information Organization
Executives
Knowledge worker: Researchers, engineers, sales force
Community of practice: Groups organized around specific topics or projects
CIO manager: CIO directors and managers
Executive: Executive officers, vice presidents and upper management
This matrix hopes to tell the story of the key players in intranet web publishing and what their needs are.
The primary users of web publishing applications are the knowledge workers and communities of practice.
These user needs have mapped to specific tools and technologies over our history. Describing their business
goals and knowledge management needs would help the CIO to try to provide a road map for IT standards,
but for my purposes, this matrix helps so we can observe how the different user needs have resulted in our
present information ecology.
Diversity is a good thing
Nardi & O'Day on information ecology
A system of people, practices, values, and technologies at work in a local
environment.
I began to realize over time that the perceived chaos didn’t represent a failure of IT or of management. The diversity, I think,
simply represented a lot of people with information or knowledge management needs that were finding different ways of
satisfying those needs. The implementations are as diverse as the people that make up the organization.
I found in the writing of Bonnie Nardi and Vicki O’Day some justification for or validation of this diversity. Essentially they use the
analogy of ecologies to describe the organization.
They define an information ecology as a system of people, practices, values, and technologies at work in a local environment.
They also say that a healthy ecology is one that is dynamic (changing/evolving), diverse (made up of different types of people
and technologies) and that allows for a diverse set of people and technologies to work in a complementary way. I’ve used this
opinion to rationalize our strategy of working with this diversity rather than imposing systems or processes from above.
Some recent research by Deloitte Consulting and Forrester Research supports this.
Analysis of Lucent information ecology
We’ve seen a lot of Database driven publishing -- Home grown or commercial CMS and document
management. A lot of these applications in the past have been front ends for databases using scripting
languages (for instance perl).
Home grown and low-cost server-based applications such as weblogs are increasing in popularity.
Desktop personal publishing tools remain popular. There are still a lot of people that maintain their own
web pages by hand in simple editors like vi or Notepad, or with WYSIWYG editors like Front Page.
The first thing is maybe to accept that in a large organization people will often want to do things their own
way.
I am not against standard processes and procedures, but I think the goal should be to work with the diverse
user needs and technologies that are expressed and find a way to make them work together.
Enter the weblog
Let's step back a bit and talk about weblogs.
They're the new up and comers in web
publishing on the intranet.
Before we go into details about how to make the data from these web-published
resources usable, let's step back a bit and talk about weblogs because they're the
most recent arrivals to web publishing on the intranet.
A lot of applications for web publishing have emerged over the last few years.
Weblogging applications in particular are growing in popularity and there are many
inexpensive weblogging tools to choose from today.
A quick look at what weblogs are
A web site (usu. of personal/non-commercial origin)
that is frequently updated with information and links
to resources within a particular subject area.
The published information is presented much like a
journal on the web in reverse chronological order.
In 1999 Peter Merholz coins the term "Blog".
Rebecca Blood. “weblogs: a history and perspective”.
Rebecca’s pocket. Essay discussing the emergence of weblogs.
http://www.rebeccablood.net/essays/weblog_history.html
A weblog is a web site that is frequently updated with information and links to resources within a particular
subject area.
The published information is presented much like a journal on the web in reverse chronological order.
Peter Merholz announced in early 1999 that he was going to pronounce it 'wee-blog' and inevitably
this was shortened to 'blog' with the weblog editor referred to as a 'blogger.'
You can read more about the history of weblogs by reading Rebecca Blood’s essay, “weblogs: a history and
perspective.” She’s also written a book on what weblogs are titled “The weblog handbook”. She’s an example
of one of the bloggers out there who maintain a personal weblog, hers in particular devoted to writing
about the literature and movies she devours.
What do people blog?
Personal opinion
Industry & topic specific information &
opinion
Very often meta discussion revolving
around specific web page content
(URL) -- discussion about something
someone else has written about
So now that we know what weblogs are, what is it that people blog and why should you care?
Well blogging started out as a form of personal journal writing that was just transferred from paper to the web.
That still makes up the bulk of what bloggers blog. An example of this is Rebecca Blood’s blog, Rebecca’s
Pocket.
But a growing area of interest is in publishing opinion and commentary on an industry or subject area. An
example of this is John Rhodes’ WebWord or the IA community blog iaslash. These sites also become
community discussion areas when the weblog allows people to leave their comments.
And a lot of the time, these are sites that just maintain a list of current articles and web sites within a subject
area. An example of this type of blog is Lawrence Lee’s Tomalak’s Realm.
Blogging also means sharing
Weblogs allow you to publish a news feed
A news feed can be a data file listing recent
entries from a weblog
Or a data file listing recent news headlines
from a commercial source.
Blog feed formats are in XML format
(specifically RSS or RDF)
Look for these buttons:
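To make the idea of a news feed concrete, here is a minimal Python sketch of reading one. The sample feed, its URLs, and the function name parse_rss are made up for illustration. It assumes an RSS 2.0-style layout (plain channel and item elements); RDF-based feeds use XML namespaces and would need different handling.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0-style feed; real feeds usually carry more fields.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example k-log</title>
    <link>http://intranet.example.com/klog/</link>
    <description>Hypothetical weblog used for illustration</description>
    <item>
      <title>Notes on taxonomy work</title>
      <link>http://intranet.example.com/klog/2</link>
      <description>Follow-up on the subject taxonomy.</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://intranet.example.com/klog/1</link>
      <description>Trying out weblog software.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return the feed title and a list of entries (title, link, description)."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    entries = []
    for item in channel.findall("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return channel.findtext("title", default=""), entries


feed_title, entries = parse_rss(SAMPLE_FEED)
for entry in entries:
    print(feed_title, "|", entry["title"], "|", entry["link"])
```

Entries come back in the order the feed lists them, which for most weblogs is newest first.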
Reading blogs in a news aggregator
Aside from being tools to publish and share,
weblogs often offer a mechanism for reading
other weblog data in XML feeds
News readers / aggregators
An application that retrieves and displays
news feeds from multiple sources.
Client application -- runs on PC for
individual use.
Server application – runs on a web server
for group use.
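Once feeds have been fetched and parsed, the core of an aggregator is merging entries from several sources into one reading list. The Python sketch below illustrates that step only. The function name aggregate, the entry keys ("title", "link", "published"), and the sample data are assumptions for illustration, not the format of any particular news reader.

```python
from datetime import datetime


def aggregate(feeds, limit=10):
    """Merge entries from several feeds into one list, newest first.

    `feeds` maps a source name to a list of entry dicts; each entry is
    assumed to carry "title", "link" and a "published" datetime.
    """
    merged = []
    seen_links = set()
    for source, entries in feeds.items():
        for entry in entries:
            if entry["link"] in seen_links:  # skip items already collected
                continue
            seen_links.add(entry["link"])
            merged.append(dict(entry, source=source))
    merged.sort(key=lambda e: e["published"], reverse=True)
    return merged[:limit]


# Made-up sample data standing in for parsed feeds.
feeds = {
    "Blog A": [
        {"title": "Older post", "link": "http://a.example.com/1",
         "published": datetime(2003, 5, 1, 9, 0)},
        {"title": "Newest post", "link": "http://a.example.com/2",
         "published": datetime(2003, 5, 3, 14, 30)},
    ],
    "Blog B": [
        {"title": "Middle post", "link": "http://b.example.com/1",
         "published": datetime(2003, 5, 2, 8, 15)},
    ],
}

reading_list = aggregate(feeds)
for entry in reading_list:
    print(entry["published"].date(), entry["source"], entry["title"])
```

A client aggregator would run this for one person's subscriptions. A server aggregator would run the same merge over a shared list of feeds for a group.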
Weblogging is easy. Most of the tools available are simply HTML form-based interfaces for creating database
records.
You enter the title and body of your text. Optionally you can enter a category from a list of categories you’ve
entered in your tool. The author and date are usually auto-entered.
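As a rough illustration of what happens behind such a form, the Python sketch below stores one entry as a database record. The table layout, the category list, and the function name post_entry are assumptions made for this example and do not describe any specific weblogging tool.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database standing in for the weblog tool's storage.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE entries (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    category TEXT,
    author TEXT NOT NULL,
    created TEXT NOT NULL)""")

# Hypothetical list of categories the blogger has set up in the tool.
CATEGORIES = {"taxonomy", "search", "weblogs"}


def post_entry(conn, title, body, author, category=None):
    """Insert one blog entry; the date is filled in automatically."""
    if category is not None and category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    created = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "INSERT INTO entries (title, body, category, author, created) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, body, category, author, created),
    )
    conn.commit()
    return cursor.lastrowid


entry_id = post_entry(conn, "Trying out weblog software",
                      "First impressions of blogging on the intranet.",
                      author="jdoe", category="weblogs")
row = conn.execute(
    "SELECT title, category, author, created FROM entries WHERE id = ?",
    (entry_id,),
).fetchone()
print(row)
```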
Blogging variations
Variations of the process -- URL based
blogging
Since a lot of the time blog entries contain
meta-discussion, the starting point is a
pointer to an article someone else has
written
Blog from an aggregator
Blog from a bookmarklet
Let’s see how it works...
There are some variations as well. The starting point isn’t always a blank blog entry
screen.
Since blog entries are often opinions about other people’s writing, the starting point in a
blog session might be a URL on a remote site. In this case you can, for instance, use a
news reader to read someone else’s blog entry and then click a link to auto-fill that
entry into your own blog so you can annotate and comment on what the other person is
talking about. Sounds complicated, but I’ll show you in a minute how this works.
So let’s take a look at a few tools to show you what a blogging session is like.
Movable Type
1. Enter title
2. Enter body
of blog entry
3. Select category
4. Publish
This is the typical web-based blog entry screen. In this example we’re seeing Movable Type, a weblogging
tool written in Perl.
Click here to read comments
One of the nice features of most blogging applications is that they allow readers to leave their comments.
This example shows a link that indicates the number of comments attached to a blog entry. If you click that
link...
The reader comments usually appear on the screen or in a separate browser window.
Another neat feature like this, using XML RPC[1], is called “TrackBack”. This feature allows people to reference
the URL for this blog entry on their own weblog, and then their trackback ping appears on my weblog.
Knowledge creation
(publishing)
You might be thinking, “Sounds good, but why would I want to blog in the intranet?”
The type of weblogs we’re seeing in the intranet are a special type called knowledge logs or k-logs.
A few articles in the past year have discussed the advantage of using weblogging software to handle some
aspects of knowledge management.
The first advantage is that it offers a low cost alternative to doing knowledge management. I’ve read on
discussion groups that a lot of people find weblog software appealing after seeing larger KM efforts fail.
Additionally, the people who are adopting this form of web publishing are savvy web users who read blogs
outside of work and know that relying on other people whose expertise you trust is a great way of growing
your own knowledge.
Additionally, bloggers, I think, find great advantage in being connected to others who share research interests.
Social networks or societies of bloggers who read each other’s opinion often form and in this way, ideas are
challenged and tested.
Fast, cheap and in total control
Fast (and easy): Set up is quick and doesn't
require much expertise.
Cheap: Powerful personal publishing solutions
at low cost.
In total control: The real power in weblogging
is that it puts knowledge creation in your
control and also allows you a standards-based
mechanism for pushing/sharing that
information.
The bottom line, I think, is that people on the intranet are using blogs for web publishing because
1) They’re quick and easy to set up. Most setups will cost you 15 to 30 minutes if your company has web space
available for you.
2) They offer a great amount of functionality at very low cost. Some weblogging software is free.
3) And probably most importantly, weblogging puts knowledge creation and sharing in your hands. You don’t
need to rely on the processes and technologies of anyone else to do this and the sharing mechanism uses
standards-based XML, which means that your data can be re-used elsewhere.
How we are supporting k-loggers
XML feeds of databases
News data
ABI/Inform
Technical documents
Almost any data set can be mapped to the
standard RSS format
Email discussion groups, CRM, directory of
new personnel
So with the appearance of new weblogs in the intranet, my organization has begun to discuss how to support
k-loggers.
The first blog user came to us asking for news feeds on specific topics. What we have done is to provide
database search results in RSS format so she can do any complex or simple search for a topic she has in
mind and then have a URL that will serve as a news feed that she can feed into a news reader or aggregator
of her choosing.
What our users do with the RSS
This example shows part of a database search result page. This particular database is our Selected News
database which pulls indexed content from published news sources via Factiva.
In the search above, I entered the terms “Classification, indexing and abstracting” as my query and the search
results show a lot of records.
Embedded within the search results is an option to view the results as XML (which is a dump of the search
results contents with all fields) or in RSS (a dump of the results in a brief record format showing title, URL, and
abstract for each record).
Bloggers copy the URL for the RSS feed and can then use it in their own aggregators.
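The brief RSS view of a search result can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind our databases: the record keys ("title", "url", "abstract"), the sample records, the feed URL, and the function name results_to_rss are all hypothetical, and the output follows the basic RSS 2.0 layout.

```python
import xml.etree.ElementTree as ET


def results_to_rss(query, feed_url, records):
    """Render search result records as a brief RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results: " + query
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = "Records matching: " + query
    for record in records:
        # Only the brief fields go into the feed: title, URL, abstract.
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["url"]
        ET.SubElement(item, "description").text = record["abstract"]
    return ET.tostring(rss, encoding="unicode")


# Made-up records standing in for a database search result.
records = [
    {"title": "Automatic indexing survey",
     "url": "http://intranet.example.com/news/101",
     "abstract": "Overview of machine-aided indexing."},
    {"title": "Taxonomy maintenance notes",
     "url": "http://intranet.example.com/news/102",
     "abstract": "Keeping subject terms current."},
]
feed_url = "http://intranet.example.com/search?q=indexing&format=rss"
rss_text = results_to_rss("indexing", feed_url, records)
print(rss_text)
```

Because the query lives in the feed URL, the same URL returns fresh results each time an aggregator polls it.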
Adding the URL for the RSS feed
to your news aggregator.
Here’s an example of how a user might follow a news feed. This example is using Radio
Userland, a weblog publishing tool with an integrated feedreader.
In the news aggregator, I have Hack the Planet as a website I’m following
In news aggregator view, user sees a story they want to blog
Selecting the POST button copies that story's URL and title to a new blog entry in the editing form.
So where do we go from here?
So now we’ve prepared our company to use the data we pull in daily from various news
vendors and internal databases. Where do we go from here?
I think already we’ve done more to support k-loggers than is expected, but we’re also
hoping to support their efforts if weblogs start to proliferate in the company.
Consider your place in the ecology
The natural progression in an information
ecology where k-loggers start to proliferate is
to seek a system that pulls together the
disparate k-log data.
The role of the information services
organization is to glue together the aggregate
of produced k-log data for its users to
consume.
The XML feeds that k-loggers produce are almost always in one of a few standard RSS
or RDF formats. This is the common element that allows your organization to glue
together the aggregate of produced k-log data for its users to consume.
Obviously you need to first begin collecting and aggregating that data. So you will
need a technical strategy for doing that. But once you’ve got the data, the real work is
in making it findable.
The first step is to use metadata to create bibliographic records for each entry. You can
rely on a standard such as the Dublin Core metadata elements to help structure your
own metadata schema.
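As a sketch of what such a bibliographic record might look like, the Python below maps one collected blog entry onto a handful of Dublin Core element names (title, creator, date, identifier, description, source, type, format, subject). The input keys, the sample data, and the function name to_dublin_core are assumptions for illustration, and a real schema would likely add local fields.

```python
def to_dublin_core(entry, feed):
    """Build a simple Dublin Core-style record for one blog entry."""
    record = {
        "title": entry["title"],
        "creator": entry.get("author", feed.get("author", "")),
        "date": entry["published"],        # e.g. an ISO 8601 date string
        "identifier": entry["link"],       # the entry's permanent URL
        "description": entry.get("description", ""),
        "source": feed["title"],           # the weblog the entry came from
        "type": "Text",
        "format": "text/html",
        "subject": list(entry.get("categories", [])),
    }
    # Leave out elements with no value rather than storing blanks.
    return {name: value for name, value in record.items() if value}


# Made-up feed and entry for illustration.
feed = {"title": "Example k-log", "author": "jdoe"}
entry = {
    "title": "Notes on taxonomy work",
    "link": "http://intranet.example.com/klog/2",
    "published": "2003-05-03",
    "categories": ["taxonomy"],
}

record = to_dublin_core(entry, feed)
for name, value in record.items():
    print(name + ":", value)
```

The subject element is the natural place to attach classification terms once the entries are organized by topic.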
The next step is to consider some form of classification or organizing the blog entries by
topic or subject.
Making blog data findable
Make the aggregate of collected blog entries
available by publishing it
Make searching and browsing of indexed blog
entries possible
Our organization already does a lot of the text
parsing, classification and republishing that is
needed to make a Blog aggregator fly
Offer varying means of use and notification when
new relevant data comes in. Email alerts, etc.
First you make the aggregate of blog entries available in a raw feed for re-use and also
offer a reverse-chronologically sorted spool of recent blog entries.
Next make it possible for people to search and browse the indexed blog entries in the
collection. Our organization does a lot of text parsing, classification and republishing,
and I’ll explain that process in the next slide.
Finally offer other means of use and notification such as email alerts.
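Browsing and alerting can both be built on the subject terms attached to each entry. The Python sketch below shows one possible shape for this, assuming each indexed entry carries a "subjects" list. The function names, the subscription format, and the sample data are hypothetical, and sending the actual email is left out.

```python
from collections import defaultdict


def build_subject_index(entries):
    """Group entry titles under each subject term, for browsing."""
    index = defaultdict(list)
    for entry in entries:
        for subject in entry["subjects"]:
            index[subject].append(entry["title"])
    return dict(index)


def alerts_for(new_entries, subscriptions):
    """Map each subscriber to the new entries matching subjects they follow."""
    alerts = {}
    for user, wanted in subscriptions.items():
        matches = [e["title"] for e in new_entries
                   if wanted & set(e["subjects"])]
        if matches:
            alerts[user] = matches
    return alerts


# Made-up indexed entries and subscriptions.
entries = [
    {"title": "Notes on taxonomy work", "subjects": ["Knowledge management"]},
    {"title": "New amplifier results",
     "subjects": ["Optical networking", "Research"]},
]
subscriptions = {
    "alice": {"Optical networking"},
    "bob": {"Web publishing"},
}

index = build_subject_index(entries)
alerts = alerts_for(entries, subscriptions)
print(index)
print(alerts)
```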
Our process
Start: Raw feed from various sources
(vendor data, internal databases, weblogs)
I’ve already had someone on a discussion group tell me to just throw Google at the data to make content
findable.
I don't know that search engines are always the answer to all problems. Yes, search is necessary, but are
search engines the front end you want to use for all types of databases?
We do, in fact, have search engines in-house that do cluster analysis and offer categorized and relevancy
ranked web site search results. But we aggregate a lot of data -- most not from websites -- and our process
for indexing this data uses a combination of machine and human indexing. Computer algorithms have not
proven to be capable of discerning some concepts as well as humans.
This screen shows the web-based aggregator built into the Drupal application, which we use on the IA
community blog, iaslash.
The “Latest news” shows the most recent blog entries collected from various blogs that we watch. From this
page, we can select the news that’s relevant to our community and enter it into our database.
Web-based aggregator (brute-force) example
This is a view of the home page, which shows how our database is displayed with classification shown below
each entry.
Other ways you can do it
Just use search software with automated
classification
Consider our hybrid approach
And you can always rely on software to automatically do the classification of data for you. Many search
vendors offer some sort of module that allows for this kind of classification, but often some human intervention
is needed to help guide or tweak the classification module.
Finally you can consider our hybrid, semi-automated approach. Our take is that there is a bigger return of
quality indexing when you insert humans into the process.
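One way to picture a semi-automated process is a classifier that assigns subjects only when it is fairly sure and routes everything else to a person. The Python sketch below is a toy illustration of that idea and not a description of our actual indexing process: the taxonomy, its keywords, the min_hits threshold, and the function names are all invented for the example, and simple keyword counting stands in for whatever a real classification module would do.

```python
import re

# Hypothetical taxonomy: subject term -> trigger keywords.
TAXONOMY = {
    "Optical networking": {"optical", "fiber", "wavelength"},
    "Knowledge management": {"knowledge", "taxonomy", "indexing"},
    "Web publishing": {"weblog", "blog", "rss"},
}


def suggest_subjects(text, min_hits=2):
    """Split matching subjects into confident and uncertain suggestions."""
    words = re.findall(r"[a-z]+", text.lower())
    confident, uncertain = [], []
    for subject, keywords in sorted(TAXONOMY.items()):
        hits = sum(1 for word in words if word in keywords)
        if hits >= min_hits:
            confident.append(subject)
        elif hits > 0:
            uncertain.append(subject)
    return confident, uncertain


def route(text):
    """Auto-index clear cases; send everything else to a human indexer."""
    confident, uncertain = suggest_subjects(text)
    if confident and not uncertain:
        return {"status": "auto-indexed", "subjects": confident}
    return {"status": "needs human review",
            "subjects": confident + uncertain}


clear_case = route("Our weblog publishes an RSS feed for every blog")
unclear_case = route("Indexing optical research")
print(clear_case)
print(unclear_case)
```

The threshold decides how much work lands on people: raising it sends more entries to review, lowering it trusts the machine more.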
Success
It’s important to note that the success of KM
depends on the willingness of individuals to
participate by using tools that will integrate
seamlessly with the organization’s knowledge
network.
Deloitte suggests that while localized KM
efforts may not require knowledge networks in
small organizations, the advantage of
knowledge networks becomes manifest when
communities express the need to re-use that
localized knowledge.
Sustainability
Another important factor is sustainability
If you plan on doing automated classification
of data, human resources will be needed at
some point to set up taxonomies
If you plan on a hybrid machine and human
aided indexing process, full-time staff might be
needed
Closing thoughts
Weblogs are really not different as a
technology, although they put control of
publishing closer to users
Classifying weblog data can be difficult and
requires human resources, but some search
applications can help
Value diversity and above all, support users’
needs
Allow users to produce organizational
knowledge using whatever tools they choose
While comments, trackbacks and XML feeds are useful, as a technology, weblogs are really not very different from other applications made for
web publishing.
What makes them different is how they’re used from the end-user perspective. It puts control of web publishing in the hands of end users, who
can decide what process they want to use for sharing knowledge and what technology to use.
Some people view this amount of control and power to publish as a danger if bloggers record too much information. However, I think one of the
ways to produce organizational knowledge is to record all sorts of tacit knowledge including ephemeral communications, as well as meeting
afterthoughts and opinions.
Making the output from weblogs usable can be as simple as allowing your search engine to spider and index their content, but if your search
application doesn’t allow for classification, the ability to browse content by attributes such as subject, business unit, and product, might not be
possible. Classification allows the system to represent the knowledge contained in data more consistently.
The caveat to doing classification is that neither an automated nor a manual process can give you the best results on its own. So the investment in doing
classification might require additional time and resources if you don’t already have indexing staff available. Some search vendors offer
classification, however, so this may be a good route to pursue if human resources aren’t available.
Finally, I think it’s important to keep in mind the needs of the users who want or need to blog and to encourage it if it results in
knowledge sharing. While there are bound to be a lot of people who want to protect their intellectual capital, there are probably equal numbers of
people who see value in sharing their knowledge, and if using a cheap and easy weblogging tool is how they find their way to doing that, that
can’t be a bad thing.
Further reading on info. ecology and KM
Bonnie Nardi and Vicki O’Day. “Information Ecologies: Using Technology with Heart.”
First Monday. http://www.firstmonday.dk/issues/issue4_5/nardi_contents.html
AmphetaDesk. http://www.disobey.com/amphetadesk/
blagg. http://www.oreillynet.com/~rael/lang/perl/blagg/
Drupal. http://drupal.org