
Emerging quality

Creating dynamic user and content profiles in online knowledge networks

Thieme Hennis
02-Sep-08
Type, theme, and length of project
Classification: 1A  This project can be classified as research treating ethical and societal aspects of concrete
technological developments.

A virtual identity has both psychological and economic significance. In my research, the motivation to share
knowledge and help others is an important aspect of the way an individual's virtual identity is built. A virtual
identity also presumes and supports other economic structures, such as a more flexible employer-employee relationship.

Theme: Virtual reality


The Internet has become an essential element of many economies and societies. Similarly, we are intrinsically
attached to and have become part of the Web (Kelly, 2005). In the last few years, we have seen an enormous
increase in people being active online: connecting, creating, sharing, and building up identities. Smart data-mining
systems are able to create dynamic profiles of people and content representing expertise, relevance, and quality.
Internet pioneer Wendy Hall describes it as follows:

Every time you do something on the internet, it is effectively logged, building up this profile that is with you
for your life…. We will be able to build software that can interpret that profile to help get the answer that
you need in the context that you’re in (Smith, 2006).

Length of research: 4 years


The author is applying for a single Ph.D. research project (length: 4 years).

Research team
Quality is a socially defined concept. I will try to make it quantifiable by measuring certain use & user relationships
within decentralized networks. At present, the research team consists of the following people:

Professor Wim Veen – TU Delft – Wim has been involved in research into learning and innovation in
education for many years. He developed a very relevant and useful model concerning networked learning.
Alpha (sociological-educational theories)
Dr. Jaco Appelman – TU Delft – Jaco has been involved in collaboration software research for many years.
He will assist with methodological and content issues. Beta (collaborative software & systems engineering)
Job Timmermans, M.Sc. – PEERS – Job is co-founder of PEERS and holds master's degrees in philosophy
and systems engineering. His role in the project is to discuss the practical application of the developed
system. Alpha (philosophy of quality) & Beta (collaborative software & application interface)

Research description
In decentralized (virtual) networks with tools and technologies that allow anyone to contribute anything, it is
increasingly problematic to determine the reliability of content and people online. The research I propose should
yield rules and variables that can be used (for example, by software engineers) to let quality and expertise
emerge over time and become visible.

What represents quality?


In the last decade, the internet has evolved into a platform allowing any person to participate and contribute.
Various easy-to-use web technologies empower people to share interests and knowledge, search and structure
content, and connect with friends and peers.
Web 2.0 represents a blurring of the boundaries between Web users and producers, consumption and
participation, authority and amateurism, play and work, data and the network, reality and virtuality
(Zimmer, 2008).

With the increase of online participation, a number of issues have emerged regarding quality, authority, expertise,
and trust (Keen, 2007). With organizations becoming more open and seeking ways to make use of the contributions
of people around the world, these issues become even more prevalent (Abbott, 2000). Just as there are many new
tools for publishing and creating content, there are also tools made specifically to search, filter, rate,
evaluate, and recommend content to people in certain contexts. Still, filtering through the growing number of
resources hidden online or in internal networks remains difficult (Benkler, 2006; Howe, 2006).

Finding the right resources and people


Search algorithms of popular search engines focus on popularity (or authority) rather than on what is commonly
regarded as quality. No human reviews are involved in this process, so rankings are created in a sub-optimal manner
(Lewandowski & Höchstötter, 2008). Because similar search algorithms and ineffective content management
systems are used within organizations, knowledge workers spend much of their time recreating information that
already exists (and is available in-house):

A lot of money and intellectual power is spent on reinventing the wheel and searching for knowledge. This
is a huge problem for companies and a central challenge for KM research (Swaak, Ifamova, Kempen, &
Graner, 2004).

People define quality. Usually this involves relying on others, such as experts or people you trust. The same should
hold for the way search engines and content management systems determine quality. Specifically, this means
including human reviews and other metadata generated by people (times used, favorited, tagged, or
recommended) in structuring and managing content. Quality then becomes more transparent for the user and is
linked to certain context variables (which may be user input variables in search engines).
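
As an illustration only, the idea of combining people-generated metadata with context variables could be sketched as follows. The field names, the saturation constant, and the weighting scheme are assumptions made for this sketch; they are not part of any existing system or of the model this research will develop.

```python
from dataclasses import dataclass


@dataclass
class UsageMetadata:
    """People-generated signals about one piece of content (hypothetical fields)."""
    times_used: int
    favorited: int
    tagged: int
    recommended: int
    review_scores: list  # human review scores on a 0.0-1.0 scale


def quality_score(meta, weights):
    """Combine usage counts and human reviews into one score between 0 and 1.

    `weights` stands in for the context variables: a searcher who trusts
    reviews more than raw popularity can raise the "reviews" weight.
    """
    def saturate(count):
        # Map an unbounded count into [0, 1) so heavy use cannot dominate.
        # The constant 10 is an arbitrary assumption.
        return count / (count + 10)

    reviews = meta.review_scores
    signals = {
        "used": saturate(meta.times_used),
        "favorited": saturate(meta.favorited),
        "tagged": saturate(meta.tagged),
        "recommended": saturate(meta.recommended),
        "reviews": sum(reviews) / len(reviews) if reviews else 0.0,
    }
    total_weight = sum(weights.get(name, 0.0) for name in signals)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights.get(name, 0.0) * value for name, value in signals.items())
    return weighted_sum / total_weight
```

For example, `quality_score(UsageMetadata(10, 0, 0, 0, [1.0, 0.5]), {"used": 1.0, "reviews": 1.0})` averages a usage signal of 0.5 with a mean review score of 0.75. Changing the weights changes the outcome, which is how the score stays tied to the user's context.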

Standards
A number of initiatives, such as PICS (Resnick & Miller, 1996; Armstrong, 1997) and Resource Profiles (Downes,
2005) propose protocols or frameworks that can be used to evaluate, rate, or structure online content. Many
websites have implemented rating and reputation mechanisms to increase transparency and indicate trust in
content and people. Still, a general standard for online content does not exist.

Wikipedia co-founder Larry Sanger has recently called for a system for syndicating and rating online data, claiming
it to be the obvious next step (and Big Idea) for the Internet. It would enable systems to weight data not just with
Google-style PageRank algorithms, but also by things like

quality according to generally trusted sources; or quality according to your peer group; or quality according
to academic and academic-endorsed sources; etc. (Sanger, 2008)

What Sanger proposes is a system that attaches relevant information about the rater to each rating, in order to add
context and enrich the information about pieces of content with relevant metadata (such as quality according to
peer group, evaluations, usage).
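
A minimal sketch of this idea, assuming a simple hypothetical record layout in which every rating carries the group its rater belongs to. It does not reflect Sanger's proposal in detail or any existing syndication format.

```python
from collections import defaultdict
from statistics import mean

# Each rating carries context about the rater (hypothetical record layout).
ratings = [
    {"content": "doc-1", "score": 4, "rater_group": "academic"},
    {"content": "doc-1", "score": 2, "rater_group": "peer"},
    {"content": "doc-1", "score": 5, "rater_group": "academic"},
    {"content": "doc-2", "score": 3, "rater_group": "peer"},
]


def quality_by_group(ratings, content_id):
    """Return the mean score of one content item, split by rater group."""
    scores = defaultdict(list)
    for rating in ratings:
        if rating["content"] == content_id:
            scores[rating["rater_group"]].append(rating["score"])
    return {group: mean(values) for group, values in scores.items()}
```

Here `quality_by_group(ratings, "doc-1")` gives a mean of 4.5 for the academic raters and 2 for the peer raters, so the same item can have a different quality depending on whose judgment the reader chooses to trust.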

As the Web is populated with more data, it becomes easier to automatically mine user and usage statistics
(people's behavior online, popularity and interest, friends and activities) and turn them into
valuable metadata. For example, APML (Attention Profile Markup Language) and ULML (User Labor Markup
Language) intend to set standards for capturing and sharing information about people online. When this
people metadata is combined with active feedback generated by users (through ratings and evaluations), profiles
of people and content can be created automatically through use. Such profiles can increase motivation to
contribute and share, enhance flexibility for freelance workers and organizations, and improve efficiency in
finding people and content (Choi, Kruk, Grzonkowski, Stankiewicz, Davis, & Breslin, 2006).

Hypothesis, research questions, instruments, and methodology


Both user profile (expertise level & domain) and usage (number of views, clicks, ratings, recommendations, etc.)
are relevant and should be utilized to determine quality and relevance of content. Furthermore, using this
information for profiling the original contributor leads to a system of dynamic and up-to-date expertise profiles
based on the value of contributions. The hypothesis I want to test (and try to falsify) is that in virtual
knowledge networks, findable expert and content profiles can be made by analyzing how content is used, and by
whom, and linking the results to the original contributor. The two fundamental assumptions that make up the
hypothesis are:

1. User and usage information determine quality and domain of creations; and
2. Quality of creations determines the creator’s expertise.
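
The two assumptions can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the function names, the 0.0-1.0 scales, the default weight for unknown raters, and the use of simple means. Finding the actual variables and rules is the purpose of the proposed research.

```python
from collections import defaultdict


def content_quality(ratings, expertise):
    """Assumption 1: quality is the expertise-weighted mean of an item's ratings.

    ratings   -- list of (rater, score) pairs, score on a 0.0-1.0 scale
    expertise -- dict mapping rater to a weight on a 0.0-1.0 scale;
                 unknown raters get a small default weight (an arbitrary choice)
    """
    weighted = [(expertise.get(rater, 0.1), score) for rater, score in ratings]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return 0.0
    return sum(weight * score for weight, score in weighted) / total


def creator_expertise(items, expertise):
    """Assumption 2: expertise per (creator, domain) is the mean quality of their items.

    items -- list of dicts with "creator", "domain", and "ratings" keys
    """
    qualities = defaultdict(list)
    for item in items:
        key = (item["creator"], item["domain"])
        qualities[key].append(content_quality(item["ratings"], expertise))
    return {key: sum(values) / len(values) for key, values in qualities.items()}
```

In a running system the result of `creator_expertise` could be fed back in as the `expertise` weights for the next round of ratings, which is what would make the profiles dynamic. That feedback loop is not implemented here.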

Much has been written about these two assumptions, and much work has been done on them. The recent increase in
people being active online and sharing content allows for complex data retrieval and profiling algorithms that
determine quality dynamically. How this translates into research is described in the next section.

Research questions and instruments


As mentioned, the above assumptions are addressed by various reputation systems and rating, quality, and
profiling mechanisms. I will first investigate the most important relationships, rules, and (upcoming) standards in
the generation of metadata about quality, authority, and expertise by these (stand-alone) systems. Concurrently I
will look into processes of knowledge workers using different types of publishing and rating tools, and find out the
most important variables of quality in knowledge networks. These variables are then ordered along different levels:
personal (expertise, competencies, passion, etc.), relational (quality of interactions), and informational
(usefulness of content, reliability of the source). This first step of literature research and case studies is
inductive, and results in a model that will be validated through a large-scale survey. Regression analysis will be
used to filter out personal biases and to create an empirical foundation for the interpretatively developed model.

Steps, with research question(s) or description, instrument / method, and outcome:

1. CONTENT PROFILES: What are variables and (metadata) standards and initiatives for defining quality of content?
      User-driven: active rating and evaluation
      Machine-driven: measuring usage
   Instrument / method: Literature, desk research
   Outcome (shared with step 2): Authoritative paper(s) about “Metadata standards and quality in decentralized
   online networks” AND/OR “Metadata standards, profiling mechanisms and authority/expertise protocols and rules
   in decentralized online networks”.

2. USER PROFILES: What are variables and (metadata) standards and initiatives for defining expertise of persons?
      User-driven: recommendations etc.
      Machine-driven: determination of authority (based on several factors)
   Instrument / method: Idem
   Outcome: See step 1.

3. A first case study will provide insight into criteria, possibilities and constraints of using different tools.
   How can the standards and variables be measured, using existing tools?
   Instrument / method: Case study: interview, survey, experiment
   Outcome: Criteria, possibilities, and constraints & toolbox.

4. Using the outcomes of the three steps, I will describe the most important variables and requirements for
   determining quality and expertise in online networks.
      How can user-driven and machine-driven metadata about quality of content translate to dynamic expertise
      profiles of content creators (or: How should content-profiles influence user-profiles?)
      How should the expertise-profiles influence content-profiles?
   Additionally, I will clarify the requirements for the case studies and the research that follows, to test the
   hypothesis. These requirements include instruments/technologies used, user-participation, size of network,
   and more.

5. VALIDATION MODEL: Does the interrelation of content-metadata and user-metadata in determining quality and
   expertise improve finding of people and resources in organizations? What are critical success factors?
   Instrument / method: Case study: interview, survey
   Outcome: Framework for measuring search quality within organizations & Critical success factors for the model

6. Describing the outcomes of the research.
   Outcome: Report and functional design for the proposed system.

Timeline
The steps in the above table are ordered chronologically. The timeline below describes the structure in more detail:

1. Year 1: Steps 1, 2, and 3 – Literature research, creating the research framework, quality model, and theory,
conducting an exploratory case study, preparing further case studies, and writing papers.
2. Year 2: Steps 4 & 5 – Developing and deploying the model in research communities, and evaluating the
model. More specifically:
      Describing how different tools are used to create and share information, and how these tools
      define quality/expertise.
      Evaluating and refining the model and theory. This means describing (i) how usage (popularity,
      rating, reviewing, etc.) and users (experts versus laymen) together determine quality of content,
      and (ii) how this translates to the expertise or authority of the content creator.
3. Year 3: Step 5 – Similar to the second year, but with more focus on converging research results in order to
create an improved and more abstract model for quality and expertise in online knowledge networks. The
two main requirements are that the model functions as desired and that it can be used as a basis for
creating metadata-generating software.
4. Year 4: Step 6 – Describing and finalizing my research and making it useful for practical solutions.

Methodology; Grounded Theory


Because I will develop a new theory about quality based on existing literature and research, the chosen
methodology is grounded theory. Grounded theory is a research method in which the theory is developed from the
data, rather than the other way around. It is an inductive approach, meaning that it moves from the specific to
the more general. Theories about virtual identities, quality and rating systems, and the arrangements forming
around the increased empowerment of people are still taking shape, so an inductive approach is the best fit for
building a better model.

Societal impact and valorization


My objective is to create a system that measures people’s activities and contributions online and automatically
translates these into a virtual identity (or karma) that can be found by the right persons in the right context.
Such a system allows people to be found and employed more directly and flexibly (Malone & Laubacher, 1998).
The virtual identity of a contributor changes depending on how the community values and uses his or her efforts.
I expect this to lead to two things:

People contribute valuable content to the community (otherwise it will not add value to their ID);
People are motivated to contribute intrinsically (fun, community feeling) rather than by financial
reward. Still, the virtual ID forms a bridge to future job opportunities or assignments based on
(motivation-based) contributions.

Such a system will change organizational structures and create a more flexible and free economy, as speculated by
Pekka Himanen:

Could there be a free market economy in which competition would not be based on controlling information
but on other factors – an economy in which competition would be on a different level (and, of course, not
just in software, but in other fields, too)?

– Pekka Himanen; the Hacker Ethic and the Spirit of the Information Age (2001) –

Competition, then, would be based on the contrary: the sharing of information and resources between people
in flexible networks and communities. I know that this is another testable assumption, but it can be addressed
in further research. Before we can do that, though, we must build the foundation of this system.

Case study; Sustainable network


My analysis of quality and expertise in virtual environments (like online communities) will be the basis of the
PEERS Interaction Management System (see footnote 1): software analyzing the interaction of users with each other
and with online content. All described relationships, rules, and standards will be built into it, so they can be
tested and applied immediately. Currently, we are deploying our software at different organizations in different
settings. The following will serve as an exploratory case study in the research:

Sustainability network (100-250 professionals) consisting of DKA (De Kleine Aarde), Enviu, OSIRIS, and the
TU Delft Sustainability department (SEPAM faculty). These organizations, concerned with sustainability
and alternative technologies, have clearly expressed their interest and commitment to contribute to and be
part of the proposed research. I will deploy different software tools within these organizations, and use the
PEERS Interaction Management System (IMS) to create dynamic, exchangeable profiles of people and content.
These profiles allow users to make use of content and connect with people outside their own organization. Tools
and technologies already used by the organizations will be part of the research, if they allow measurement
of use and users by PEERS IMS.

1 http://aboutpeers.com

Works Cited
Abbott, V. (2000). Web page quality: can we measure it and what do we find? A report of exploratory findings. J
Public Health , 22 (2), 191-197.

Armstrong, C. (1997, May 19). Metadata, PICS and Quality. Retrieved August 10, 2008, from Ariadne magazine:
http://www.ariadne.ac.uk/issue9/pics/

Benkler, Y. (2006). Wealth of Networks; How Social Production Transforms Markets and Freedom. New Haven, CT:
Yale University Press.

Choi, H. C., Kruk, S. R., Grzonkowski, S., Stankiewicz, K., Davis, B., & Breslin, J. G. (2006). Trust Models for
Community-Aware Identity Management. Identity, Reference, and the Web Workshop at the WWW Conference,
2006.

Downes, S. (2005). Resource Profiles. Journal of Interactive Media in Education , 5.

Himanen, P. (2001). The Hacker Ethic and the Spirit of the Information Age. New York: Random House.

Howe, J. (2006, June). The Rise of Crowdsourcing. Retrieved August 10, 2008, from Wired Magazine (14):
http://www.wired.com/wired/archive/14.06/crowds.html

Keen, A. (2007). The Cult of the Amateur. New York: Doubleday Business.

Kelly, K. (2005, August). We Are the Web. Retrieved August 08, 2008, from Wired Magazine (13):
http://www.wired.com/wired/archive/13.08/tech.html

Lewandowski, D., & Höchstötter, N. (2008). Web Searching: A Quality Measurement Perspective. In A. Spink & M.
Zimmer (Eds.), Web Searching: Interdisciplinary Perspectives (pp. 309-343). Dordrecht: Springer.

Malone, T., & Laubacher, R. (1998, September-October). The dawn of the E-lance economy. Harvard Business
Review , 144-152.

Resnick, P., & Miller, J. (1996). PICS: Internet Access Controls Without Censorship. Communications of the ACM , 39,
87-93.

Sanger, L. (2008, July 8). Syndicated Web ratings - an idea whose time has come? Retrieved August 8, 2008, from
Citizendium Blog: http://blog.citizendium.org/2008/07/09/syndicated-web-ratings-an-idea-whose-time-has-come/

Smith, D. (2006, May 21). All set for a baby.com revolution. Retrieved August 10, 2008, from Guardian - The
Observer: http://www.guardian.co.uk/technology/2006/may/21/news.theobserver

Swaak, J., Ifamova, L., Kempen, M., & Graner, M. (2004). Finding in-house knowledge: patterns and implications.
I-KNOW04. Graz, Austria: Telematica Institute. Available at https://doc.telin.nl/dscgi/ds.py/Get/File-40767.

Zimmer, M. (2008). Preface: Critical Perspectives on Web 2.0. First Monday (online) , 13 (3).
Preliminary budget
At this stage, I request the full amount needed to complete this research: €300,000 for a full-time (4-year) PhD
position, including research team, logistics and travel support, accommodation, and all other expenses.

Valorization workshop
The valorization workshop consists of 2 parts.

In November, a modest online conference will be held using free conferencing and
collaboration technologies. I will put forward 4 important questions, which will be addressed by the
invited speakers (15 minutes per speaker). A discussion with the participants follows, with my research and
research question as the main topic.
An offline meeting will be held in December with all stakeholders, including individuals from PEERS, the
research committee, and potential cases. Depending on whether an institution can host the meeting,
a maximum of €1000 is needed to hire office space and arrange beverages.

Summary for laymen (translated from Dutch)


Over the last 10 years, the internet has developed technologies that increasingly enable people to create, add,
and rate content. Almost anyone with a computer and an internet connection can share his or her passions,
interests, and knowledge, and that is exactly what happens. Various mechanisms exist to filter and categorize
this abundance of content, but it remains very difficult to separate the wheat from the chaff online or in
virtual networks. This holds both for content (what is reliable or of high quality?) and for people (is this
person really an expert in this field?).

Besides producing more content, the increase in people's online activities also creates better opportunities to
structure and value this content. This can be done in different ways:

First, the use of content can be measured: this covers both passive reading and actively
structuring, rating, and evaluating content;
Second, it can be measured who uses the content.

The hypothesis is that by measuring and analyzing both the use and the user, very specific and up-to-date
profiles of content and people can be made. These profiles are dynamic and depend on the activities surrounding
the person or the content. The more active someone is, the richer (though not necessarily better) their profile
becomes, and the more a piece of content is used, the better it can be profiled. Such a system supports
decentralized and flexible working by knowledge workers in virtual or open organizations.