
Workflow Overview and DAFv2 Case Study

Agenda

Workflow, Digitization and Digitization Workflow Definitions
Simple Workflow Systems
Common Features Required in a Digitization Workflow
DAFv2 Overview
Data Model
System Architecture
System Modules
Achieving Flexibility Using DWMS
Adaptation of BA workflow to AMEEL

Workflow, Digitization and Digitization Workflow Definitions

What is Digitization?

The conversion of data from analog to digital (binary) form. The data could be an object, an image, a document or a signal (usually an analog signal).

What is a Workflow?

It is a process and/or procedure in which tasks are completed. (Wiktionary)
A workflow is a reliably repeatable pattern of activity enabled by a systematic organization of resources, defined roles and mass, energy and information flows, into a work process that can be documented and learned. (Wikipedia)
Examples?

What is a Digitization Workflow?

It is not yet found on the internet as a defined concept.
A process and/or procedure in which tasks are completed to convert data to digital form for use on a computer.
How to digitize a book?

Digitization Workflow

[Figure: digitization workflow for books. Stages shown:]
Books Arrival
Check-In Module: adding metadata to the DL database (supporting ILS1, ILS2)
Scanning
Processing
QA-Processing
OCR
Encoding: image on text, generating PDF or DjVu
QA-PDF
Check-Out & Archiving Module: Offline Storage (DVD), Online Storage (Petaboxes)
Other label in the figure: AMEEL & INDIAN digital books

Simple Workflow Systems

Simple Tracking Workflow Systems

Manual workflow management using several software packages:
MS Excel
MS SharePoint
MS Project
Good for small digitization projects
No installation time (startup cost)
Minimum extra hardware

Drawbacks of Manual Workflow Management


No resource management (e.g. workstations and users)
Lack of project and collection management
Manual file handling between the storage server and clients
No handling of workflow exceptions, dynamic evolution and deviations, except through manual intervention
Manual maintenance of the relation with the LIS systems and digital repositories

Common Features Required in a Digitization Workflow


Automation, Tracking and Management of the Digitization Process

Automation

Allows automatic processes without user interaction, such as backup, batch image conversion and PDF creation
Automates file movements and storage arrangement


Tracking

Each Job's current state, user, workstation and storage location
Each Job's history, including operator, machine, time and date, and action (Start, Finish, Reject, Redirect)
User rates (per book or per page / first time or second time)
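The tracking fields listed above can be pictured as a small data structure. The following is only a minimal sketch in Python: the class names (Job, HistoryEntry, Action), the field names and the sample values are assumptions made for illustration and are not taken from DAF.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Action(Enum):
    # The four actions named on the slide.
    START = "Start"
    FINISH = "Finish"
    REJECT = "Reject"
    REDIRECT = "Redirect"


@dataclass
class HistoryEntry:
    # One line of a Job's history (hypothetical layout).
    operator: str
    machine: str
    timestamp: datetime
    action: Action
    phase: str


@dataclass
class Job:
    # Current state of a Job plus its full history (hypothetical layout).
    job_id: str
    phase: str
    storage_location: str
    user: Optional[str] = None
    workstation: Optional[str] = None
    history: List[HistoryEntry] = field(default_factory=list)

    def record(self, operator: str, machine: str, action: Action,
               when: Optional[datetime] = None) -> None:
        """Append a history entry and update the current user and workstation."""
        self.user = operator
        self.workstation = machine
        self.history.append(
            HistoryEntry(operator, machine, when or datetime.now(), action, self.phase)
        )


job = Job(job_id="book-0001", phase="Scanning", storage_location="/storage/book-0001")
job.record("operator1", "ws-07", Action.START)
job.record("operator1", "ws-07", Action.FINISH)
print(len(job.history), job.history[-1].action.value)
```

Keeping the history as an append-only list is one simple way to answer both "where is the Job now" and "who did what, on which machine, and when".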


Management

Defines Projects, Job Types and Phases and manages them simultaneously
Assigns Users to specific Jobs or Projects at specific Phases (operations)
Observes the overall backlog so that resources can be re-allocated across the different Phases and Projects
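As an illustration of observing the backlog, the sketch below counts pending Jobs per Project and Phase so that the busiest Phase stands out. The record layout (dictionaries with "project", "phase" and "pending" keys) and the sample data are assumptions, not DAF's actual data model.

```python
from collections import Counter


def backlog_by_phase(jobs):
    """Count pending Jobs for each (project, phase) pair."""
    return Counter((j["project"], j["phase"]) for j in jobs if j["pending"])


# Hypothetical sample data.
sample_jobs = [
    {"project": "Arabic Books", "phase": "Scanning", "pending": True},
    {"project": "Arabic Books", "phase": "OCR", "pending": True},
    {"project": "Arabic Books", "phase": "OCR", "pending": True},
    {"project": "Arabic Books", "phase": "QA", "pending": False},
]

backlog = backlog_by_phase(sample_jobs)
busiest, count = backlog.most_common(1)[0]
print(busiest, count)
```

A manager could use such a count to decide where to move operators or workstations.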

Flexibility in Defining Digitization Workflow Phases


Set the flow path sequence
Phases can be added after the system is up and running

Support of Dynamic Evolution and Deviations with History Tracking

Changes the normal flow of phases
Downloads and uploads to fix files
Accepts external, partially digitized jobs to start at the proper phase within the digitization workflow
Changes the type of flow

[Figure: a Job moving through Phase X, Phase Y and Phase Z, with Reject and Redirect paths]

Integration with LIS and Library Digital Repository

Integration with LIS (Library Information System)

Extracts the metadata of digitized material in an automated way at Job insertion (Check-In)
Supports integration with multiple LIS systems at the same time

Integration with Library Digital Repository

Automatically ingests and updates the digitized material in the Repository

Installation, Software and Hardware requirements

Hardware Requirements:
Storage
Scanners
PCs
Database

Software Requirements:
OCR software
Image processing software

Installation:
Easy and guided

DAFv2 Overview

BA Digitization Workflow Management System (DAF) Overview

DAF v2.0:
Can be tailored to any environment
Supports both manual and automated operations
Tracks the history of the job's life cycle
Is easy to install and configure
Is flexible in defining digitization workflow phases
Allows for defining different user access levels
Is a plug-in based system

Overview: System Data Model


[Figure: system data model, showing Manager, Entities (User, JobType, Phase) and Handlers]

System Modules: Job Life Cycle

Job life cycle

[Figure: job life cycle diagram]
Main path: New Job, Check-in, Assign, File transfer (Download, Upload), Start, Finish, Check-out to Repository
Ordinary job finishing: after Finish, the Job is assigned to the next state
Reject: reject the Job for some problems; the administrator accepts the rejection
Redirect: recommend re-doing a phase for the Job; the administrator accepts the recommendation
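The life cycle above can be read as a small state machine. The sketch below is one possible reading, not DAF's implementation: the state names ("waiting", "reject_pending" and so on) and the exact set of allowed transitions are assumptions inferred from the diagram labels, and file transfer (Download, Upload) is left out for brevity.

```python
# (state, action) -> next state; all names are hypothetical.
TRANSITIONS = {
    ("new", "checkin"): "waiting",
    ("waiting", "assign"): "assigned",
    ("assigned", "start"): "in_progress",
    ("in_progress", "finish"): "waiting",            # ordinary finish: ready for the next phase
    ("in_progress", "reject"): "reject_pending",     # needs administrator approval
    ("in_progress", "redirect"): "redirect_pending", # needs administrator approval
    ("reject_pending", "admin_accept"): "waiting",
    ("redirect_pending", "admin_accept"): "waiting",
    ("waiting", "checkout"): "in_repository",
}


def step(state, action):
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} not allowed in state {state!r}") from None


def run(actions, state="new"):
    """Apply a sequence of actions starting from the given state."""
    for action in actions:
        state = step(state, action)
    return state


print(run(["checkin", "assign", "start", "finish", "checkout"]))
print(run(["checkin", "assign", "start", "reject", "admin_accept"]))
```

Writing the transitions as a table keeps the allowed paths in one place, so a new action only needs a new table entry.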

System Architecture


Overview: System Architecture


[Figure: system architecture]
Modules: Check-In Module, Phase Manager, Reporting Module, Administration Module, Archiving Module, Check-Out to Digital Documents Repository
Jobs in the System: Job Type A (Phase A1, Phase A2, ..., Phase AN), Job Type B (Phase B1, Phase B2, ..., Phase BN), Job Type C (Phase C1, Phase C2, ..., Phase CX, ..., Phase CN)
Each phase (e.g. Phase CX) consists of Pre Phase CX, Phase CX and Post Phase CX
Handlers: Authentication and Authorization Handler, XML Phases Definition Handler, File Handler, Database Handler
Back end: LIS Server, File Server, Database Stored Procedures, LIS and DAF databases, Off-line Storage

System Modules


System Modules: Check-In


Check-In

Plug-in based for integration
Creates the Job in the system
Assigns the Job to any Phase

[Figure: DWMS plug-in architecture. Check-in plug-ins: Virtua Plug-in, DigiArab Plug-in, MARC File Plug-in, MODS File Plug-in, ... Check-out plug-ins: DAR Plug-in, Fedora Plug-in, DSpace Plug-in, aDORe Plug-in, ... Repositories: DAR, Fedora, DSpace, aDORe]
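A plug-in based Check-In could be organized roughly as follows. This is a hedged sketch only: the CheckInPlugin interface, the register decorator, the plug-in keys and the returned dictionaries are all hypothetical, and the plug-in bodies are placeholders rather than real MARC or MODS parsing.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class CheckInPlugin(ABC):
    """Hypothetical interface: one plug-in per metadata source."""

    @abstractmethod
    def fetch_metadata(self, identifier: str) -> dict:
        ...


REGISTRY: Dict[str, Type[CheckInPlugin]] = {}


def register(name):
    """Class decorator that adds a plug-in to the registry under a key."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("marc-file")
class MarcFilePlugin(CheckInPlugin):
    def fetch_metadata(self, identifier):
        # Placeholder: a real plug-in would read a MARC record here.
        return {"source": "marc-file", "id": identifier}


@register("mods-file")
class ModsFilePlugin(CheckInPlugin):
    def fetch_metadata(self, identifier):
        # Placeholder: a real plug-in would read a MODS record here.
        return {"source": "mods-file", "id": identifier}


def check_in(source, identifier, phase):
    """Create a Job from the chosen source and assign it to any Phase."""
    plugin = REGISTRY[source]()
    return {"metadata": plugin.fetch_metadata(identifier), "phase": phase}


new_job = check_in("marc-file", "rec-42", "Scanning")
print(new_job)
```

With a registry like this, supporting another LIS means adding one class, while the Check-In code itself stays unchanged.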


System Modules: Phases Manager

Request a new Job
Download and upload the Job's folders and files
Submit the Job back to the system to continue through other Phases
Reject a Job and recommend another Phase, specifying the reasons
Redirect a Job from the default Phase Sequence
Provide file-level information to help solve problems


System Modules: Administration

Roles
Job Types
General Settings
Phases
Users
Workstations
Collections


System Modules: Reporting

Reporting

Workflow Tracking
Pending Items
Late Jobs
Operators' Rates
Build Customized Report


System Modules: Archiving

Archiving

On different media (CDs, DVDs, tapes) of different sizes
On online storage
Confirm successful media reservation

System Modules: Check-Out


Check-Out

Java Reflection Call section of the XML Phases Definition
Ingests the Job's digital objects into the repository

[Figure: DWMS plug-in architecture. Check-out plug-ins (DAR Plug-in, Fedora Plug-in, DSpace Plug-in, aDORe Plug-in, ...) deliver to the DAR, Fedora, DSpace and aDORe repositories]
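The slide says that Check-Out is driven by a Java Reflection Call declared in the XML Phases Definition. The actual XML schema is not shown in these slides, so the sketch below invents one purely for illustration (the phases, phase and reflectionCall elements and their attributes are hypothetical). It is written in Python, where looking up a method by name with getattr plays the role that reflection plays in Java.

```python
import xml.etree.ElementTree as ET

# Hypothetical phases definition; the real DAF schema may differ.
PHASES_XML = """
<phases>
  <phase name="Publishing">
    <reflectionCall class="RepositoryIngester" method="ingest"/>
  </phase>
</phases>
"""


class RepositoryIngester:
    """Stand-in for a check-out plug-in."""

    def ingest(self, job_id):
        # Placeholder: a real plug-in would push files to the repository.
        return f"ingested {job_id}"


# Only classes listed here may be called from the XML definition.
HANDLERS = {"RepositoryIngester": RepositoryIngester}


def run_reflection_call(xml_text, phase_name, job_id):
    """Find the call declared for a phase and invoke it by name."""
    root = ET.fromstring(xml_text)
    for phase in root.findall("phase"):
        if phase.get("name") == phase_name:
            call = phase.find("reflectionCall")
            handler_cls = HANDLERS[call.get("class")]
            method = getattr(handler_cls(), call.get("method"))
            return method(job_id)
    raise KeyError(phase_name)


print(run_reflection_call(PHASES_XML, "Publishing", "book-0001"))
```

Resolving the class through an explicit table rather than by arbitrary name is a design choice of this sketch; it limits what a configuration file can trigger.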


Quality Assurance

Supported at two different stages:
QA information is maintained at the file level while a Job moves from one Phase to another (information at the level of output objects, i.e. pages)
A QA Phase is defined in the Digitization Phase Sequence as the last Phase before Archiving

[Figure: phase sequence: Arabic Books Scanning, Arabic Books Processing, Arabic Books OCRing, Arabic Books Encoding & Publishing, Arabic Books QA, Arabic Books Archiving]

Achieving Automation Using DWMS

Command Line support


Achieving Flexibility Using DWMS



The defined Phase Sequence for a Job Type is a guide rather than a prescription
The available Phases may or may not be part of the Phase Sequence; the operator can assign the Job to any of these Phases
Jobs can be forwarded dynamically to another Phase in the Phase Sequence
Changes in the Phase Sequence affect both current and new Jobs in the system, leading to natural process evolution

[Figure: phase sequence: Arabic Books Scanning, Arabic Books Processing, Arabic Books OCRing, Arabic Books Encoding & Publishing, Arabic Books QA, Arabic Books Archiving]
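The idea of a Phase Sequence that guides rather than prescribes can be sketched as a function with a default next Phase and an optional override. This is an illustration under assumptions: the function name next_phase, the forward_to parameter and the extra "QA-Processing" Phase used to show evolution are not taken from DAF.

```python
PHASE_SEQUENCE = [
    "Scanning",
    "Processing",
    "OCRing",
    "Encoding & Publishing",
    "QA",
    "Archiving",
]


def next_phase(current, sequence, forward_to=None):
    """Return the Phase a Job goes to next.

    If the operator forwards the Job, that choice wins. Otherwise the
    Job follows the sequence. None means there is no default next Phase
    (last Phase reached, or the current Phase is not in the sequence).
    """
    if forward_to is not None:
        return forward_to
    if current not in sequence:
        return None
    index = sequence.index(current)
    return sequence[index + 1] if index + 1 < len(sequence) else None


# Default path and a dynamic forward back to an earlier Phase.
print(next_phase("OCRing", PHASE_SEQUENCE))
print(next_phase("QA", PHASE_SEQUENCE, forward_to="Processing"))

# Evolving the sequence affects Jobs already in the system, because the
# sequence is looked up each time a Job moves on.
evolved = PHASE_SEQUENCE[:2] + ["QA-Processing"] + PHASE_SEQUENCE[2:]
print(next_phase("Processing", evolved))
```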

Adaptation of BA workflow to AMEEL



Create a Check-In Plug-in to automate the ingestion of AMEEL books into the system
Create a Publishing Reflection Call to:
  Create separate text files for each image
  Rename them according to the original names
  Move them to the Deliver FTP location on the FTP server
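The three publishing steps above can be sketched as follows. This is a minimal illustration, not the AMEEL implementation: the function name publish_text_files is hypothetical, the page text is passed in directly instead of coming from OCR, and a local directory stands in for the Deliver FTP location (no FTP transfer is performed).

```python
import shutil
import tempfile
from pathlib import Path


def publish_text_files(pages, deliver_dir):
    """Write one text file per image and move it to the delivery folder.

    pages is a list of (original_image_name, page_text) pairs.
    Each text file is named after its original image.
    """
    deliver_dir = Path(deliver_dir)
    deliver_dir.mkdir(parents=True, exist_ok=True)
    delivered = []
    with tempfile.TemporaryDirectory() as work_dir:
        for image_name, text in pages:
            # Step 1 and 2: separate text file, named after the image.
            txt_path = Path(work_dir) / (Path(image_name).stem + ".txt")
            txt_path.write_text(text, encoding="utf-8")
            # Step 3: move it to the delivery location.
            target = deliver_dir / txt_path.name
            shutil.move(str(txt_path), str(target))
            delivered.append(target)
    return delivered


with tempfile.TemporaryDirectory() as demo_root:
    result = publish_text_files(
        [("page_0001.tif", "first page text"), ("page_0002.tif", "second page text")],
        Path(demo_root) / "deliver",
    )
    print([p.name for p in result])
```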

For more details about DAF please refer to http://wiki.bibalex.org/DAFWiki


Thank You
