Web Archiving and Archive Management

Prepared By: Sirajuddin bin Ab Aziz
Prepared For: Miss Sri Intan binti Shahrul Asaari

O Web archiving is the process of collecting a portion of the World Wide Web and preserving it in an archive for future use by researchers, historians, and the public.
O The content is collected by web archivists using a Web crawler, in the form of:
  - HTML Web pages
  - Style sheets
  - JavaScript
  - Images
  - Video

Current Issues
O Copyright Issues
  - As stated by Peter Lyman, although the Web is popularly regarded as a public domain resource, it is copyrighted; thus, archivists have no right to copy the Web.
O Management Issues
  - How should the Web content be managed?
  - Has it been obtained legally?
  - Policy

O Web crawler limitations:
  - A large portion of a Web site may be hidden in the Deep Web.
  - A crawler trap may cause the crawler to download an infinite number of pages.
O The Web changes constantly and is effectively infinite in size.
O Crawling consumes a lot of bandwidth if it is not managed carefully.
O Virus attacks.
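
The crawler trap problem above can be illustrated with a small heuristic check. This is only a sketch under stated assumptions: the function name looks_like_trap and the three thresholds are hypothetical values chosen for illustration, not settings taken from any particular crawler.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed limits for illustration only; a real archive would tune these.
MAX_URL_LENGTH = 200
MAX_PATH_DEPTH = 8
MAX_SEGMENT_REPEATS = 2


def looks_like_trap(url: str) -> bool:
    """Return True if the URL shows common signs of a crawler trap."""
    # Very long URLs are often generated endlessly by scripts.
    if len(url) > MAX_URL_LENGTH:
        return True

    # Split the path into its non-empty segments, e.g. /a/b/ -> ["a", "b"].
    segments = [s for s in urlparse(url).path.split("/") if s]

    # Unusually deep paths suggest an ever-growing directory structure.
    if len(segments) > MAX_PATH_DEPTH:
        return True

    # The same segment repeated many times suggests a link loop,
    # such as /calendar/next/next/next/next.
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())
```

A crawler could call a check like this before queuing each discovered link and skip any URL for which it returns True. Heuristics of this kind can also reject legitimate pages, so the thresholds are a trade-off rather than a fixed rule.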

O Obtain the Web material through a legal deposit act, whereby a person or a group must submit a copy of their publications to a repository.
O Configure the Web crawler to limit the pages it can crawl.
O Provide a better standard for organizing the Web content.
O Create a backup of all the Web content.
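
Limiting the pages a crawler can reach might look like the sketch below. It is a minimal illustration, not a real crawler: the function bounded_crawl, its parameters max_pages and max_depth, and the toy SITE link table are all assumptions made for this example, and no network requests are made.

```python
from collections import deque
from urllib.parse import urlparse


def bounded_crawl(seed, get_links, max_pages=100, max_depth=3):
    """Breadth-first crawl limited to the seed's host, a page budget, and a depth limit.

    get_links is any function that returns the links found on a given URL.
    """
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([(seed, 0)])
    visited = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)

        # Do not follow links from pages already at the depth limit.
        if depth == max_depth:
            continue

        for link in get_links(url):
            # Stay on the same host and never queue a page twice.
            if link not in seen and urlparse(link).netloc == host:
                seen.add(link)
                queue.append((link, depth + 1))

    return visited


# Toy link table standing in for a real Web site (hypothetical URLs).
SITE = {
    "https://example.org/": ["https://example.org/a", "https://other.example/x"],
    "https://example.org/a": ["https://example.org/a/b"],
    "https://example.org/a/b": ["https://example.org/a/b/c"],
}

pages = bounded_crawl(
    "https://example.org/",
    lambda url: SITE.get(url, []),
    max_pages=10,
    max_depth=2,
)
```

In this toy run the link to the other host is ignored and the page three levels deep is never fetched, because the crawl stops following links at max_depth. The page budget also puts an upper bound on bandwidth use.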

O Just as the present era takes great care of its archived materials, the same care will be expected in the future, when Web content is kept alongside other archives. Web archiving is important for organizations to protect their corporate heritage and to meet regulatory and legal requirements.