October 5, 2009

Mr. Michael L. Wash, Chief Information Officer
Government Printing Office

RE: XML Conversion Tests

Dear Mike:

I would like to thank you for sharing your early drafts of your XML work on the Official Journals. I hope our input has been helpful, and I was thrilled to see you launch the Federal Register XML product as open source available for all developers.

In this memorandum, I’d like to summarize the work we’ve been doing for the last 6 months on a series of conversion tests on the XML source. Most of the heavy lifting on the XML transformations was done by Point.B Studio, and I’d also like to acknowledge the Sunlight Foundation, which funded our work on this project.

Overview of the Conversion Process
The Federal Register and Code of Federal Regulations have been authored for some time in SGML. We used your SGML source along with the S2 utility which is part of the SGML Parser Toolkit package to provide some initial tests of the feasibility of converting this source to the more modern XML standard. In this test, we were able to convert most, but not all, of the source in this semi-automated fashion. You may find these initial tests on the CFR product here: http://bulk.resource.org/gpo.gov/cfr/raw/

After you approached us and indicated you were also working on SGML->XML conversion, we dropped our initial automated approach and moved over to your format. As we both discovered, a fully general process would not take into account some of the special characteristics of the relatively complex SGML used in the Official Journals.

From the SGML source, you used a program to convert most of the SGML markup into XML. Our initial testing and validation on this indicated that the conversion work you were doing made a lot of sense. We then moved on to focus on two core issues:

• Conversion of the XML to print formats such as PDF.
• Conversion of the XML to more modern on-line formats, particularly HTML.

We were very pleased with the results, some of which I have shown you over the last few months as we completed the various phases of the project.

PUBLIC.RESOURCE.ORG
~ A Nonprofit Corporation
Open Source America’s Operating System
“It’s Not Just A Good Idea—It’s The Law!”
carl@media.org
1005 GRAVENSTEIN HIGHWAY NORTH, SEBASTOPOL, CALIFORNIA 95472 • PH: (707) 827-7290 • FX: (707) 829-0104