Access to parliamentary information, open document standards, and improving dialogue with civil society

Daniel Schuman, the Sunlight Foundation
February 28, 2012

My name is Daniel Schuman, and I am an attorney with the US-based Sunlight Foundation.1 The Sunlight Foundation is a civil society organization that advocates for greater government transparency. We believe that public information held by the government should be available online, in real time, and in machine-readable formats. I’d like to congratulate the House of Representatives, the Inter-Parliamentary Union, and the United Nations for hosting such an important conversation. Thank you also to the Global Centre for ICT in Parliament, whose hard work has made this conversation possible.

The focus of this panel discussion, and of my remarks, is on increasing the public’s access to parliamentary information. In particular, I will explore the collaborative relationship between parliaments and civil society organizations in increasing government transparency. I will also explore how open document standards can support this partnership.

THE STATE OF LEGISLATIVE TRANSPARENCY AND THE ROLE OF CIVIL SOCIETY

Perhaps the best place to start is the current state of legislative transparency. While great work is being done abroad, I am most familiar with the American context, and will start there. The United States Congress created an online legislative information system, known as THOMAS, in 1995.2 THOMAS contains the text and summaries of legislation; it identifies legislative sponsors and amendment authors; it provides an outline of a bill’s legislative path as well as transcripts of floor debates; and it contains other related information. THOMAS has nearly 1 million visits each month, according to the Library of Congress.3 But many more people use THOMAS’s legislative data than the Library’s usage statistics reflect.
For example, two civil society legislative information websites together receive nearly double the monthly visits of THOMAS.4 These sites repackage legislative data in more user-friendly ways, and add further legislative and contextual information. Other commonly used, privately created websites and mobile apps also serve hundreds of thousands of people.
1
2
3 Annual Report of the Librarian of Congress, 2010, p. 4, available at
4 According to, THOMAS has 323,000 unique visits per month, compared to 293,000 for and 249,000 for

If our goal is to increase public access to parliamentary information, we must start by acknowledging that civil society organizations and others, not the government, are already where the majority of Americans turn for legislative information. These privately created sites are more flexible, more innovative, and don’t cost the government a penny to develop. This trend is not limited to the United States.

But while non-governmental actors are the main conduit through which the public accesses legislative information, that access depends on the government making its work product freely available online in machine-readable formats. When online publication is compromised, civil society organizations and others step in to fill the void, but they can never entirely replace the government’s role as author and distributor of legislative information. And when governments do not release legislative information to the public at all, or do so poorly, they prompt the creation of secondary markets where government data leaks out to the public, but at a high price. This creates information asymmetries in which those with the greatest financial resources have significantly greater access to basic information about the government’s activities. This in turn creates a feedback loop: those with the resources to gather information about what the government is doing are in a privileged position to influence its activities. In a democratic system, this information asymmetry cannot be allowed to flourish.

There is a better way. To the greatest extent possible, government information should be made available to the public. There must be a presumption of openness. And the category of publicly available legislative information should be as broad as possible. Legislative information clearly includes the text of legislation and transcripts of floor debates. It also includes information about how legislation changes as it moves through the legislative process.
It includes proposals for amending legislation, expert analyses performed by parliamentary research services, and reports from executive branch agencies. And it includes schedules of committee hearings and floor debates, live video of proceedings, and vote counts.5 We do not have real openness when this information is available to the public only in obscure ways. To keep legislative information as paper records in an archive that is rarely open to the public and difficult to access is to promote information asymmetries.

We have all come to the conclusion that online publication is the most effective means to share information with the public. But poor online publication practices can thwart transparency. The Sunlight Foundation, building on the efforts of others, has identified principles to evaluate whether electronically stored government data is being properly made available for public use.6 These principles fall into three broad categories: data quality, data access, and data transformation and reuse.7
5 It includes more than this, but a longer laundry list would be inappropriate for a speech.
6 We also recently looked at “Benchmarks for Measuring Success of Legislative Data Transparency,” focused entirely on the U.S. Congress. The remarks are available here:
7 These three principles are really a summary of the “Ten principles for Opening up Government Data,” available at


DATA QUALITY

To evaluate legislative data quality, it’s important to assess (1) whether it is complete, (2) whether it draws upon primary source data, and (3) whether it is permanently available. A dataset is complete when it reflects the entirety of what is recorded about a particular subject. This allows users to understand the scope of information made available and to examine data items at the greatest level of detail. Generally speaking, all raw information from a dataset should be released to the public, along with metadata that defines and explains the raw data. Primary source data is the information originally collected by the government. It should be released to allow users to verify that information was collected properly and recorded accurately. Permanent data availability means that once information is released by the government, it is available forever in a findable location. The ongoing availability of data allows its authenticity to be checked: users can compare their copies against the original source to make sure they have the most up-to-date information and that it has not been corrupted.

DATA ACCESSIBILITY

Having looked at data quality, let’s now move to data accessibility. Data accessibility can be evaluated on the basis of (1) who can obtain the information, (2) how they must obtain it, (3) when they can access it, and (4) whether they must pay any fees to do so. Any person should be able to access legislative data at any time without having to submit identification or justify how it will be used. While administrators often set up barriers to access, including registration or membership requirements, these barriers are unnecessary and should be avoided. Similarly, in ideal circumstances, accessing data should require only a trivial amount of effort. The best ways to make information in a database publicly available are to provide online access through APIs and bulk downloads.
An API allows a computer to ask for a specific data item, and bulk access provides a copy of the entire database for download. Of course, it’s often useful and appropriate to have a browser-based way to access the information as well, but that’s no substitute for the ability to download all the information at once.

Datasets collected by the government should be released to the public as quickly as possible. Priority should be given to data whose usefulness is time-sensitive.

There should be no charge to access public data. While many reasons are given for imposing fees, the reality is that the information has already been paid for by the public, and even small fees significantly chill access. Although it may be tempting to impose costs on the public, all too often this is a pretext to suppress public access, especially if the data could be viewed as politically sensitive. The result is an information asymmetry that benefits those with the greatest financial resources and political pull.
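The difference between API access and bulk access can be illustrated with a short Python sketch. This is a minimal, hypothetical example, not a description of THOMAS or any real parliamentary system: the `BILLS` records, the identifier scheme, and the function names are invented for illustration. It also assumes one possible way to support the authenticity check described under data quality, namely publishing a SHA-256 checksum alongside each bulk file so users can confirm that their copy matches the original.

```python
import hashlib
import json

# Hypothetical stand-in for a legislative database; identifiers and
# fields are invented for illustration only.
BILLS = {
    "hr-1-112": {"id": "hr-1-112", "title": "Example Appropriations Bill", "sponsor": "L000001"},
    "s-25-112": {"id": "s-25-112", "title": "Example Transparency Act", "sponsor": "L000002"},
}

def api_get_bill(bill_id):
    """API-style access: return the single record requested by identifier (None if absent)."""
    return BILLS.get(bill_id)

def bulk_export():
    """Bulk access: serialize the entire dataset at once and compute a SHA-256
    checksum that could be published next to the download (an assumption)."""
    records = sorted(BILLS.values(), key=lambda bill: bill["id"])
    payload = json.dumps(records, indent=2).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()

def verify_download(payload, published_checksum):
    """Return True if a downloaded copy matches the published checksum."""
    return hashlib.sha256(payload).hexdigest() == published_checksum

print(api_get_bill("s-25-112")["title"])    # Example Transparency Act
data, digest = bulk_export()
print(len(json.loads(data)))                # 2
print(verify_download(data, digest))        # True
print(verify_download(data + b" ", digest)) # False: the copy was altered
```

The API function answers one narrow question at a time, while the bulk export hands over everything in a single file; a user who wants to analyze the whole record needs the latter.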


DATA USE AND TRANSFORMATION

Having now looked at data accessibility, let’s move to data use and transformation. Whether legislative data can be used or transformed for other purposes can be examined by looking at (1) whether there are licensing requirements, (2) whether the data is machine readable, and (3) whether it is recorded in an open standard format.

Licensing requirements, including the all-too-common crown copyright, are unnecessary barriers to public use of data. Public information should be labeled as a work of the government that is available without restriction and is part of the public domain. The addition of dissemination restrictions, attribution requirements, and the like unnecessarily limits the public’s ability to evaluate and make use of legislative information.

Machine readability, which is the ability of computers to process data, is incredibly important. In our modern era, nearly all information is created in a digital format, and the small amount of information that is not “born digital” is (or should be) quickly converted into an electronic format. Data should be stored in widely used formats that easily lend themselves to machine processing. Handwritten notes cannot be understood by computers, and converting scanned documents with optical character recognition (OCR) introduces many errors. The widely used PDF format, while adequate for displaying documents, makes it very difficult for computers to extract the underlying information. When other factors require the use of difficult-to-parse formats like PDF, the data should also be made available in a machine-friendly format. Let me add, as an aside, that the consistent use of unique identifiers across datasets, such as those identifying legislators and bills, also makes it much easier to analyze and sort information.

Data should also be stored in an open standard format. “Open standard” refers to who owns the format in which data is stored. The purpose here is to make data available to as many potential users as possible.
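To make the points about machine readability, open formats, and consistent identifiers concrete, here is a minimal Python sketch that combines two hypothetical datasets, bill titles published as XML and roll-call votes published as CSV, using only the standard library. The data, field names, and identifier format are invented for illustration and do not reflect any actual parliamentary schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical bill records in XML, an open format that Python's
# standard library can parse without licensed software.
BILLS_XML = """<bills>
  <bill id="hr-1-112"><title>Example Appropriations Bill</title></bill>
  <bill id="s-25-112"><title>Example Transparency Act</title></bill>
</bills>"""

# Hypothetical roll-call results in CSV, referring to bills and legislators
# by identifier rather than by name.
VOTES_CSV = """bill_id,legislator_id,vote
hr-1-112,L000001,Yea
hr-1-112,L000002,Nay
s-25-112,L000001,Yea
"""

def load_titles(xml_text):
    """Map each bill identifier to its title."""
    root = ET.fromstring(xml_text)
    return {bill.get("id"): bill.findtext("title") for bill in root.findall("bill")}

def tally_by_bill(csv_text, titles):
    """Count Yea/Nay votes per bill, labeling each bill by its title."""
    tallies = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The join works only because both datasets share the same bill identifier.
        title = titles[row["bill_id"]]
        counts = tallies.setdefault(title, {"Yea": 0, "Nay": 0})
        counts[row["vote"]] += 1
    return tallies

titles = load_titles(BILLS_XML)
print(tally_by_bill(VOTES_CSV, titles))
# {'Example Appropriations Bill': {'Yea': 1, 'Nay': 1}, 'Example Transparency Act': {'Yea': 1, 'Nay': 0}}
```

Had the votes file referred to bills by informally written titles instead of a shared identifier, matching the two datasets would require error-prone guesswork; had either file been published only as a scanned PDF, it could not be processed this way at all.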
Ideally, data should be stored in a format that any person can access without having to purchase a software license, and that can be processed by commonly available programs. There’s nothing wrong with also publishing information in a proprietary format, so long as it is equally available in an open standard.

In summary, when parliaments evaluate how they make information available to the public in terms of data quality, data access, and ease of data transformation and reuse, they take a significant step toward empowering civil society and the public at large. The United States House of Representatives, for instance, is moving toward greater transparency as we speak. The House is publishing more data online as it is created. It also recently created a one-stop portal for bulk access to large amounts of legislative data, and it established consistent ways of identifying different types of legislative datasets. While there is more work to do, there can be no doubt that the House of Representatives is making significant progress.

STEPS TO INCREASE PUBLIC ACCESS TO LEGISLATIVE INFORMATION


For any parliament looking to increase public access to legislative information, here are four common-sense suggestions.

First, perform a legislative information audit. Determine what types of legislative information you create, what you collect, and how and where it is collected. See whether it’s released to the public, and if so, in what formats.

Second, bring together all the people responsible for creating and collecting legislative data with those outside the government who make use of it. Set up a working group that meets regularly to discuss issues of information collection and dissemination. The first order of business should be to review the results of the legislative information audit. Not all of this work can be done with civil society organizations in the room, but much of it can, and you will benefit from their presence.

Third, publish whatever legislative information you have online, subject only to narrowly tailored restrictions for confidential legislative work product and sensitive personal information. Even if the information is not perfectly formatted when released, it’s better to make what you have available online right now than to wait for a perfection that may never come. Civil society organizations will be able to steer you toward where you should devote the most effort to releasing better-formed datasets, and to point out flaws in the data you’ve released. This kind of iterative effort should be welcomed.

Finally, make a plan to improve the quality, accessibility, and reusability of the information that you’re releasing. In the short term, how data is released will depend largely on existing internal procedures, but over time, parliamentary processes themselves may be adapted to meet the emerging needs of parliamentarians and the general public.

Publishing legislative information online holds the promise of rejuvenating public engagement in our political systems.
The experiment is already underway, so we need to give it our best effort. Thank you.

