Macromolecular Structure Database March 3, 1998

NATIONAL SCIENCE FOUNDATION
DIRECTORATE FOR BIOLOGICAL SCIENCES

MACROMOLECULAR STRUCTURE DATABASE
PROPOSAL SOLICITATION

LETTERS OF INTENT (OPTIONAL) BY: APRIL 10, 1998
PROPOSAL RECEIPT DEADLINE: 5 PM (EDT), MAY 27, 1998

INTRODUCTION

The Biological Sciences Directorate (BIO) of the National Science Foundation (NSF), through the Biological Database Activities Program in the Division of Biological Infrastructure, has identified support for the design, development, implementation and use of biological information resources as a priority. The study of the three-dimensional structure of biological macromolecules is an area of biological research where there is a continuing need for a resource that contains known information in a form that permits rapid identification, analysis and retrieval by a large and growing community of researchers. Therefore, the Biological Database Activities Program announces a special re-competition for an award to establish, maintain and distribute a Macromolecular Structure Database (MSD). The MSD will serve as an archive for the three-dimensional Cartesian coordinates and related information, including structure factors, for those biological macromolecules and macromolecular assemblies whose structures have been determined to atomic resolution. Such a database, called the Protein Data Bank (PDB), has existed for over 25 years. The PDB continues to be a unique resource in its role as a primary archive of structural biology information of use to both the public and private sectors. The crucial importance of such a database has been widely recognized in the relevant public and private sector research communities. Therefore, the successful awardee of this special re-competition will be required to incorporate the structural information of the PDB into the MSD.

Because information about the structure of biological macromolecules is central to the advancement of biological research, other Federal agencies, including the Office of Biological and Environmental Research (OBER) of the Department of Energy (DOE), the National Institute of General Medical Sciences (NIGMS) and the National Library of Medicine will join the NSF in sponsoring the MSD. This announcement is issued by NSF on behalf of and in collaboration with these other agencies.

EXPECTED SCOPE

The MSD is expected to serve as an archive for data generated through x-ray scattering and Nuclear Magnetic Resonance (NMR) studies of proteins and nucleic acids. Proposals submitted in response to this announcement must discuss the structure of the proposed database, including the format of data entries, and provide detailed plans for long-term management and distribution of the database. The data should be structured and maintained in a way that permits the development and use of complex queries by knowledgeable users, including commercial software developers. The MSD will be expected to collaborate with other efforts relevant to structural databases, e.g., those of the National Center for Biotechnology Information and the European Bioinformatics Institute. Plans detailing how such collaborations might work should be provided. However, formal arrangements for the collaborations need not be made prior to an award.

The proposals must also provide plans for the incorporation into the MSD of structural information currently found in the Protein Data Bank (PDB) and for the timely assumption of responsibility for data entry, archive maintenance and database distribution, all of which are now provided by PDB. At the current time, PDB includes about 7000 entries and receives about 1500-2000 new entries each year. About 20% of new entries represent results of NMR studies. The annual number of new entries is expected to grow as new and more efficient methods for structure determination are developed. Thus, the scalability of the strategies for data entry and retrieval, for database maintenance and for distribution is expected to be an important issue. A current copy of the PDB, as well as other information about operation of the database, may be accessed through the World Wide Web (WWW).

WHO MAY SUBMIT

For the purpose of this competition, NSF will accept applications from U.S. colleges and universities, non-academic organizations (both non-profit and for profit), and Federally Funded Research and Development Centers (FFRDCs) as described in the NSF brochure "Grant Proposal Guide" (GPG), NSF 98-2, Chapter 1, Section D. The GPG is available on the NSF web site or as a printed booklet at no cost from:

NSF Publication Clearinghouse
P.O. Box 218
Jessup, MD 20794-0218
Phone: (301) 947-2722

Consortia of eligible individuals or organizations may also apply, but a single individual or organization must accept overall management responsibility.
Potential applicants who are unsure if they are eligible to apply, or who plan to include a foreign institution as a member of a consortium or subcontractor, must consult the cognizant NSF program officer listed at the end of this solicitation.

PRINCIPAL INVESTIGATOR AND OTHER SENIOR STAFF

The Principal Investigator (PI) and other senior staff responsible for the project must have the necessary skills to successfully carry out the tasks outlined below, or the proposal must present convincing plans to hire such staff. The PI should have demonstrated the leadership necessary to meet the challenges of managing a database in a rapidly changing technological and scientific environment. The PI and other members of the senior staff should, in the aggregate, have experience with aspects of structural biology research relevant to the database, have current knowledge about computerized databases and their management, and have a demonstrated ability to interact with the members of the various scientific disciplines and other groups important for the successful operation of the database. Experience with the successful management of a database effort of comparable scope and complexity will be considered an important asset.

AWARD

The NSF intends to make a five-year award using a cooperative agreement between the agency and the awardee. The exact amount of the award will depend on the advice of reviewers and on the availability of funds. The cost of the award will be shared among the participating Federal agencies, either through interagency transfers to the NSF or some other means, as appropriate. The overall budget, including contributions by all agencies, is expected to be in the range of $1-2 million per year (including indirect costs), with additional funds for equipment purchase if needed.

As a condition of the award, the awardee will be required to place submitted data into the public domain in a timely fashion. The awardee will also be required to secure and maintain, on behalf of the NSF, a service mark for the name Macromolecular Structure Database (MSD) or other name used to designate the database. Furthermore, the awardee must agree that in the event of non-renewal of the award, it will transfer to the NSF or its designee, without condition or additional charge, the right to use the service mark and will also transfer a current version of the database and current versions of all software necessary for entry submission and for database operation or access.

The awardee will be expected to establish a formal mechanism for ensuring external input from relevant professional societies and interested individuals regarding MSD policies and practices. An appropriate mechanism could, for example, consist of a standing external advisory board with relevant technical and managerial expertise. The function of the mechanism will be to advise senior management of the MSD and the awardee institution(s) on policies such as those regarding format, content and validation of entries and reports, and those related to other aspects of use or distribution of the database.
Any criteria that establish the minimum amount of data and other information needed for a new entry must be approved by this mechanism prior to implementation. Periodic review and approval of the utility and appropriateness of any such criteria will be expected. Implementation of the mechanism should ensure that the views of relevant research communities are represented as part of this advice. In general, the NSF expects that the mechanism will provide an opportunity for input from groups such as the International Union of Crystallography, the U.S. National Committee for Crystallography, and representative professional societies whose members use the database in significant numbers. The mechanism should also provide opportunity for input from non-U.S. users. The appropriateness and adequacy of the advisory mechanism, as implemented, will be subject to approval by the NSF in consultation with other Federal Agencies that support the MSD.

LETTER OF INTENT

Individuals who intend to submit a proposal are requested to send a letter of intent to the cognizant NSF official listed at the end of this solicitation by April 10, 1998. The letter should include a list of individuals who are expected to participate in the proposed activity, and be signed by both the PI and a representative of the institution expected to submit the proposal. All individuals who are expected to receive funds in the event of an award, or who are otherwise expected to collaborate actively in the project, should be named and their institutional affiliations given if different from that of the institution expected to submit the proposal. In general, non-U.S. individuals and institutions expected to collaborate in data entry or distribution of the database need not be named unless they are expected to receive funds in the event of an award. No letters certifying institutional commitment or willingness to collaborate may be provided at this time.

PROPOSAL SUBMISSION

The proposal should be prepared following guidelines contained in the GPG, NSF 98-2, the NSF Proposal Forms Kit, NSF 98-3, and the instructions below. The GPG, NSF 98-2, and the NSF Proposal Forms Kit, NSF 98-3, are available on the NSF web site.

The proposal must be printed single-spaced on a single side of the page using 2.5 cm margins on all sides. The type size must be clear and readily legible, in a standard size of 10 to 12 points. (Font sizes smaller than 10 points will not be accepted.) If constant spacing is used, then there should be no more than 12 characters per 2.5 cm, whereas proportional spacing should provide no more than an average of 15 characters per 2.5 cm. Pages submitted must be of standard size. Metric A4 (210mm by 297mm) is preferred; however, 8½" by 11" (216mm by 279mm) may be used. Proposals that do not strictly adhere to the specified page limitations (given below), including those in required or permitted appendices, will be ineligible for consideration and will be returned.

Each proposal must contain the following elements in the order indicated:

1. NSF Cover page (NSF Form 1207). In the appropriate box, clearly indicate that the proposal is for consideration by the BIO Database Activities Program.

2. Table of Contents (NSF Form 1359). Provide a Table of Contents with page numbers for each section and for major subdivisions of the project description (see below).

3. Summary. On a separate page, provide a brief (200 words or less) description of the project.

4. Prior Experience.
If the PI or any co-PI has received federal support for the establishment or operation of a publicly available database within the last five years, provide a brief description of the relevant features of the database together with the name of the agency providing support, the award number and title, and the amount and duration of the award. This section should include a general description of the type of database, number of users, means of distribution, etc. If the database is available electronically, provide the relevant URL. If this proposal includes use of schema, software, or other aspects of the previously supported database, then additional written documentation may be requested by NSF prior to or during review. If awards for more than one project have been received, describe the project most relevant to the current proposal. This section is limited to a maximum of 5 pages, including any references.

5. Project Description. Particular attention must be paid to the following major aspects in preparing a description of the proposed project. Although some relevant technical issues are mentioned below, these details are intended only as guidelines. This section must not exceed 25 pages inclusive of references, tables, diagrams or other visual material.

A. Archive Structure: The proposal should provide a description of (1) the logical or conceptual model for the data and (2) a general outline of the physical implementation schema for the archive. The general features and overall design of both must be justified in the context of efficient data management and researcher support functions.

B. Data Acquisition: Proposals should describe the manner in which the data to be placed in an entry will be acquired from the investigator who is the original source of the data, including data exchange formats to be used. The submission procedure must permit use of accepted community standards for data exchange, such as the PDB format and the macromolecular Crystallographic Information File (mmCIF) format (see below). If submitted data will be converted to another format for storage in the archive, the procedure for retrieving the data must be able to provide the data in the original format without loss of information and with minimal intervention by MSD staff. Because it is anticipated that the annual number of submissions will continue to increase in the future, an important technical issue to be considered is the development and use of software for direct submission of data as well as the scalability of the approach.

C. Database Content: Proposals should describe precisely the expected content of database entries. Entries are expected to include an atomic model derived by current techniques in structural biology, and possibly other representations of macromolecular structure. They should also include annotation of structural features relevant to biological function and, if submitted, primary experimental data such as structure factors. Minimum criteria for ensuring the completeness and consistency of entries at the time they are placed in the archive should be described, as should procedures for assuring that the criteria have been met.
It is expected that the utility of the criteria and procedures will be periodically reviewed and approved using the formal external advisory mechanism.

D. Database Maintenance: Proposals should address the technical issues involved in the maintenance of a highly automated, direct submission archive, with convenient public access and off-site backup or other provision for protection from software or hardware failure. Provisions for maintenance of internal and external links should be discussed. The focus of the proposal should be the operation of a basic archive. Support for research on the database will not be provided through this award.

E. Database Distribution: Proposals should also describe the distribution methods envisioned, for example network access to the complete collection using the WWW or other means, and periodic production of tapes, CD-ROM or other media containing current entries. Report formats must permit full use of accepted community standards for data exchange, such as the PDB format and the macromolecular Crystallographic Information File (mmCIF) recently adopted by the International Union of Crystallography. Information about the PDB format is available at the PDB homepage referenced above. Information about mmCIF is available through the WWW. If mirror sites are to be used, describe how the central and mirror sites will interact, estimate the time and effort required to operate a typical mirror and provide the criteria to be used in selecting mirror sites.

Any planned charges for copies on tape or other media, or for permission to provide such copies, should be discussed briefly in the proposal. Such charges will be allowed, but are subject to approval by the NSF. Periodic assessment of the utility of the distribution methods will be expected as part of management and oversight of the MSD.

F. Direct Access: Describe how users will be able to develop and use direct queries of the database. The interaction with the archive and the means to ensure stability and security should be specified.

G. Assumption of Responsibility for Database Operation: If appropriate, provide a timetable for the assumption of responsibility for new data entries and distribution of the database, including any efforts necessary for incorporation of entries now found in the PDB into the new database. It is anticipated that the time required for complete assumption of the responsibility will not exceed one year from the date of the award.

H. Quality Control: Describe provisions for ensuring the quality of the database and its operation, including procedures for obtaining and responding to user feedback on issues related to quality.

I. Required Effort: Estimate the time and effort to be devoted by MSD staff to the process of (1) acquiring data, (2) archiving entries, (3) maintaining the database, and (4) distributing the database. Any planned effort related to outreach or user support should be estimated separately. For items (1) and (2), estimate on both an annual and per entry basis; others should be on an annual basis. In the event that subcontractors or collaborators will be responsible for aspects of data acquisition (e.g., through secondary deposition sites) or development of new software necessary for operation of the database, include an estimate of the time and effort to be required and the expected sources of funding for their efforts.

J. Management: A sound management plan will be a crucial aspect of the proposal.
The responsibilities of the various senior personnel must be clearly described, as must the time and effort to be committed by each. A mechanism for replacing key personnel who leave the project must also be described. In the event senior personnel will participate in multiple activities related to the database (e.g., outreach, data acquisition, etc.), estimate the anticipated effort with respect to each activity.

6. Budget (NSF Form 1030). Provide a budget for each year of support requested as well as a separate, cumulative budget for all five years. If funds for subcontracts are requested, then a separate Form 1030 must be prepared and signed by each subcontractor to show the distribution of subcontract funds across categories. The NSF does not normally approve subcontracts to non-U.S. individuals or entities. Any request for such a subcontract must fully document and justify the necessity of using the chosen subcontractor. Funds for facility construction or renovation may not be requested.

7. Budget Justification. A brief justification for funds in each budget category should be provided. For major equipment or software materials, a particular model or source and the current or expected price should be specified whenever possible. A brief explanation of the need for each item whose cost exceeds $10,000 should be provided. This section should also include details of institutional cost sharing, if any, and of other sources of support for the project, such as government, industry, or private foundations.

Appropriate documentation of any such commitments should be provided in an appendix (Appendix A). Although cost sharing is not required, any such commitment specified in the proposal will be referenced and included as a condition of an award resulting from this solicitation.

8. Facilities, Equipment & Other Resources (NSF Form 1363). Include a brief description of available facilities, including space and computational equipment available for the project. Where requested equipment or materials duplicate existing items, explain the need for duplication. This section is limited to 2 pages.

9. Biographical Sketches. For each of the key personnel, including senior staff and any other staff whose participation is critical to the success of the project, provide a curriculum vitae or short biographical sketch. Briefly describe relevant experience and list up to 10 publications (to include the individual's 5 most important and up to 5 other, relevant publications). The information may not exceed 2 pages for each individual. Copies of letters indicating agreement to participate should be provided by all senior personnel who do not endorse the cover page as PI or co-PI. Such letters should include a brief description of the individual's expected role in the project and an estimate of the time and effort to be required. The letters should be provided in an appendix (Appendix B).

10. Collaborations. Plans requiring collaborative effort by an individual not employed at the submitting institution(s) should be supported by a signed letter from the individual. Besides indicating a willingness to collaborate, the letter should provide a brief outline of the goals of the collaboration and estimate the time and effort the individual expects to devote to the collaboration. Biographical sketches should not be provided for such individuals, unless requested by NSF.
A collaboration whose primary purpose is advisory (e.g., service on a committee that will provide policy advice) does not require such a letter. Copies of the signed letters should be provided in an appendix (Appendix C).

11. Current Support (NSF Form 1239). Provide a complete list of current and pending support for all key personnel whose biosketches are included in item 9.

12. Appendices. Only the appendices described in sections 7, 9 and 10 (above) are allowed. Other letters of endorsement may not be included.

13. Additional Information. The following items must be provided:

A. One completed copy of NSF Form 1225 (Information about Principal Investigators/Project Directors).

B. An alphabetical list of current and past collaborators of all key personnel whose biosketches are included, and of any other staff or collaborators mentioned by name in the proposal. This list should include names of all graduate students and postdoctoral fellows who have trained with these individuals, as well as anyone with whom these individuals have co-authored a paper within the last 4 years.

Attach these additional items to the copy of the proposal that bears the original signatures, with the Form 1225 on top and the list of collaborators following the proposal. These items are for NSF internal use only and will not be shown to reviewers. Do not provide additional copies of these items with the other copies of the proposal.
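
As an illustration of the retrieval requirement stated under item 5.B (Data Acquisition), namely that data converted for storage must still be retrievable in the original format without loss of information, the following minimal sketch shows one way an archive could keep the submitted file verbatim alongside a checksum. It is not part of this solicitation and is not a prescribed design. The class names, the entry identifier, the sample record and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedEntry:
    """One deposited entry (hypothetical structure, for illustration only)."""
    entry_id: str
    source_format: str   # format used by the depositor, e.g. "pdb" or "mmcif"
    original: bytes      # the submitted file, kept byte-for-byte
    sha256: str          # checksum recorded at deposition time


class Archive:
    """Toy in-memory archive; a real archive would use persistent storage."""

    def __init__(self):
        self._entries = {}  # maps entry_id -> ArchivedEntry

    def deposit(self, entry_id, source_format, data):
        # Record the checksum before any conversion to an internal format.
        digest = hashlib.sha256(data).hexdigest()
        entry = ArchivedEntry(entry_id, source_format, data, digest)
        self._entries[entry_id] = entry
        return entry

    def retrieve_original(self, entry_id):
        # Return the submission unchanged, verifying it against the checksum.
        entry = self._entries[entry_id]
        if hashlib.sha256(entry.original).hexdigest() != entry.sha256:
            raise ValueError(f"entry {entry_id} failed its integrity check")
        return entry.original


# Usage: the identifier and file content below are made up for the example.
archive = Archive()
submitted = b"HEADER    HYPOTHETICAL ENTRY\nEND\n"
archive.deposit("demo-0001", "pdb", submitted)
assert archive.retrieve_original("demo-0001") == submitted
```

Keeping the original bytes avoids relying on a format converter being exactly reversible. The trade-off is additional storage, which a proposer would need to weigh against the scalability concerns raised in this solicitation.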

The PI is responsible for the completeness and accuracy of the proposal as submitted. Unless requested by the NSF, additional information may not be sent following proposal submission.

WHEN AND WHERE TO SUBMIT

The proposal must be received at NSF no later than 5:00 p.m. Eastern Time (ET) May 27, 1998. You are encouraged to use NSF FastLane to prepare and submit your Macromolecular Structure Database proposal. To access FastLane, go to the NSF Web Site and select "FastLane", or go directly to the FastLane Home page.

For proposals not submitted via FastLane, the completed application, including the original proposal and 20 copies, should be sent (see the GPG, Chapter I, Section E) to:

Program Announcement Number: NSF 98-66
National Science Foundation
PPU Macromolecular Structure Database
4201 Wilson Boulevard, Room P60
Arlington, VA 22230

INSTRUCTIONS FOR SUBMISSION OF MACROMOLECULAR STRUCTURE DATABASE PROPOSALS USING NSF FASTLANE

* If you are using FastLane to prepare your Macromolecular Structure Database Proposal, it must be received electronically no later than 5:00 p.m. ET May 27, 1998. The signed Cover Sheet and signed Certification Page must be received at NSF no later than 5:00 p.m. ET June 3, 1998. To access FastLane, go to the NSF Web Site and select "FastLane", or go directly to the FastLane Home page.

* For instructions to prepare and submit your proposal via FastLane, please see Instructions for Preparing and Submitting a Standard Proposal via FastLane. Additionally, read the PI Tipsheet for Proposal Preparation, and the Frequently Asked Questions about FastLane Proposal Preparation. The information can be found under Proposal Preparation on the FastLane website.

* Project Description.
In addition to the requested information detailed in this announcement regarding the Project Description, include requested information about Prior Experience (item 4) as specified in the Proposal Submission section of this announcement, appropriately labeled, at the beginning of the Project Description. Note: Prior Experience (item 4) does not count against the 25 page limit of the Project Description; however, this section is limited to a maximum of 5 pages, including any references.

* Appendices and Additional Information. Do not include appendices as part of the FastLane submission. Instead, mail 1 copy of appendices as requested in items 7, 9, and 10. Only the appendices described in sections 7, 9 and 10 (above) are allowed. Also, mail with the package 1 copy of the alphabetical list of current and past collaborators, as requested in item 13. This information must be received by 5:00 p.m. ET, May 27, 1998. Other letters of endorsement may not be included. Mail to: Dr. Gerald Selzer, Division of Biological Infrastructure, National Science Foundation, 4201 Wilson Boulevard, Room 615, Arlington, VA 22230. Label package "Appendices - Announcement 98-66." It is the proposer's responsibility to ensure that materials are received by the deadline.

* The required signed cover sheet and certification form must be received by no later than 5:00 p.m. ET June 3, 1998. Mail to: Division of Biological Infrastructure, National Science Foundation, 4201 Wilson Boulevard, Room 615, Arlington, VA 22230. It is the proposer's responsibility to ensure that materials are received by the deadline.

EVALUATION OF PROPOSALS

The review and selection process will use the merit review criteria as described in GPG, Chapter III, and below:

What is the intellectual merit and quality of the proposed activity?

The following are suggested questions that the reviewer will consider in assessing how well the proposal meets this criterion. Each reviewer will address only those questions which he/she considers relevant to the proposal and for which he/she is qualified to make judgments. How important is the proposed activity to advancing knowledge and understanding within its own field and across different fields? How well qualified is the proposer (individual or team) to conduct the project? (If appropriate, the reviewer will comment on the quality of prior work.) To what extent does the proposed activity suggest and explore creative and original concepts? How well conceived and organized is the proposed activity? Is there sufficient access to resources?

What are the broader impacts of the proposed activity?

The following are suggested questions that the reviewer will consider in assessing how well the proposal meets this criterion.
Each reviewer will address only those questions which he/she considers relevant to the proposal and for which he/she is qualified to make judgments. How well does the activity advance discovery and understanding while promoting teaching, training, and learning? How well does the proposed activity broaden the participation of underrepresented groups (e.g., gender, ethnicity, geographic, etc.)? To what extent will it enhance the infrastructure for research and education, such as facilities, instrumentation, networks, and partnerships? Will the results be disseminated broadly to enhance scientific and technological understanding? What may be the benefits of the proposed activity to society?

The NSF review criteria as outlined in the GPG (NSF 98-2) will be interpreted in light of the objective of this solicitation as follows:

Intellectual Merit and Quality: This criterion addresses the overall quality of the technical and managerial aspects of the proposal, including plans for distribution of the MSD and for management oversight and long-range planning. The projected cost and the time required for implementation and operation of the database in a fully functional form will also be considered. This criterion also addresses the capabilities of the proposed personnel, including those of the PI and other senior staff as discussed under PRINCIPAL INVESTIGATOR AND OTHER SENIOR STAFF in this solicitation, the technical soundness of the proposed approach, and the adequacy of the resources available or proposed.

Broader Impact: An important issue is the likelihood that the proposed work will yield results with generic usefulness and applications both in structural biology and in other fields. Examples of such impact might arise from the utility of the MSD for a variety of uses in academic and industrial settings, availability of automated links with other databases, suitability of the database for research related to common or uncommon properties of macromolecules, and the utility of MSD management strategies and MSD software for operation of other databases.

A special emphasis panel will be formed to review the applications and site visits may be used as needed.

AWARD ADMINISTRATION

The award will be administered in accordance with the terms and conditions of NSF GC-1, "Grant General Conditions," (12/97) and NSF CA-1, "Cooperative Agreements General Conditions" (12/95). This information can be obtained from the NSF OnLine Document System. Copies of these documents are available at no cost from the NSF Clearinghouse, P.O. Box 218, Jessup, MD 20794-0218, phone (301) 947-2722, or via e-mail. More comprehensive information is contained in the NSF Grant Policy Manual (NSF 95-26), for sale through the Superintendent of Documents, Government Printing Office (GPO), Washington, D.C. 20402. The telephone number at GPO is (202) 783-3238 for subscription information. The NSF Grant Policy Manual can also be accessed online through the NSF OnLine Document System.
OTHER INFORMATION

Inquiries regarding the announcement should be directed to the cognizant NSF official:

Gerald Selzer
Division of Biological Infrastructure, Room 615
National Science Foundation
4201 Wilson Boulevard
Arlington, VA 22230
Tel: (703) 306-1469
FAX: (703) 306-0356

The Foundation provides awards for research and education in the sciences and engineering. The awardee is wholly responsible for the conduct of such research and preparation of the results for publication. The Foundation, therefore, does not assume responsibility for the research findings or their interpretation.

The Foundation welcomes proposals from all qualified scientists and engineers and strongly encourages women, minorities, and persons with disabilities to compete fully in any of the research and education related programs described in this document.

Privacy Act. The information requested on proposal forms is solicited under the authority of the National Science Foundation Act of 1950, as amended. It will be used in connection with the selection of qualified proposals and may be disclosed to qualified reviewers and staff assistants as part of the review process; to applicant institutions/grantees; to provide or obtain data regarding the application review process, award decisions, or the administration of awards; to government contractors, experts, volunteers, and researchers as necessary to complete assigned work; and to other government agencies in order to coordinate programs. See Systems of Records, NSF-50, Principal Investigators/Proposal File and Associated Records, and NSF-51, 60 Federal Register 4449 (January 23, 1995), Reviewer/Proposal File and Associated Records, 59 Federal Register 8031 (February 17, 1994).

Public Burden. Submission of the information is voluntary. Failure to provide full and complete information, however, may reduce the possibility of your receiving an award. The public reporting burden for this collection of information is estimated to average 120 hours per response, including the time for reviewing instructions. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Gail A. McHenry, Reports Clearance Officer, Information Dissemination Branch, National Science Foundation, 4201 Wilson Boulevard, Suite 245, Arlington, VA 22230.

The National Science Foundation has TDD (Telephonic Device for the Deaf) capability, which enables individuals with hearing impairment to communicate with the Foundation about NSF programs, employment, or general information. To access NSF TDD, dial (703) 306-0090; for FIRS, 1-800-877-8339.

The program described in this announcement is in category 47.074 (Biological Sciences) of the Catalog of Federal Domestic Assistance.