
Protein Data Bank (PDB) & File Format

Sumit Kumar Halder (Mob: 8961257875, Email: sumithalder80@gmail.com)

For the project “Structure and plots of characters of motion picture based on clustering of novel” we need a
database that stores all the significant information in a structured manner. For that we are going to use the
Protein Data Bank (PDB) database. The PDB format is a textual file format that describes the three-dimensional
structural data of proteins and nucleic acids. Nowadays scientific journals and funding agencies use the PDB to
store their structural data, as it provides description and annotation of the data as well as the data structure,
including coordinates and connectivity. Many databases use the concept of the PDB file format. In our study we
build our own database, inspired by the PDB, to store the records of a novel.

The database mainly contains two files: the sequence file and the PDB file. The sequence file is based on the
FASTA format, a text-based format, and represents the node sequences (character sequences) of a novel.
The sequence file contains four types of records:

• Node ID: Each character of a novel is represented by a PDB ID, or Node ID.
• Chain ID: The connectivity, or pairing, between two nodes is represented by the Chain ID.
• Mode value: The mode value represents the expression of a node towards the next node of the chain:
positive (1) or negative (0).
• Sequence: The order in which the characters of a novel appear is represented by the sequence.
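
The four record types above can be sketched as a small FASTA-like reader and writer. This is only an illustration: the text does not specify the exact line layout, so the header form `>NodeID|ChainID|Mode`, the space-separated sequence line, and the names `SequenceRecord`, `format_record` and `parse_records` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceRecord:
    node_id: str         # character of the novel
    chain_id: str        # pairing between two nodes
    mode: int            # 1 = positive, 0 = negative
    sequence: List[str]  # order of appearance, as a list of node IDs


def format_record(rec):
    """Write one record as a header line plus one sequence line (assumed layout)."""
    header = f">{rec.node_id}|{rec.chain_id}|{rec.mode}"
    return header + "\n" + " ".join(rec.sequence) + "\n"


def _build(header, tokens):
    # The header is assumed to hold exactly three '|'-separated fields.
    node_id, chain_id, mode_text = header.split("|")
    mode = int(mode_text)
    if mode not in (0, 1):
        raise ValueError(f"mode must be 0 or 1, got {mode}")
    return SequenceRecord(node_id, chain_id, mode, tokens)


def parse_records(text):
    """Read every '>'-headed record from the text; blank lines are skipped."""
    records, header, tokens = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_build(header, tokens))
            header, tokens = line[1:], []
        else:
            tokens.extend(line.split())
    if header is not None:
        records.append(_build(header, tokens))
    return records
```

Under these assumptions, a record for node `N001` on chain `A` with a positive mode is written as the line `>N001|A|1` followed by the line of node IDs, and `parse_records` reads it back unchanged.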

The PDB file is a text-based format that contains the header information and the coordinate information of a
novel. The PDB file contains the following records:

• HEADER: The HEADER record contains a preface of the novel, such as its subject, scope, aims and release
date.
• TITLE: The TITLE record contains a title that describes the subject of the novel.
• COMPND: The COMPND record describes the adjacency matrix among the nodes (characters) and the
involvement of the chains among the nodes.
• SOURCE: The SOURCE record specifies the background, or source, of each character in the novel.
Sources are described by both the specific name and the alias name.
• KEYWDS: The KEYWDS record contains a set of terms relevant to the novel, such as characters' names,
functional classification, activities etc.
• EXPDTA: The EXPDTA record presents the experimental technique(s), with optional comments, used for the
structure determination.
• AUTHOR: The AUTHOR record contains the list of contributors.
• SPRSDE: The SPRSDE record contains a list of the Node IDs and Chain IDs of a novel that become obsolete
while the adjacency matrix is built. These characters are less important in the novel and are therefore removed
from consideration, but they are still stored in the SPRSDE record.
• JRNL: The JRNL record contains the primary literature citation that describes the experiment. There is at
most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference.
Other references are given in REMARK 1.
• REMARK: REMARK records present experimental details, annotations, comments, and information not
included in other records. A new level of structure is being used for some REMARK records. This is
expected to facilitate searching and will assist in the conversion to a relational database. The information
is fetched from the XML file.
• SEQRES: SEQRES records contain a listing of the consecutive conversations linked in a linear fashion to
form a chain. Each SEQRES line carries a serial number for the current chain; it starts at 1, increments by one
on each line, and resets to 1 for each new chain. The SEQRES records are checked against the sequence file and
the information provided by the depositor.
• HETATM: The HETATM record contains the expression of a character (positive or negative), the Node ID, the
Chain ID, and the X, Y and Z coordinates used to build the structure and plot of the novel.
• TER: The TER record indicates the end of a list of nodes, or HETATM records, for a chain.
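
As a rough illustration of how the SEQRES, HETATM and TER records fit together, the sketch below numbers SEQRES lines per chain and closes each chain's HETATM block with a TER line. The text does not define the column layout, so this example assumes whitespace-separated fields (the original PDB format uses fixed columns), IDs without spaces, and hypothetical names such as `Hetatm`, `seqres_lines`, `write_chain` and `read_chains`.

```python
from typing import NamedTuple


class Hetatm(NamedTuple):
    mode: int      # 1 = positive, 0 = negative expression
    node_id: str
    chain_id: str
    x: float
    y: float
    z: float


def seqres_lines(chain_id, items, per_line=4):
    """SEQRES lines for one chain; the serial number starts at 1 for each call."""
    lines = []
    for serial, start in enumerate(range(0, len(items), per_line), start=1):
        chunk = " ".join(items[start:start + per_line])
        lines.append(f"SEQRES {serial} {chain_id} {chunk}")
    return lines


def write_chain(nodes):
    """HETATM lines for the nodes of one chain, closed by a TER line."""
    lines = [
        f"HETATM {n.mode} {n.node_id} {n.chain_id} {n.x:.3f} {n.y:.3f} {n.z:.3f}"
        for n in nodes
    ]
    lines.append("TER")
    return lines


def read_chains(lines):
    """Group HETATM lines into chains; each TER line ends the current chain."""
    chains, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "HETATM":
            mode, node_id, chain_id, x, y, z = fields[1:7]
            current.append(
                Hetatm(int(mode), node_id, chain_id, float(x), float(y), float(z))
            )
        elif fields[0] == "TER":
            chains.append(current)
            current = []
    return chains
```

Calling `seqres_lines` once per chain gives the reset-to-1 behaviour described for SEQRES, and `read_chains(write_chain(a) + write_chain(b))` returns the two chains as separate lists. Coordinates are written with three decimals, so values with more precision would be rounded in this sketch.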
