
Primary Genome Annotation Report Template

Original gp# in PECAAN: 35

Original coordinates in PECAAN: Start: 28391 Stop: 28269

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.
● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the ORF.

● Does the gene length meet expected parameters?


Yes, it is at least 120 bp (barely, at 123 bp).
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap? There are no areas of excessive overlap, or any overlap at all.

No overlap is present.
● DECISION: Is the designation of this ORF as a gene well-supported?
The open reading frame of this gene is 123 base pairs. This fits with the guiding principles, but
is very close to the cut off point of being a real gene (120 base pairs according to the guiding
principles), so it was questionable. Because the open reading frame just barely meets the
criteria of being a real gene, looking at the other evidence will be more helpful in the
determination.
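
The length check above is simple arithmetic on the PECAAN coordinates. As a minimal sketch (the function names are hypothetical, and the 120 bp cutoff is the one quoted from the guiding principles above), the length of an ORF can be computed from its inclusive start and stop coordinates on either strand:

```python
MIN_GENE_LENGTH_BP = 120  # cutoff quoted from the guiding principles above


def gene_length(start: int, stop: int) -> int:
    """Length in bp of an ORF given inclusive, 1-based coordinates.

    Works for either strand: a reverse-strand gene has start > stop.
    """
    return abs(stop - start) + 1


def meets_minimum_length(start: int, stop: int) -> bool:
    """True if the ORF is at least the minimum length."""
    return gene_length(start, stop) >= MIN_GENE_LENGTH_BP


# Coordinates of this ORF as recorded in PECAAN (reverse strand).
length = gene_length(28391, 28269)
print(length, meets_minimum_length(28391, 28269))  # 123 True
```

Counting both end coordinates is what gives 123 bp here rather than 122, so the ORF clears the cutoff by only three bases.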

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes, the GeneMark output shows slight coding potential for the predicted start site, although it’s
very minimal.
● Did Glimmer and GeneMark agree on the start for this gene?
Glimmer called the start, but GeneMark didn’t call it.

● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
The predicted start codon doesn’t have the longest open reading frame.
Gap- 193
LORF- blank (Not LORF)
● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The recommended start in Starterator was 28491, as most genes in the Pham have this as their
annotated start.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
Given what we saw in Starterator, either choice was supported by the data. However, we chose
to proceed with 28491 rather than 28391. The longer open reading frame minimizes the gap and
fits the guiding principles better than the shorter one. Although the RBS score was slightly
higher than that of the other start, it is still a good score. This start site is also supported by
Starterator, where most genes in the Pham have it as their annotated start.
3. What is the function of this ORF?
● Do any BlastP matches on PECAAN PhagesDB reveal a function?
Yes, BetterKatz revealed the gene to be a toxin.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes, BetterKatz revealed the gene to be a toxin of a type II toxin-antitoxin system (HicA family
toxin).
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
No.
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes, BetterKatz revealed it as a toxin.
● Do any HHPred matches reveal a function? Provide a screenshot.
Yes, HHPred and the approved functions list on SEA-PHAGES provided evidence consistent with
our hypothesis that this gene encodes a HicA-like toxin of a toxin/antitoxin system.

HHPred

SEAPHAGES

● DECISION: what is the final functional assignment for this ORF?


○ What does this protein do? Provide a one sentence description.
Under normal growth conditions, toxins and antitoxins form stable complexes.
However, stress-induced proteases preferentially eliminate unstable antitoxins, releasing
free toxins to inhibit various cellular functions.
○ What is the rank of this function according to the Annotation Guide?
No rank found.
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes, toxin in toxin/antitoxin system, HicA-like.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 7

Original coordinates in PECAAN: Start: 3471 Stop: 4898

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene is 1428 base pairs long, which is within the guiding principles of being considered
a real gene.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

There are no areas of excessive overlap; however, it looks like genes 7 and 8 possibly overlap slightly.
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes, the gene is at least 120 base pairs long and fulfills the guiding principles.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes.

● Did Glimmer and GeneMark agree on the start for this gene?
No, Glimmer called a start of 3471, while GeneMark called a start of 3543.
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
Yes.
LORF- TRUE
Gap- 62
● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The most annotated start site for this gene is 3471.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
We are going to keep the start site that was originally found because it fits best with the guiding
principles. With this start, it has the longest open reading frame, the gap is minimized, and the
RBS score is fairly good. Starterator also supports this start site as it’s the most annotated start
site of all genes in the Pham.
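
When two programs call different starts for the same ORF, as Glimmer and GeneMark do here, it can help to tabulate what each candidate implies. The following is a minimal sketch, not part of PECAAN; the function name and the dictionary layout are hypothetical. For each candidate start it reports the ORF length and whether that length is a whole number of codons, which it should be if the start is in frame with the stop:

```python
def candidate_start_summary(stop: int, starts: dict) -> dict:
    """For each named candidate start, report ORF length and frame consistency."""
    summary = {}
    for name, start in starts.items():
        length = abs(stop - start) + 1  # inclusive coordinates
        summary[name] = {
            "start": start,
            "length_bp": length,
            "whole_codons": length % 3 == 0,  # in frame with the stop codon
        }
    return summary


# Starts called for this ORF by the two programs, with the shared stop at 4898.
result = candidate_start_summary(4898, {"Glimmer": 3471, "GeneMark": 3543})
for name, info in result.items():
    print(name, info)
```

With these coordinates the Glimmer start gives 1428 bp and the GeneMark start gives 1356 bp, so the Glimmer start is the longer of the two, consistent with LORF being TRUE for 3471.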

3. What is the function of this ORF?


● Do any BlastP matches on PECAAN PhagesDB reveal a function?
Yes, BetterKatz reveals it as a portal protein.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes, BetterKatz reveals it as a portal protein.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
Conserved Domain Database just says it’s a protein of unknown function.
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes, BetterKatz reveals it as a portal protein.
● Do any HHPred matches reveal a function? Provide a screenshot.
Yes, there was a 100 probability for the function of a portal protein.

● DECISION: what is the final functional assignment for this ORF?


Portal protein.
○ What does this protein do? Provide a one sentence description.
The portal protein forms the channel through which DNA is packaged into the capsid and released from it during infection.
○ What is the rank of this function according to the Annotation Guide?
According to the Annotation Guide, a portal protein is a rank 1 function.
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 8

Original coordinates in PECAAN: Start: 4898 Stop: 6376

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene is 1479 bp in length, which fulfills the guiding principles of at least 120 bp to be
considered a real gene.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

No areas of excessive overlap, however it looks like there may be slight overlap.
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?

● Did Glimmer and GeneMark agree on the start for this gene?
Yes.

● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles?
○ Yes
● Record the values shown under “Gap” and “LORF” from PECAAN.
○ Gap-(-1)
○ LORF- TRUE

● Does the start site match other starts for similar genes?
■ Yes.
○ Record the Recommended Start listed in Starterator for this ORF.
■ The recommended start for this gene in starterator is 4898.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

The final score is very good. It fits with the guiding principles (it’s negative and relatively close to
zero).

● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
We have decided to keep the start site called by PECAAN. This start site fits well within the
guiding principles by having a good gene length, a good RBS score, and the longest open
reading frame. Although there is a slight overlap, it is very minimal, and a slight overlap can make
translation more efficient, so it should have little to no effect on the gene. Starterator also
supports this start site as it’s the most annotated start site of all genes in the Pham.

3. What is the function of this ORF?


● Do any BlastP matches on PECAAN PhagesDB reveal a function?
Yes. BetterKatz calls a capsid maturation protease.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes. BetterKatz calls a capsid maturation protease.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
Yes. It calls for a Mu-like prophage I protein.
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes. BetterKatz calls a capsid maturation protease.
● Do any HHPred matches reveal a function? Provide a screenshot.
HHPred suggests a scaffolding protein and also a Mu-like prophage I protein, which may be a
specific type of scaffolding protein.
● DECISION: what is the final functional assignment for this ORF?
The function of this gene is determined to be a capsid maturation protease.
○ What does this protein do? Provide a one sentence description.
This protein plays an essential role in the process of scaffolding required for the
assembly and maturation of a functional capsid.
○ What is the rank of this function according to the Annotation Guide?
This function is a rank 2.
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 67

Original coordinates in PECAAN: Start: 43301 Stop: 43537

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.
● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes. The gene is 237 base pairs long, which fulfills the guiding principles of being at least 120
base pairs in length to be considered a true gene.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

There don’t appear to be any areas of excessive overlap. There might be a slight overlap, but if
there is, it’s very minimal.
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes. The length fulfills the guiding principles, the gap is minimized, and if there is an
overlap it’s insignificant.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes. There is a large amount of coding potential with the start site of 43301.
● Did Glimmer and GeneMark agree on the start for this gene?
Yes. They both agreed on the start of 43301.
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
Gap- 2
LORF- TRUE
The predicted start has the LORF, and there is no overlap, and the gap is minimized, all fitting
with the guiding principles.
● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
This start site matches other starts for similar genes. The recommended start site called by
starterator is 43301.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

The RBS score is negative and fairly close to zero which abides by and supports the guiding
principles.
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
We have decided to keep the recommended start site found in PECAAN. The start meets the
length requirements, has a minimized gap, has a good RBS score, and also has the longest
open reading frame. Starterator also supports this start site as it’s the most annotated start site
of all genes in the Pham.
3. What is the function of this ORF?
● Do any BlastP matches on PECAAN PhagesDB reveal a function?
Yes. Kimchi calls an HNH endonuclease.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
No. BetterKatz just calls a hypothetical protein.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
No.
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
No.
● Do any HHPred matches reveal a function? Provide a screenshot.
Yes, HHPRED calls a function of a methyltransferase.
● DECISION: what is the final functional assignment for this ORF?
NKF (No known function). There is no well-supported evidence for the function of this gene.
Although HHPred called a function of methyltransferase, the probability was too low to fulfill
the guiding principles, so that function cannot be supported.
○ What does this protein do? Provide a one sentence description.
NKF
○ What is the rank of this function according to the Annotation Guide?
NKF
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
NKF

Primary Genome Annotation Report Template

Original gp# in PECAAN: 68

Original coordinates in PECAAN: Start: 43534 Stop: 43770

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.
● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

There is coding potential evident.


● Does the gene length meet expected parameters?
Yes. The length fulfills the guiding principles of being at least 120 base pairs in length. The gene
is 237 base pairs long (43534 to 43770, inclusive).
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

There aren’t any areas of excessive overlap, however there does appear to be a slight overlap.
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes. The gene fulfills the length requirements and the gap is minimized. If there is an overlap,
it’s very minor and insignificant.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
The current predicted start does include all of the coding potential as shown in GeneMark.

● Did Glimmer and GeneMark agree on the start for this gene?
No, Glimmer called a start at 43534, where GeneMark called a start at 43555.
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
No, the predicted start does not have the LORF.
Gap- -4
LORF- not true
● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
Yes, the most annotated start site found in Starterator is 43534.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

The RBS score is very good. It is negative and very close to zero which fulfills the guiding
principles.
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
We have decided to keep the start site of 43534. Although there is a slight overlap, it is
insignificant, and a slight overlap can make translation more efficient. The length is that of a
true gene, the RBS score is very good, and this is also the most annotated start site found in
Starterator.
3. What is the function of this ORF?
● Do any BlastP matches on PECAAN PhagesDB reveal a function?
No, I couldn’t find any matches that revealed a function.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
No, the matches are just revealing that they are hypothetical proteins.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
No, there were no matches detected that revealed a function.
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
No, BetterKatz does not reveal a function for this gene.
● Do any HHPred matches reveal a function? Provide a screenshot.

No, HHPRED reveals hypothetical proteins but no actual functions for this gene. The
probabilities are not high enough to be considered true according to the guiding principles.
● DECISION: what is the final functional assignment for this ORF?
NKF (No known function)
○ What does this protein do? Provide a one sentence description.
NKF
○ What is the rank of this function according to the Annotation Guide?
NKF
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
NKF
Primary Genome Annotation Report Template

Original gp# in PECAAN: 69

Original coordinates in PECAAN: Start: 43767 Stop: 46580

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.
● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene is 2814 base pairs long, which is well over the 120 base pair minimum set by the
guiding principles.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

No areas of excessive overlap are evident; however, there seems to be a rather large gap
between gene 69 and gene 70.
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes, the gene is well over the minimum required length to be considered a true gene, and is in
support of the guiding principles.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes, there is a lot of coding potential, as shown by the graphical output.
● Did Glimmer and GeneMark agree on the start for this gene?
Both Glimmer and GeneMark called the gene, however, Glimmer called a start site of 43767,
and GeneMark called a start site of 43779.
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
LORF- TRUE
Gap- -4
The predicted start has the longest open reading frame and doesn’t have an excessive overlap;
it’s only an overlap of 4 base pairs, which is biologically more efficient in translation.

● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The most annotated start in Starterator was start site number 8, which is not even listed under
the start sites for Mulch, so Starterator was unhelpful for this gene.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

The RBS score is relatively good, it is negative and fairly close to zero, so it fulfills the guiding
principles and is supportive of them.
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
We have decided to keep the initial start site found in PECAAN. The length is good, there is an
overlap of 4, which is biologically more efficient in translation, this start site has the LORF, and the
RBS score is good. This evidence is sufficient and supportive of the guiding principles, and
believed to be the best start site for this gene.
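
The Gap values recorded for this gene and its upstream neighbors can be checked from the coordinates. The sketch below is illustrative only: the helper name is hypothetical, and the formula is an assumed convention for adjacent forward-strand genes (bases strictly between one gene's stop and the next gene's start) that happens to reproduce the values recorded from PECAAN in this report.

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """Number of bases between two adjacent forward-strand genes.

    A negative value means the genes overlap by that many bases.
    """
    return next_start - prev_stop - 1


# (gene, start, stop) as recorded in PECAAN for three adjacent genes.
genes = [(67, 43301, 43537), (68, 43534, 43770), (69, 43767, 46580)]

gaps = {}
for (_, _, prev_stop), (gene, next_start, _) in zip(genes, genes[1:]):
    gaps[gene] = gap_between(prev_stop, next_start)

print(gaps)  # {68: -4, 69: -4}
```

Both genes 68 and 69 come out with a gap of -4, that is, a 4 bp overlap with the upstream gene, which matches the Gap entries recorded for them.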
3. What is the function of this ORF?
● Do any BlastP matches on PECAAN PhagesDB reveal a function?
Yes, BetterKatz calls a function of DNA primase.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes, BetterKatz calls a function of DNA primase.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
No, there were no matches detected that revealed a function.

● Does the Phamerator map of this phage and its nearest relatives reveal a function?
● Do any HHPred matches reveal a function? Provide a screenshot.

Yes, HHPred reports a 99.2% probability match to DNA primase.
This probability is high and, under the guiding principles, provides supportive evidence for
this function.
● DECISION: what is the final functional assignment for this ORF?
DNA Primase.
○ What does this protein do? Provide a one sentence description.
DNA primase is an enzyme that is involved with the replication of DNA, and is a type of
RNA polymerase.
○ What is the rank of this function according to the Annotation Guide?
There was no rank provided by the Annotation Guide.
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes, the SEA-PHAGES function list shows DNA primase as a true function.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 12

Original coordinates in PECAAN: Start: 8234 Stop: 8626

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.
● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene is 393 bp in length (8234 to 8626, inclusive), which falls within the guiding
principles of what qualifies to be a gene.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes, the gene is within the requirements to be supported by the guiding principles to affirm this
is a real gene.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes, by looking at the graphical output, the start site includes much of the gene’s coding
potential.
● Did Glimmer and GeneMark agree on the start for this gene?
Yes, Glimmer and GeneMark both agreed on 8234 as the start site.
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
No, the predicted start does not have LORF.
Gap- -1
LORF- not true

● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The most annotated start in starterator was start site number 4. The recommended start called
by starterator is 8234.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

The RBS score for this gene is good; it is -3 (close to zero), which supports the guiding
principles.

● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
I decided to keep the start site called by PECAAN. This start site fits within the guiding principles
by having a good gene length and a good RBS score, although it does not give the longest open
reading frame. There is hardly an overlap (-1) in the gene, so this should not affect translation
other than perhaps making it slightly more efficient. Starterator also supports this start site as
it’s the most annotated start site of all genes in the Pham.
3. What is the function of this ORF?
● Do any BlastP matches on PECAAN PhagesDB reveal a function?
BetterKatz reveals the function to be a head-to-tail connector protein.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes, it reaffirmed BetterKatz’ function of head-to-tail connector protein.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
Yes, pfam09355 says this “family of proteins are functionally uncharacterized. They are found in a
variety of bacteriophage.”
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes, BetterKatz reveals it to be a head-to-tail connector protein.
● Do any HHPred matches reveal a function? Provide a screenshot.

● DECISION: what is the final functional assignment for this ORF?


Head-to-tail connector complex protein
○ What does this protein do? Provide a one sentence description.
The head-to-tail connector protein joins the capsid (head), which holds the DNA, to the
tail, through which the DNA passes during ejection.
○ What is the rank of this function according to the Annotation Guide?
Rank 3
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes, the SEA-PHAGES function list shows head-to-tail connector complex protein as a
true function.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 65

Original coordinates in PECAAN: Start: 42205 Stop: 42591


1. Is the designation of this ORF as a gene well-supported?
● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene is 387 bp in length (42205 to 42591, inclusive), which meets the guiding principles’ minimum of 120 bp.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

There aren’t any areas of excessive overlap according to Phamerator.


● DECISION: Is the designation of this ORF as a gene well-supported?
Yes, the gene is over the minimum required length to be considered a true gene, and is in
support of the guiding principles.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes, the currently predicted start includes all of the coding potential shown in GeneMark.

● Did Glimmer and GeneMark agree on the start for this gene?
No, Glimmer called the start 42205 while GeneMark called it at 42193.

● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
No, the predicted start does not have LORF.
Gap: -4
LORF: not true
● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The most annotated start in starterator was start site 35. The recommended start called by
starterator is 42205.

● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
I have decided to keep the start site 42205 called by Glimmer. The length is good, there is an
overlap of 4, which is biologically more efficient in translation, and the RBS score is good. This
evidence is sufficient and supportive of the guiding principles, and it appears to be the best start
site for this gene. Starterator also supports this start site as it’s the most annotated start site of
all genes in the Pham.

3. What is the function of this ORF?


● Do any BlastP matches on PECAAN PhagesDB reveal a function?
BetterKatz reveals the function to be hydrolase.

● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
BetterKatz reveals the function to be hydrolase.

● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
Yes, there are two matches; one lists the function as unknown, while the other lists it as
nucleoside 2-deoxyribosyltransferase.

● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes, BetterKatz reveals the function as hydrolase.
● Do any HHPred matches reveal a function? Provide a screenshot.
Where the HHPred matches list a function, it is nucleoside deoxyribosyltransferase, which is
uncharacterized.

● DECISION: what is the final functional assignment for this ORF?


hydrolase
○ What does this protein do? Provide a one sentence description.
A hydrolase is an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond using water.
○ What is the rank of this function according to the Annotation Guide?
There is no rank according to the Annotation Guide.
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes, the SEA-PHAGES function list shows hydrolase as a true function.
Primary Genome Annotation Report Template

Original gp# in PECAAN: 66

Original coordinates in PECAAN: Start: 42588 Stop: 43298

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene length is 711 bp, which meets the guiding principles' minimum of 120 bp.

● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?
No, there is no excessive overlap according to Phamerator.
● DECISION: Is the designation of this ORF as a gene well-supported?
Yes, the gene is well over the minimum length required to be considered a true gene, which is
consistent with the guiding principles.
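
The length check above can be reproduced from the PECAAN coordinates. This is a minimal sketch that assumes the coordinates are inclusive (both the start and stop positions count toward the length); the helper name `gene_length` is hypothetical, and the 120 bp threshold is the minimum cited in this report.

```python
MIN_GENE_LENGTH_BP = 120  # minimum gene length cited in this report


def gene_length(start: int, stop: int) -> int:
    """Length in bp of a gene from inclusive coordinates.

    abs() lets the same helper handle reverse-strand genes, where the
    start coordinate is larger than the stop coordinate.
    """
    return abs(stop - start) + 1


length = gene_length(42588, 43298)  # coordinates for this ORF
# Prints the length, whether it meets the minimum, and whether it is a
# whole number of codons.
print(length, length >= MIN_GENE_LENGTH_BP, length % 3 == 0)
```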

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes, the start site includes all the coding potential in GeneMark’s graphical output.

● Did Glimmer and GeneMark agree on the start for this gene?
Yes, Glimmer and GeneMark agree on the start site of 42588.

● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
No, the predicted start does not produce the longest ORF.
Gap: -4
LORF: not true

● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The most annotated start in Starterator was start site 12. The recommended start called by
Starterator is 42588.

● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
I have decided to keep the start site at 42588 called by both Glimmer and GeneMark. The gene
length is good, the 4 bp overlap is biologically more efficient in translation, and the RBS
score is good. This evidence is sufficient and consistent with the guiding principles, so this
appears to be the best start site for this gene. Starterator also supports this start site, as
it is the most annotated start among the genes in the pham.

3. What is the function of this ORF?


● Do any BlastP matches on PECAAN PhagesDB reveal a function?
Yes, BetterKatz reveals it to have a function of DNA methyltransferase.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes, BetterKatz reveals it to have a function of DNA methyltransferase.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
Yes, many of the listed matches give the function as DNA methyltransferase.
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes, BetterKatz reveals it to have a function of DNA methyltransferase.
● Do any HHPred matches reveal a function? Provide a screenshot.
HHPred affirms the function of DNA methyltransferase.

● DECISION: what is the final functional assignment for this ORF?


Methyltransferase
○ What does this protein do? Provide a one sentence description.
DNA methyltransferase is an enzyme that catalyzes the transfer of methyl groups to DNA.
○ What is the rank of this function according to the Annotation Guide?
There is no rank according to the Annotation Guide.
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
Yes, the SEA-PHAGES function list shows Methyltransferase as a true function.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 9

Original coordinates in PECAAN: Start: 6378 Stop: 6821

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.
● Does the gene length meet expected parameters?
○ Yes, it is 444 base pairs in length (6378 to 6821, inclusive), which meets the guiding
principles' minimum of 120 base pairs.

● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

○ No overlap is evident.

● DECISION: Is the designation of this ORF as a gene well-supported?


○ Yes it is at least 120 base pairs which fulfills the guiding principles.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
○ Yes, there is a significant amount of coding potential present in the graphical
output.

● Did Glimmer and GeneMark agree on the start for this gene?
○ Yes
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
○ GAP- 1
○ LORF- not true

● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
○ The start number called most often in the published annotations is 5; it was
called in 15 of the 15 non-draft genes in the pham. Start site 5 corresponds to 6378.

● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

Yes, the RBS score is negative and close to zero, which is a fairly good score and is
consistent with the guiding principles.
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
○ We are going to keep the originally predicted start site because it fits best with
the guiding principles. The length is good, the gap is minimal, and the RBS score is
negative and close to zero. Starterator also supports this start site, as it is the most
annotated start among the genes in the pham.
3. What is the function of this ORF?
● Do any BlastP matches on PECAAN PhagesDB reveal a function?
○ Yes, BetterKatz reveals it as a scaffolding protein.

● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
○ Yes, BetterKatz reveals it as a scaffolding protein.

● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
○ No matches detected.

● Does the Phamerator map of this phage and its nearest relatives reveal a function?
○ Yes, it reveals it to be a scaffolding protein in BetterKatz.

● Do any HHPred matches reveal a function? Provide a screenshot.

○ HHPred returns only an uncharacterized conserved protein.


● DECISION: what is the final functional assignment for this ORF?
○ What does this protein do? Provide a one sentence description.
Controls the curvature of assembly of the major capsid protein subunits, such that they form
the correct-sized head.
○ What is the rank of this function according to the Annotation Guide?
■ Rank 2
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
■ Yes

Primary Genome Annotation Report Template

Original gp# in PECAAN: 10


Original coordinates in PECAAN: Start: 6844 Stop: 7776

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

GeneMark and Glimmer both called the gene.


● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


○ Yes, the length is 933 bp, which meets the guiding principles' minimum of 120
base pairs.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

There appears to be a slight overlap (4 bp) between genes 10 and 11.

● DECISION: Is the designation of this ORF as a gene well-supported?


○ Yes, the ORF is 933 bp, well above the 120 base pair minimum that the guiding
principles require for a true gene.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
○ Yes, there appears to be a great amount of coding potential as shown by the
graphical output.

● Did Glimmer and GeneMark agree on the start for this gene?
○ Yes

● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
○ GAP- 19
○ LORF-TRUE

● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
○ The start number called most often in the published annotations is 7, which
corresponds to 6844.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

Yes, the RBS score is negative and close to zero, which is an extremely good score and is
consistent with the guiding principles.
● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
○ We are going to keep the originally predicted start site because it fits best with
the guiding principles. The length of this gene is good, the RBS score is very good, this
start site provides the LORF, and the gap is minimized. Starterator also supports this
start site, as it is the most annotated start among the genes in the pham.

3. What is the function of this ORF?


● Do any BlastP matches on PECAAN PhagesDB reveal a function?
○ Yes, BetterKatz reveals it as a major capsid protein.

● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
○ Yes, BetterKatz reveals it as a major capsid protein.

● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
○ No matches detected

● Does the Phamerator map of this phage and its nearest relatives reveal a function?
○ Yes, it reveals it to be a major capsid protein in BetterKatz

● Do any HHPred matches reveal a function? Provide a screenshot.


HHPred returns a major capsid protein.
● DECISION: what is the final functional assignment for this ORF?
○ What does this protein do? Provide a one sentence description.
Major capsid protein: forms the capsid, or head, of the phage.
○ What is the rank of this function according to the Annotation Guide?
■ Rank 1
○ Does the nomenclature used for this function match with the official SEA-
PHAGES function list?
■ Yes, major capsid protein is a function listed on the SEA-PHAGES
function list.

Primary Genome Annotation Report Template

Original gp# in PECAAN: 11

Original coordinates in PECAAN: Start: 7773 Stop: 8234

1. Is the designation of this ORF as a gene well-supported?


● Provide a screenshot or copy/paste the notes from PECAAN reporting whether Glimmer
and/or GeneMark made the prediction.

GeneMark and Glimmer both called the gene.


● Provide a screenshot of the GeneMark output(s) documenting the coding potential of the
ORF.

● Does the gene length meet expected parameters?


Yes, the gene is 462 bp in length, which meets the guiding principles' minimum of 120 base
pairs for a gene.
● Provide a screenshot of this region from web-based Phamerator. Are there any areas of
excessive overlap?

There are no areas evident of excessive overlap.


● DECISION: Is the designation of this ORF as a gene well-supported?
Yes, the gene meets the requirements of the guiding principles for a real gene. Its length
exceeds the 120 base pair minimum, so the designation of this ORF as a gene is
well-supported.

2. Is the start site for this gene the best possible choice?
● Does the currently predicted start site include all of the coding potential as shown in the
GeneMark graphical output?
Yes, by looking at the graphical output, the start site includes a great amount of coding
potential.
● Did Glimmer and GeneMark agree on the start for this gene?
Yes, Glimmer and GeneMark both agreed on 7773 as the start site.
● Is the predicted start codon the longest possible ORF without causing excessive overlap
that violates the Guiding Principles? Record the values shown under “Gap” and “LORF”
from PECAAN.
Yes, the LORF is TRUE.
Gap: -4
LORF: TRUE

● Does the start site match other starts for similar genes? Record the Recommended Start
listed in Starterator for this ORF.
The most annotated start in Starterator was start site number 3. The recommended start called
by Starterator is 7773.
● Provide a screenshot of the RBS score (Final Score) from PECAAN. Does the predicted
start site have an associated RBS/Shine-Dalgarno site with a high score?

The RBS score for this gene is good: it is -4 (negative and close to zero), which is consistent
with the guiding principles.

● DECISION: are you keeping the start site as originally found in PECAAN, or do you want
to change it to something else? If you want to change it, you will need to provide
additional documentation of all changes and update all PhagesDB Blast, NCBI Blast,
HHPred, and Conserved Domain Database searches in PECAAN (this takes at least 10-
15 minutes).
We decided to keep the start site called by PECAAN. This start site fits the guiding
principles: it gives a good gene length, a good RBS score, and the longest open reading
frame. There is also a 4 base pair overlap, which is biologically more efficient in translation.
Starterator also supports this start site, as it is the most annotated start among the genes
in the pham.
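
The 4 base pair overlap can be checked from the coordinates of this gene and the one before it. The sketch below assumes both genes are on the forward strand and that the gap is counted as the number of bases strictly between the upstream stop and the downstream start, so a negative value means an overlap. The helper name `gap_between` is hypothetical; the coordinates are taken from this report (gene 10 stops at 7776, gene 11 starts at 7773).

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """Bases between two adjacent forward-strand genes.

    A positive value is an intergenic gap; a negative value is the size
    of the overlap between the two genes.
    """
    return next_start - prev_stop - 1


gap = gap_between(prev_stop=7776, next_start=7773)
print(gap)  # -4, matching the Gap value recorded for this gene
```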

3. What is the function of this ORF?


● Do any BlastP matches on PECAAN PhagesDB reveal a function?
BetterKatz reveals the function to be a head-to-tail connector protein.
● Do any BlastP matches on PECAAN NCBI BLAST reveal a function? If so, list.
Yes, it confirmed the BetterKatz function of head-to-tail connector protein.
● Are any matches detected in the PECAAN Conserved Domain Database? If so, list.
No
● Does the Phamerator map of this phage and its nearest relatives reveal a function?
Yes, BetterKatz reveals it to be a head-to-tail connector protein.
● Do any HHPred matches reveal a function? Provide a screenshot
○ No, HHPred returns Mu-like protein and FluMu protein but no actual functions for this
gene. The probabilities are not high enough to be considered reliable according to the guiding
principles.

● DECISION: what is the final functional assignment for this ORF?


Head-to-tail connector complex protein
○ What does this protein do? Provide a one sentence description.
The head-to-tail connector protein joins the capsid, which holds the genetic information (DNA),
to the tail, through which the DNA is released during ejection.
○ What is the rank of this function according to the Annotation Guide?
Rank 3
○ Does the nomenclature used for this function match with the official SEA-PHAGES
function list?
Yes, the SEA-PHAGES function list shows head-to-tail connector complex protein as a true
function.
