Bio::Blogs #8

The eighth edition of Bio::Blogs was originally posted online on the 2nd of February at:

Editorial musings

Welcome to the eighth edition of the bioinformatics blog journal Bio::Blogs. The archive from previous months can be found at bioblogs.wordpress.com. When this carnival was started, more than eight months ago, its primary objective was to serve as a sort of showcase for some of the best bioinformatics blog posts on the web and to create incentives for other people to promote their blogs and join in the conversation.

Looking back at the past Bio::Blogs editions, I would like to think that we have managed to come up with many interesting posts about bioinformatics conferences, tools and useful computational tips, like the DNA analysis series by Sandra Porter (I, II, III, IV, V, VI). Bio::Blogs has also been used to promote tools that have been published in blogs, like the Genetic Science Programming Applet from Pierre, and research like the correlation between protein-interaction likelihood and protein age (I and II) that I worked on.

Not everything went as I would have hoped. The submissions to Bio::Blogs have not picked up as I would expect. Some of this can be explained by poor promotion on my part, but it is also due to the small size of the bioinformatics blogging community. In any case, I think it is worth keeping Bio::Blogs up and running for some more time before thinking about stopping this experiment.

In this edition, a PDF version of all the posts has been created for anyone interested in downloading, printing and reading some fine posts over coffee or tea. Leave comments or send an email (bioblogs at gmail.com) with your thoughts/ideas for the continuation of this blog journal. I think this printed version also gives a more concrete impression of the potential of blogging for scientific communication.

News and Views

GrrlScientist submitted a report on a recent paper describing how moths use their antennae to help them stabilize their flight (see page 3). Apart from some nasty experiments where the authors removed and glued back parts of the antennae, the work features some interesting neurophysiology analysis used to characterize the sensitivity of the antennae.


From my blog I picked an entry on network reconstruction (see page 4). I think the increasing amounts of omics data should be better explored than they currently are, and network reconstruction methods are a very good way of achieving this. In this paper, Jason Ernst and colleagues used expression data to reconstruct dynamic transcription regulatory interactions. I will try to continue blogging about the topic in future posts.

Commentaries

PLoS ONE was launched about two months ago and it has so far produced an amazing stream of publications. However, the initially proposed goal of generating discussions online to promote post-publication reviews has been lagging. Stew (on page 6) and Alf (on page 7) wrote two commentaries (submitted by Greg) regarding the progress of PLoS ONE. They both discuss the current lack of infrastructure to stimulate online discussions at the PLoS ONE site. Stew goes even further by providing to anyone interested a nice Greasemonkey script to add blog comments to the PLoS ONE papers. I hope Chris Surridge and the rest of the PLoS ONE team soon start deploying some of the tools that they have talked about in their blog. They need to make ONE feel like home, a place where a community of people can discuss their papers.

From Deepak we have a post (on page 8) dedicated to what he dubs EcoInformatics: the importance of using computational methods to analyze ecological changes, from the molecules to the ecosystems. The problems range from data access to data management and analysis. The complexity and the different scales of organization (i.e. molecules, environment, ecosystems, disease) make this a very promising field for computational biologists.

From ecological changes we move on to human evolution. On page 10, Phil introduces the possibility of humanity using technology to improve itself. Having a strong interest myself in synthetic biology and man-machine interfaces, I would say that we are still far away from having such control. It is nevertheless useful to discuss the implications of emerging technologies to better prepare for the possible changes.

Reviews and tips

I start this section with a post from a brand new bioinformatics blog called Bioinformatics Zen. Michael Barton submitted his post on useful tips for getting organized as a dry lab scientist (see page 12). I agree with most of his suggestions. I have tried slightly fancier methods of organizing my work using project management tools, but I ended up returning to a more straightforward folder-based approach as well.

Finally, on page 14, Neil Saunders presents a nice tutorial on building AJAX pages for bioinformatics. It is a very well explained introduction with annotated code. If you were ever interested in learning the basics of AJAX but never invested time in it, here is a good chance to try it.


Gyroscopes Tell Moths How to Fly Straight


This blog post was originally posted online by GrrlScientist on the 9th of February 2007 at:
http://scienceblogs.com/grrlscientist/2007/02/gyroscopes_tell_moth_how_to_fl.php

Researchers have discovered that moth antennae have gyroscope-like sensors to help them control their flight through the air. Because moths fly at night, the source of their smooth and graceful flight was a mystery, because they could not rely on visual cues.

But a research team headed by Sanjay Sane, a biologist at the University of Washington, Seattle, found a structure at the base of the antennae that senses when the moth's body begins to pitch or roll; it relays this information to the brain, which causes the body to compensate.

"Whenever a creature is moving about, it has to have sensory information to tell it what it has done," said Sane. "If a person unintentionally turns around, the inner ear system or eyes will provide that information and allow for a course correction. Flying creatures need to know that information too, and when the light is low, and the visual cues are hard to see, they have to depend more on the mechanosensory system."

In a particularly elegant series of experiments, the researchers found that removing the antennae caused the moths to collide with walls, to fly backwards or to crash to the floor. However, when the antennae were glued back in place, the moths regained their maneuverability.

Closer examination revealed that a structure called Johnston's organ, found at the base of the moths' antennae, was crucial for flight stability. This organ relies on vibrations from the antennae, which remain in a fixed position during flight, to detect the spatial relationship of the moth's body to its antennae, behaving like a gyroscope.

Johnston's organ sends this information to the moth's brain, which then tells the moth to shift its body back to the correct spatial position.

Previous studies found that two-winged insects, such as house flies or mosquitoes, also use gyroscope-like sensors to control their flight. These are the "halteres" that are attached to their hindwings.

Cited story.


in sillico network reconstruction (using expression data)


This post was originally posted by Pedro Beltrao on the 7th of February 2007 at:
http://pbeltrao.blogspot.com/2007/02/in-sillico-network-reconstruction-using.html

In my last post I commented on a paper that tried to find the best mathematical model for a cellular pathway. In that paper they used information on known and predicted protein interactions. This time I want to mention a paper, published in Nature Mol. Systems Biology, that attempts to reconstruct gene regulatory networks from gene expression data and ChIP-chip data.

The authors were interested in determining how/when transcription factors regulate their target genes over time. One novelty introduced in this work was the focus on bifurcation events in gene expression. They tried to look for cases where a group of genes clearly bifurcated into two groups at a particular time point. Combining these patterns of bifurcation with experimental binding data for transcription factors, they tried to predict which transcription factors regulate these groups of genes. There is a simple example shown in figure 1, reproduced below.

In this toy example there is a bifurcation event at the 1h time point and another at the 2h time point. All of the genes are assigned to a gene expression path. In this case, the red genes are those that are very likely to show a down-regulation between the 1st and 2nd hour and to stay at the same level of expression from then on. Once the genes have been assigned, it is possible to search for transcription factors that are significantly associated with each gene expression path. For example, in this case TF A is strongly associated with the pink trajectory. This means that many of the genes in the pink group have a known binding site for TF A in their promoter region.
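As a much-simplified sketch of what a gene expression path is (my own toy illustration; the paper uses a probabilistic model, not this rule), genes can be grouped by the sign of their change between consecutive time points:

    import numpy as np

    # Toy expression matrix: rows are genes, columns are time points
    # (0h, 1h, 2h, 3h). All values are invented for illustration.
    expr = np.array([[0.1, 0.0, -1.2, -1.1],   # drops between 1h and 2h
                     [0.0, 1.1,  1.2,  1.0],   # rises between 0h and 1h
                     [0.2, 0.1, -1.0, -0.9]])  # drops between 1h and 2h

    steps = np.diff(expr, axis=1)                    # change at each interval
    steps = np.where(np.abs(steps) > 0.5, steps, 0)  # ignore small changes
    paths = [tuple(np.sign(s).astype(int)) for s in steps]

    # Genes 0 and 2 share the path (0, -1, 0): flat, down after 1h, flat --
    # like the "red" genes in the toy example above.
    print(paths)  # [(0, -1, 0), (1, 0, 0), (0, -1, 0)]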


To test their approach, the authors studied the amino-acid starvation response in S. cerevisiae. In figure 2 they summarize the reconstructed dynamic map. The result is the association of TFs with groups of genes and the changes in expression of these genes over time during amino acid starvation.

One interesting finding from this map was that Ino4 activates a group of genes related to lipid metabolism starting at the 2h time point. Since Ino4 binding sites had only been profiled by ChIP-chip in YPD media and not in amino acid starvation, this is a novel result obtained using their method.

To further test the significance of their observation, they performed ChIP-chip assays of Ino4 in amino acid starvation. They confirmed that Ino4 binds many more promoters during amino acid starvation as compared to synthetic complete glucose media. Out of 207 genes bound by Ino4 (specifically during amino acid starvation), 34 were also among the genes assigned to the Ino4 gene path obtained from their approach.

These results confirmed the usefulness of this computational approach for reconstructing gene regulatory networks from gene expression data and TF binding site information. The authors then go on to study the regulation of other conditions.

For anyone curious about the method, this was done using Hidden Markov Models (see here for a primer on HMMs).
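The "significantly associated" test above is, at its core, a gene-set overlap test. Here is a minimal sketch of that step (my own illustration, not the authors' code): the 207 and 34 are the Ino4 numbers from the text, while the total gene count and path size are invented for the example.

    from scipy.stats import hypergeom

    n_genome  = 6000  # genes considered in total (hypothetical)
    n_bound   = 207   # genes bound by Ino4 during amino acid starvation
    n_path    = 250   # genes assigned to the Ino4 path (hypothetical)
    n_overlap = 34    # genes in both sets

    # Probability of an overlap at least this large arising by chance.
    p = hypergeom.sf(n_overlap - 1, n_genome, n_bound, n_path)
    print(f"enrichment p-value: {p:.3g}")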


PLoS One / Postgenomic mashup


This post was originally posted by Stew on the 12th of February 2007 at:
http://www.ghastlyfop.com/blog/2007/02/plos-one-postgenomic-mashup.html

Chris Surridge has an interesting post over at the PLoS blog about the comments (or the lack thereof) on PLoS One papers. He mentions one paper in particular that has a long discussion thread associated with it on Gene Expression but no real comments on the actual PLoS One site.

As a temporary solution (?) to the problem of blog comments not being immediately accessible from the paper, summaries of notable manuscripts are going to be posted to the PLoS publishing blog with open comment threads. Based on the three posts already up, I think this is a terrible idea.

Partly this is personal preference - I hate blogs that just replicate tables of contents - but more importantly I think that it misses the point.

People like the GNXP folks have taken the time and trouble to build up a loyal community that fosters debate and to create an environment in which visitors enjoy interacting with the site and with each other. Sticking up an abstract or two on your own blog just isn't going to compete with that, no matter how much traffic you get.

Blog properly - engage your audience - or don't blog at all. It's a personal communication medium; that's one of the reasons why people feel more comfortable commenting in a blogging environment. A link and an abstract on a publisher's blog isn't personal, it's an advert. The PLoS One blogs are generally a good read at the moment; don't ruin them.

I'm not just PLoS bashing here: I like the ideas behind PLoS One, and we do the same 'if we blog the abstract then people will comment!' thing at Nature on some blogs (the ones I don't read any more). The intention is good, it's just misguided, IMHO.

Anyway, I think that a better solution would be to embrace the existing science blogosphere and to explore ways of working with it more closely. As a proof of concept, here's a Greasemonkey script that adds science blog trackbacks to PLoS One.

It doesn't look particularly nice, mainly because I didn't have time to style things very well. Feel free to do with it as you will, though (you could get it working with PLoS Two, for a start).


PLoS One

This post was originally posted by Alf on the 13th of February 2007 at:
http://hublog.hubmed.org/archives/001447.html

PLoS One's been launched in beta for a while now, but there had been technical problems that seem to have been fixed now. It's a great idea: a much lower barrier of entry to article acceptance, publishing articles of any length, with peer review provided by reader comments on the article after it's published, and publication charges paid for by the authors' funding agencies.

Technically, PLoS should be built on solid foundations, as it uses the NLM Journal Publishing DTD to store articles as XML and retrieves all objects (articles, figures, etc) using DOIs. The underlying TOPAZ software is supposed to be released as open source, though there hasn't been anything made available yet. Hopefully this project should cross-pollinate well with OJS, which recently had its own annotation system added thanks to Geof Glass's work on Marginalia. PLoS only producing RSS feeds and still not even getting them right doesn't exactly inspire confidence, though.

As far as published articles go, some people publish long, technical papers similar to those found in existing journals; others publish short, one-experiment papers (which will hopefully get even shorter, if methods and introductions can be referenced elsewhere).

There was one article that caught my attention because of the wording of the funding section: "The authors claim they did not receive any financial funding." - suggesting, perhaps, that the publishers weren't entirely sure about that claim. This article in particular has quite a few style errors (even one in the title), so while peer review may come afterwards (and hasn't in this case, yet, but maybe someone out there is repeating the experiment themselves), there's still a role for publishers in copy-editing articles for readability. It would be good if authors had enough control over their published papers that corrections could be made at a later date; with archiving of open access articles using LOCKSS, updated articles could feasibly be distributed to multiple archives.

It's a shame that Chris Surridge is already lamenting the lack of comments on papers, when the infrastructure isn't in place to properly handle discussions at the moment. It's not surprising that people are more willing to comment on papers within their own communities, where they can see that discussion threads are treated as important, permanent content and displayed appropriately.


Ecoinformatics - Information for our planet


This post was originally posted by Deepak on the 4th of February 2007 at:
http://mndoci.com/blog/2007/02/04/ecoinformatics-information-for-our-planet/

As some of you know, I contribute to the Seattle edition of Worldchanging. That has resulted in a lot of research into science for sustainable development. Since Just Science week starts tomorrow, I thought it would be good to get the ball rolling with something that combines sustainability with science. A couple of weeks ago I came upon a paper entitled The new bioinformatics: Integrating ecological data from the gene to the biosphere by a group of researchers from UC Santa Barbara and UC Davis. The abstract was interesting enough to go get the paper and find out what the authors really had to say about the subject.

Bioinformatics has been the engine that has made the entire genomics "revolution" possible. From the various databases hosted by the NCBI and the EBI, to tools like BLAST, and efforts like the HapMap project, bioinformatics is an essential part of modern biology. The huge amounts of data from genome sequencing efforts provide the fuel for follow-on bioinformatics efforts. Ecology is a field rich with information. Based on the paper, one could argue that it is a form of systems biology. Every ecological system is finely balanced, and being able to understand the factors involved and how perturbing them might impact the ecology is not a trivial problem. In the review, the authors talk about the demise of gorillas in Africa. There were a number of factors involved, including Ebola and hunting pressure. To understand the impact of all the factors would require understanding the epidemiology, genetics and transmission modes of Ebola, the nutritional status and various sociological factors of the local human population, and the population dynamics of the gorilla population. I am sure there are many other factors involved. The kind of data here reminds me of the kinds of challenges facing the field of biomedical informatics, which seeks to combine classic bioinformatics with healthcare and clinical information. In fact, the challenges are probably far greater, since the data are not as well understood, and the uncertainties are significantly greater. Reading the paper, one gets a much better understanding of the challenges that the field faces. What makes the entire subject so fascinating in the end is the fact that ecological information is only really useful if it can be used predictively.

Right off the bat, the heterogeneity and nature of ecological data would present the informatician with significant challenges, and that's just in data management. Ecological information is also temporal, often over long time lines, which adds further complexity to the available information. However, being able to mine diverse studies is critical to the success of ecological studies and hypothesis generation. Multiple studies over different temporal points not only lead to better results, but re-using the data in combination with different studies at a later date can help ecologists gain better insight. For the uninitiated, i.e. yours truly, this screams for some data standards at the minimum and the development of an ecological ontology in a perfect world.

Currently ecological data is spreadsheet-based, i.e. it is still document-centric. A number of ecologists also use packages such as R and SAS, since most ecological hypotheses are generated via statistical modeling. Most people will tell you that this is a recipe for data disaster. There is a need for quality databases and a data-centric approach. Whether for integrative analysis or synthetic analysis, having data in well-designed databases will only help ecologists in the long run. The authors spend some time talking about metadata. In a field like ecology, metadata is critical, especially in cases when studies are re-used at a later point in time in conjunction with newer studies. The authors seem to talk about metadata-driven data collections as being separate from vertical databases (data warehouses). That seems to be too simplistic a view. Combining the two paradigms is probably a more powerful approach, one that has been discussed here in the past. There is a role for core databases in the mode of GenBank, which can be combined with data on the edges. The data on the edges might lack the structure of comprehensive databases, but by building semantic intelligence and developing appropriate standards/ontologies, one can combine the knowledge in metadata-driven datasets with the core knowledge housed in structured data warehouses. Ecological projects are very diverse, crossing species, societies, data types, data volume and data quality. All of this makes the nature of metadata rather complex and will require ecologists to spend a considerable amount of time developing data standards and, better still, ontologies, to come up with ways to enable the interoperability of the datasets so that high quality data analysis becomes possible.

The good news for the field is that there are a number of existing resources, e.g. the Knowledge Network for Biocomplexity, which seems to be a fairly modern resource for ecological data. There are attempts to provide a unified interface to many ecological data sources, but one could argue that their time is better spent enabling the creation of search engines and interfaces to any resources of ecological information, since information will be generated by a variety of sources. The dynamic nature of ecological data will be a significant challenge for data integration, especially since it involves a lot of continuous modeling and re-modeling. There needs to be a way to store and version different studies, to make sure people are not making incorrect decisions.

For someone with only a peripheral knowledge of ecology, but a good understanding of bioinformatics, the review by Jones et al is a very useful and interesting read. Ecologists are trying to understand several critical problems facing our society and planet. How they access data, interpret it, and publish their results should be a problem with more eyeballs on it. Given the very public interest in sustainable development and the environment these days, hopefully there will be more informatics-savvy people working in the field to develop high quality databases, data standards and ontologies.
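To make the metadata discussion a little more concrete, here is a toy sketch of a metadata record with a re-use check. It is a hand-rolled illustration of the idea, not a real standard; in practice ecologists would use something like the Ecological Metadata Language rather than inventing fields.

    from dataclasses import dataclass, field

    @dataclass
    class DatasetMetadata:
        # Field names are invented for illustration.
        title: str
        species: list
        location: str
        start_year: int
        end_year: int
        units: dict = field(default_factory=dict)  # column name -> unit

        def problems(self):
            """Flag gaps that would hinder re-use of the data years later."""
            issues = []
            if self.end_year < self.start_year:
                issues.append("end_year precedes start_year")
            if not self.units:
                issues.append("no units recorded for any data column")
            return issues

    meta = DatasetMetadata("Gorilla census", ["Gorilla gorilla"],
                           "Congo Basin", 1995, 2005,
                           units={"population": "individuals"})
    print(meta.problems())  # -> []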
Reference: Jones, M.B., Schildhauer, M.P., Reichman, O.J., and Bowers, S. The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere. Annual Review of Ecology, Evolution, and Systematics, 37: 519-544 (2006).


The Future of Evolution


This post was originally posted by Phil B. at:
http://bellsouthpwp2.net/b/e/benichou//The_Future_of_Evolution.html

Compared with the progress of modern science, evolution is too slow and error-prone. We are entering a period of time when scientists will be able to greatly surpass natural evolution.

First, genetic engineering will allow doctors to modify existing genes or entirely replace genes to remove genetic defects, treat diseases, and augment natural abilities. For instance, plants and animals could be modified to yield better, healthier, and more food. What if your eyesight could be fixed without surgery? What if all your future children were guaranteed not to get your original bad genes of poor vision or a genetic disease? What if people chose to have their vision enhanced to be far greater than the natural human range, such as a bird's vision? How about people with the reflexes and speed of a cat? Now imagine if a large number of our children had perfect memory thanks to genetic modification. These children would find school very easy and would most likely be far more successful than their parents.

Second, cybernetics could artificially replace human body parts. People could replace limbs with fully functional machines that would seamlessly be directly connected to the human brain. The human brain could be enhanced too. Imagine if people could immediately look up information just by thinking about it. For example, people would instantly know what to do in medical emergencies or how to solve complex problems that they had never encountered before, by relying on the knowledge and experience of other people. Eventually, people could become far more intelligent, knowledgeable, and capable than anybody who has ever lived before. Eventually this could lead to completely artificial and intelligent life forms that are not limited by evolution at all, since they would not have any organic parts. Instead, they could just upgrade themselves.

We already discussed forced evolution of plants and animals for the sake of humanity, such as better sources of food. Now consider modifying non-intelligent and non-sentient (sentient means self-aware or self-conscious) animals into an intelligent and sentient species through the methods described above. This very possible idea is called "uplifting". Humanity could artificially uplift lowly species to the ranks of civilized and productive people, for better or worse. Imagine the philosophy, music, and literature that intelligent animals, such as cats and dolphins, would invent. I am sure they would be creative in ways that humans are not… yet.

These methods of leapfrogging evolution will ultimately create hyper-intelligent super beings that are far more capable than human beings. Hopefully these future people will care more for humanity than we do for less fortunate people. Maybe these super beings will be able to solve some of the world's greatest problems, such as global warming, pollution-free energy, diseases, etc.

The only problem that I see with these methods of artificial evolution is that the variety or diversity of people would lessen. The more people become alike, the less likely we would have another Mozart or Einstein if we start restricting variation in our children before they are even born. As a result, our children will be geniuses but not unique.

Whatever methods humanity will use to go beyond natural evolution, the purpose should always be to better humanity, our children, and ourselves.


Organising yourself as a dry lab scientist


This post was originally posted by Mike on the 16th of February 2007 at:
http://www.bioinformaticszen.com/2007/02/organising-yourself-as-a-dry-lab-scientist/

Browsing wikiomics, I found this small section on keeping organised as a practising bioinformatician. In particular, these lines contain gems of information:

- Use text files/plain e-mail whenever possible
- Give meaningful names to your files
- Create separate folders/directories for each project, with meaningful names

I find keeping my work organised one of the most frustrating but necessary tasks of being a bioinformatician. This subject also seems to receive little attention in the bioinformatics community.

Wet scientists are expected to keep laboratory books, and not doing so is considered very bad practice. I am jealous when I see these books filled with pictures of gels and printed tables of results. I've tried using a lab book, but I didn't find it applicable to the many different types of scripts and results I was producing.

Here are some tips I find useful for organising myself.

I couldn't agree more with the above tips. Give directories and files names that are as verbose as possible. This helps when trying to find a specific file, because often sets of files are all related to a similar subject. Take the following example:

ancova_sequence_hydrophobicity.R
ancova_sequence_hydrophobicity_interaction_term.R
ancova_sequence_hydrophobicity_residuals.R

All three files contain a script fitting an ANCOVA model, but each differs slightly in focusing on different parts of the model. Finding the one you need is still simple for you now, but perhaps not so in a few months' time when you return to the results to write a paper.

Consider this example:

ancova_sequence_hydrophobicity.R
ancova_sequence_hydrophobicity.csv
ancova_sequence_hydrophobicity.tiff
ancova_sequence_hydrophobicity_interaction_term.R
ancova_sequence_hydrophobicity_interaction_term.csv
ancova_sequence_hydrophobicity_interaction_term.tiff
ancova_sequence_hydrophobicity_residuals.R
ancova_sequence_hydrophobicity_residuals.csv
ancova_sequence_hydrophobicity_residuals.tiff

Here there are files for the results of each model (csv) and a plot of the results (tiff). This illustrates how quickly things can expand, making it more difficult to understand what each file refers to.

Here's one way that this could be organised:

1.ancova_sequence_hydrophobicity
    scripts
        model.R
        model_interaction_term.R
        model_residuals.R
    results
        model.csv
        model_interaction_term.csv
        model_residuals.csv
    pictures
        model.tiff
        model_interaction_term.tiff
        model_residuals.tiff

Each subdirectory name describes its contents, which keeps things verbose. Furthermore, the directory path contributes to describing each file, e.g. 1.ancova_sequence_hydrophobicity/results/model_residuals.csv. This is helpful if you are referencing the file elsewhere and want to know what the file contains.

Since the files are related, they each have an identically named counterpart in the other directories. This is useful for determining which script produced which result.

Finally, the top-level directory has a number. Often projects and experiments are carried out linearly, one being done after another. Keeping the directories numbered can help to trace the thought process at a later date.

There's an interesting post at LifeHacker about organising file structure. The comments also have a lot of useful ideas.

There are an infinite number of ways to organise. Probably the best way is to use the system that suits you best. Experiment, you're a scientist.
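Since the layout above is so regular, it is easy to script. Here is a small sketch of a helper (mine, not from the original post) that creates the numbered skeleton; the function name and defaults are invented.

    from pathlib import Path

    def create_project(number, name, root="."):
        """Create a numbered project directory with the three
        subdirectories used in the example layout above."""
        project = Path(root) / f"{number}.{name}"
        for sub in ("scripts", "results", "pictures"):
            (project / sub).mkdir(parents=True, exist_ok=True)
        return project

    # e.g. create_project(1, "ancova_sequence_hydrophobicity")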


My first “AJAX for bioinformatics” page


This post was originally posted by Neil Saunders on the 20th of February 2007 at:
http://nsaunders.wordpress.com/2007/02/20/my-first-ajax-for-bioinformatics-page/

So you've heard about this wondrous thing called AJAX. You're dimly aware that it can generate interactive, user-friendly and dynamic websites without the usual cycle of reloading the page after submitting a request to the server. You know that Google Maps and Flickr are two excellent examples. You're keen to explore the possibilities for your bioinformatics web applications. What you need is a "minimal example". Where do you start?

That's the situation that I was in last weekend, and here's what I did.

I'll start by making it clear that much of what follows is lifted from the W3Schools AJAX tutorial, with minimal adaptation to make it relevant for bioinformaticians. Please go there and read their excellent work.

When I figured out how AJAX works, my response was: "Oh. Is that all it is?" AJAX, you see, is nothing new. In fact, if you're familiar with web programming and know a little about HTML, server-side scripting, javascript and XML - well, that's all it is. It's just combined in a clever way to produce a pleasing result.

Here's what we're going to do. We're going to construct a simple form with a drop-down list of options. The options will be UIDs from the NCBI Gene database. When we select an option, our form will display the protein name associated with the UID - without the need to reload the page. It's the AJAX equivalent of "Hello World" for bioinformatics, but it should give you an idea.

1. Getting set up

First, I went to the place where I do my web testing (e.g. /var/www/testing/) and created 3 directories: php for the PHP, js for the javascript and xml for - you guessed right. In fact no XML files were saved in this example, but I like to be organised. The HTML files just go in the /testing root, right above these 3 directories.

2. The HTML form

There's nothing special about the form. I named the file form.html and it goes like this:

1.  <html>
2.  <head>
3.  <script src="js/ncbi.js"></script>
4.  </head>
5.  <body>
6.  <h3>Get protein name from NCBI Gene DB ID</h3>
7.  <form>
8.  <b>Select a Gene ID:</b>
9.  <select name="geneID" onchange="showName(this.value)">
10. <option value="none" selected="selected">-----</option>
11. <option value="54123">54123</option>
12. <option value="21354">21354</option>
13. <option value="11988">11988</option>
14. </select>
15. </form>
16. <p>
17. <div id="geneName"><b>Gene info will be listed here.</b></div>
18. </p>
19. </body>
20. </html>

Nothing complicated about that. Our select list has 3 values which correspond to NCBI Gene UIDs. When we choose one (onchange, line 9), we fire the javascript code in js/ncbi.js. At the bottom of the form is a DIV element with the name geneName. Initially it displays "Gene info will be listed here."; later on we'll see the javascript alter it to something different.

OK, how about that javascript?

3. The javascript

Once again, nothing to be scared of. The file ncbi.js reads like this:

1.  var xmlHttp

2.  function showName(str)
3.  {
4.    xmlHttp=GetXmlHttpObject()
5.    if (xmlHttp==null)
6.    {
7.      alert("Browser does not support HTTP Request")
8.      return
9.    }
10.   var url="php/ncbi.php"
11.   url=url+"?q="+str
12.   url=url+"&sid="+Math.random()
13.   xmlHttp.onreadystatechange=stateChanged
14.   xmlHttp.open("GET",url,true)
15.   xmlHttp.send(null)
16. }

17. function stateChanged()
18. {
19.   document.getElementById("geneName").innerHTML = "Fetching XML file..."
20.   if (xmlHttp.readyState==4 || xmlHttp.readyState=="complete")
21.   {
22.     var response = xmlHttp.responseText
23.     if (!response) {
24.       document.getElementById("geneName").innerHTML="No data returned!"
25.     }
26.     else {
27.       document.getElementById("geneName").innerHTML=response
28.     }
29.   }
30. }

31. function GetXmlHttpObject()
32. {
33.   var xmlHttp=null;
34.   try
35.   {
36.     // Firefox, Opera 8.0+, Safari
37.     xmlHttp=new XMLHttpRequest();
38.   }
39.   catch (e)
40.   {
41.     // Internet Explorer
42.     try
43.     {
44.       xmlHttp=new ActiveXObject("Msxml2.XMLHTTP");
45.     }
46.     catch (e)
47.     {
48.       xmlHttp=new ActiveXObject("Microsoft.XMLHTTP");
49.     }
50.   }
51.   return xmlHttp;
52. }

I'm not a strong javascript programmer - truth be told, I don't like the language much, but even I can follow this one. We've got 3 functions: showName() on lines 2-16, stateChanged() on lines 17-30 and GetXmlHttpObject() on lines 31-52. showName() first calls GetXmlHttpObject(), assigning the returned value to the variable xmlHttp. All you need to know about lines 31-52 is that they test whether your browser supports AJAX and, if so, return a special object, the XMLHttpRequest object. This object is what "does" AJAX. As you can see from the code, it has a number of methods that send, listen to and act on HTTP requests.

In fact, the main reason why many of us are only now hearing about AJAX is - you guessed it - browser standards. See if you can guess which browser is being difficult from the code.

Assuming that all is well, we move to lines 10-16. Here, the javascript is calling a server-side PHP script named php/ncbi.php. It appends a couple of things to the URL query string. The first, "q", is the value that we get from the select list in our form. The second is a random number (which W3Schools assures us is to prevent server caching). The PHP script is going to get the value of "q", use it to make a request to the NCBI and return some data. The javascript is going to grab that data and display it. This happens in lines 13-15.

We know when our data comes back thanks to the function stateChanged(). When the request is sent, the text of the "geneName" element (formerly, you recall, "Gene info will be listed here.") is altered to "Fetching XML file…", line 19. When the request is complete (line 20), we check the variable named response to see what came back. If nothing, we display "No data returned!", line 24. Otherwise, we set "geneName" to the value of response.

For me, the javascript is the trickiest part of the whole thing. If you're like me, read the code through a few times and you'll soon have it. OK - the last part is the server-side PHP script, ncbi.php.

4. The PHP

The PHP isn't much more complex than the HTML:

<?php
1.  $val = $_GET['q'];
2.  $baseURL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=";
3.  $urlSUFF = "&retmode=xml";
4.  $url = $baseURL.$val.$urlSUFF;
5.  $xmlDoc = new DOMDocument();
6.  $xmlDoc->load($url);
7.  $titles = $xmlDoc->getElementsByTagName("Prot-ref_name");
8.  foreach($titles as $node) {
9.    echo "<b>" . $node->nodeName . ": </b>" .
10.     $node->textContent . "<br />";
11. }
?>

First, you should be aware that this code is PHP 5. PHP 5 has some new functions to make handling XML files quite easy, even for people like me who don't much care for XML. There's a good introduction to XML in PHP 4 and 5 at this Zend page.

Off we go. In lines 1-4 we grab the value "q" which, you recall, is sent from ncbi.js and corresponds to a gene UID from form.html. We then construct a URL to the EFetch component of NCBI EUtils, to return our record as XML.

We can read the XML stream straight into the variable $xmlDoc and parse the XML for the element "Prot-ref_name" (lines 5-7). This contains the official protein name for the gene UID. We then loop through the stored XML object, retrieving the node name ("Prot-ref_name") and its value ($node->textContent). Purists will frown at the use of textContent, by the way. These values are what the script returns to ncbi.js and are displayed as the value of the "geneName" element in form.html.


To recap then:

- We select a gene UID from a drop-down list in a normal HTML form
- Javascript and PHP interact to perform an EUtils query at the NCBI and return an XML file
- The XML is parsed and appropriate values retrieved
- Using asynchronous requests to the server (that's the first 'A' in AJAX), javascript updates the page with progress and displays the result
- All without reloading the page

That's it. That's AJAX. It's a particularly stupid example - fetching a huge XML file to parse out one element, but hopefully you get the idea. You can imagine all sorts of uses for this in bioinformatics applications: fetching the most recent data rather than relying on local storage, XML-SQL interconversions, real-time BLAST results and so on. As ever, the only limits are your creativity and requirements.

You can see it in action for a short time at this location. Feel free to grab the files and/or copy-paste from here to try it on your own server. I added a little extra to the javascript at that location to display a "spinning disk" progress indicator - see if you can figure out where the addition goes. Finally, this is all new and exciting to me, so if you spot any shocking errors, do let me know.
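An editorial footnote: if you want to try the EFetch step without setting up a web server, here is a rough Python equivalent of ncbi.php (a sketch, not part of Neil's post); it assumes the same db=gene, retmode=xml query and the same Prot-ref_name element.

    from urllib.request import urlopen
    from xml.dom.minidom import parseString

    def node_text(node):
        # Concatenate all text beneath a DOM node (like PHP's textContent).
        return "".join(child.data if child.nodeType == child.TEXT_NODE
                       else node_text(child)
                       for child in node.childNodes)

    def protein_names(gene_uid):
        # The same EFetch request that ncbi.php constructs.
        url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
               "?db=gene&id=" + gene_uid + "&retmode=xml")
        doc = parseString(urlopen(url).read())
        return [node_text(n)
                for n in doc.getElementsByTagName("Prot-ref_name")]

    print(protein_names("54123"))  # one of the UIDs from form.html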
