Bio::Blogs #8

The eighth edition of Bio::Blogs was originally posted online on the 2nd of February at:

Editorial musings

Welcome to the eighth edition of the bioinformatics blog journal Bio::Blogs. The archive from previous months can be found at When this carnival was started, more than eight months ago, its primary objective was to serve as a showcase for some of the best bioinformatics blog posts on the web and to create an incentive for other people to promote their blogs and join the conversation. Looking back at past Bio::Blogs editions, I would like to think that we have managed to come up with many interesting posts about bioinformatics conferences, tools and useful computational tips, like the DNA analysis series by Sandra Porter (I, II, III, IV, V, VI). Bio::Blogs has also been used to promote tools published in blogs, like the Genetic Programming Applet from Pierre, and research, like the correlation between protein-interaction likelihood and protein age (I and II) that I worked on. Not everything went as I would have hoped. Submissions to Bio::Blogs have not picked up as I would expect. Some of this can be explained by poor promotion on my part, but it is also due to the small size of the bioinformatics blogging community. In any case, I think it is worth keeping Bio::Blogs up and running for some more time before thinking about stopping this experiment. For this edition, a PDF version of all the posts has been created for anyone interested in downloading, printing and reading some fine posts over coffee or tea. Leave comments or send an email (bioblogs at with your thoughts/ideas for the continuation of this blog journal. I think this printed version also gives a more concrete impression of the potential of blogging for scientific communication.

News and Views

GrrlScientist submitted a report on a recent Science paper describing how moths use their antennae to help stabilize their flight (see page 3). Apart from some nasty experiments where the authors removed and glued back parts of the antennae, the work features some interesting neurophysiology used to characterize the sensitivity of the antennae.

From my blog I picked an entry on network reconstruction (see page 4). I think the increasing amounts of omics data should be better explored than they currently are, and network reconstruction methods are a very good way of achieving this. In this paper, Jason Ernst and colleagues used expression data to reconstruct dynamic transcription regulatory interactions. I will try to continue blogging about the topic in future posts.

Commentaries

PLoS ONE was launched about two months ago and has so far produced an amazing stream of publications. However, the initially proposed goal of generating discussions online to promote post-publication review has been lagging. Stew (on page 6) and Alf (on page 7) wrote two commentaries (submitted by Greg) regarding the progress of PLoS ONE. They both discuss the current lack of infrastructure to stimulate online discussion at the PLoS ONE site. Stew goes even further by providing, to anyone interested, a nice Greasemonkey script that adds blog comments to PLoS ONE papers. I hope Chris Surridge and the rest of the PLoS ONE team soon start deploying some of the tools they have talked about in their blog. They need to make ONE feel like home, a place where a community of people can discuss their papers.

From Deepak we have a post (on page 8) dedicated to what he dubs EcoInformatics: the use of computational methods to analyze ecological changes, from molecules to ecosystems. The problems range from data access to data management and analysis. The complexity and the different scales of organization (i.e. molecules, environment, ecosystems, disease) make this a very promising field for computational biologists. From ecological changes we move on to human evolution. On page 10, Phil introduces the possibility of humanity using technology to improve itself. Having a strong interest myself in synthetic biology and man-machine interfaces, I would say that we are still far from having such control. It is nevertheless useful to discuss the implications of emerging technologies to better prepare for the possible changes.

Reviews and tips

I start this section with a post from a brand new bioinformatics blog called Bioinformatics Zen. Michael Barton submitted his post on useful tips for getting organized as a dry-lab scientist (see page 12). I agree with most of his suggestions. I have tried slightly fancier methods of organizing my work using project-management tools, but I ended up returning to a more straightforward folder-based approach as well. Finally, on page 14, Neil Saunders presents a nice tutorial on building AJAX pages for bioinformatics. It is a very well explained introduction with annotated code. If you were ever interested in learning the basics of AJAX but never invested time in it, here is a good chance to try it.



Gyroscopes Tell Moths How to Fly Straight
This blog post was originally posted online by GrrlScientist on the 9th of February 2007 at:

Researchers have discovered that moth antennae have gyroscope-like sensors that help them control their flight through the air. Because moths fly at night and cannot rely on visual cues, the source of their smooth and graceful flight was a mystery. But a research team headed by Sanjay Sane, a biologist at the University of Washington, Seattle, found a structure at the base of the antennae that senses when the moth's body begins to pitch or roll and relays this information to the brain, which causes the body to compensate. "Whenever a creature is moving about, it has to have sensory information to tell it what it has done," said Sane. "If a person unintentionally turns around, the inner ear system or eyes will provide that information and allow for a course correction. Flying creatures need to know that information too, and when the light is low, and the visual cues are hard to see, they have to depend more on the mechanosensory system." In a particularly elegant series of experiments, the researchers found that removing the

antennae caused the moths to collide with walls, to fly backwards or crash to the floor. However, when the antennae were glued back in place, the moths regained their maneuverability. Closer examination revealed that a structure, called Johnston's organ, found at the base of the moths' antennae, was crucial for flight stability. This organ relies on vibrations from the antennae, which remain in a fixed position during flight, to detect the spatial relationship of the moth's body to its antennae, behaving like a gyroscope. Johnston's organ sends this information to the moth's brain, which then tells the moth to shift its body back to the correct spatial position. Previous studies found that two-winged insects, such as house flies or mosquitoes, also use gyroscope-like sensors to control their flight. These are the "halteres" that are attached to their hindwings.

Cited story.



In silico network reconstruction (using expression data)
This post was originally posted by Pedro Beltrao on the 7th of February 2007 at:

In my last post I commented on a paper that tried to find the best mathematical model for a cellular pathway. In that paper they used information on known and predicted protein interactions. This time I want to mention a paper, published in Molecular Systems Biology, that attempts to reconstruct gene regulatory networks from gene expression data and ChIP-chip data. The authors were interested in determining how and when transcription factors regulate their target genes over time. One novelty introduced in this work was the focus on bifurcation events in gene expression: they looked for cases where a group of genes clearly bifurcated into two groups at a particular time point. Combining these patterns of bifurcation with experimental binding data for transcription factors, they tried to predict which transcription

factors regulate these groups of genes. There is a simple example shown in figure 1, reproduced below. In this toy example there is a bifurcation event at the 1 h time point and another at the 2 h time point. All of the genes are assigned to a gene expression path. In this case, the red genes are those that are very likely to show a down-regulation between the first and second hour and to stay at the same level of expression from then on. Once the genes have been assigned, it is possible to search for transcription factors that are significantly associated with each gene expression path. For example, in this case, TF A is strongly associated with the pink trajectory. This means that many of the genes in the pink group have a known binding site for TF A in their promoter region.
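The association step can be sketched as a simple enrichment test. The sketch below is not the paper's actual procedure (their model is built on Hidden Markov Models), and the gene counts are invented for illustration: given a set of genes assigned to one expression path, we ask whether they are enriched for known targets of a TF using a hypergeometric tail probability.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of drawing at least k TF-bound genes
    when sampling n path genes from a universe of N genes,
    of which K are bound by the TF (hypergeometric upper tail)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Invented example: 20 genes follow the "pink" path out of 100 profiled;
# the TF binds 25 of the 100 genes, 12 of which lie on the pink path.
p = hypergeom_pvalue(k=12, n=20, K=25, N=100)
print(p)  # a small p-value suggests the TF is associated with this path
```

A small p-value across many genes on a path is what justifies statements like "TF A is strongly associated with the pink trajectory".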


To test their approach, the authors studied amino acid starvation in S. cerevisiae. In figure 2 they summarize the reconstructed dynamic map. The result is the association of TFs with groups of genes and the changes in expression of these genes over time during amino acid starvation. One interesting finding from this map was that Ino4 activates a group of genes related to lipid metabolism starting at the 2 h time point. Since Ino4 binding sites had only been profiled by ChIP-chip in YPD media and not in amino acid starvation, this is a novel result obtained using their method. To further test the significance of this observation, they performed ChIP-chip assays of Ino4 in amino acid starvation. They confirmed that Ino4 binds many more promoters during amino acid starvation than in synthetic complete glucose media. Out of 207 genes bound by Ino4 (specifically during amino acid starvation), 34 were also among the genes assigned to the Ino4 gene path obtained from their approach. These results confirm the usefulness of this computational approach for reconstructing gene regulatory networks from gene expression data and TF binding site information. The authors then go on to study the regulation of other conditions. For anyone curious about the method, this was done using Hidden Markov Models (see here for an available primer on HMMs).
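As a toy illustration of the HMM machinery behind such methods (this is not the paper's actual model; the states, observations and probabilities below are invented), the Viterbi algorithm recovers the most likely sequence of hidden expression states from a series of noisy up/down calls:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observation sequence."""
    # V[t][s] = (best probability of any path ending in state s at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-2][prev][1] + [s])
                for prev in states
            )
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

# Invented two-state model: a gene's expression is "high" or "low",
# and we observe noisy "up"/"down" calls at each time point.
states = ("high", "low")
start = {"high": 0.5, "low": 0.5}
trans = {"high": {"high": 0.8, "low": 0.2}, "low": {"high": 0.2, "low": 0.8}}
emit = {"high": {"up": 0.9, "down": 0.1}, "low": {"up": 0.1, "down": 0.9}}
print(viterbi(("up", "up", "down", "down"), states, start, trans, emit))
# → ['high', 'high', 'low', 'low']
```

In the paper's setting the hidden states correspond to positions in the bifurcating tree of expression paths rather than a simple high/low pair, but the decoding idea is the same.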



PLoS One / Postgenomic mashup
This post was originally posted by Stew on the 12th of February 2007 at:

Chris Surridge has an interesting post over at the PLoS blog about the comments (or the lack thereof) on PLoS One papers. He mentions one paper in particular that has a long discussion thread associated with it on Gene Expression but no real comments on the actual PLoS One site. As a temporary solution (?) to the problem of blog comments not being immediately accessible from the paper, summaries of notable manuscripts are going to be posted to the PLoS publishing blog with open comment threads. Based on the three posts already up I think this is a terrible idea.

The PLoS One blogs are generally a good read at the moment; don't ruin them. I'm not just PLoS-bashing here: I like the ideas behind PLoS One, and we do the same 'if we blog the abstract then people will comment!' thing at Nature on some blogs (the ones I don't read any more). The intention is good, it's just misguided, IMHO. Anyway, I think that a better solution would be to embrace the existing science blogosphere and to explore ways of working with it more closely. As a proof of concept, here's a Greasemonkey script that adds science blog trackbacks to PLoS One. It doesn't look particularly nice, mainly because I didn't have time to style things very well. Feel free to do with it as you will, though (you could get it working with PLoS Two, for a start).

Partly this is personal preference - I hate blogs that just replicate tables of contents - but more importantly I think that it misses the point. People like the GNXP folks have taken the time and trouble to build up a loyal community that fosters debate, and to create an environment in which visitors enjoy interacting with the site and with each other. Sticking up an abstract or two on your own blog just isn't going to compete with that, no matter how much traffic you get. Blog properly - engage your audience - or don't blog at all. Blogging is a personal communication medium; that's one of the reasons why people feel more comfortable commenting in a blogging environment. A link and an abstract on a publisher's blog isn't personal, it's an advert.



PLoS One
This post was originally posted by Alf on the 13th of February 2007 at:

PLoS One has been launched in beta for a while now; there had been technical problems, but they seem to have been fixed. It's a great idea: a much lower barrier of entry to article acceptance, articles of any length, peer review provided by readers' comments on the article after it's published, and publication charges paid by the authors' funding agencies. Technically, PLoS should be built on solid foundations, as it uses the NLM Journal Publishing DTD to store articles as XML and retrieves all objects (articles, figures, etc.) using DOIs. The underlying TOPAZ software is supposed to be released as open source, though nothing has been made available yet. Hopefully this project will cross-pollinate well with OJS, which recently had its own annotation system added thanks to Geof Glass's work on Marginalia. PLoS only producing RSS feeds, and still not getting them right, doesn't exactly inspire confidence though. As far as published articles go, some people publish long, technical papers similar to those found in existing journals; others publish short, one-experiment papers (which will hopefully get even shorter, if methods and introductions can be referenced elsewhere).

There was one article that caught my attention because of the wording of the funding section: "The authors claim they did not receive any financial funding." - suggesting, perhaps, that the publishers weren't entirely sure about that claim. This article in particular has quite a few style errors (even one in the title), so while peer review may come afterwards (and hasn't in this case, yet, though maybe someone out there is repeating the experiment themselves), there's still a role for publishers in copy-editing articles for readability. It would be good if authors had enough control over their published papers that corrections could be made at a later date; with archiving of open access articles using LOCKSS, updated articles could feasibly be distributed to multiple archives. It's a shame that Chris Surridge is already lamenting the lack of comments on papers when the infrastructure isn't yet in place to properly handle discussions. It's not surprising that people are more willing to comment on papers within their own communities, where they can see that discussion threads are treated as important, permanent content and displayed appropriately.



Ecoinformatics - Information for our planet
This post was originally posted by Deepak on the 4th of February 2007 at:

As some of you know, I contribute to the Seattle edition of Worldchanging. That has resulted in a lot of research into science for sustainable development. Since Just Science week starts tomorrow, I thought it would be good to get the ball rolling with something that combines sustainability with science. A couple of weeks ago I came upon a paper entitled The new bioinformatics: Integrating ecological data from the gene to the biosphere, by a group of researchers from UC Santa Barbara and UC Davis. The abstract was interesting enough to go get the paper and find out what the authors really had to say about the subject. Bioinformatics has been the engine that made the entire genomics "revolution" possible. From the various databases hosted by the NCBI and the EBI, to tools like BLAST and efforts like the HapMap project, bioinformatics is an essential part of modern biology. The huge amounts of data from genome sequencing efforts provide the fuel for follow-on bioinformatics efforts. Ecology is a field rich with information. Based on the paper, one could argue that it is a form of systems biology. Every ecological system is finely balanced, and understanding the factors involved, and how perturbing them might impact the ecology, is not a trivial problem. In the review, the authors talk about the demise of gorillas in Africa. A number of factors were involved, including Ebola and hunting pressure. To understand the impact of all the factors would require understanding the epidemiology, genetics and transmission modes of Ebola; the nutritional status and various sociological factors of the local human

population, and the population dynamics of the gorilla population. I am sure there are many other factors involved. The kind of data here remind me of the kinds of challenges facing the field of biomedical informatics, which seeks to combine classic bioinformatics with healthcare and clinical information. In fact, the challenges are probably far greater, since the data are not as well understood, and the uncertainties are significantly more. Reading the paper, one gets a much better understanding of the challenges that the field faces. What makes the entire subject so fascinating in the end is the fact that ecological information is only really useful if it can be used predictively. Right of the bat the heterogeneity and nature of ecological data would present the informatician with significant challenges, and that’s just in data management. Ecological information is also temporal, often over long time lines, which adds further complexity to the available information. However being able to mine diverse studies is critical to the success of ecological studies and hypothesis generation. Multiple studies over different temporal points not only lead to better results, but re-using the 8

data in combination with different studies at a later date can also help ecologists gain better insight. For the uninitiated, i.e. yours truly, this screams for data standards at a minimum and, in a perfect world, the development of an ecological ontology. Currently ecological data is spreadsheet-based, i.e. it is still document-centric. A number of ecologists also use packages such as R and SAS, since most ecological hypotheses are generated via statistical modeling. Most people will tell you that this is a recipe for data disaster. There is a need for quality databases and a data-centric approach. Whether the analysis is integrative or synthetic, having data in well-designed databases will only help ecologists in the long run. The authors spend some time talking about metadata. In a field like ecology, metadata is critical, especially when studies are re-used at a later point in time in conjunction with newer studies. The authors seem to treat metadata-driven data collections as separate from vertical databases (data warehouses). That seems too simplistic a view. Combining the two paradigms is probably a more powerful approach, one that has been discussed here in the past. There is a role for core databases in the mold of GenBank, which can be combined with data on the edges. The data on the edges might lack the structure of a comprehensive database, but by building semantic intelligence and developing appropriate standards/ontologies, one can combine the knowledge in metadata-driven datasets with the core knowledge housed in structured data warehouses. Ecological projects are very diverse, crossing species, societies, data types, data volume and data quality.
All of this makes the nature of the metadata rather complex, and it will require ecologists to spend considerable time developing data standards and, better still, ontologies, to enable the interoperability of datasets so that high-quality data analysis becomes possible. The good news for the field is that there are a number of existing resources, e.g. the Knowledge Network for Biocomplexity, which seems to be a fairly modern resource for ecological data. There are attempts to provide a unified interface to many ecological data sources, but one could argue that the time is better spent enabling the creation of search engines and interfaces to any resource of ecological information, since information will be generated by a variety of sources. The dynamic nature of ecological data will be a significant challenge for data integration, especially since it involves a lot of continuous modeling and re-modeling. There needs to be a way to store and version different studies, to make sure people are not making incorrect decisions. For someone with only a peripheral knowledge of ecology but a good understanding of bioinformatics, the review by Jones et al. is a very useful and interesting read. Ecologists are trying to understand several critical problems facing our society and planet. How they access data, interpret it, and publish their results should be a problem with more eyeballs on it. Given the very public interest in sustainable development and the environment these days, hopefully there will be more informatics-savvy people working in the field to develop high-quality databases, data standards and ontologies.

Reference: Jones, M.B., Schildhauer, M.P., Reichman, O.J., and Bowers, S. The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere. Annual Review of Ecology, Evolution, and Systematics, 37: 519-544 (2006).



The Future of Evolution
This post was originally posted by Phil B. at:

Compared with the progress of modern science, evolution is too slow and error-prone. We are entering a period when scientists will be able to greatly surpass natural evolution. First, genetic engineering will allow doctors to modify existing genes, or replace them entirely, to remove genetic defects, treat diseases, and augment natural abilities. For instance, plants and animals could be modified to yield better, healthier, and more plentiful food. What if your eyesight could be fixed without surgery? What if all your future children were guaranteed not to inherit your genes for poor vision or for a genetic disease? What if people chose to have their vision enhanced far beyond the natural human range, to match a bird's? How about people with the reflexes and speed of a cat? Now imagine if a large number of our children had perfect memory thanks to genetic modification. These children would find school very easy and would most likely be far more successful than their parents. Second, cybernetics could artificially replace human body parts. People could replace limbs with fully functional machines connected directly and seamlessly to the human brain. The human brain could be enhanced too. Imagine if people could immediately look up information just by thinking about it. For example, people would instantly know what to do in medical emergencies or how to solve complex problems that they had never encountered

before, by relying on the knowledge and experience of other people. Eventually, people could become far more intelligent, knowledgeable, and capable than anybody who has ever lived. This could ultimately lead to completely artificial, intelligent life forms that are not limited by evolution at all, since they would have no organic parts; instead, they could simply upgrade themselves. We have already discussed the forced evolution of plants and animals for the sake of humanity, such as better sources of food. Now consider modifying non-intelligent and non-sentient (sentient meaning self-aware or self-conscious) animals into an intelligent and sentient species through the methods described above. This very possible idea is called "uplifting". Humanity could artificially uplift lowly species to the ranks of civilized and productive people, for better or worse. Imagine the philosophy, music, and literature that intelligent animals, such as cats and dolphins, would invent. I am sure they would be creative in ways that humans are not… yet. These methods of leapfrogging evolution will ultimately create hyper-intelligent super-beings far more capable than human beings. Hopefully these future people will care more for humanity than we do for less fortunate people. Maybe these super-beings will be able to solve some of the world's greatest problems, such as


global warming, pollution-free energy, diseases, etc. The only problem I see with these methods of artificial evolution is that the variety, or diversity, of people would lessen. The more alike people become, the less likely we are to have another Mozart or Einstein, especially if we start restricting variation in our children before they are even born. As a result, our children would be geniuses, but not unique. Whatever methods humanity uses to go beyond natural evolution, the purpose should always be to better humanity, our children, and ourselves.



Organising yourself as a dry lab scientist
This post was originally posted by Mike on the 16th of February 2007 at:

Browsing Wikiomics, I found this small section on keeping organised as a practising bioinformatician. In particular these lines contain gems of information:

- Use text files/plain e-mail whenever possible
- Give meaningful names to your files
- Create separate folders/directories for each project, with meaningful names

I find keeping my work organised one of the most frustrating but necessary tasks of being a bioinformatician. The subject also seems to receive little attention in the bioinformatics community. Wet-lab scientists are expected to keep laboratory books, and not doing so is considered very bad practice. I am jealous when I see these books filled with pictures of gels and printed tables of results. I've tried using a lab book, but I didn't find it applicable to the many different types of scripts and results I was producing. Here are some tips I find useful for organising myself. I couldn't agree more with the above tips. Give directories and files names that are as verbose as possible. This helps when trying to find a specific file. Being as verbose as possible in naming your files is useful because often sets of files are all related to a similar subject. Take the following example.

ancova_sequence_hydrophobicity.R
ancova_sequence_hydrophobicity_interaction_term.R
ancova_sequence_hydrophobicity_residuals.R

All three files contain a script fitting an ANCOVA model, but each differs slightly in focusing on a different part of the model. Finding the one you need is still simple for you now, but perhaps not so in a few months' time when you return to the results to write a paper. Consider this example:

ancova_sequence_hydrophobicity.R
ancova_sequence_hydrophobicity.csv
ancova_sequence_hydrophobicity.tiff
ancova_sequence_hydrophobicity_interaction_term.R
ancova_sequence_hydrophobicity_interaction_term.csv
ancova_sequence_hydrophobicity_interaction_term.tiff
ancova_sequence_hydrophobicity_residuals.R
ancova_sequence_hydrophobicity_residuals.csv
ancova_sequence_hydrophobicity_residuals.tiff

Here, there are files for the results of each model (csv) and a plot of the results (tiff). This illustrates how quickly things can expand.


This makes it more difficult to understand what each file refers to. Here's one way this could be organised:

1.ancova_sequence_hydrophobicity
    scripts
        model.R
        model_interaction_term.R
        model_residuals.R
    results
        model.csv
        model_interaction_term.csv
        model_residuals.csv
    pictures
        model.tiff
        model_interaction_term.tiff
        model_residuals.tiff

Each subdirectory name describes its contents, which keeps things verbose. Furthermore, the directory path contributes to describing each file, e.g. 1.ancova_sequence_hydrophobicity/results/model_residuals.csv. This is helpful if you are referencing the file elsewhere and want to know what it contains. Since the files are related, they each have an identically named counterpart in the other directories. This is useful for determining which script produced which result. Finally, the top-level directory has a number. Often projects and experiments are carried out linearly, one being done after another. Keeping the directories numbered can help to trace the thought process at a later date. There's an interesting post at LifeHacker about organising file structure. The comments have a lot of useful ideas too. There are an infinite number of ways to organise. Probably the best way is to use the system that suits you best. Experiment, you're a scientist.
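A layout like this can be bootstrapped with a few lines of scripting. The helper below is a hypothetical sketch, not part of the original post; the project and file names simply mirror the example above.

```python
import os

def make_project(number, name, analyses, base="."):
    """Create a numbered project directory with parallel
    scripts/results/pictures subdirectories, one file stem per analysis."""
    root = os.path.join(base, f"{number}.{name}")
    layout = {"scripts": ".R", "results": ".csv", "pictures": ".tiff"}
    for subdir, ext in layout.items():
        os.makedirs(os.path.join(root, subdir), exist_ok=True)
        for stem in analyses:
            # touch an empty placeholder file for each analysis
            open(os.path.join(root, subdir, stem + ext), "a").close()
    return root

root = make_project(1, "ancova_sequence_hydrophobicity",
                    ["model", "model_interaction_term", "model_residuals"])
print(sorted(os.listdir(root)))  # → ['pictures', 'results', 'scripts']
```

Creating the skeleton up front, rather than ad hoc as results accumulate, is what keeps the identically named counterparts in sync across the three subdirectories.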



My first “AJAX for bioinformatics” page
This post was originally posted by Neil Saunders on the 20th of February 2007 at:

So you’ve heard about this wondrous thing called AJAX. You’re dimly aware that it can generate interactive, user-friendly and dynamic websites without the usual cycle of reloading the page after submitting a request to the server. You know that Google Maps and Flickr are two excellent examples. You’re keen to explore the possibilities for your bioinformatics web applications. What you need is a “minimal example”. Where do you start? That’s the situation that I was in last weekend and here’s what I did. I’ll start by making it clear that much of what follows is lifted from the W3Schools AJAX tutorial, with minimal adaptation to make it relevant for bioinformaticians. Please go there and read their excellent work. When I figured out how AJAX works my response was: “Oh. Is that all it is?” AJAX, you see, is nothing new. In fact if you’re familiar with web programming and know a little about HTML, server-side scripting, javascript and XML - well, that’s all it is. It’s just combined in a clever way to produce a pleasing result. Here’s what we’re going to do. We’re going to construct a simple form with a drop-down list of options. The options will be UIDs from the NCBI Gene database. When we select an option, our form will display the protein name associated with the UID - without the need to reload the page. It’s the AJAX equivalent of “Hello World”

for bioinformatics, but it should give you an idea.

1. Getting set up

First, I went to the place where I do my web testing (e.g. /var/www/testing/) and created 3 directories: php for the PHP, js for the javascript and xml for - you guessed right. In fact no XML files were saved in this example but I like to be organised. The HTML files just go in the /testing root, right above these 3 directories.

2. The HTML form

There's nothing special about the form. I named the file form.html and it goes like this:
1. <html>
2. <head>
3. <script src="js/ncbi.js"></script>
4. </head>
5. <body>
6. <h3>Get protein name from NCBI Gene DB ID</h3>
7. <form>
8. <b>Select a Gene ID:</b>
9. <select name="geneID" onchange="showName(this.value)">
10. <option value="none" selected="selected">-----</option>
11. <option value="54123">54123</option>
12. <option value="21354">21354</option>
13. <option value="11988">11988</option>
14. </select>
15. </form>
16. <p>
17. <div id="geneName"><b>Gene info will be listed here.</b></div>


18. </p> 19. </body> 20. </html> 23. if (!response) { 24. document.getElementById("geneName"). innerHTML="No data returned!" 25. } 26. else { 27. document.getElementById("geneName"). innerHTML=response 28. } 29. } 30. } 31. function GetXmlHttpObject() 32. { 33. var xmlHttp=null; 34. try 35. { 36. // Firefox, Opera 8.0+, Safari 37. xmlHttp=new XMLHttpRequest(); 38. } 39. catch (e) 40. { 41. // Internet Explorer 42. try 43. { 44. xmlHttp=new ActiveXObject("Msxml2.XMLHTTP"); 45. } 46. catch (e) 47. { 48. xmlHttp=new ActiveXObject("Microsoft.XMLHTTP"); 49. } 50. } 51. return xmlHttp; 52. }

Nothing complicated about that. Our select list has 3 values which correspond to NCBI Gene UIDs. When we choose one (onchange, line 9), we fire the javascript code in js/ncbi.js. At the bottom of the form is a DIV element with the name geneName. Initially it displays “Gene info will be listed here”; later on we’ll see the javascript alter it to something different. OK, how about that javascript? 3. The javascript Once again, nothing to be scared of. The file ncbi.js reads like this:
1. var xmlHttp

2. function showName(str) 3. { 4. xmlHttp=GetXmlHttpObject() 5. if (xmlHttp==null) 6. { 7. alert ("Browser does not support HTTP Request") 8. return 9. } 10. var url="php/ncbi.php" 11. url=url+"?q="+str 12. url=url+"&sid="+Math.random() 13. xmlHttp.onreadystatechange=stateChan ged 14."GET",url,true) 15. xmlHttp.send(null) 16. } 17. function stateChanged() 18. { 19. document.getElementById("geneName"). innerHTML = "Fetching XML file..." 20. if (xmlHttp.readyState==4 || xmlHttp.readyState=="complete") 21. { 22. var response = xmlHttp.responseText
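The URL assembled on lines 10-12 can be reproduced as a small standalone helper. buildRequestUrl is a hypothetical name used only for this sketch (it is not part of ncbi.js), and encodeURIComponent is a small addition the original code skips:

```javascript
// Hypothetical helper mirroring lines 10-12 of ncbi.js: build the
// request URL for the server-side script. The random "sid" parameter
// is a cache-buster so the browser doesn't reuse an old response.
function buildRequestUrl(geneId) {
  var url = "php/ncbi.php";
  url = url + "?q=" + encodeURIComponent(geneId); // the gene UID from the form
  url = url + "&sid=" + Math.random();            // defeat caching
  return url;
}

buildRequestUrl("54123");
// e.g. "php/ncbi.php?q=54123&sid=0.1374..." (the sid varies each call)
```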

I'm not a strong javascript programmer - truth be told, I don't like the language much - but even I can follow this one. We've got 3 functions: showName() on lines 2-16, stateChanged() on lines 17-30 and GetXmlHttpObject() on lines 31-52. showName() first calls GetXmlHttpObject(), assigning the returned value to the variable xmlHttp. All you need to know about lines 31-52 is that they test whether your browser supports AJAX and, if so, return a special object, the XMLHttpRequest object. This object is what "does" AJAX. As you can see from the code, it has a number of methods that send, listen to and act on HTTP requests. In fact, the main reason why many of us are only now hearing about AJAX is - you guessed it - browser standards. See if you can guess which browser is being difficult from the code.

Assuming that all is well, we move to lines 10-16. Here, the javascript is calling a server-side PHP script named php/ncbi.php. It appends a couple of things to the URL query string. The first, "q", is the value that we get from the select list in our form. The second is a random number (which W3Schools assures us is there to prevent server caching). The PHP script is going to get the value of "q", use it to make a request to the NCBI and return some data. The javascript is going to grab that data and display it. This happens in lines 13-15.

We know when our data comes back thanks to the function stateChanged(). When the request is sent, the text of the "geneName" element (formerly, you recall, "Gene info will be listed here.") is altered to "Fetching XML file…" (line 19). When the request is complete (line 20), we check the variable named response to see what came back. If nothing, we display "No data returned!" (line 24). Otherwise, we set "geneName" to the value of response. For me, the javascript is the trickiest part of the whole thing. If you're like me, read the code through a few times and you'll soon have it. OK - the last part is the server-side PHP script, ncbi.php.

4. The PHP

The PHP isn't much more complex than the HTML:
<?php
 1. $val = $_GET['q'];
 2. $baseURL = " ez/eutils/efetch.fcgi?db=gene&id=";
 3. $urlSUFF = "&retmode=xml";
 4. $url = $baseURL.$val.$urlSUFF;
 5. $xmlDoc = new DOMDocument();
 6. $xmlDoc->load($url);
 7. $titles = $xmlDoc->getElementsByTagName("Prot-ref_name");
 8. foreach($titles as $node) {
 9. echo "<b>" . $node->nodeName . ": </b>" .
10. $node->textContent . "<br />";
11. }
?>

First, you should be aware that this code is PHP 5. PHP 5 has some new functions that make handling XML files quite easy, even for people like me who don't much care for XML. There's a good introduction to XML in PHP 4 and 5 at this Zend page.

Off we go. In lines 1-4 we grab the value of "q" which, you recall, is sent from ncbi.js and corresponds to a gene UID from form.html. We then construct a URL to the Efetch component of NCBI EUtils, asking it to return our record as XML. We can read the XML stream straight into the variable $xmlDoc and parse the XML for the element "Prot-ref_name" (lines 5-7). This contains the official protein name for the gene UID. We then loop through the stored XML object, retrieving the node name ("Prot-ref_name") and its value ($node->textContent). Purists will frown at the use of textContent, by the way. These values are what the script returns to ncbi.js and are displayed as the value of the "geneName" element in form.html.
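As a rough illustration of what lines 7-10 of the PHP are doing, here is a hypothetical JavaScript equivalent that pulls the text of a named element out of an XML string. It uses a naive regular expression rather than a real DOM parser (purists will frown at this even more than at textContent), and the sample record is invented:

```javascript
// Sketch of the PHP's getElementsByTagName + textContent step:
// collect the text content of every <tagName>...</tagName> element.
// Assumes tagName contains no regex metacharacters and the elements
// carry no attributes - fine for a sketch, not for production.
function extractTagText(xml, tagName) {
  var re = new RegExp("<" + tagName + ">([^<]*)</" + tagName + ">", "g");
  var out = [];
  var m;
  while ((m = re.exec(xml)) !== null) {
    out.push(m[1]); // the captured text between the tags
  }
  return out;
}

// Made-up fragment shaped like the Efetch output:
var sample = "<Prot-ref><Prot-ref_name>example protein</Prot-ref_name></Prot-ref>";
extractTagText(sample, "Prot-ref_name"); // returns ["example protein"]
```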

To recap then:

- We select a gene UID from a drop-down list in a normal HTML form
- Javascript and PHP interact to perform an EUtils query at the NCBI and return an XML file
- The XML is parsed and appropriate values retrieved
- Using asynchronous requests to the server (that's the first 'A' in AJAX), javascript updates the page with progress and displays the result
- All without reloading the page

That's it. That's AJAX. It's a particularly stupid example - fetching a huge XML file to parse out one element - but hopefully you get the idea. You can imagine all sorts of uses for this in bioinformatics applications: fetching the most recent data rather than relying on local storage, XML-SQL interconversions, real-time BLAST results and so on. As ever, the only limits are your creativity and requirements.

You can see it in action for a short time at this location. Feel free to grab the files and/or copy-paste from here to try it on your own server. I added a little extra to the javascript at that location to display a "spinning disk" progress indicator - see if you can figure out where the addition goes. Finally, this is all new and exciting to me, so if you spot any shocking errors, do let me know.
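As a closing aside - and purely as a hypothetical sketch, not part of the original tutorial - the same round trip in more modern JavaScript collapses the xmlHttp plumbing into fetch. The names here (showGeneName, fetchFn, render) are invented; fetchFn is injected so the flow can be exercised without a server:

```javascript
// Hypothetical modern rewrite of the showName()/stateChanged() pair:
// request the gene record via the server-side script and hand each
// status update to a render callback (in a browser, render would set
// the innerHTML of the geneName div).
async function showGeneName(geneId, fetchFn, render) {
  render("Fetching XML file...");
  try {
    const res = await fetchFn("php/ncbi.php?q=" + encodeURIComponent(geneId));
    const text = await res.text();
    render(text || "No data returned!"); // empty body -> same message as line 24
  } catch (err) {
    render("Request failed: " + err.message);
  }
}
```

In a browser you would pass window.fetch as fetchFn and a function that writes into the geneName div as render; the asynchronous await keeps the page responsive just as onreadystatechange did.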