Bioinformatics: Feel the fear and do it anyway
The Mozambique tilapia (Oreochromis mossambicus), a freshwater species that is able to grow and breed in sea water. Photo: Greg Hume
By Cornelia Eisenach
Have you ever run a BLAST search, performed a sequence alignment, trawled gene and protein sequence databases or used tools to predict the structure of your favourite protein? Chances are you have done at least one of these things and are therefore part of the bioinformatics revolution that is changing biological research at pace.
Although bioinformatics now informs almost every aspect of biology, a recent SEB member survey suggests that we still have some way to go to really get to grips with the opportunities this new field offers. “We identified a need amongst SEB members for training in bioinformatics”, explains Terri Attwood, Professor of Bioinformatics at the University of Manchester. She is also one of the drivers behind GOBLET (www.mygoblet.org; see page 20), the organisation that co-developed the survey. “Nearly 70% of all respondents wanted training in data analysis and interpretation”, she says, “and at the same time three quarters of respondents classified themselves as self-taught or taught by colleagues when it came to using bioinformatics.” Many SEB members, animal, plant and cell experimental biologists alike, use bioinformatics in their research – I spoke to some of them about their experiences:
HOW FRESH WATER FISH CAN GROW IN SALTY SEA WATER
Dr Avner Cnaani from the Agricultural Research Organisation in Israel uses bioinformatics to understand physiological adaptations of fish to sea water. Avner is analysing the organ-specific transcriptome of tilapia to understand how this freshwater fish is able to survive in salty water. “Fish blood has a salt content of around 1%”, explains Avner, “way below that of sea water, which means that marine fish need to overcome dehydration in the high-saline conditions of the oceans”. Avner’s team is trying to answer the question of why one species, the Nile tilapia, suffers stress in sea water, whereas another species, the Mozambique tilapia, can grow and breed in sea water nearly as well as in fresh water. “These two species are very closely related, they can even interbreed”, says Avner. “Yet they show great differences in how they cope with salinity”. He is using this two-species platform to apply bioinformatics techniques to identify new genes and pathways that are important for the acclimatisation of the fish to sea water.
Tilapia is the second-largest group of fish grown in aquaculture worldwide, farmed mostly in tropical and subtropical regions. With fresh water being a precious resource, many farmers are trying to grow this fish in brackish water, which cannot be used for drinking or irrigation. So understanding the mechanisms of salinity adaptation is of great interest from an agricultural and water resources point of view. “Previously we had to rely on information gained from mammals or model species to investigate such physiological adaptations in fish”, says Avner, “but now we are able to discover new key genes using bioinformatics techniques.”
COLLABORATION IS KEY
Although he recognises the power of these techniques for his research, Avner is not a bioinformatics expert himself. His work relies on collaboration and the recruitment of students and co-workers with programming and computing skills. At his institute the growing demand for bioinformatics support has been recognised and more experts have been hired. “In order to communicate with the experts you have to lose the fear of using LINUX or programming, seek training and educate yourself”, says Avner.
Dr Lu Ma, a postdoctoral researcher at Queen Mary University of London (QMUL), experienced the benefits of collaboration while studying barley genome sequences in his previous research project. “At the time it was difficult to find unique sequences for fluorescence in situ hybridisation in barley”, says Lu. However, after deciding to collaborate with bioinformaticians, he saw how in silico analysis of huge data sets overcame this problem. “I got amazing results from that collaboration”, says Lu, “and I realised that combining bioinformatics with traditional research would be a trend for the future.”
Tali Nitzan, the lab manager at the Avner lab, and graduate student Pazit Rozenberg are dissecting tilapia for analysis. Photo: Dr Avner Cnaani
GENOME SIZE DOES MATTER
After obtaining a Marie-Curie fellowship, Lu started his current project at QMUL working on Fritillaria. You might come across this spring-flowering plant on your way to work. The check-pattern flowers gave one Fritillaria species the common name Snake’s Head. Compared to many other modern plants, such as Arabidopsis, Fritillaria has a huge genome size. With roughly 50 giga base pairs, its genome is 15 times bigger than that of human beings. The genome size is determined by the number of base pairs in one cell of a species and, although, in general, larger genomes allow for more complex organisms, it also means that more time and energy is required for DNA replication. So what is the benefit of such large genomes? Lu is aiming to answer this question using next-generation sequencing combined with bioinformatics analysis to obtain an overview of the Fritillaria genome. “We know that the number of genes is quite similar between Fritillaria and Arabidopsis”, says Lu, “so where in the Fritillaria genome the expansion occurred and why is one of the mysteries we are trying to solve”.
THERE IS NO SUCH THING AS AN EASY START
“At the beginning of my research project I knew little about bioinformatics”, says Lu. “I sometimes had to start over with my analyses which made me lose valuable time.” Like many SEB members, Lu began by asking colleagues and on-campus specialists or by simply using Google. He quickly acquired programming skills and now uses command-line driven tools for his analysis. Although web-based tools such as GALAXY(1) exist that allow bioinformatics analysis without the use of commands, Lu says that being able to use command lines makes progress much faster. Looking back at his experience, though, he recommends that anyone entering this field seeks systematic training so as to have a general overview from the start.
LEARN TO PROGRAMME
“The best advice I could give to anyone trying to get into this field of research is to learn programming and get familiar using LINUX”, says Dr Thomas Wicker, group leader and bioinformatician at the Institute of Plant Biology at the University of Zürich. His office looks to me like the prototypical IT office: stuffed with computers, with cables, hard disks and components scattered around. Thomas sits in front of two joined screens. “I started programming at a young age and always had an affinity for computers, but I studied biology”. His computer-savviness came in handy at a time when more and more sequences were being produced and there was an increasing need to analyse them. Now, Thomas is involved in comparative genomics, where his group analyses sequences of crops such as wheat, rye and barley to study genome evolution and expansion.
Presented with the outcome of the SEB survey, Thomas is not surprised. “I have seen this before. People start off with little or no knowledge and gain most of their training by asking around.” Thomas says that the problem lies with the fact that, in most cases, researchers and students are not exposed to what lies underneath the tools and databases they use. “They often do not know where the information they use comes from and how it is generated”, he says.
KNOW YOUR TOOLS
Vicky Schneider, Head of Scientific Training, Education and Learning at The Genome Analysis Centre (TGAC) in Norwich, agrees. Like Lu, Avner and Thomas, Vicky started out in a traditional field of biology before getting into bioinformatics. Throughout her work as a postdoctoral researcher and Assistant Professor she realised that there was a need for students to delve below the surface of the tools and databases they were using. “For example, when it comes to using the protein knowledgebase UniProt, students often don’t realise that it comprises entries from two traditional databases, Swiss-Prot and TrEMBL, which differ substantially.” Vicky explains that Swiss-Prot holds high quality protein sequences that are annotated manually and that are reviewed. TrEMBL, however, holds sequences that are not reviewed and that are associated with computationally generated annotation and large-scale functional characterisation.
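For readers who want to see this distinction in their own data, here is a minimal Python sketch. It relies on the convention that UniProt FASTA headers begin with ">sp|" for Swiss-Prot (reviewed) entries and ">tr|" for TrEMBL (unreviewed) entries. The function name and the three example records are invented purely for illustration; they are not real UniProt entries and this is not a tool mentioned by any of the interviewees.

```python
from collections import Counter

# UniProt FASTA headers start with ">sp|" (Swiss-Prot, reviewed)
# or ">tr|" (TrEMBL, unreviewed). Anything else is counted as unknown.
SECTION_BY_PREFIX = {
    "sp": "Swiss-Prot (reviewed)",
    "tr": "TrEMBL (unreviewed)",
}


def count_uniprot_sections(fasta_lines):
    """Count how many FASTA records come from each UniProt section."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        prefix = line[1:].split("|", 1)[0]
        counts[SECTION_BY_PREFIX.get(prefix, "unknown")] += 1
    return counts


# Hypothetical records, made up for this example only.
example_records = [
    ">sp|X00001|DEMO1_FAKE Invented reviewed protein",
    "MKTAYIAKQR",
    ">tr|X00002|X00002_FAKE Invented unreviewed protein",
    "MSDNELLQ",
    ">tr|X00003|X00003_FAKE Another invented unreviewed protein",
    "MGSSHHHH",
]

if __name__ == "__main__":
    for section, n in count_uniprot_sections(example_records).items():
        print(f"{section}: {n}")
```

Running a count like this on the hits from a search is one quick way to see how much of a result set rests on manually reviewed annotation and how much on automatic annotation.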
PhD Student Margarita Shatalina and postdoctoral researcher Francis Parlange of Thomas Wicker’s group at a field research site in Switzerland. Photo: Dr Thomas Wicker
According to Vicky, biologists also need to become more proactive. “Bioinformatics is not a one-way street”, she says. “A lot of biologists use databases to inform their research, but then never go back to annotate and feed back into these databases. We need a shift in mentality. Scientists need to realise that once they have published, they ought to get active, deposit their information in appropriate databases and share their data.”
A MATTER OF JARGON
Vicky runs and organises workshops to address the training needs of biologists and publishes on the subject(2). She is passionate about how to teach and train, working to ensure that training is effective. “When I hold workshops I first do a reality check to figure out the experiences of the audience and then I aim to demystify the subject.” In her mind, biologists often have a barrier that prevents them getting to grips with bioinformatics and basic computing concepts. This barrier often lies in simple semantics, in the use of jargon. She says: “It is often a problem of communication; biologists and computer scientists have to learn to speak the same language and be patient with one another.”
Teresa (Terri) Attwood and Victoria (Vicky) Schneider
WHEN THINGS GO WRONG
As much as researchers have managed to overcome their difficulties in using bioinformatics resources, many still use them without formal training, and without understanding and knowing the theories underlying online tools and databases. “In the worst cases”, says Terri Attwood, “this can lead to misinterpretation of results from bioinformatics analyses. The problem is that such flawed results, if published, end up in the public domain and can re-enter the databases, which may mislead future research efforts”. Thomas Wicker agrees: “Not only may data be misinterpreted; you may also miss your next big discovery. If you don’t know how to analyse your data efficiently, you might miss some crucial information”.
STANDING ON THE SHOULDERS OF GIANTS
As genome analyst Lu Ma says, “Nowadays we are flooded with sequence data. If you look at the top-ranking journals in the plant science field, you see that almost everybody is doing sequencing”. A remark made in 1967 by one of bioinformatics’ pioneers, Margaret Dayhoff, now seems at odds with our reality of over 50 million sequence entries held in UniProtKB/TrEMBL(3): “There is a tremendous amount of information regarding the evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively.” At the time, her Atlas of Protein Sequence and Structure contained 65 sequences(4). Now, standing on the shoulders of giants as we click our way through online tools and databases, there is a need to make sense of all these data. Says Lu: “New techniques such as next generation sequencing have changed the world. The bottleneck is now no longer to obtain sequences but to analyse them and give meaning to them.”
References:
1) galaxyproject.org/, accessed 09/02/2014
2) Schneider MV, Watson J, Attwood T, et al (2010) Bioinformatics training: a review of challenges, actions and support requirements. Briefings in Bioinformatics 11, 544-51.
3) www.ebi.ac.uk/uniprot/TrEMBLstats, accessed 09/02/2014
4) Attwood, T. K., Gisel, A., Eriksson, N.-E., & Bongcam-Rudloff, E. (2011). Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective. Bioinformatics – Trends and Methodologies, Mahmood A. Mahdavi (Ed.), InTech.