Bioinformatics: Phylogeny: What Can Phylogenetic Trees Do for You?
As you know, the purpose of phylogeny is to reconstruct the history of life in order to understand it better; its main purpose is to group organisms according to their similarities.
Genes mutate and change over time; this is what we mean by EVOLUTION. That is why there are so many species (diversity) on Earth, and why Phylogenetics has become indispensable to Bioinformatics.
Phylogenetics is the science of inferring phylogeny: it compares the genes of many species to find out which species are most closely related and to construct a tree of these species.
To better understand Phylogenetic trees you can read this post HERE. To learn how to use Bioinformatics tools for constructing phylogenetic trees (such as PHYLIP) you can read this post HERE.
Phylogenetic Trees can help you with:
1- Determining the organism most closely related to the one you are studying.
2- Determining the function of a gene by looking at its relatives (orthologous genes).
3- Determining gene families.
4- Finding out about the origin of the gene you're studying.
Bioinformatics: Main Applications Of Multiple sequence Alignment
You can read an introductory post on Multiple Sequence Alignment HERE to understand what a Multiple Sequence Alignment is.
Multiple Sequence Alignment is one of the most useful tools in Bioinformatics; it helps in almost every application of the field (predicting protein structure, predicting protein function, phylogenetic analysis, etc.).
The main applications of Multiple Sequence Alignment are:
1- Structure Prediction: a Multiple Sequence Alignment can greatly improve the prediction of protein or RNA secondary structure, and sometimes it even helps with the 3D structure.
2- Protein Family: a Multiple Sequence Alignment can help you decide whether or not your protein is a member of a known protein family.
3- Pattern Identification: by looking at conserved regions or sites, you can identify which region is likely to form a functional site.
4- Domain Identification: from a Multiple Sequence Alignment you can build profiles and use them to search databases.
5- DNA Regulatory Elements: you can use Multiple Sequence Alignments to locate DNA regulatory elements such as binding sites.
6- Phylogenetic Analysis: by carefully picking related sequences you can reconstruct a tree from the sequences that you used in the Multiple Sequence Alignment (you can use the PHYLIP package, and you can find a post about it here).
Because Multiple Sequence Alignments play such a major role in Bioinformatics, you can use them almost anywhere. But like everything else, they are never perfect or 100% accurate, so you have to choose your sequences very carefully to avoid meaningless results.
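To make the idea of "conserved sites" from the list above concrete, here is a minimal Python sketch. It assumes the alignment has already been computed (by ClustalW or any other tool) and is given as a list of equal-length strings with `-` for gaps. The function name `conserved_columns` and the toy alignment are made up for illustration; real tools use scoring schemes rather than strict identity.

```python
def conserved_columns(alignment, gap="-"):
    """Return 0-based indices of columns where every sequence has the same residue.

    `alignment` is a list of already-aligned sequences of equal length.
    Columns containing a gap are never reported as conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}  # distinct symbols in column i
        if len(column) == 1 and gap not in column:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment (not real proteins).
toy_alignment = [
    "MKT-LLVA",
    "MKTALLIA",
    "MRT-LLVA",
]
print(conserved_columns(toy_alignment))
```

Strict identity is the simplest possible criterion; it is only meant to show what "looking at conserved columns" means in practice.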
You can access the EBI ClustalW program from HERE to do a Multiple Sequence Alignment.
A multiple sequence alignment is an alignment of three or more sequences (protein or nucleic acid, i.e. DNA or RNA).
What's ClustalW:
ClustalW is a large and complex program for multiple sequence alignments.
Why use ClustalW:
As we said before, ClustalW is used for multiple sequence alignments, which are very important in bioinformatics and especially in the study of sequences. By doing a multiple sequence alignment of protein sequences, for example, we can extract this very useful information:
1- Conserved sequence regions.
2- Which sites are likely to be active sites and which are not.
3- A prediction of protein function.
4- Help in predicting protein structure.
5- The protein family, or new members of a family.
6- Trees that show the relationships (distances) between proteins, etc.
You can find ClustalW at EBI and you can access it from HERE.
Bioinformatics Tutorials & Lessons: Using TMHMM method to locate Transmembrane helices in Protein sequences
TMHMM is an abbreviation of "Transmembrane Hidden Markov Model". A hidden Markov model is a statistical model; you can read about it in this Wikipedia article HERE.
TMHMM is a method for predicting transmembrane helices in a protein sequence; you can access the TMHMM server from HERE.
This video shows how to use the TMHMM server to predict transmembrane helices in a protein sequence.
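TMHMM itself is a trained hidden Markov model and cannot be reproduced in a few lines. As a much simpler stand-in (explicitly not the TMHMM method), the sketch below uses the classic Kyte-Doolittle hydropathy scale with a sliding window to flag stretches that are hydrophobic enough to be candidate membrane-spanning segments. The window of 19 residues and the threshold of 1.6 follow the usual Kyte-Doolittle suggestion; the function names and the toy sequence are made up for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy of every window of `window` residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """0-based start positions of windows whose average is at or above the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


# Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
toy_protein = "DDDDD" + "L" * 19 + "KKKKK"
profile = hydropathy_profile(toy_protein)
print(len(profile), candidate_tm_windows(toy_protein))
```

A hydropathy plot like this only hints at possible helices; a model such as TMHMM also predicts the orientation of each helix and is far more reliable.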
PHYLIP, the PHYlogeny Inference Package, is a package that contains a lot of programs for inferring phylogenies or, in simple words, constructing phylogenetic (evolutionary) trees.
The package contains a lot of useful programs and, above all, it's free; you can get it from its website HERE
The programs in the PHYLIP package can estimate phylogenies from protein or nucleic acid sequences with different methods (parsimony, maximum likelihood, etc.)
It has been, and still is, very helpful for bioinformaticians, phylogeny researchers and students, as it provides a complete environment for phylogenetic analysis.
Bioinformatics Tutorials & Lessons: using BLAST to search for similarities
BLAST (Basic Local Alignment Search Tool) is an algorithm or program that can identify similar (Nucleic Acid or Amino Acid) sequences to a query sequence.
Let's say that you have recently sequenced a gene from the mouse genome and you know nothing about this gene except its sequence. This is where BLAST comes in: it searches databases for sequences similar to yours, and from them you get information such as gene or protein family, organism, related sequences, function, etc. This will help you to identify your sequence.
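To give a rough feeling for how such a search can start, the sketch below shows only the "seeding" idea: finding short exact words shared by the query and a database sequence. This is a simplification and not the BLAST implementation; the real program also scores similar (non-identical) words, extends each hit into a local alignment and estimates its statistical significance. The function name `shared_words` and the toy sequences are made up for illustration.

```python
def shared_words(query, subject, k=3):
    """Return (query_pos, subject_pos, word) for every exact k-letter word
    that occurs in both sequences. Positions are 0-based."""
    # Index every word of length k in the subject (database) sequence.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    # Look up every word of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


# Hypothetical toy sequences.
query = "ACGTAC"
subject = "TTACGTT"
print(shared_words(query, subject, k=4))
```

Each hit is a place where a longer alignment could be attempted; indexing the words first is what makes searching a large database fast.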
To know more about BLAST you can read this Wikipedia article HERE
You can read the BLAST help page HERE or the documentation HERE.
This is a video tutorial that demonstrates how to use BLAST to search for protein sequences similar to my sequence. (I used BLAST against the SwissProt database.)
The Open Reading Frame (ORF) is a sequence of DNA located between the start codon (initiation codon) and the stop codon (termination codon).
ORF finder programs and algorithms are used to locate a gene in a given sequence by locating the initiation codon and the termination codon.
Initiation and termination codons can also occur by chance and distort our results, but in that case the sequence found between them is usually short. So to make sure that it's an ORF, we have to check that the sequence between the initiation and termination codons is long enough to represent a GENE.
A DNA sequence can be read in SIX different reading frames, 3 for each strand (because every codon has 3 bases).
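The two ideas above (six reading frames, and a minimum length between start and stop codon) can be sketched in a few lines of Python. This is a minimal illustration, not how GeneMark or any other real gene finder works: it assumes the standard start codon ATG and the three standard stop codons, works on DNA letters (T instead of U), and the names `find_orfs` and `min_codons` are made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=3):
    """Yield (strand, frame, start, orf) for ORFs in all six reading frames.

    `start` is the 0-based position on the strand being read. `min_codons`
    counts the codons from the start codon up to, but not including, the stop.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == START_CODON:
                    start = pos  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        yield strand, frame, start, seq[start:pos + 3]
                    start = None  # look for the next ORF in this frame


# Hypothetical toy sequence; real genes need a far larger `min_codons`.
for orf in find_orfs("CCATGAAATTTTAGCC"):
    print(orf)
```

Raising `min_codons` is the simple filter described above: chance start and stop codons rarely enclose a long stretch.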
In eukaryotic DNA the coding sequence of a gene is often interrupted by intervening sequences; these are called INTRONS and they do not code for proteins.
We can see that the first reading frame contains an initiation codon (AUG), the 2nd contains neither a start nor a stop codon, and the 3rd contains a stop codon (UAA).
So if we had to choose the correct reading frame, we would choose the first one.
There are many programs dedicated to ORF detection. GeneMark is one of the best; it is a family of gene prediction programs developed at the Georgia Institute of Technology, Atlanta, Georgia, USA. You can access it from HERE.
Bio Linux 5.0 is a project released in January 2009 for students and researchers in the field of Bioinformatics. It's a Linux environment (Ubuntu) plus more than 500 Bioinformatics programs, with full documentation for each program.
In other words, Bio Linux 5.0 is an easy-to-use Bioinformatics workstation that is powerful and easy to configure.
Bio Linux 5.0 is developed and maintained by the NERC Environmental Bioinformatics Centre; it contains a complete analysis and development environment that is easy for bioinformaticians to use.
Bio Linux 5.0 can run from a live DVD, which means that you can run it without installing it (without affecting your system). It can also run from a memory stick, and you can install it in dual boot with Windows, or in a virtual machine if you want to run it alongside Windows at the same time.
Above all, Bio Linux 5.0 is FREE and you can download it from HERE
To access the NERC Environmental Bioinformatics Centre homepage click HERE
If you already have an Ubuntu system installed on your machine you can download the Bio Linux 5.0 packages from HERE and install them on your Ubuntu, but I don't recommend that: it takes more time and effort and gives you fewer packages, because some (BioPerl, Biopython, etc.) are not included in the package repository. The easy way is to download the full Bio Linux 5.0 and install it directly.
The PDB, or Protein Data Bank, is a database that contains three-dimensional structures of large biological molecules such as proteins and nucleic acids.
The data in this database is experimental (from X-ray crystallography or NMR spectroscopy); biologists and biochemists from all over the world submit structures.
The PDB plays a major role in Bioinformatics, especially in Structural Biology.
The Protein Information Resource (PIR) is a major player in the field of Bioinformatics (Proteomics). It is a joint effort between Georgetown University Medical Centre and the National Biomedical Research Foundation in Washington, D.C.
It was established in 1984 and resulted from the work of Dr. Margaret Dayhoff. Her Atlas of Protein Sequence and Structure, published from 1965–1978, was the first comprehensive collection of protein sequences.
In 1974, Dr. Dayhoff devised the concept of the protein family and superfamily, defined by sequence similarity, as a means of organising and classifying proteins.
In recent years, this concept has been exploited by the PIR Protein Sequence Database (PIR-PSD) to enable them to computer-annotate their entries with functional and structural data. This has facilitated an increase in the number of sequences in the database.
There are many other Databases provided by PIR:
1- PIR-PSD: it has been the most comprehensive and expertly-curated protein sequence database in the public domain for over 20 years. In 2002, PIR joined EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics) to form the UniProt consortium. PIR-PSD sequences and annotations have been integrated into UniProt Knowledgebase.
2- iProClass: an integrated resource of family relationships and structural and functional features of proteins. The iProClass database provides value-added information reports for UniProtKB and unique NCBI Entrez protein sequences in UniParc, with links to over 90 biological databases, including databases for protein families, functions and pathways, interactions, structures and structural classifications, genes and genomes, ontologies, literature, and taxonomy.
3- The comprehensive PIR-NREF database of protein sequences: from PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB. PIR-NREF has been discontinued inasmuch as the UniProt databases now include all of its functionalities (Final Release 1.83, 16-Jan-2006). This consolidation provides one centralized comprehensive database and minimizes duplication of work between UniProt and PIR.
4- PIRSF: The PIRSF concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships. The PIRSF classification system is based on whole proteins rather than on the component domains; therefore, it allows annotation of generic biochemical and specific biological functions, as well as classification of proteins without well-defined domains.
Swiss-Prot is a protein knowledgebase established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
The Swiss-Prot database provides a high level of annotation (a detailed file for each entry) that is maintained by expert biologists in the field, a high level of integration with other databases, and a low level of redundancy.
The documentation is easy to follow, even for beginners in the field.
The TrEMBL protein sequence database was created in 1996 as a complement to Swiss-Prot in response to the need to make new sequences available as quickly as possible.
TrEMBL (Translation of EMBL nucleotide sequence database) initially consisted of computer annotated entries derived from the translation of all coding sequences (CDS) in the DDBJ/EMBL-Bank/GenBank nucleotide sequence database, except for those already included in Swiss-Prot. It now additionally contains protein sequences that are extracted from the literature or submitted to Swiss-Prot.
Today the Swiss-Prot and TrEMBL databases play a major role in Bioinformatics (in Proteomics, to be accurate).
For more information about protein sequence databases you can read this post HERE
To learn how to use SwissProt to search for a specific protein (a detailed lesson with a video) you can see HERE
For more information about the ExPASy Proteomics Server you can read this post HERE
There are a number of Protein sequence Databases, but it's very important to distinguish between universal databases covering proteins from all species and specialised data collections storing information about specific families or groups of proteins or about proteins of a specific organism.
Universal Databases: 1- The first database that comes to mind in this category is the great Swiss-Prot, a protein knowledgebase established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
2- The second database is the Protein Information Resource (PIR). PIR is a joint effort between Georgetown University Medical Centre and the National Biomedical Research Foundation in Washington, D.C. It was established in 1984 and resulted from the work of Dr. Margaret Dayhoff.
2- InterPro: contains protein signatures, domains, sites, etc. This database combines a number of databases such as PROSITE, PRINTS, Pfam, SMART, TIGRFAMs, PIR SuperFamily (PIRSF), ProDom and others.
The main general applications of DNA Microarrays are:
1- Determining the expression patterns of proteins by looking at mRNAs.
2- Genotyping: detecting variations in gene sequences (Single Nucleotide Polymorphisms, SNPs, for example).
To achieve this we have to do a parallel hybridization analysis, where hybridization is the way to detect whether or not a particular sequence is present in a DNA sample.
In order to do a parallel hybridization analysis, we use a large number of DNA oligomers that are fixed at known locations on a rigid support.
One DNA chip or array may contain 100,000 probe oligomers.
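In computational terms, a probe "lights up" when the sample contains the sequence complementary to it. The sketch below models this in the most idealised way: a probe hybridizes only if its exact reverse complement occurs in the sample strand. Real hybridization tolerates some mismatches and depends on temperature and probe chemistry, so this is only an illustration; the spot names, probe sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def hybridizing_probes(sample, probes):
    """Return the names of probes whose exact reverse complement occurs
    in the single-stranded `sample` sequence."""
    return [
        name
        for name, probe in probes.items()
        if reverse_complement(probe) in sample
    ]


# Hypothetical sample strand and probes fixed at known spots on the chip.
sample = "GGATCCGTTAACG"
probes = {"spot_A1": "AACGGATC", "spot_A2": "TTTTTTTT"}
print(hybridizing_probes(sample, probes))
```

Because every probe sits at a known location, the list of spots that hybridize tells us which sequences are present in the sample.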
Applications of DNA microarrays include:
1- Investigating cellular states and processes: Patterns of expression that change with cellular state can give clues to the mechanisms of processes such as sporulation, or the change from aerobic to anaerobic metabolism.
2- Diagnosis of disease: Testing for the presence of mutations can confirm the diagnosis of a suspected genetic disease, including detection of a late-onset condition such as Huntington disease, or determine whether prospective parents are carriers of a gene that could threaten their children.
3- Genetic warning signs: Some diseases are not determined entirely and irrevocably by genotype, but the probability of their development is correlated with genes or their expression patterns. A person aware of an enhanced risk of developing a condition can in some cases improve his or her prospects by adjustments in lifestyle.
4- Drug selection: Detection of genetic factors that govern responses to drugs, that in some patients render treatment ineffective and in others even cause serious adverse reactions.
5- Classification of disease: Different types of leukaemia can be identified by different patterns of gene expression. Knowing the exact type of the disease is important in selecting optimal treatment.
6- Target selection for drug design: Proteins showing enhanced transcription in particular disease states might be candidates for attempts at pharmacological intervention (provided that it can be demonstrated, by other evidence, that enhanced transcription contributes to or is essential to the maintenance of the disease state).
7- Pathogen resistance: Comparisons of genotypes or expression patterns, between bacterial strains susceptible and resistant to an antibiotic, point to the proteins involved in the mechanism of resistance.
Nucleotide sequence databases are databases that contain information about nucleotide sequences, including: 1- Accession number. 2- Definition (name). 3- Organism. 4- Authors who submitted the sequence. 5- Chromosome location. 6- Description, and a lot more...
There are 3 main nucleotide sequence databases (GenBank, EMBL-Bank and DDBJ) that are synchronized daily and publicly available.
Genetic, morphological and biochemical evidence now shows that all organisms on Earth are genetically related, so scientists are searching for what's called "The Tree Of Life", which represents the phylogeny of organisms.
What is Phylogeny?
Phylogeny is the history of organismal lineages as they change through time. It implies that different species arise from previous forms via descent, and that all organisms, from the smallest microbe to the largest plants and vertebrates, are connected by the passage of genes along the branches of the phylogenetic tree that links all of Life.
Phylogenetic tree:
The Phylogenetic tree or Evolutionary tree is a tree showing the evolutionary relationship between various species that are thought to have a common Ancestor.
Each internal node in the tree represents the most recent common ancestor of its descendants, and the edge lengths in some trees correspond to estimated time. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units (HTUs) as they cannot be directly observed.
In Bioinformatics, software aligns sequences of species that are thought to have a common ancestor (multiple sequence alignment) and calculates the distance between organisms (using the number of mutations, etc.). In the end it displays a graphical view of the tree with its nodes and their corresponding edge lengths.
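As a small illustration of the "distance between organisms" step, the sketch below computes the simplest possible measure, the p-distance: the fraction of aligned positions at which two sequences differ. It assumes the sequences are already aligned and of equal length. Tree-building programs usually apply corrected distance models on top of this; the species names and sequences are hypothetical toy data.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing sites, ignoring columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Map every ordered pair of names to the p-distance between their sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Hypothetical aligned toy sequences.
aligned = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "ACCTTCGA",
}
matrix = distance_matrix(aligned)
print(matrix[("species_a", "species_b")], matrix[("species_a", "species_c")])
```

A distance-based method (neighbor joining, for example) would take a matrix like this as its input and join the closest pairs first.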
Introduction to Bioinformatics by Arthur M. Lesk: 475 Pages
This is a great book for beginners in the field of Bioinformatics; if you read it you'll have a complete picture of the field.
Book's content:
The book contains 7 chapters including:
1- Introduction: the author provides the basic biological and computer science background needed to understand Bioinformatics (biological information, the World Wide Web, computer science, biological nomenclature, programming, Proteomics, Genomics, etc.).
2- Genome organisation and evolution: genomes, proteomes, differences between eukaryotic and prokaryotic genomes and proteomes, etc.
3- Scientific publications and archives: media, content and access: databases, software, programming languages, etc.
4- Archives and information retrieval: different types of databases including protein sequence databases, nucleic acid sequence databases, analysis software, etc.
ProtParam is a very useful tool that computes various physico-chemical properties of proteins. All you have to do is enter the protein sequence in raw format, or give its Swiss-Prot/TrEMBL accession number or ID.
What can ProtParam do for you? It computes: 1- Number of amino acids. 2- Molecular weight. 3- Theoretical pI. 4- Amino acid composition (%). 5- Atomic composition. 6- Extinction coefficients. 7- Estimated half-life. 8- Instability index. 9- Aliphatic index, etc.
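Two of the listed properties are simple enough to sketch by hand. The Python below computes the amino acid composition in percent and the aliphatic index, using the usual definition X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where each X is a mole percentage. This is not ProtParam's code, only a minimal re-implementation under that assumption; the function names and the toy sequence are made up for illustration.

```python
from collections import Counter


def amino_acid_composition(protein):
    """Return {residue: percentage of the sequence} for a one-letter protein sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


def aliphatic_index(protein):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    comp = amino_acid_composition(protein)
    return (
        comp.get("A", 0.0)
        + 2.9 * comp.get("V", 0.0)
        + 3.9 * (comp.get("I", 0.0) + comp.get("L", 0.0))
    )


# Hypothetical toy peptide (not a real protein).
toy_peptide = "AVILAVILGG"
print(len(toy_peptide), amino_acid_composition(toy_peptide))
print(round(aliphatic_index(toy_peptide), 1))
```

The other properties (molecular weight, pI, half-life and so on) need tables of residue masses, pK values or empirical rules, which is where a tool like ProtParam saves time.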
The swine flu A (H1N1) virus is an RNA virus whose genome consists of 8 gene segments. It is a reassortant that combines segments from avian, human, and swine (North American and Eurasian) influenza A viruses; such a combination is considered rare.
The two antiviral drugs Tamiflu and Relenza are available on the market and can lessen the symptoms of swine flu.
But some strains of the swine flu virus have developed resistance to Tamiflu, and the proportion of resistant strains is growing.
All submitted influenza sequences are now available at GenBank and can be searched with BLAST at NCBI here, together with a set of tools that you can use to analyse the sequences.
So we hope that a cure will be found before the next mutation of the virus.
That is the general format: the first line has a ">" at the beginning, followed by a definition of your protein or DNA sequence. The second line is where your protein or DNA sequence begins.
Notes: * The first line can contain information such as: 1- Database name, like sp, which means SwissProt. 2- Database accession number, like (Q3LGA9). 3- Protein or DNA sequence name. 4- Organism, for example Homo sapiens, etc. * The sequences use one-letter codes in capital letters. The software reads the sequence starting from the line after the one that contains the ">" sign and continues until the end of the sequence (if there is only one sequence in the file).
The FASTA format is the default sequence format because it's easy to parse; that's why most analysis programs, like BLAST and CLUSTALW, use it. Some programs use the RAW format, which is the FASTA format without the first line (the definition line).
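To show how easy FASTA really is to parse, here is a minimal Python sketch that reads FASTA text into a list of (definition, sequence) pairs. It handles several records per file and sequences wrapped over several lines. The function name `parse_fasta` and the example records (including the `demo|X00001` style identifiers) are made up for illustration and are not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (definition, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []  # drop the ">" sign
        elif header is not None:
            chunks.append(line)  # sequence lines are joined together
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical FASTA text with two toy records.
example = """>demo|X00001|toy protein one
MKTLLVAG
AAIL
>demo|X00002|toy protein two
MRTLL
"""
for definition, sequence in parse_fasta(example):
    print(definition, sequence, len(sequence))
```

Joining the sequence lines of a record gives exactly what the text calls the RAW format: the sequence without its definition line.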
The ExPASy Proteomics Server is a huge resource that brings together a variety of databases and a lot of tools and software used in molecular biology for analysing proteins.
ExPASy contains a lot of resources including:
1- Databases (SWISS-PROT, PROSITE, ViralZone, etc.).
2- Tools and software to analyse proteins (similarity searches, post-translational modifications, protein structure prediction).
The databases included in the ExPASy Proteomics Server are:
1- SWISS-PROT knowledgebase: a curated protein sequence database that provides high quality annotations (such as the description of the function of a protein, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases.
2- TrEMBL: contains computer-annotated entries for all sequences not yet integrated in SWISS-PROT. SWISS-PROT and TrEMBL are maintained collaboratively by the SIB and the European Bioinformatics Institute (EBI).
3- PROSITE: a database of protein domains and families. PROSITE contains biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs.
4- SWISS-MODEL Repository: a database of automatically generated structural protein models.
And a lot of other databases.
The software included:
There is a huge amount of software, so we will only talk about some of it:
1- Protein identification and characterization (Aldente, PepMAPPER, etc.).
2- Prediction or characterization tools (ProtParam, PeptideMass, etc.).
3- DNA to protein tools (Translate, etc.).
4- Similarity searches (BLAST).
5- Pattern and profile searches (InterPro Scan, PROSITE Scan).
6- Post-translational modification prediction (LipoP, Predotar).
7- Primary, secondary and tertiary structure analysis and prediction.
8- Molecular modeling and visualization tools.
If I went on, this list would never end, so I'll stop here.
To access the ExPASy Proteomics Server click HERE, or search Google for the term EXPASY and click the first entry.
Bioinformatics Tutorials & Lessons: use SWISSPROT to search for a specific protein:
Let's say that you have a specific protein and you want to do some research about it, including:
1- Information about the organisms that have this protein.
2- The function of this protein.
3- The protein sequence.
4- Complete references about this protein, and a lot of other features and information.
Let's take the protein "Myosin" as an example; you can choose any protein you're interested in. The first step is to go to the SWISSPROT website at ExPASy:
1- Enter the site directly from here, or go to Google and search for SWISSPROT; the first ExPASy result is the SWISSPROT website. You'll see something like this:
2- Enter your protein name in the search box shown by the red arrow in the picture above. * In this tutorial I'm searching for the protein "myosin" as an example.
3- Click the GO button to start the search.
4- You will see a result page like the one below:
The result list is too large and not precise enough (it includes related proteins, for example), so we need to set a couple of things to get the results we need.
5- To do that, click on "fields" as shown below:
You will see this:
From 1, shown by the arrow, you can choose AND, NOT or OR.
AND: means that you add a further condition, an organism name for example.
NOT: means that you exclude results that contain the word you write, for example to eliminate an organism.
OR: searches for either term, for example myosin OR actin.
From 2 we can choose our field, like protein name, organism, gene name, etc.
From 3, the term section, we can enter the word we want to add to the search.
6- We set AND, and choose Protein name from the field dropdown menu.
7- In the term section we write Myosin, to search only for proteins whose names contain Myosin and exclude related proteins.
8- We click Add & Search, and we will see this:
We notice that the number of hits has dropped, and that these hits only show proteins whose protein name contains the word Myosin. Let's say that you only want the protein in a specific organism like Homo sapiens. Then we repeat steps 5 to 8: click "fields", choose AND, choose Organism from the field dropdown menu and write Homo sapiens in the Term field.
9- We click Add & Search and we'll get this:
As you can see, the number of hits has dropped from more than 200 in the first search to 11 here. Because the protein Myosin has several chains, we will choose Myosin-Va as an example.
10- By clicking the accession number shown below you will be taken to the information for this entry.
The information about this protein is organised by category: * Names and origins. * Protein attributes. * General annotations (protein function, subunit structure, etc.). * References.
Our interest for now is the Sequences section, where we'll find the protein sequence. We can see the sequence below:
To see the protein sequence without numbers within the lines, click on 1, shown by the red arrow. To do a BLAST search for similarities with other proteins, choose Blast and click go, as shown at 2 by the red arrow.
Use SWISSPROT to search for a specific protein (Video Lesson on YouTube)
Microarrays are micro-chips used in molecular biology and medicine to carry out a lot of useful tests, including measuring gene expression.
To understand this technology, we should keep two things in mind: 1- Not all of the genome codes for proteins. 2- Not all genes are turned on all the time.
We use the term Gene Expression to describe the transcription of the information contained within the DNA into mRNA, which is then translated into proteins.
Scientists have to study these genes to identify which of them are expressed and which are not.
Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically both to environmental stimuli and to its own changing needs.
This mechanism acts both as an "on/off" switch to control which genes are expressed in a cell and as a "volume control" that increases or decreases the level of expression of particular genes as necessary.
So that's what DNA microarrays are used for.
To understand such a process, there is nothing better than animations!
Here are some animations that I found very useful for fully understanding microarrays:
Note: The first animation is pretty simple and good for beginners:
Bioinformatics relies first and foremost on DATA (Genomics, Proteomics, etc.); without data, Bioinformatics has nothing to analyse.
Before we can use analysis software we need DNA or protein sequences, so the first thing we have to do is sequencing.
The term DNA Sequencing refers to the methods applied to identify the order of the DNA nucleotides or bases (Adenine, Guanine, Cytosine, Thymine).
With the advances in technology, DNA sequencing has become indispensable for most biological research because it is the only way to obtain almost complete and accurate data.
DNA Sequencing methods: There are many methods of DNA sequencing, but I would like to introduce the Sanger method, explained by the beautiful and easy animation HERE
I picked the Sanger (dideoxy) method because it's the most commonly used and the easiest to apply.
The word Proteome comes from the combination of "protein" and "genome".
Proteomics is the study of proteins, especially their structure and function, and it is the second step after genomics.
We all know that proteins are called the molecules of life because they are the acting molecules in every living organism. By studying genomics and genes alone we don't learn everything, because:
1- Not all of the genome codes for proteins (there are non-coding regions such as introns).
2- Proteins undergo post-translational modifications after they are translated (phosphorylation, etc.).
The study of proteomics is more complicated than genomics because we're studying something variable that differs from cell to cell and from time to time.
By studying proteins we will discover:
1- Active sites that interact with other molecules.
2- The functions of these proteins.
3- Their location (transmembrane, outside or inside the cell), etc.
Proteomics has helped to answer many questions in other sciences, for example in medicine with Alzheimer's disease, heart disease, etc.
The main source of protein sequences and information is the huge Swiss-Prot database; thanks to it, protein analysis is now a piece of cake. You can find the database here: SWISSPROT
Genomics is the study of the entire genome of a species (the sum of all the genes of an organism) and of how its genes interact with each other; in contrast, the study of a single gene is the role of molecular biology and genetics. The study of genomes includes the DNA, RNA and protein levels.
Recently there have been extensive projects to sequence the genomes of species, like the HGP (Human Genome Project) and projects for a lot of other species (animals, insects, bacteria, viruses, etc.).
You can find the human genome sequence and those of many other species in the UCSC Genome Browser. You'll find it a little complicated at first, but you'll get familiar with it very fast.
The UCSC Genome Browser now contains more than 45 complete genomes.
With the huge amount of sequence data provided by sequencing projects, there is no way to analyse it without Bioinformatics tools. That's good for us, because with more than 3 billion bp our brains would explode before reaching the 30th base!
Bioinformatics, as its name suggests, is the use of computer science or informatics resources (hardware and software) to analyse biological data. This data covers genetics, molecular biology, microbiology, virology and many other areas of biology.
Since genetic material is responsible for the design of all living organisms, a better understanding of the genetic code should help us to tackle the malfunctions that arise from it (genetic diseases).
Bioinformatics links to most branches of biology (genetics, molecular biology, microbiology, epidemiology, phylogeny, zoology, etc.)
Advances in technology have made bioinformatics a piece of cake: the human brain can't handle this huge amount of biological data, whereas home computers can now do about 4 billion operations per second.
The main role of bioinformatics today is to do what the human brain can't do, including:
1- Analysing and comparing immense amounts of genetic data (code).
2- Finding similarities with the genetic code of other species.
3- Searching databases (GenBank or Swiss-Prot) for a query sequence.
4- Establishing phylogenetic relationships between species.
5- Finding 3D protein structures to better understand active sites, etc.
And the list goes on and on.
Nowadays bioinformatics is making huge advances and providing useful answers to medicine and other sciences, as in the case of TAMIFLU (a drug for seasonal flu and swine flu), which was designed with the help of computational methods on the computer screen.