Monday, November 30, 2009

Bioinformatics:SWISSPROT & TrEMBL

Bioinformatics:SWISSPROT & TrEMBL Database

Swiss-Prot is a protein knowledgebase established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).

The Swiss-Prot database provides a high level of annotation (a detailed file for each entry) that is maintained by expert biologists, a high level of integration with other databases, and a low level of redundancy.

The documentation is easy to follow, even for beginners in the field.

The TrEMBL protein sequence database was created in 1996 as a complement to Swiss-Prot in response to the need to make new sequences available as quickly as possible.

TrEMBL (Translation of EMBL nucleotide sequence database) initially consisted of computer annotated entries derived from the translation of all coding sequences (CDS) in the DDBJ/EMBL-Bank/GenBank nucleotide sequence database, except for those already included in Swiss-Prot. It now additionally contains protein sequences that are extracted from the literature or submitted to Swiss-Prot.
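
To make the idea of "translating a coding sequence" concrete, here is a minimal sketch in Python. It is only an illustration and not the actual TrEMBL pipeline: the function name `translate_cds` and the toy sequence are my own assumptions, and it only handles the standard genetic code on a clean, in-frame CDS.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds):
    """Translate an in-frame coding sequence (DNA or mRNA) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA input as well
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X marks an unknown codon
        if residue == "*":  # a stop codon ends the protein
            break
        protein.append(residue)
    return "".join(protein)


# Toy CDS invented for this example: ATG AAA CTG TGG TAA
print(translate_cds("ATGAAACTGTGGTAA"))  # MKLW
```

A real pipeline also has to deal with alternative genetic codes, partial sequences and annotation, which this sketch ignores.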

Today the Swiss-Prot and TrEMBL databases play a major role in bioinformatics (in proteomics, to be precise).

For more information about protein sequence databases you can read this post HERE

To learn how to use Swiss-Prot to search for a specific protein (detailed lesson with a video) you can see HERE

For more information about the ExPASy Proteomics Server you can read this post HERE

Any questions are welcome.

Friday, November 27, 2009

Bioinformatics:Proteomics:Protein Databases

Bioinformatics:Proteomics:Protein Databases

There are a number of Protein sequence Databases, but it's very important to distinguish between universal databases covering proteins from all species and specialised data collections storing information about specific families or groups of proteins or about proteins of a specific organism.

Universal Databases:
1- The first database that comes to mind in this category is Swiss-Prot, a protein knowledgebase established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).

You can access the Database from HERE

2- The second database is the Protein Information Resource (PIR), a joint effort between Georgetown University Medical Center and the National Biomedical Research Foundation in Washington, D.C. It was established in 1984 and grew out of the work of Dr. Margaret Dayhoff.

You can Access the Database from HERE

Specialized Databases:
1- The Protein Data Bank (PDB): This Database contains three-dimensional structural data

You can access the Database from HERE

2- InterPro: contains protein signatures, domains, sites, etc.
This database combines a number of member databases, such as PROSITE, PRINTS, Pfam, SMART, TIGRFAMs, PIR SuperFamily (PIRSF), ProDom and others.

You can access the Database from HERE

Note: In this post I mentioned only the most important and best-known databases; there are many others.

Any questions are welcome.

Wednesday, November 25, 2009

Bioinformatics: DNA Microarrays Applications

The main general applications of DNA Microarrays are:

1- Determining the expression patterns of proteins by looking at their mRNAs.
2- Genotyping: detecting variations in gene sequences (single nucleotide polymorphisms, or SNPs, for example).

An introduction to Microarrays can be found here in this post.

To achieve this we have to do a parallel hybridization analysis, where hybridization is the way to detect whether a particular sequence is present in a DNA sample or not.

In order to do a parallel hybridization analysis, we use a large number of DNA Oligomers that are fixed to known locations on a rigid support.

A single DNA chip or array may contain 100,000 probe oligomers.
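
As a rough sketch of what "parallel hybridization" means, the Python snippet below treats each probe as a short oligomer at a known (row, column) position and reports whether its reverse complement occurs in a sample strand. This is a simplification I am assuming for illustration: real hybridization tolerates mismatches and is measured as signal intensity, and the sample and probe sequences here are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridize(sample, probes):
    """Map each array position to True if its probe can pair with the sample strand."""
    sample = sample.upper()
    return {
        position: reverse_complement(probe) in sample
        for position, probe in probes.items()
    }


# Toy data, invented for this example.
sample_strand = "ATGGCGTACGTTAGC"
array_probes = {
    (0, 0): "ACGTACGC",  # its reverse complement GCGTACGT occurs in the sample
    (0, 1): "TTTTTTTT",  # its reverse complement AAAAAAAA does not
}

print(hybridize(sample_strand, array_probes))
# {(0, 0): True, (0, 1): False}
```

Because every probe sits at a known location, a positive spot immediately tells you which sequence was detected.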

Applications of DNA microarrays include:

1- Investigating cellular states and processes: Patterns of expression that change with cellular
state can give clues to the mechanisms of processes such as sporulation, or the change
from aerobic to anaerobic metabolism.

2- Diagnosis of disease: Testing for the presence of mutations can confirm the diagnosis of a
suspected genetic disease, including detection of a late-onset condition such as
Huntington disease. It can also determine whether prospective parents are carriers of a gene that
could threaten their children.

3- Genetic warning signs: Some diseases are not determined entirely and irrevocably by
genotype, but the probability of their development is correlated with genes or their
expression patterns. A person aware of an enhanced risk of developing a condition can in
some cases improve his or her prospects by adjustments in lifestyle.

4- Drug selection: Detection of genetic factors that govern responses to drugs, that in some
patients render treatment ineffective and in others even cause serious adverse reactions.

5- Classification of disease: Different types of leukaemia can be identified by different patterns
of gene expression. Knowing the exact type of the disease is important in selecting optimal
treatment.

6- Target selection for drug design: Proteins showing enhanced transcription in particular
disease states might be candidates for attempts at pharmacological intervention (provided
that it can be demonstrated, by other evidence, that enhanced transcription contributes to
or is essential to the maintenance of the disease state).

7- Pathogen resistance: Comparisons of genotypes or expression patterns, between bacterial
strains susceptible and resistant to an antibiotic, point to the proteins involved in the
mechanism of resistance.

If you have any questions, leave a comment.

Sunday, November 22, 2009

Bioinformatics: Nucleotide Sequence Databases

Bioinformatics: Nucleotide Sequence Databases


Nucleotide sequence databases are databases that contain information about nucleotide sequences, including:
1- Accession number.
2- Definition (name).
3- Organism.
4- Authors that submitted this sequence.
5- Chromosome location.
6- Description and a lot more...
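
To give a feeling for how such an entry can be handled in a program, here is a small Python sketch that reads a few of the fields listed above from a flat-file style text. The entry below is a toy I invented (it is not a real database record), and the layout is only loosely modelled on GenBank keywords such as DEFINITION, ACCESSION, ORGANISM and AUTHORS.

```python
from dataclasses import dataclass


@dataclass
class NucleotideRecord:
    accession: str
    definition: str
    organism: str
    authors: str


# Toy entry, invented for this example (not a real record).
TOY_ENTRY = """\
DEFINITION  Toy example gene, complete cds.
ACCESSION   XX000000
  ORGANISM  Homo sapiens
  AUTHORS   Doe,J. and Roe,R.
"""


def parse_entry(text):
    """Collect 'KEYWORD value' lines into a NucleotideRecord (first occurrence wins)."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2:
            fields.setdefault(parts[0], parts[1].strip())
    return NucleotideRecord(
        accession=fields.get("ACCESSION", ""),
        definition=fields.get("DEFINITION", ""),
        organism=fields.get("ORGANISM", ""),
        authors=fields.get("AUTHORS", ""),
    )


record = parse_entry(TOY_ENTRY)
print(record.accession, "|", record.organism)
```

Real entries have multi-line fields, feature tables and the sequence itself, so for serious work a dedicated parser is a better choice than this sketch.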

There are 3 main nucleotide sequence databases. They are publicly available and exchange data daily, so they stay synchronized.

1- GenBank (National Center for Biotechnology Information).

2- EMBL (European Bioinformatics Institute).

3- DDBJ (DNA Databank Of Japan).

Any comments, you're welcome.

Friday, November 20, 2009

Bioinformatics: Phylogenetic Trees

Bioinformatics: Phylogenetic Trees

Genetic, morphological and biochemical evidence now shows that all organisms on Earth are genetically related, so scientists are searching for what is called "The Tree of Life", which represents the phylogeny of organisms.

What is Phylogeny?

Phylogeny is the history of organismal lineages as they change through time. It implies that different species arise from previous forms via descent, and that all organisms, from the smallest microbe to the largest plants and vertebrates, are connected by the passage of genes along the branches of the phylogenetic tree that links all of Life.



Phylogenetic tree:

The Phylogenetic tree or Evolutionary tree is a tree showing the evolutionary relationship between various species that are thought to have a common Ancestor.

Each internal node in the tree represents the most recent common ancestor of its descendants, and in some trees the edge lengths correspond to estimated time. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units (HTUs), as they cannot be directly observed.

In bioinformatics, software aligns sequences from species that are thought to have a common ancestor (multiple sequence alignment) and calculates the distance between organisms (using the number of mutations, etc.). In the end it displays a graphical view of the tree with its nodes and their corresponding edge lengths.
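
The distance and tree-building steps can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: the sequences are already aligned, the distance is simply the fraction of differing positions (p-distance), and the tree is built with UPGMA (average-linkage clustering), which is only one of several methods real programs use. The toy alignment is invented, and branch lengths are left out to keep the code short.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions that differ, ignoring positions with gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Repeatedly merge the two closest clusters; return the tree in Newick-like form."""
    clusters = {name: [name] for name in seqs}  # cluster label -> member names

    def average_distance(c1, c2):
        total = sum(
            p_distance(seqs[m], seqs[n])
            for m in clusters[c1]
            for n in clusters[c2]
        )
        return total / (len(clusters[c1]) * len(clusters[c2]))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters["(" + c1 + "," + c2 + ")"] = merged
    return next(iter(clusters)) + ";"


# Toy alignment, invented for this example.
toy_alignment = {
    "human": "ATGCTAGCTA",
    "chimp": "ATGCTAGCTG",
    "mouse": "ATGTTAGGTG",
    "chicken": "ACGTTCGGTA",
}

print(upgma(toy_alignment))  # (chicken,(mouse,(human,chimp)));
```

The two most similar sequences are grouped first, and each later merge adds a deeper internal node, which corresponds to an older hypothetical ancestor.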

Any comments are welcome.

Tuesday, November 17, 2009

Books:Introduction to Bioinformatics

Introduction to Bioinformatics by Arthur M. Lesk: 475 Pages



This is a great book for beginners in bioinformatics; if you read it, you'll get a complete picture of the field.

Book's content:

The book contains 7 chapters including:

1- Introduction: the writer provides the initial background (in biology and computer science) needed to understand bioinformatics, including biological information, the World Wide Web, computer science, biological nomenclature, programming, proteomics, genomics, etc.

2- Genome organisation and evolution: genomes, proteomes, differences between eukaryotic and prokaryotic genomes and proteomes, etc.

3- Scientific publications and archives: media, content and access: databases, software, programming languages, etc.

4- Archives and information retrieval: different types of databases, including protein sequence databases, nucleic acid sequence databases, analysis software, etc.

5- Alignments and phylogenetic trees: sequence alignment methods, constructing phylogenetic trees, software, etc.

6- Structural bioinformatics and drug discovery: protein folding methods, prediction of protein function, drug discovery and development, etc.

7- Proteomics and systems biology: DNA microarrays, protein interaction networks, metabolic pathways, exercises, etc.


Sunday, November 15, 2009

Bioinformatics Tutorials & Lessons:Expasy:What's ProtParam?

Bioinformatics Tutorials & Lessons:Expasy:What's ProtParam?

ProtParam is a very useful tool that computes various physico-chemical properties of proteins. All you have to do is enter the protein sequence in raw format, or enter its Swiss-Prot/TrEMBL accession number or ID.

What can ProtParam compute for you?
1- Number of amino acids.
2- Molecular weight.
3- Theoretical pI.
4- Amino acid composition (%).
5- Atomic composition.
6- Extinction coefficients.
7- Estimated half-life.
8- Instability index.
9- Aliphatic index, etc.

To access the ProtParam tool click HERE.
To access its documentation click HERE.
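
To show what a few of these values mean, here is a small Python sketch that computes the number of amino acids, the amino acid composition (%) and the aliphatic index for a raw sequence. It is my own illustration, not ProtParam's code. For the aliphatic index I assume the usual formula, X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The peptide is a toy example.

```python
from collections import Counter


def composition(seq):
    """Return the percentage of each amino acid in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    pct = composition(seq)
    return (
        pct.get("A", 0.0)
        + 2.9 * pct.get("V", 0.0)
        + 3.9 * (pct.get("I", 0.0) + pct.get("L", 0.0))
    )


peptide = "MKLWLVIVAA"  # toy peptide, invented for this example
print("Number of amino acids:", len(peptide))
print("Composition (%):", composition(peptide))
print("Aliphatic index:", round(aliphatic_index(peptide), 1))
```

Values such as molecular weight or theoretical pI need tables of residue masses and pK values, so for those it is safer to rely on the tool itself.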

Any questions, you're welcome.

Friday, November 13, 2009

Bioinformatics: Swine Flu Genome:

Bioinformatics: Swine Flu Genome:

The swine flu A (H1N1) virus is an RNA virus whose genome consists of 8 gene segments. It is a reassortant: its segments come from avian, human and swine influenza A lineages (North American and Eurasian swine flu). Such a combination is considered rare.

The two antiviral drugs Tamiflu and Relenza are available on the market and can lessen the symptoms of swine flu.

However, some swine flu viruses have developed resistance to Tamiflu, and the percentage of resistant cases is growing.

All submitted influenza sequences are now available at GenBank and can be searched with BLAST at NCBI here, along with a set of tools that you can use to analyse the sequences.

So we hope that a cure will be found before the next mutation of the virus.

Thursday, November 12, 2009

Bioinformatics:FASTA format:

Bioinformatics:General Bioinformatics:FASTA format:

FASTA is the name of a popular sequence alignment and database scanning program, created by W.R. Pearson and D.J. Lipman in 1988. The sequence format it reads is also called FASTA.

Sequences in FASTA format look like this:

>My_Sequence_Name
MKLWLVIVISVFSFVIMGTGVVQKAEAELSEEGRQVAKPNAEAQLSGEEYKKANVVQKVEAEL

That is the general format. The first line starts with ">" followed by a definition of your protein or DNA sequence.
The protein or DNA sequence itself begins on the second line and may continue over several lines.

Notes:
* The first line can contain information like:
1- Database name, like sp, which means Swiss-Prot.
2- Database accession number, like (Q3LGA9).
3- Protein or DNA sequence name.
4- Organism, for example Homo sapiens, etc.
* The sequences use one-letter codes, usually in capital letters. The software reads the sequence starting from the line after the ">" line and continues until the end of the sequence (the end of the file, if there is only one sequence in it, or the next ">" line otherwise).

FASTA is the de facto standard sequence format because it's easy to parse; that's why most analysis software, such as BLAST and CLUSTALW, accepts it.
Some programs use the RAW format, which is FASTA format without the first line (the definition line).
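
To show how easy the format is to parse, here is a minimal Python sketch. The function name `parse_fasta` and the two toy records are my own; it simply follows the rules described above (a ">" line starts a record, and everything until the next ">" line is sequence).

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input, invented for this example.
example = """>My_Sequence_Name
MKLWLVIV
ISVFSFVI
>Another_Sequence
ACGTACGT
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence), sequence)
```

Dropping the definition line and joining the remaining lines gives you the RAW format mentioned above.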

Any questions? you're welcome.

Tuesday, November 10, 2009

ExPASy Proteomics Server

Bioinformatics:Proteomics:ExPASy Proteomics Server:

Introduction:

The ExPASy Proteomics Server is a large portal that hosts a variety of databases and many software tools used in molecular biology for analysing proteins.

ExPASy offers a lot of resources, including:

1- Databases (SWISS-PROT, PROSITE, ViralZone, etc.).
2- Tools and software to analyse proteins (similarity searches, post-translational modifications, protein structure prediction).

The databases included in the Expasy Proteomics Server are:

1- SWISS-PROT knowledgebase: a curated protein sequence database that provides high quality annotations (such as the description of the function of a protein, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases.
2- TrEMBL: contains computer-annotated entries for all sequences not yet integrated in SWISS-PROT. SWISS-PROT and TrEMBL are maintained collaboratively by the SIB and the European Bioinformatics Institute (EBI).
3- PROSITE: a database of protein domains and families. PROSITE contains biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs.
4- SWISS-MODEL Repository: a database of automatically generated structural protein models

And a lot of other Databases.
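
To give an idea of what a PROSITE "pattern" looks like in practice, here is a small Python sketch that converts a pattern into a regular expression and scans a sequence with it. The conversion covers only the basic syntax elements (x, [..], {..}, (n) or (n,m) repeats, < and > anchors) and is my own simplification, not the official scanning tool. The pattern used is the N-glycosylation site pattern N-{P}-[ST]-{P}, and the sequence is a toy.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")                     # element separators
    regex = regex.replace("x", ".")                    # x = any amino acid
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repeat counts
    regex = regex.replace("<", "^").replace(">", "$")   # sequence start / end
    return regex


def scan(sequence, pattern):
    """Return (0-based position, matched text) for each non-overlapping match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))       # N[^P][ST][^P]
print(scan("MKNGTAANPSL", "N-{P}-[ST]-{P}"))    # [(2, 'NGTA')]
```

The second asparagine in the toy sequence is followed by a proline, so the pattern correctly rejects it.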

The software it includes:

There is a huge amount of software, so we will only talk about some of it:

1- Protein identification and characterization (Aldente, PepMAPPER, etc.).
2- Prediction or characterization tools (ProtParam, PeptideMass, etc.).
3- DNA to protein tools (Translation, etc.).
4- Similarity searches (BLAST).
5- Pattern and profile searches (InterProScan, PROSITE Scan).
6- Post-translational modification prediction (LipoP, Predotar).
7- Primary, secondary and tertiary structure analysis and prediction.
8- Molecular modeling and visualization tools.

If I went on, I would never finish this list, so I'll stop here.

To access the ExPASy Proteomics Server click HERE
Or search Google for the term ExPASy and click the first entry.

Any questions are welcome.

Sunday, November 8, 2009

Bioinformatics Tutorials & Lessons: Use SWISSPROT to search for a specific protein:

Bioinformatics Tutorials & Lessons: use SWISSPROT to search for a specific protein:

Let's say that you have a specific protein and you want to do some research about it, including:
1- Information about organisms that have this protein.
2- The function of this protein.
3- The protein sequence.
4- Complete references about this protein, etc.
And a lot of other features and information.

Let's take the protein "Myosin" as an example; you can choose any protein you're interested in.
The first step is to go to the SWISSPROT website at ExPASy:
1- Enter the site directly from here, or go to Google and search for SWISSPROT; the first ExPASy result is the SWISSPROT website. You'll see something like this:



2- Enter your protein name in the search box shown by the red arrow in the picture above.
* In this tutorial I'm searching for the protein "myosin".
3- Click the GO button to start the search.
4- You will see a result page like the one below:


The result set is too large and not precise, because it also includes related proteins, so we need to set a couple of things to get the results we need.
5- To do that, click on "fields" as shown below:


You will see this:


From 1 (shown by the arrow) you can choose AND, NOT or OR.
AND: adds a condition that must also match, such as an organism name.
NOT: excludes results that contain the word you write, for example to eliminate an organism.
OR: searches for either term, for example myosin OR actin.

From 2 we can choose our field, like protein name, organism, gene name, etc.
From 3, the term section, we can enter the word we want to add to the search.
6- We set AND, and choose Protein name from the field dropdown menu.
7- In the term section we write Myosin, to search only for proteins whose names contain Myosin and exclude related proteins.
8- We click Add & Search, and we will see this:


We notice that the number of hits has dropped, and that these hits show only proteins whose name contains the word Myosin.
Let's say that you want only the protein in a specific organism, like Homo sapiens. Then we repeat steps 5 to 8: click "fields", choose AND, choose Organism from the field dropdown menu and write Homo sapiens in the term field.
9- We click Add & Search and we'll get this:



As you can see, the number of hits has dropped from more than 200 in the first search to 11 here.
Because there are several myosin proteins and chains, we will choose Myosin-Va as an example.
10- By clicking the accession number shown below you will be taken to the information for this entry.


The information about this protein is organised by category:
* Names and origin.
* Protein attributes.
* General annotation (protein function, subunit structure, etc.).
* References.

Our interest for now is the Sequences section, where we'll find the protein sequence.
We can see the sequence below:



To see the protein sequence without the position numbers, click on 1, shown by the red arrow.
To run a BLAST search for similarities with other proteins, choose Blast and click go, as shown at 2 by the red arrow.
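
The AND / NOT / OR field search used in steps 5 to 9 can be imitated on a small in-memory list, which may help to see how each operator narrows or widens the results. This Python sketch is only an analogy I am assuming for illustration: the records, their IDs and the `search` function are invented, and the real website's query engine certainly works differently.

```python
# Toy records, invented for this example (the IDs are not real accession numbers).
RECORDS = [
    {"id": "TOY1", "protein_name": "Myosin-Va", "organism": "Homo sapiens"},
    {"id": "TOY2", "protein_name": "Myosin-Va", "organism": "Mus musculus"},
    {"id": "TOY3", "protein_name": "Myosin light chain kinase", "organism": "Homo sapiens"},
    {"id": "TOY4", "protein_name": "Actin, cytoplasmic 1", "organism": "Homo sapiens"},
]


def matches(record, field, term):
    """Case-insensitive 'contains' test on one field."""
    return term.lower() in record[field].lower()


def search(records, clauses):
    """Apply (operator, field, term) clauses from left to right; return matching IDs."""
    selected = []
    for record in records:
        keep = None
        for operator, field, term in clauses:
            hit = matches(record, field, term)
            if keep is None:  # first clause sets the starting state
                keep = (not hit) if operator == "NOT" else hit
            elif operator == "AND":
                keep = keep and hit
            elif operator == "OR":
                keep = keep or hit
            elif operator == "NOT":
                keep = keep and not hit
        if keep:
            selected.append(record["id"])
    return selected


# Protein name contains "myosin" AND organism is Homo sapiens.
print(search(RECORDS, [("AND", "protein_name", "myosin"),
                       ("AND", "organism", "Homo sapiens")]))
# ['TOY1', 'TOY3']
```

Adding an OR clause widens the result set, while every AND or NOT clause narrows it, which is exactly why the number of hits dropped at each step of the tutorial.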


Use SWISSPROT to search for a specific protein
(Video Lesson on Youtube)





Any comment or question, you're welcome.













Friday, November 6, 2009

Bioinformatics:Genomics:Microarrays

Bioinformatics:Genomics:Microarrays:


Microarrays are microchips used in molecular biology and medicine to perform many useful tests, including gene expression analysis.

To understand this technology, we should keep two things in mind:
1- Not all of the genome codes for proteins.
2- Not all genes are turned on all the time.

We use the term gene expression to describe the transcription of the information contained within the DNA into mRNA, which is afterwards translated into proteins.

Scientists study these genes to identify which of them are expressed and which are not.

Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically both to environmental stimuli and to its own changing needs.

This mechanism acts both as an "on/off" switch that controls which genes are expressed in a cell and as a "volume control" that increases or decreases the level of expression of particular genes as necessary.

Measuring this is what DNA microarrays are used for.
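
To make the "volume control" idea a bit more concrete, here is a hedged Python sketch of how spot intensities from two conditions are often compared: the log2 ratio of sample to reference is computed for each gene, and the gene is labelled "up", "down" or "unchanged". The gene names and intensity values are invented, and the two-fold threshold (a log2 ratio of 1) is just a common convention I am assuming, not a rule.

```python
import math


def classify_expression(sample, reference, threshold=1.0):
    """Label each gene by its log2(sample / reference) intensity ratio."""
    calls = {}
    for gene, value in sample.items():
        log_ratio = math.log2(value / reference[gene])
        if log_ratio >= threshold:
            label = "up"
        elif log_ratio <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        calls[gene] = (round(log_ratio, 2), label)
    return calls


# Toy intensities, invented for this example.
treated = {"geneA": 800.0, "geneB": 150.0, "geneC": 300.0}
control = {"geneA": 200.0, "geneB": 600.0, "geneC": 310.0}

for gene, (log_ratio, label) in classify_expression(treated, control).items():
    print(gene, log_ratio, label)
```

Real analyses also normalise the raw intensities and use replicates and statistics before calling a gene up or down.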

To understand such a process, there is nothing better than animations!

Here are some animations that I found very useful for fully understanding microarrays:

Note: The first animation is pretty simple and good for beginners:

Animation1

Animation2

Animation3

Any question, you're welcome.

Thursday, November 5, 2009

Bioinformatics:Genomics:DNA Sequencing

Bioinformatics:Genomics:DNA Sequencing:

Bioinformatics relies first and foremost on data (genomics, proteomics, etc.); without data, bioinformatics has nothing to analyse.

Before we can use analysis software we need DNA or protein sequences, so the first thing we have to do is sequencing.

The term DNA sequencing refers to the methods used to determine the order of the nucleotides or bases (adenine, guanine, cytosine, thymine) in a DNA molecule.

With the advancement of technology, DNA sequencing has become indispensable for most biological research, because it's the only way to provide almost complete and accurate data.

DNA Sequencing methods:
There are many methods of DNA sequencing, but I'd like to introduce the Sanger method, explained by the beautiful and easy animation HERE


I picked the Sanger (dideoxy) method because it's the most commonly used and the easiest to apply.
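
If you prefer code to animations, the core idea of the dideoxy method can be imitated with a very simplified Python sketch. The assumptions are mine: the primer is ignored, every possible terminated fragment is assumed to be present, and the "gel" is just a sort by fragment length. The template is a toy sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template):
    """For each dideoxy base, list the lengths at which synthesis would terminate."""
    # The newly synthesised strand is the reverse complement of the template.
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template.upper()))
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a ddNTP of this base stops the chain here
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = sanger_fragments("TACGGA")  # toy template, invented for this example
print(lanes)            # {'A': [6], 'C': [2, 3], 'G': [4], 'T': [1, 5]}
print(read_gel(lanes))  # TCCGTA
```

Each lane tells you at which lengths a given base ends a fragment, so reading the lanes from the shortest fragment to the longest spells out the new strand.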

If you have any questions, leave a comment.

Tuesday, November 3, 2009

Bioinformatics: Proteomics

Bioinformatics: Proteomics

The word proteome comes from the combination of "protein" and "genome".

Proteomics is the study of proteins, especially their structure and function, and it's the next step after genomics.

We all know that proteins are "the molecules of life", as they say, because they are the acting molecules in every living organism. So by studying genomics and genes alone, we don't get everything, because:

1- Not all of the genome codes for proteins (there are non-coding regions, such as the introns that separate the coding exons).
2- Proteins undergo post-translational modifications after they are translated (phosphorylation, etc.).

The study of proteomics is more complicated than genomics because we're studying something variable, which differs from cell to cell and from time to time.

By studying proteins we can discover:
1- Active sites that interact with other molecules.
2- The functions of these proteins.
3- Their location (transmembrane, outside or inside the cell), etc.

Proteomics has helped to address many problems in several sciences, for example Alzheimer's disease and heart disease in medicine.

The main source of protein sequences and information is the huge Swiss-Prot database; thanks to it, protein analysis is now a piece of cake. You can find the database here: SWISSPROT

Bioinformatics, Genomics

Bioinformatics, Genomics:

Genomics is the study of the entire genome of a species (the sum of all the genes of an organism) and of how the genes interact with each other. In contrast, the study of a single gene is the role of molecular biology and genetics.
The study of genomes covers the DNA, RNA and protein levels.

Recently there have been extensive genome sequencing projects, like the HGP (Human Genome Project) and projects for many other species (animals, insects, bacteria, viruses, etc.).

You can find the human genome sequence and those of many other species in the
UCSC Genome Browser
You'll find it a little complicated at first, but you'll get familiar with it very fast.

The UCSC Genome Browser now contains more than 45 complete genomes.

With the huge amount of sequence data produced by sequencing projects, there is no way to analyse it without bioinformatics tools. That's good for us, because with more than 3 billion bp to read, our brains would explode before reaching the 30th base!!!

Any questions or comments, you're welcome.

Monday, November 2, 2009

Introduction to bioinformatics

Bioinformatics, as its name suggests, is the use of computer science (hardware and software) to analyse biological data. This data comes from genetics, molecular biology, microbiology, virology and many other areas of biology.

Genetic material is responsible for the design of all living organisms, so the better we understand the genetic code, the better we can deal with the malfunctions that arise from it (diseases).

Bioinformatics links to most branches of biology (genetics, molecular biology, microbiology, epidemiology, phylogeny, zoology, etc.)



Advances in technology have made bioinformatics practical: the human brain can't handle this huge amount of biological data, whereas home computers can now perform about 4 billion operations per second.

The main role of bioinformatics today is to do what the human brain can't do, including:

1- Analysing and comparing immense amounts of genetic data (code).
2- Finding similarities with the genetic code of other species.
3- Searching databases (GenBank or Swiss-Prot) for a query sequence.
4- Establishing phylogenetic relationships between species.
5- Finding 3D protein structures to better understand active sites, etc.

And the list goes on and on.

Nowadays bioinformatics is making huge advances and providing accurate answers to medicine and other sciences, as in the case of Tamiflu (a drug for seasonal flu and swine flu), which was designed with the help of computers.

If you have comments or questions, post them.