Swiss-Prot is a protein knowledgebase established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
The Swiss-Prot database provides a high level of annotation (a detailed file for each entry) that is maintained by expert biologists, a high level of integration with other databases, and a low level of redundancy.
The documentation is easy to follow, even for beginners in the field.
The TrEMBL protein sequence database was created in 1996 as a complement to Swiss-Prot in response to the need to make new sequences available as quickly as possible.
TrEMBL (Translation of EMBL nucleotide sequence database) initially consisted of computer-annotated entries derived from the translation of all coding sequences (CDS) in the DDBJ/EMBL-Bank/GenBank nucleotide sequence databases, except for those already included in Swiss-Prot. It now additionally contains protein sequences that are extracted from the literature or submitted to Swiss-Prot.
Today the Swiss-Prot and TrEMBL databases play a major role in bioinformatics (in proteomics, to be precise).
There are a number of protein sequence databases, but it's very important to distinguish between universal databases covering proteins from all species and specialised data collections storing information about specific families or groups of proteins, or about proteins of a specific organism.
Universal Databases: 1- The first database that comes to mind for this category is the great Swiss-Prot, a protein knowledgebase established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
2- The second database is the Protein Information Resource (PIR), a joint effort between Georgetown University Medical Center and the National Biomedical Research Foundation in Washington, D.C. It was established in 1984 and resulted from the work of Dr. Margaret Dayhoff.
2- InterPro: contains protein signatures, domains, sites...etc. This database combines a number of databases such as PROSITE, PRINTS, Pfam, SMART, TIGRFAMs, PIR SuperFamily (PIRSF), ProDom and others.
The main general applications of DNA Microarrays are:
1- Determining the expression patterns of proteins by looking at mRNAs. 2- Genotyping: detecting variations in gene sequences (single nucleotide polymorphisms, SNPs, for example).
To achieve this we have to do a parallel hybridization analysis. Hybridization is the way to detect whether a particular sequence is present in a DNA sample or not.
In order to do a parallel hybridization analysis, we use a large number of DNA oligomers (probes) that are fixed to known locations on a rigid support.
One DNA chip or array may contain 100,000 probe oligomers.
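As a rough illustration of the idea of parallel hybridization, the sketch below treats hybridization as exact reverse-complement matching: a spot "lights up" only if the sample contains the sequence complementary to its probe. The probe names and sequences are made up for the example, and real hybridization tolerates some mismatches and depends on conditions such as temperature, which this toy model ignores.

```python
# Toy model of parallel hybridization (illustrative assumption: a probe
# hybridizes only if its exact reverse complement occurs in the sample).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridize(sample, probes):
    """Map each probe name to True if the probe would bind the sample."""
    sample = sample.upper()
    return {
        name: reverse_complement(probe) in sample
        for name, probe in probes.items()
    }


if __name__ == "__main__":
    # Hypothetical probes fixed at known locations on the array.
    probes = {"spot_1": "TTGCA", "spot_2": "GGGGG"}
    sample = "ACGTTGCAAGGT"
    for spot, bound in hybridize(sample, probes).items():
        print(spot, "signal" if bound else "no signal")
```

Because every probe sits at a known location, knowing which spots give a signal tells you which sequences are present in the sample.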
Applications of DNA microarrays include:
1- Investigating cellular states and processes: Patterns of expression that change with cellular state can give clues to the mechanisms of processes such as sporulation, or the change from aerobic to anaerobic metabolism.
2- Diagnosis of disease: Testing for the presence of mutations can confirm the diagnosis of a suspected genetic disease, including detection of a late-onset condition such as Huntington disease, or determine whether prospective parents are carriers of a gene that could threaten their children.
3- Genetic warning signs: Some diseases are not determined entirely and irrevocably by genotype, but the probability of their development is correlated with genes or their expression patterns. A person aware of an enhanced risk of developing a condition can in some cases improve his or her prospects by adjustments in lifestyle.
4- Drug selection: Detection of genetic factors that govern responses to drugs, which in some patients render treatment ineffective and in others even cause serious adverse reactions.
5- Classification of disease: Different types of leukaemia can be identified by different patterns of gene expression. Knowing the exact type of the disease is important in selecting optimal treatment.
6- Target selection for drug design: Proteins showing enhanced transcription in particular disease states might be candidates for attempts at pharmacological intervention (provided that it can be demonstrated, by other evidence, that enhanced transcription contributes to or is essential to the maintenance of the disease state).
7- Pathogen resistance: Comparisons of genotypes or expression patterns, between bacterial strains susceptible and resistant to an antibiotic, point to the proteins involved in the mechanism of resistance.
Nucleotide sequence databases are databases that contain information about nucleotide sequences, including: 1- Accession number. 2- Definition (name). 3- Organism. 4- Authors who submitted the sequence. 5- Chromosome location. 6- Description, and a lot more...
There are 3 main nucleotide sequence databases (DDBJ, EMBL-Bank and GenBank) that are synchronized daily and publicly available.
Genetic, morphological and biochemical evidence now shows that all organisms on Earth are genetically related, so scientists are searching for what's called "The Tree of Life", which represents the phylogeny of organisms.
What is Phylogeny?
Phylogeny is the history of organismal lineages as they change through time. It implies that different species arise from previous forms via descent, and that all organisms, from the smallest microbe to the largest plants and vertebrates, are connected by the passage of genes along the branches of the phylogenetic tree that links all of Life.
Phylogenetic tree:
The phylogenetic tree, or evolutionary tree, is a tree showing the evolutionary relationships between various species that are thought to have a common ancestor.
Each internal node in the tree represents the most recent common ancestor of its descendants, and in some trees the edge lengths correspond to estimated time. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units (HTUs) as they cannot be directly observed.
In bioinformatics, software aligns sequences from species that are thought to have a common ancestor (multiple sequence alignment) and calculates the distance between organisms (using the number of mutations...etc). In the end it displays a graphical view of the tree with nodes and their corresponding edge lengths.
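To make the "distance between organisms" step a bit more concrete, here is a minimal sketch that computes one simple distance, the p-distance (the fraction of aligned positions that differ), for every pair of sequences in an alignment. The species names and sequences are invented, and real phylogeny software usually applies more elaborate substitution models before building the tree.

```python
# Pairwise p-distances between already-aligned sequences.
# The species names and sequences below are made up for illustration.

from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions, ignoring columns that contain a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


if __name__ == "__main__":
    alignment = {
        "species_A": "ACGTACGT",
        "species_B": "ACGTACGA",
        "species_C": "ACCTAGGA",
    }
    for pair, dist in distance_matrix(alignment).items():
        print(pair, round(dist, 3))
```

A tree-building method would then take this distance matrix as its input, grouping the closest sequences first.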
Introduction to Bioinformatics by Arthur M. Lesk: 475 Pages
This is a great book for beginners in bioinformatics; if you read it you'll get a complete picture of the field.
Book's content:
The book contains 7 chapters including:
1- Introduction: the author provides the initial biological and computer science background needed to understand bioinformatics (bioinformatics information, the World Wide Web, computer science, biological nomenclature, programming, proteomics, genomics...etc).
2- Genome organisation and evolution: genomes, proteomes, differences between eukaryotic and prokaryotic genomes and proteomes...etc
3- Scientific publications and archives: media, content and access: databases, software, programming languages...etc
4- Archives and information retrieval: different types of databases including protein sequence databases, nucleic acid sequence databases, analysis software...etc
ProtParam is a very useful tool that computes various physico-chemical properties of proteins. All you have to do is enter the protein sequence in raw format or give its Swiss-Prot/TrEMBL accession number or ID.
What can ProtParam do for you? It reports: 1- Number of amino acids. 2- Molecular weight. 3- Theoretical pI. 4- Amino acid composition (%). 5- Atomic composition. 6- Extinction coefficients. 7- Estimated half-life. 8- Instability index. 9- Aliphatic index...etc.
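To give a feel for what a few of these values mean, here is a small sketch that computes three of them locally: the number of amino acids, the amino acid composition (%) and the aliphatic index. This is not ProtParam's own code. The aliphatic index uses the commonly cited formula (mole percent of Ala + 2.9 x Val + 3.9 x (Ile + Leu)), and the example sequence is made up.

```python
# A few ProtParam-style values computed locally (a sketch, not ProtParam itself).

from collections import Counter


def clean(sequence):
    """Upper-case a raw sequence and drop spaces and line breaks."""
    return "".join(sequence.split()).upper()


def composition(sequence):
    """Return {amino acid: percentage of the sequence}."""
    seq = clean(sequence)
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    comp = composition(sequence)
    ala = comp.get("A", 0.0)
    val = comp.get("V", 0.0)
    ile_leu = comp.get("I", 0.0) + comp.get("L", 0.0)
    return ala + 2.9 * val + 3.9 * ile_leu


if __name__ == "__main__":
    raw = "MKVLAAGIVA LLLA"  # hypothetical sequence in raw format
    print("Number of amino acids:", len(clean(raw)))
    for aa, pct in composition(raw).items():
        print(f"{aa}: {pct:.1f}%")
    print("Aliphatic index:", round(aliphatic_index(raw), 2))
```

For real work the ProtParam page remains the reference, since it also handles values such as the theoretical pI and the instability index that need lookup tables not shown here.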
The swine flu A (H1N1) virus is an RNA virus whose genome consists of 8 segments. It is a reassortant that combines segments from avian, human and swine influenza A viruses (North American and Eurasian swine lineages), a combination that is considered rare.
The two antiviral drugs Tamiflu and Relenza are available on the market and can lessen the symptoms of swine flu.
But some swine flu strains have developed resistance to Tamiflu, and the percentage of resistant strains is growing.
All submitted influenza sequences are now available in GenBank and can be searched with BLAST at NCBI, which also provides a set of tools for analysing the sequences.
So we hope that a cure will be found before the next mutation of the virus.
That is the general format: the first line has a ">" at the beginning, followed by a definition of your protein or DNA sequence. The second line is where your protein or DNA sequence begins.
Notes: * The first line can contain information like: 1- Database name, like sp which means Swiss-Prot. 2- Database accession number, like Q3LGA9. 3- Protein or DNA sequence name. 4- Organism, for example Homo sapiens...etc * The sequences use one-letter codes in capitals. The software reads the sequence starting from the line after the one that contains the ">" sign and continues until the end of the sequence (the end of the file, if there is only one sequence in it).
The FASTA format is the default sequence format because it's easy to parse; that's why most analysis software, like BLAST and CLUSTALW, uses it. Some programs use the RAW format, which is the FASTA format without the first line (the definition line).
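To show how easy FASTA is to parse, here is a minimal sketch of a parser. It assumes the text is already in memory and that the definition line looks like "db|accession|name description". The entry name and sequence in the example are invented; only the sp prefix and the accession Q3LGA9 come from the notes above.

```python
# Minimal FASTA reader: yields (definition line, sequence) pairs.


def parse_fasta(text):
    """Yield (header, sequence) for each record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split a 'db|accession|name description' definition line into parts."""
    ident, _, description = header.partition(" ")
    fields = ident.split("|")
    if len(fields) == 3:
        db, accession, name = fields
    else:
        db, accession, name = None, None, ident
    return {"db": db, "accession": accession, "name": name,
            "description": description}


if __name__ == "__main__":
    # Hypothetical record: the entry name and the sequence are made up.
    example = """>sp|Q3LGA9|EXAMPLE_HUMAN Example protein, Homo sapiens
MKVLAAGIVA
LLLAGCSSNA
"""
    for header, sequence in parse_fasta(example):
        print(split_header(header))
        print(sequence)  # this joined string is the RAW format
```

The sequence returned for each record is exactly what the RAW format contains: the residues with the definition line removed.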
The ExPASy Proteomics Server is a huge resource that brings together a variety of databases and a lot of tools and software used in molecular biology for analysing proteins.
ExPASy offers a lot of resources including:
1- Databases (SWISS-PROT, PROSITE, ViralZone...etc). 2- Tools and software to analyse proteins (similarity searches, post-translational modifications, protein structure prediction).
The databases included in the ExPASy Proteomics Server are:
1- SWISS-PROT knowledgebase: a curated protein sequence database that provides high-quality annotations (such as the description of the function of a protein, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases.
2- TrEMBL: contains computer-annotated entries for all sequences not yet integrated in SWISS-PROT. SWISS-PROT and TrEMBL are maintained collaboratively by the SIB and the European Bioinformatics Institute (EBI).
3- PROSITE: a database of protein domains and families. PROSITE contains biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs.
4- SWISS-MODEL Repository: a database of automatically generated structural protein models.
And a lot of other databases.
ExPASy also hosts a huge amount of software, so we will talk about only some of it:
1- Protein identification and characterization software (Aldente, PepMAPPER...etc).
2- Prediction or characterization tools (ProtParam, PeptideMass...etc).
3- DNA to protein software (Translation...etc).
4- Similarity searches (BLAST).
5- Pattern and profile searches (InterPro Scan, PROSITE Scan).
6- Post-translational modification prediction (LipoP, Predotar).
7- Primary, secondary and tertiary structure analysis and prediction.
8- Molecular modeling and visualization tools.
If I went on I'd never finish this list, so I'll stop here.
To access the ExPASy Proteomics Server, search Google for the term ExPASy and click the first entry.
Tutorials & Lessons: use SWISSPROT to search for a specific protein:
Let's say that you have a specific protein and you want to do some research about it, including: 1- Information about organisms that have this protein. 2- The function of this protein. 3- The protein sequence. 4- Complete references about this protein...etc, and a lot of other features and information.
The protein is, for example, "Myosin"; you can choose any protein you're interested in. The first step is to go to the SWISSPROT website at ExPASy: 1- Enter the site directly, or go to Google and search for SWISSPROT; the first result at ExPASy is the SWISSPROT website. You'll see something like this:
2- Enter your protein name in the search box shown by the red arrow in the picture above. * In this tutorial I'm searching for the protein "myosin" as an example. 3- Click the GO button to start the search. 4- You will see a result page like the one below:
The result set is too large and not precise, since it also includes related proteins, so we need to set a couple of things to get the results we need. 5- To do that, click on "fields" as shown below:
You will see this:
From 1, shown by the arrow, you can choose AND, NOT or OR. AND: adds a condition that must also be met, such as an organism name. NOT: eliminates results that contain the word you write, for example to exclude an organism. OR: searches for either term, for example myosin OR actin.
From 2 we can choose the field, like protein name, organism, gene name...etc. From 3, the term section, we can enter the word we want to add to the search. 6- We set AND, and choose Protein name from the field dropdown menu. 7- In the term section we write Myosin, to search only for proteins whose names contain Myosin and exclude related proteins. 8- We click Add & Search, and we will see this:
We notice that the number of hits has dropped and that the hits now show only proteins whose name contains the word Myosin. Let's say, for example, that you want only the protein in a specific organism like Homo sapiens. Then we repeat steps 5 to 8: click "fields", choose AND and Organism from the field dropdown menu, and write Homo sapiens in the Term field. 9- We click Add & Search and we'll get this:
As you can see, the number of hits has dropped from more than 200 in the first search to 11 here. Because the protein myosin has several chains, we will choose, for example, Myosin-Va. 10- By clicking the accession number shown below you will be taken to the information for this entry.
The information about this protein is organised by category: * Names and origin. * Protein attributes. * General annotation (protein function, subunit structure...etc). * References.
Our interest for now is the Sequences section, where we'll find the protein sequence. We can see the sequence below:
To see the protein sequence without the position numbers, click on 1, shown by the red arrow. To run a BLAST search for similarities with other proteins, choose Blast and click go, as shown at 2 by the red arrow.
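The AND / NOT / OR filtering used in steps 5 to 9 can be sketched in a few lines. This is only an illustration of the logic, not how the SWISSPROT search is implemented: the entries, field names and matching rule (case-insensitive substring) are assumptions made up for the example.

```python
# Sketch of AND / NOT / OR field filtering on a tiny, invented set of entries.


def matches(entry, conditions):
    """Apply (operator, field, term) conditions in order to one entry."""
    result = True
    for operator, field, term in conditions:
        hit = term.lower() in entry.get(field, "").lower()
        if operator == "AND":
            result = result and hit
        elif operator == "NOT":
            result = result and not hit
        elif operator == "OR":
            result = result or hit
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


def search(entries, conditions):
    """Return the entries that satisfy all the conditions."""
    return [entry for entry in entries if matches(entry, conditions)]


if __name__ == "__main__":
    # Hypothetical entries; the field names are assumptions for this sketch.
    entries = [
        {"protein_name": "Myosin-Va", "organism": "Homo sapiens"},
        {"protein_name": "Myosin-Va", "organism": "Mus musculus"},
        {"protein_name": "Actin", "organism": "Homo sapiens"},
    ]
    conditions = [
        ("AND", "protein_name", "Myosin"),
        ("AND", "organism", "Homo sapiens"),
    ]
    for entry in search(entries, conditions):
        print(entry)
```

Each extra AND condition can only shrink the result list, which is why the number of hits drops at every step of the tutorial.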
Use SWISSPROT to search for a specific protein (Video Lesson on Youtube)
Microarrays are microchips used in molecular biology and medicine to perform a lot of useful tests, including measuring gene expression.
To understand this technology, we should keep two things in mind: 1- Not all of the genome codes for proteins. 2- Not all genes are turned on all the time.
We use the term gene expression to describe the transcription of the information contained within the DNA into mRNA, which is then translated into proteins.
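The two steps of gene expression can be sketched in code. The example below is a simplification: it takes the coding strand of a short, made-up gene, "transcribes" it by replacing T with U, and translates the mRNA codon by codon with the standard genetic code until a stop codon. Real genes also involve promoters, splicing and regulation, none of which is modelled here.

```python
# DNA -> mRNA -> protein with the standard genetic code (simplified sketch).

from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA from its first codon until a stop codon or the end."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTTTTAA"  # hypothetical coding sequence
    mrna = transcribe(gene)
    print(mrna)
    print(translate(mrna))
```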
Scientists have to study these genes to identify which of them are expressed and which are not.
Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically both to environmental stimuli and to its own changing needs.
This mechanism acts as both an "on/off" switch to control which genes are expressed in a cell as well as a "volume control" that increases or decreases the level of expression of particular genes as necessary.
So that's what DNA microarrays are used for.
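One common way to put numbers on the "on/off switch" and the "volume control" is to compare the signal of each gene between two conditions as a log2 ratio. The sketch below assumes you already have one intensity per gene and per condition. The gene names, intensities, pseudocount and threshold are all invented for the example, and real microarray analysis adds normalisation and statistics on top of this.

```python
# Log2 ratio of expression between two conditions (illustrative sketch).

import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of two intensities; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(treated, control, threshold=1.0):
    """Label a gene 'up', 'down' or 'unchanged' (threshold 1.0 = two-fold)."""
    change = log2_fold_change(treated, control)
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical spot intensities: gene -> (treated, control).
    intensities = {
        "gene_a": (1599.0, 199.0),
        "gene_b": (99.0, 799.0),
        "gene_c": (500.0, 480.0),
    }
    for gene, (treated, control) in intensities.items():
        change = log2_fold_change(treated, control)
        print(gene, round(change, 2), classify(treated, control))
```

A positive value means the gene is "louder" in the treated sample and a negative value means it is "quieter", while values near zero mean no real change.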
To understand such a process, there is nothing better than animations!
Here are some animations that I found very useful for fully understanding microarrays:
Note: the first animation is pretty simple and good for beginners:
Bioinformatics relies first and foremost on data (genomics, proteomics...etc); without data, bioinformatics has nothing to analyse.
Before we can use analysis software we need DNA or protein sequences, so the first thing we have to do is sequencing.
The term DNA sequencing refers to the methods used to determine the order of the nucleotides, or bases (adenine, guanine, cytosine, thymine), in DNA.
With the advancement of technology, DNA sequencing has become indispensable for most biological research because it provides almost complete and accurate data.
DNA sequencing methods: there are many methods of DNA sequencing, but I'd like to introduce the Sanger method, which is explained by a beautiful and easy animation.
I picked the Sanger (dideoxy) method because it's the most commonly used and the easiest to apply.
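The logic behind reading a Sanger gel can be sketched as a toy simulation. In each of the four reactions, synthesis stops wherever the matching dideoxy base is incorporated, so every fragment length tells you which base sits at that position. The sketch idealises things heavily: it assumes one fragment for every position and works directly on the newly synthesised strand, and the example sequence is made up.

```python
# Toy simulation of reading a sequence from Sanger (dideoxy) fragments.


def sanger_fragments(new_strand):
    """For each ddNTP reaction, list the lengths of the terminated fragments."""
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        reactions[base].append(position)  # a fragment ends at this position
    return reactions


def read_gel(reactions):
    """Read the sequence by ordering all fragments from shortest to longest."""
    bands = [
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    ]
    return "".join(base for _, base in sorted(bands))


if __name__ == "__main__":
    strand = "GATTACA"  # hypothetical newly synthesised strand
    reactions = sanger_fragments(strand)
    for base, lengths in reactions.items():
        print(f"dd{base} reaction, fragment lengths: {lengths}")
    print("Sequence read from the gel:", read_gel(reactions))
```

Sorting the fragments by length plays the role of the gel: the shortest fragment gives the first base, the next shortest the second base, and so on.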
The word proteome comes from the combination of "protein" and "genome".
Proteomics is the study of proteins, especially their structure and function, and it's the next step after genomics.
Proteins are often called the molecules of life because they are the acting molecules in every living organism. Studying genomics and genes alone doesn't give us everything, because:
1- Not all of the genome codes for proteins (it contains non-coding regions such as introns, unlike the coding exons). 2- Proteins undergo post-translational modifications after they are translated (phosphorylation...etc).
The study of proteomics is more complicated than genomics because we're studying something variable that differs from cell to cell and from time to time.
By studying proteins we can discover: 1- Active sites that interact with other molecules. 2- The functions of these proteins. 3- Their location (transmembrane, outside or inside the cell)...etc.
Proteomics has helped to solve a lot of problems and mysteries in many sciences, such as Alzheimer's disease and heart disease in medicine...etc
The main source of protein sequences and information is the huge Swiss-Prot database; thanks to it, protein analysis is now a piece of cake. You can find the database here: SWISSPROT
Genomics is the study of the entire genome of a species (the sum of all the genes of an organism) and the interactions of the genes with each other. In contrast, the study of a single gene is the role of molecular biology and genetics. The study of genomes covers the DNA, RNA and protein levels.
Recently there have been extensive genome sequencing projects, like the HGP (Human Genome Project), and projects for a lot of other species (animals, insects, bacteria, viruses...etc).
You can find genome sequences for human and many other species in the UCSC Genome Browser. You'll find it a little complicated at first, but you'll get familiar with it very fast.
The UCSC Genome Browser now contains more than 45 complete genomes.
With the huge amount of sequence provided by sequencing projects, there is no way to analyse it without bioinformatics tools. That's good for us, because with more than 3 billion bp our brains would explode before reaching the 30th base!
Bioinformatics, as its name suggests, is the use of computer science or informatics resources (hardware and software) to analyse biological data. This data comes from genetics, molecular biology, microbiology, virology and many other areas of biology.
Since genetic material is responsible for the design of all living organisms, mastering the genetic code could help us correct many of the malfunctions (diseases) that arise from it.
Bioinformatics links to most branches of biology (genetics, molecular biology, microbiology, epidemiology, phylogeny, zoology...etc).
Advances in technology have made bioinformatics a piece of cake: the human brain can't handle this huge amount of biological data, whereas home computers can now do 4 billion operations per second.
The main role of bioinformatics today is to do what the human brain can't do, including:
1- Analysing and comparing immense amounts of genetic data (code). 2- Finding similarities with the genetic code of other species. 3- Searching databases (GenBank or Swiss-Prot) for a query sequence. 4- Establishing phylogenetic relationships between species. 5- Finding 3D protein structures to better understand active sites...etc.
And the list goes on and on.
Nowadays bioinformatics is making huge advances and providing accurate answers to medicine and other sciences, as in the case of Tamiflu (a drug for seasonal flu and swine flu), which was born from bioinformatics on the computer screen.