Showing posts with label Bioinformatics Softwares. Show all posts

Sunday, March 7, 2010

Bioinformatics: Nucleotide sequence database names for use with BLAST

Most people run BLAST on a hosted server such as NCBI's, but sometimes you will want to use a command-line BLAST installed on your own computer, under Windows or Linux.

To do that, you need to know the database names that the command-line BLAST accepts.

Here are some of the nucleotide database names that you can use with BLAST:

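1- nr : Nonredundant GenBank, a database that provides comprehensive collections of sequence data, with redundancy reduced by merging sequences that are completely identical. Note that in the preformatted databases distributed by NCBI for the command line, the nucleotide collection is named nt, while nr is the name of the nonredundant protein database.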

2- est : Expressed sequence tags.

3- sts : Sequence tagged sites.

4- htgs : High-throughput genomic sequences.

5- ecoli : Complete genomic sequence of E. coli.

6- yeast : Complete genomic sequence of S. cerevisiae.

7- drosoph : Complete genomic sequence of D. melanogaster.

8- mito : Complete genomic sequences of vertebrate mitochondria.

9- vector : Collection of popular cloning vectors.

These are some of the most commonly used nucleotide database names in a BLAST search.

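This is an example of a BLAST command:

blastall -p blastn -i blast.in -d nt -o blast.out

blastall : program name.

-p : the BLAST program to run (blastn here, because both the query and the database are nucleotide sequences).

-i : input.

-o : output.

blast.in & blast.out : the input file containing the query sequence and the output file that receives the results.

-d : database.

nt : NCBI's nucleotide collection (the nucleotide counterpart of the nonredundant protein database nr).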
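If you launch searches from a script, it helps to assemble the command from its parts. The following is only a minimal sketch: the function name build_blastall_command is hypothetical, the file names and the nt database are placeholder values, and it assumes the legacy blastall program with its -p (program), -i, -d and -o options. Nothing is executed here; the command is only built and printed.

```python
import shlex


def build_blastall_command(program, query_file, database, output_file):
    """Return the blastall argument list for one search (nothing is executed)."""
    return [
        "blastall",
        "-p", program,      # which BLAST program to run, e.g. blastn
        "-i", query_file,   # input file holding the query sequence
        "-d", database,     # name of the database to search
        "-o", output_file,  # file that receives the report
    ]


# Placeholder values: a nucleotide query searched against a nucleotide database.
command = build_blastall_command("blastn", "blast.in", "nt", "blast.out")
print(shlex.join(command))
```

To actually run the search you could pass the list to subprocess.run(command, check=True), provided blastall and the database are installed on your machine.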

You can read the post (Using BLAST to search for similarities) to learn how to run a BLAST search against a database.

You can read the post (Different Blast Programs) to learn about the different BLAST programs.

Wednesday, February 24, 2010

Bioinformatics: Video Tutorial: Using Genomescan to parse genomes (find exons)

In this video tutorial we are going to see how to use Genomescan to parse large DNA sequences and find coding regions, or exons.

As you know, the genes of higher organisms such as vertebrates are more complex than those of other organisms, because they contain coding regions called exons, and between these exons we find non-coding regions called introns.

To predict genes that contain several exons, you have to use very sophisticated algorithms that can locate exons and introns, and by doing so locate genes.
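To make the exon/intron layout concrete, here is a small sketch. It does not predict anything: it assumes the exon coordinates are already known (finding them is the hard part that a gene-prediction program handles) and only shows how the coding sequence is obtained by joining the exons. The toy sequence, the coordinates and the function name splice_exons are made up for illustration.

```python
def splice_exons(genomic_dna, exons):
    """Join exon segments given as (start, end) pairs, 0-based and end-exclusive."""
    return "".join(genomic_dna[start:end] for start, end in sorted(exons))


# Toy gene: uppercase letters are exons, lowercase letters are introns.
gene = "ATGGCC" + "gtaagtttcag" + "AAATTT" + "gtgagtccag" + "GGGTAA"
exons = [(0, 6), (17, 23), (33, 39)]

coding_sequence = splice_exons(gene, exons)
print(coding_sequence)
```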

You can read this post (Open Reading Frame (ORF)) to understand what ORFs are.

You can read this post (Using ORF Finder to locate open reading frames) for a basic program that can find ORFs.

You can read this post (Sophisticated ORF prediction with GenMark) for a more sophisticated ORF prediction software.

Sunday, February 21, 2010

Bioinformatics: What Is PSI-BLAST?

PSI-BLAST (Position-Specific Iterative BLAST) is a program designed for protein sequences. It is a BLAST search that uses a PSSM (position-specific scoring matrix).

What is a PSSM?

A PSSM (position-specific scoring matrix) is a matrix of scores derived from biological sequence data, and its main role in a PSI-BLAST search is to increase the sensitivity of the search.

A PSI-BLAST search uses the PSSM as the query instead of an individual sequence. The matrix is constructed from a multiple sequence alignment, so each position of the alignment has its own position-specific scores.
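As a rough illustration of what "position-specific scores" means, the sketch below builds a small log-odds matrix from a toy alignment. It rests on simplifying assumptions that are mine, not PSI-BLAST's: a reduced six-letter alphabet, a uniform background frequency, a pseudocount of 1 and no gaps. The real program is considerably more elaborate.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free aligned sequences."""
    background = 1.0 / len(alphabet)  # assumption: uniform background frequency
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return pssm


def score(pssm, sequence):
    """Sum the position-specific score of each residue of the sequence."""
    return sum(column[residue] for column, residue in zip(pssm, sequence))


alignment = ["ACDA", "ACEA", "ACDG", "TCDA"]  # toy alignment, 4 columns
alphabet = "ACDEGT"                           # reduced alphabet for the example
pssm = build_pssm(alignment, alphabet)

print(score(pssm, "ACDA"), score(pssm, "TGEG"))
```

A sequence that follows the conserved columns (ACDA) scores higher than one that does not (TGEG), which is how a profile can pick up relatives that a single query sequence would miss.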

How does PSI-BLAST work?

It begins with a normal BLAST search (the better the match, the higher the score). A regular BLAST search will probably miss more distant, and possibly interesting, homologs, so PSI-BLAST next constructs a PSSM (position-specific scoring matrix) from the hits and repeats the search with it until no new matches are found. This can uncover distant sequences that you may be interested in.

You can read this post (Different Blast Programs) to understand all types of BLAST programs, including PSI-BLAST, and what each one does.

You can access PSI-BLAST from EBI website HERE.

Friday, February 19, 2010

Bioinformatics: Perl and BioPerl

As you all know, there are 2 types of bioinformaticians:

1- Those who use ready-made software to analyse biological data.

2- Those who design new software for themselves or for other bioinformaticians.

As we discussed in an earlier post about the best programming language for bioinformatics HERE, Perl (Practical Extraction and Report Language) is the most powerful because:

1- It is installed or included in almost every Linux distribution.

2- Scripts written in Perl don't require compilation (they are portable from one system to another).

3- It supports regular expressions (a very powerful way to control and manipulate strings).

4- It has built-in support for hashes, also called hash tables (associations of values with keys).

5- A very large number of ready-made modules are available on the internet for anybody to use.

6- It is available also for Windows.

You can read this post about the best book to begin programming with Perl for bioinformatics called Beginning Perl for Bioinformatics.

What is BioPerl?

BioPerl is a project supported by the Open Bioinformatics Foundation. It is a collection of modules that you can use to easily construct Perl scripts that automate bioinformatics tasks.

With BioPerl you don't have to write everything from scratch: you use ready-made modules that suit your needs (what more could you want???).

In my opinion, Perl is the best programming language for bioinformatics. If you have a different point of view, you can share it in the comments.

Tuesday, February 16, 2010

Bioinformatics: Sophisticated ORF prediction with GenMark

ORF prediction programs are key to locating ORFs (Open Reading Frames), and once we locate the ORFs we have an approximate idea of the location of the gene that codes for a protein.

To read about ORFs or Open Reading Frames click HERE.

In the video tutorial on how to predict ORFs with ORF Finder, I showed you how to use the ORF Finder program developed by NCBI to locate ORFs. As I said there, this software is very basic, so we can use it only with simple genomes (viral, bacterial...etc), because this kind of program can identify only about 80 percent of the protein-coding regions that you may be interested in.

You can see ORF Finder video tutorial HERE.

In this video tutorial I'm going to show you a more sophisticated approach that can predict ORFs in bacteria, viruses, eukaryotes...etc. This software is a family of different programs that use a very sophisticated method.

Sunday, February 14, 2010

Bioinformatics: Linux Vs Windows (What's Better For Bioinformatics)?

Bioinformaticians, like everyone else, have 2 big choices when it comes to operating systems, Linux and Windows, and there is a huge difference between the two.

Windows:

Windows is known for its simplicity (anyone with basic knowledge can work with Windows): it is user friendly, with a great interface and great media support. But it is less adapted to bioinformaticians' needs, and:

- It's not free.
- Its source is not open to the public.
- Most of its software is not free.
- Automating instructions is less convenient...etc

I'm not saying that Windows isn't good for you, because I work with it most of the time. But if you are a bioinformatician and you want to program new software or automate some instructions, then Linux is definitely for you; if you just want to use ready-made software to analyse your data, you can use Windows.

UNIX (Linux):

Linux is a very powerful operating system, especially for programmers, because it gives you full control over your machine:

- It has a lot of programming tools (languages and interfaces).
- Other free software (web servers, database management systems, visualisation software, text editors...etc).
- Statistical analysis tools (like R).
- Unix is more stable and runs fast.
- Vast documentation for software (how to use stuff!!!).

So if you are a bioinformatician who is more attracted to biology (you use bioinformatics software only to analyse your biological data), then you can use Windows. But if you are a cyber geek!!! who wants to develop new software for bioinformaticians, then you can use Linux. I personally prefer Linux, but in the end it's up to you to decide.

If you want to use BioLinux 5.0 you can read a post about it HERE.

If you want to know how to get BioLinux 5.0 working on your computer you can read this post HERE.

If you have any questions, put them in the comments.

Wednesday, February 10, 2010

Bioinformatics: Using ORF Finder to locate open reading frames

In this video tutorial, I'm going to show you how to use the ORF Finder software to locate open reading frames (possible protein-coding genes).

ORF Finder is a program available on the NCBI website. It is designed to locate open reading frames in a given DNA sequence in all six reading frames.
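To show what "all six reading frames" means, here is a minimal sketch of a six-frame ORF scan. It is not the ORF Finder algorithm itself. It assumes the simplest possible definition of an ORF (an ATG followed in the same frame by the first stop codon), the standard stop codons TAA, TAG and TGA, and a hypothetical min_codons cut-off; the function names are mine.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(dna, offset, min_codons):
    """Yield ATG...stop stretches read in one frame of one strand."""
    start = None
    for i in range(offset, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "ATG" and start is None:
            start = i
        elif codon in STOP_CODONS and start is not None:
            if (i - start) // 3 >= min_codons:
                yield dna[start:i + 3]
            start = None


def find_orfs(dna, min_codons=2):
    """Return (strand, frame, orf) tuples found in the six reading frames."""
    dna = dna.upper()
    found = []
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            for orf in orfs_in_frame(sequence, offset, min_codons):
                found.append((strand, offset + 1, orf))
    return found


print(find_orfs("CCATGAAATTTTAGGG"))
```

The three frames of the forward strand plus the three frames of the reverse complement give the six frames. In the toy sequence only frame 3 of the forward strand contains a complete ORF.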

To know more about Open Reading Frames, you can read this post HERE.

Note: ORF Finder is a basic program, so use it for non-complex genes (microbial genomes).

There are more sophisticated programs, like GenMark, that can handle the complexity of the genomes of higher organisms.

Friday, January 22, 2010

The best Bioinformatics programming language

As you know, bioinformatics is the use of computer hardware and software to analyze or interpret biological data. Most bioinformaticians use ready-made software, and most of that software gives you exactly what you want.

But let's say that you want to extract some specific data from database files, for example. What will you do then?

Bioinformatics software is written by programming specialists using programming languages (C, C++, Perl, Python, Java...etc). I'm not saying that you have to learn them all, but Perl (Practical Extraction and Report Language) is the most powerful and ideal for bioinformatics.

Why exactly Perl:

You may ask: we have a lot of programming languages to choose from, so why Perl? We have already seen bioinformatics programs written in other languages (C, Java, Python, FORTRAN...etc), but Perl stands out in the field because it is very good at detecting patterns in data, especially in what we call strings of text. That is why I consider Perl the best programming language for bioinformatics.

By strings we mean the characters of DNA/RNA or protein sequences (ATGATCCAGT for example).
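Here is a small illustration of pattern detection in a sequence string. Python is used only as an illustration language; the same pattern can be written as a Perl regular expression. The sequence is a made-up example that starts with the string above, and the pattern GAATTC is the recognition site of the restriction enzyme EcoRI.

```python
import re

# Toy DNA string: the example sequence followed by two EcoRI sites.
sequence = "ATGATCCAGT" + "GAATTC" + "AT" + "GAATTC" + "GG"

# 0-based positions where the pattern GAATTC starts.
ecori_sites = [match.start() for match in re.finditer("GAATTC", sequence)]

# Count each nucleotide with a dictionary (what Perl calls a hash).
counts = {}
for base in sequence:
    counts[base] = counts.get(base, 0) + 1

print(ecori_sites, counts)
```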

I found the O'Reilly book 'Beginning Perl for Bioinformatics' very helpful, and I advise you to read it to better understand how to design your own programs, suited to your needs, instead of using other people's programs.

If you have any questions, leave a comment.

Bioinformatics: Different Blast Programs

BLAST (Basic Local Alignment Search Tool) is a set of programs that search for sequences similar to your query sequence, so you can find hundreds of similar sequences in about 20 seconds.

BLAST has a set of programs, each with a specific role:

BLASTN: Nucleotide query sequence against a nucleotide sequence database.

BLASTP: Amino acid query sequence against a protein sequence database. You can find it HERE.

BLASTX: Nucleotide query sequence translated in all six reading frames against a protein sequence database.

TBLASTX: Six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

TBLASTN: Protein query sequence against a nucleotide sequence database translated in all six reading frames. You can find it HERE.

Or you can find them all at ch.EMBnet.org
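The choice between these five programs depends only on the type of the query and the type of the database, so it can be written down as a lookup table. This is just a sketch of the list above; the function name choose_blast_program and the translate_both flag are hypothetical.

```python
BLAST_PROGRAMS = {
    # (query type, database type): program
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick the BLAST program for a query/database pair.

    translate_both=True asks for a translated comparison of two
    nucleotide inputs, which is the role of tblastx.
    """
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, database_type)]


print(choose_blast_program("nucleotide", "protein"))
```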

You can also find other programs such as:

1- PSI-BLAST: Position Specific Iterative BLAST detects weak homologs by building a profile from a multiple alignment of the highest scoring hits in an initial BLAST search.
Available at NCBI.

2- PHI-BLAST: Pattern-Hit Initiated BLAST combines matching of regular expressions with local alignments surrounding the match.
Available at NCBI.

To learn how to use BLAST to search for similarities, you can see this Video Tutorial HERE.

Any questions are welcome.

Sunday, January 17, 2010

Bioinformatics: How to install BioLinux 5.0

As I described before in this BioLinux 5.0 post, BioLinux 5.0 is a Linux (Ubuntu) environment that has more than 500 bioinformatics programs installed, and it's free for all bioinformatics students and researchers.

There are 4 ways to have BioLinux 5.0 working on your computer:

1- Install it directly on a computer with an empty hard drive.

2- Install it in dual boot with another operating system (Windows for example).

3- Install it on a virtual machine like VMware Workstation.

4- Download the virtual appliance directly, and play it with a virtual machine player.

- The first method doesn't work for everyone, because most people already have Windows installed on their computers.

- The second method is good, but people are afraid of damaging their first operating system.

- The third method is great, but it takes some time to install and configure the system on the virtual machine.

Most newcomers to the bioinformatics field think that installing Linux is a little more complicated than installing Windows, so in this post I recommend the 4th method, which is the easiest and simplest, even for someone who has never installed a Linux operating system.

The first thing to do is to download the BioLinux 5.0 appliance from HERE.

The second thing is to extract the archive to your hard drive.

The third thing is to download the free VMware Player from HERE.

The last thing is to open the file that has the extension ".vmx" with VMware Player.

Here is an overview of the operating system BioLinux 5.0 appliance :


Appliance Type: Community

Description:

Guest OS config:
Distro: Bio-Linux 5.0 (Ubuntu 8.04.1 - Hardy Heron)
Kernel: 2.6.24-23-generic
Desktop WM: GNOME 2.22.3
Filesystem: ext3
Releasedate: January 12 2009

Virtual Machine config:
Virtual Disk: 40GB
Used Space: 6 GB
Networking: NAT
VMwaretools: 7.8.4-126130 installed
Resolution: Dynamic (default=1152x864)

Following tested and works:

- USB Mouse, USB Pendrive, USB Printer
- Sound (vmware ensonic driver)
- Video/Video (Firefox on CNN.com and Youtube.com)
- Internet (network: eth/dhcp)
- Cut n' Paste Drag n' Drop between Host/Guest installed and works perfect.

root ID: sudo
Password: bagside

Download: http (US server)
Compression: 7z

Features & Benefits: Standard install.

Pricing: Free

If you have any questions please comment.


Saturday, January 16, 2010

Bioinformatics: Tutorials & Lessons: Predict Protein Secondary Structure using SABLE Program

Protein structure plays a major role in bioinformatics, especially structural bioinformatics, so predicting protein structure can give us a lot of indispensable information.

Protein structure is described at several levels, including:

1- Primary structure: You can read this post about it HERE.

2- Secondary structure.

3- Tertiary structure or 3D: You can read this post about it HERE.

In this video tutorial I'm going to show you how to predict protein secondary structure with the SABLE program, which I consider the best program for this task.

Sunday, January 10, 2010

Bioinformatics: Tutorials & Lessons: Using ClustalW to do a multiple sequence alignment

In this video tutorial I'll be showing how to use the ClustalW program to do a multiple sequence alignment.

You can read about multiple sequence alignment and the ClustalW program in this post HERE.

If you want more information about the main applications of multiple sequence alignment, you can read this post HERE.

Monday, December 28, 2009

Bioinformatics: Main Applications Of Multiple sequence Alignment

You can read an introductory post on Multiple Sequence Alignment HERE, to understand what a Multiple Sequence Alignment is.

Multiple Sequence Alignment is one of the most useful tools in bioinformatics; it helps in almost every application (predicting protein structure, predicting protein function, phylogenetic analysis...etc).

The main applications of Multiple Sequence Alignment are:

1- Structure Prediction: a Multiple Sequence Alignment can greatly improve the prediction of protein or RNA secondary structure, and sometimes it even helps with the 3D structure.

2- Protein Family: a Multiple Sequence Alignment can help you decide whether or not your protein is a member of a known protein family.

3- Pattern Identification: By looking at conserved regions or sites, you can identify which region is responsible for a functional site.

4- Domain Identification: From the alignment file produced by a Multiple Sequence Alignment, you can extract profiles and use them to search databases.

5- DNA Regulatory Elements: You can use Multiple Sequence Alignments to locate DNA regulatory elements such as binding sites...etc.

6- Phylogenetic Analysis: By carefully picking related sequences you can reconstruct a tree from the sequences that you used in the Multiple Sequence Alignment (you can use the PHYLIP package, and you can find a post about it here).
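As a small illustration of application 3 (looking at conserved sites), the sketch below lists the columns of an alignment in which every sequence carries the same residue. The alignment is a toy example, gaps are written as "-", and the function name conserved_columns is hypothetical. Real analyses usually score the degree of conservation instead of asking for perfect identity.

```python
def conserved_columns(alignment):
    """Return the 0-based columns where all sequences share one residue (gaps excluded)."""
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and column[0] != "-"
    ]


alignment = [
    "MKT-LLV",
    "MRTALLV",
    "MKTGLIV",
]

print(conserved_columns(alignment))
```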

Because Multiple Sequence Alignments play a major role in bioinformatics, you can use them almost anywhere. But like everything on this earth, they are not perfect or 100% accurate, so you have to choose your sequences very carefully to avoid meaningless results.

You can access the EBI ClustalW program from HERE, to do a Multiple Sequence Alignment.

Any comments are welcome.

Friday, December 25, 2009

Bioinformatics:Multiple sequence alignment: ClustalW

What is multiple sequence alignment:

A multiple sequence alignment is an alignment of three or more sequences (protein or nucleic acid, i.e. DNA or RNA).

What's ClustalW:

ClustalW is a large and complex program for multiple sequence alignments.

Why use ClustalW:

As we said before, ClustalW is for multiple sequence alignments, which are very important in the bioinformatics field, especially when studying sequences. By doing a multiple sequence alignment of protein sequences, for example, we can extract this very useful information:

1- Finding conserved sequence regions.

2- Knowing which sites are active sites and which are not.

3- Predicting protein function.

4- Helping to predict protein structure.

5- Identifying a protein family or its new members.

6- Calculating trees to know the relationships between proteins (distances)...etc.
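To give an idea of the "distances" in point 6, here is a minimal sketch that computes a simple pairwise distance from sequences that are already aligned: the fraction of positions that differ, often called the p-distance. The toy sequences and the function name are made up, and this is only the simplest distance measure, not necessarily the one any particular tree-building program uses.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing positions, skipping columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


aligned = {
    "seq1": "MKTALLV",
    "seq2": "MKTALIV",
    "seq3": "MRT-LIV",
}

# One distance per pair of sequences; a tree-building method starts from such a table.
distances = {
    (a, b): p_distance(aligned[a], aligned[b])
    for a, b in combinations(sorted(aligned), 2)
}

for pair, distance in distances.items():
    print(pair, round(distance, 3))
```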

You can find ClustalW at EBI and you can access it from HERE.

Monday, December 21, 2009

Bioinformatics Tutorials & Lessons: Using TMHMM method to locate Transmembrane helices in Protein sequences

TMHMM is an abbreviation of Transmembrane Hidden Markov Model. A hidden Markov model is a statistical model, and you can read about it in this Wikipedia article HERE.

TMHMM is a method for predicting transmembrane helices in a protein sequence. You can access the TMHMM server from HERE.
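A hidden Markov model is too long to show in a few lines, so the sketch below uses a different and much older technique to give a feeling for the problem: a Kyte-Doolittle hydropathy scan. It is not the TMHMM method. It slides a window along the protein and reports the windows whose average hydrophobicity is high, since membrane-spanning helices are mostly hydrophobic. The window of 19 residues and the threshold of 1.6 are the values conventionally used with this scale; the toy protein and the function name are made up.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for each window whose mean exceeds the threshold."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[residue] for residue in segment) / window
        if mean > threshold:
            hits.append((start, round(mean, 2)))
    return hits


# Toy protein: a 19-residue hydrophobic stretch between two charged ends.
toy_protein = "MKDEERK" + "LLIVALLVIAGLLVLAIVL" + "RKDEESK"

print(hydrophobic_windows(toy_protein))
```

A dedicated predictor such as TMHMM should be preferred for real work; this scan only flags hydrophobic stretches.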

This video shows how to use the TMHMM server to predict transmembrane helices in a protein sequence.

Wednesday, December 16, 2009

Bioinformatics Tutorials & Lessons: using BLAST to search for similarities

BLAST (Basic Local Alignment Search Tool) is an algorithm, and the program that implements it, that can identify sequences (nucleic acid or amino acid) similar to a query sequence.

Let's say that you have recently sequenced a gene from the mouse genome and you know nothing about this gene except its sequence. This is where BLAST comes in: it searches databases for sequences similar to yours, and through them you will find information (gene or protein family, organism, related sequences, function...etc) that will help you identify your sequence.
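To illustrate what "searching a database for similar sequences" means, here is a toy sketch. It does not use the BLAST heuristic; it swaps in the exhaustive Smith-Waterman local alignment, which is slower but short enough to read. The scoring values (match 2, mismatch -1, gap -2), the three database entries and the function names are arbitrary choices for the example.

```python
def local_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        current = [0]
        for j, b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            current.append(max(0, diagonal, previous[j] + gap, current[j - 1] + gap))
            best = max(best, current[j])
        previous = current
    return best


def rank_database(query, database):
    """Return the database entry names sorted from most to least similar to the query."""
    return sorted(
        database,
        key=lambda name: local_alignment_score(query, database[name]),
        reverse=True,
    )


database = {
    "unrelated": "TTTTTTTT",
    "close": "TTACGTACGTTT",
    "distant": "ACGTTTTT",
}

print(rank_database("ACGTACGT", database))
```

The entry that contains the whole query comes first, the one that shares only part of it comes second, and the unrelated one comes last.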

You can read this Wikipedia article to know more about BLAST from HERE

You can read the BLAST help page from HERE or the documentation from HERE.

This is a video tutorial that demonstrates how to use BLAST to search for protein sequences similar to my sequence.
(I used BLAST against the SwissProt database)



Friday, December 11, 2009

Bioinformatics:Bio Linux 5.0

Bio Linux 5.0 is a project released in January 2009 for students and researchers in the field of bioinformatics. It's a Linux environment (Ubuntu) plus more than 500 bioinformatics programs, with full documentation for each program.

In other words, Bio Linux 5.0 is an easy-to-use bioinformatics workstation, powerful and easy to configure.

Bio Linux 5.0 is developed and maintained by the NERC Environmental Bioinformatics Centre. It contains a complete analysis and development environment that is easy for bioinformaticians to use.

Bio Linux 5.0 can run from a live DVD, which means that you can run it without installing it (without affecting your system). It can also run from a memory stick. You can install it in dual boot with Windows, or in a virtual machine if you want to run it alongside Windows at the same time.

On top of all this, Bio Linux 5.0 is FREE, and you can download Bio Linux 5.0 from HERE

To access the NERC Environmental Bioinformatics Centre Homepage click HERE

If you already have an Ubuntu system installed on your machine, you can download the Bio Linux 5.0 packages from HERE and install them on your Ubuntu. I don't recommend that, because it takes more time and effort and you end up with fewer packages: some (Bio Perl, Bio Python...etc) are not included in the package repository. The easy way is to download the full Bio Linux 5.0 and install it directly.

If you have any questions, please comment.