Sunday, January 31, 2010

5 Things any Bioinformatician should know

1- How to work with a computer: By this I mean knowing how to work with at least one operating system, Windows for example. Most bioinformatics students and researchers prefer Linux because it is open source and most of its software is free, but I can tell you that Windows is not bad at all for bioinformatics, because many programs designed for Linux are available for Windows too.

2- How to use internet browsers: This is indispensable, because the internet is what made bioinformatics move so fast. If you want to be a bioinformatician, you have to know how to work with internet browsers such as Internet Explorer, Netscape, Chrome, or Firefox. I personally prefer Firefox; I find it very easy to use and powerful.

3- How to install new software: You should have this basic knowledge, because installing Windows-based software is a piece of cake compared to installing Linux-based software.

4- A little knowledge of molecular biology: You can't be a bioinformatician without some knowledge of biology, especially molecular biology and genetics. It would be like wanting to play the guitar without knowing what a guitar is!

5- How to surf the internet: This is very important, as most bioinformatics operations are done online, so you have to know how to open a website, browse it, download from it, etc.

The most important part of knowing how to surf the internet is knowing how to use search engines, because they will help you find almost anything you need.

These are the basic skills that any Bioinformatics student should have.

For more suggestions about this, please comment.

Friday, January 29, 2010

List of the most popular and useful Databases in Bioinformatics

As biological data grows every day, maintaining this huge amount of data has become hard, so I'll give you what I consider the best organized and best maintained bioinformatics databases.

GenBank at NCBI: GenBank is NCBI's nucleotide sequence database, and NCBI as a whole is the most powerful resource in bioinformatics because it covers almost everything: proteins, genes, genomes, structures, etc.
To visit NCBI click HERE.

Swiss-Prot: If your query is a protein sequence, I advise you to use Swiss-Prot, which is located on the ExPASy proteomics server; in addition, you'll find dozens of useful programs there that you can use to analyze your sequence.
To visit Swiss-Prot or the ExPASy proteomics server, click HERE.

Integrated Microbial Genomes: This database is for complete genomes. I like it because it's very well organized and anyone can get used to it in a few minutes.
To visit the Integrated Microbial Genomes click HERE.

TIGR: The Institute for Genomic Research, founded by Craig Venter, runs a project on complete bacterial genomes. If you are a microbiologist, this database is exactly for you. In addition to the database, bioinformaticians working on the TIGR project have developed a set of very useful tools for analyzing the genomes in the database, such as GLIMMER, MUMmer, etc.
To visit TIGR project click HERE.

Ensembl: For me it's the best database for complete genomes, because it contains a lot of graphical tools for interpreting and analyzing data, which means you don't get bored while exploring it; everything is visual!
To visit Ensembl, click HERE.

There are more databases and projects on the internet, but I found these databases very helpful in my research.

If you know other useful databases or projects, you can post them in the comment section.

Wednesday, January 27, 2010

Bioinformatics: Transcriptomics

In human DNA, less than 2% of the genome codes for proteins. Part of the rest contains regulatory elements that control when, where, and how much genes are expressed, which helps explain why cellular processes are so precise.

Now that many different genomes have been extensively sequenced, the new challenge is to identify the expression patterns of the genes we have sequenced, and that's where transcriptomics becomes very useful.



So what is Transcriptomics?


Transcriptomics is the study of the transcriptome: the complete set of RNA transcripts produced by the genome at a given time.

Transcriptomics, also called gene expression profiling or genome-wide expression profiling, can help us understand the genes and pathways involved in biological processes. Put simply, it measures the expression levels of mRNAs.



So what can transcriptomics do for us?

As mentioned before, transcriptomics can tell us which genes are activated, when they are activated, what activates them, etc.

In transcriptomics, finding similarities in the expression patterns of different genes gives us clues that the genes may be functionally related and may share the same genetic control mechanism.
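To make the idea of similar expression patterns concrete, here is a minimal Python sketch that compares genes by the Pearson correlation of their expression profiles. The gene names and expression values are hypothetical, and Pearson correlation is only one of several similarity measures used in practice; a high correlation suggests, but does not prove, a functional relationship.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length, with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical expression levels of three genes at five time points.
profiles = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 16.0],
    "geneB": [0.5, 1.1, 2.0, 4.2, 7.9],
    "geneC": [9.0, 7.0, 5.0, 3.0, 1.0],
}

for first, second in [("geneA", "geneB"), ("geneA", "geneC")]:
    r = pearson(profiles[first], profiles[second])
    print(f"{first} vs {second}: r = {r:.2f}")
```

Values near +1 suggest the genes are co-expressed, values near -1 suggest opposite regulation, and values near 0 suggest no linear relationship.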


The most common technology used to study expression levels is the DNA microarray.

To understand what microarrays are used for and what their main applications are, please read THIS POST.

If you have any questions, feel free to comment.

Friday, January 22, 2010

The best Bioinformatics programming language

As you know, bioinformatics is the use of computer hardware and software to analyze or interpret biological data. Most bioinformaticians use ready-made software, and most of these programs can give you exactly what you want.

But let's say that you want to extract some specific data from database files, for example. What will you do then?

Bioinformatics software is written by programming specialists using programming languages such as C, C++, Perl, Python, Java, etc. I'm not saying that you have to learn them all, but Perl (Practical Extraction and Report Language) is the most powerful and ideal one for bioinformatics.

Why Perl exactly?

You may say that we have a lot of programming languages to choose from, so why Perl? We have already seen bioinformatics programs written in other languages (C, Java, Python, FORTRAN, etc.), but Perl is the best in the field because it is very good at detecting patterns in data, especially in what we call strings of text. That is why I consider Perl the best programming language for bioinformatics.

By strings, we mean sequences of characters, such as DNA/RNA or protein sequences (ATGATCCAGT, for example).
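As an illustration of this kind of string pattern matching, here is a minimal sketch. It uses Python rather than Perl, but Python's re module supports Perl-style regular expressions, so the patterns would look much the same in Perl. The DNA sequence and the function name find_motif are hypothetical; GAATTC and GGATCC are the recognition sites of the restriction enzymes EcoRI and BamHI.

```python
import re


def find_motif(sequence, pattern):
    """Return (position, matched_text) for every match of a regex motif.

    Positions are 1-based, as is usual for biological sequences.
    The pattern is wrapped in a lookahead so overlapping matches are reported too.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq)]


dna = "ATGATCCAGTGAATTCGGATCCATGAATTC"
print(find_motif(dna, "GAATTC"))         # EcoRI sites only
print(find_motif(dna, "GAATTC|GGATCC"))  # EcoRI or BamHI sites
```

The same idea extends to degenerate motifs, for example a character class such as `[AG]` for "A or G" at one position.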

I found the O'Reilly book 'Beginning Perl for Bioinformatics' very helpful, and I advise you to read it to better understand how to design your own programs suited to your needs instead of using other people's programs.

Any questions? Please comment.

Books: Beginning Perl for Bioinformatics

By: James Tisdall

Publisher: O'Reilly Media, Inc.

I found this book very helpful for understanding the basics of using Perl to write the programs you need to extract or manipulate data.

If you read this book, you'll be able to write your own programs to parse database files, extract only what you need, and even analyze DNA/RNA or protein data.
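To give a taste of parsing a database file and extracting only what you need, here is a small sketch (in Python, not code from the book) that reads FASTA-formatted sequences and reports each sequence's length and GC content. The records are hypothetical and held in memory; with a real file you would pass an open file handle instead of io.StringIO.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    The header is the text after '>' on the description line; the sequence
    lines that follow are joined into one uppercase string.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # lines before the first header are ignored
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record FASTA file.
fasta_text = """>seq1 example DNA
ATGATCCAGT
GAATTC
>seq2 another record
ggatcc
"""

for name, seq in read_fasta(io.StringIO(fasta_text)):
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    print(f"{name}\t{len(seq)} bp\tGC = {gc:.2f}")
```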


Table of Contents

Copyright

Preface

What Is Bioinformatics?

About This Book

Who This Book Is For

Why Should I Learn to Program?

Structure of This Book

Conventions Used in This Book

Comments and Questions

Acknowledgments

Chapter 1. Biology and Computer Science

Section 1.1. The Organization of DNA

Section 1.2. The Organization of Proteins

Section 1.3. In Silico

Section 1.4. Limits to Computation

Chapter 2. Getting Started with Perl

Section 2.1. A Low and Long Learning Curve

Section 2.2. Perl's Benefits

Section 2.3. Installing Perl on Your Computer

Section 2.4. How to Run Perl Programs

Section 2.5. Text Editors

Section 2.6. Finding Help

Chapter 3. The Art of Programming

Section 3.1. Individual Approaches to Programming

Section 3.2. Edit—Run—Revise (and Save)

Section 3.3. An Environment of Programs

Section 3.4. Programming Strategies

Section 3.5. The Programming Process

Chapter 4. Sequences and Strings

Section 4.1. Representing Sequence Data

Section 4.2. A Program to Store a DNA Sequence

Section 4.3. Concatenating DNA Fragments

Section 4.4. Transcription: DNA to RNA

Section 4.5. Using the Perl Documentation

Section 4.6. Calculating the Reverse Complement in Perl

Section 4.7. Proteins, Files, and Arrays

Section 4.8. Reading Proteins in Files

Section 4.9. Arrays

Section 4.10. Scalar and List Context

Section 4.11. Exercises

Chapter 5. Motifs and Loops

Section 5.1. Flow Control

Section 5.2. Code Layout

Section 5.3. Finding Motifs

Section 5.4. Counting Nucleotides

Section 5.5. Exploding Strings into Arrays

Section 5.6. Operating on Strings

Section 5.7. Writing to Files

Section 5.8. Exercises

Chapter 6. Subroutines and Bugs

Section 6.1. Subroutines

Section 6.2. Scoping and Subroutines

Section 6.3. Command-Line Arguments and Arrays

Section 6.4. Passing Data to Subroutines

Section 6.5. Modules and Libraries of Subroutines

Section 6.6. Fixing Bugs in Your Code

Section 6.7. Exercises

Chapter 7. Mutations and Randomization

Section 7.1. Random Number Generators

Section 7.2. A Program Using Randomization

Section 7.3. A Program to Simulate DNA Mutation

Section 7.4. Generating Random DNA

Section 7.5. Analyzing DNA

Section 7.6. Exercises

Chapter 8. The Genetic Code

Section 8.1. Hashes

Section 8.2. Data Structures and Algorithms for Biology

Section 8.3. The Genetic Code

Section 8.4. Translating DNA into Proteins

Section 8.5. Reading DNA from Files in FASTA Format

Section 8.6. Reading Frames

Section 8.7. Exercises

Chapter 9. Restriction Maps and Regular Expressions

Section 9.1. Regular Expressions

Section 9.2. Restriction Maps and Restriction Enzymes

Section 9.3. Perl Operations

Section 9.4. Exercises

Chapter 10. GenBank

Section 10.1. GenBank Files

Section 10.2. GenBank Libraries

Section 10.3. Separating Sequence and Annotation

Section 10.4. Parsing Annotations

Section 10.5. Indexing GenBank with DBM

Section 10.6. Exercises

Chapter 11. Protein Data Bank

Section 11.1. Overview of PDB

Section 11.2. Files and Folders

Section 11.3. PDB Files

Section 11.4. Parsing PDB Files

Section 11.5. Controlling Other Programs

Section 11.6. Exercises

Chapter 12. BLAST

Section 12.1. Obtaining BLAST

Section 12.2. String Matching and Homology

Section 12.3. BLAST Output Files

Section 12.4. Parsing BLAST Output

Section 12.5. Presenting Data

Section 12.6. Bioperl

Section 12.7. Exercises

Chapter 13. Further Topics

Section 13.1. The Art of Program Design

Section 13.2. Web Programming

Section 13.3. Algorithms and Sequence Alignment

Section 13.4. Object-Oriented Programming

Section 13.5. Perl Modules

Section 13.6. Complex Data Structures

Section 13.7. Relational Databases

Section 13.8. Microarrays and XML

Section 13.9. Graphics Programming

Section 13.10. Modeling Networks

Section 13.11. DNA Computers

Appendix A. Resources

Section A.1. Perl

Section A.2. Computer Science

Section A.3. Linux

Section A.4. Bioinformatics

Section A.5. Molecular Biology

Appendix B. Perl Summary

Section B.1. Command Interpretation

Section B.2. Comments

Section B.3. Scalar Values and Scalar Variables

Section B.4. Assignment

Section B.5. Statements and Blocks

Section B.6. Arrays

Section B.7. Hashes

Section B.8. Operators

Section B.9. Operator Precedence

Section B.10. Basic Operators

Section B.11. Conditionals and Logical Operators

Section B.12. Binding Operators

Section B.13. Loops

Section B.14. Input/Output

Section B.15. Regular Expressions

Section B.16. Scalar and List Context

Section B.17. Subroutines and Modules

Section B.18. Built-in Functions

Index


Bioinformatics: Different Blast Programs

BLAST (Basic Local Alignment Search Tool) is a set of programs that search a database for sequences similar to your query sequence, so you can find hundreds of sequences similar to yours in a matter of seconds.

BLAST includes several programs, each with a specific role:

BLASTN: Nucleotide query sequence against nucleotide sequence database.

BLASTP: Amino acid query sequence against a protein sequence database. You can find it HERE.

BLASTX: Nucleotide query sequence translated in all six reading frames against a protein sequence database.

TBLASTX: Six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

TBLASTN: Protein query sequence against a nucleotide sequence database translated in all six reading frames. You can find it HERE.

You can also find them all at ch.EMBnet.org.
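BLASTX, TBLASTN and TBLASTX all rely on translating nucleotide sequences in six reading frames (three on each strand). The sketch below shows what that means using the standard genetic code; it is an illustration only, not how BLAST is implemented internally, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; a trailing partial codon is ignored and
    codons containing other characters (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Translations of frames +1, +2, +3 (given strand) and -1, -2, -3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```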

You can also find other programs, such as:

1- PSI-BLAST: Position-Specific Iterated BLAST detects weak homologs by building a profile from a multiple alignment of the highest-scoring hits in an initial BLAST search.
Available at NCBI.

2- PHI-BLAST: Pattern-Hit Initiated BLAST combines matching of regular expressions with local alignments surrounding the match.
Available at NCBI.
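PHI-BLAST accepts patterns written in PROSITE-style syntax (elements separated by hyphens, x for any residue, [..] for allowed residues, {..} for forbidden ones, and (n) or (n,m) for repeats). Assuming that syntax, this sketch converts a simple pattern into a Python regular expression and searches a protein for it. It covers only the common cases; the function name is hypothetical, the example protein sequence is made up, and the pattern is modeled on the C2H2 zinc-finger motif.

```python
import re

# One PROSITE-style element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Supported: x, single residues, [..], {..}, (n) and (n,m) repeats,
    a leading '<' (N-terminus), a trailing '>' (C-terminus) and a final '.'.
    Not supported: '>' inside brackets, e.g. [G>].
    """
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            core = residue                      # single letter or [..] class
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        regex.append(prefix + core + suffix)
    return "".join(regex)


zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
protein = "MSKCAACGKSFLRSSNLQRHQRTHTGE"  # hypothetical sequence
hit = re.search(zinc_finger, protein)
if hit:
    print(zinc_finger, "matches at residue", hit.start() + 1, ":", hit.group())
```

PHI-BLAST goes further than this sketch: after finding pattern hits, it builds local alignments around them.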

To learn how to use Blast to search for similarities, you can see this Video Tutorial HERE.

Any questions are welcome.

Sunday, January 17, 2010

Bioinformatics: How to install BioLinux 5.0

As I described before in this BioLinux 5.0 post, BioLinux 5.0 is a Linux (Ubuntu-based) environment with more than 500 bioinformatics programs installed, and it's free for all bioinformatics students and researchers.

There are 4 ways to get BioLinux 5.0 working on your computer:

1- Install it directly on a computer with an empty hard drive.

2- Install it in dual boot with another operating system (Windows for example).

3- Install it on a virtual machine like VMware Workstation.

4- Download the virtual appliance directly, and play it with a virtual machine player.

- The first method doesn't work for most people, because they already have Windows installed on their computers.

- The second method is good, but people are afraid of damaging their existing operating system.

- The third method is great, but it takes some time to install and configure it on the virtual machine.

Most newcomers to the bioinformatics field think that installing Linux is a bit more complicated than installing Windows, so in this post I recommend the 4th method, which is the easiest and simplest, even for someone who has never installed a Linux operating system.

The first step is to download the BioLinux 5.0 appliance from HERE.

The second step is to extract the archive onto your hard drive.

The third step is to download the free VMware Player from HERE.

The last step is to open the file with the extension "*.vmx" in VMware Player.

Here is an overview of the BioLinux 5.0 appliance:


Appliance Type:

Community

Description:

Guest OS config:
Distro: Bio-Linux 5.0 (Ubuntu 8.04.1 - Hardy Heron)
Kernel: 2.6.24-23-generic
Desktop WM: GNOME 2.22.3
Filesystem: ext3
Releasedate: January 12 2009

Virtual Machine config:
Virtual Disk: 40GB
Used Space: 6 GB
Networking: NAT
VMwaretools: 7.8.4-126130 installed
Resolution: Dynamic (default=1152x864)

The following have been tested and work:

- USB Mouse, USB Pendrive, USB Printer
- Sound (VMware Ensoniq driver)
- Video/Video (Firefox on CNN.com and Youtube.com)
- Internet (network: eth/dhcp)
- Cut and paste, and drag and drop between host and guest: installed and working perfectly.

root ID: sudo
Password: bagside

Download: http (US server)
Compression: 7z

Features & Benefits

Standard install.

Pricing

Free

If you have any questions please comment.


Saturday, January 16, 2010

Bioinformatics: Tutorials & Lessons: Predict Protein Secondary Structure using SABLE Program

Protein structure plays a major role in bioinformatics, especially in structural bioinformatics, so predicting protein structure can give us a lot of indispensable information.

Protein structure is described at three main levels:

1- Primary structure: You can read this post about it HERE.

2- Secondary structure.

3- Tertiary structure or 3D: You can read this post about it HERE.

In this video tutorial, I'm going to show you what I consider the best program for predicting protein secondary structure: SABLE.

Wednesday, January 13, 2010

Bioinformatics: Proteomics: Protein Primary structure

As you know, in structural bioinformatics, analyzing a protein's structure begins with its primary structure, then its secondary structure, then its tertiary structure.

Primary structure doesn't give us information about how proteins interact with each other, as secondary and tertiary structures do, but it does tell you which segments of your protein have a special composition. With this information, we can infer protein properties such as:

1- Hydrophobic regions: generally found anchored in the membrane.

2- Hydrophilic regions: found on the outside, so they form the protein surface.

3- Coiled-coil regions: these indicate the potential for protein-protein interaction.
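Hydrophobic regions like these are often located with a hydropathy plot: each amino acid gets a hydropathy value, and the values are averaged over a sliding window. Here is a minimal sketch using the Kyte-Doolittle scale; the window of 19 residues and threshold of 1.6 are values commonly used for spotting membrane-spanning segments, but treat them as adjustable assumptions. The function names and the example protein are hypothetical, and the sketch only accepts the 20 standard one-letter amino acid codes.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein, window):
    """Average hydropathy of each window; item i covers residues i+1 .. i+window (1-based)."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_segments(protein, window=19, threshold=1.6):
    """Merge overlapping windows whose average exceeds `threshold` into
    (start, end) residue ranges, 1-based and inclusive."""
    segments = []
    for i, score in enumerate(window_scores(protein, window)):
        if score <= threshold:
            continue
        begin, end = i + 1, i + window
        if segments and begin <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((begin, end))          # start a new segment
    return segments


# Hypothetical protein: a hydrophobic stretch flanked by charged residues.
protein = "MKKDERKDES" + "LIVALLFVAGILLAVIAVL" + "KRDEEKKRSD"
print(hydrophobic_segments(protein))
```

Because whole windows are reported, segment boundaries are approximate and may extend a few residues into the flanking regions.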

Any comments are welcome.

Sunday, January 10, 2010

Bioinformatics: Tutorials & Lessons: Using ClustalW to do a multiple sequence alignment

In this video tutorial, I'll show how to use the ClustalW program to do a multiple sequence alignment.

You can read about multiple sequence alignment and ClustalW program in this post HERE.

If you want more information about the main applications of multiple sequence alignment, you can read this post HERE.

Thursday, January 7, 2010

Bioinformatics: Genomics: Different Types of RNAs

RNAs are macromolecules that play a major and necessary role in biology: they act as intermediaries between DNA and proteins.

RNAs can fold to secondary and even tertiary structures.

The main purpose of studying RNAs in bioinformatics is to predict their structures, so that we can better understand their interactions and their stability.

You can read about RNA structures in this post HERE.

RNAs have 2 main types:

1- Coding RNAs: These correspond to mRNA (messenger RNA), which acts as a transmitter: it carries information from DNA and delivers it to the ribosome, where it is translated into protein.

2- Non coding RNAs: Like rRNA (Ribosomal RNA), tRNA (Transfer RNA), snRNA...etc



mRNA: messenger RNA.
rRNA: ribosomal RNA.
tRNA: transfer RNA.
snRNA: small nuclear RNA.
snoRNA: small nucleolar RNA.
scRNA: small cytoplasmic RNA.
tmRNA: transfer-messenger RNA.
siRNA: small interfering RNA.

Any comments are welcome.

Tuesday, January 5, 2010

Bioinformatics: Genomics: RNA secondary structure

Like proteins, RNAs can have complex structures: a major advance in biology in the 1970s showed that RNAs can adopt complex 2D and even 3D structures.

The good news is that RNAs obey folding patterns or rules that are much simpler than the complex rules of protein folding.

For an RNA molecule to work, its bases have to be protected from the surrounding solvent. To achieve this, RNA bases pair with other bases in the same molecule, and this pairing forms the RNA secondary structure.

When two stretches of the same RNA molecule are complementary to each other, they pair up and form what's called a STEM.

Note: stems don't have to be 100% complementary, so we can also find unpaired residues within them.

When the stretches aren't complementary, the unpaired bases form what's called a LOOP.
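Stems and loops can be illustrated with the classic Nussinov algorithm, which simply finds the structure with the maximum number of base pairs using dynamic programming. It is a textbook simplification, not what modern predictors do (they minimize free energy), and the minimum hairpin loop of 3 bases is an assumption. The output uses dot-bracket notation: matching parentheses are paired bases forming stems, and dots are unpaired bases forming loops.

```python
def nussinov(rna, min_loop=3):
    """Maximize the number of base pairs (Nussinov algorithm); return dot-bracket notation.

    Allowed pairs: A-U, G-C and G-U wobble. A hairpin loop must contain
    at least `min_loop` unpaired bases.
    """
    rna = rna.upper()
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(rna)
    best = [[0] * n for _ in range(n)]  # best[i][j] = max pairs within rna[i..j]

    def pair_score(i, j, k):
        """Score if base k pairs with base j inside rna[i..j]."""
        left = best[i][k - 1] if k > i else 0
        return left + best[k + 1][j - 1] + 1

    # Fill the table for ever longer subsequences.
    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (rna[k], rna[j]) in pairs:
                    score = max(score, pair_score(i, j, k))
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (rna[k], rna[j]) in pairs and pair_score(i, j, k) == best[i][j]:
                structure[k], structure[j] = "(", ")"
                stack.append((k + 1, j - 1))
                if k > i:
                    stack.append((i, k - 1))
                break
    return "".join(structure)


for rna in ("GGGAAAUCC", "GGGGAAAACCCC"):
    print(rna, nussinov(rna))
```

In the output, a run of nested parentheses such as `(((` ... `)))` is a stem, and the dots between them form the hairpin loop.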

Tertiary interactions may also occur in an RNA molecule, but it's very difficult to predict them.

Any comments are welcome.

Saturday, January 2, 2010

Bioinformatics: Proteomics: Protein 3D structure

As we all know, the sequence of amino acids in a protein is what defines its structure, so the 3D structure of a protein is a result of its amino acid sequence. For example, hydrophobic amino acids tend to avoid water, so they are usually buried inside the protein rather than on its surface, whereas hydrophilic amino acids (residues) tend to appear on the surface, where they can interact with water.

The 3D structure of a protein is determined not only by these properties but also by the electric charge of its amino acids, their interactions with their neighbors, etc.

The main rule in structural bioinformatics is "similar sequences = similar shapes (3D structures)". The reverse does not always hold: proteins with quite different sequences can still share the same fold.
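Sequence similarity is usually quantified as percent identity between two aligned sequences. Definitions vary between tools; this minimal sketch assumes the sequences are already aligned (with '-' for gaps) and counts identical residues over the columns where neither sequence has a gap. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical aligned protein fragments: 6 identical residues out of 7 ungapped columns.
print(f"{percent_identity('MKV-LLAG', 'MKVQLIAG'):.1f}% identity")
```

Other definitions divide by the full alignment length or by the length of the shorter sequence, so always check which one a tool reports.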

So the relationship will be like this:

Sequence ---> Structure ---> Function

The sequence determines the structure, which determines the function.

The field that studies all of this is called Structural Bioinformatics.

We can determine a protein's 3D structure using 2 distinct methods:

1- Experimental: in the lab, for example by X-ray crystallography.
2- Theoretical: by predicting the structure from the sequence using specialized bioinformatics tools.

Predicting protein secondary (2D) structure is now relatively easy, but 3D structure prediction is still an obstacle for bioinformaticians because of its complexity.

To read about protein databases you can read this article HERE.
To learn more about 3D structural databases you can read about PDB database HERE.

Any comments are welcome.