Tuesday, February 15, 2011

The Magic of Microsoft Excel Macros

All through my Ph.D. one powerful tool that helped me a lot in organizing the information is Microsoft Office Excel. It has so many functionalities that I always felt evoked working on new issues using excel. It is not only useful for data analysis and chart preparation, but also has applications in Bioinformatics as mentioned in my previous posts. In this post I want to share with you a macro that I created in excel which on a single click gives the following details about a sequence.
1. Length of the sequence
2. RNA Sequence
3. Number of residues of Adenine, Guanine, Thymine and Cytosine
4. GC percentage of the sequence and
5. Six reading frame of the sequence.
All you need to do is just enable macros in your excel sheet import this macro or create a new one with this code. Then enter the DNA sequence in the excel sheet in which the macro has been created and then go to the cell below that sequence and press Ctrl+Shift+W.

How to enable macros in Excel:
Office 2007: Go to All Options (the office symbol on the extreme top left), then got to Excel Options. In the Popular window that opens check the third check box Show Developer tab in the Ribbon and click OK. This will make the Developer tab to show up in the main ribbon. Now go to the Developer tab and click Macro security. The Macro security is set to Disable all macros without notification as default, change it to select Enable all macros. This will enable all macros to work normally.

How to import a macro or create a new macro:
Importing a macro: To import a macro go to Developer tab and then to Visual Basic, then go to File and then to Import file, choose the directory in which the macro is present (.bas file). Then close the Visual Basic window.
Creating a new macro: To create a new macro go to Visual Basic, then to Insert then to Module, copy paste all the code into the resulting window and save it, then close the Visual Basic.
Either of the above two steps is essential to make this work.

Now paste any DNA sequence in a cell of this excel workbook. Go to the cell below that and click Ctrl+Shift+W and get all the above mentioned details for the sequence.

Click here to go to text file that has the code for the macro.

So try out this and let me know if you have any suggestions, criticisms and ideas, bye :)

Tuesday, January 4, 2011

A superior score for ranking and mapping scientific journals - Eigenfactor

Ranking and mapping scientific journals From eigenfactor.org

Saturday, September 4, 2010

How to bring data from a word doc to an excel file

This is one way which will be of good help when there is enormous amount of data in word that you want to bring into excel.

Because I have tried to explain the process in detail, so that even a novice will be able to do it, the process might appear a very long one, but actually they are just a few simple steps.
There are a total of three file formats we are going to work with in order to import data from a word document to a excel sheet. The formats and the sequence in which we are going to do it is:
.doc --- .txt --- .xls

When excel imports some thing from another file like .txt or .csv, it sorts data in each line into a column. It looks for any default delimiters like comma (,) and splits the data into columns taking comma as delimiter. So, delimiters can be used to split data in one line into continuous columns. Any letter, numbers, symbols or combination of letters or formats can be used as a delimiter.

For the data given below:

Kumar,Rakesh,,Ranjith,Krishna,Surya

If we use comma <,> as delimiter, then the entire text will be imported into excel file as shown below


As can be seen, the third cell is blank. This is so because we had two commas after Rakesh. There was nothing between two delimiters, so a blank cell is created. This format of file that has values separated with commas is called Comma Separated Values (CSV). This is one of the most commonly used file format to transfer our contacts from e-mail services like Gmail or from our mobile.

So the arrangement of data is what is important for proper splitting into cells when imported to excel. When we use comma as a delimiter we should make sure that we have them only at places where we need the splits. If there are any commas somewhere else they should be either removed or replaced with some symbol (lets say #) which is not there else where in the document. This is to make sure that the splitting is done correctly and once we import the data into excel and save it as an excel file, we can replace the new symbol (#) with the comma.

And why do we need to convert a .doc file into a .txt file? This is because excel permits import from a text file but not a doc file. You can save a word document as a text file by going to File --- Save As, then select Plain Text and save.

Once the data is saved as a text file, open MS Excel and go to File --- Open, select Text files in the Files of type drop down list, and then browse and select the text document from which you want to import the data into excel.

After this a Text Import Wizard will open, it will have Delimited as the default option, let it be so and click Next. Then we get the option to choose a delimiter – select the used delimiter, if it’s not there in the given options, then select other and give the delimiter in the box provided. In case you don’t want any blank cells in between (as in the above example) then select Treat consecutive delimiters as one. In the Text qualifier drop down list, select none. The splitting can be seen in the Data preview area. Then click Next and then Finish (or directly click Finish) to complete the importing. Now the data will appear as required. Save the file in xls format by going to File --- Save As, select Microsoft Office Excel Workbook in the Save as type drop down list, browse to select the location to save, give it a name and save the file.

So, with a few simple steps we got our final excel file from the word document.

Example

Let us take the following three line data from word document

Rajesh Kumar 35 Visakhapatnam
Kumar Ganeshan 42 Chennai
Muthu Swamy 28 Bangalore

Suppose we want this data to be sorted in an excel sheet like this:


If we use space ( ) as a delimiter, then when we finally import this is what we will get:



So, we can’t use space as delimiter, what we can do is modify the data in doc file to the following format:

Rajesh Kumar$35$Visakhapatnam
Kumar Ganeshan$42$Chennai
Muthu Swamy$28$Bangalore

So, in this the dollar symbol ($) is separating the values that we want to be split in to columns, so if we use the $ symbol as delimiter then we get the required output:


So everything is about how you save data in the word document, while planning to collect data then itself, save it such that you can have delimiters in between, so that importing into excel will be easy, or else we need to include delimiters manually.

Excel is a very potential tool to even handle the enormous amounts of sequence data that we handle in our work. Some potential uses that I have explored so far are:

Convert a given sequence into amino acid sequences using a standard genetic code
Get all the six reading frames from a given sequence
Calculate the codon frequency of a given sequence
Construct a dot plot from two given sequences
Conversion of DNA sequences into RNA

and many more . . .

I will subsequently add the tutorials on all of these things . . . :-)

If any help is needed in this regard, you can comment below. If need help in any other aspect of excel, mail me at sathirkreddy@gmail.com with your requirements, I will try to attend and provide a solution.

Friday, August 13, 2010

CSIR-UGC NET Exam Pattern Changed

Hi all,

The CSIR-UGC NET exam pattern has been changed. Please follow the link below to see the changes that have been made:

CSIR-UGC exam pattern changed from June, 2011

Byeee.... :-)

Thursday, July 29, 2010

Biology and evolution internet links

Hi all,

Here are two links where you can read about biology and evolution, click on these to go to that page:

Botany, Evolution

Enjoy the biology experience, byeeee :-)