This is one way which will be of good help when there is enormous amount of data in word that you want to bring into excel.
Because I have tried to explain the process in detail, so that even a novice will be able to do it, the process might appear a very long one, but actually they are just a few simple steps.
There are a total of three file formats we are going to work with in order to import data from a word document to a excel sheet. The formats and the sequence in which we are going to do it is:
.doc --- .txt --- .xls
When excel imports some thing from another file like
.txt or
.csv, it sorts data in each line into a column. It looks for any default delimiters like comma (,) and splits the data into columns taking comma as delimiter. So, delimiters can be used to split data in one line into continuous columns. Any letter, numbers, symbols or combination of letters or formats can be used as a delimiter.
For the data given below:
Kumar,Rakesh,,Ranjith,Krishna,Surya
If we use comma <,> as delimiter, then the entire text will be imported into excel file as shown below
As can be seen, the third cell is blank. This is so because we had two commas after Rakesh. There was nothing between two delimiters, so a blank cell is created. This format of file that has values separated with commas is called
Comma Separated Values (CSV). This is one of the most commonly used file format to transfer our contacts from e-mail services like Gmail or from our mobile.
So the arrangement of data is what is important for proper splitting into cells when imported to excel. When we use comma as a delimiter we should make sure that we have them only at places where we need the splits. If there are any commas somewhere else they should be either removed or replaced with some symbol (lets say #) which is not there else where in the document. This is to make sure that the splitting is done correctly and once we import the data into excel and save it as an excel file, we can replace the new symbol (#) with the comma.
And why do we need to convert a .doc file into a .txt file? This is because excel permits import from a text file but not a doc file. You can save a word document as a text file by going to
File --- Save As, then select
Plain Text and save.
Once the data is saved as a text file, open MS Excel and go to
File --- Open, select
Text files in the
Files of type drop down list, and then browse and select the text document from which you want to import the data into excel.
After this a
Text Import Wizard will open, it will have
Delimited as the default option, let it be so and click
Next. Then we get the option to choose a delimiter – select the used delimiter, if it’s not there in the given options, then select other and give the delimiter in the box provided. In case you don’t want any blank cells in between (as in the above example) then select
Treat consecutive delimiters as one. In the
Text qualifier drop down list, select none. The splitting can be seen in the
Data preview area. Then click
Next and then
Finish (or directly click Finish) to complete the importing. Now the data will appear as required. Save the file in xls format by going to
File --- Save As, select
Microsoft Office Excel Workbook in the
Save as type drop down list, browse to select the location to save, give it a name and save the file.
So, with a few simple steps we got our final excel file from the word document.
Example
Let us take the following three line data from word document
Rajesh Kumar 35 Visakhapatnam
Kumar Ganeshan 42 Chennai
Muthu Swamy 28 Bangalore
Suppose we want this data to be sorted in an excel sheet like this:
If we use space ( ) as a delimiter, then when we finally import this is what we will get:
So, we can’t use space as delimiter, what we can do is modify the data in doc file to the following format:
Rajesh Kumar$35$Visakhapatnam
Kumar Ganeshan$42$Chennai
Muthu Swamy$28$Bangalore
So, in this the dollar symbol ($) is separating the values that we want to be split in to columns, so if we use the $ symbol as delimiter then we get the required output:
So everything is about how you save data in the word document, while planning to collect data then itself, save it such that you can have delimiters in between, so that importing into excel will be easy, or else we need to include delimiters manually.
Excel is a very potential tool to even handle the enormous amounts of sequence data that we handle in our work. Some potential uses that I have explored so far are:
Convert a given sequence into amino acid sequences using a standard genetic code
Get all the six reading frames from a given sequence
Calculate the codon frequency of a given sequence
Construct a dot plot from two given sequences
Conversion of DNA sequences into RNA
and many more . . .
I will subsequently add the tutorials on all of these things . . . :-)
If any help is needed in this regard, you can comment below. If need help in any other aspect of excel, mail me at sathirkreddy@gmail.com with your requirements, I will try to attend and provide a solution.