March 25, 2008

Migrating from GeneWorks to MacVector

As you know, the utility of GeneWorks has been eroded by a lack of updates and by its incompatibility with Apple OS10. In response, we have begun to migrate to an alternative, far less wieldy program called MacVector.

We have purchased a single license to run the software and installed it on two machines: one in my office and the other on the large-screen computer. Be sure to QUIT GeneWorks when it is not in use.

MacVector can open GeneWorks files and display the nucleotide sequence. However, on any but the simplest GeneWorks files, the annotations are hopelessly corrupt. Therefore, to preserve our database, the GeneWorks files are being converted to .EMBL format. The .EMBL format files can be read with a simple text editor, and MacVector can read most of the annotations (but not all). Later I will describe some of the issues surrounding MacVector's errant ways, but first we should discuss the backing-up process.

Creating EMBL files of GeneWorks files:

In GeneWorks, "open" all the nucleotide files in a given folder. Select them all and perform a "Save As" command, specifying EMBL format. Direct all the saved EMBL format files to a new folder (called EMBL) inside the folder that the GeneWorks files came from. Using the Finder, label the EMBL folder and its parent folder as "In Progress". This label indicates that the folder has been backed up, which means, in effect, that every GeneWorks file in the folder now has a corresponding EMBL file. Move up the folder hierarchy in the same way; at the end you will have something that looks like this:

The yellow color (which is how OS10 reads the "In Progress" label generated in OS9) means that the folder in question has been backed up.
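The claim behind the label, that every GeneWorks file in a folder has a corresponding EMBL file, can be checked with a short script before the label is applied. The following is only a sketch under two assumptions that are not stated in this memo: that each EMBL copy keeps the same base name as its GeneWorks original (any extension is ignored), and that every ordinary file in the folder is a GeneWorks nucleotide file. The function name `missing_embl_copies` is made up for illustration.

```python
from pathlib import Path


def missing_embl_copies(folder, embl_name="EMBL"):
    """List files in `folder` that have no same-named file in its EMBL subfolder.

    Assumption: an EMBL copy shares the base name of the GeneWorks file,
    ignoring any extension. Hidden files (names starting with ".") are skipped.
    """
    folder = Path(folder)
    embl_dir = folder / embl_name

    # Base names of everything already saved in the EMBL subfolder.
    converted = set()
    if embl_dir.is_dir():
        converted = {p.stem for p in embl_dir.iterdir() if p.is_file()}

    missing = []
    for p in sorted(folder.iterdir()):
        if p.is_file() and not p.name.startswith(".") and p.stem not in converted:
            missing.append(p.name)
    return missing
```

An empty list returned for a folder suggests that it is safe to label it "In Progress"; any names in the list still need a "Save As" in GeneWorks.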

 

Reading files in MacVector:

Open the EMBL file from within MacVector (don't double-click the EMBL files, as that will open them in a text editor). Check the features table to see whether the features have been read properly.

If features are corrupt, don't despair! The most common corruption is an inappropriate splitting of the feature, as exemplified here by the T7 primer (the T3 primer is OK):

Identify all the features that are corrupt; this is best done by printing out the feature table. Then close the EMBL file in MacVector.

From the Finder, duplicate the EMBL file (to be sure you don't lose the annotations if you make a mistake).

Open the original EMBL file using a text editor like TextEdit. Scroll down the file to find the source of MacVector's problem. This is usually an annotation that looks like this:

FT   primer          2174..2194                                               

FT                   /note="T3"                                               

FT   primer          complement (2339..2320)                                                

FT                   /note="T7"

The T3 primer is OK, but MacVector was confused by the fact that the feature coordinates for the T7 primer are presented in reverse (larger number first). Fix the problem by inverting the order of the numbers:

FT   primer          complement (2320..2339)

FT                   /note="T7"
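For files with many reversed features, the same correction can be scripted rather than typed by hand. The sketch below is an illustration only, not a MacVector or GeneWorks tool. It assumes that the only problem is a simple `complement (start..end)` range on an `FT` line with the larger number first, as in the T7 example above; the function names are invented for this memo. Run it on the duplicate-protected original, as described earlier.

```python
import re

# A simple range inside complement(...), tolerating the optional spaces
# seen in the examples above, e.g. "complement (2339..2320)".
_COMPLEMENT_RANGE = re.compile(
    r"(complement\s*\(\s*)(\d+)(\s*\.\.\s*)(\d+)(\s*\))"
)


def fix_reversed_complement(line):
    """Return `line` with a reversed complement range put in ascending order.

    Only feature-table lines (starting with "FT") are touched; ranges that
    are already in ascending order are left exactly as they are.
    """
    if not line.startswith("FT"):
        return line

    def swap(match):
        start, end = int(match.group(2)), int(match.group(4))
        if start <= end:
            return match.group(0)
        return f"{match.group(1)}{end}..{start}{match.group(5)}"

    return _COMPLEMENT_RANGE.sub(swap, line)


def fix_embl_text(text):
    """Apply the fix to every line, keeping the original line endings."""
    return "".join(
        fix_reversed_complement(line) for line in text.splitlines(keepends=True)
    )
```

To use it, read the EMBL file as text, pass the contents through `fix_embl_text`, and write the result back as plain text. Check the features table in MacVector afterwards, since other kinds of corruption are not handled here.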

Save the file as a simple text file (this is critical as MacVector reads EMBL files from simple text and chokes on the richer WORD documents).
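Because TextEdit saves rich text by default, it is easy to end up with an RTF or Word file without noticing. A quick check along the following lines can catch that before the file is opened in MacVector. This is a rough sketch: it only looks for the RTF header, the legacy Word (.doc) file signature, and stray NUL bytes, and the function name `looks_like_plain_text` is made up for illustration.

```python
def looks_like_plain_text(data):
    """Return True if `data` (bytes) looks like plain text, not RTF or Word."""
    if data.startswith(b"{\\rtf"):
        # RTF files begin with "{\rtf".
        return False
    if data.startswith(b"\xd0\xcf\x11\xe0"):
        # Signature of the container used by legacy Word .doc files.
        return False
    if b"\x00" in data:
        # Plain ASCII text never contains NUL bytes.
        return False
    return True
```

Read the saved file in binary mode (`open(path, "rb").read()`) and pass the bytes to the function; a `False` result means the file should be re-saved as plain text.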

Open the revised EMBL file in MacVector and check that the annotations are correct. Now you can add, delete, or rename features, and so on. Once you are done, give the file a different name and save it as an EMBL file. Do NOT save files as MacVector files, as these are readable only by MacVector!