Bio::Affymetrix Perl modules
Introduction
Got an Affymetrix machine? Or do you know folks who do? Then chances are, you will be familiar with the .CHP, .CEL and .CDF files associated with them. Unfortunately sometimes these are binary files, and so getting data out of them is not easy. Wouldn't it be great if you could use a handy set of Perl modules to parse the information out of them? Strangely enough ...
What does it do?
With these modules you can...
- Parse CHP files from MAS 5 and GCOS 1.2 software, and obtain expression values and summary statistics. The modules handle the two file formats transparently, so you can write applications that parse either without trouble.
- Parse CDF files from MAS 5 and obtain all information about the design of Affymetrix chips
If you have a lot of CHP files lying around that you need to get data from, these are the modules for you.
Design philosophy
General usage is as follows: first you make an object, then you call
one of the parse_ methods to fill it with data. The
objects are entirely parsed into memory. This makes manipulating the
data very easy, at the expense of using lots of memory. It is possible
to write a module that instead works through the data in a single pass.
Hopefully these modules will give some clues if you want to write such
a system.
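As a rough illustration of the "make an object, then call a parse_ method" pattern described above, here is a minimal, self-contained sketch. The class Toy::ProbeTable, its methods and the tab-separated input format are all hypothetical stand-ins invented for this example; they are not the Bio::Affymetrix API, so consult the perldoc shipped with the modules for the real method names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT part of Bio::Affymetrix. It only
# illustrates the "make an object, then call a parse_ method" pattern.
package Toy::ProbeTable;

sub new {
    my ($class) = @_;
    # The object starts empty; a parse_ method fills it later.
    return bless { results => {} }, $class;
}

# Read every "name<TAB>signal" line from a filehandle into memory.
sub parse_from_filehandle {
    my ($self, $fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($name, $signal) = split /\t/, $line;
        $self->{results}{$name} = $signal;
    }
    return $self;
}

# Convenience wrapper: parse data already held in a string.
sub parse_from_string {
    my ($self, $string) = @_;
    open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";
    $self->parse_from_filehandle($fh);
    close $fh;
    return $self;
}

# Once parsed, a lookup is a plain hash access.
sub signal {
    my ($self, $name) = @_;
    return $self->{results}{$name};
}

package main;

# Step 1: make the object. Step 2: fill it with a parse_ method.
my $table = Toy::ProbeTable->new;
$table->parse_from_string("1000_at\t152.3\n1001_at\t48.9\n");

# Everything is now in memory, so queries need no further file access.
print $table->signal('1000_at'), "\n";
```

Because the whole input is held in a hash, any value can be looked up at any time after parsing, which is the convenience the in-memory design buys. The cost is that memory use grows with the size of the input.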
Where can I get them?
These modules are available from CPAN. Also included are some perldoc documentation explaining how to use the modules, and some example programs.
Missing Features
Features that we want to include, but have not so far:
- Parsing GCOS v1.2 CDF files
- Adding the ability to write files as well as read them (we have a prototype CDF file writer available)
- Any handling of CEL files
Features that are arguably missing, but we do not plan to implement:
- Non-expression arrays (SNP chips, etc.)
What other ways are there of doing the same thing?
There are various options available. You can pay for one of the Affymetrix developer kits. These provide Microsoft COM access to Affymetrix files, and to the GCOS database. Affymetrix also has a free (LGPL) parser, written in C++, for some files. Bioconductor can read various Affymetrix file formats. The Bioperl Microarray modules can read some Affymetrix file formats; however, they cannot read the latest formats.