Installing a local copy of Ensembl
Introduction
Arabidopsis Ensembl is a version of Ensembl built especially for Arabidopsis. It intends to be close to the ordinary Ensembl in capability and data. In the long term, we aim to have all applicable Ensembl features available in our version. So, just like ordinary Ensembl, you need a way to install it locally!
This document is not a full set of instructions in itself. Instead, we take the standard Ensembl install procedure document and say what you need to do differently to get an Arabidopsis Ensembl, rather than one of the Ensembl databases developed at Hinxton.
Unfortunately, Arabidopsis Ensembl is not very sociable at the moment. The current version (at the time of writing) is based on version 23 of the Hinxton Ensembl. It should be possible in principle to get a server to run multiple species (e.g. Human, Rat and Arabidopsis), but we haven't tried this. Our Ensembl has many code tweaks, some of which would break the Human/Mouse/Rat/etc. databases. We are working to reduce these (apart from anything else, it makes our life easier when we upgrade). For now, we think the best bet if you want to run Arabidopsis and anything else is to run two copies of Apache and Ensembl on different ports. Sorry.
Getting started
Start by downloading the "Ensembl xx.x Website Installation Instructions" from the Ensembl server (follow the Documentation link on the Ensembl homepage). The rest of this document uses that document as a base, and we will refer to it as "the Ensembl document" throughout. You should start by reviewing "What you need" in the Ensembl document. Thanks to Arabidopsis' small genome size, you will only need around 5 GB of disk space at most, rather than the 96 GB asked for in the document. We run Red Hat / Fedora Linux at NASC, and we will scatter some hints for this OS throughout this document.
Don't worry about the rather impressive machine demanded in the opening of the document: if you don't plan to host the website for other users, but just want to play with it yourself, any regular PC should be fine.
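Before downloading anything, it may be worth checking that the target filesystem has the roughly 5 GB of free space mentioned above. The following is a minimal sketch only; the function names free_kb and enough_space and the ENSEMBL_ROOT variable are made up for this example and are not part of Ensembl.

```shell
#!/bin/sh
# Sketch: check free disk space before installing.
# free_kb, enough_space and ENSEMBL_ROOT are hypothetical names for this example.

# free_kb DIR -> prints the space available on DIR's filesystem, in kilobytes.
# "df -P" forces the POSIX one-line-per-filesystem format; "-k" uses 1 KB units.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space DIR GIGABYTES -> exit status 0 if DIR has at least that much free.
enough_space() {
    needed_kb=$(( $2 * 1024 * 1024 ))
    [ "$(free_kb "$1")" -ge "$needed_kb" ]
}

# Check the directory you plan to install into (defaults to the current one).
target=${ENSEMBL_ROOT:-.}
if enough_space "$target" 5; then
    echo "$target: at least 5 GB free"
else
    echo "$target: less than 5 GB free"
fi
```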
Things to install
General Software
If you are running any kind of Linux, you should have access to Perl and MySQL already. We recommend you use the versions that come with your operating system if you can; installing MySQL and Perl is no fun at all. MySQL version 3 or 4 is fine.
You need Apache version 1! Install Apache from source exactly as written in the Ensembl document if you do not already have it. Modern Red Hat and Fedora releases come with Apache 2, and other distributions may do so too. If you have one of these distributions and installing mod_perl does not work, you might find it helpful to remove all traces of Apache 2.
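One way to see which Apache you have is to look at the version line printed by httpd -v. The sketch below parses such a line; the helper names apache_major and is_apache1 are invented for illustration, and on some systems the binary may be called something other than httpd.

```shell
#!/bin/sh
# Sketch: decide whether a version line from "httpd -v" describes Apache 1.x.
# apache_major and is_apache1 are hypothetical helper names.

# apache_major "VERSION TEXT" -> prints the major version number (e.g. 1 or 2).
apache_major() {
    printf '%s\n' "$1" | sed -n 's|.*Apache/\([0-9][0-9]*\)\..*|\1|p'
}

# is_apache1 "VERSION TEXT" -> exit status 0 only for Apache 1.x.
is_apache1() {
    [ "$(apache_major "$1")" = "1" ]
}

# Typical use against the installed server (uncomment to try it):
# is_apache1 "$(httpd -v)" && echo "Apache 1.x found"

# Demonstration with a sample line in the format printed by httpd -v.
sample='Server version: Apache/2.0.52'
is_apache1 "$sample" || echo "This is not Apache 1.x"
```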
You need to install Perl modules as in the Ensembl document. General advice:
- It is best if you can use the versions of these modules that came with your distribution, especially DBI and DBD::mysql, rather than installing from CPAN.
- Some distributions require a newer libgd C library from www.boutell.com before you can install the GD Perl module.
- perl -MCPAN -e shell is your friend if you want to install lots of Perl modules in a hurry; run perldoc CPAN to find out more.
- You do not need Dotter for Arabidopsis Ensembl. You do not need CVS to install Arabidopsis Ensembl (apart from BioPerl).
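To find out which modules your distribution already provides, you can ask Perl to load each one. This is a sketch only: it checks just the three modules named above (the full list is in the Ensembl document), and have_perl_module is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether some required Perl modules can be loaded.
# have_perl_module is a hypothetical helper; extend the module list
# from the Ensembl document as needed.

# have_perl_module NAME -> exit status 0 if "perl -MNAME" succeeds.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

for module in DBI DBD::mysql GD; do
    if have_perl_module "$module"; then
        echo "$module: found"
    else
        echo "$module: missing"
    fi
done
```

Anything reported as missing can then be installed from your distribution's packages or, failing that, from the CPAN shell.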
Ensembl build/install
The principles behind installing Arabidopsis Ensembl are the same as those listed in the "Ensembl Build/Install" section of the Ensembl document. The actual technique, however, is somewhat different.
Databases
All data is available from ftp://ftp.arabidopsis.info/pub/ensembl/databases/ . The files are in the same format as the Ensembl packages. You will need all of the files labelled with the version you intend to use, e.g. all the files with 23_1 in their names. Choose the latest version, unless you have a good reason not to.
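Picking out every file for one version can be sketched as a simple filter on the file names. The file names in the demonstration are invented for illustration, and the real layout of the FTP directory may differ; files_for_version is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: select the files that belong to one release from a listing.
# files_for_version is a hypothetical helper; the sample names are made up.

# files_for_version TAG -> reads file names on stdin, prints those containing TAG.
files_for_version() {
    grep -F -- "$1"
}

# Demonstration with an invented listing: only the 23_1 entries are printed.
printf '%s\n' \
    arabidopsis_core_23_1.tar.gz \
    arabidopsis_core_22_1.tar.gz \
    arabidopsis_est_23_1.tar.gz |
    files_for_version 23_1

# One possible way to download the matching files, assuming they are plain
# files in that directory (wget expands wildcards in FTP URLs; quote the URL
# so your shell does not expand it first):
# wget 'ftp://ftp.arabidopsis.info/pub/ensembl/databases/*23_1*'
```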
There is no mart view for Arabidopsis Ensembl yet. There is no multi-species data for Arabidopsis Ensembl yet.
Software
Unlike Ensembl, we supply our software as a big tarball. Get the software from ftp://ftp.arabidopsis.info/pub/ensembl/code/ . Once downloaded, you can untar it wherever you like. The Ensembl document refers to your server root. Your server root will be where you untarred our tarball. We use /usr/local/ensembl, as in the Ensembl document.
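Unpacking into a chosen server root might look like the sketch below. It assumes the tarball is gzip-compressed, and the tarball name in the example is hypothetical; use whatever file you actually downloaded. If the tarball already contains a top-level directory, the files will end up one level below the directory you pass in.

```shell
#!/bin/sh
# Sketch: unpack the code tarball into a server root directory.
# unpack_to_root is a hypothetical helper; gzip compression is an assumption.

# unpack_to_root TARBALL ROOT -> creates ROOT if needed and unpacks TARBALL into it.
unpack_to_root() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example with a made-up tarball name (uncomment and adjust to use):
# unpack_to_root arabidopsis-ensembl.tar.gz /usr/local/ensembl
```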
Configuration
Configuration is exactly as in the Ensembl document. The only difference is that you can't add or remove species; you are stuck with the ones we have given you.
Running
Starting and stopping is just as described in the Ensembl document. We supply a start and stop script for you to copy.