download

Get the packages here

required perl modules

  • BSML
  • Bio::Seq
  • CDB_File
  • Class::Struct
  • Config::IniFiles
  • DB_File
  • Data::Dumper
  • Date::Manip
  • ExtUtils::MakeMaker (>=6.31)
  • File::Basename
  • File::Copy
  • File::Find
  • File::Mirror
  • File::Path
  • File::Spec
  • Getopt::Long
  • HTML::Template
  • IO::File
  • IO::Tee
  • IPC::Open3
  • LWP::Simple
  • Log::Cabin
  • Log::Log4perl
  • Mail::Mailer
  • Math::Combinatorics
  • PerlIO::gzip
  • Storable
  • URI::Escape
  • XML::Parser
  • XML::RSS
  • XML::Twig
  • XML::Writer

documentation - installation guide

This document is designed to help you get Ergatis up and running. For those of you who like to jump right in and do it yourself, you can skip down to the quick install guide. For a more detailed explanation of the requirements, architecture, best practices, etc., read on.

requirements

Getting the requirements up and running will take far more time than actually setting up Ergatis itself. This is a software package designed to allow users to easily build and manage pipelines on a compute grid, after all, so actually having a compute grid available (though not technically required) will be one of the first challenges.

Workflow Engine
Workflow Engine processes the XML representing the work to be done that is generated by Ergatis.
Sun Grid Engine
If you plan to use Ergatis to run jobs on a compute grid, a grid management system supported by Workflow Engine must be installed. The most popular choice is the freely-available Sun Grid Engine.
Web server
Because Ergatis is a web-based interface, you'll need a web server, such as Apache.
Perl 5.8.0 or higher and selected modules
Your installed Perl must be at least version 5.8.0. The modules listed under 'required perl modules' are also required. (The Ergatis installer will check for these and complain if any are missing.) Note that the first module (BSML) isn't currently available on CPAN and is an independent SourceForge project.
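
If you'd like to check your modules before running the installer, something like the following sketch may help. It relies only on the standard "perl -MModule -e 1" idiom, which exits with an error when a module can't be loaded. The function name check_perl_modules is made up for this example and is not part of Ergatis.

```shell
#!/bin/sh
# Report which Perl modules from a list cannot be loaded.
# check_perl_modules is a hypothetical helper, not part of Ergatis.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        # "perl -MName -e 1" exits non-zero when the module cannot be loaded
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every module loaded
    [ "$missing" -eq 0 ]
}
```

You could then call it with the names from the module list, for example: check_perl_modules BSML Bio::Seq CDB_File Config::IniFiles. Running perl -e 'use 5.008' first is a quick way to confirm the Perl version, since it fails on anything older than 5.8.0.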

The Workflow Engine contains a test suite that allows you to ensure that you can successfully submit and track jobs on your grid. With these, a web server and perl modules in place you are ready to install Ergatis.

downloading Ergatis packages

Ergatis is freely available under the open-source Artistic License and can be downloaded from the file release page on SourceForge. You'll notice several different packages available on that page; each is described below. Depending on your anticipated usage you may only need one of them.

ergatis
All installations will require this package. It contains the web interface and all the software and components for building analysis pipelines. The other components are optional.
coati
Optional. This is our three-tiered architecture built on top of DBI that enables transparent database connectivity for Perl programs across different schema and vendor types. You only need to install it if you plan on using the Ergatis components that utilize relational databases.
ontologies and chado
Optional. Much of Ergatis in general and everything involving database loading specifically makes extensive use of controlled vocabularies including the Gene Ontology and Sequence Ontology. This package provides the collection of ontologies used throughout the software.
prism (shared)
Optional. Prism inherits from and extends Coati's features with the emphasis on interactivity with the chado schema. The Prism API provides methods for efficient data loading and retrieval to and from a chado database. This package contains the shared components of prism that are required regardless of the schema employed.
prism (chado schema)
Optional. Extends prism to include utilities to load/retrieve data from a chado schema.
prism (TIGR euk schema)
Optional. Extends prism to include utilities to load/retrieve data from a TIGR-specific eukaryotic 'legacy' schema.
prism (TIGR prok schema)
Optional. Extends prism to include utilities to load/retrieve data from a TIGR-specific prokaryotic 'legacy' schema.

quick install

If you just want the basic descriptions of the steps involved, this is the section for you. If you want a detailed guide, skip to the next section, 'detailed install'.

  1. download ergatis
  2. extract and run the Makefile.PL (specifying some INSTALL_BASE)
  3. make
  4. make install
  5. rename the htdocs directory to 'ergatis' (or whatever you like) and place it in your webserver's htdocs/html directory
  6. modify your apache configuration to allow CGI script execution within the ergatis directory
  7. read and customize your new /ergatis/cgi/ergatis.ini config file
  8. read and edit your INSTALL_BASE/software.config file
  9. run the check_installation.pl script packaged with the installer (reads your ergatis.ini)
  10. point your browser to http://yourserver/ergatis
  11. use the 'admin' tab to set up your first project

The sections below provide more information and background.

detailed install

planning your layout

A bit of planning can save you a lot of grief when you lay out your directory structure. Most paths in Ergatis are configurable and most institutions usually find a directory convention that works for them. Here I'll describe the core directories you'll need to create, with examples at the end of each description. As you read through the ergatis.ini file you'll find descriptions for all needed files, directories and parameters.

ergatis directory
This is a directory where you'll install the perl scripts and component files that make up Ergatis, excluding the interface code (which lives instead under your web server.) When you compile Ergatis using the Makefile.PL this is the path you'll pass via the INSTALL_BASE parameter to direct its installation location. Some people create more than one of these for different software versions (discussed later.) Example: /opt/ergatis
project area
These are completely user-configurable and many institutions will have more than one. If I'm annotating two different organisms, for example, and want to keep their pipelines and data separate from each other I'll create one project area for each organism. The interface easily supports the display of dozens of these project areas via customization of the ergatis.ini file. Examples: /usr/local/projects/e_coli, /usr/local/projects/a_aegypti
global ID repository
Ergatis uses a file-based system for generating IDs, such as new pipeline or feature IDs. Each project area will have its own ID repository (created automatically), but the interface does need the path to an area it can use for global IDs, meaning those that can be used across projects. Pipeline IDs are an example of global IDs because they are unique across all projects. This directory is specified in the ergatis.ini file. Example: /opt/global_id_repository
temp directory
This is a directory with open permissions where Ergatis can create temporary files and folders. It is used both by the interface and by running pipeline components. Example: /tmp/ergatis
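
Using the example paths above, the layout could be created with a short sketch like this one. The function name create_ergatis_layout and its optional prefix argument are made up for illustration (the prefix lets you try it out in a scratch location first), and reading 'open permissions' as world-writable with the sticky bit, the way /tmp itself is usually set up, is an assumption on my part.

```shell
#!/bin/sh
# Create the example directories from this guide under an optional prefix.
# create_ergatis_layout and its prefix argument are hypothetical, not part of Ergatis.
create_ergatis_layout() {
    prefix="${1:-}"
    mkdir -p "$prefix/opt/ergatis" \
             "$prefix/opt/global_id_repository" \
             "$prefix/usr/local/projects/e_coli" \
             "$prefix/usr/local/projects/a_aegypti" \
             "$prefix/tmp/ergatis" || return 1
    # Assumption: "open permissions" means world-writable with the sticky bit
    chmod 1777 "$prefix/tmp/ergatis"
}
```

Calling create_ergatis_layout with no argument would create the directories at the real paths (which will usually need root), while create_ergatis_layout /some/scratch/dir builds the same tree under that directory instead.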

performing the install

Using these example directories, we'll now go through an example install. If you've downloaded the current version from the SourceForge site you should have a tarball named like ergatis-v2rNbN.tar.gz where the Ns are some version numbers. From here on in the guide you'll need to replace this with the name of the file you downloaded. You should be able to extract it like this:

tar -xvzf ergatis-v2rNbN.tar.gz

This will create a directory called ergatis-v2rNbN. Change into that directory:

cd ergatis-v2rNbN

Here you should see several files and directories, among them the Makefile.PL. Using the directory examples above, we'll perform the install with these three commands:

/usr/local/bin/perl Makefile.PL INSTALL_BASE=/opt/ergatis
make
make install

Next you'll need to copy the interface files into your web server's directory structure. This involves one simple copy command. In this example I'm using apache and my html directory is under /var/www/ . From the same directory as we've executed the previous commands we'll now do:

cp -r htdocs /var/www/html/ergatis

This takes the htdocs directory within the package and copies it into place, renaming it 'ergatis' at the same time. You can name this directory anything or, optionally, place the files directly under your server root. Because the CGI scripts are also stored under htdocs, you'll need to modify your apache configuration to allow execution of these scripts and to prevent some files within the system from being displayed.

<Directory "/var/www/html/ergatis">
    Options +ExecCGI
    AllowOverride Limit
    DirectoryIndex index.html index.cgi
    AddHandler cgi-script .cgi
    <FilesMatch "\.ini">
        Deny from all
    </FilesMatch>
</Directory>

interface configuration

With the interface code now in place you'll want to edit the ergatis.ini file located at:

/var/www/html/ergatis/cgi/ergatis.ini

It is pretty heavily documented so you'll have information about every variable you need to set. Once finished you can point your browser to

http://yourserver/ergatis

software path configuration

Many components in Ergatis execute external binaries such as blastall, glimmer3, etc. that can be installed anywhere on your filesystem. You'll need to tell Ergatis where each of these are by editing the software.config file under your ergatis directory:

/opt/ergatis/software.config

If you don't have access to some of these binaries or don't want them available you can edit the disabled component list in your ergatis.ini file. This will prevent that component from being available when users build a pipeline.
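
Before editing software.config it can help to find out where the binaries actually live. Here's a minimal sketch that looks each name up on your PATH using the shell's built-in "command -v". The function name locate_tools is made up for this example, and it only finds programs that are on your PATH; anything installed elsewhere you'll still need to track down by hand.

```shell
#!/bin/sh
# Print where each named executable is found on PATH, one per line.
# locate_tools is a hypothetical helper, not part of Ergatis.
locate_tools() {
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%s\t%s\n' "$tool" "$path"
        else
            printf '%s\tNOT FOUND\n' "$tool"
        fi
    done
}
```

For example, locate_tools blastall glimmer3 prints a path for each binary it finds and NOT FOUND for the rest. Anything reported as NOT FOUND is a candidate for the disabled component list mentioned above.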

check_installation.pl

Before we proceed you should run the installation verification script, passing the ergatis.ini file you just configured. It performs a host of checks to ensure that the software is set up and configured correctly.

./check_installation.pl --ergatis_ini=/var/www/html/ergatis/cgi/ergatis.ini

setting up a project area

Now that you have the software and interface set up, you're ready to create a project area. This will be where your pipelines and data are stored for any particular project. In our example we decided to create a project area for the annotation of E. coli, so we start by creating that directory:

mkdir /usr/local/projects/e_coli

You can then use the interface to create all the subdirectories and files necessary for a project area. Start by clicking the 'admin' tab followed by the 'create project' link. Next enter the project directory path in the text box. If there are any problems populating the area with the necessary files the interface will give a warning and allow you to try again until it's successful. Once the area is made you'll be prompted to edit the project's config file. Here you'll be able to specify which temp area and which ergatis directory you want to use (some sites have several.) Most of the options on this form should be filled out automatically for you based on the values in the ergatis.ini file. The only one you'll need to be sure to change is the first - the project abbreviation.

Once finished, this will create the project area and all needed files and subdirectories. Currently, it will not add your new project to the global project list visible on the left side of the home page. Those are controlled by the entries at the bottom of the ergatis.ini file. You can add an entry for it there; otherwise, to go to your new project, enter the path (like /usr/local/projects/e_coli) into the text box at the bottom of the page labeled 'project area'.

getting help

The primary support mechanism is the mailing lists hosted by SourceForge. The link is found below and this should be the first place you write if you have problems. There are several developers watching that list to assist when necessary. If you find a bug or want to request a feature, please use the tracker link.