Companion

This repository has been archived; the currently maintained version is at https://github.com/iii-companion/companion


A portable, scalable eukaryotic genome annotation pipeline implemented in Nextflow.

This software is a comprehensive computational pipeline for the annotation of eukaryotic genomes (like protozoan parasites). It performs the following tasks:

- Fast generation of pseudomolecules from scaffolds by ordering and orientating against a reference
- Accurate transfer of highly conserved gene models from the reference
- De novo gene finding as a complement to the gene transfer
- Non-coding RNA detection (tRNA, rRNA, sn(o)RNA, ...)
- Pseudogene detection
- Functional annotation (GO, products, ...)
  - ...by transferring reference annotations to the target genome
  - ...by inferring GO terms and products from Pfam pHMM matches
- Consistent gene ID assignment
- Preparation of validated GFF3, GAF and EMBL output files for jump-starting manual curation and quick turnaround time to submission

It supports parallelized execution on a single machine as well as on large cluster platforms (LSF, SGE, ...).

Quick start
Further technical information
License
Feedback/Issues
Citation

Quick start

This should get you up and running on an Ubuntu system, but please read the full documentation before doing any work "for real".

1. Install dependencies

Execute these commands as root, e.g. using sudo

apt-get install default-jre
curl -fsSL get.nextflow.io | bash && \
   mv nextflow /usr/local/bin
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - && \
   add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" && \
   apt-get update && \
   apt-cache policy docker-ce && \
   apt-get install --yes docker-ce && \
   systemctl enable docker

To enable you to use docker with your normal user account (i.e. without being root or needing to use sudo), run the following command, with your username in place of <username>.

usermod -aG docker <username>

Log out and log back in again for this to take effect.
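
A quick way to confirm that the new group membership is active in the current session is sketched below. The helper name in_current_groups is illustrative and not part of Companion; it relies only on the standard id, tr and grep utilities.

```shell
# Succeed if the current login session belongs to the group named in $1.
# With no user argument, `id -nG` lists the groups of the running session.
in_current_groups() {
    id -nG | tr ' ' '\n' | grep -qx "$1"
}

if in_current_groups docker; then
    echo "This session is in the docker group"
else
    echo "Not in the docker group yet; log out and back in, then retry"
fi
```

Calling id without a user name matters here: it reports the groups of the running session, which is what decides whether docker commands will be permitted.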

Checks

java -version should say you have Java 1.8 or greater
nextflow info will print system information if nextflow has been installed successfully
systemctl status docker will tell you if docker is active (running)
docker info will print information if docker has been installed successfully, and you have permission to use it
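
The checks above can be complemented by a small script that reports which of the three tools are missing from your PATH. This is only a sketch; the function name missing_tools is made up for this example, and it does not replace running the individual commands listed above.

```shell
# Print the name of every command given as an argument that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools java nextflow docker)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "java, nextflow and docker were all found on PATH"
fi
```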

2. Install Companion

Execute these commands in the directory you want to keep your Companion work in. Do this as a normal user, i.e. not as root or using sudo. Use a name that is meaningful to you in place of <my-companion-project>

curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip && \
   unzip companion-master.zip && \
   mv companion-master <my-companion-project>
docker pull sangerpathogens/companion

3. Run Companion test job

Companion is distributed with configuration and data (including a few pregenerated reference annotations) for a small test run. Run the following command (using the name you chose for your project directory in place of my-companion-project).

nextflow run my-companion-project -profile docker

This will create a directory my-companion-project/example-output with the results of the run.

4. Configure Companion for your annotation run

The file params_default.config configures the pipeline, and will need to be edited for your annotation run. You will probably need to change at least the following parameters:

inseq Your input FASTA file (${baseDir}/example-data/L_donovani.1.fasta in the example parameter file included with the distribution)

ref_dir The directory containing your reference genomes (${baseDir}/example-data/references in the example file)

ref_species The "short name" for your reference species (LmjF.1 in the example file)

dist_dir The directory that will contain the newly created output files (${baseDir}/example-data-output in the example file)

GENOME_PREFIX Text pattern matching your genome prefix (LDON in the example file)

CHR_PATTERN Pattern matching your chromosome names (LDON_(%w+) in the example file, where %w+ matches one or more letters or numbers)

ABACAS_BIN_CHR Abacas bin chromosome (LDON_0 in the example file)

EMBL_AUTHORS etc.; please provide suitable EMBL metadata (dummy values in the example file)

TAXON_ID Please provide suitable value for the GAF output (4711 in the example file)
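
Before settling on a CHR_PATTERN, it can help to see which sequence names in your FASTA file it would not cover. The sketch below is an approximation under stated assumptions: it uses a POSIX extended regular expression in place of the Lua-style pattern (%w+ is written as [[:alnum:]]+), it requires the whole sequence ID to match, and the function name unmatched_ids is hypothetical. How Companion itself applies the pattern is not shown here, and unmatched names are not necessarily errors.

```shell
# Print the sequence IDs in FASTA file $1 that do not fully match the
# extended regular expression $2. An ID is the header text up to the first
# whitespace, without the leading '>'.
unmatched_ids() {
    grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | grep -Ev "^($2)\$" || true
}

# Demo on a throwaway file; the sequence names are invented for illustration.
demo=$(mktemp)
printf '>LDON_01\nACGT\n>LDON_36 note\nACGT\n>scaffold7\nACGT\n' > "$demo"
unmatched_ids "$demo" 'LDON_[[:alnum:]]+'    # prints: scaffold7
rm -f "$demo"
```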

5. Prepare reference annotations

The reference annotations used in the pipeline need to be pre-processed before they can be used. To add a reference organism, you will need:

a descriptive name of the organism
a short abbreviation for the organism
the genome sequence in a single FASTA file
a structural gene annotation in GFF3 format (see below for details)
functional GO annotation in GAF 1.0 format, on the gene level
a pattern matching chromosome headers, describing how to extract chromosome numbers from them
an AUGUSTUS model, trained on reference genes

Insert these file names, etc., where <placeholders> appear in the steps below:

Create a new data directory (i.e. the equivalent of the example-data directory included in the distribution)
Edit nextflow.config (and any config files that are referenced) and change parameters such as inseq and ref_dir to your new data directory.
Copy the new reference genome (FASTA) into <new_data_dir>/genomes
Copy GFF3 and GAF files into <new_data_dir>/genomes
Copy Augustus model files into data/augustus/species/<species_name>/
Create new directory <new_data_dir>/references/<short_name>/
Add a new section to <new_data_dir>/references/references-in.json, using the short name (same as the directory name in the previous step); in this section add the names/paths of the files copied (above), a descriptive name, and a pattern for matching chromosomes in the FASTA files (in this example, <short_name>_<n>, where n is any integer).

"<short_name>" : {   "gff"                : "../genomes/<gff3_filename>.gff3",
                     "genome"             : "../genomes/<ref_genome_name>.fasta",
                     "gaf"                : "../genomes/<ref_annot_filename>.gaf",
                     "name"               : "<Descriptive Name of Reference Genome>",
                     "augustus_model"     : "../../data/augustus/species/<species_name>/",
                     "chromosome_pattern" : "<short_name>_(%d+)"
                  }

Finally, change directory to <new_data_dir>/references (you must execute the following command in this directory) and run ../../bin/update_references.lua. This writes the file <new_data_dir>/references/references.json.
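
The directory-related steps above could be scripted roughly as follows. This is a sketch only: the function name make_reference_layout and the names my-data and MyRef are placeholders, and the commented commands assume the new data directory sits directly inside your project directory, as the relative path ../../bin/update_references.lua implies.

```shell
# Create the directory skeleton for one reference organism.
# $1: new data directory, $2: short name of the reference.
make_reference_layout() {
    mkdir -p "$1/genomes" "$1/references/$2"
}

# Example with placeholder names (uncomment and adapt):
# make_reference_layout my-data MyRef
# cp MyRef.fasta MyRef.gff3 MyRef.gaf my-data/genomes/
# cd my-data/references && ../../bin/update_references.lua
```

Editing references-in.json and copying the AUGUSTUS model files still have to be done by hand before running update_references.lua.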

6. Run it!

The following command (using the name you chose for your project directory in place of my-companion-project) will start your annotation run:

nextflow run my-companion-project -profile docker

Further technical information

Dependencies

Companion has the following dependencies:

Java 8 or later
Nextflow
Docker (if using the Docker image to satisfy dependencies)

Java

To check whether Java is installed, and which version, use the command java -version. Note that Java 8 reports its version as 1.8, whereas Java 9 and later report the major version directly (9, 10, 11, etc.).
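
If you want to test the version in a script, the version string can be reduced to a major version number as sketched below. The function name java_major is made up for this example, and the sed expression assumes the usual version "..." wording in the first line of the java -version output.

```shell
# Reduce a Java version string to its major version number.
# Legacy scheme: "1.8.0_292" -> 8. Current scheme: "11.0.2" -> 11.
java_major() {
    case "$1" in
        1.*) rest=${1#1.}; echo "${rest%%[._-]*}" ;;
        *)   echo "${1%%[._+-]*}" ;;
    esac
}

# `java -version` writes to stderr, hence the 2>&1 redirection.
if command -v java >/dev/null 2>&1; then
    version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "Java major version: $(java_major "$version")"
fi
```

A result of 8 or higher satisfies the Java requirement stated above.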

To install Java 8 on an Ubuntu or Debian system, run:

apt-get install openjdk-8-jre

On Fedora, CentOS or Red Hat (etc.) systems:

yum install java-1.8.0-openjdk

Nextflow

To install Nextflow, run:

curl -fsSL get.nextflow.io | bash

This will create an executable called 'nextflow', which should be moved to a suitable directory, for example:

mv nextflow /usr/local/bin/

Use the command which nextflow to check that it is found in your path.

Docker

Docker is required if you intend to use the Docker image, as recommended below, to satisfy the dependencies.

To install Docker, see the installation guide for Ubuntu, CentOS, Debian or Fedora.

Users running Companion with Docker will need to be added to the docker group (Unix users can belong to one or more groups, which determine whether they can perform certain acti
