GaPTools
dbGaP data validation tool. GaPTools is distributed as a docker image on Docker Hub. See GaPTools.md for more information about the tool.
Pre-requisites:
Docker Installation:
You must have Docker installed and working to run GaPTools. Docker is available for many operating systems, including most modern Linux distributions such as CentOS, Debian, and Ubuntu. Refer to the official Docker installation documentation for instructions for your particular operating system.
To confirm that Docker runs under your user account, run the command below and check for a response similar to the one shown (your version and build numbers may differ). The minimum Docker version supported by GaPTools is 17.04.0.
docker -v
Docker version 19.03.6, build 369ce74a3c
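The version check above can also be scripted. The following is a minimal sketch, not part of GaPTools: the helper names `parse_docker_version` and `version_ge` are hypothetical, and the comparison assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does.

```shell
#!/bin/sh
# Minimum Docker version stated in this README.
MIN_DOCKER_VERSION="17.04.0"

# Extract "19.03.6" from a line like "Docker version 19.03.6, build 369ce74a3c".
parse_docker_version() {
  printf '%s\n' "$1" | sed -E 's/^Docker version ([0-9]+(\.[0-9]+)*).*/\1/'
}

# Succeed if version $1 >= version $2 (assumes `sort -V` is available).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
  installed="$(parse_docker_version "$(docker -v)")"
  if version_ge "$installed" "$MIN_DOCKER_VERSION"; then
    echo "Docker $installed meets the minimum ($MIN_DOCKER_VERSION)."
  else
    echo "Docker $installed is older than $MIN_DOCKER_VERSION; please upgrade." >&2
  fi
else
  echo "docker not found on PATH." >&2
fi
```

The comparison is kept in its own function so it can be reused for other tools if needed.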
Docker Compose Installation:
GaPTools uses docker-compose to run multiple containers as a single service. Refer to the official Docker Compose installation documentation for details on how to install docker-compose.
Access to data files
The Docker host running GaPTools requires access to the data files that need to be validated. The files can be on a local file system, on a network file share (NFS), or in a cloud storage bucket. If the files are on a network file share or in a cloud storage bucket, they must be mounted as a file system on the Docker host. Below are some tools that are commonly used to mount cloud storage buckets as file systems on Linux servers.
Unused port 8080 on your docker host
GaPTools requires port 8080 to be available on the host system running Docker. Run the command below to check. If it produces no output, port 8080 is available on the Docker host.
netstat -an | grep "8080"
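Because `grep "8080"` also matches unrelated entries such as port 18080 or a remote address, a stricter check may be useful. The sketch below is an illustration only: `port_in_use` is a hypothetical helper, and it assumes the common `netstat -an` layout in which the local address is the fourth column and listening sockets are marked `LISTEN`.

```shell
#!/bin/sh
# Read `netstat -an` style lines on stdin and succeed if a socket is
# listening on the given port. Assumes the local address is column 4
# and ends in ":PORT" (Linux) or ".PORT" (BSD/macOS).
port_in_use() {
  awk -v port="$1" '
    /LISTEN/ && $4 ~ ("[:.]" port "$") { found = 1 }
    END { exit !found }
  '
}

if command -v netstat >/dev/null 2>&1; then
  if netstat -an 2>/dev/null | port_in_use 8080; then
    echo "Port 8080 is already in use on this host." >&2
  else
    echo "Port 8080 appears to be available."
  fi
else
  echo "netstat not found; check port 8080 with another tool." >&2
fi
```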
Setup
Once all pre-requisites are met, follow the instructions below to set up GaPTools. The setup can be validated using a sample study that is included with the GaPTools installation. The input files for the sample study are in the input_files/1000_Genomes_Study/ directory of the cloned GaPTools GitHub repository.
For the sample study, we will have GaPTools generate all output files inside the output_files/1000_Genomes_Study/ directory.
git clone https://github.com/ncbi/gaptools
cd gaptools
mkdir -p output_files/1000_Genomes_Study
# Change file permissions to allow GaPTools to write output files on docker host
chmod -R o+w output_files
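To confirm that the permission change took effect, the mode of the output directory can be inspected. This is a minimal sketch: `others_can_write` is a hypothetical helper, and it assumes the standard `ls -ld` mode string (for example `drwxr-xrwx`), whose ninth character is the write bit for other users.

```shell
#!/bin/sh
# Succeed if "other" users have write permission on the given path.
others_can_write() {
  # Character 9 of the `ls -ld` mode string is the "other write" bit.
  [ "$(ls -ld "$1" | cut -c9)" = "w" ]
}

OUTPUT_DIR="output_files/1000_Genomes_Study"
if [ -d "$OUTPUT_DIR" ] && others_can_write "$OUTPUT_DIR"; then
  echo "$OUTPUT_DIR is writable by other users."
else
  echo "$OUTPUT_DIR is missing or not writable by other users." >&2
fi
```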
Execution
Once GaPTools is set up, run the script below from inside the directory where the GaPTools GitHub repository is cloned to execute it on the included sample study.
./dbgap-docker.bash -i ./input_files/1000_Genomes_Study/ -o ./output_files/1000_Genomes_Study -m ./input_files/1000_Genomes_Study/metadata.json up
GaPTools uses Apache Airflow behind the scenes as the workflow orchestrator for all validation tasks. To view the validation results, browse to the following URL:
http://<your_docker_host_ip>:8080
At the end of the workflow, the output files will be created under the specified output directory.
Usage
To use GaPTools for your own study, modify the command above and pass the following input parameters:
-i -- path to the input files for your study
-o -- path where output files should be generated
-m -- path to the manifest file for your study
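The three parameters above can be checked before the workflow is started. The wrapper below is a sketch under stated assumptions: `check_study_paths` is a hypothetical helper, the `my_study` paths are placeholders to replace with your own, and the `chmod o+w` mirrors the permission step from the Setup section.

```shell
#!/bin/sh
# Succeed only if the input directory and manifest file exist.
# Creates the output directory and makes it writable for the containers.
check_study_paths() {
  input_dir="$1"; output_dir="$2"; manifest="$3"
  [ -d "$input_dir" ] || { echo "Input directory not found: $input_dir" >&2; return 1; }
  [ -f "$manifest" ] || { echo "Manifest file not found: $manifest" >&2; return 1; }
  mkdir -p "$output_dir" && chmod o+w "$output_dir"
}

# Hypothetical study locations; replace with your own.
INPUT_DIR="./input_files/my_study"
OUTPUT_DIR="./output_files/my_study"
MANIFEST="$INPUT_DIR/metadata.json"

if [ ! -x ./dbgap-docker.bash ]; then
  echo "Run this from the cloned gaptools directory." >&2
elif check_study_paths "$INPUT_DIR" "$OUTPUT_DIR" "$MANIFEST"; then
  ./dbgap-docker.bash -i "$INPUT_DIR" -o "$OUTPUT_DIR" -m "$MANIFEST" up
fi
```

Failing early on a missing path avoids starting the containers only to have the workflow stop on its first task.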
Stop Docker Containers
Once your study has been processed, run the command below to stop the GaPTools service.
./dbgap-docker.bash down
Contact
If you have any questions or want to report an issue, please contact us at: dbgap-help@ncbi.nlm.nih.gov
