Canopy
Python package canopy loads the SomaLogic, Inc. proprietary data file called an *.adat. The package provides auxiliary functions for extracting relevant information from the ADAT object once in the Python environment. Basic familiarity with the Python environment is assumed, as is the ability to install contributed packages from the Python Package Installer (pip).
The Python SomaData Package from SomaLogic, Inc.
Overview
This document accompanies the Python package somadata, which loads the SomaLogic, Inc. structured text data file called an *.adat. The somadata.Adat object is an extension of the pandas.DataFrame class. The package provides auxiliary functions for extracting relevant information from the ADAT object once in the Python environment. Basic familiarity with the Python environment is assumed, as is the ability to install contributed packages from the Python Package Installer (pip).
<a name="toptoc"></a>
Table of Contents:
- Installation
- Basic Use
- Reading ADAT text files
- Wrangling Data
- Adding Metadata
- Slicing Data
- SomaScan Version Lifting
- Writing an ADAT text file
- Example Data Analysis
<a name="installation"></a>
Installation
The easiest way to install SomaData is directly from PyPI:
PIP:
pip install SomaData
Alternatively, you can install from the GitHub repository:
GitHub:
pip install git+https://github.com/SomaLogic/Canopy.git#egg=somadata
If you wish to develop or change the source code, clone the repository and install it in editable mode (git clone creates a directory named Canopy):
git clone https://github.com/SomaLogic/Canopy.git
pip install -e ./Canopy
Dependencies
Python >=3.9 is required to install somadata. The following package dependencies are installed on a pip install:
- pandas >= 1.1.2
- numpy >= 1.19.1
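Before importing somadata, it can help to confirm that the interpreter and the installed dependencies meet the minimums listed above. The following is a minimal sketch using only the standard library (importlib.metadata); the helpers `parse_version` and `check_requirements` are hypothetical and not part of somadata, and the simple parser keeps only the leading numeric part of a version, so pre-release tags such as `rc1` are ignored.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the dependency list above.
REQUIREMENTS = {"pandas": "1.1.2", "numpy": "1.19.1"}
MIN_PYTHON = (3, 9)


def parse_version(text):
    """Turn '1.19.1' into (1, 19, 1); any non-numeric suffix (e.g. 'rc1') is dropped."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_requirements(requirements=REQUIREMENTS, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        problems.append(f"Python {'.'.join(map(str, min_python))}+ required")
    for name, minimum in requirements.items():
        try:
            installed = version(name)  # installed distribution version string
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name} {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    print(check_requirements() or "All requirements satisfied")
```

An empty result means the environment matches the stated minimums; otherwise each entry names the package (or Python version) that needs upgrading.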
<a name="basics"></a>
Basics
Once installed, import somadata as usual:
import somadata
For a traversable index of the library:
help(somadata)
# or help(somadata.adat), etc.
Help on package somadata:
NAME
somadata
PACKAGE CONTENTS
adat
annotations
base (package)
data (package)
errors
io (package)
tools (package)
FILE
/Users/tjohnson/code/repos/SomaData/somadata/__init__.py
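The PACKAGE CONTENTS section of help() lists each submodule and marks subpackages with "(package)". The same list can be built programmatically with the standard library's pkgutil.iter_modules. This is a minimal sketch: `list_submodules` is a hypothetical helper, and it is demonstrated on the stdlib email package so it runs without somadata installed; pass somadata instead once it is importable.

```python
import email  # stand-in package; replace with `import somadata`
import pkgutil


def list_submodules(package):
    """Return sorted (name, is_package) pairs for the direct children of a
    package, mirroring the PACKAGE CONTENTS section printed by help()."""
    return sorted(
        (info.name, info.ispkg) for info in pkgutil.iter_modules(package.__path__)
    )


for name, is_pkg in list_submodules(email):
    print(name + (" (package)" if is_pkg else ""))
```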
Internal Objects
The somadata package ships with one internal object, example_data, that users can use to run canned examples (or analyses). Import it with:
from somadata.data.example_data import example_data
Main Features (I/O)
- Loading data (Import)
  - Import a text file in the *.adat format into a Python session as an adat object.
- Wrangling data (Manipulation)
  - Subset, reorder, and list various fields of an adat object.
- Exporting data (Output)
  - Write out an adat object as a *.adat text file.
<a name="reading"></a>
Loading an ADAT
Load the example file bundled with the somadata library by passing its path to read_adat:
adat = somadata.read_adat('./somadata/data/example_data.adat')
type(adat)
somadata.adat.Adat
adat.shape
(192, 5284)
adat.columns
MultiIndex([( '10000-28', '3', 'SL019233', ...),
( '10001-7', '3', 'SL002564', ...),
( '10003-15', '3', 'SL019245', ...),
( '10006-25', '3', 'SL019228', ...),
( '10008-43', '3', 'SL019234', ...),
( '10011-65', '3', 'SL019246', ...),
( '10012-5', '3', 'SL014669', ...),
( '10013-34', '3', 'SL025418', ...),
( '10014-31', '3', 'SL007803', ...),
('10015-119', '3', 'SL014924', ...),
...
( '9981-18', '3', 'SL018293', ...),
( '9983-97', '3', 'SL019202', ...),
( '9984-12', '3', 'SL019205', ...),
( '9986-14', '3', 'SL005356', ...),
( '9989-12', '3', 'SL019194', ...),
( '9993-11', '3', 'SL019212', ...),
( '9994-217', '3', 'SL019217', ...),
( '9995-6', '3', 'SL013164', ...),
( '9997-12', '3', 'SL019215', ...),
( '9999-1', '3', 'SL019231', ...)],
names=['SeqId', 'SeqIdVersion', 'SomaId', 'TargetFullName', 'Target', 'UniProt', 'EntrezGeneID', 'EntrezGeneSymbol', 'Organism', 'Units', 'Type', 'Dilution', 'PlateScale_Reference', 'CalReference', 'Cal_Example_Adat_Set001', 'ColCheck', 'CalQcRatio_Example_Adat_Set001_170255', 'QcReference_170255', 'Cal_Example_Adat_Set002', 'CalQcRatio_Example_Adat_Set002_170255'], length=5284)
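Because Adat extends pandas.DataFrame, the analyte metadata stored in adat.columns can be queried with ordinary pandas MultiIndex methods. The sketch below builds a tiny stand-in frame with three of the column levels shown above (SeqId, SeqIdVersion, SomaId), using the first three analytes' identifiers and placeholder measurement values, so it runs without the example file. On a real Adat the same calls should apply to adat.columns, assuming the subclass does not override these pandas methods.

```python
import numpy as np
import pandas as pd

# Column metadata copied from the first three analytes in the output above;
# the measurement values are placeholders, not real data.
columns = pd.MultiIndex.from_tuples(
    [
        ("10000-28", "3", "SL019233"),
        ("10001-7", "3", "SL002564"),
        ("10003-15", "3", "SL019245"),
    ],
    names=["SeqId", "SeqIdVersion", "SomaId"],
)
frame = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=columns)

# List the available column-metadata fields (the MultiIndex level names).
print(list(frame.columns.names))  # ['SeqId', 'SeqIdVersion', 'SomaId']

# Pull one metadata field for every analyte.
seq_ids = frame.columns.get_level_values("SeqId")
print(list(seq_ids))  # ['10000-28', '10001-7', '10003-15']

# Subset analytes by a metadata value using a boolean mask over the columns.
mask = frame.columns.get_level_values("SomaId").isin(["SL002564", "SL019245"])
subset = frame.loc[:, mask]
print(subset.shape)  # (2, 2)
```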
from IPython.display import HTML
# Display the first five rows and columns of the adat
HTML(adat.iloc[:5,:5].to_html()) # Need to use HTML & to_html() here to display nicely for this README
# Output is left-right scrollable in both this readme and Jupyter notebooks
<table border="1" class="dataframe">
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>SeqId</th>
<th>10000-28</th>
<th>10001-7</th>
<th>10003-15</th>
<th>10006-25</th>
<th>10008-43</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>SeqIdVersion</th>
<th>3</th>
<th>3</th>
<th>3</th>
<th>3</th>
<th>3</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>SomaId</th>
<th>SL019233</th>
<th>SL002564</th>
<th>SL019245</th>
<th>SL019228</th>
<th>SL019234</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>TargetFullName</th>
<th>Beta-crystallin B2</th>
<th>RAF proto-oncogene serine/threonine-protein kinase</th>
<th>Zinc finger protein 41</th>
<th>ETS domain-containing protein Elk-1</th>
<th>Guanylyl cyclase-activating protein 1</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>Target</th>
<th>CRBB2</th>
<th>c-Raf</th>
<th>ZNF41</th>
<th>ELK1</th>
<th>GUC1A</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>UniProt</th>
<th>P43320</th>
<th>P04049</th>
<th>P51814</th>
<th>P19419</th>
<th>P43080</th>
</tr>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>