Operators
Accelerate workloads with standard or custom operators
Atgenomix Operators are SeqsLab platform plugins and open-source Cromwell extensions that streamline heterogeneous data integration and transformation, and scale complex workflow tasks, using distributed computing clusters and GPU accelerators.
Optimizable data localization and delocalization
Operators enable you to load each task’s individual input files (text, binary, CSV, Parquet, table, and so on) and store the task’s individual output files via standard or custom access methods that are agnostic to storage infrastructure. Localization and delocalization optimizations can be applied on the fly to each individual file or dataset to meet the I/O requirements of each task and to turn single-node processing into multi-node, multi-core computing.
Configurable data processing pipelines
Atgenomix empowers everyone to build auto-scaling automation by integrating Operators across the entire lifecycle of a workflow task. From input localization through command execution to output delocalization, the Atgenomix operator pipeline makes it straightforward to design data processing pipelines, chain them together in tasks, parallelize their execution, and boost workflow efficiency.
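The three lifecycle stages can be sketched in plain Python. This is only an illustration of the idea, not the SeqsLab API: the function names localize, execute, delocalize, and run_task are hypothetical, and the "command" is a stand-in that uppercases each line.

```python
from concurrent.futures import ThreadPoolExecutor


def localize(lines, partitions=2):
    """Input localization (hypothetical): split one input into contiguous chunks."""
    size = max(1, -(-len(lines) // partitions))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def execute(chunk):
    """Command execution (hypothetical): stand-in for the task command."""
    return [line.upper() for line in chunk]


def delocalize(chunks):
    """Output delocalization (hypothetical): merge per-chunk outputs in order."""
    return [line for chunk in chunks for line in chunk]


def run_task(lines, partitions=2):
    """Run the three stages, processing the chunks concurrently."""
    chunks = localize(lines, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        # pool.map returns results in the same order as the input chunks
        results = list(pool.map(execute, chunks))
    return delocalize(results)


output = run_task(["acgt", "ttga", "ccat", "gatc"])
print(output)
```

Because the command only ever sees one chunk, the same task definition works whether the input is split into one partition or many; only the localization and delocalization steps change.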
Working with data the common way
Atgenomix allows you to load and work with biomedical data by using DataFrames, the most common data structure in modern data processing and analytics. A DataFrame, much like a spreadsheet, organizes data into a two-dimensional table of rows and named columns, and it can span hundreds or thousands of CPU/GPU cores. DataFrames make workflows intuitive to write, and they improve processing speed and scalability in distributed computing environments.
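To make the "rows and named columns" idea concrete, here is a minimal sketch that uses only the Python standard library in place of a real DataFrame library. The sample data, the column names (sample, chrom, depth), and the mean_depth helper are made up for illustration; the actual DataFrame API exposed by the platform is not shown in this document.

```python
import csv
import io
from collections import defaultdict

# Made-up sample table; column names are illustrative only.
RAW = """sample,chrom,depth
S1,chr1,30
S1,chr2,40
S2,chr1,50
S2,chr2,20
"""

# Each row becomes a mapping from column name to value.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Partition the rows by one column. In a distributed setting, each
# partition could be handed to a different core or node.
partitions = defaultdict(list)
for row in rows:
    partitions[row["chrom"]].append(row)


def mean_depth(part):
    """Average the 'depth' column over one partition."""
    return sum(int(r["depth"]) for r in part) / len(part)


# Per-partition results are independent, so they can be computed in any order.
mean_by_chrom = {chrom: mean_depth(part) for chrom, part in partitions.items()}
print(mean_by_chrom)
```

The point of the sketch is that column names let the computation address data by meaning rather than by position, and that partitioning by a column yields independent units of work.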
For Users
Select the operator pipeline that works best for your task input and output files from a list of prebuilt pipelines. You can also define your own operator pipeline by chaining one or more operators together, following a simple process pattern.
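Chaining operators can be pictured as function composition. The sketch below is an assumption about the general pattern, not the platform's configuration syntax: chain, drop_blank, strip_whitespace, and dedupe are hypothetical names introduced here for illustration.

```python
from functools import reduce


def chain(*operators):
    """Compose operators left to right into one pipeline callable (hypothetical helper)."""
    def pipeline(data):
        return reduce(lambda value, op: op(value), operators, data)
    return pipeline


# Hypothetical operators: each takes a list of lines and returns a new list.
def drop_blank(lines):
    return [line for line in lines if line.strip()]


def strip_whitespace(lines):
    return [line.strip() for line in lines]


def dedupe(lines):
    # dict.fromkeys keeps the first occurrence and preserves order
    return list(dict.fromkeys(lines))


custom_pipeline = chain(drop_blank, strip_whitespace, dedupe)
cleaned = custom_pipeline([" chr1 ", "", "chr2", "chr1"])
print(cleaned)
```

Since every operator has the same shape (data in, data out), a pipeline of one operator and a pipeline of many are built the same way.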