Software Unit Operations

Software-based unit operations for biofoundry automation

title: Biofoundry Unit-operations
description: Collection of standardized Unit-operations
version: 0.2
language: English
suggestions:
  - https://github.com/sblabkribb/biofoundry_workflows/issues
date: 2025-02-22
categories:
  - unit-operations
  - biofoundry
  - lab automation
  - software
status: draft

1 Unit-operations

1.1 Software

  • US010: DNA Oligomer Pool Design
    • Software: Dsembler, DNAWorks
    • Description: Designing a pool of overlapping DNA oligomers that are synthesized and combined in a single tube. The software optimizes the selection and combination of oligomers for efficient DNA synthesis and assembly.
  • US020: Primer Design
    • Software: SnapGene, Primer3, OligoAnalyzer
    • Description: Designing primers with respect to melting temperature and secondary structure. The primers are used for PCR amplification, mutagenesis, and other molecular biology applications, and are designed for specific and efficient amplification.
  • US030: Vector Design
    • Software: Vector NTI, SnapGene, Geneious
    • Description: Designing vector maps that combine inserts with a plasmid backbone. This may include primer design and DNA assembly planning, facilitating the construction of functional genetic vectors for cloning and expression.
  • US040: Sequence Optimization
    • Software: GeneOptimizer, JCat
    • Description: Modifying the codon usage of a DNA sequence to maximize protein expression in a specific host. By adapting sequences to host-specific codon preferences, the software aims to improve translation efficiency and protein yield.
  • US050: Synthesis Screening
    • Software: UltraSEQ, Common Mechanism, FAST-NA
    • Description: Screening DNA sequences for potentially hazardous content before synthesis. These tools support biosafety by flagging sequences that may pose risks in synthetic biology applications.
  • US060: Structure-based Sequence Generation
    • Software: ProteinMPNN
    • Description: Generating amino acid sequences that are predicted to fold into a given protein backbone structure, using AI models. This software aids in designing novel proteins with desired structural and functional properties.
  • US070: Protein Structure Prediction
    • Software: AlphaFold, RoseTTAFold, I-TASSER
    • Description: Predicting three-dimensional protein structures from amino acid sequences using AI models and other computational methods. The predicted structures provide insights that support protein engineering and drug discovery efforts.
  • US080: Protein Structure Generation
    • Software: RFdiffusion
    • Description: Generating novel protein backbone structures using generative AI (diffusion) models. This software facilitates the design of new proteins and enzymes with specific catalytic or binding functions.
  • US090: Retrosynthetic Pathway Design
    • Software: RetroPath2.0, ECREACT, BioNavi-NP
    • Description: Predicting biosynthetic pathways toward a target compound, including checking reaction feasibility and discovering novel pathways. These tools support metabolic engineering by identifying efficient biosynthetic routes to target chemicals.
  • US100: Enzyme Identification
    • Software: DeepEC, Selenzyme, SoluProt
    • Description: Searching databases for enzymes or predicting enzyme properties, such as reactivity or solubility, in order to select suitable enzymes for a pathway. This software aids in enzyme discovery and characterization for biocatalysis.
  • US110: Sequence Alignment
    • Software: BLAST, MUSCLE
    • Description: Exploring and comparing sequence similarity using alignment algorithms. These tools are essential for identifying homologous sequences and understanding evolutionary relationships.
  • US120: Sequence Trimming and Filtering
    • Software: Trimmomatic, Cutadapt, Porechop, Filtlong
    • Description: Preprocessing long/short-read data by trimming adapters and low-quality bases and by filtering out low-quality or short reads. This step is crucial for ensuring data quality in sequencing projects.
  • US130: Sequence Mapping and Alignment
    • Software: BWA, Bowtie2, Minimap2, GraphMap
    • Description: Mapping long/short-read sequences to reference sequences. These tools are used for genome assembly, variant calling, and transcriptomics.
  • US140: Sequence Assembly
    • Software: Velvet, SOAPdenovo, Canu, Flye, QUAST (assembly evaluation)
    • Description: Assembling long/short-read sequences into complete genes, pathways, and chromosomes, and evaluating the quality of the resulting assemblies. This software supports the reconstruction of genomes and metagenomes.
  • US145: Metagenomic Assembly
    • Software: MetaSPAdes, MEGAHIT
    • Description: Assembling metagenomic data to reconstruct genomes from complex microbial communities. This software supports environmental and clinical metagenomics studies.
  • US150: Sequence Quality Control
    • Software: FastQC, MultiQC, NanoPlot, pycoQC
    • Description: Performing quality control (QC) on long/short-read FASTQ and FAST5 files. QC identifies problems in sequencing data, such as low base quality, adapter contamination, or unexpected read lengths, before downstream analysis.
  • US160: Demultiplexing
    • Software: bcl-convert, Guppy
    • Description: Separating pooled NGS reads into per-sample sets based on native or user-defined barcodes. This process is critical for handling multiplexed sequencing data.
  • US170: Variant Calling
    • Software: GATK, bcftools, Sniffles, Longshot
    • Description: Detecting variants based on read mapping. These tools are used for identifying SNPs, indels, and structural variants in genomic data.
  • US180: RNA-Seq Analysis
    • Software: DESeq2/edgeR (R), Galaxy, HISAT2
    • Description: Processing and analyzing transcriptomic data to quantify gene expression levels, identify splice variants, and detect differential gene expression. This software supports functional genomics studies.
  • US185: Gene Set Enrichment Analysis
    • Software: GSEA, DAVID
    • Description: Analyzing gene expression data to identify enriched biological pathways. This software supports functional genomics and systems biology research.
  • US190: Proteomics Data Analysis
    • Software: MaxQuant, Perseus, Proteome Discoverer
    • Description: Processing and interpreting data from mass spectrometry to identify and quantify proteins, understand modifications, and assess protein interactions. These tools are essential for proteomics research.
  • US200: Phylogenetic Analysis
    • Software: MEGA, PhyML
    • Description: Determining the evolutionary relationships among species or sequences by constructing phylogenetic trees based on sequence similarities and differences. This software supports evolutionary biology studies.
  • US210: Metabolic Flux Analysis
    • Software: COBRA Toolbox, FBA (flux balance analysis), CellNetAnalyzer
    • Description: Modeling and analyzing metabolic pathways, providing insights into cellular metabolism and pathway optimization. These tools are used for metabolic engineering and systems biology.
  • US220: Deep Learning Data Preparation
    • Software: PyTorch DataLoader (torch.utils.data), Hugging Face Datasets
    • Description: Preparing and batching datasets for AI model training and evaluation. This software supports machine learning workflows in bioinformatics.
  • US230: Sequence Embedding
    • Software: ProtT5, ProtBERT, ESM
    • Description: Converting biological sequences into numerical vector representations (embeddings), typically with pretrained protein language models, for use in machine learning applications.
  • US240: Deep Learning Model Training
    • Software: CNN, LSTM, Transformer, Bayesian optimization
    • Description: Training AI models on training data. This software supports the development of AI models for various bioinformatics tasks.
  • US250: Model Evaluation
    • Software: scikit-learn, TensorBoard
    • Description: Computing model evaluation metrics (accuracy, precision, recall, F1 score, etc.) to assess the performance of machine learning models.
  • US260: Hyperparameter Tuning
    • Software: Optuna, HyperOpt
    • Description: Efficiently exploring the hyperparameter search space using techniques such as Bayesian optimization. This software supports the optimization of machine learning models.
  • US270: Model Deployment
    • Software: TorchScript, FastAPI
    • Description: Deploying trained models as services, for example by exporting them with TorchScript and serving predictions through a FastAPI web API. This software supports the integration of AI models into production environments.
  • US280: Monitoring and Reporting
    • Software: Prometheus, Grafana
    • Description: Monitoring and visualizing performance and resource usage of AI models. This software supports the maintenance and optimization of deployed models.
  • US290: Phenotype Data Preprocessing
    • Software: R, Python
    • Description: Preprocessing measured and collected phenotype data. It involves cleaning, organizing, and transforming raw phenotype datasets for downstream analysis.
  • US300: XCMS Analysis
    • Software: XCMS (R)
    • Description: Analyzing and visualizing chromatographically separated and single-spectra mass spectral data. This software supports metabolomics research.
  • US310: Flow Cytometry Analysis
    • Software: flowCore, flowWorkspace (R), FlowJo
    • Description: Analyzing and visualizing flow cytometry data. This software supports immunology and cell biology research.
  • US320: DNA Assembly Simulation
    • Software: pydna (Python)
    • Description: Simulating DNA assembly methods such as Golden Gate and Gibson assembly in silico, so that designs can be checked before wet-lab work and the assembly success rate increased. This software supports synthetic biology and genetic engineering.
  • US325: Gene Editing Simulation
    • Software: CRISPResso, CHOPCHOP
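    • Description: Designing CRISPR guide RNAs and predicting or analyzing gene editing outcomes. CHOPCHOP helps select guide RNAs with few predicted off-target sites, and CRISPResso quantifies editing outcomes from sequencing reads of edited samples, supporting precise genome editing.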
  • US330: Well Plate Mapping
    • Software: Well plate mapping software
    • Description: Mapping wells of source plates to wells of destination plates, for example to generate transfer lists for liquid handlers. This tool supports high-throughput screening and assay development.
  • US340: Computation
    • Software: Computer
    • Description: A general-purpose computation step covering data collection, preprocessing, and analysis. This unit operation supports a wide range of bioinformatics and computational biology applications.
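For US020 (Primer Design), melting temperature and GC content can be estimated roughly as below. This minimal sketch assumes the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C, which is only a crude approximation intended for short oligonucleotides; Primer3 and OligoAnalyzer use nearest-neighbor thermodynamic models with salt corrections instead, and secondary structure is not checked here. The function names and the thresholds in `check_primer` are hypothetical and illustrative.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(seq: str, tm_range=(52, 65), gc_range=(0.40, 0.60)) -> bool:
    """Hypothetical pass/fail filter; the default thresholds are illustrative only."""
    tm = wallace_tm(seq)
    gc = gc_content(seq)
    return tm_range[0] <= tm <= tm_range[1] and gc_range[0] <= gc <= gc_range[1]


primer = "ATGCATGCATGCATGCATGC"  # arbitrary 20-mer; structure is not evaluated
print(wallace_tm(primer), gc_content(primer), check_primer(primer))  # 60 0.5 True
```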
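For US040 (Sequence Optimization), the simplest form of codon optimization back-translates a protein by always choosing the host's most frequent codon for each amino acid. The sketch below assumes that approach; the frequencies in `CODON_USAGE` are made-up placeholders rather than real host data, and only a few amino acids are included. GeneOptimizer and JCat use more elaborate criteria, and practical designs also consider factors such as GC content, repeats, and mRNA structure.

```python
# Placeholder codon frequencies for a hypothetical host (NOT real usage data).
# Only a few amino acids are listed to keep the example short.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13, "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.60, "TGA": 0.30, "TAG": 0.10},
}


def preferred_codons(usage):
    """Map each amino acid to its most frequent codon in the usage table."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


def optimize_codons(protein, usage=CODON_USAGE):
    """Back-translate a protein sequence using the preferred codon for every residue."""
    best = preferred_codons(usage)
    codons = []
    for residue in protein.upper():
        if residue not in best:
            raise ValueError(f"no codon usage data for residue {residue!r}")
        codons.append(best[residue])
    return "".join(codons)


print(optimize_codons("MKL*"))  # ATGAAACTGTAA
```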
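For US110 (Sequence Alignment), the dynamic-programming idea behind pairwise alignment can be sketched with the Needleman-Wunsch global alignment score. This textbook algorithm is shown only for illustration: BLAST uses a heuristic seed-and-extend local search and MUSCLE performs multiple sequence alignment, so neither works exactly like this. The scoring values (match +1, mismatch -1, linear gap -1) are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # 2
```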
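For US120 (Sequence Trimming and Filtering), a minimal sketch of quality-based preprocessing removes low-quality bases from the 3' end of each read and then discards reads that become too short. It assumes Phred+33 quality encoding; the thresholds and function names are hypothetical. The trimming step resembles Trimmomatic's TRAILING step in spirit, but it is not a reimplementation of Trimmomatic or Cutadapt (whose 3' quality trimming uses a different, BWA-style algorithm), and it does not remove adapters.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_trailing(seq, quality, min_q=20):
    """Drop bases from the 3' end while their Phred score is below min_q."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def trim_and_filter(reads, min_q=20, min_len=30):
    """Trim each (name, seq, qual) read and keep those at least min_len long."""
    kept = []
    for name, seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_trailing(seq, qual, min_q)
        if len(trimmed_seq) >= min_len:
            kept.append((name, trimmed_seq, trimmed_qual))
    return kept


reads = [("r1", "ACGTAC", "IIII##"), ("r2", "ACGTACGT", "IIIIIIII")]
print(trim_and_filter(reads, min_len=5))  # [('r2', 'ACGTACGT', 'IIIIIIII')]
```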
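For US150 (Sequence Quality Control), one standard QC summary, similar to FastQC's per-base quality plot, is the mean Phred quality at each read position. The sketch below parses simple four-line FASTQ records (no line-wrapped sequences) and assumes Phred+33 encoding; FAST5 files are HDF5-based and would need a separate reader, which is not covered. Function names are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from simple four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def per_position_mean_quality(records, offset=33):
    """Mean Phred score at each position across reads of possibly different lengths."""
    totals, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


example = "@r1\nACG\n+\nII#\n@r2\nAC\n+\n5I\n"
print(per_position_mean_quality(read_fastq(StringIO(example))))  # [30.0, 40.0, 2.0]
```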
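For US160 (Demultiplexing), the core assignment step can be sketched for inline barcodes at the start of each read: a read goes to the sample whose barcode matches its prefix within a mismatch tolerance, and unmatched or ambiguous reads are set aside. This is a hypothetical simplification; bcl-convert demultiplexes using index reads and a sample sheet, and Guppy detects barcodes in nanopore reads with its own logic, so neither works exactly like this. Barcode sequences and sample names are placeholders.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=1):
    """Return the single sample whose barcode matches the read prefix within
    max_mismatches, or None if no barcode or more than one barcode matches."""
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(read) >= len(barcode)
        and hamming(read[: len(barcode)], barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Bin reads by sample, trimming the barcode; unassigned reads stay whole."""
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        sample = assign_sample(read, barcodes, max_mismatches)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[len(barcodes[sample]):])
    return bins


barcodes = {"sample1": "ACGT", "sample2": "TGCA"}  # placeholder barcodes
print(demultiplex(["ACGTAAAA", "ACGAAAAA", "TGCATTTT", "GGGGCCCC"], barcodes))
# {'sample1': ['AAAA', 'AAAA'], 'sample2': ['TTTT'], 'unassigned': ['GGGGCCCC']}
```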
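For US185 (Gene Set Enrichment Analysis), a closely related and simpler technique, over-representation analysis, tests whether a list of selected genes (for example, differentially expressed genes) overlaps a pathway gene set more than expected by chance, using the hypergeometric distribution. This is not the GSEA algorithm, which uses a running-sum statistic over a ranked gene list, and DAVID uses a modified Fisher's exact test (the EASE score). Function names and the example gene lists are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K gene-set members, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def over_representation(selected, gene_set, background):
    """Return (overlap, one-sided p-value) for selected genes hitting a gene set."""
    background = set(background)
    selected = set(selected) & background  # restrict everything to the background
    gene_set = set(gene_set) & background
    overlap = len(selected & gene_set)
    p_value = hypergeom_sf(overlap, len(background), len(gene_set), len(selected))
    return overlap, p_value


background = [f"gene{i}" for i in range(1, 21)]  # hypothetical 20-gene background
pathway = background[:5]                          # hypothetical 5-gene pathway
hits = ["gene1", "gene2", "gene3", "gene10"]      # hypothetical selected genes
print(over_representation(hits, pathway, background))  # overlap 3, p ≈ 0.032
```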
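For US230 (Sequence Embedding), pretrained protein language models such as ProtT5, ProtBERT, and ESM produce learned per-residue vectors that are often averaged (mean-pooled) into one fixed-length vector per sequence. As a dependency-free illustration of that shape of computation, the sketch below substitutes one-hot encoding for the learned embedding; it is a simple baseline, not what those models compute. Names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Per-residue one-hot matrix with shape (len(seq), 20), as nested lists."""
    matrix = []
    for aa in seq.upper():
        if aa not in INDEX:
            raise ValueError(f"unsupported residue {aa!r}")
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0
        matrix.append(row)
    return matrix


def mean_pool(matrix):
    """Average per-residue vectors into one fixed-length vector per sequence."""
    if not matrix:
        raise ValueError("cannot pool an empty sequence")
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]


vector = mean_pool(one_hot("MKKL"))
print(len(vector), vector[INDEX["K"]])  # 20 0.5
```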
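For US250 (Model Evaluation), the binary-classification metrics listed above follow directly from confusion-matrix counts. The sketch below computes them in plain Python using the standard definitions (as in scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for binary labels), returning 0.0 where a denominator is zero, which is the value those functions fall back to by default, with a warning. The function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (0.0 on zero division)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
# accuracy 0.6; precision, recall and F1 are all 2/3
```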
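For US320 (DNA Assembly Simulation), Gibson-style assembly can be approximated by joining fragments whose ends share homologous sequence. This toy sketch assumes linear assembly in the given fragment order, exact overlaps of at least `min_overlap` bases (the default of 15 is an assumption), and no enzymatic details; it is not pydna's API, which offers far more complete simulation, and it does not model Golden Gate (Type IIS restriction-ligation) assembly. Function names are hypothetical.

```python
def terminal_overlap(left, right, min_overlap=15):
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    if it is at least min_overlap bases; otherwise 0."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    left, right = left.upper(), right.upper()
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def simulate_overlap_assembly(fragments, min_overlap=15):
    """Join linear fragments in order, keeping each shared terminal overlap once."""
    if not fragments:
        raise ValueError("no fragments given")
    product = fragments[0].upper()
    for index, fragment in enumerate(fragments[1:], start=1):
        overlap = terminal_overlap(product, fragment, min_overlap)
        if overlap == 0:
            raise ValueError(f"fragment {index} has no sufficient overlap with the product so far")
        product += fragment.upper()[overlap:]
    return product


print(simulate_overlap_assembly(["AAAACCCCGG", "CCGGTTTT"], min_overlap=4))  # AAAACCCCGGTTTT
```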
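For US330 (Well Plate Mapping), one common mapping compresses four 96-well source plates into a single 384-well destination plate by interleaving: source well (row r, column c) lands at destination (2r + dr, 2c + dc), where the offsets (dr, dc) depend on the quadrant. The quadrant numbering below is an assumed convention, since instruments and software may order quadrants differently, and the function names are hypothetical. The resulting (source, destination) pairs resemble a simple transfer list.

```python
import string

ROWS = string.ascii_uppercase  # row letters A, B, C, ...

# Assumed quadrant convention: (row offset, column offset) in the 384-well plate.
QUADRANT_OFFSETS = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}


def parse_well(well):
    """'B7' -> (1, 6) as zero-based (row, column)."""
    return ROWS.index(well[0].upper()), int(well[1:]) - 1


def format_well(row, col):
    """(1, 6) -> 'B7'."""
    return f"{ROWS[row]}{col + 1}"


def map_96_to_384(well, quadrant):
    """Destination well in a 384-well plate for a 96-well source well."""
    row, col = parse_well(well)
    if not (0 <= row < 8 and 0 <= col < 12):
        raise ValueError(f"{well!r} is not a 96-well position")
    row_offset, col_offset = QUADRANT_OFFSETS[quadrant]
    return format_well(2 * row + row_offset, 2 * col + col_offset)


def transfer_list(source_wells, quadrant):
    """(source, destination) pairs for one source plate."""
    return [(well, map_96_to_384(well, quadrant)) for well in source_wells]


print(transfer_list(["A1", "A2", "B1", "H12"], quadrant=4))
# [('A1', 'B2'), ('A2', 'B4'), ('B1', 'D2'), ('H12', 'P24')]
```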

1.2 Software Categories

Software Unit Operation Categories

Category              Count   Examples
Design Tools          8       DNA design, protein structure prediction
Sequence Analysis     10      Assembly, alignment, variant calling
Data Processing       6       Quality control, preprocessing
AI/ML                 8       Deep learning, model training
Workflow Management   3       Plate mapping, computation