Arion Drug Discovery Platform - User Manual

Version 1.1.0 | Updated April 2026


Table of Contents

  1. Getting Started
  2. Dashboard
  3. BBB Predictor
  4. BBB Optimizer
  5. Chemical Space Explorer
  6. Project Management
  7. Compound Browser
  8. API Reference
  9. Model Benchmarks
  10. Troubleshooting

1. Getting Started

Accessing the Platform

The platform runs as a web application accessible at:

  • Local: http://localhost:5051
  • Remote (via Tailscale Funnel): the URL provided by your administrator

Open any modern web browser (Chrome, Firefox, Edge) and navigate to the URL.

The top navigation bar provides access to all major features:

  • Dashboard: Overview of all projects
  • BBB Predictor: Single/batch BBB prediction
  • BBB Optimizer: AI-powered molecule optimization
  • Chemical Space: UMAP visualization
  • Benchmarks: Model performance comparisons
  • New Project: Create a new drug discovery project


2. Dashboard

The dashboard shows all active drug discovery projects with key metrics:

  • Project cards display target name, indication, number of compounds, and pipeline progress
  • Aggregate statistics show total compounds, leads identified, and active campaigns
  • Click any project card to view detailed project information


3. BBB Predictor

The BBB (Blood-Brain Barrier) Predictor is the platform's primary tool. It predicts whether a molecule can cross the blood-brain barrier to reach the central nervous system.

Single Molecule Prediction

  1. Enter a SMILES string in the input field (e.g., O=C1CN=C(c2ccccc2)c2cc(Cl)ccc2N1C for Diazepam)
  2. Click Predict BBB
  3. The results page shows:

Verdict

  • BBB+ (green): Molecule likely crosses the BBB (ensemble probability >= 65%)
  • Borderline (yellow): Uncertain prediction (ensemble probability from 40% to just under 65%)
  • BBB- (red): Molecule unlikely to cross the BBB (ensemble probability < 40%)
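
The thresholds above can be written as a small function. This is a minimal sketch, not the platform's own code: the function name bbb_verdict is hypothetical, and it assumes the probability is passed as a fraction between 0 and 1 rather than a percentage.

```python
def bbb_verdict(probability: float) -> str:
    """Map an ensemble probability (0.0-1.0) to a verdict label.

    Thresholds follow the manual: >= 65% is BBB+, 40% up to 65% is
    Borderline, below 40% is BBB-. The function name is hypothetical.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.65:
        return "BBB+"
    if probability >= 0.40:
        return "Borderline"
    return "BBB-"
```

For example, bbb_verdict(0.50) falls in the Borderline band, while bbb_verdict(0.65) sits exactly on the BBB+ boundary.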

Model Consensus

Four independent ML models vote on the prediction:

  • D-MPNN: ChemProp directed message-passing neural network
  • RF (Morgan): Random Forest on 2048-bit Morgan fingerprints
  • XGBoost: Gradient boosting on 15 molecular descriptors
  • Attentive GNN: Graph neural network with stereochemistry awareness (CANDID-CNS architecture)

The ensemble probability is the average of all available models. High confidence means all models agree and the probability spread is small.
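
As an illustration of the averaging described above, the sketch below takes per-model probabilities, skips any model that did not return a value, and reports the mean and the spread. The function name ensemble_summary is hypothetical, and measuring spread as maximum minus minimum is an assumption; the manual does not state how the platform quantifies spread or where it draws the high-confidence line.

```python
from statistics import mean
from typing import Dict, Optional


def ensemble_summary(model_probs: Dict[str, Optional[float]]) -> dict:
    """Average the probabilities of all models that produced a value.

    A value of None marks a model that was unavailable for this molecule.
    ASSUMPTION: spread is reported as max - min of the available values.
    """
    available = [p for p in model_probs.values() if p is not None]
    if not available:
        raise ValueError("no model produced a prediction")
    return {
        "probability": mean(available),
        "spread": max(available) - min(available),
        "n_models": len(available),
    }
```

With three available models at 0.8, 0.6 and 0.7, the mean is 0.7 and the spread is 0.2; a fourth model reported as None is simply left out of the average.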

Mechanistic Breakdown

  • Passive Permeability (0-100%): Rule-based estimate of how easily the molecule diffuses through the BBB
  • P-gp Efflux Risk (LOW/MODERATE/HIGH): Likelihood the molecule is pumped back out by P-glycoprotein
  • CNS MPO (0-6): Multi-parameter optimization score (Wager et al. 2016). Score >= 4 is favorable.

CNS Property Profile

A radar chart and table showing molecular properties relevant to CNS penetration:

Property    Ideal Range   Significance
MW          350-450 Da    Smaller molecules cross more easily
TPSA (2D)   < 90 Å²       Lower polar surface area = better permeation
3D PSA      < 80 Å²       Conformer-based PSA (more accurate than 2D)
HBD         <= 2          Fewer H-bond donors reduce P-gp efflux
HBA         <= 8          Fewer acceptors improve passive diffusion
SlogP       1.0-3.0       Moderate lipophilicity
Fsp3        >= 0.2        Some sp3 character improves solubility
RotBonds    <= 8          Less flexibility = more rigid, better permeation
QED         >= 0.5        Drug-likeness score
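
The ideal ranges in the table can be checked programmatically once the property values are known. The sketch below is an illustration only: the dictionary keys, the name outside_ideal, and the idea of passing precomputed values are assumptions, and it does not calculate any property itself.

```python
from typing import Callable, Dict, List

# One predicate per property, mirroring the "Ideal Range" column.
# Key names are hypothetical; the platform may label properties differently.
IDEAL_RANGES: Dict[str, Callable[[float], bool]] = {
    "MW": lambda v: 350 <= v <= 450,
    "TPSA": lambda v: v < 90,
    "PSA_3D": lambda v: v < 80,
    "HBD": lambda v: v <= 2,
    "HBA": lambda v: v <= 8,
    "SlogP": lambda v: 1.0 <= v <= 3.0,
    "Fsp3": lambda v: v >= 0.2,
    "RotBonds": lambda v: v <= 8,
    "QED": lambda v: v >= 0.5,
}


def outside_ideal(props: Dict[str, float]) -> List[str]:
    """Return the names of supplied properties that fall outside their ideal range."""
    return [
        name
        for name, is_ideal in IDEAL_RANGES.items()
        if name in props and not is_ideal(props[name])
    ]
```

Properties missing from the input are skipped rather than flagged, so a partial profile (for example, one without 3D PSA) can still be checked.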

Feature Attribution (SHAP)

When available, two explainability visualizations show why the model made its prediction:

  • XGBoost Descriptors: Bar chart showing which molecular properties most influenced the prediction. Green bars push toward BBB+, red bars push toward BBB-.
  • Atom BBB Attribution: Molecule image with atoms colored by their contribution. Green atoms help BBB+, red atoms hurt it.

Structural Alerts

Red/yellow flags for properties that strongly predict BBB impermeability:

  • TPSA > 120 (critical)
  • MW > 500 (critical)
  • HBD > 3 (critical)
  • Charged groups (critical)
  • SlogP < 0 or > 5 (warning)
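
The alert rules listed above reduce to a handful of comparisons. The following sketch shows one way to express them; structural_alerts and its argument names are hypothetical, and detecting charged groups is left to the caller because the manual does not describe how the platform does it.

```python
from typing import Dict, List, Tuple


def structural_alerts(props: Dict[str, float], charged: bool = False) -> List[Tuple[str, str]]:
    """Return (severity, rule) pairs for every alert rule that fires.

    `props` holds precomputed TPSA, MW, HBD and SlogP values (key names are
    hypothetical). `charged` is supplied by the caller (assumption).
    """
    alerts: List[Tuple[str, str]] = []
    if props.get("TPSA", 0) > 120:
        alerts.append(("critical", "TPSA > 120"))
    if props.get("MW", 0) > 500:
        alerts.append(("critical", "MW > 500"))
    if props.get("HBD", 0) > 3:
        alerts.append(("critical", "HBD > 3"))
    if charged:
        alerts.append(("critical", "Charged groups"))
    slogp = props.get("SlogP")
    if slogp is not None and (slogp < 0 or slogp > 5):
        alerts.append(("warning", "SlogP < 0 or > 5"))
    return alerts
```

An empty list means none of the listed rules fired; it does not by itself imply the molecule is BBB+.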

Additional Scores

  • BBBscore (Gupta 2019): Validated 0-5 composite score
  • Clark Classification: High/Moderate/Low penetration class
  • P-gp Substrate Probability: ML prediction of P-gp efflux risk
  • BCRP Efflux Probability: ML prediction of BCRP transporter efflux
  • Kp,uu (Brain Exposure): Predicted unbound brain-to-plasma concentration ratio

Nearest CNS Drug

Shows the most structurally similar approved CNS drug from a curated library of 20 reference compounds.

Applicability Domain

Shows how similar the query molecule is to the training data. "In domain" (Tanimoto similarity >= 0.3) means the prediction is more reliable.
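
The Troubleshooting section describes this measure as Tanimoto similarity. The sketch below shows how such a check could work on fingerprints represented as sets of "on" bit indices. It is an assumption that the platform compares against the single most similar training compound, and the names tanimoto and in_domain are hypothetical; the manual does not say which fingerprint is used.

```python
from typing import Iterable, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by all on-bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def in_domain(
    query: Set[int],
    training: Iterable[Set[int]],
    threshold: float = 0.3,
) -> Tuple[bool, float]:
    """Return (is_in_domain, best_similarity).

    ASSUMPTION: domain membership uses the nearest training compound.
    """
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold, best
```

Two fingerprints with bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a similarity of 0.5, which is above the 0.3 threshold.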

Batch Prediction

  1. Click the Upload & Predict section below the single-molecule input
  2. Upload a CSV file with a SMILES column (one molecule per row)
  3. Up to 500 molecules can be processed per batch
  4. Results show a sortable table with BBB probability, verdict, and key properties
  5. Download results as CSV using the download button
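
When a compound list is longer than the 500-molecule limit, it can be split into several upload files beforehand. The sketch below builds CSV text with a single SMILES column, one molecule per row. The exact header spelling the uploader expects is an assumption ("SMILES" is used here), and build_batches is a hypothetical helper name.

```python
import csv
import io
from typing import List

MAX_BATCH = 500  # per-batch limit stated in the manual


def build_batches(smiles_list: List[str], max_rows: int = MAX_BATCH) -> List[str]:
    """Split a list of SMILES into CSV strings of at most `max_rows` rows each.

    Each CSV has one header row ("SMILES", an assumed spelling) followed by
    one molecule per row.
    """
    batches: List[str] = []
    for start in range(0, len(smiles_list), max_rows):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["SMILES"])
        for smiles in smiles_list[start:start + max_rows]:
            writer.writerow([smiles])
        batches.append(buffer.getvalue())
    return batches
```

Each returned string can be written to its own file and uploaded as a separate batch.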

Quick Test Compounds

Click any of the example buttons (Diazepam, Acetaminophen, Ibuprofen, Testosterone) to quickly test the predictor.


4. BBB Optimizer

The optimizer uses REINVENT 4 (deep reinforcement learning) to generate novel molecules optimized for BBB penetration starting from a seed compound.

How to Use

  1. Navigate to /optimize/bbb
  2. Enter a seed SMILES (the starting molecule to optimize)
  3. Click Launch Optimization
  4. The system launches a REINVENT job that generates variants
  5. Monitor progress on the status page (auto-refreshes)
  6. Results show top-scoring generated molecules with BBB predictions

Important Notes

  • Optimization runs take 5-30 minutes depending on complexity
  • Only one optimization can run at a time
  • Generated molecules are novel and may not be synthesizable
  • Use the BBB predictor to validate individual results

5. Chemical Space Explorer

Navigate to /chemspace/bbb to view the chemical space visualization.

  • UMAP plot: 2D projection of all training compounds colored by BBB class
  • Interactive: Hover over points to see SMILES and BBB probability
  • Query molecules are overlaid as larger markers for comparison

6. Project Management

Creating a New Project

  1. Navigate to /new-project
  2. Fill in the target specification:
     • Target gene/protein name
     • ChEMBL target ID
     • Indication
     • PDB code for docking
     • Off-target selectivity requirements
  3. Click Create Project
  4. The pipeline automatically:
     • Fetches known ligands from ChEMBL
     • Trains QSAR models
     • Generates seed compounds
     • Launches initial REINVENT campaign

Viewing Project Progress

  • Each project card on the dashboard shows pipeline status
  • Click a project to see campaign history, validation results, and lead compounds

7. Compound Browser

For each project, the compound browser at /project/<name>/compounds provides:

  • Sortable table of all generated and validated compounds
  • Filter by: traffic light status (green/yellow/red), BBB verdict, property ranges
  • Sort by: any column including docking score, BBB probability, CNS MPO
  • Export: Download filtered results as CSV or SDF
  • Compound detail: Click any row to see the full profile, including:
     • 2D/3D structure views
     • All BBB scores and properties
     • Docking pose visualization (if available)
     • Retrosynthesis analysis (if computed)

Comparison Mode

Select 2-4 compounds and click "Compare" to see a radar chart overlay of their properties.

Scaffold Analysis

The scaffolds view groups compounds by Murcko scaffold for SAR analysis.


8. API Reference

The platform provides a REST API for programmatic access.

Health Check

GET /api/v1/health

Returns model versions and status.

BBB Prediction

POST /api/v1/predict
Content-Type: application/json

{
  "smiles": ["CCO", "c1ccccc1"]
}

Returns full BBB prediction results for up to 100 SMILES.
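
A request like the one above can be sent from Python using only the standard library. This is a minimal sketch under stated assumptions: the base URL is the local address from Getting Started, no authentication is required, and the response body is JSON. The helper names build_predict_request and predict are hypothetical, and the response schema is not documented here, so the result is returned unparsed beyond JSON decoding.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:5051"  # local address; replace for remote access
MAX_SMILES = 100  # per-request limit stated in the manual


def build_predict_request(smiles: List[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/predict."""
    if len(smiles) > MAX_SMILES:
        raise ValueError(f"at most {MAX_SMILES} SMILES per request")
    body = json.dumps({"smiles": list(smiles)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(smiles: List[str], base_url: str = BASE_URL, timeout: float = 60.0):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(smiles, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the platform running, predict(["CCO", "c1ccccc1"]) sends the same payload as the example above. Longer lists need to be split into chunks of 100 or fewer before calling it.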

BBB Optimization

POST /api/v1/optimize
Content-Type: application/json

{
  "smiles": "c1ccccc1",
  "n_iterations": 100
}

Launches an optimization job. Returns a job ID.

Job Status

GET /api/v1/status/<job_id>

Returns optimization progress and results.
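
Because optimization runs take several minutes, a client typically polls the status endpoint until the job finishes. The sketch below illustrates that loop. The response schema is not documented in this manual, so the "status" field and the "completed"/"failed" values are assumptions to be checked against /api/docs; the function names are hypothetical as well.

```python
import json
import time
import urllib.request
from typing import Callable, Sequence

BASE_URL = "http://localhost:5051"  # local address; replace for remote access


def fetch_status(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /api/v1/status/<job_id> and decode the JSON body."""
    url = f"{base_url}/api/v1/status/{job_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_status,
    interval: float = 30.0,
    max_wait: float = 3600.0,
    done_states: Sequence[str] = ("completed", "failed"),
) -> dict:
    """Poll until the job reaches a terminal state or `max_wait` seconds pass.

    ASSUMPTION: the payload has a "status" field whose terminal values are
    listed in `done_states`. Adjust both to the real schema.
    """
    deadline = time.monotonic() + max_wait
    while True:
        payload = fetch(job_id)
        if payload.get("status") in done_states:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {max_wait} s")
        time.sleep(interval)
```

Passing the fetch function as a parameter keeps the loop testable without a running server; in normal use the default fetch_status is enough.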

Full API Documentation

Visit /api/docs in your browser for interactive API documentation.


9. Model Benchmarks

Navigate to /benchmarks to see how the platform's models compare against published methods:

Method                      AUC-ROC   Source
Arion 4-Model Ensemble      0.934     Validated on Spielvogel 2025 dataset
Arion Attentive GNN         0.876     Scaffold-balanced split on B3DB
CANDID-CNS (Collins 2025)   0.95      Published (unfiltered B3DB)
Spielvogel RF (2025)        0.88      Published (154 radiotracers)
BBBscore (Gupta 2019)       0.83      Validated on same data
CNS MPO (published)         0.53      Spielvogel evaluation

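In the table above, the platform's 4-model ensemble scores higher than every listed method except CANDID-CNS, which reports 0.95 on unfiltered B3DB. The methods were evaluated on different datasets and splits, so the figures are indicative rather than a strict head-to-head comparison. The ensemble works by combining multiple complementary model architectures.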


10. Troubleshooting

Platform won't start

  • Ensure the conda environment is activated: conda activate reinvent_env
  • Check port 5051 is not in use: netstat -aon | findstr 5051
  • Start with: cd D:\Arion\Platform && python run_web.py
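
The netstat command above is Windows-specific. As a cross-platform alternative, the sketch below uses Python's standard socket module to test whether something is already listening on the port. The function name port_in_use is hypothetical, and it only checks the loopback address by default.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    connect_ex returns 0 on success instead of raising, so a zero result
    means some process is already listening on that port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

If port_in_use(5051) returns True before the platform has been started, another process is occupying the port and needs to be stopped first.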

Predictions are slow

  • First prediction loads all models (~10-15 seconds)
  • Subsequent predictions are fast (~2-3 seconds)
  • 3D PSA adds ~1-2 seconds (conformer generation)
  • Batch predictions skip 3D PSA for speed

"Out of domain" warning

The query molecule is structurally dissimilar to the training data (Tanimoto < 0.3). The prediction may be less reliable. Consider:

  • Checking if the molecule is valid/reasonable
  • Using additional scoring methods (docking, experimental)

SHAP plots not showing

  • SHAP is only computed for single-molecule predictions (not batch)
  • If the SHAP section is missing, models may not have loaded correctly

Optimization fails to start

  • Ensure REINVENT 4 is installed and accessible
  • Check that model files exist at expected paths
  • Only one optimization can run at a time

Remote access issues

  • Ensure Tailscale is running on both machines
  • Funnel must be active: tailscale funnel --bg 5051
  • Check status: tailscale funnel status