Arion Drug Discovery Platform - User Manual

Version 1.1.0 | Updated April 2026

1. Getting Started
2. Dashboard
3. BBB Predictor
4. BBB Optimizer
5. Chemical Space Explorer
6. Project Management
7. Compound Browser
8. API Reference
9. Model Benchmarks
10. Troubleshooting

1. Getting Started

Accessing the Platform

The platform runs as a web application, accessible at:
- Local: http://localhost:5051
- Remote (via Tailscale Funnel): the URL provided by your administrator

Open any modern web browser (Chrome, Firefox, Edge) and navigate to the URL.

The top navigation bar provides access to all major features:
- Dashboard: Overview of all projects
- BBB Predictor: Single/batch BBB prediction
- BBB Optimizer: AI-powered molecule optimization
- Chemical Space: UMAP visualization
- Benchmarks: Model performance comparisons
- New Project: Create a new drug discovery project

2. Dashboard

The dashboard shows all active drug discovery projects with key metrics:
- Project cards display target name, indication, number of compounds, and pipeline progress
- Aggregate statistics show total compounds, leads identified, and active campaigns
- Click any project card to view detailed project information

3. BBB Predictor

The BBB (Blood-Brain Barrier) Predictor is the platform's primary tool. It predicts whether a molecule can cross the blood-brain barrier to reach the central nervous system.

Single Molecule Prediction

Enter a SMILES string in the input field (e.g., O=C1CN=C(c2ccccc2)c2cc(Cl)ccc2N1C for Diazepam)
Click Predict BBB
The results page shows:

Verdict

BBB+ (green): Molecule likely crosses the BBB (ensemble probability >= 65%)
Borderline (yellow): Uncertain prediction (ensemble probability 40% to < 65%)
BBB- (red): Molecule unlikely to cross the BBB (ensemble probability < 40%)

Model Consensus

Four independent ML models vote on the prediction:
- D-MPNN: ChemProp directed message-passing neural network
- RF (Morgan): Random Forest on 2048-bit Morgan fingerprints
- XGBoost: Gradient boosting on 15 molecular descriptors
- Attentive GNN: Graph neural network with stereochemistry awareness (CANDID-CNS architecture)

The ensemble probability is the average of all available models. High confidence means all models agree and the probability spread is small.
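The averaging and the verdict thresholds described above can be sketched in a few lines. This is a minimal illustration, not the platform's own code: the function names and the dictionary layout are assumptions, and the manual does not state what spread counts as "small", so the spread is only reported here.

```python
from statistics import mean
from typing import Dict, Optional


def available_probs(model_probs: Dict[str, Optional[float]]) -> list:
    """Keep only the models that returned a probability (None = unavailable)."""
    return [p for p in model_probs.values() if p is not None]


def ensemble_probability(model_probs: Dict[str, Optional[float]]) -> float:
    """Average of all available model probabilities, on a 0-1 scale."""
    probs = available_probs(model_probs)
    if not probs:
        raise ValueError("no model produced a probability")
    return mean(probs)


def probability_spread(model_probs: Dict[str, Optional[float]]) -> float:
    """Gap between the most and least optimistic available model."""
    probs = available_probs(model_probs)
    return max(probs) - min(probs)


def verdict(probability: float) -> str:
    """Map an ensemble probability to the verdict bands listed under Verdict."""
    if probability >= 0.65:
        return "BBB+"
    if probability >= 0.40:
        return "Borderline"
    return "BBB-"


# Illustrative numbers only; the Attentive GNN is treated as unavailable here.
example = {"D-MPNN": 0.9, "RF (Morgan)": 0.8, "XGBoost": 0.7, "Attentive GNN": None}
print(verdict(ensemble_probability(example)), round(probability_spread(example), 2))
```

Because unavailable models are skipped rather than counted as zero, a missing model does not drag the ensemble probability down.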

Mechanistic Breakdown

Passive Permeability (0-100%): Rule-based estimate of how easily the molecule diffuses through the BBB
P-gp Efflux Risk (LOW/MODERATE/HIGH): Likelihood the molecule is pumped back out by P-glycoprotein
CNS MPO (0-6): Multi-parameter optimization score (Wager et al. 2016). Score >= 4 is favorable.

CNS Property Profile

A radar chart and table showing molecular properties relevant to CNS penetration:

Property	Ideal Range	Significance
MW	350-450 Da	Smaller molecules cross more easily
TPSA (2D)	< 90 Å²	Lower polar surface area = better permeation
3D PSA	< 80 Å²	Conformer-based PSA (more accurate than 2D)
HBD	<= 2	Fewer H-bond donors reduce P-gp efflux
HBA	<= 8	Fewer acceptors improve passive diffusion
SlogP	1.0-3.0	Moderate lipophilicity
Fsp3	>= 0.2	Some sp3 character improves solubility
RotBonds	<= 8	Fewer rotatable bonds = more rigid molecule, better permeation
QED	>= 0.5	Drug-likeness score
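To screen exported results against the ideal ranges in the table above, a small lookup is enough. The sketch below is a hypothetical helper (the names `IDEAL_RANGES` and `outside_ideal_range` are not part of the platform), and the property values in the example are illustrative rather than platform output.

```python
from typing import Callable, Dict, List

# Ideal ranges copied from the CNS Property Profile table in this manual.
IDEAL_RANGES: Dict[str, Callable[[float], bool]] = {
    "MW": lambda v: 350 <= v <= 450,
    "TPSA (2D)": lambda v: v < 90,
    "3D PSA": lambda v: v < 80,
    "HBD": lambda v: v <= 2,
    "HBA": lambda v: v <= 8,
    "SlogP": lambda v: 1.0 <= v <= 3.0,
    "Fsp3": lambda v: v >= 0.2,
    "RotBonds": lambda v: v <= 8,
    "QED": lambda v: v >= 0.5,
}


def outside_ideal_range(properties: Dict[str, float]) -> List[str]:
    """Return the names of supplied properties that fall outside the ideal range.

    Properties missing from the input are skipped rather than flagged.
    """
    return [
        name
        for name, is_ideal in IDEAL_RANGES.items()
        if name in properties and not is_ideal(properties[name])
    ]


# Illustrative values for a small, lipophilic molecule.
print(outside_ideal_range({"MW": 284.7, "TPSA (2D)": 32.7, "HBD": 0, "SlogP": 3.4}))
```

A property outside its ideal range is not a failure on its own; the ranges describe the profile typical of CNS drugs, and the verdict still comes from the model ensemble.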

Feature Attribution (SHAP)

When available, two explainability visualizations show why the model made its prediction:
- XGBoost Descriptors: Bar chart showing which molecular properties most influenced the prediction. Green bars push toward BBB+, red bars push toward BBB-.
- Atom BBB Attribution: Molecule image with atoms colored by their contribution. Green atoms help BBB+, red atoms hurt it.

Structural Alerts

Red/yellow flags for properties that strongly predict BBB impermeability:
- TPSA > 120 (critical)
- MW > 500 (critical)
- HBD > 3 (critical)
- Charged groups (critical)
- SlogP < 0 or > 5 (warning)
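The alert rules listed above translate directly into threshold checks. The following is a sketch under assumptions: the function name and signature are hypothetical, and detecting charged groups is left to the caller as a boolean because the manual does not describe how the platform identifies them.

```python
from typing import List, Tuple


def structural_alerts(
    tpsa: float, mw: float, hbd: int, slogp: float, has_charged_group: bool
) -> List[Tuple[str, str]]:
    """Return (severity, rule) pairs for every alert rule the molecule trips."""
    alerts: List[Tuple[str, str]] = []
    if tpsa > 120:
        alerts.append(("critical", "TPSA > 120"))
    if mw > 500:
        alerts.append(("critical", "MW > 500"))
    if hbd > 3:
        alerts.append(("critical", "HBD > 3"))
    if has_charged_group:
        alerts.append(("critical", "Charged groups"))
    if slogp < 0 or slogp > 5:
        alerts.append(("warning", "SlogP < 0 or > 5"))
    return alerts


# Illustrative values: a large, polar molecule with one charged group.
for severity, rule in structural_alerts(135.0, 512.0, 2, 1.8, True):
    print(f"{severity}: {rule}")
```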

Additional Scores

BBBscore (Gupta 2019): Validated 0-6 composite score
Clark Classification: High/Moderate/Low penetration class
P-gp Substrate Probability: ML prediction of P-gp efflux risk
BCRP Efflux Probability: ML prediction of BCRP transporter efflux
Kp,uu (Brain Exposure): Predicted unbound brain-to-plasma concentration ratio

Nearest CNS Drug

Shows the most structurally similar approved CNS drug from a curated library of 20 reference compounds.

Applicability Domain

Shows how similar the query molecule is to the training data. "In domain" (Tanimoto similarity >= 0.3) means the prediction is more reliable; below that threshold the platform shows an "Out of domain" warning (see Troubleshooting).
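The Troubleshooting section states that the domain check uses Tanimoto similarity with a 0.3 cutoff. The sketch below shows how such a check could work on fingerprints stored as sets of on-bit indices. It assumes the score is the similarity to the nearest training compound; the manual does not say which fingerprint or aggregation the platform actually uses, and the function names are hypothetical.

```python
from typing import Iterable, Set

IN_DOMAIN_THRESHOLD = 0.3  # cutoff quoted in this manual


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Similarity to the closest training fingerprint (assumed aggregation)."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]]) -> bool:
    """True when the nearest training compound reaches the 0.3 threshold."""
    return nearest_similarity(query, training) >= IN_DOMAIN_THRESHOLD


# Toy fingerprints; real ones would come from a cheminformatics toolkit.
training_set = [{2, 3, 4}, {9, 10}]
print(tanimoto({1, 2, 3}, {2, 3, 4}), in_domain({1, 2, 3}, training_set))
```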

Batch Prediction

Click the Upload & Predict section below the single-molecule input
Upload a CSV file with a SMILES column (one molecule per row)
Up to 500 molecules can be processed per batch
Results show a sortable table with BBB probability, verdict, and key properties
Download results as CSV using the download button
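Before uploading, it can help to confirm that the file has a SMILES column and to split larger sets into files of at most 500 rows. The following sketch uses only the Python standard library; the helper names are hypothetical, and matching the column header case-insensitively is an assumption, since the manual does not say how strictly the platform matches it.

```python
import csv
import io
from typing import List

MAX_BATCH = 500  # per-batch limit stated in this manual


def read_smiles_column(csv_text: str, column: str = "SMILES") -> List[str]:
    """Extract non-empty values from the SMILES column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:
        raise ValueError("CSV file is empty")
    # Assumption: header matching ignores case and surrounding whitespace.
    lookup = {name.strip().lower(): name for name in reader.fieldnames}
    key = lookup.get(column.lower())
    if key is None:
        raise ValueError(f"no {column!r} column found in {reader.fieldnames}")
    return [row[key].strip() for row in reader if row[key] and row[key].strip()]


def split_batches(smiles: List[str], size: int = MAX_BATCH) -> List[List[str]]:
    """Split a list of SMILES into consecutive chunks of at most `size`."""
    return [smiles[i : i + size] for i in range(0, len(smiles), size)]


sample = "name,smiles\nethanol,CCO\nbenzene,c1ccccc1\nblank,\n"
print(read_smiles_column(sample))
```

Each chunk returned by `split_batches` can then be written to its own CSV file and uploaded separately.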

Quick Test Compounds

Click any of the example buttons (Diazepam, Acetaminophen, Ibuprofen, Testosterone) to quickly test the predictor.

4. BBB Optimizer

The optimizer uses REINVENT 4 (deep reinforcement learning) to generate novel molecules optimized for BBB penetration starting from a seed compound.

How to Use

Navigate to /optimize/bbb
Enter a seed SMILES (the starting molecule to optimize)
Click Launch Optimization
The system launches a REINVENT job that generates variants
Monitor progress on the status page (auto-refreshes)
Results show top-scoring generated molecules with BBB predictions

Important Notes

Optimization runs take 5-30 minutes depending on complexity
Only one optimization can run at a time
Generated molecules are novel and may not be synthesizable
Use the BBB predictor to validate individual results

5. Chemical Space Explorer

Navigate to /chemspace/bbb to view the chemical space visualization.

UMAP plot: 2D projection of all training compounds colored by BBB class
Interactive: Hover over points to see SMILES and BBB probability
Query molecules are overlaid as larger markers for comparison

6. Project Management

Creating a New Project

Navigate to /new-project
Fill in the target specification:
Target gene/protein name
ChEMBL target ID
Indication
PDB code for docking
Off-target selectivity requirements
Click Create Project
The pipeline automatically:
Fetches known ligands from ChEMBL
Trains QSAR models
Generates seed compounds
Launches initial REINVENT campaign

Viewing Project Progress

Each project card on the dashboard shows pipeline status
Click a project to see campaign history, validation results, and lead compounds

7. Compound Browser

For each project, the compound browser at /project/<name>/compounds provides:

Sortable table of all generated and validated compounds
Filter by: traffic light status (green/yellow/red), BBB verdict, property ranges
Sort by: any column including docking score, BBB probability, CNS MPO
Export: Download filtered results as CSV or SDF
Compound detail: Click any row to see full profile including:
2D/3D structure views
All BBB scores and properties
Docking pose visualization (if available)
Retrosynthesis analysis (if computed)

Comparison Mode

Select 2-4 compounds and click "Compare" to see a radar chart overlay of their properties.

Scaffold Analysis

The scaffolds view groups compounds by Murcko scaffold for SAR analysis.

8. API Reference

The platform provides a REST API for programmatic access.

Health Check

GET /api/v1/health

Returns model versions and status.
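A health check from a script might look like the sketch below, which uses only the standard library. The helper names are hypothetical, and the base URL is the local address from Getting Started. The response is parsed as JSON on the assumption that the API returns JSON; the manual does not document the response fields, so none are assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5051"  # local address from Getting Started


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request for the health endpoint without sending it."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/health", method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = health_request()
print(request.get_method(), request.full_url)
```

With the platform running, `fetch_json(health_request())` would return the decoded status payload.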

BBB Prediction

POST /api/v1/predict
Content-Type: application/json

{
  "smiles": ["CCO", "c1ccccc1"]
}

Returns full BBB prediction results. Each request accepts up to 100 SMILES.
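Because a request carries at most 100 SMILES, longer lists need to be split across several calls. The sketch below builds one POST request per chunk with the JSON body shown above; the function name is hypothetical, and the requests are only constructed here, not sent.

```python
import json
import urllib.request
from typing import List

MAX_SMILES_PER_REQUEST = 100  # limit stated in this manual


def predict_requests(
    smiles: List[str], base_url: str = "http://localhost:5051"
) -> List[urllib.request.Request]:
    """Build one POST /api/v1/predict request per chunk of up to 100 SMILES."""
    url = f"{base_url.rstrip('/')}/api/v1/predict"
    requests = []
    for start in range(0, len(smiles), MAX_SMILES_PER_REQUEST):
        chunk = smiles[start : start + MAX_SMILES_PER_REQUEST]
        body = json.dumps({"smiles": chunk}).encode("utf-8")
        requests.append(
            urllib.request.Request(
                url,
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests


batch = predict_requests(["CCO", "c1ccccc1"] * 125)  # 250 SMILES
print(len(batch), "requests")
```

Each request can be sent with `urllib.request.urlopen(request)`; the structure of the response is not documented in this manual, so inspect it at /api/docs before parsing.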

BBB Optimization

POST /api/v1/optimize
Content-Type: application/json

{
  "smiles": "c1ccccc1",
  "n_iterations": 100
}

Launches an optimization job. Returns a job ID.

Job Status

GET /api/v1/status/<job_id>

Returns optimization progress and results.
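Launching a job and waiting for it to finish can be scripted as sketched below. The manual does not document the response fields, so the sketch does not guess them: the caller supplies `fetch` (a function that retrieves the current status) and `is_finished` (a test on that status). All helper names are hypothetical. The defaults poll every 30 seconds for up to 60 polls, which covers the 5-30 minute run time quoted in this manual.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:5051"  # local address from Getting Started


def optimize_request(
    seed_smiles: str, n_iterations: int = 100, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /api/v1/optimize request without sending it."""
    body = json.dumps({"smiles": seed_smiles, "n_iterations": n_iterations}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/optimize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """URL of the status endpoint for a job ID (ID is percent-encoded)."""
    return f"{base_url.rstrip('/')}/api/v1/status/{urllib.parse.quote(str(job_id), safe='')}"


def poll_status(
    fetch: Callable[[], Any],
    is_finished: Callable[[Any], bool],
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fetch` until `is_finished` accepts the result or polls run out."""
    for attempt in range(max_polls):
        status = fetch()
        if is_finished(status):
            return status
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Since only one optimization can run at a time, wait for `poll_status` to return before submitting the next seed.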

Full API Documentation

Visit /api/docs in your browser for interactive API documentation.

9. Model Benchmarks

Navigate to /benchmarks to see how the platform's models compare against published methods:

Method	AUC-ROC	Source
Arion 4-Model Ensemble	0.934	Validated on Spielvogel 2025 dataset
Arion Attentive GNN	0.876	Scaffold-balanced split on B3DB
CANDID-CNS (Collins 2025)	0.95	Published (unfiltered B3DB)
Spielvogel RF (2025)	0.88	Published (154 radiotracers)
BBBscore (Gupta 2019)	0.83	Validated on same data
CNS MPO (published)	0.53	Spielvogel evaluation

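Among the methods listed, the platform's 4-model ensemble (0.934) scores above the Spielvogel RF, BBBscore, and CNS MPO, while CANDID-CNS reports a higher AUC-ROC (0.95) on unfiltered B3DB. The rows come from different datasets and splits, so treat the figures as indicative rather than as a head-to-head ranking. The ensemble combines four complementary model architectures.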
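For readers comparing these figures with their own validation sets: AUC-ROC is the probability that a randomly chosen BBB+ compound receives a higher score than a randomly chosen BBB- compound, with ties counted as half. The sketch below computes it directly from that definition. It is a generic illustration with toy data, not the platform's benchmarking code.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC by pairwise comparison of positive (1) and negative (0) scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = BBB+, 0 = BBB-; scores are predicted probabilities.
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of compounds, which is fine for a few thousand molecules; larger sets are better served by a rank-based implementation.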

10. Troubleshooting

Platform won't start

Ensure the conda environment is activated: conda activate reinvent_env
Check port 5051 is not in use: netstat -aon | findstr 5051
Start with: cd /d D:\Arion\Platform && python run_web.py (in Command Prompt, the /d switch lets cd change drives as well as directories)
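On systems where netstat and findstr are not available, the port check can also be done from Python. This is a minimal sketch with a hypothetical helper name; it only reports whether something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


print("port 5051 in use:", port_in_use(5051))
```

If the check reports the port as in use while the platform is not running, another process holds it and must be stopped before starting the platform.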

Predictions are slow

First prediction loads all models (~10-15 seconds)
Subsequent predictions are fast (~2-3 seconds)
3D PSA adds ~1-2 seconds (conformer generation)
Batch predictions skip 3D PSA for speed

"Out of domain" warning

The query molecule is structurally dissimilar to the training data (Tanimoto < 0.3). The prediction may be less reliable. Consider:
- Checking if the molecule is valid/reasonable
- Using additional scoring methods (docking, experimental)

SHAP plots not showing

SHAP is only computed for single-molecule predictions (not batch)
If the SHAP section is missing, models may not have loaded correctly

Optimization fails to start

Ensure REINVENT 4 is installed and accessible
Check that model files exist at expected paths
Only one optimization can run at a time

Remote access issues

Ensure Tailscale is running on both machines
Funnel must be active: tailscale funnel --bg 5051
Check status: tailscale funnel status