---
title: "Self-Hosted Models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Self-Hosted Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE # Set to FALSE since these calls require a running container
)
```

Partners who run Synthesize Bio models inside their own environment (for example, on a GPU host in their own cloud account) can use the same `rsynthbio` client against a self-hosted model container instead of the hosted API at `app.synthesize.bio`.

> **Self-hosted deployment is a model deployment option available within a
> Synthesize Bio partnership.** To learn more or request access, contact
> [partnerships@synthesize.bio](mailto:partnerships@synthesize.bio).

## How it differs from the hosted path

The hosted path is asynchronous: it starts a query, polls for completion, and downloads results. A self-hosted container instead returns predictions **synchronously** as an [Apache Arrow](https://arrow.apache.org/) IPC stream. `rsynthbio` decodes that stream into exactly the same data frames you get from the hosted path (`expression`, `metadata`, and `latents`), so downstream code does not change.

Key differences:

- **No polling and no download URL**: a single request returns the data.
- **Requires the optional `arrow` package**: install it with `install.packages("arrow")`.
- **No API key required**: a key is only sent when `SYNTHESIZE_API_KEY` is set (use this if your container runs with authentication enabled).

## Installation

```{r install, eval=FALSE}
install.packages("rsynthbio")
install.packages("arrow") # required for self-hosted mode
```

## Enabling self-hosted mode

Set `self_hosted = TRUE` on a call, or enable it for the whole session with the `SYNTHESIZE_SELF_HOSTED` environment variable (truthy values: `1`, `true`, `yes`, `on`).

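To check how a given value would be read, the truthy-value rule can be approximated with a few lines of base R. This is only a sketch: `is_truthy_env()` is a hypothetical helper, not an `rsynthbio` function, and the case-insensitive, whitespace-trimmed matching is an assumption rather than documented package behaviour.

```{r truthy-sketch, eval=FALSE}
# Hypothetical helper (not part of rsynthbio): reports whether an environment
# variable holds one of the truthy values listed above.
# Assumption: matching ignores case and surrounding whitespace.
is_truthy_env <- function(name) {
  value <- tolower(trimws(Sys.getenv(name, unset = "")))
  value %in% c("1", "true", "yes", "on")
}

Sys.setenv(SYNTHESIZE_SELF_HOSTED = "yes")
is_truthy_env("SYNTHESIZE_SELF_HOSTED")
```

An unset or empty variable yields `FALSE` here, which mirrors the hosted path being the default. For the session-wide switch itself, setting the variable is enough:
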
```{r enable, eval=FALSE}
library(rsynthbio)

Sys.setenv(SYNTHESIZE_SELF_HOSTED = "1")
```

## Pointing each model at its container

Self-hosted deployments typically run one container per model. Set a per-model base URL once and you never have to pass `api_base_url` on individual calls. The variable name is `SYNTHESIZE_API_BASE_URL__` followed by the upper-cased model id, with non-alphanumeric characters replaced by underscores (for example, `gem-1-bulk` becomes `SYNTHESIZE_API_BASE_URL__GEM_1_BULK`).

```{r per-model, eval=FALSE}
Sys.setenv(
  SYNTHESIZE_API_BASE_URL__GEM_1_BULK = "https://gem-1-bulk.internal.example",
  SYNTHESIZE_API_BASE_URL__GEM_1_SC = "https://gem-1-sc.internal.example"
)

query <- get_example_query("gem-1-bulk", self_hosted = TRUE)$example_query
result <- predict_query(query, model_id = "gem-1-bulk", self_hosted = TRUE)

expression <- result$expression
metadata <- result$metadata
```

Variant slugs backed by the same container (for example, `gem-1-bulk_reference-conditioning` and `gem-1-bulk_predict-metadata`) resolve to the same per-model variable as their base model.

### Resolution precedence

The base URL is resolved in this order, using the first value that is set:

1. An explicit `api_base_url` argument passed to the call.
2. The per-model variable (`SYNTHESIZE_API_BASE_URL__` followed by the model suffix).
3. The global `SYNTHESIZE_API_BASE_URL`.
4. The production default (`https://app.synthesize.bio`).

You can always override the environment by passing `api_base_url` directly:

```{r explicit-url, eval=FALSE}
result <- predict_query(
  query,
  model_id = "gem-1-bulk",
  api_base_url = "https://gem-1-bulk.internal.example",
  self_hosted = TRUE
)
```

## Authentication (optional)

Self-hosted containers may run without authentication.
If yours requires a key, set `SYNTHESIZE_API_KEY` and the client attaches it as a bearer token:

```{r auth, eval=FALSE}
Sys.setenv(SYNTHESIZE_API_KEY = "your-container-api-key")
```

## Raw responses

Pass `raw_response = TRUE` to receive the parsed Arrow `Table` and its schema metadata (including `model_version`, `request_type`, and `gene_order`) instead of the transformed data frames:

```{r raw, eval=FALSE}
raw <- predict_query(
  query,
  model_id = "gem-1-bulk",
  self_hosted = TRUE,
  raw_response = TRUE
)

raw$table # an arrow::Table
raw$model_version
```

## Session info

```{r session-info}
sessionInfo()
```
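
## Appendix: sketching the base URL resolution

The per-model variable naming rule and the resolution precedence described earlier can be approximated in base R, which may help when debugging which container a call will reach. This is a sketch under stated assumptions, not the package's implementation: `per_model_var()` and `resolve_base_url()` are hypothetical helpers, and treating everything after the first underscore as a variant suffix is an assumption drawn from the example slugs.

```{r resolution-sketch, eval=FALSE}
# Hypothetical helpers (not part of rsynthbio).

# Build the per-model variable name from a model id.
# Assumption: a variant suffix begins at the first underscore, so
# "gem-1-bulk_predict-metadata" maps to the base model "gem-1-bulk".
per_model_var <- function(model_id) {
  base_id <- sub("_.*$", "", model_id)
  suffix <- gsub("[^A-Z0-9]", "_", toupper(base_id))
  paste0("SYNTHESIZE_API_BASE_URL__", suffix)
}

# Walk the documented precedence and return the first non-empty value:
# explicit argument, per-model variable, global variable, production default.
resolve_base_url <- function(model_id, api_base_url = NULL) {
  candidates <- c(
    api_base_url,
    Sys.getenv(per_model_var(model_id), unset = ""),
    Sys.getenv("SYNTHESIZE_API_BASE_URL", unset = ""),
    "https://app.synthesize.bio"
  )
  candidates[nzchar(candidates)][[1]]
}

per_model_var("gem-1-bulk")
per_model_var("gem-1-bulk_predict-metadata")
resolve_base_url("gem-1-bulk")
```

Both `per_model_var()` calls produce `SYNTHESIZE_API_BASE_URL__GEM_1_BULK`, matching the example given for `gem-1-bulk`.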