TrainedModel
Definition
A TrainedModel is the trained artifact plus the training-run record produced by training a Model on a Dataset. It
realizes a Model: on create it reads the referenced Model and copies its library / framework / type, then (for
federated models) deploys a training job to Kubernetes/titan and tracks it to completion. The clean distinction is
Model = config/architecture; TrainedModel = the run plus the resulting weights (trained_model.py:48, where it
reads Model().read(uid=source['model'])). The TrainedModel is the object that carries live training status, versioned
weights in git, and the federated job handle.
Defined in axonis-core/axonis/ml_userspace/trained_model.py:26 (class TrainedModel(UDS), alias
Schema.TRAINED_MODEL = "trainedmodel"). Persistence: the Elasticsearch userspace index (metadata/status), plus
Gitea (weights/versions) and Redis (live status pub/sub).
Lifecycle
Status codes are defined in Schema.MODEL_STATUS (schema.py:139): 0 RETRIEVING DATASET → 1 TRAINING → 2 ANALYZING PERFORMANCE →
3 STORING MODEL → 4 TRAINING COMPLETE, plus 5 ERROR, 6 QUEUED, 7 STOPPED, 8 OPTIMIZE (mirrored as the
TrainingStatus enum). On create, initialize() deploys the job and sets QUEUED (trained_model.py:133); titan's
training entrypoint runs it and calls update_train_status(uid, 4, …) on success or (uid, 5, …) on failure
(training.py:110). update(state='start'|'stop') toggles the k8s pod via initialize / finalize; finalize
stamps training_endtime and tears down the pod; delete removes the k8s deployment. Each status update republishes to
the Redis channel trainedmodels:{uid} and recomputes model_versions from the git branches.
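
The status codes and the channel name above can be sketched as a small Python enum. This is an illustrative sketch under stated assumptions, not the code in schema.py: the member names, label() and status_channel() are hypothetical; only the numeric codes, their labels, and the trainedmodels:{uid} channel pattern come from this section.

```python
from enum import IntEnum


class TrainingStatus(IntEnum):
    """Numeric codes as listed for Schema.MODEL_STATUS (member names are assumed)."""
    RETRIEVING_DATASET = 0
    TRAINING = 1
    ANALYZING_PERFORMANCE = 2
    STORING_MODEL = 3
    TRAINING_COMPLETE = 4
    ERROR = 5
    QUEUED = 6
    STOPPED = 7
    OPTIMIZE = 8


def label(code: int) -> str:
    """Human-readable label for a status code, e.g. 2 -> 'ANALYZING PERFORMANCE'."""
    return TrainingStatus(code).name.replace("_", " ")


def status_channel(uid: str) -> str:
    """Redis channel that each status update republishes to (hypothetical helper)."""
    return f"trainedmodels:{uid}"
```

For example, the code set by initialize() would read as label(6), and the success and failure codes reported by titan as label(4) and label(5).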
Journey through the code
REST create target=trainedmodel → rest userspace → generic UDS create in the REST process (trainedmodel is not in
the rest USERSPACE map) → federation propagates → on the federate, TrainedModel.create runs: it resolves the Model
metadata, writes the ES record, then if send_to_federation(): initialize() deploys via
axonis.core.deploy.training.Training(...).deploy(entrypoint='titan.modeling.training') → the titan pod runs
titan/titan/modeling/training.py:49 entrypoint_training, which downloads the model and checkpoint and branches on
framework:
- federated → federated_model_runner (coded) / federated_model_runner_v2.nocode_start (nocode) builds a FATE PipeLine
  (Reader → DataTransform → [Intersection ecdh, if vertical] → operations → DataSplit → model component →
  Evaluation), compile()s and fit()s it, submitting to FATE-Flow via flow_sdk.client.FlowClient
  (federated_model_runner.py:367). The federated_job_id is stored on the object; stop calls FlowClient.job.stop.
- simple/advanced → simple_model_runner / advanced_model_runner (pytorch/tf/xgboost).
Artifacts and versions are stored in Gitea via axonis.core.storage.Storage (per-uid repo, branch = version);
checkpoints are base64-stored on the object. The consume path is titan's userspace/predictor.py:48, which reads the
TrainedModel, its source Model, and the Dataset for inference.
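
The framework branch and the FATE stage order described above can be summarized in a short sketch. It is not titan's code: pick_runner() and fate_stages() are hypothetical helpers that return names only, and the one-to-one mapping of simple and advanced to their runners is an assumption read from the list above.

```python
from typing import List


def pick_runner(framework: str, nocode: bool = False) -> str:
    """Name of the runner this section associates with a framework (illustrative)."""
    if framework == "federated":
        # Coded and nocode federated models use different runners.
        if nocode:
            return "federated_model_runner_v2.nocode_start"
        return "federated_model_runner"
    if framework == "simple":
        return "simple_model_runner"
    if framework == "advanced":
        return "advanced_model_runner"
    raise ValueError(f"unknown framework: {framework!r}")


def fate_stages(vertical: bool) -> List[str]:
    """Stage order of the FATE pipeline as listed above."""
    stages = ["Reader", "DataTransform"]
    if vertical:
        # The ecdh intersection step is only present for vertical federation.
        stages.append("Intersection (ecdh)")
    stages += ["operations", "DataSplit", "model component", "Evaluation"]
    return stages
```

Calling fate_stages(vertical=True) yields the seven-stage order with the intersection step in third position; with vertical=False that step is omitted.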
Data shape
Seeded on create: model (FK → Model uid), parameters (with prediction_threshold defaulted to 0.5), datasets
(e.g. {training: <dataset_uid>}), version (default main), modeldir / exportdir ('saved_model'),
checkpoint_exists = 0, graph_version, serving = []. Copied from the Model: library, framework, type.
Runtime: status, status_message, training_starttime / training_endtime / training_duration, model_versions,
federated_job_id, transform (e.g. PCA), checkpoint blobs, nocode. Storage: ES userspace index (metadata/status)
+ Gitea (weights/versions) + Redis (status pub/sub on trainedmodels:{uid}).
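
As a rough illustration of the seeded fields, the sketch below builds a record from a create payload and a Model record. seed_trained_model() is a hypothetical helper, not the logic in trained_model.py. It assumes that modeldir and exportdir both default to 'saved_model' and that graph_version starts unset; the sample values in the usage note are placeholders.

```python
from copy import deepcopy


def seed_trained_model(source: dict, model: dict) -> dict:
    """Apply the create-time defaults listed above to a payload (illustrative)."""
    record = deepcopy(source)
    params = record.setdefault("parameters", {})
    params.setdefault("prediction_threshold", 0.5)
    record.setdefault("datasets", {})
    record.setdefault("version", "main")
    record.setdefault("modeldir", "saved_model")   # assumed default
    record.setdefault("exportdir", "saved_model")  # assumed default
    record.setdefault("checkpoint_exists", 0)
    record.setdefault("graph_version", None)       # default not stated in the section
    record.setdefault("serving", [])
    # library / framework / type are copied from the referenced Model.
    for key in ("library", "framework", "type"):
        record[key] = model.get(key)
    return record
```

A call such as seed_trained_model({"model": "m-1", "datasets": {"training": "d-1"}}, {"library": "lib", "framework": "simple", "type": "t"}) returns a record with version "main", prediction_threshold 0.5, and the three copied Model fields.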
Invariants
- parameters always exists (indexed unconditionally, trained_model.py:32).
- A non-nocode, non-pretrained TrainedModel references a resolvable Model (trained_model.py:47).
- library = 'federated' is rejected on an EdgeNode (trained_model.py:135).
- training_endtime is set once, never overwritten (trained_model.py:166); every status update republishes to
  trainedmodels:{uid} and recomputes model_versions.
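
The invariants above could be expressed as checks along the following lines. This is a minimal sketch, not the validation in trained_model.py: check_invariants() and stamp_endtime() are hypothetical, and the nocode and pretrained flag names on the record, as well as the on_edge_node and model_resolves arguments, are assumptions made for illustration.

```python
from typing import List


def check_invariants(record: dict, on_edge_node: bool, model_resolves: bool) -> List[str]:
    """Return a list of violated invariants for a TrainedModel-like record."""
    problems = []
    if "parameters" not in record:
        problems.append("parameters missing")
    # Only coded, non-pretrained records need a resolvable source Model.
    needs_model = not record.get("nocode") and not record.get("pretrained")
    if needs_model and not model_resolves:
        problems.append("model does not resolve")
    if on_edge_node and record.get("library") == "federated":
        problems.append("federated library on an EdgeNode")
    return problems


def stamp_endtime(record: dict, timestamp: float) -> dict:
    """Set training_endtime only if it has not been set before."""
    if record.get("training_endtime") is None:
        record["training_endtime"] = timestamp
    return record
```

An empty list from check_invariants() means none of the three create-time rules is violated; stamp_endtime() leaves an existing end time untouched on repeated calls.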
Related products
- product.model: the TrainedModel realizes a Model (source['model']), copying its library/framework/type.
- product.dataset: trained on a Dataset (datasets.training); a Dataset create/update also spawns a system
  quality-estimator training run.
- product.fusion: the federated training path submits to FATE-Flow; fusion scoring can reference titan-trained
  ranker/LLM models by registry key.
Open questions
- Source-of-truth location: the real TrainedModel class ships in the published axonis-core wheel, not a tracked
  local branch.
- REST vs federate split: trainedmodel is absent from the rest USERSPACE map, so the rich create/train logic
  executes only after federation propagation lands on a federate.
- MODEL_STATUS vs ASSET_STATUS: a parallel Schema.ASSET trained-artifact concept exists in the same userspace
  index; its relationship to TrainedModel is unconfirmed.
- FATE result store-back: submission and status updates are confirmed, but where the trained federated artifact is
  pulled back from FATE and committed to Gitea was not fully traced.
Realized by: component.titan.runtime