Testing — Real-Service Integration Standard

How every Axonis repo tests: against real services, never mocks. A test that can't reach its dependency skips with a clear reason — it never silently substitutes a double. This is the platform-wide standard; axonis-core is the reference implementation (see #reference).

Ground rules

#REQ.no-mocks — No mocks. No unittest.mock, MagicMock, AsyncMock, @patch, patch.dict('sys.modules', …), or hand-rolled in-memory fakes. A test exercises the real service (Redis, Elasticsearch, Keycloak, MCP, the LLM) or it skips.
#REQ.skip-clean — Skip, never fake. When a service is unreachable, the test skips with an explicit reason (require_*() gate). A skip is honest; a silent double is not.
#REQ.fix-prod-not-test — Fix production, not the test. Mocks hide real bugs. When converting a test surfaces a production defect, fix it in production code; do not paper over it in the test.
Enter from the earliest real wire. Integration tests take serialized input shaped like real client traffic (JSON/CSV/YAML), not a hand-built domain object fed to the consumer (see the workspace CLAUDE.md "Testing" rule). The create-test skill encodes this.
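Here is a minimal sketch of the difference. `ChatRequest` and `handle_chat_request()` are hypothetical stand-ins for a real consumer. The test feeds in the JSON a client would send, so deserialization and field mapping run as well.

```python
import json
from dataclasses import dataclass


# Hypothetical domain object and consumer, standing in for real axonis code.
@dataclass
class ChatRequest:
    user_id: str
    message: str


def handle_chat_request(raw_body: str) -> dict:
    """Entry point that receives the serialized body, as a real client sends it."""
    payload = json.loads(raw_body)
    request = ChatRequest(user_id=payload["user_id"], message=payload["message"])
    return {"user_id": request.user_id, "echo": request.message.strip()}


def test_chat_request_from_wire():
    # Preferred: serialized input shaped like real traffic, so json.loads and
    # the payload -> ChatRequest mapping are exercised too.
    raw = json.dumps({"user_id": "u-1", "message": "  hello  "})
    assert handle_chat_request(raw) == {"user_id": "u-1", "echo": "hello"}
    # Avoid: building ChatRequest("u-1", "hello") by hand and passing it to an
    # inner function, which skips the deserialization path entirely.
```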

LLM transport is Claude Code, not the Anthropic SDK

Tests that need an LLM use a Claude Code CLI adapter (ClaudeCliProvider) that shells out to claude -p --output-format json and conforms to the axonis.llm.client.Client.complete() interface. The claude binary must be on PATH and authenticated through the developer's own Claude session, so no ANTHROPIC_API_KEY is needed. Gate these tests with require_claude_cli(). Expect roughly 5–15 s per call (Sonnet, via Claude Code's own auth).
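Below is a minimal sketch of such an adapter and its gate. It makes three assumptions. First, `complete()` takes a prompt string and returns the reply text, although the real `Client.complete()` signature may differ. Second, the JSON printed by `claude -p --output-format json` carries the reply in a `result` field alongside an `is_error` flag. Third, the helper names `build_command` and `parse_output` are illustrative.

```python
import json
import shutil
import subprocess

import pytest


def require_claude_cli() -> str:
    """Skip the calling test unless the `claude` binary is on PATH."""
    path = shutil.which("claude")
    if path is None:
        pytest.skip("claude CLI not found on PATH")
    return path


class ClaudeCliProvider:
    """Illustrative LLM adapter that shells out to Claude Code instead of the SDK."""

    def __init__(self, binary: str = "claude", timeout_s: float = 120.0) -> None:
        self.binary = binary
        self.timeout_s = timeout_s

    def build_command(self, prompt: str) -> list:
        # Non-interactive print mode with machine-readable output.
        return [self.binary, "-p", prompt, "--output-format", "json"]

    @staticmethod
    def parse_output(stdout: str) -> str:
        # Assumption: the JSON envelope has "is_error" and the reply under "result".
        envelope = json.loads(stdout)
        if envelope.get("is_error"):
            raise RuntimeError(f"claude CLI reported an error: {envelope}")
        return envelope["result"]

    def complete(self, prompt: str) -> str:
        proc = subprocess.run(
            self.build_command(prompt),
            capture_output=True,
            text=True,
            timeout=self.timeout_s,
            check=True,  # a non-zero exit raises CalledProcessError
        )
        return self.parse_output(proc.stdout)
```

Keeping command construction and parsing separate from the subprocess call leaves `complete()` as the only side effect. A gated test then exercises that call against the real CLI.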

Real-service test infrastructure

The pattern (reference impl in axonis-core/tests/):

#REQ.autosource-env — conftest auto-sources the dev env. conftest.py sources developers-environment/env/development.axonis.ai.env + tokens.env so tests use the same configuration surface as the running platform (platform.service-configuration). It provides per-test-namespaced redis_client and es_client / disposable_index fixtures.
require_*() gates (tests/_integration.py): require_redis(), require_es(), require_keycloak(), require_authenticated_user() (validates the real AUTHORIZATION token via Keycloak introspect, cached per-process), require_claude_cli(). Each skips cleanly when its dependency is unreachable.
Disposable indices. disposable_index_for(monkeypatch, alias) redirects a Schema alias to a fresh axonis-test-* index created from the production mapping JSON and tears it down afterwards, so tests hit real ES without touching dev data. Redis-backed tests avoid polluting the dev keyspace the same way: the ratelimit tests, for example, use Redis db 15.
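Below is a minimal sketch of the gate and disposable-index patterns, built on a few assumptions. The gate probes at the TCP level; the real `require_redis()` may issue a Redis `PING` instead, which also catches auth and TLS failures. `REDIS_HOST`/`REDIS_PORT` and their defaults are hypothetical. `es` is taken to be an elasticsearch-py 8.x client. The real `disposable_index_for()` also monkeypatches the Schema alias, but that depends on axonis internals and is left out.

```python
import os
import socket
import uuid
from contextlib import contextmanager
from typing import Any, Iterator, Optional

import pytest


def tcp_unreachable_reason(host: str, port: int, timeout_s: float = 1.0) -> Optional[str]:
    """Return None if host:port accepts a TCP connection, else a readable reason."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return None
    except OSError as exc:
        return f"{host}:{port} unreachable ({type(exc).__name__}: {exc})"


def require_tcp(service: str, host: str, port: int) -> None:
    """Skip the calling test, with an explicit reason, if the service can't be reached."""
    reason = tcp_unreachable_reason(host, port)
    if reason is not None:
        pytest.skip(f"{service} not available: {reason}")


def require_redis() -> None:
    # Hypothetical env var names and defaults.
    host = os.environ.get("REDIS_HOST", "localhost")
    port = int(os.environ.get("REDIS_PORT", "6379"))
    require_tcp("Redis", host, port)


def fresh_index_name(alias: str) -> str:
    """Unique, lowercase index name under the axonis-test-* prefix."""
    return f"axonis-test-{alias.lower()}-{uuid.uuid4().hex[:12]}"


@contextmanager
def disposable_index(es: Any, alias: str, index_body: dict) -> Iterator[str]:
    """Create a throwaway index from production mapping JSON and always delete it.

    `index_body` is the parsed mapping file, e.g. {"settings": {...}, "mappings": {...}}.
    """
    name = fresh_index_name(alias)
    es.indices.create(
        index=name,
        settings=index_body.get("settings"),
        mappings=index_body.get("mappings"),
    )
    try:
        yield name  # the real helper points the Schema alias at `name` here
    finally:
        es.indices.delete(index=name, ignore_unavailable=True)
```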

Token refresh

If require_authenticated_user() skips with "Keycloak introspect failed: Inactive Authentication Session", the Keycloak session behind the test tokens is no longer active. Refresh the tokens by running utils/auth/refresh_test_tokens.py.

CI-gated destructive tests

Tests that mutate a real cluster are gated behind opt-in environment settings, so they skip locally by default and run in CI, where the runner sets them:

Gate	Enabled by	Skips when
`require_k8s()`	`AXONIS_RUN_K8S_TESTS=true` plus a valid `KUBE_CONFIG`	the flag is unset, the kubeconfig is missing, or the cluster API is unreachable
`require_airflow()`	a valid `AIRFLOW_HOST` whose Airflow port accepts TCP connections	`AIRFLOW_HOST` is invalid or the Airflow port isn't listening
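The opt-in half of `require_k8s()` can be sketched as a pure check followed by a skip. This sketch assumes `KUBE_CONFIG` holds a path to a kubeconfig file; some CI runners pass file contents instead. It also leaves out the cluster-API reachability probe, because that needs a Kubernetes client.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

import pytest


def k8s_skip_reason(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return why k8s tests must skip, or None when they may run."""
    env = os.environ if env is None else env
    if env.get("AXONIS_RUN_K8S_TESTS") != "true":
        return "AXONIS_RUN_K8S_TESTS != 'true'; destructive cluster tests are opt-in"
    kubeconfig = env.get("KUBE_CONFIG", "")
    # Assumption: KUBE_CONFIG is a file path.
    if not kubeconfig or not Path(kubeconfig).expanduser().is_file():
        return f"no kubeconfig file at KUBE_CONFIG={kubeconfig!r}"
    # The real gate also verifies that the cluster API answers; omitted here.
    return None


def require_k8s() -> None:
    reason = k8s_skip_reason()
    if reason is not None:
        pytest.skip(reason)
```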

For local access to these services, use utils/forwarding/port_forward.py (e.g. forward airflow-webserver to localhost:9443).

Accepted residue

A few tests may legitimately keep a unittest.mock import, documented per file:

- Auth tests that @patch an Authenticator.validate collaborator, where a real conversion would require bypassing Keycloak's roles/markings or running multiple Keycloak users.
- from unittest.mock import ANY, used purely as a wildcard matcher (not a mock) for binary or opaque fields.

The "done" check greps for mock usage and filters out the documented residue.
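For reference, the permitted wildcard form looks like the snippet below. `ANY` compares equal to any value, so the assertion still pins every other field exactly. The response payload is hypothetical.

```python
from unittest.mock import ANY

# Hypothetical response from a real service call; "signature" is opaque binary.
response = {"id": "doc-42", "status": "indexed", "signature": b"\x8f\x02\xa1"}

# ANY relaxes only the opaque field; every other field is still compared exactly.
assert response == {"id": "doc-42", "status": "indexed", "signature": ANY}
```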

Acceptance check

```shell
# No mock usage outside the documented residue (expect no output).
grep -rEn --include='*.py' \
  "from unittest\.mock|import unittest\.mock|MagicMock|AsyncMock|@patch|\bMock\(" tests/ \
  | grep -vE "<documented-residue-files>"

# The suite runs: every test passes or skips; -rs prints each skip reason.
python -m pytest tests/ -q -rs
```

Reference implementation — axonis-core

axonis-core is the first repo fully converted to this standard. Concrete artifacts: tests/conftest.py, tests/_integration.py, tests/_claude_cli.py.

Converting its suite off mocks surfaced four production bugs that were fixed in production code (do not revert):

File	Bug	Fix
`axonis/redis/client.py`	`Client.delete()` passed a list to `hdel` (expects positional str) → `DataError`; hidden by a patched `hdel`.	`super().hdel(self.namespace, key)`
`axonis/memory/store.py`	`_get_redis()` ignored `REDIS_TLS`/`REDIS_VERIFY`/`REDIS_USERNAME` → silently failed against the TLS dev Redis.	added `ssl_kwargs` gated on `REDIS_TLS` + username lookup
`axonis/middleware/ratelimit.py`	same TLS/auth gap in `_get_redis()`.	same fix
`axonis/memory/service.py`	same TLS/auth gap in `Service._get_redis()`.	same fix
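All three `_get_redis()` fixes apply one idea: build the connection options from the same environment surface the platform uses. Below is a minimal sketch of that as a pure function returning keyword arguments for `redis.Redis(...)`. `REDIS_TLS`, `REDIS_VERIFY`, and `REDIS_USERNAME` come from the table above. `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`, the defaults, and the truthy-string parsing are assumptions and are not taken from the axonis code.

```python
import os
from typing import Any, Mapping, Optional


def _truthy(value: Optional[str]) -> bool:
    # Assumed parsing of boolean-ish env strings.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def redis_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build kwargs for redis.Redis(...) that honour TLS, verification and username."""
    env = os.environ if env is None else env
    kwargs: dict = {
        "host": env.get("REDIS_HOST", "localhost"),
        "port": int(env.get("REDIS_PORT", "6379")),
        "password": env.get("REDIS_PASSWORD") or None,
        "username": env.get("REDIS_USERNAME") or None,  # ACL user, if configured
    }
    if _truthy(env.get("REDIS_TLS")):
        # TLS options are added only when REDIS_TLS is set, so plain local Redis still works.
        kwargs["ssl"] = True
        # Assumption: verification is on unless REDIS_VERIFY is explicitly falsy.
        verify = _truthy(env.get("REDIS_VERIFY", "true"))
        kwargs["ssl_cert_reqs"] = "required" if verify else "none"
    return kwargs
```

Usage would be `redis.Redis(**redis_kwargs_from_env())`. Routing every `_get_redis()` through one helper like this is one way to stop the three call sites from drifting apart again.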

These bugs were invisible while the tests mocked Redis — the strongest argument for the no-mocks rule. New repos adopt the axonis-core/tests/ infrastructure as the template.

Depends on: platform.service-configuration