
Workspace Setup and Configuration

The workspace config is the single file that tells Gather Step what to index, how deeply to index it, and how to handle file-level scoping rules. Getting this right is the prerequisite for accurate cross-repo code graph results.

Gather Step requires a workspace root directory that satisfies two conditions:

  1. It contains a gather-step.config.yaml file at the top level.
  2. Every repo listed in that config is reachable from the config root through a relative path that stays inside that root.
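The two conditions above can be checked before running any Gather Step command. The following is a minimal sketch, not Gather Step's own validation code: `check_workspace` and its messages are hypothetical names chosen for illustration, and the repo paths are passed in as a plain list rather than parsed from the YAML file.

```python
from pathlib import Path

CONFIG_NAME = "gather-step.config.yaml"


def check_workspace(root, repo_paths):
    """Return a list of problems; an empty list means both conditions hold.

    Hypothetical helper for illustration only, not part of Gather Step.
    """
    root = Path(root).resolve()
    problems = []

    # Condition 1: the config file sits at the top level of the root.
    if not (root / CONFIG_NAME).is_file():
        problems.append(f"missing {CONFIG_NAME} in {root}")

    # Condition 2: every repo path is relative and stays inside the root.
    for rel in repo_paths:
        candidate = Path(rel)
        if candidate.is_absolute():
            problems.append(f"{rel}: path must be relative to the config root")
            continue
        resolved = (root / candidate).resolve()
        if resolved != root and root not in resolved.parents:
            problems.append(f"{rel}: escapes the workspace root")
        elif not resolved.is_dir():
            problems.append(f"{rel}: directory not found")
    return problems
```

Calling `check_workspace("/path/to/workspace", ["repos/backend_standard"])` returns an empty list when the layout is valid, which makes it easy to use as a pre-flight step in a setup script.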

Gather Step writes all generated state into .gather-step/ inside that same root. Source repositories are never modified by indexing.

The minimal working config lists at least one repo:

repos:
  - name: backend_standard
    path: repos/backend_standard

A typical multi-repo starting point:

repos:
  - name: backend_standard
    path: repos/backend_standard
  - name: frontend_standard
    path: repos/frontend_standard
  - name: shared_contracts
    path: repos/shared_contracts
indexing:
  workspace_concurrency: 4

name is the stable logical identifier used in CLI output, MCP tool responses, and --repo scoping flags. path is relative to the directory containing gather-step.config.yaml.

The table below covers the fields most commonly tuned in practice. For the complete schema with types and validation rules, see Configuration reference.

| Field | Type | Default / Required |
|---|---|---|
| repos | array | required |
| repos[].name | string | required; no path separators |
| repos[].path | string | required; relative, inside config root |
| repos[].depth | string | full |
| allow_listed_repos | array of strings | none (index all) |
| github | object | optional |
| jira | object | optional |
| indexing.exclude | array of glob strings | none |
| indexing.language_excludes | array of strings | none |
| indexing.include_languages | array of strings | all supported |
| indexing.include_dotfiles | bool | false |
| indexing.min_file_size | integer (bytes) | none |
| indexing.max_file_size | integer (bytes) | none |
| indexing.workspace_concurrency | integer | system default |

The github and jira sections are part of the config schema. The primary workflows, including local indexing, CLI queries, and MCP, do not require them.

When starting from scratch, run init from the workspace root:

cd /path/to/workspace
gather-step init

In an interactive terminal, pressing Enter accepts the default onboarding path: keep the selected repos, index now, generate AI context files, register local MCP settings, and leave watch mode off.

init walks the workspace directory, discovers directories that contain a .git folder, and opens a checkbox-style repo picker before writing gather-step.config.yaml. It skips directories it should not traverse:

  • .git
  • .gather-step
  • node_modules
  • dist
  • target
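The discovery step described above can be approximated in a few lines. This is a sketch of the idea, not the actual `init` implementation: `discover_repos` is a hypothetical name, and two behaviors are assumptions made here rather than documented facts: the workspace root itself is not reported as a repo, and the walk does not look for repos nested inside an already discovered repo.

```python
import os
from pathlib import Path

# Directory names that the walk never descends into (taken from the list above).
SKIP_DIRS = {".git", ".gather-step", "node_modules", "dist", "target"}


def discover_repos(workspace):
    """Return workspace-relative paths of directories that contain a .git folder."""
    root = Path(workspace)
    found = []
    for current, dirnames, _filenames in os.walk(root):
        here = Path(current)
        is_repo = ".git" in dirnames
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        if is_repo and here != root:
            found.append(here.relative_to(root).as_posix())
            # Assumption: stop descending once a repo has been found.
            dirnames[:] = []
    return sorted(found)
```

The returned list corresponds to the rows a picker would show; a `.git` folder buried under `node_modules` or `target` is never reached because those directories are pruned first.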

The picker is fully keyboard-driven:

↑/↓ move   Space toggle   Enter confirm   a all   n none   q cancel
─────────────────────────────────────────────────────────────────────────
[✓] backend_api
[✓] frontend_app
[ ] docs_site ← unchecked: stays out of the config
[✓] shared_contracts
[✓] worker_service
[ ] internal_admin
...
4 of 6 selected

Toggle a repo to include or exclude it; press a to select every discovered repo, n to deselect everything, and Enter to commit. The selected set is written straight to gather-step.config.yaml:

repos:
  - name: backend_api
    path: backend_api
  - name: frontend_app
    path: frontend_app
  - name: shared_contracts
    path: shared_contracts
  - name: worker_service
    path: worker_service
indexing:
  workspace_concurrency: 1

Each row in the picker maps one-to-one with a repos[] entry: the checkbox state controls whether the repo is in the config, the directory name becomes the logical name, and the path under the workspace root becomes path. Adjust depth for large repos you want to scan shallowly, add indexing scoping rules per repo, or rename name to match a canonical service identifier.
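The mapping from selected rows to config entries is simple enough to sketch. The helpers below are hypothetical and only illustrate the naming rule stated above (directory name becomes `name`, workspace-relative path becomes `path`). The renderer writes plain scalars without YAML quoting, which is a simplification that only holds for simple identifiers, and the `workspace_concurrency` default of 1 is copied from the generated example rather than from a documented default.

```python
from pathlib import PurePosixPath


def repo_entries(selected_paths):
    """Turn selected workspace-relative paths into repos[] entries."""
    return [{"name": PurePosixPath(p).name, "path": p} for p in selected_paths]


def render_config(entries, workspace_concurrency=1):
    """Render a minimal config document as text (no YAML quoting or escaping)."""
    lines = ["repos:"]
    for entry in entries:
        lines.append(f"  - name: {entry['name']}")
        lines.append(f"    path: {entry['path']}")
    lines.append("indexing:")
    lines.append(f"  workspace_concurrency: {workspace_concurrency}")
    return "\n".join(lines) + "\n"
```

For a repo at `services/worker_service`, this yields the name `worker_service`, which is the kind of entry you might later rename to a canonical service identifier.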

If a config already exists, init uses it as the starting point. Existing repos are preselected, removed repos stay unchecked, and repo-specific settings such as name and depth are preserved for selected repos. Use --force only when you intentionally want a fresh generated draft from repository discovery.

Re-running gather-step init and unchecking a repo is the supported way to drop it from the workspace. The next gather-step index notices the config change, unregisters the repo, and purges its graph, search, and metadata state — so the workspace stays consistent without any manual cleanup.

For scripts or CI, pass flags explicitly instead of relying on prompts:

gather-step --workspace /path/to/workspace init \
--index \
--generate-ai-files \
--setup-mcp local \
--no-watch

Use --no-index, --no-generate-ai-files, or --no-watch to make a scripted setup return immediately after writing the config.
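When the scripted setup is driven from Python rather than a shell, it helps to build the argument list in one place. The sketch below is a hypothetical wrapper, not part of Gather Step; it uses only the flags mentioned in this section, and the function and parameter names are illustrative assumptions.

```python
def build_init_command(workspace, index=True, generate_ai_files=True,
                       setup_mcp="local", no_watch=True):
    """Build the argv list for a non-interactive `gather-step init` run."""
    argv = ["gather-step", "--workspace", str(workspace), "init"]
    argv.append("--index" if index else "--no-index")
    argv.append("--generate-ai-files" if generate_ai_files
                else "--no-generate-ai-files")
    if setup_mcp is not None:
        # "local" is the only value shown in this guide.
        argv += ["--setup-mcp", setup_mcp]
    if no_watch:
        argv.append("--no-watch")
    return argv
```

The result can be passed to `subprocess.run(argv, check=True)` so that a failing setup stops the CI job instead of being silently ignored.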

--generate-ai-files writes .agent-context/gather-step/, the on-demand skills under .claude/skills/ and .agents/skills/, and the .claude/rules/gather-step-index.md pointer only after an index exists, because the reference data is graph-backed. When you intentionally skip indexing, Gather Step still writes CLAUDE.gather.md and AGENTS.gather.md, then prints a warning with the follow-up commands:

gather-step --workspace /path/to/workspace index
gather-step --workspace /path/to/workspace generate claude-md --target rules

After gather-step index and then gather-step generate claude-md complete, the workspace looks like this:

/path/to/workspace/
  gather-step.config.yaml
  .gather-step/
    registry.json              # workspace-level repo metadata and index state
    storage/
      graph.redb               # graph nodes and edges (redb store)
      search/                  # Tantivy full-text and symbol search index
      metadata.sqlite          # file hashes, dependencies, payload contracts,
                               # context pack records, watcher state
  .agent-context/
    gather-step/
      architecture.md          # repo map, cross-repo deps, shared symbols, hotspots
      events.md                # topics/queues/streams, producers, consumers, orphans
      routes.md                # HTTP routes, handlers, callers
      repo-<NAME>.md           # per-repo focus (only when generated with --repo)
  .claude/
    rules/
      gather-step-index.md     # tiny pointer telling Claude Code to invoke the skill
    skills/
      gather-step-context/
        SKILL.md               # on-demand skill body (Claude Code reads on trigger)
  .agents/
    skills/
      gather-step-context/
        SKILL.md               # on-demand skill body (Codex reads on trigger)
  CLAUDE.gather.md             # registry-only summary, imported via the managed
                               # block in CLAUDE.md
  AGENTS.gather.md             # registry-only summary, imported via the managed
                               # block in AGENTS.md

Key properties of this layout:

  • Source repositories are never modified. Indexing writes only to .gather-step/; AI context generation writes only to .agent-context/, .claude/, .agents/, and the two *.gather.md sidecars at the root.
  • All graph-backed CLI and MCP commands read from .gather-step/. If it is empty or absent, commands like status, doctor, trace, and serve have nothing to work from. Run index first.
  • The graph store, search index, and metadata database are updated together. The storage coordinator maintains consistency across all three; partial writes are rolled back on failure.
  • .agent-context/gather-step/ is reference data, not standing instructions. It is loaded on demand by the installed skill and is not pulled into every Claude Code or Codex session, so the ~48 KB architecture file does not consume context window space until a question actually calls for it.
  • Skill and pointer files are skip-if-exists. Re-running gather-step generate claude-md always overwrites the data files in .agent-context/gather-step/ but leaves user edits to skill prose intact.
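Because graph-backed commands depend on .gather-step/ being populated, a script can check for the generated state before calling them. The sketch below is a hypothetical pre-flight helper, not how Gather Step itself decides whether an index exists. It assumes the three stores sit under .gather-step/storage/, as the layout listing suggests, and it only tests for presence, not validity.

```python
from pathlib import Path


def index_state(workspace):
    """Report which generated index artifacts are present on disk."""
    state = Path(workspace) / ".gather-step"
    expected = {
        "registry": state / "registry.json",
        # Assumption: the stores live under storage/, per the layout listing.
        "graph": state / "storage" / "graph.redb",
        "search": state / "storage" / "search",
        "metadata": state / "storage" / "metadata.sqlite",
    }
    return {name: path.exists() for name, path in expected.items()}


def looks_indexed(workspace):
    """True only when every expected artifact exists."""
    return all(index_state(workspace).values())
```

If `looks_indexed` returns False, the safe follow-up is to run gather-step index before any graph-backed command.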

Each repo entry accepts an optional depth field that controls how deeply Gather Step parses the code structure inside that repo:

| Value | What it means |
|---|---|
| level1 | Shallow: file-level and top-level symbol extraction only |
| level2 | Module structure and direct call sites |
| level3 | Cross-file resolution and framework-aware extraction |
| full | Complete extraction including payload inference and semantic linking |

The default is full. Use a shallower depth for very large repos where you only need coarse-grained signal, or to reduce indexing time during initial exploration.

repos:
  - name: backend_standard
    path: repos/backend_standard
    depth: full
  - name: large_monorepo
    path: repos/large_monorepo
    depth: level2
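The defaulting rule for depth can be expressed as a small lookup. This is an illustrative sketch with a hypothetical `resolve_depth` helper; the four values and the `full` default come from this section, while raising an error on any other value is an assumption about how strict validation would treat it.

```python
# Depth values in order from shallowest to deepest.
DEPTHS = ("level1", "level2", "level3", "full")


def resolve_depth(entry):
    """Return the effective depth for one repos[] entry (a dict)."""
    depth = entry.get("depth", "full")  # a missing depth means full
    if depth not in DEPTHS:
        raise ValueError(f"unknown depth {depth!r}; expected one of {DEPTHS}")
    return depth
```

Applied to the config above, `backend_standard` resolves to `full` and `large_monorepo` resolves to `level2`.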

Most CLI commands accept a --repo <name> flag to scope results to one configured repo. This is useful when:

  • the workspace contains many repos and you want focused output
  • a target symbol name is ambiguous across repos
  • you are tuning config for one repo without re-running the full workspace

gather-step --workspace /path/to/workspace status --repo backend_standard
gather-step --workspace /path/to/workspace pack createOrder --mode planning --repo backend_standard

The --repo flag does not affect what is indexed — it only filters CLI output for that command invocation.

Absolute paths in path fields — All repos[].path values must be relative to the config root. An absolute path like /home/user/projects/myrepo is rejected at config load time.

Paths that escape the config root — A path like ../sibling_repo that resolves outside the config root directory is also rejected. Every repo must be physically inside or below the workspace root.

Names with path separators — Repo name values must not contain /, \, or . path components. Use flat identifiers like backend_standard, not services/backend.
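The naming rule can be checked before a config is committed. The sketch below is a hypothetical validator, not the parser's actual logic. It reads ". path components" as the special components "." and "..", and it also rejects empty names; both are assumptions made for illustration.

```python
def validate_repo_name(name):
    """Return an error message for an invalid repo name, or None if it is acceptable."""
    if not name:
        return "name must not be empty"  # assumption, not stated in the guide
    if "/" in name or "\\" in name:
        return f"{name!r} contains a path separator"
    if name in (".", ".."):
        return f"{name!r} is a path component, not an identifier"
    return None
```

A flat identifier such as `backend_standard` passes, while `services/backend` is reported as containing a path separator.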

Unknown YAML keys — The config parser uses strict validation. Unknown field names are rejected with a descriptive error. If you add a field that is not in the schema, indexing will not start until it is removed.

Running graph commands before indexing — Commands like trace, pack, search, doctor, and serve all read from the indexed state in .gather-step/. If the index does not exist yet, run gather-step index first.

The compact command compacts generated storage in place. Use this when the graph and metadata stores have grown after large reindexes or heavy watch-mode churn, but you do not want to delete and rebuild the index.

gather-step --workspace /path/to/workspace compact

This is the safe maintenance command for “compress the generated index”: it keeps the registry and indexed graph available while reclaiming storage pages where possible.

The clean command removes all generated state under .gather-step/. Use this when you want to discard the current index without immediately rebuilding it, for example to free disk space or before handing off a workspace directory.

gather-step --workspace /path/to/workspace clean --yes

The --yes flag is required to skip the interactive confirmation prompt. --json output also requires --yes so that automation cannot hang on an interactive prompt.

The reindex command deletes the current index state and then rebuilds it in one step. It is equivalent to clean --yes followed immediately by index. Use this when:

  • a significant amount of code has changed and incremental indexing produced stale results
  • you want a clean baseline before a benchmark or review session
  • you changed gather-step.config.yaml in a way that affects which repos are tracked

gather-step --workspace /path/to/workspace reindex

For smaller code changes during normal development, prefer gather-step watch (live incremental updates) or gather-step index (manual incremental re-run) over a full reindex.

init --force recreates gather-step.config.yaml from scratch on top of an existing workspace. Use this when the config has drifted (manual edits, merge artifacts) or when you want to rerun repo discovery and overwrite the persisted selection.

gather-step --workspace /path/to/workspace init --force

--force only rewrites the config — it does not touch generated state. Pair it with gather-step reindex when you also want to rebuild the graph against the regenerated config:

gather-step --workspace /path/to/workspace init --force
gather-step --workspace /path/to/workspace reindex

Repos that disappear from the regenerated config are automatically unregistered, and their graph, search, and metadata state is purged on the next index run — so the combined sequence above is a complete “start over” from any workspace state.