Workspace Setup and Configuration
The workspace config is the single file that tells Gather Step what to index, how deeply to index it, and how to handle file-level scoping rules. Getting this right is the prerequisite for accurate cross-repo code graph results.
Workspace Contract
Gather Step requires a workspace root directory that satisfies two conditions:
- It contains a gather-step.config.yaml file at the top level.
- Every repo listed in that config is reachable from the config root through a relative path that stays inside that root.
Gather Step writes all generated state into .gather-step/ inside that same
root. Source repositories are never modified by indexing.
Minimal Config
The minimal working config lists at least one repo:
```yaml
repos:
  - name: backend_standard
    path: repos/backend_standard
```
A typical multi-repo starting point:
```yaml
repos:
  - name: backend_standard
    path: repos/backend_standard
  - name: frontend_standard
    path: repos/frontend_standard
  - name: shared_contracts
    path: repos/shared_contracts
indexing:
  workspace_concurrency: 4
```
name is the stable logical identifier used in CLI output, MCP tool responses,
and --repo scoping flags. path is relative to the directory containing
gather-step.config.yaml.
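To make the path rule concrete, here is a minimal Python sketch of how config-relative paths map to repo directories. It assumes the repos list has already been parsed into dictionaries; resolve_repo_dirs is a hypothetical helper, not part of Gather Step, and it normalizes paths lexically without touching the filesystem.

```python
from __future__ import annotations

import posixpath
from pathlib import PurePosixPath


def resolve_repo_dirs(config_file: str, repos: list[dict]) -> dict[str, str]:
    """Map each logical repo name to its directory (hypothetical helper).

    Each repos[].path is joined onto the directory that contains
    gather-step.config.yaml, then normalized lexically (no filesystem access).
    """
    config_root = PurePosixPath(config_file).parent
    return {
        entry["name"]: posixpath.normpath(str(config_root / entry["path"]))
        for entry in repos
    }


dirs = resolve_repo_dirs(
    "/path/to/workspace/gather-step.config.yaml",
    [{"name": "backend_standard", "path": "repos/backend_standard"}],
)
print(dirs)  # {'backend_standard': '/path/to/workspace/repos/backend_standard'}
```

Because the name is the key, CLI output and --repo flags stay stable even if the directory layout under the workspace root changes.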
Full Config Reference (at a Glance)
The table below covers the fields most commonly tuned in practice. For the complete schema with types and validation rules, see Configuration reference.
| Field | Type | Default / Required |
|---|---|---|
| repos | array | required |
| repos[].name | string | required; no path separators |
| repos[].path | string | required; relative, inside config root |
| repos[].depth | string | full |
| allow_listed_repos | array of strings | none (index all) |
| github | object | optional |
| jira | object | optional |
| indexing.exclude | array of glob strings | none |
| indexing.language_excludes | array of strings | none |
| indexing.include_languages | array of strings | all supported |
| indexing.include_dotfiles | bool | false |
| indexing.min_file_size | integer (bytes) | none |
| indexing.max_file_size | integer (bytes) | none |
| indexing.workspace_concurrency | integer | system default |
The github and jira sections are part of the config schema. The primary
workflows, including local indexing, CLI queries, and MCP, do not require them.
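The indexing.* fields combine into a per-file admission decision. The sketch below is an illustration under stated assumptions, not Gather Step's implementation: it assumes exclude globs are matched against repo-relative POSIX paths with Python's fnmatchcase (where * also crosses /), that a dotfile is any path with a component starting with ".", and that both size bounds are inclusive. Language filters are omitted because they depend on a language-detection table not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class IndexingScope:
    # Field names mirror the indexing.* keys above; the matching rules are assumptions.
    exclude: list[str] = field(default_factory=list)
    include_dotfiles: bool = False
    min_file_size: int | None = None  # bytes; None means no lower bound
    max_file_size: int | None = None  # bytes; None means no upper bound

    def admits(self, rel_path: str, size: int) -> bool:
        """Return True if a repo-relative file would be indexed under this sketch's rules."""
        if not self.include_dotfiles and any(part.startswith(".") for part in rel_path.split("/")):
            return False
        if any(fnmatchcase(rel_path, pattern) for pattern in self.exclude):
            return False
        if self.min_file_size is not None and size < self.min_file_size:
            return False
        if self.max_file_size is not None and size > self.max_file_size:
            return False
        return True


scope = IndexingScope(exclude=["dist/*", "*.min.js"], max_file_size=1_000_000)
print(scope.admits("src/app.ts", 2_048))              # True
print(scope.admits("dist/bundle.js", 512))            # False: matches dist/*
print(scope.admits(".github/workflows/ci.yml", 64))   # False: dotfile path
```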
Bootstrapping with init
When starting from scratch, run init from the workspace root:
```sh
cd /path/to/workspace
gather-step init
```
In an interactive terminal, pressing Enter accepts the default onboarding path: keep the selected repos, index now, generate AI context files, register local MCP settings, and leave watch mode off.
Repo picker
init walks the workspace directory, discovers directories that contain a
.git folder, and opens a checkbox-style repo picker before writing
gather-step.config.yaml. It skips directories it should not traverse:
- .git
- .gather-step
- node_modules
- dist
- target
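A rough Python equivalent of that discovery walk is sketched below. It assumes a repo is any directory with a .git subdirectory, and that the walk keeps descending into a repo after finding it, so nested repositories are also reported; init's actual handling of nested repos, submodules, and .git files (as used by worktrees) is not specified here.

```python
import os

# Directory names the walk never descends into (from the list above).
SKIP_DIRS = {".git", ".gather-step", "node_modules", "dist", "target"}


def discover_repos(workspace_root: str) -> list:
    """Return workspace-relative paths of directories that contain a .git folder."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(workspace_root):
        if ".git" in dirnames:
            found.append(os.path.relpath(dirpath, workspace_root))
        # Pruning dirnames in place stops os.walk from descending into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
    return sorted(found)
```

Pruning node_modules, dist, and target matters for speed: those trees can contain thousands of directories, including vendored packages with their own .git folders.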
The picker is fully keyboard-driven:
```text
↑/↓ move   Space toggle   Enter confirm   a all   n none   q cancel
────────────────────────────────────────────────────────────────────
[✓] backend_api
[✓] frontend_app
[ ] docs_site          ← unchecked: stays out of the config
[✓] shared_contracts
[✓] worker_service
[ ] internal_admin
...
4 of 6 selected
```
Toggle a repo to include or exclude it; press a to select every discovered
repo, n to deselect everything, and Enter to commit. The selected set is
written straight to gather-step.config.yaml:
```yaml
repos:
  - name: backend_api
    path: backend_api
  - name: frontend_app
    path: frontend_app
  - name: shared_contracts
    path: shared_contracts
  - name: worker_service
    path: worker_service
indexing:
  workspace_concurrency: 1
```
Each checked row in the picker becomes one repos[] entry: the checkbox
state controls whether the repo is in the config, the directory name
becomes the logical name, and the path under the workspace root becomes
path. Afterwards, adjust depth for large repos you want to scan
shallowly, add indexing scoping rules per repo, or rename name to
match a canonical service identifier.
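The mapping from checked rows to config entries is mechanical. The sketch below renders it as YAML text; render_config is a hypothetical helper that assumes directory names are plain YAML-safe identifiers (no quoting needed) with unique basenames, and it defaults to the workspace_concurrency: 1 value seen in the generated example.

```python
from __future__ import annotations

import posixpath


def render_config(selected: list[str], workspace_concurrency: int = 1) -> str:
    """Render checked picker rows (workspace-relative POSIX paths) as config YAML."""
    lines = ["repos:"]
    for rel_path in selected:
        # The directory name becomes the logical name; the relative path becomes path.
        lines.append(f"  - name: {posixpath.basename(rel_path)}")
        lines.append(f"    path: {rel_path}")
    lines.append("indexing:")
    lines.append(f"  workspace_concurrency: {workspace_concurrency}")
    return "\n".join(lines) + "\n"


print(render_config(["backend_api", "frontend_app"]), end="")
```

Using the basename also keeps generated names free of path separators, which the config loader rejects.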
If a config already exists, init uses it as the starting point. Repos already
in the config are preselected, repos previously removed from the config stay
unchecked, and repo-specific settings such as name and depth are preserved for
selected repos. Use --force only when you intentionally want a fresh draft
generated from repository discovery.
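One way to model that re-run behavior is the pair of hypothetical helpers below: one computes the initial checkbox state, the other builds the new repos list while keeping settings such as name and depth. It assumes existing entries are matched to discovered directories by their path value; the key init actually uses is not documented.

```python
from __future__ import annotations

import posixpath


def initial_checks(discovered: list[str], existing: list[dict]) -> dict[str, bool]:
    """Preselect discovered repos that already appear in the config."""
    configured = {entry["path"] for entry in existing}
    return {path: path in configured for path in discovered}


def merge_selection(selected: list[str], existing: list[dict]) -> list[dict]:
    """Build repos[] for the selected paths, reusing existing entries verbatim."""
    by_path = {entry["path"]: entry for entry in existing}
    merged = []
    for path in selected:
        # Keep the user's name/depth for known repos; derive defaults for new ones.
        entry = by_path.get(path, {"name": posixpath.basename(path), "path": path})
        merged.append(dict(entry))
    return merged


existing = [{"name": "api", "path": "backend_api", "depth": "level2"}]
print(initial_checks(["backend_api", "docs_site"], existing))
# {'backend_api': True, 'docs_site': False}
```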
Removing a repo later
Re-running gather-step init and unchecking a repo is the supported way
to drop it from the workspace. The next gather-step index notices the
config change, unregisters the repo, and purges its graph, search, and
metadata state, so the workspace stays consistent without any manual
cleanup.
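Conceptually, the next index run reconciles the registry against the config. A minimal sketch using a hypothetical plan_reconcile helper, assuming repos are compared by logical name (the registry's actual key is not documented):

```python
from __future__ import annotations


def plan_reconcile(registered: set[str], configured: set[str]) -> dict[str, list[str]]:
    """Split repo names into newly configured, unchanged, and removed."""
    return {
        "register": sorted(configured - registered),
        "keep": sorted(configured & registered),
        "purge": sorted(registered - configured),  # graph, search, and metadata state dropped
    }


print(plan_reconcile({"backend_api", "docs_site"}, {"backend_api", "worker_service"}))
# {'register': ['worker_service'], 'keep': ['backend_api'], 'purge': ['docs_site']}
```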
For scripts or CI, pass flags explicitly instead of relying on prompts:
```sh
gather-step --workspace /path/to/workspace init \
  --index \
  --generate-ai-files \
  --setup-mcp local \
  --no-watch
```
Combine --no-index, --no-generate-ai-files, and --no-watch to make a scripted
setup return immediately after writing the config.
--generate-ai-files writes .agent-context/gather-step/ (plus the on-demand
skills under .claude/skills/ and .agents/skills/ and the
.claude/rules/gather-step-index.md pointer) only after an index exists, because
the reference data is graph-backed. When you intentionally skip indexing, Gather
Step still writes CLAUDE.gather.md and AGENTS.gather.md, then prints a
warning with the follow-up commands:
```sh
gather-step --workspace /path/to/workspace index
gather-step --workspace /path/to/workspace generate claude-md --target rules
```
Generated State
After gather-step index followed by gather-step generate claude-md completes, the workspace looks like this:
```text
/path/to/workspace/
  gather-step.config.yaml
  .gather-step/
    registry.json          # workspace-level repo metadata and index state
    storage/
      graph.redb           # graph nodes and edges (redb store)
      search/              # Tantivy full-text and symbol search index
      metadata.sqlite      # file hashes, dependencies, payload contracts, context pack records, watcher state
  .agent-context/
    gather-step/
      architecture.md      # repo map, cross-repo deps, shared symbols, hotspots
      events.md            # topics/queues/streams, producers, consumers, orphans
      routes.md            # HTTP routes, handlers, callers
      repo-<NAME>.md       # per-repo focus (only when generated with --repo)
  .claude/
    rules/
      gather-step-index.md # tiny pointer telling Claude Code to invoke the skill
    skills/
      gather-step-context/
        SKILL.md           # on-demand skill body (Claude Code reads on trigger)
  .agents/
    skills/
      gather-step-context/
        SKILL.md           # on-demand skill body (Codex reads on trigger)
  CLAUDE.gather.md         # registry-only summary, imported via the managed block in CLAUDE.md
  AGENTS.gather.md         # registry-only summary, imported via the managed block in AGENTS.md
```
Key properties of this layout:
- Source repositories are never modified. Indexing writes only to .gather-step/; AI context generation writes only to .agent-context/, .claude/, .agents/, and the two *.gather.md sidecars at the root.
- All graph-backed CLI and MCP commands read from .gather-step/. If it is empty or absent, commands like status, doctor, trace, and serve have nothing to work from. Run index first.
- The graph store, search index, and metadata database are updated together. The storage coordinator maintains consistency across all three; partial writes are rolled back on failure.
- .agent-context/gather-step/ is reference data, not standing instructions. The installed skill loads it on demand instead of pulling it into every Claude Code or Codex session, so the ~48 KB architecture file does not consume context window space until a question actually calls for it.
- Skill and pointer files are skip-if-exists. Re-running gather-step generate claude-md always overwrites the data files in .agent-context/gather-step/ but leaves user edits to the skill prose intact.
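The two write policies in the last bullet can be expressed as a single flag. In this hypothetical sketch, data files under .agent-context/gather-step/ would be written with overwrite=True and skill or pointer files with overwrite=False; the helper is illustrative, not Gather Step's code.

```python
from pathlib import Path


def write_generated(path: Path, content: str, *, overwrite: bool) -> bool:
    """Write a generated file; return False when an existing file is left untouched."""
    if path.exists() and not overwrite:
        return False  # skip-if-exists: preserve the user's edited copy
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

Keeping the policy as an explicit keyword argument makes each call site state whether user edits to that file are expected to survive regeneration.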
Depth and Scoping
Depth levels
Each repo entry accepts an optional depth field that controls how deeply
Gather Step parses the code structure inside that repo:
| Value | What it means |
|---|---|
| level1 | Shallow: file-level and top-level symbol extraction only |
| level2 | Module structure and direct call sites |
| level3 | Cross-file resolution and framework-aware extraction |
| full | Complete extraction including payload inference and semantic linking |
The default is full. Use a shallower depth for very large repos where you
only need coarse-grained signal, or to reduce indexing time during initial
exploration.
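Because the levels build on each other, they can be treated as an ordered scale. The sketch below models that with a Python IntEnum; the Depth type and its parse method are hypothetical, and the comments paraphrase the table rather than describe Gather Step's internals.

```python
from __future__ import annotations

from enum import IntEnum


class Depth(IntEnum):
    LEVEL1 = 1  # file-level and top-level symbols
    LEVEL2 = 2  # + module structure and direct call sites
    LEVEL3 = 3  # + cross-file resolution, framework-aware extraction
    FULL = 4    # + payload inference and semantic linking

    @classmethod
    def parse(cls, value: str | None) -> Depth:
        """Parse a repos[].depth value; a missing value falls back to the default, full."""
        if value is None:
            return cls.FULL
        lookup = {member.name.lower(): member for member in cls}
        if value not in lookup:
            raise ValueError(f"unknown depth {value!r}; expected one of {sorted(lookup)}")
        return lookup[value]


print(Depth.parse("level2") >= Depth.LEVEL3)  # False: no cross-file resolution at level2
```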
```yaml
repos:
  - name: backend_standard
    path: repos/backend_standard
    depth: full
  - name: large_monorepo
    path: repos/large_monorepo
    depth: level2
```
Per-command scoping with --repo
Most CLI commands accept a --repo <name> flag to scope results to one
configured repo. This is useful when:
- the workspace contains many repos and you want focused output
- a target symbol name is ambiguous across repos
- you are tuning config for one repo without re-running the full workspace
```sh
gather-step --workspace /path/to/workspace status --repo backend_standard
gather-step --workspace /path/to/workspace pack createOrder --mode planning --repo backend_standard
```
The --repo flag does not affect what is indexed; it only filters CLI output
for that command invocation.
Common Pitfalls
Absolute paths in path fields: all repos[].path values must be relative
to the config root. An absolute path like /home/user/projects/myrepo is
rejected at config load time.
Paths that escape the config root: a path like ../sibling_repo that
resolves outside the config root directory is also rejected. Every repo must be
physically inside or below the workspace root.
Names with path separators: repo name values must not contain / or \ characters
or . path components. Use flat identifiers like backend_standard, not
services/backend.
Unknown YAML keys: the config parser uses strict validation and rejects unknown field names with a descriptive error. If you add a field that is not in the schema, indexing will not start until it is removed.
Running graph commands before indexing: commands like trace, pack,
search, doctor, and serve all read from the indexed state in
.gather-step/. If the index does not exist yet, run gather-step index
first.
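The three path and name rules above can be checked lexically before any indexing starts. The sketch below is an approximation, not Gather Step's validator: it uses POSIX path semantics, does not resolve symlinks (so it checks where a path points textually, not physically), skips the strict unknown-key check because the full schema lives in the Configuration reference, and its error messages are invented for illustration.

```python
from __future__ import annotations

import posixpath


def check_repo_entries(repos: list[dict]) -> list[str]:
    """Return human-readable problems with repos[] entries (hypothetical messages)."""
    problems = []
    for entry in repos:
        name = entry.get("name", "")
        path = entry.get("path", "")
        if not name or not path:
            problems.append(f"{entry!r}: name and path are required")
            continue
        if "/" in name or "\\" in name or name in {".", ".."}:
            problems.append(f"{name}: name must be a flat identifier")
        normalized = posixpath.normpath(path)
        if posixpath.isabs(path):
            problems.append(f"{name}: path must be relative to the config root")
        elif normalized == ".." or normalized.startswith("../"):
            problems.append(f"{name}: path escapes the config root")
    return problems


print(check_repo_entries([
    {"name": "backend_standard", "path": "repos/backend_standard"},  # valid
    {"name": "services/backend", "path": "/home/user/projects/myrepo"},
    {"name": "sibling", "path": "../sibling_repo"},
]))
```

Normalizing before the escape check catches indirect cases such as repos/../../x, which collapses to ../x.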
Clean / Compact / Reindex Cycle
compact
Compacts generated storage in place. Use this when the graph and metadata stores have grown after large reindexes or heavy watch-mode churn, but you do not want to delete and rebuild the index.
```sh
gather-step --workspace /path/to/workspace compact
```
This is the safe maintenance command for compressing the generated index: it keeps the registry and indexed graph available while reclaiming storage pages where possible.
clean
Removes all generated state under .gather-step/. Use this when you want to
discard the current index without immediately rebuilding it — for example, to
free disk space, or before handing off a workspace directory.
```sh
gather-step --workspace /path/to/workspace clean --yes
```
The --yes flag is required to skip the interactive confirmation prompt.
--json output also requires --yes so that automation cannot hang on an
interactive prompt.
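The --yes/--json rule follows a common CLI pattern: never block on a prompt when output is meant for a machine. A minimal sketch of that decision logic follows; confirm_clean, its prompt text, and its error message are hypothetical, not Gather Step's code.

```python
from __future__ import annotations

from typing import Callable


def confirm_clean(yes: bool, json_output: bool, prompt: Callable[[str], str] = input) -> bool:
    """Decide whether clean may proceed without hanging automation."""
    if yes:
        return True  # explicit consent: no prompt at all
    if json_output:
        # Machine-readable mode must never wait for keyboard input.
        raise SystemExit("error: clean --json requires --yes")
    answer = prompt("Delete all generated state under .gather-step/? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Failing fast in JSON mode turns a silent hang in CI into an immediate, explicit error.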
reindex
Deletes the current index state and then rebuilds it in one step. Equivalent
to clean --yes followed immediately by index. Use this when:
- a significant amount of code has changed and incremental indexing produced stale results
- you want a clean baseline before a benchmark or review session
- you changed gather-step.config.yaml in a way that affects which repos are tracked
```sh
gather-step --workspace /path/to/workspace reindex
```
For smaller code changes during normal development, prefer
gather-step watch (live incremental updates) or gather-step index (manual
incremental re-run) over a full reindex.
init --force
Recreate gather-step.config.yaml from scratch on top of an existing
workspace. Use this when the config drifted (manual edits, merge artifacts)
or when you want to rerun repo discovery and overwrite the persisted
selection.
```sh
gather-step --workspace /path/to/workspace init --force
```
--force only rewrites the config; it does not touch generated state.
Pair it with gather-step reindex when you also want to rebuild the
graph against the regenerated config:
```sh
gather-step --workspace /path/to/workspace init --force
gather-step --workspace /path/to/workspace reindex
```
Repos that disappear from the regenerated config are automatically unregistered, and their graph, search, and metadata state is purged on the next index run, so the combined sequence above is a complete "start over" from any workspace state.
Next Steps
- Getting started: run the full quickstart if you have not indexed yet.
- Operator workflows: use the graph once the index is ready.
- Configuration reference: complete field documentation with types and validation rules.