The Enrichment Pipeline
What happens after upload beyond raw indexing, and how enrichment improves the user-facing source surfaces.
Ingestion gets a document into the system. Enrichment makes that document easier to understand, rank, and browse.
This distinction matters. A source can be technically present in MARCUS before a human user can readily interpret it. Enrichment bridges that gap.
Ingestion Versus Enrichment
It helps to separate two ideas:
- Ingestion makes the source searchable.
- Enrichment makes the source understandable and easier to manage.
Without ingestion, the document cannot participate in source review. Without enrichment, the document may still appear in answers, but users have less help interpreting what it is and how much weight it should carry.
Upload To Ready-State Flow
At a high level, the source pipeline looks like this:
- Register the uploaded asset as a source.
- Extract text and metadata from the file.
- Break the content into passages.
- Store search-ready passage records.
- Verify indexed passage integrity.
- Run enrichment passes such as summary, authority scoring, and concept extraction when configured.
The exact timing can vary depending on environment and queue configuration, but this is the practical flow users experience.
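The ordering above can be sketched as a small state model. This is a minimal illustration, not MARCUS code: the `Stage` names, `next_stage`, and `can_answer_questions` are hypothetical, and the rule that a source becomes searchable once passage records are stored is an assumption drawn from the indexing stage described later in this page.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    # Hypothetical names mirroring the six steps listed above.
    REGISTERED = 1
    EXTRACTED = 2
    PASSAGES_PREPARED = 3
    INDEXED = 4
    VERIFIED = 5
    ENRICHED = 6


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows, or None after the final stage."""
    if stage is Stage.ENRICHED:
        return None
    return Stage(stage + 1)


def can_answer_questions(stage: Stage) -> bool:
    """Assumption: a source is usually searchable once it is indexed."""
    return stage >= Stage.INDEXED
```

In this sketch, a source at `Stage.INDEXED` can already answer questions even though `Stage.ENRICHED` has not been reached, which matches the gap between searchability and full briefing content.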
Stage 1: Source Registration
As soon as a file is accepted, MARCUS creates a source record and stores basic information about the upload.
At this stage, the document may show up in the project list even though it is not yet ready for source review. This is normal, although it often confuses first-time users.
Stage 2: Text And Metadata Extraction
MARCUS then tries to pull usable text and metadata from the file.
Document quality matters most at this stage:
- a clean text PDF often works well
- a scanned image or malformed export may extract poorly
- missing metadata does not always break source review, but it does reduce interpretability
If later briefing content looks strange, this extraction stage is often where the problem began.
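One way to reason about extraction quality is a rough sanity check on the extracted text. The function below is only an illustrative heuristic with made-up thresholds; it is not how MARCUS evaluates extraction, and the name `extraction_looks_usable` is hypothetical.

```python
def extraction_looks_usable(
    text: str, min_chars: int = 200, min_word_ratio: float = 0.6
) -> bool:
    """Rough heuristic: enough text, and most tokens contain letters.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False  # too little text, e.g. a scanned page with no text layer
    tokens = stripped.split()
    wordlike = sum(1 for token in tokens if any(ch.isalpha() for ch in token))
    return wordlike / len(tokens) >= min_word_ratio
```

A clean text PDF would typically pass a check like this, while output dominated by symbols or stray digits would not. A failed check is a prompt to inspect the original file, not proof that the file is unusable.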
Stage 3: Passage Preparation
The extracted text is broken into smaller passages.
This is necessary because most questions are answered from part of a document, not from the entire file at once. Passage preparation lets MARCUS find the relevant section rather than simply pointing to a whole PDF and hoping the user can find the right paragraph.
Good passage preparation improves:
- source-review precision
- citation usefulness
- answer specificity
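As a minimal sketch of what passage preparation can look like, the function below groups paragraphs into passages under a character budget. This is an assumption for illustration only: the source does not say how MARCUS splits text, and `split_into_passages` and `max_chars` are hypothetical names.

```python
def split_into_passages(text: str, max_chars: int = 500) -> list[str]:
    """Group blank-line-separated paragraphs into passages of at most max_chars.

    A single paragraph longer than max_chars is kept whole in this sketch.
    """
    passages: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                passages.append(current)
            current = paragraph
    if current:
        passages.append(current)
    return passages
```

Splitting on paragraph boundaries keeps each passage readable on its own, which is what makes a citation to a passage more useful than a citation to a whole file.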
Stage 4: Indexing
Each passage is transformed into a search-ready record and stored. Once this stage succeeds, the source can usually participate in search.
This is the practical meaning of a source becoming indexed or ready.
At this point:
- the source may already be available in conversations
- some enrichment fields may still be missing
That difference explains why a source can answer questions before every field in its briefing panel is visible.
Stage 5: Enrichment Passes
After indexing, MARCUS can run additional analysis to create source-level support material.
Enrichment can populate:
- summary text
- key points
- tags and extracted concepts
- document-type inference
- authority explanation
These are the fields that make the briefing and Library surfaces usable rather than just searchable.
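The enrichment fields listed above can be pictured as an optional layer on top of an indexed source. The `Briefing` class and `pending_fields` helper below are hypothetical illustrations of that idea, not the MARCUS data model.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Briefing:
    # Hypothetical field names based on the list above.
    summary: Optional[str] = None
    key_points: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    document_type: Optional[str] = None
    authority_explanation: Optional[str] = None


def pending_fields(briefing: Briefing) -> list[str]:
    """Return the names of enrichment fields that are still empty."""
    return [f.name for f in fields(briefing) if not getattr(briefing, f.name)]
```

Every field defaults to empty, so a source can exist and be searchable with a `Briefing` that is only partly filled in.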
Why Enrichment Matters So Much For Humans
Two projects can both be searchable, but the one with richer enrichment is easier to audit and maintain because users can inspect source quality faster.
Enrichment helps answer questions like:
- Did this document upload correctly?
- Is this the kind of source I expected?
- Does the summary match the document?
- Does the authority level make sense?
- Is this source likely to help answer the questions I care about?
Without enrichment, users can still search, but they have fewer shortcuts for evaluating library quality.
Why Enrichment May Lag Behind Searchability
Indexing and enrichment are not always completed at exactly the same moment. A source may become available for answers before every enrichment field is visible in the briefing.
This is normal because:
- source review depends on passage availability
- enrichment depends on additional analysis passes
- those later passes may take extra time or run asynchronously
So "I can ask about it" and "I can fully inspect it" may happen in that order rather than simultaneously.
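The ordering described here can be illustrated with a small asynchronous sketch. The function names and the sleep durations are placeholders chosen for the example; they do not reflect how MARCUS schedules its work.

```python
import asyncio


async def index_source(state: dict) -> None:
    await asyncio.sleep(0.01)  # placeholder for passage indexing
    state["indexed"] = True


async def enrich_source(state: dict) -> None:
    await asyncio.sleep(0.05)  # placeholder for slower analysis passes
    state["enriched"] = True


async def process(state: dict) -> dict:
    await index_source(state)
    # Enrichment starts in the background once indexing has finished.
    task = asyncio.create_task(enrich_source(state))
    snapshot = dict(state)  # what a user could observe right after indexing
    await task
    return snapshot


state = {"indexed": False, "enriched": False}
snapshot = asyncio.run(process(state))
```

Here `snapshot` captures the moment when the source is indexed but not yet enriched, while `state` reflects the fully processed source once the background task completes.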
Common User Misunderstandings
| Misunderstanding | Better interpretation |
|---|---|
| "The source is in the list, so it must already be fully usable." | Visible in the list and fully indexed are not always the same thing. |
| "The briefing is incomplete, so the source is broken." | The source may still support answers while some enrichment fields are pending. |
| "The summary looks odd, so the model is bad." | Odd briefings often begin with poor file quality or extraction problems. |
| "If a conversation can use the source, I do not need the briefing." | Briefings still help you judge whether the source is trustworthy and well-classified. |
Operational Implications
If you want a healthy library, enrichment should be part of your review habit:
- upload the source
- wait for indexing
- open the briefing
- check whether the source looks right
- ask a narrow test question
This workflow is far more reliable than uploading many files at once and assuming that the absence of a visible error means everything is fine.
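The "wait for indexing" step can be sketched as a simple polling loop. Everything here is an assumption for illustration: `get_status` stands in for whatever status lookup your environment provides, and the `"indexed"` status string is hypothetical rather than a documented MARCUS value.

```python
import time
from typing import Callable


def wait_until_indexed(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it reports "indexed" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "indexed":
            return True
        if time.monotonic() >= deadline:
            return False  # still not indexed; worth investigating the upload
        sleep(poll_s)
```

Once a check like this returns `True`, the remaining steps (opening the briefing, checking that the source looks right, and asking a narrow test question) are still manual review.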
What Enrichment Cannot Do
Enrichment improves interpretability, but it does not automatically fix:
- bad project boundaries
- missing key sources
- contradictory versions
- poor local governance of the library
It is a quality amplifier, not a substitute for source curation.
One Useful Mental Model
Think of ingestion as "getting the document into the room" and enrichment as "putting a readable label on it, summarizing it, and telling you how much weight it likely deserves." Both matter, but they solve different problems.