The Enrichment Pipeline

What happens after upload beyond raw indexing, and how enrichment improves the user-facing source surfaces.

Ingestion gets a document into the system. Enrichment makes that document easier to understand, rank, and browse.

This distinction matters. A source can be technically present in MARCUS before a human user can readily interpret it. Enrichment helps bridge that gap.

Ingestion Versus Enrichment

It helps to separate two ideas:

Ingestion makes the source searchable.
Enrichment makes the source understandable and easier to manage.

Without ingestion, the document cannot participate in source review. Without enrichment, the document may still appear in answers, but users have less help interpreting what it is and how much weight it should carry.

Upload To Ready-State Flow

At a high level, the source pipeline looks like this:

Register the uploaded asset as a source.
Extract text and metadata from the file.
Break the content into passages.
Store search-ready passage records.
Verify indexed passage integrity.
Run enrichment passes such as summary, authority scoring, and concept extraction when configured.

The exact timing can vary depending on environment and queue configuration, but this is the practical flow users experience.
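The six steps above can be pictured as an ordered set of stages. The following is a minimal sketch only: the `Stage` names and the helper functions are hypothetical and are not taken from MARCUS itself. It assumes, as described under Stage 4, that a source can usually take part in search once its passage records are stored.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    # Hypothetical stage names mirroring the six steps listed above.
    REGISTERED = 1
    EXTRACTED = 2
    PASSAGES_PREPARED = 3
    INDEXED = 4
    VERIFIED = 5
    ENRICHED = 6


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    members = list(Stage)
    position = members.index(current)
    return members[position + 1] if position + 1 < len(members) else None


def is_searchable(current: Stage) -> bool:
    """Assumption: search becomes possible once passage records are stored."""
    return current.value >= Stage.INDEXED.value
```

A source at `Stage.INDEXED` would count as searchable here even though `Stage.ENRICHED` has not been reached, which is the gap the rest of this page describes.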

Stage 1: Source Registration

As soon as a file is accepted, MARCUS creates a source record and stores basic information about the upload.

At this stage, the document may show up in the project list even though it is not yet ready for source review. This is normal, though it often confuses first-time users.

Stage 2: Text And Metadata Extraction

MARCUS then tries to pull usable text and metadata from the file.

This is where document quality becomes very important:

a clean text PDF often works well
a scanned image or malformed export may extract poorly
missing metadata does not always break source review, but it does reduce interpretability

If later briefing content looks strange, this extraction stage is often where the problem began.
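One way to picture an extraction-quality check is a simple heuristic over the extracted text. This is an illustrative sketch, not how MARCUS evaluates extraction: the function name and both thresholds are assumptions chosen for the example.

```python
def looks_poorly_extracted(
    text: str,
    min_chars: int = 200,          # assumed threshold, not a MARCUS setting
    min_alpha_ratio: float = 0.6,  # assumed threshold, not a MARCUS setting
) -> bool:
    """Flag text that is very short or mostly non-alphabetic characters."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True
    # Ignore whitespace so spacing does not skew the ratio.
    visible = [ch for ch in stripped if not ch.isspace()]
    if not visible:
        return True
    alpha = sum(1 for ch in visible if ch.isalpha())
    return alpha / len(visible) < min_alpha_ratio
```

A scanned image with no text layer would typically yield empty or near-empty text and be flagged, while a clean text PDF would usually pass.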

Stage 3: Passage Preparation

The extracted text is broken into smaller passages.

This is necessary because most questions are answered from part of a document, not from the entire file at once. Passage preparation lets MARCUS find the relevant section rather than simply pointing to a whole PDF and hoping the user can find the right paragraph.

Good passage preparation improves:

source-review precision
citation usefulness
answer specificity
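To make passage preparation concrete, here is a minimal sketch that groups paragraphs into passages under a size limit. The splitting rule (blank-line paragraphs, a character budget) and the name `split_into_passages` are assumptions for illustration; the source does not say how MARCUS actually splits text.

```python
from typing import List


def split_into_passages(text: str, max_chars: int = 500) -> List[str]:
    """Group blank-line-separated paragraphs into passages of limited size.

    A single paragraph longer than `max_chars` is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    passages: List[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the passage.
            passages.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        passages.append(current)
    return passages
```

Smaller passages tend to make citations more specific, at the cost of less surrounding context per passage.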

Stage 4: Indexing

Each passage is transformed into a search-ready record and stored. Once this stage succeeds, the source can usually participate in search.

This is the practical meaning of a source becoming indexed or ready.

At this point:

the source may already be available in conversations
some enrichment fields may still be missing

That difference explains why a source can answer questions before every field in its briefing panel is visible.

Stage 5: Enrichment Passes

After indexing, MARCUS can run additional analysis to create source-level support material.

Enrichment can populate:

summary text
key points
tags and extracted concepts
document-type inference
authority explanation

These are the fields that make the briefing and Library surfaces usable rather than just searchable.
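The enrichment fields listed above can be modeled as optional attributes on a source-level record. The class below is a hypothetical sketch: `SourceBriefing` and its field names are illustrative and do not reflect the actual MARCUS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceBriefing:
    """Hypothetical briefing record; field names are illustrative only."""

    indexed: bool = False
    summary: Optional[str] = None
    key_points: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    document_type: Optional[str] = None
    authority_explanation: Optional[str] = None

    def pending_fields(self) -> List[str]:
        """Names of enrichment fields that have not been populated yet."""
        candidates = {
            "summary": self.summary,
            "key_points": self.key_points,
            "tags": self.tags,
            "document_type": self.document_type,
            "authority_explanation": self.authority_explanation,
        }
        return [name for name, value in candidates.items() if not value]
```

In this model, a record with `indexed=True` and a non-empty `pending_fields()` result corresponds to a source that can already answer questions while parts of its briefing are still blank.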

Why Enrichment Matters So Much For Humans

Two projects can both be searchable, but the one with richer enrichment is easier to audit and maintain because users can inspect source quality faster.

Enrichment helps answer questions like:

Did this document upload correctly?
Is this the kind of source I expected?
Does the summary match the document?
Does the authority level make sense?
Is this source likely to help answer the questions I care about?

Without enrichment, users can still search, but they have fewer shortcuts for evaluating library quality.

Why Enrichment May Lag Behind Searchability

Indexing and enrichment are not always completed at exactly the same moment. A source may become available for answers before every enrichment field is visible in the briefing.

This is normal because:

source review depends on passage availability
enrichment depends on additional analysis passes
those later passes may take extra time or run asynchronously

So "I can ask about it" and "I can fully inspect it" may happen in that order rather than simultaneously.
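The ordering described above can be illustrated with a small asynchronous sketch. It assumes enrichment is scheduled as a background task after indexing; the function names and event labels are hypothetical, and MARCUS may schedule its passes differently.

```python
import asyncio
from typing import List


async def enrich(events: List[str]) -> None:
    """Stand-in for the later analysis passes (summary, tags, and so on)."""
    await asyncio.sleep(0)  # yield control, as a real pass would while working
    events.append("enriched")


async def process_source(events: List[str]) -> None:
    """Index first, then let enrichment finish in the background."""
    events.append("indexed")
    task = asyncio.create_task(enrich(events))  # scheduled, not yet run
    events.append("available for answers")
    await task


events: List[str] = []
asyncio.run(process_source(events))
print(events)
```

The recorded order is "indexed", then "available for answers", then "enriched", matching the idea that a source can be asked about before it can be fully inspected.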

Common User Misunderstandings

Misunderstanding	Better interpretation
"The source is in the list, so it must already be fully usable."	Being visible in the list and being fully indexed are not always the same thing.
"The briefing is incomplete, so the source is broken."	The source may still support answers while some enrichment fields are pending.
"The summary looks odd, so the model is bad."	Odd briefings often begin with poor file quality or extraction problems.
"If a conversation can use the source, I do not need the briefing."	Briefings still help you judge whether the source is trustworthy and well-classified.

Operational Implications

If you want a healthy library, enrichment should be part of your review habit:

upload the source
wait for indexing
open the briefing
check whether the source looks right
ask a narrow test question

This is a much more reliable workflow than uploading many files and assuming that the absence of a visible error means everything is fine.
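For teams that want to track this habit per source, the five steps can be kept as a simple ordered checklist. This is a sketch under assumptions: `REVIEW_STEPS` and `next_review_step` are hypothetical helpers for illustration, not part of MARCUS.

```python
from typing import Optional, Set

# The review habit from the list above, in order.
REVIEW_STEPS = (
    "upload the source",
    "wait for indexing",
    "open the briefing",
    "check whether the source looks right",
    "ask a narrow test question",
)


def next_review_step(completed: Set[str]) -> Optional[str]:
    """Return the first step not yet completed, or None when all are done."""
    for step in REVIEW_STEPS:
        if step not in completed:
            return step
    return None
```

For example, calling `next_review_step({"upload the source", "wait for indexing"})` returns "open the briefing".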

What Enrichment Cannot Do

Enrichment improves interpretability, but it does not automatically fix:

bad project boundaries
missing key sources
contradictory versions
poor local governance of the library

It is a quality amplifier, not a substitute for source curation.

One Useful Mental Model

Think of ingestion as "getting the document into the room" and enrichment as "putting a readable label on it, summarizing it, and telling you how much weight it likely deserves." Both matter, but they solve different problems.
