ALDashboard.docx_wrangling

apply_jinja2_highlights

def apply_jinja2_highlights(
document: Union[docx.document.Document, str]
) -> docx.document.Document

Split Jinja tags into separate runs and shade the tag body text.

The inner Jinja code is placed in its own run between the opening and closing delimiters so python-docx-template can still consume the rendered text safely. Nested if and %p if control blocks receive depth-based colors.
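The splitting step described above can be illustrated on a plain string. This is only a sketch, not the library's implementation: the helper name `split_jinja_segments`, the segment labels, and the delimiter set are assumptions, and a real version would create python-docx runs and apply shading rather than return tuples.

```python
import re

# Matches {{ ... }}, {% ... %}, and {# ... #} tags (assumed delimiter set).
TAG_RE = re.compile(r"(\{\{|\{%|\{#)(.*?)(\}\}|%\}|#\})", re.DOTALL)


def split_jinja_segments(text):
    """Split text into (kind, text) pairs: 'plain', 'open', 'body', 'close'."""
    segments = []
    pos = 0
    for match in TAG_RE.finditer(text):
        if match.start() > pos:
            segments.append(("plain", text[pos:match.start()]))
        segments.append(("open", match.group(1)))
        segments.append(("body", match.group(2)))  # the part that would be shaded
        segments.append(("close", match.group(3)))
        pos = match.end()
    if pos < len(text):
        segments.append(("plain", text[pos:]))
    return segments


source = "Dear {{ client.name }}, {%p if minor %}"
segments = split_jinja_segments(source)
```

Because the segments concatenate back to the original string, the rendered text is unchanged; only the run boundaries differ.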

defragment_docx_runs

def defragment_docx_runs(
document: Union[docx.document.Document, str],
paragraph_numbers: Optional[Sequence[int]] = None
) -> Tuple[docx.document.Document, dict]

Merge text-only runs within target paragraphs.

Arguments

  • document - A loaded python-docx document or a path to a DOCX file.
  • paragraph_numbers - Optional paragraph indexes to limit defragmentation.

Returns

Tuple[docx.document.Document, dict]: The updated document and summary stats.
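As a rough illustration of what merging runs means, the sketch below joins neighbouring runs that share the same formatting. It models runs as hypothetical `(text, style)` tuples rather than python-docx objects, and the keys in the stats dictionary are assumptions, not the library's actual summary fields.

```python
def merge_adjacent_runs(runs):
    """Merge neighbouring (text, style) runs that share the same style."""
    merged = []
    for text, style in runs:
        if merged and merged[-1][1] == style:
            # Same formatting as the previous run: append the text to it.
            merged[-1] = (merged[-1][0] + text, style)
        else:
            merged.append((text, style))
    # Hypothetical summary keys, for illustration only.
    stats = {"runs_before": len(runs), "runs_after": len(merged)}
    return merged, stats


runs = [("{{ cli", "bold"), ("ent.na", "bold"), ("me }}", "bold"), (" owes", None)]
merged, stats = merge_adjacent_runs(runs)
```

Word often fragments a single Jinja tag across several runs like this, which is why defragmenting first can make later run-level edits more reliable.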

apply_docx_label_renames

def apply_docx_label_renames(document: docx.document.Document,
renames: Sequence[Dict[str, Any]]) -> int

Apply label find/replace operations across all paragraphs in the DOCX.
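A minimal sketch of the find/replace idea over paragraph strings follows. The dictionary keys `old` and `new` are hypothetical (the real structure of each rename item is not documented here), and the sketch assumes the returned integer counts replacements made.

```python
def apply_renames(paragraphs, renames):
    """Replace each 'old' label with 'new' in every paragraph; return the count."""
    replaced = 0
    for index, text in enumerate(paragraphs):
        for rename in renames:
            old, new = rename["old"], rename["new"]  # hypothetical keys
            hits = text.count(old)
            if hits:
                text = text.replace(old, new)
                replaced += hits
        paragraphs[index] = text
    return replaced


paragraphs = [
    "{{ tenant_name }} rents from {{ landlord }}.",
    "Signed, {{ tenant_name }}",
]
count = apply_renames(paragraphs, [{"old": "tenant_name", "new": "users[0].name"}])
```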

validate_docx_template_syntax

def validate_docx_template_syntax(
document: Union[docx.document.Document, str],
*,
suggestions: Sequence[Any] = (),
renames: Sequence[Dict[str, Any]] = (),
defragment_runs: bool = False) -> Dict[str, Any]

Validate the Jinja syntax of a DOCX after simulated edits are applied.
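To give a sense of the kind of problem such a validator can catch, here is a simplified stand-in that only checks whether `if` and `for` blocks are closed in order. It is an assumption-laden sketch: the function name and error strings are invented, and the actual validator presumably relies on a full Jinja parser rather than a regular expression.

```python
import re

# Captures the first keyword of a {% ... %} tag, allowing docxtpl-style
# prefixes such as {%p ...%} and the whitespace-control dash.
BLOCK_RE = re.compile(r"\{%(?:(?:p|tr|tc|r)\s+)?-?\s*(\w+)")
OPENERS = {"if": "endif", "for": "endfor"}


def find_block_errors(text):
    """Return a list of messages for unclosed or unexpected block tags."""
    stack = []
    errors = []
    for match in BLOCK_RE.finditer(text):
        keyword = match.group(1)
        if keyword in OPENERS:
            stack.append(keyword)
        elif keyword in OPENERS.values():
            if stack and OPENERS[stack[-1]] == keyword:
                stack.pop()
            else:
                errors.append(f"unexpected {keyword}")
    errors.extend(f"unclosed {keyword}" for keyword in stack)
    return errors


ok = find_block_errors("{%p if a %}text{%p endif %}")
unclosed = find_block_errors("{% if a %}text")
stray = find_block_errors("text{% endif %}")
```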

validate_docx_label_suggestions

def validate_docx_label_suggestions(
document: Union[docx.document.Document, str],
suggestions: Sequence[Any]) -> Dict[str, Any]

Run deterministic checks over model suggestions and the simulated output.

Arguments

  • document - A loaded python-docx document or a path to a DOCX file.
  • suggestions - Raw suggestion items returned by labeling helpers or models.

Returns

Dict[str, Any]: Per-suggestion validation results and aggregate counts.
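The sketch below shows what deterministic checks with per-suggestion results and aggregate counts might look like. Everything here is hypothetical: the specific checks, the suggestion tuple layout (borrowed from the `modified_runs` shape used elsewhere in this module), and the result keys are assumptions for illustration.

```python
def check_suggestions(run_grid, suggestions):
    """Flag suggestions with bad coordinates or unbalanced Jinja delimiters."""
    results = []
    for paragraph, run, text, _new_paragraph in suggestions:
        issues = []
        in_range = 0 <= paragraph < len(run_grid) and 0 <= run < len(run_grid[paragraph])
        if not in_range:
            issues.append("coordinates out of range")
        if text.count("{{") != text.count("}}"):
            issues.append("unbalanced {{ }}")
        if text.count("{%") != text.count("%}"):
            issues.append("unbalanced {% %}")
        results.append({"suggestion": (paragraph, run), "issues": issues})
    flagged = sum(1 for item in results if item["issues"])
    return {"results": results, "flagged": flagged, "total": len(results)}


# Each inner list holds the run texts of one paragraph.
run_grid = [["Dear ", "John Smith", ","], ["Sincerely,"]]
suggestions = [
    (0, 1, "{{ client.name }}", 0),
    (0, 5, "{{ x }}", 0),
    (1, 0, "{{ signer", 0),
]
report = check_suggestions(run_grid, suggestions)
```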

review_flagged_docx_label_suggestions

def review_flagged_docx_label_suggestions(
document: Union[docx.document.Document, str],
suggestions: Sequence[Any],
deterministic_validation: Dict[str, Any],
*,
openai_client: Optional[Any] = None,
openai_api: Optional[str] = None,
openai_base_url: Optional[str] = None,
model: str = "gpt-5-mini",
max_output_tokens: Optional[int] = None) -> Dict[str, Any]

Ask an LLM to review only deterministic-validator flagged suggestions.

Arguments

  • document - A loaded python-docx document or a path to a DOCX file.
  • suggestions - Raw suggestion items returned by a model or API.
  • deterministic_validation - Validator output describing flagged suggestions.
  • openai_client - Optional initialized OpenAI client.
  • openai_api - Optional API key override.
  • openai_base_url - Optional OpenAI-compatible base URL override.
  • model - Model name to use for the review step.
  • max_output_tokens - Optional token limit for the review call.

Returns

Dict[str, Any]: Review status, any error text, and normalized review items.

review_docx_label_candidate_groups

def review_docx_label_candidate_groups(
candidate_groups: Sequence[Dict[str, Any]],
*,
openai_client: Optional[Any] = None,
openai_api: Optional[str] = None,
openai_base_url: Optional[str] = None,
model: str = "gpt-5-mini",
max_output_tokens: Optional[int] = None,
prompt_profile: str = DEFAULT_DOCX_PROMPT_PROFILE) -> Dict[str, Any]

Ask an LLM judge to choose the best candidate per ambiguous position.

Arguments

  • candidate_groups - Candidate groups requiring judge review.
  • openai_client - Optional initialized OpenAI client.
  • openai_api - Optional API key override.
  • openai_base_url - Optional OpenAI-compatible base URL override.
  • model - Model name to use for judge review.
  • max_output_tokens - Optional token limit for the judge call.
  • prompt_profile - Prompt profile name used for litigation-specific heuristics.

Returns

Dict[str, Any]: Judge review status and normalized review items.

aggregate_docx_label_suggestion_runs

def aggregate_docx_label_suggestion_runs(
document: Union[docx.document.Document, str],
suggestion_runs: Sequence[Dict[str, Any]],
*,
judge_model: Optional[str] = None,
openai_client: Optional[Any] = None,
openai_api: Optional[str] = None,
openai_base_url: Optional[str] = None,
judge_max_output_tokens: Optional[int] = None,
prompt_profile: str = DEFAULT_DOCX_PROMPT_PROFILE) -> Dict[str, Any]

Combine repeated suggestion runs into one ranked set with alternates.

Arguments

  • document - A loaded python-docx document or a path to a DOCX file.
  • suggestion_runs - Multiple labeling result sets to combine.
  • judge_model - Optional model override for ambiguous-group judging.
  • openai_client - Optional initialized OpenAI client for judge review.
  • openai_api - Optional API key override.
  • openai_base_url - Optional OpenAI-compatible base URL override.
  • judge_max_output_tokens - Optional token limit for judge calls.
  • prompt_profile - Prompt profile name used for ranking heuristics.

Returns

Dict[str, Any]: Aggregated suggestions, vote data, and judge metadata.
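One plausible way to combine repeated runs is a per-position majority vote, with ties marked as ambiguous so a judge can review them. The sketch below assumes each suggestion is a `(paragraph, run, text)` tuple and invents the output keys; it is not the library's actual ranking logic.

```python
from collections import Counter, defaultdict


def aggregate_by_vote(suggestion_runs):
    """Tally suggested text per (paragraph, run) position across runs."""
    votes = defaultdict(Counter)
    for result_set in suggestion_runs:
        for paragraph, run_index, text in result_set:
            votes[(paragraph, run_index)][text] += 1

    aggregated = {}
    for position, counter in votes.items():
        ranked = counter.most_common()
        top_count = ranked[0][1]
        aggregated[position] = {
            "winner": ranked[0][0],
            "alternates": [text for text, _ in ranked[1:]],
            "votes": dict(counter),
            # A tie for first place would need a judge (hypothetical flag).
            "ambiguous": sum(1 for _, n in ranked if n == top_count) > 1,
        }
    return aggregated


suggestion_runs = [
    [(0, 1, "{{ client.name }}"), (2, 0, "{{ court_name }}")],
    [(0, 1, "{{ client.name }}"), (2, 0, "{{ court }}")],
    [(0, 1, "{{ users[0].name }}")],
]
aggregated = aggregate_by_vote(suggestion_runs)
```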

get_voted_docx_label_suggestions

def get_voted_docx_label_suggestions(
docx_path: str,
custom_people_names: Optional[List[Tuple[str, str]]] = None,
preferred_variable_names: Optional[Sequence[str]] = None,
openai_client: Optional[Any] = None,
openai_api: Optional[str] = None,
openai_base_url: Optional[str] = None,
model: str = "gpt-5-mini",
generator_models: Optional[Sequence[str]] = None,
judge_model: Optional[str] = None,
prompt_profile: str = DEFAULT_DOCX_PROMPT_PROFILE,
prompt_library_path: Optional[str] = None,
optional_context: Optional[str] = None,
custom_prompt: Optional[str] = None,
additional_instructions: Optional[str] = None,
max_output_tokens: Optional[int] = None,
judge_max_output_tokens: Optional[int] = None,
defragment_runs: bool = False) -> Dict[str, Any]

Run repeated generations and aggregate them into one ranked suggestion set.

Arguments

  • docx_path - Path to the DOCX file to label.
  • custom_people_names - Optional list of custom people-variable descriptions.
  • preferred_variable_names - Optional preferred variable names to bias prompts.
  • openai_client - Optional initialized OpenAI client.
  • openai_api - Optional API key override.
  • openai_base_url - Optional OpenAI-compatible base URL override.
  • model - Default generator model to use.
  • generator_models - Optional explicit sequence of generator models.
  • judge_model - Optional model override for ambiguous-group judging.
  • prompt_profile - Prompt profile name used for prompt/ranking heuristics.
  • prompt_library_path - Optional prompt library override path.
  • optional_context - Optional extra source context for the prompt.
  • custom_prompt - Optional full prompt override.
  • additional_instructions - Optional prompt suffix for extra guidance.
  • max_output_tokens - Optional token limit for generation calls.
  • judge_max_output_tokens - Optional token limit for judge calls.
  • defragment_runs - Whether to merge safe split runs before labeling.

Returns

Dict[str, Any]: Aggregated suggestions and generation metadata.

add_paragraph_after

def add_paragraph_after(paragraph: Any, text: str) -> None

Insert a new paragraph after an existing paragraph.

Arguments

  • paragraph - Existing paragraph that should receive a successor.
  • text - Text content for the inserted paragraph.

add_paragraph_before

def add_paragraph_before(paragraph: Any, text: str) -> None

Insert a new paragraph before an existing paragraph.

Arguments

  • paragraph - Existing paragraph that should receive a predecessor.
  • text - Text content for the inserted paragraph.

get_docx_run_text

def get_docx_run_text(document: Union[docx.document.Document, str],
paragraph_number: int, run_number: int) -> str

Get run text by unified paragraph index across body, tables, headers, and footers.

Arguments

  • document - A loaded python-docx document or a path to a DOCX file.
  • paragraph_number - Unified paragraph index in the flattened traversal.
  • run_number - Run index within the selected paragraph.

Returns

  • str - The run text, or an empty string when the coordinates are invalid.
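The unified index can be pictured as a lookup into one flattened list of paragraphs. In this sketch the document parts are modelled as nested lists of run strings, and the traversal order (body, tables, headers, footers) is taken from the description above but remains an assumption about the actual implementation.

```python
def get_run_text(parts, paragraph_number, run_number):
    """Look up a run by flattened paragraph index; return "" if out of range."""
    flattened = [paragraph for part in parts for paragraph in part]
    if not 0 <= paragraph_number < len(flattened):
        return ""
    runs = flattened[paragraph_number]
    if not 0 <= run_number < len(runs):
        return ""
    return runs[run_number]


parts = [
    [["Dear ", "John"], ["Body text"]],  # body paragraphs
    [["Cell A"]],                        # table paragraphs
    [["Header"]],                        # header paragraphs
    [["Page ", "1"]],                    # footer paragraphs
]
first = get_run_text(parts, 0, 1)
cell = get_run_text(parts, 2, 0)
missing = get_run_text(parts, 9, 0)
```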

get_docx_run_items

def get_docx_run_items(document: Union[docx.document.Document, str],
defragment_runs: bool = False) -> List[List[Any]]

Return [paragraph_index, run_index, run_text] across all document parts.

Arguments

  • document - A loaded python-docx document or a path to a DOCX file.
  • defragment_runs - Whether to merge safe split runs before traversal.

Returns

  • List[List[Any]] - Flattened run coordinates and text for the document.
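The shape of the returned list can be sketched as follows, assuming paragraphs are modelled as lists of run strings rather than python-docx objects. Paragraphs with no runs simply contribute no items in this sketch; the real function may behave differently.

```python
def run_items(paragraphs):
    """Flatten paragraphs into [paragraph_index, run_index, run_text] items."""
    return [
        [p_index, r_index, text]
        for p_index, runs in enumerate(paragraphs)
        for r_index, text in enumerate(runs)
    ]


items = run_items([["Dear ", "John"], [], ["Sincerely"]])
```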

update_docx

def update_docx(
document: Union[docx.document.Document, str],
modified_runs: List[Tuple[int, int, str, int]],
defragment_runs: bool = False,
apply_jinja_highlights: bool = False) -> docx.document.Document

Update the document with modified runs.

Arguments

  • document - The python-docx document object, or the path to the DOCX file.
  • modified_runs - Tuples of paragraph number, run number, modified text, and paragraph insertion indicator.
  • defragment_runs - Whether to merge safe split runs before applying edits.
  • apply_jinja_highlights - Whether to split and shade all Jinja tags after edits are applied.

Returns

  • docx.document.Document - The modified document.
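The sketch below shows one way `modified_runs` tuples could be applied to a list-of-lists model of the document. The meaning of the fourth element is an assumption here: 0 replaces the run text in place, 1 inserts a new paragraph after the target paragraph, and -1 inserts one before it. Check the library source before relying on those values.

```python
def apply_modified_runs(paragraphs, modified_runs):
    """Apply (paragraph, run, text, new_paragraph) edits to nested run lists."""
    updated = [list(runs) for runs in paragraphs]
    before, after = {}, {}
    for paragraph, run, text, new_paragraph in modified_runs:
        if new_paragraph == 1:       # assumed: insert paragraph after
            after.setdefault(paragraph, []).append(text)
        elif new_paragraph == -1:    # assumed: insert paragraph before
            before.setdefault(paragraph, []).append(text)
        else:                        # assumed: replace run text in place
            updated[paragraph][run] = text

    # Insert new paragraphs in a second pass so original indexes stay valid.
    result = []
    for index, runs in enumerate(updated):
        result.extend([text] for text in before.get(index, []))
        result.append(runs)
        result.extend([text] for text in after.get(index, []))
    return result


paragraphs = [["Dear ", "John Smith", ","], ["You owe rent."]]
modified_runs = [
    (0, 1, "{{ client.name }}", 0),
    (1, 0, "{%p if owes_rent %}", -1),
    (1, 0, "{%p endif %}", 1),
]
result = apply_modified_runs(paragraphs, modified_runs)
```

Collecting insertions first and applying them in a second pass avoids shifting paragraph indexes while edits that refer to the original numbering are still pending.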

get_labeled_docx_runs

def get_labeled_docx_runs(
docx_path: str,
custom_people_names: Optional[List[Tuple[str, str]]] = None,
preferred_variable_names: Optional[Sequence[str]] = None,
openai_client: Optional[Any] = None,
openai_api: Optional[str] = None,
openai_base_url: Optional[str] = None,
model: str = "gpt-5-mini",
prompt_profile: str = DEFAULT_DOCX_PROMPT_PROFILE,
prompt_library_path: Optional[str] = None,
optional_context: Optional[str] = None,
custom_prompt: Optional[str] = None,
additional_instructions: Optional[str] = None,
max_output_tokens: Optional[int] = None,
defragment_runs: bool = False) -> List[Tuple[int, int, str, int]]

Scan the DOCX and return a list of suggested run replacements whose text has Jinja2 variable names inserted.

Arguments

  • docx_path - Path to the DOCX file.
  • custom_people_names - Optional list of custom (name, description) pairs.
  • preferred_variable_names - Optional preferred variable names to bias prompts.
  • openai_client - Optional preconfigured OpenAI client.
  • openai_api - Optional API key override.
  • openai_base_url - Optional OpenAI-compatible base URL override.
  • model - OpenAI model to use.
  • prompt_profile - Prompt profile name used to select prompt text and rules.
  • prompt_library_path - Optional prompt library override path.
  • optional_context - Optional extra source context included in the prompt.
  • custom_prompt - Optional full prompt override.
  • additional_instructions - Optional extra instructions appended to the prompt.
  • max_output_tokens - Optional token limit passed to the chat completion helper.
  • defragment_runs - Whether to merge safe split runs before labeling.

Returns

List[Tuple[int, int, str, int]]: Suggested DOCX run replacements.

modify_docx_with_openai_guesses

def modify_docx_with_openai_guesses(docx_path: str) -> docx.document.Document

Use OpenAI to guess the variable names for a document, then modify the document with those guesses.

Arguments

  • docx_path - Path to the DOCX file to modify.

Returns

  • docx.document.Document - The modified document, ready to be saved to the same path or a new one.