How many concurrent submissions should the semaphore allow?

Size the semaphore from the funder's documented per-key limit, not your CPU count. Submission is I/O-bound, so 8 to 16 concurrent calls saturate most portals without tripping HTTP 429. The semaphore width, not the worker count, is the real governor of load.

How do I keep large PDF attachments from exhausting memory during a deadline burst?

Never buffer binaries into the envelope. Carry attachments as URI references and stream them in fixed 8192-byte chunks, checking process RSS before each attachment is streamed and collecting garbage when usage crosses about 75 percent of the memory ceiling. That keeps steady-state queue memory flat regardless of file size, well inside a 512 MB container.

What happens when a submission fails its compliance check?

The compliance router assigns a CRITICAL severity and returns a REJECT action before any external API call, so a non-compliant package never reaches the funder. The decision is logged with an immutable timestamp for 2 CFR §200.334 retention and routed to Error Categorization & Logging rather than patched in place.

Building Async Batch Processors for Grant Submissions

Build an asyncio batch processor for nonprofit grant submissions: bounded queues, semaphore concurrency, a memory guard, strict Pydantic validation, compliance routing, and 2 CFR §200.302 audit logging.

This guide is part of the Async Batch Processing Pipelines section within the broader Data Ingestion & Grant Parsing Workflows framework, and it solves one narrow problem: how do you push hundreds of already-normalized grant packages to external funder portals concurrently, without exhausting memory, breaching a portal’s rate quota, or losing the audit trail a federal grant requires?

A synchronous upload loop falls over the moment a funder deadline lands: it serializes I/O-bound submissions, buffers attachment binaries into RAM, and offers no place to record why a submission paused or failed. The async batch processor replaces that loop with a bounded queue, a semaphore-gated worker pool, a strict validation gate, and a structured audit log — so every payload traverses a traceable path and a saturated portal degrades into deterministic backpressure rather than silent data loss.

When to Use This Approach

Build a dedicated async batch processor when all three of the following hold:

1. Submission volume exceeds manual capacity in a bounded window. Most of the year is idle; then a deadline produces 200–500 submissions in an hour. Size the pipeline for that burst, not the annual average.
2. Payloads are already normalized on the way in. This stage assumes a clean envelope. Canonical field translation belongs to Field Mapping & Normalization; spreadsheet budget reconciliation belongs to Excel Budget Template Sync; document binaries are extracted by PDF Grant Application Parsing. If those upstream stages have not run, do not build here; fix the input first.
3. The output drives a regulated artifact. Because each submission ultimately reconciles against 2 CFR §200.302 financial-management records and IRS Form 990 Schedule I grant reporting, a dropped or duplicated submission is a compliance event, not a transient glitch you can silently retry away.

Endpoint discovery, credential rotation, and quota negotiation are explicitly out of scope; those belong to API Polling & Rate Limiting. The batch layer assumes a reachable endpoint and owns only orchestration, concurrency, validation, and audit emission.

Step-by-Step Implementation

The reference implementation targets Python 3.11+ and uses asyncio for orchestration, pydantic v2 for the schema gate, aiohttp for transport, aiofiles for non-blocking attachment reads, tenacity for backoff, and psutil for the memory guard. Install pinned versions first:

bash

pip install "pydantic==2.7.1" "aiohttp==3.9.5" "aiofiles==23.2.1" "tenacity==8.3.0" "psutil==5.9.8"

Step 1: Model the work envelope and the bounded queue

Each submission is wrapped in an immutable envelope carrying a correlation ID and a UTC ingest timestamp, then enqueued behind an explicit maxsize. The bound is what converts an overload into backpressure instead of an out-of-memory crash: when producers outpace workers, put() blocks rather than letting the queue grow without limit.

python

import asyncio
import logging
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List

from pydantic import BaseModel, ConfigDict, Field

logger = logging.getLogger("grant.async_batch")


class GrantSubmissionPayload(BaseModel):
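    # frozen=True blocks field reassignment, matching the "immutable envelope" described above
    model_config = ConfigDict(strict=True, extra="forbid", frozen=True)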

    grant_id: str = Field(..., min_length=1)
    funder_portal: str
    submission_deadline: datetime
    metadata: Dict[str, Any]
    attachments: List[str] = Field(default_factory=list)  # URIs, never binaries
    correlation_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    ingested_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))


class AsyncBatchController:
    def __init__(self, max_concurrent: int = 12, queue_depth: int = 500) -> None:
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.queue: asyncio.Queue[GrantSubmissionPayload] = asyncio.Queue(maxsize=queue_depth)
        self.backpressure_at = queue_depth * 0.85

    async def enqueue(self, payload: GrantSubmissionPayload) -> str:
        if self.queue.qsize() >= self.backpressure_at:
            logger.warning(
                "QUEUE_BACKPRESSURE",
                extra={"correlation_id": payload.correlation_id, "depth": self.queue.qsize()},
            )
        await self.queue.put(payload)  # blocks at maxsize — never silently drops
        logger.info("ENQUEUED", extra={"correlation_id": payload.correlation_id})
        return payload.correlation_id

max_concurrent is the real governor: set it from the funder’s documented per-key limit (typically 8–16 for I/O-bound submission), not your CPU count. A semaphore caps calls in flight rather than calls per second, so a portal that publishes a per-second quota still needs the pacing covered in API Polling & Rate Limiting. queue_depth should approximate a single deadline cohort so that the 85% warning fires while the event loop still has time to drain before producers block. extra="forbid" and strict=True guarantee that an unmapped alias leaking from upstream surfaces as a hard error rather than slipping through.
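The backpressure behaviour can be observed in isolation with nothing but the standard library. The sketch below is illustrative rather than part of the reference implementation: `demo_backpressure` is a hypothetical helper, and it uses plain integers in place of `GrantSubmissionPayload` so it runs without pydantic.

```python
import asyncio


async def demo_backpressure(depth: int = 3) -> dict:
    """Fill a bounded queue, then show that the next put() waits for a consumer."""
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=depth)
    for i in range(depth):
        await queue.put(i)  # fills the queue to maxsize without blocking

    # The queue is full, so this put() cannot complete and the timeout fires.
    blocked = False
    try:
        await asyncio.wait_for(queue.put(depth), timeout=0.05)
    except asyncio.TimeoutError:
        blocked = True

    # Once a consumer frees a slot, the same put() goes through.
    first = await queue.get()
    queue.task_done()
    await asyncio.wait_for(queue.put(depth), timeout=0.05)

    return {"blocked_when_full": blocked, "first_out": first, "depth_after": queue.qsize()}
```

Running `asyncio.run(demo_backpressure())` shows the producer stalling at `maxsize` and resuming after a single `get()`. Nothing is discarded at any point, which is the property the bounded queue is there to provide.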

Step 2: Gate concurrency and guard memory

Each worker runs inside the semaphore so concurrent funder calls never exceed the quota. Attachments are streamed by URI reference in fixed-size chunks; under memory pressure the guard collects garbage and yields rather than buffering a 40 MB PDF into the envelope.

python

import gc
from pathlib import Path
from typing import AsyncGenerator

import aiofiles
import psutil


class MemoryGuard:
    def __init__(self, rss_threshold_pct: float = 0.75) -> None:
        self.rss_threshold = rss_threshold_pct
        self.process = psutil.Process()

    def under_pressure(self) -> bool:
        rss = self.process.memory_info().rss
        return (rss / psutil.virtual_memory().total) > self.rss_threshold

    async def stream_attachment(self, path: Path) -> AsyncGenerator[bytes, None]:
        if self.under_pressure():
            logger.warning("MEMORY_PRESSURE", extra={"action": "gc_and_yield", "file": str(path)})
            gc.collect()
            await asyncio.sleep(0)  # yield to the event loop
        async with aiofiles.open(path, mode="rb") as fh:
            while chunk := await fh.read(8192):
                yield chunk

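The 8192-byte read keeps per-attachment memory flat regardless of file size. The guard checks RSS once, before the file is opened, and responds with a garbage collection and a yield to the event loop; it relieves pressure but does not enforce a hard cap. Note also that psutil.virtual_memory().total usually reflects host memory rather than a container's cgroup limit, so for a 512 MB container the 0.75 threshold should be compared against the container limit, not the host total. Never call blocking I/O inside a worker coroutine: one synchronous open() would serialize the entire batch.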
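If the processor runs in a container, one way to obtain a meaningful ceiling is to read the cgroup memory limit and fall back to host memory when none is set. The following is a sketch under stated assumptions: the two file paths are the conventional cgroup v2 and v1 locations on Linux and may differ on your runtime, and `container_memory_limit` and `rss_fraction` are hypothetical helpers, not part of psutil or of the `MemoryGuard` above.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed locations: cgroup v2 first, then cgroup v1. Adjust for your runtime.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit(candidates: Iterable[Path] = CGROUP_LIMIT_FILES) -> Optional[int]:
    """Return the first numeric cgroup limit in bytes, or None if absent or unlimited."""
    for path in candidates:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next candidate
        if raw.isdigit():
            return int(raw)
        # cgroup v2 writes the literal "max" when no limit is set; keep looking
    return None


def rss_fraction(rss_bytes: int, limit_bytes: Optional[int], host_total_bytes: int) -> float:
    """Fraction of the effective ceiling in use: the smaller of container limit and host memory."""
    ceiling = min(limit_bytes, host_total_bytes) if limit_bytes else host_total_bytes
    return rss_bytes / ceiling
```

A guard built this way would pass `psutil.Process().memory_info().rss` as `rss_bytes` and `psutil.virtual_memory().total` as `host_total_bytes`, then compare the result against the 0.75 threshold. Taking the minimum matters because cgroup v1 reports a very large number, not the word "max", when no limit is configured.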

Step 3: Validate the schema and route on compliance

Before any payload reaches a portal it passes a strict validation gate, then a deterministic router that maps funder-specific field rules to a REJECT/FORWARD decision. Critical flags reject the submission before an external call is made, which is what keeps a non-compliant package from ever touching the funder.

python

from enum import Enum
from pydantic import ValidationError


class ComplianceSeverity(str, Enum):
    CRITICAL = "CRITICAL"
    INFO = "INFO"


class ComplianceRouter:
    def __init__(self, funder_rules: Dict[str, Dict[str, Any]]) -> None:
        self.funder_rules = funder_rules

    def route(self, payload: GrantSubmissionPayload) -> Dict[str, Any]:
        rules = self.funder_rules.get(payload.funder_portal, {})
        flags: List[str] = []
        for field_name, constraint in rules.items():
            value = payload.metadata.get(field_name)
            if value is None:
                flags.append(f"MISSING_REQUIRED_FIELD:{field_name}")
            elif len(str(value)) > constraint.get("max_length", 1000):
                flags.append(f"TRUNCATION_RISK:{field_name}")

        severity = ComplianceSeverity.CRITICAL if flags else ComplianceSeverity.INFO
        decision = {
            "correlation_id": payload.correlation_id,
            "severity": severity.value,
            "flags": flags,
            "action": "REJECT" if flags else "FORWARD",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        logger.info("COMPLIANCE_DECISION", extra=decision)
        return decision


async def process_one(payload: GrantSubmissionPayload, ctrl: AsyncBatchController,
                      router: ComplianceRouter) -> Dict[str, Any]:
    async with ctrl.semaphore:
        try:
            GrantSubmissionPayload.model_validate(payload.model_dump())
        except ValidationError as exc:
            logger.error("VALIDATION_FAILURE",
                         extra={"correlation_id": payload.correlation_id, "errors": exc.errors()})
            return {"status": "INVALID", "correlation_id": payload.correlation_id}

        decision = router.route(payload)
        if decision["action"] == "REJECT":
            return {"status": "REJECTED", "flags": decision["flags"],
                    "correlation_id": payload.correlation_id}
        return {"status": "READY", "correlation_id": payload.correlation_id}

The router logs every decision with an immutable timestamp, which is the record 2 CFR §200.334 expects when an auditor reconstructs why a submission was held. Rejections route to Error Categorization & Logging rather than being patched in place.
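To make the shape of `funder_rules` concrete, here is a standalone sketch of a rule table and the two checks the router applies to it. The portal name, field names, and limits are invented for illustration, and `compliance_flags` is a hypothetical function that mirrors the loop in `ComplianceRouter.route` without the pydantic dependency.

```python
from typing import Any, Dict, List

# Hypothetical rule table: portal name -> field -> constraint. Values are illustrative only.
FUNDER_RULES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "example_portal": {
        "project_title": {"max_length": 120},
        "ein": {"max_length": 10},
    }
}


def compliance_flags(metadata: Dict[str, Any], rules: Dict[str, Dict[str, Any]]) -> List[str]:
    """Flag a field that is absent, or whose string form exceeds the portal's length limit."""
    flags: List[str] = []
    for field_name, constraint in rules.items():
        value = metadata.get(field_name)
        if value is None:
            flags.append(f"MISSING_REQUIRED_FIELD:{field_name}")
        elif len(str(value)) > constraint.get("max_length", 1000):
            flags.append(f"TRUNCATION_RISK:{field_name}")
    return flags


rules = FUNDER_RULES["example_portal"]
clean = compliance_flags({"project_title": "After-school literacy", "ein": "12-3456789"}, rules)
held = compliance_flags({"project_title": "x" * 200}, rules)
```

Here `clean` is empty, so the package would be forwarded, while `held` carries one truncation flag and one missing-field flag and would be rejected. One design point to decide deliberately: in the router above, a portal that is absent from the table yields an empty rule set and therefore a FORWARD decision, so you may prefer to reject unknown portals outright.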

Step 4: Submit with capped backoff and categorize failures

Validated, forwarded payloads reach the funder through a tenacity-governed client. Backoff is capped so a single slow portal cannot stall the cohort, and exhausted retries hand off to a typed error category rather than vanishing.

python

import aiohttp
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential


class ErrorCategory(str, Enum):
    SCHEMA_VIOLATION = "SCHEMA_VIOLATION"
    API_RATE_LIMIT = "API_RATE_LIMIT"
    NETWORK_TIMEOUT = "NETWORK_TIMEOUT"
    COMPLIANCE_BLOCK = "COMPLIANCE_BLOCK"


def is_transient(exc: BaseException) -> bool:
    """Retry timeouts, 429s, and 5xx; never retry other 4xx such as 401/403."""
    if isinstance(exc, asyncio.TimeoutError):
        return True
    if isinstance(exc, aiohttp.ClientResponseError):
        return exc.status == 429 or exc.status >= 500
    return False


class FunderAPIClient:
    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.headers = {"Authorization": f"Bearer {api_key}",
                        "Content-Type": "application/json"}

    @retry(
        stop=stop_after_attempt(5),
        wait=wait_exponential(multiplier=1, min=2, max=30),
        retry=retry_if_exception(is_transient),  # auth and other client errors fail fast
        reraise=True,
    )
    async def submit(self, payload: GrantSubmissionPayload) -> Dict[str, Any]:
        # One session per call keeps the example short; share a session in production.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.base_url}/v2/submissions",
                headers=self.headers,
                json=payload.model_dump(mode="json"),
                timeout=aiohttp.ClientTimeout(total=45),
            ) as response:
                response.raise_for_status()  # raises ClientResponseError on 4xx/5xx
                result = await response.json()
                logger.info("SUBMISSION_SUCCESS",
                            extra={"correlation_id": payload.correlation_id,
                                   "funder_status": result.get("status")})
                return result


def categorize(exc: Exception) -> ErrorCategory:
    """Map a terminal exception to a typed category; never swallow it."""
    if isinstance(exc, asyncio.TimeoutError):
        return ErrorCategory.NETWORK_TIMEOUT
    if isinstance(exc, aiohttp.ClientResponseError) and exc.status == 429:
        return ErrorCategory.API_RATE_LIMIT
    if isinstance(exc, ValidationError):
        return ErrorCategory.SCHEMA_VIOLATION
    # Fail closed: anything unrecognized is held as a compliance block for review.
    return ErrorCategory.COMPLIANCE_BLOCK

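With multiplier=1, min=2, and max=30, and assuming tenacity 8.x computes each wait as multiplier × 2^(attempt − 1) clamped to [min, max], the five attempts are separated by waits of about 2, 2, 4, and 8 seconds, roughly 16 seconds of backoff in total; the 30-second cap only comes into play if you raise the attempt count. The 45-second request timeout dominates the worst case: five timed-out attempts plus backoff approach four minutes before terminal routing. reraise=True ensures the original ClientResponseError propagates after the final attempt instead of a RetryError masking it. A terminal API_RATE_LIMIT should pause the tenant, not loop forever; pair this client with the cross-domain Pipeline Fallback & Retry Logic policy so an exhausted cohort lands in a dead-letter queue.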
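The steps so far stop at a READY status and leave open how the queue, the semaphore, and the submit call meet. One possible wiring is sketched below using only the standard library. `worker`, `run_batch`, and `fake_submit` are hypothetical names, plain string IDs stand in for the payload model, and `fake_submit` stands in for `FunderAPIClient.submit` so the example runs without a network.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

SubmitFn = Callable[[str], Awaitable[Dict[str, str]]]


async def fake_submit(grant_id: str) -> Dict[str, str]:
    """Stand-in for the real client; fails one ID to exercise the error path."""
    await asyncio.sleep(0.01)
    if grant_id == "G-3":
        raise asyncio.TimeoutError()
    return {"status": "ACCEPTED"}


async def worker(queue: "asyncio.Queue[str]", semaphore: asyncio.Semaphore,
                 submit: SubmitFn, results: List[Dict[str, str]]) -> None:
    """Drain the queue until cancelled; the semaphore bounds in-flight submit() calls."""
    while True:
        grant_id = await queue.get()
        try:
            async with semaphore:  # held for the duration of the external call
                outcome = await submit(grant_id)
            results.append({"grant_id": grant_id, "status": outcome["status"]})
        except Exception as exc:  # terminal failure: record it, never drop it
            results.append({"grant_id": grant_id, "status": "FAILED",
                            "error": type(exc).__name__})
        finally:
            queue.task_done()


async def run_batch(grant_ids: List[str], submit: SubmitFn,
                    max_concurrent: int = 2, n_workers: int = 4) -> List[Dict[str, str]]:
    """Push every ID through the worker pool and return one result per ID."""
    queue: asyncio.Queue[str] = asyncio.Queue(maxsize=len(grant_ids) or 1)
    semaphore = asyncio.Semaphore(max_concurrent)
    results: List[Dict[str, str]] = []
    workers = [asyncio.create_task(worker(queue, semaphore, submit, results))
               for _ in range(n_workers)]
    for grant_id in grant_ids:
        await queue.put(grant_id)
    await queue.join()  # returns once every item has been marked task_done()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run_batch(["G-1", "G-2", "G-3"], fake_submit))` yields one result per ID, with the failing ID recorded as FAILED rather than lost. The sketch starts more workers than semaphore slots on purpose, to show that the semaphore width, not the worker count, limits concurrent calls. In a real pipeline the `except` branch is where `categorize` and the dead-letter handoff would sit.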

Verification

Confirm the processor behaves deterministically with four checks:

1. A full queue applies backpressure, never a drop. Enqueue queue_depth + 1 payloads against a paused worker and assert the final put() blocks (and a QUEUE_BACKPRESSURE line is emitted) rather than raising or discarding the payload.
2. A non-compliant payload is rejected before any HTTP call. Feed process_one a payload whose metadata violates a funder rule and assert it returns a result with status REJECTED, emits a COMPLIANCE_DECISION, and that FunderAPIClient.submit is never awaited.
3. Retries exhaust into a terminal raise. Drive five consecutive 429 responses and assert submit raises aiohttp.ClientResponseError (because reraise=True) and that categorize maps it to ErrorCategory.API_RATE_LIMIT.
4. Memory stays flat under load. Stream a 50 MB attachment and assert process RSS does not grow by more than a few MB, which shows that the 8192-byte chunking, not buffering, is in effect.

Every rejection and exhausted retry must leave a structured log line; a compliant pipeline never fails a submission silently. Ship those logs to a write-once tier so the trail satisfies the three-year retention period under 2 CFR §200.334.

python

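async def test_rejection_and_categorization() -> None:  # fixtures come from the test setup
    result = await process_one(invalid_payload, controller, router)
    assert result["status"] == "REJECTED"
    rate_limited = aiohttp.ClientResponseError(None, (), status=429)  # categorize() reads only status
    assert categorize(rate_limited) is ErrorCategory.API_RATE_LIMIT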
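The requirement that every rejection leaves a structured log line depends on the handler actually writing out the fields passed through `extra=`, which the default `logging` formatter does not do. A minimal sketch of a formatter that does is shown below. `JsonFormatter` is a hypothetical class, not something the reference implementation defines, and the output key names are illustrative.

```python
import json
import logging

# Attribute names every LogRecord carries; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including fields passed through `extra=`."""

    def format(self, record: logging.LogRecord) -> str:
        body = {
            "event": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
            "logged_at": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                body[key] = value  # e.g. correlation_id, flags, action
        # default=str keeps non-JSON types such as datetime from raising
        return json.dumps(body, default=str, sort_keys=True)
```

Attaching it is a matter of `handler.setFormatter(JsonFormatter())` on whichever handler serves the `grant.async_batch` logger. With that in place a COMPLIANCE_DECISION line carries its `correlation_id`, `flags`, and `action` in machine-readable form, which is what makes the audit trail searchable after the fact.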

Common Errors & Fixes

Error	Cause	Fix
`MemoryError` during a deadline burst	Attachment binaries buffered into the envelope, or an unbounded queue	Stream by URI with `MemoryGuard.stream_attachment`; cap `queue_depth` so `put()` applies backpressure.
Sustained `QUEUE_BACKPRESSURE`	Semaphore too narrow or the portal is throttling	Raise `max_concurrent` toward the funder’s quota; confirm endpoint latency via API Polling & Rate Limiting.
`ValidationError` / strict-mode reject	Unresolved alias or coerced type leaked from upstream	Fix the mapping in Field Mapping & Normalization; the batch layer must not patch field names.
`RetryError` masks the real failure	`reraise` left at its default `False`	Set `reraise=True` so the original `ClientResponseError`/`TimeoutError` surfaces for categorization.
Whole batch stalls on one slow funder	Blocking I/O inside a worker, or no backoff cap	Keep all I/O async; cap backoff with `wait_exponential(max=30)` so one portal cannot starve the cohort.
`401`/`403` buried in retries	Expired credential treated as transient	Do not retry auth failures — refresh the key; credential rotation is an API Polling & Rate Limiting concern.

Related

Parent section: Async Batch Processing Pipelines
Upstream of a clean submission: Handling Rate Limits in Grant Portal APIs
Where rejections and exhausted retries go: Error Categorization & Logging
When a portal fails hard: Building a Fallback Routing System for Grant APIs
