SOURCE REGISTRY
by Tobin Albanese
Source Registry is a backend governance layer for managing public-source collections across my Global Intelligence Platform, embedded in my SIGNALIS project. It controls which sources are active, how often they can be collected, when they should cool down, how failures are tracked, and how source reliability is measured. Instead of allowing collectors to run independently or repeatedly hit external APIs, the registry acts as a disciplined control plane for source management, collection intervals, API protection, and feed reliability.
Source Registry is the backend control layer that governs how external public sources are prepared for collection before any automation runs. Rather than allowing each collector to independently decide when to hit a feed, endpoint, or API, the registry creates a centralized decision point for source access. It manages source state across RSS feeds, GDELT, OFAC and sanctions data, news APIs, public records, and other external datasets that require controlled collection behavior.
The purpose of the module is to bring order to the collection pipeline. Each source is treated as a managed backend record with its own status, timing rules, failure history, cooldown behavior, and routing requirements. From my perspective, this matters because public-source automation can become unstable when collection logic is spread across separate scripts. Source Registry creates the structure needed to decide what is allowed to run, what should wait, and what should be temporarily blocked before the collector ever touches the source.
The module was needed because uncontrolled collection creates problems that become harder to fix as a platform scales. If every collector has its own hardcoded interval, retry logic, and source list, the system slowly becomes a scattered set of disconnected automation scripts. That may work during early prototyping, but it becomes risky once the platform depends on many source types.
Without a registry, collectors can make repeated API calls, collect duplicate records, retry failed endpoints too often, or continue pulling from sources that should be paused. There is also no central switch for marking a source active or inactive, which means disabling one source may require changing collector code directly. That creates both operational friction and risk.
Source Registry was created to prevent Global Intel Hub from becoming a loose group of scrapers. The goal is to move collection control into a governed backend workflow where source intervals, source status, health state, and API protection rules are managed in one place.
The backend design treats each source as a managed record rather than a hardcoded target inside a collector. Each record stores the metadata the collection system needs to determine whether that source should run, how it should run, and how the platform should treat it over time.
A source record can include fields such as source_id, source_name, source_type, source_url, collector_type, active, collection_interval_minutes, last_collected_at, next_eligible_at, failure_count, cooldown_until, reliability_score, priority, region, and tags. These fields give the system enough context to handle different source types with different rules.
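As a rough illustration, the fields listed above could be modeled as a single typed record. This is only a sketch: the field names come from the list above, but the types, default values, the 0 to 1 reliability scale, and the example feed are assumptions rather than the platform's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SourceRecord:
    # Identity and routing
    source_id: str
    source_name: str
    source_type: str            # e.g. "rss", "gdelt", "sanctions", "news_api" (assumed labels)
    source_url: str
    collector_type: str
    # Control and timing
    active: bool = True
    collection_interval_minutes: int = 60          # assumed default
    last_collected_at: Optional[datetime] = None
    next_eligible_at: Optional[datetime] = None
    # Health state
    failure_count: int = 0
    cooldown_until: Optional[datetime] = None
    reliability_score: float = 1.0                 # assumed 0.0 to 1.0 scale
    # Prioritization and grouping
    priority: int = 0
    region: Optional[str] = None
    tags: List[str] = field(default_factory=list)


# Hypothetical example record; the URL is a placeholder.
example = SourceRecord(
    source_id="rss-001",
    source_name="Example feed",
    source_type="rss",
    source_url="https://example.com/feed.xml",
    collector_type="rss",
    collection_interval_minutes=30,
    tags=["news"],
)
```

Keeping timing and health fields optional lets a newly registered source start with no history and still be evaluated by the same rules as every other source.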
This matters because not every public source should be treated the same way. An RSS feed, a sanctions endpoint, a GDELT query, and a news API may all have different timing needs, rate-limit concerns, failure patterns, and value to the platform. Registry metadata allows Global Intel Hub to manage sources individually instead of applying one generic collection rule across the entire system.
The strongest part of the Source Registry design is collection eligibility. A route such as /collect/next should not randomly hit APIs or feeds just because a collector is available. Instead, it should ask the registry which source is currently eligible to run. That keeps collection disciplined and prevents automation from acting without source-level context.
A source should only be eligible when all of the following conditions are met: the active flag is true, the collection interval is due, cooldown_until is empty or expired, failure_count remains below the failure threshold, and collector_type is supported by the platform. The source also cannot be blocked, delayed, or temporarily disabled by the registry. Together, these checks form a gate before collection begins.
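The eligibility gate can be sketched as a pure function that a route such as /collect/next might call. The threshold value, the set of supported collector names, the dict-based record shape, and the choice to pick the highest-priority eligible source are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_FAILURES = 5                                                   # assumed failure threshold
SUPPORTED_COLLECTORS = {"rss", "gdelt", "sanctions", "news_api"}   # assumed collector names


def is_eligible(source: dict, now: datetime) -> bool:
    """Return True only if every registry condition passes."""
    if not source.get("active", False):
        return False
    next_eligible = source.get("next_eligible_at")
    if next_eligible is not None and next_eligible > now:
        return False                      # collection interval not due yet
    cooldown = source.get("cooldown_until")
    if cooldown is not None and cooldown > now:
        return False                      # still cooling down
    if source.get("failure_count", 0) >= MAX_FAILURES:
        return False                      # too many failures
    if source.get("collector_type") not in SUPPORTED_COLLECTORS:
        return False                      # no collector available for this type
    return True


def next_source(sources: List[dict], now: datetime) -> Optional[dict]:
    """Pick one eligible source, preferring higher priority (an assumed policy)."""
    eligible = [s for s in sources if is_eligible(s, now)]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.get("priority", 0))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the gate deterministic and easy to test.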
Once a source is selected, the collector router can send it to the correct collection path. An RSS source goes to the RSS collector. A GDELT source goes to the GDELT collector. An OFAC or sanctions source routes into the sanctions collection logic. A news API source routes into the API-based collector. This keeps routing clean while still allowing the registry to control source eligibility.
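One simple way to express this routing is a dispatch table keyed by collector_type. The collector functions below are hypothetical stand-ins that only return a label; real collectors would fetch and store data.

```python
from typing import Callable, Dict


# Placeholder collectors (hypothetical): each returns a label instead of fetching anything.
def collect_rss(source: dict) -> str:
    return f"rss:{source['source_id']}"


def collect_gdelt(source: dict) -> str:
    return f"gdelt:{source['source_id']}"


def collect_sanctions(source: dict) -> str:
    return f"sanctions:{source['source_id']}"


def collect_news_api(source: dict) -> str:
    return f"news_api:{source['source_id']}"


# Dispatch table; the key names are assumed values of collector_type.
COLLECTORS: Dict[str, Callable[[dict], str]] = {
    "rss": collect_rss,
    "gdelt": collect_gdelt,
    "sanctions": collect_sanctions,
    "news_api": collect_news_api,
}


def route(source: dict) -> str:
    """Send a selected source to the collector that matches its collector_type."""
    collector = COLLECTORS.get(source["collector_type"])
    if collector is None:
        raise ValueError(f"unsupported collector_type: {source['collector_type']}")
    return collector(source)
```

With this shape, supporting a new source type means adding one entry to the table, while eligibility decisions stay in the registry.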
Cooldown behavior is also important. Failed requests increase failure_count, and repeated failures can trigger a temporary cooldown. That prevents broken endpoints from being retried aggressively. Successful runs can reset or reduce the failure state, update last_collected_at, and recalculate next_eligible_at based on the source’s configured interval. In practice, this protects APIs, reduces duplicate collection, and makes automation safer.
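The success and failure bookkeeping described above might look like the following sketch. The threshold of three failures, the 60-minute cooldown, and the choice to fully reset failure_count on success (rather than reduce it) are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

FAILURE_THRESHOLD = 3      # assumed: failures before a cooldown is applied
COOLDOWN_MINUTES = 60      # assumed: cooldown length


def record_success(source: dict, now: datetime) -> dict:
    """Reset failure state and schedule the next run from the configured interval."""
    source["failure_count"] = 0
    source["cooldown_until"] = None
    source["last_collected_at"] = now
    source["next_eligible_at"] = now + timedelta(
        minutes=source["collection_interval_minutes"]
    )
    return source


def record_failure(source: dict, now: datetime) -> dict:
    """Count the failure and start a cooldown once the threshold is reached."""
    source["failure_count"] = source.get("failure_count", 0) + 1
    if source["failure_count"] >= FAILURE_THRESHOLD:
        source["cooldown_until"] = now + timedelta(minutes=COOLDOWN_MINUTES)
    return source
```

A fixed cooldown is the simplest option; a growing cooldown for repeated failures would be a natural variation under the same structure.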
Reliability tracking gives the larger platform a way to understand source quality over time. At first, the reliability score does not need to be overly complex. It can begin as a practical source-health indicator based on whether a source is fetching successfully, staying fresh, producing too many duplicates, or failing too often.
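A first-pass score along these lines could combine fetch success, freshness, and duplicate rate into one number. The weights, the 24-hour staleness window, the neutral starting value, and the function name are all assumptions for illustration, not a formula the platform is known to use.

```python
def reliability_score(
    successes: int,
    failures: int,
    duplicate_ratio: float,
    hours_since_fresh: float,
    stale_after_hours: float = 24.0,   # assumed staleness window
) -> float:
    """Blend success rate, freshness, and uniqueness into a 0.0 to 1.0 score."""
    attempts = successes + failures
    if attempts == 0:
        return 0.5                     # assumed neutral score for a source with no history
    success_rate = successes / attempts
    # Freshness decays linearly to zero at the staleness window.
    freshness = max(0.0, 1.0 - hours_since_fresh / stale_after_hours)
    # Uniqueness is the share of records that are not duplicates.
    uniqueness = 1.0 - min(max(duplicate_ratio, 0.0), 1.0)
    # Assumed weights: success matters most, then freshness, then uniqueness.
    score = 0.5 * success_rate + 0.3 * freshness + 0.2 * uniqueness
    return round(score, 3)
```

A healthy, fresh source with few duplicates scores near 1.0, while a source that keeps failing, goes stale, or mostly repeats itself drifts toward 0.0.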
Over time, reliability can become part of prioritization logic. Sources with stronger fetch rates, lower error frequency, better freshness, and higher usefulness to watchlist scoring can be prioritized over weaker sources. Less reliable sources do not need to be removed immediately, but they can be collected less often, placed under cooldown faster, or reviewed manually.
This creates a feedback loop between collection behavior and source quality. The platform is not just collecting data. It is learning which sources are stable enough to support ongoing workflows.
Source Registry supports the larger collection and intelligence workflow by giving source management a formal backend structure. It connects directly to RSS collection, GDELT collection, OFAC and sanctions monitoring, public-source APIs, DuckDB storage, Supabase metadata, dashboard panels, watchlist scoring, analyst notes, and report generation.
The technical significance is that Source Registry turns public-source collection from loose automation into a governed backend workflow. Instead of allowing every collector to decide its own behavior, the registry centralizes source control, collection timing, failure handling, cooldown logic, routing, and reliability state.
That makes the platform easier to scale. It also makes it safer to automate. As Global Intel Hub grows, the value of Source Registry is not just that it organizes sources. It gives the system collection discipline, API protection, source visibility, and a backend governance layer that can support more sources without losing control over how those sources are collected.