Where CloudBroker Gets Its Numbers: Ingestion Across Multiple Clouds

How it works: from multiple provider APIs to one database.

CloudBroker can't recommend what it doesn't know. Its database is filled by ingestion: connectors call each provider's pricing or catalog API, pull instance types and hourly rates, and upsert them into PostgreSQL. Every provider's API is different, and the connectors absorb those differences. The result is one model (providers, regions, instance types, prices) with every price comparable in EUR.
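
A minimal sketch of what "absorbing the differences" can look like, assuming a Python codebase: each connector turns a provider-specific payload into shared record types. The Connector protocol, the record classes, and the payload shape (cores, memory_mb, per-location prices) are all hypothetical. They illustrate the pattern and are not CloudBroker's actual code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class InstanceTypeRecord:
    provider_slug: str
    name: str
    vcpu: int
    ram_gb: float
    architecture: str


@dataclass(frozen=True)
class PriceRecord:
    provider_slug: str
    instance_type: str
    region: str
    hourly: float
    currency: str  # original currency; EUR normalization happens later


class Connector(Protocol):
    """Hypothetical interface: every connector emits the same record types."""

    provider_slug: str

    def fetch(self) -> tuple[list[InstanceTypeRecord], list[PriceRecord]]: ...


class ExampleConnector:
    """Hypothetical connector for an invented API that returns a flat list of
    server types, each with a price per location."""

    provider_slug = "example"

    def __init__(self, payload: list[dict]):
        self.payload = payload  # stands in for the provider's HTTP response

    def fetch(self) -> tuple[list[InstanceTypeRecord], list[PriceRecord]]:
        types: list[InstanceTypeRecord] = []
        prices: list[PriceRecord] = []
        for item in self.payload:
            # Map provider-specific field names and units onto the shared model.
            types.append(InstanceTypeRecord(
                provider_slug=self.provider_slug,
                name=item["name"],
                vcpu=int(item["cores"]),
                ram_gb=float(item["memory_mb"]) / 1024,
                architecture=item.get("arch", "x86_64"),
            ))
            # One price record per location the provider lists.
            for location, price in item["prices"].items():
                prices.append(PriceRecord(
                    self.provider_slug, item["name"], location,
                    float(price), item["currency"],
                ))
        return types, prices
```

Keeping the record types identical across connectors is what lets the rest of the pipeline stay provider-agnostic.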

The hierarchy

Four entities form the core:

  • Provider — slug (e.g. aws, hetzner, scaleway) and type (hyperscaler, EU, regional).
  • Region — data center location with country code and EU flag (used for region constraints like "EU only").
  • InstanceType — name, vCPU, RAM, architecture, family. Unique per provider.
  • Price — hourly price for an instance type in a region. Original currency and EUR-normalized. Append-only; the engine uses the latest price per (instance_type, region).
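
Because Price is append-only, "the latest price per (instance_type, region)" means picking the newest row for each key. The following is a minimal in-memory sketch. It assumes each row carries an ingestion timestamp, and the fetched_at field name is hypothetical. In PostgreSQL the same selection is often written with SELECT DISTINCT ON (...) ordered by timestamp descending.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PriceRow:
    instance_type: str
    region: str
    price_eur: float
    fetched_at: datetime  # hypothetical column: when this row was ingested


def latest_prices(rows: list[PriceRow]) -> dict[tuple[str, str], PriceRow]:
    """Return the newest row for each (instance_type, region) pair."""
    latest: dict[tuple[str, str], PriceRow] = {}
    for row in rows:
        key = (row.instance_type, row.region)
        current = latest.get(key)
        # Rows are never updated in place, so "current" simply means "newest".
        if current is None or row.fetched_at > current.fetched_at:
            latest[key] = row
    return latest
```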

Cost rates: EgressRate, StorageRate, PublicIpRate, and OsLicenseRate, keyed by provider_slug, support total cost of ownership (TCO) estimates. When a request includes a data source and an estimated egress volume, the total cost covers egress, storage, public IP, OS license, and cross-AZ traffic.
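
The arithmetic behind a TCO total is a sum of per-component costs. This sketch makes several assumptions. Rates are flat per-GB or per-month amounts already in EUR, and a month is 730 hours. The CostRates fields, the HOURS_PER_MONTH constant, and the source of the cross-AZ rate are also assumptions, because the entities above include no dedicated cross-AZ rate. Real egress pricing is often tiered or includes free allowances, which this sketch ignores.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: a common approximation of one month


@dataclass(frozen=True)
class CostRates:
    """Hypothetical per-provider rates, all in EUR."""
    egress_per_gb: float
    storage_per_gb_month: float
    public_ip_per_month: float
    os_license_per_hour: float
    cross_az_per_gb: float  # assumption: source of this rate not specified


def monthly_tco(hourly_price: float, rates: CostRates, *, egress_gb: float,
                storage_gb: float, public_ips: int, cross_az_gb: float) -> float:
    """Sum compute and the TCO components into one monthly EUR figure."""
    compute = hourly_price * HOURS_PER_MONTH
    os_license = rates.os_license_per_hour * HOURS_PER_MONTH
    egress = egress_gb * rates.egress_per_gb
    storage = storage_gb * rates.storage_per_gb_month
    public_ip = public_ips * rates.public_ip_per_month
    cross_az = cross_az_gb * rates.cross_az_per_gb
    return compute + os_license + egress + storage + public_ip + cross_az
```

The following example uses made-up rates: an instance at 0.01 EUR/h, 100 GB of egress at 0.01 EUR/GB, 20 GB of storage at 0.05 EUR/GB-month, and one public IP at 1 EUR/month. The total is 7.30 + 1.00 + 1.00 + 1.00 = 10.30 EUR per month.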

FxRate: a lookup table for USD→EUR (and other currency pairs) so that recommendations compare apples to apples. If no rate exists, a stub rate is seeded.
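
The following sketch shows EUR normalization with a stub fallback. It assumes rates are kept as a (from, to) → multiplier map. The stub value 0.92 is a placeholder chosen for illustration and is not the rate CloudBroker seeds. setdefault only fills the gap, so a real rate that is already present is left untouched.

```python
FxTable = dict[tuple[str, str], float]

STUB_USD_EUR = 0.92  # hypothetical placeholder, not a real or current quote


def ensure_stub_rate(fx: FxTable) -> None:
    """Seed a USD->EUR rate only if none exists yet."""
    fx.setdefault(("USD", "EUR"), STUB_USD_EUR)


def to_eur(amount: float, currency: str, fx: FxTable) -> float:
    """Convert an amount to EUR, failing loudly if no rate is known."""
    if currency == "EUR":
        return amount
    rate = fx.get((currency, "EUR"))
    if rate is None:
        raise ValueError(f"no FX rate for {currency}->EUR")
    return amount * rate
```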

Provider APIs → Connectors → Ingestion Service → PostgreSQL.
erDiagram
    Provider ||--o{ Region : "has"
    Provider ||--o{ InstanceType : "has"
    InstanceType ||--o{ Price : "has"
    Region ||--o{ Price : "scoped to"
    FxRate }o--|| Price : "EUR lookup"
    Provider ||--o{ EgressRate : "TCO"
    Provider ||--o{ StorageRate : "TCO"
    Provider ||--o{ PublicIpRate : "TCO"
    Provider ||--o{ OsLicenseRate : "TCO"

Ingestion in practice

You run ingestion via the CLI. Each provider has a dedicated connector; the Makefile wraps them:

make ingest-hetzner             # Hetzner Cloud API
make ingest-gcp                 # GCP Billing Catalog (requires ADC)
make ingest-aws                 # AWS EC2 Pricing API
make ingest-azure               # Azure Retail Prices (public API)
make ingest-scaleway            # Scaleway API
make ingest-digitalocean        # DigitalOcean API
make ingest-ovh                 # OVH API
make ingest-aruba               # Aruba Cloud (config-based, no API keys)
make ingest-upcloud             # UpCloud (requires UPCLOUD_USERNAME + UPCLOUD_PASSWORD)
make ingest-open-telekom-cloud  # Open Telekom Cloud (public API)
make ingest-exoscale            # Exoscale (config-based)
make ingest-ionos               # IONOS Cloud (config-based)
make ingest-gridscale           # gridscale (config-based)
make ingest-stackit             # STACKIT (config-based)
make ingest-elastx              # Elastx (config-based)
make ingest-cyso-cloud          # Cyso Cloud (config-based)
make ingest-seeweb              # Seeweb (requires SEEWEB_API_TOKEN)
make ingest-all                 # All of the above
make ingest-egress              # Egress rates (API + config)
make ingest-all-costs           # All cost rates (egress, storage, public IP, OS license)

See the ingestion commands → /examples#other-endpoints

Cost rate ingestion: for TCO-aware recommendations, ingest egress rates and, optionally, storage, public IP, and OS license rates. AWS, Azure, and GCP rates come from their APIs; Hetzner, DigitalOcean, Scaleway, and OVH rates come from YAML config. See docs/CONFIG_RATE_SOURCES.md for documentation links.
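
The API-versus-config split can be expressed as a lookup that plans an ingestion run. The mapping encodes only the providers this paragraph names. Providers outside the mapping fall through to "missing", which reflects the limits of this sketch and not the project's actual coverage. The slugs are assumptions, and "digitalocean" simply mirrors the make target name.

```python
# Cost-rate source per provider, as listed in the paragraph above.
# Slugs are assumptions; "digitalocean" mirrors the make target name.
RATE_SOURCES: dict[str, str] = {
    "aws": "api",
    "azure": "api",
    "gcp": "api",
    "hetzner": "config",
    "digitalocean": "config",
    "scaleway": "config",
    "ovh": "config",
}


def plan_cost_ingestion(provider_slugs: list[str]) -> dict[str, list[str]]:
    """Group providers by where their cost rates come from.

    Providers with no known source land in "missing", so a run can report
    which TCO components will be absent instead of silently skipping them.
    """
    plan: dict[str, list[str]] = {"api": [], "config": [], "missing": []}
    for slug in provider_slugs:
        plan[RATE_SOURCES.get(slug, "missing")].append(slug)
    return plan
```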

Ingestion is idempotent. Providers, regions, and instance types are upserted by their unique keys. Prices are append-only, and a duplicate price for the same (instance_type, region) within the same hour is skipped. You can rerun ingestion at any time; the recommendation engine always uses the most recently ingested snapshot. You control how often it runs (e.g. a daily cron job).
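
The following sketch covers the two write patterns described above, using Python's built-in sqlite3 as a stand-in for PostgreSQL. It requires SQLite 3.24 or later for ON CONFLICT. Several details are assumptions: the table and column names, storing the fetch hour as a truncated text timestamp, and keeping regions as plain text to stay short. The ON CONFLICT ... DO UPDATE and ON CONFLICT DO NOTHING clauses are also valid PostgreSQL. The table definitions and the ? placeholders would need adjusting for a PostgreSQL driver.

```python
import sqlite3

SCHEMA = """
CREATE TABLE instance_type (
    id INTEGER PRIMARY KEY,
    provider_slug TEXT NOT NULL,
    name TEXT NOT NULL,
    vcpu INTEGER NOT NULL,
    UNIQUE (provider_slug, name)
);
CREATE TABLE price (
    instance_type_id INTEGER NOT NULL REFERENCES instance_type(id),
    region TEXT NOT NULL,
    hour TEXT NOT NULL,  -- fetch time truncated to the hour
    price_eur REAL NOT NULL,
    UNIQUE (instance_type_id, region, hour)
);
"""


def ingest(conn: sqlite3.Connection, provider_slug: str, name: str, vcpu: int,
           region: str, hour: str, price_eur: float) -> None:
    # Upsert: insert the instance type, or refresh its attributes if it exists.
    conn.execute(
        "INSERT INTO instance_type (provider_slug, name, vcpu) VALUES (?, ?, ?) "
        "ON CONFLICT (provider_slug, name) DO UPDATE SET vcpu = excluded.vcpu",
        (provider_slug, name, vcpu),
    )
    (type_id,) = conn.execute(
        "SELECT id FROM instance_type WHERE provider_slug = ? AND name = ?",
        (provider_slug, name),
    ).fetchone()
    # Append-only: a second price for the same (type, region, hour) is skipped.
    conn.execute(
        "INSERT INTO price (instance_type_id, region, hour, price_eur) "
        "VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING",
        (type_id, region, hour, price_eur),
    )
    conn.commit()
```

Running the same call twice leaves one instance type and one price row. A call for the next hour refreshes the instance type's attributes and appends a second price row.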

What's in scope for ingestion

In scope: multiple providers with on-demand (and optionally spot) pricing. Out of scope for this baseline: real-time price tickers and reserved-instance marketplaces. The project focuses on comparable, hourly, normalized data so the recommendation API can rank options.

With this data in place, the next step is the interface: how a request becomes a ranked list — the recommendation API.

Part 3 of 4: The recommendation API. Constraints, scoring, explain block, and the real JSON response.
