
Data model

This project is designed as a small, understandable knowledge base built from three kinds of records:

  • Entities: things that exist (organizations, products, offers, tools, communities, etc.)
  • Taxonomy terms: controlled vocabularies (niches, traffic sources, geo, feature flags, …)
  • Relations: explicit links between records (who operates what, what belongs to what, etc.)

The model is intentionally generic and niche-agnostic so the dataset can grow beyond any single vertical or traffic source.

Source of truth: YAML

All source data lives under data/:

  • data/entities/<entity_type>/<id>.yml
  • data/taxonomies/<taxonomy>/<term_id>.yml
  • data/relations/<id>.yml

Each file contains a single record.
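The layout above maps each record to exactly one path. A minimal sketch of that mapping in Python follows; `record_path` and its parameter names are hypothetical, introduced here for illustration only, and are not part of the repo's tooling.

```python
from pathlib import PurePosixPath
from typing import Optional


def record_path(kind: str, record_id: str, group: Optional[str] = None) -> PurePosixPath:
    """Return the expected YAML path for one record (hypothetical helper).

    kind  -- "entities", "taxonomies", or "relations"
    group -- the entity_type or taxonomy folder; relations have none
    """
    base = PurePosixPath("data") / kind
    if kind == "relations":
        # Relations live directly under data/relations/.
        if group is not None:
            raise ValueError("relations are not grouped into subfolders")
        return base / f"{record_id}.yml"
    if kind not in ("entities", "taxonomies"):
        raise ValueError(f"unknown record kind: {kind!r}")
    if not group:
        raise ValueError(f"{kind} records need an entity_type or taxonomy folder")
    return base / group / f"{record_id}.yml"


print(record_path("entities", "org_example_network", "organization"))
print(record_path("relations", "rel_example_network_operates_brand"))
```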

IDs and filenames

  • id is stable and unique across all records of its kind (entity, taxonomy term, or relation).
  • Filenames must match the id exactly (e.g. org_example.yml contains id: org_example).
  • IDs use lowercase snake_case, must start with a letter, and are at least two characters long: ^[a-z][a-z0-9_]+$
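The two ID rules (pattern and filename match) are easy to check mechanically. The sketch below assumes nothing about how the project validates records; `check_id` is a hypothetical helper, and only the regular expression is taken from the rules above.

```python
import re
from pathlib import PurePosixPath

# Pattern copied from the ID rules; fullmatch() also rejects a trailing newline.
ID_PATTERN = re.compile(r"^[a-z][a-z0-9_]+$")


def check_id(path: str, record_id: str) -> list:
    """Return a list of problems for one record; an empty list means it passes."""
    problems = []
    if not ID_PATTERN.fullmatch(record_id):
        problems.append(f"invalid id: {record_id!r}")
    # The filename without its .yml extension must equal the id exactly.
    stem = PurePosixPath(path).stem
    if stem != record_id:
        problems.append(f"filename {stem!r} does not match id {record_id!r}")
    return problems


print(check_id("data/entities/organization/org_example.yml", "org_example"))
print(check_id("data/entities/organization/org_example.yml", "Org-Example"))
```

The first call reports no problems; the second reports both an invalid ID and a filename mismatch.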

Stability contract

The core structure is intended to stay stable:

  • record shapes (entity, taxonomy term, relation)
  • directory layout under data/
  • IDs (once published)

The vocabulary evolves over time and is additive:

  • new entity_type folders
  • new taxonomies and taxonomy terms
  • new relation predicates

We avoid breaking changes; if we must make one, we version the API.

Entity record (node)

Entities are generic. Their domain-specific details go into the properties object so the dataset can evolve without breaking changes.

Minimal example:

id: org_example_network
entity_type: organization
name: Example Network
summary: Affiliate network (example record).
status: active
websites:
  - url: https://example.com
    type: official
terms:
  organization_role:
    - org_role_affiliate_network
sources:
  - url: https://example.com
    retrieved_at: 2026-02-03

Common fields:

  • id (required)
  • entity_type (required): directory name under data/entities/
  • name (required)
  • summary: short human-readable description
  • description: longer Markdown description
  • websites, contacts, socials
  • terms: mapping of taxonomy → list of term IDs
  • properties: free-form object for domain-specific fields
  • sources: evidence links
  • status: active | inactive | unknown
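As an illustration of the field list above, here is a minimal sketch of a shape check on an already-parsed entity record (a plain dict). `check_entity` is hypothetical, and treating `status` and `terms` as optional is an assumption: the list above marks only `id`, `entity_type`, and `name` as required.

```python
REQUIRED_FIELDS = ("id", "entity_type", "name")
ALLOWED_STATUS = {"active", "inactive", "unknown"}


def check_entity(record: dict) -> list:
    """Return a list of problems for one entity record (hypothetical helper)."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not record.get(field)
    ]
    # Assumption: status is optional, but if present it must be a known value.
    status = record.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        problems.append(f"unknown status: {status!r}")
    # terms maps a taxonomy name to a list of term IDs.
    for taxonomy, term_ids in record.get("terms", {}).items():
        if not isinstance(term_ids, list):
            problems.append(f"terms[{taxonomy!r}] must be a list of term IDs")
    return problems


entity = {
    "id": "org_example_network",
    "entity_type": "organization",
    "name": "Example Network",
    "status": "active",
    "terms": {"organization_role": ["org_role_affiliate_network"]},
}
print(check_entity(entity))
```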

Entity types (folders)

entity_type is an open set, not a fixed list. Its main purpose is to keep the repository navigable.

It is normal for different entities to share the same human name while representing different things, for example:

  • a product/brand (entity_type: product)
  • an affiliate program for that product (entity_type: affiliate_program)

Those should be connected via relations (e.g. predicate_program_for) rather than merged into one record.

Examples of entity types used in this repo:

  • organization
  • product
  • affiliate_program
  • affiliate_network
  • review
  • metric

Taxonomy term record (controlled vocabulary)

Taxonomies keep data consistent and searchable.

id: niche_igaming
taxonomy: niche
label: iGaming
description: Online gambling and betting.
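Because entities refer to terms by ID, a controlled vocabulary is only useful if those references resolve. The sketch below shows one way to check that, assuming the keys of an entity's `terms` mapping equal the `taxonomy` field of the term records. The helper names and the `niche_typo` ID are hypothetical.

```python
def index_terms(term_records):
    """Group term IDs by taxonomy, e.g. {"niche": {"niche_igaming"}}."""
    index = {}
    for term in term_records:
        index.setdefault(term["taxonomy"], set()).add(term["id"])
    return index


def unknown_terms(entity, index):
    """Return (taxonomy, term_id) pairs that the entity uses but no term record defines."""
    missing = []
    for taxonomy, term_ids in entity.get("terms", {}).items():
        known = index.get(taxonomy, set())
        missing.extend((taxonomy, term_id) for term_id in term_ids if term_id not in known)
    return missing


terms = [{"id": "niche_igaming", "taxonomy": "niche", "label": "iGaming"}]
entity = {
    "id": "org_example_network",
    # niche_typo is a deliberately undefined, made-up term ID.
    "terms": {"niche": ["niche_igaming", "niche_typo"]},
}
print(unknown_terms(entity, index_terms(terms)))
```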

Relation record (edge)

Relations are explicit, sourceable links between two entities.

id: rel_example_network_operates_brand
predicate: predicate_operates
subject: org_example_network
object: org_example_brand
sources:
  - url: https://example.com/about
    retrieved_at: 2026-02-03
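Since `subject` and `object` hold entity IDs, a relation is only meaningful when both ends exist. A minimal sketch of that referential check over already-parsed records follows; `dangling_references` is a hypothetical helper, and `org_missing_brand` is a made-up ID used to trigger a failure.

```python
def dangling_references(relations, entity_ids):
    """Return (relation_id, role, target) for every subject/object with no matching entity."""
    problems = []
    for relation in relations:
        for role in ("subject", "object"):
            target = relation.get(role)
            if target not in entity_ids:
                problems.append((relation["id"], role, target))
    return problems


entity_ids = {"org_example_network", "org_example_brand"}
relations = [
    {
        "id": "rel_example_network_operates_brand",
        "predicate": "predicate_operates",
        "subject": "org_example_network",
        "object": "org_example_brand",
    },
    {
        "id": "rel_broken_example",
        "predicate": "predicate_operates",
        "subject": "org_example_network",
        "object": "org_missing_brand",  # hypothetical ID with no entity file
    },
]
print(dangling_references(relations, entity_ids))
```

Only the second relation is reported, because its `object` does not match any known entity ID.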

Build output (static API)

The build step compiles YAML into JSON endpoints under api/v1/ on the published site.

See docs/api.md for details.