Data model
This project is designed as a small, understandable knowledge base:
- Entities: things that exist (organizations, products, offers, tools, communities, etc.)
- Taxonomy terms: controlled vocabularies (niches, traffic sources, geo, feature flags, …)
- Relations: explicit links between records (who operates what, what belongs to what, etc.)
The model is intentionally generic and niche-agnostic so the dataset can grow beyond any single vertical or traffic source.
Source of truth: YAML
All source data lives under data/:
- data/entities/<entity_type>/<id>.yml
- data/taxonomies/<taxonomy>/<term_id>.yml
- data/relations/<id>.yml
Each file contains a single record.
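The layout above can be expressed as a small path helper. This is a minimal sketch: the function name `record_path`, the `KIND_DIRS` mapping, and the kind labels are hypothetical and only mirror the three directory patterns listed here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Top-level folders from the layout above; the keys are illustrative labels.
KIND_DIRS = {
    "entity": "entities",
    "taxonomy_term": "taxonomies",
    "relation": "relations",
}


def record_path(kind: str, record_id: str, group: Optional[str] = None) -> PurePosixPath:
    """Return the expected YAML path for one record.

    `group` is the entity_type (for entities) or the taxonomy name (for terms).
    Relations sit directly under data/relations/ and take no group.
    """
    if kind not in KIND_DIRS:
        raise ValueError(f"unknown record kind: {kind!r}")
    base = PurePosixPath("data") / KIND_DIRS[kind]
    if kind == "relation":
        if group is not None:
            raise ValueError("relations are not grouped into subfolders")
        return base / f"{record_id}.yml"
    if not group:
        raise ValueError(f"{kind} records need an entity_type or taxonomy folder")
    return base / group / f"{record_id}.yml"
```

For example, `record_path("entity", "org_example", "organization")` yields `data/entities/organization/org_example.yml`.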
IDs and filenames
- id is stable and globally unique inside its record kind.
- Filenames must match the id exactly (e.g. org_example.yml contains id: org_example).
- IDs use snake_case and must start with a letter: ^[a-z][a-z0-9_]+$
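The ID rules can be checked mechanically. The sketch below uses the regular expression given above; the helper names `is_valid_id` and `filename_matches_id` are hypothetical. Note that the pattern as written requires at least two characters, so a single-letter ID is rejected.

```python
import re
from pathlib import PurePosixPath

# Pattern copied from the rule above: a lowercase letter followed by
# one or more lowercase letters, digits, or underscores.
ID_PATTERN = re.compile(r"^[a-z][a-z0-9_]+$")


def is_valid_id(record_id: str) -> bool:
    """True if the ID is snake_case and starts with a letter."""
    return ID_PATTERN.fullmatch(record_id) is not None


def filename_matches_id(path: str, record_id: str) -> bool:
    """True if the file is named <id>.yml."""
    p = PurePosixPath(path)
    return p.suffix == ".yml" and p.stem == record_id
```

A linter could run both checks on every file under data/ and report any record whose id field and filename disagree.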
Stability contract
The core structure is intended to stay stable:
- record shapes (entity, taxonomy term, relation)
- directory layout under data/
- IDs (once published)
The vocabulary evolves over time and is additive:
- new entity_type folders
- new taxonomies and taxonomy terms
- new relation predicates
We avoid breaking changes; if we must make one, we version the API.
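One way to guard the "IDs (once published)" part of the contract is to compare the set of published IDs with the current set and flag removals. This is only a sketch: how published IDs are recorded is not specified here, so both sets are assumed to be supplied by the caller, and the function names are hypothetical.

```python
def removed_ids(published: set, current: set) -> set:
    """IDs that were published before but are missing from the current data."""
    return published - current


def is_additive(published: set, current: set) -> bool:
    """True if every published ID still exists; new IDs are always allowed."""
    return not removed_ids(published, current)
```

Adding `org_c` to a dataset that already published `org_a` and `org_b` passes the check; dropping or renaming `org_b` does not.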
Entity record (node)
Entities are generic. Their domain-specific details go into the properties object so the dataset can evolve without breaking changes.
Minimal example:
id: org_example_network
entity_type: organization
name: Example Network
summary: Affiliate network (example record).
status: active
websites:
  - url: https://example.com
    type: official
terms:
  organization_role:
    - org_role_affiliate_network
sources:
  - url: https://example.com
    retrieved_at: 2026-02-03
Common fields:
- id (required)
- entity_type (required): directory name under data/entities/
- name (required)
- summary: short human-readable description
- description: longer Markdown description
- websites, contacts, socials
- terms: mapping of taxonomy → list of term IDs
- properties: free-form object for domain-specific fields
- sources: evidence links
- status: active | inactive | unknown
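The field list above translates into a simple shape check on an already-parsed record. The sketch below assumes the record has been loaded into a plain dict and treats status as optional, which the list does not state explicitly; `validate_entity` is a hypothetical name, not part of the project.

```python
REQUIRED_FIELDS = ("id", "entity_type", "name")
ALLOWED_STATUS = {"active", "inactive", "unknown"}


def validate_entity(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing required field: {field}")

    # Assumption: status may be omitted, but if present must be a known value.
    status = record.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        errors.append(f"invalid status: {status!r}")

    # terms maps a taxonomy name to a list of term IDs.
    terms = record.get("terms", {})
    if not isinstance(terms, dict):
        errors.append("terms must map taxonomy name to a list of term IDs")
    else:
        for taxonomy, term_ids in terms.items():
            ok = isinstance(term_ids, list) and all(isinstance(t, str) for t in term_ids)
            if not ok:
                errors.append(f"terms.{taxonomy} must be a list of term IDs")

    # properties is free-form, but it should still be an object.
    if "properties" in record and not isinstance(record["properties"], dict):
        errors.append("properties must be an object")
    return errors
```

Run against the minimal example above, the function returns an empty list; a record with status: archived or a missing name would produce one message per problem.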
Entity types (folders)
entity_type is not a fixed list. We use it primarily to keep the repository navigable.
It is normal for different entities to share the same human name while representing different things, for example:
- a product/brand (entity_type: product)
- an affiliate program for that product (entity_type: affiliate_program)
Those should be connected via relations (e.g. predicate_program_for) rather than merged into one record.
Examples of entity types used in this repo:
- organization
- product
- affiliate_program
- affiliate_network
- review
- metric
Taxonomy term record (controlled vocabulary)
Taxonomies keep data consistent and searchable.
id: niche_igaming
taxonomy: niche
label: iGaming
description: Online gambling and betting.
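Because entity terms reference taxonomy term IDs, a consistency check can confirm that every reference resolves to a term record. The sketch below assumes that each key in an entity's terms mapping equals the taxonomy field of the referenced term records; that correspondence is an assumption, and `unknown_term_refs` is a hypothetical helper.

```python
def unknown_term_refs(entity: dict, term_records: list) -> list:
    """Return (taxonomy, term_id) pairs used by the entity but not defined."""
    # Index the controlled vocabulary by (taxonomy, id).
    known = {(t["taxonomy"], t["id"]) for t in term_records}
    missing = []
    for taxonomy, term_ids in entity.get("terms", {}).items():
        for term_id in term_ids:
            if (taxonomy, term_id) not in known:
                missing.append((taxonomy, term_id))
    return missing
```

An entity tagged with niche: [niche_igaming] passes when the term record above is loaded; a misspelled ID is reported together with the taxonomy it was listed under.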
Relation record (edge)
Relations are explicit, sourceable links between two entities.
id: rel_example_network_operates_brand
predicate: predicate_operates
subject: org_example_network
object: org_example_brand
sources:
  - url: https://example.com/about
    retrieved_at: 2026-02-03
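Since a relation links two entities by ID, two operations follow naturally: detecting endpoints that match no entity, and indexing edges for lookup. The following is a rough sketch over already-parsed relation dicts; both function names are hypothetical.

```python
from collections import defaultdict


def dangling_endpoints(relations: list, entity_ids: set) -> list:
    """Return (relation_id, role, missing_id) for endpoints with no entity."""
    problems = []
    for rel in relations:
        for role in ("subject", "object"):
            if rel[role] not in entity_ids:
                problems.append((rel["id"], role, rel[role]))
    return problems


def index_by_subject(relations: list) -> dict:
    """Group edges as subject -> predicate -> [object, ...]."""
    index = defaultdict(lambda: defaultdict(list))
    for rel in relations:
        index[rel["subject"]][rel["predicate"]].append(rel["object"])
    return index
```

With the example relation loaded, `index_by_subject(relations)["org_example_network"]["predicate_operates"]` lists the brands that the network operates.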
Build output (static API)
The build step compiles YAML into JSON endpoints under api/v1/ on the published site.
See docs/api.md for details.
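To illustrate the idea of compiling records into static JSON, here is a minimal sketch that writes one JSON file from a list of already-parsed records. It is not the project's build script: the function name `write_endpoint`, the one-file-per-collection layout, and the file name are assumptions, and the actual endpoints are those described in docs/api.md.

```python
import json
from pathlib import Path


def write_endpoint(out_dir: Path, name: str, records: list) -> Path:
    """Write records as <out_dir>/api/v1/<name>.json (hypothetical layout)."""
    target = out_dir / "api" / "v1" / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sort by id so repeated builds produce identical files.
    ordered = sorted(records, key=lambda r: r["id"])
    # default=str covers values such as dates, which YAML loaders commonly
    # return as date objects for unquoted values like 2026-02-03.
    text = json.dumps(ordered, indent=2, ensure_ascii=False, default=str)
    target.write_text(text + "\n", encoding="utf-8")
    return target
```

Sorting by id keeps the output deterministic, which makes diffs between builds easier to review.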