HOMING
Homi the pigeon

For the EUR hackathon

How HOMING works
under the hood.

Four pipelines move a 90-second voice note into a real-life meet-up. Voice gets captured, language gets understood, a graph finds the people, and verification keeps everyone safe before details are shared.


00 / Summary

From a 90-second voice note to a real meet-up

One slide. Four pipelines. Three providers. One graph. Everything below is a visual you can point at.

01 · Voice in (MediaRecorder · ElevenLabs) → audio
02 · Make sense (Ollama · gpt-oss) → topics
03 · Find people (Graph DB · Cypher) → candidates
04 · Meet for real (iDIN · selfie)

API + LLM calls

HOMING client → POST /api/transcribe → ElevenLabs Scribe · v1
HOMING client → POST /api/analyze → Ollama gpt-oss:120b
HOMING client → POST /api/suggest → Ollama gpt-oss:120b
json_object · 0-day retention · on-device fallback

Graph schema

Nodes: :Activity, :Topic, :TimeSlot, :User
Edges: :REQUIRES, :SCHEDULED_AT, :LIKES, :AVAILABLE_AT, :AVOID
4 node types · 5 edge types · multi-hop ready

Security & GDPR

Device

  • audio
  • Whisper
  • edits

Server

  • topics
  • verified

Never

  • ID doc
  • selfie
  • legal name
Privacy by design · GDPR Art 25
DPO from day one
DPIA before public launch
AI Act · limited-risk · Art 50

One source of truth. Every stage writes to the same graph and reads what the others wrote. Any one pipeline can be rewritten in isolation.

01 / Pipeline

Voice → Transcript

We treat the recording as ephemeral. Audio bytes live only as long as they need to.

1

Tap the mic

MediaRecorder spins up a capture stream from the browser's microphone permission. No native shell, no native SDK.

navigator.mediaDevices.getUserMedia()
2

Stream chunks to memory

Audio is encoded as Opus inside a WebM container. Chunks accumulate in a Blob — never written to disk.

MediaRecorder · audio/webm;codecs=opus
3

Stash the blob for handoff

When recording stops, the Blob is held in a module-level variable so the next page can read it without serialising through history state.

lib/audioStash.ts
4

POST to /api/transcribe

Multipart form-data upload to a Vercel function. The function forwards the audio to ElevenLabs Scribe and proxies the JSON back.

fetch · FormData · app/api/transcribe
5

Speech-to-text

ElevenLabs Scribe (scribe_v1) returns a transcript with word-level timestamps. We discard timestamps and keep the text.

api.elevenlabs.io/v1/speech-to-text
6

Transcript returned

UTF-8 string handed to the analysis pipeline. The audio Blob is dropped — never persisted server-side.

utf-8 · no audio retention
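Steps 2 and 3 above can be sketched roughly as follows. This is an illustration only, not the project's actual lib/audioStash.ts: the names `assembleRecording`, `stashAudio` and `takeAudio` are hypothetical, and the take-once behaviour is an assumption chosen to fit the "ephemeral" goal.

```typescript
// Hypothetical sketch of steps 2-3; names are illustrative, not from the HOMING repo.

const MIME_TYPE = "audio/webm;codecs=opus";

// Step 2: combine the chunks emitted by MediaRecorder's `dataavailable`
// events into one in-memory Blob. Nothing is written to disk.
function assembleRecording(chunks: Blob[]): Blob {
  return new Blob(chunks, { type: MIME_TYPE });
}

// Step 3: a module-level slot. It survives client-side navigation
// but disappears on a full page reload.
let stashed: Blob | null = null;

function stashAudio(blob: Blob): void {
  stashed = blob;
}

// Assumption: reading the stash also clears it, so the audio
// is only ever held by one consumer at a time.
function takeAudio(): Blob | null {
  const blob = stashed;
  stashed = null;
  return blob;
}
```

In the browser, the `chunks` array would typically be filled from the recorder's `dataavailable` handler (each event carries a Blob in `event.data`), and the upload page would call `takeAudio()` once before building its FormData.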

Production promise

In the production build, the transcription model runs locally via WebAssembly (Whisper-small). The demo routes to Scribe so we can ship today — same input, same output shape.
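One way to keep "same input, same output shape" true across backends is to normalise every provider response to a single type before it reaches the analysis pipeline. The sketch below assumes a Scribe-style response carrying `text` plus a `words` array with timestamps; the interface names and the exact field names are assumptions for illustration, not a definitive description of either backend.

```typescript
// Shape handed to the analysis pipeline, whichever backend produced it.
// (Hypothetical type name.)
interface Transcript {
  text: string;
}

// Assumed subset of a Scribe-style response: full text plus
// word-level timestamps. Field names are illustrative.
interface ScribeLikeResponse {
  text: string;
  words?: { text: string; start: number; end: number }[];
}

// Keep the text, drop the timestamps.
function toTranscript(response: ScribeLikeResponse): Transcript {
  return { text: response.text.trim() };
}
```

An on-device Whisper backend would need only its own small adapter returning the same `Transcript` type, which is what lets the demo and production paths be swapped without touching later stages.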

What never leaves the device (production build)

  • The raw audio Blob
  • Recording timestamps
  • Mic device identifier

02 / Pipeline

Transcript → Atomic topics

The model has one job: split what you said into separable interests so the graph can index each one independently.

1

Send transcript

JSON POST to /api/analyze. We also pass a demoMode flag that pads short transcripts with sensible context, since hackathon recordings tend to be brief.

POST /api/analyze
2

Call Ollama Cloud

OpenAI-compatible endpoint, gpt-oss:120b model. response_format=json_object so we never have to parse free text.

ollama.com/v1 · gpt-oss:120b
3

Atomic separation rule

The system prompt drills the model with explicit examples: 'cooking' and 'Korean food' are TWO topics, 'board games' and 'Catan' are TWO topics.

prompt-engineered atomicity
4

Structured response

Returns topics with explanations + tag arrays, plus minor interests, languages, activity types, and three concrete activity suggestions.

JSON · zod-validated client-side
System prompt (excerpt)
Return JSON. Split into ATOMIC topics —
each interest standing on its own:

  • "Cooking" and "Korean food" are TWO topics.
  • "Board games" and "Catan" are TWO topics
    ("Catan" is more specific).

For each topic emit:
  title, explanation, tags[]
Then propose 3 concrete one-off activities.
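A request body for step 2 might be assembled as in the sketch below. It assumes the standard OpenAI-style chat-completions shape (`model`, `messages`, `response_format`); the function name `buildAnalyzeRequest` and the shortened prompt constant are hypothetical, and the real prompt is longer than what is shown here.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface AnalyzeRequest {
  model: string;
  response_format: { type: "json_object" };
  messages: ChatMessage[];
}

// Shortened stand-in for the system prompt excerpted above.
const SYSTEM_PROMPT = [
  "Return JSON. Split into ATOMIC topics, each interest standing on its own.",
  "For each topic emit: title, explanation, tags[].",
  "Then propose 3 concrete one-off activities.",
].join("\n");

// Build the body for an OpenAI-compatible chat-completions call.
// The transcript goes in as the only user message.
function buildAnalyzeRequest(transcript: string): AnalyzeRequest {
  return {
    model: "gpt-oss:120b",
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: transcript },
    ],
  };
}
```

The server route would serialise this with `JSON.stringify` and POST it to the provider's chat-completions route. Keeping the builder pure makes the prompt easy to test without any network call.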
Sample response
{
  "topics": [
    {
      "title": "Catan",
      "explanation": "Wants a chill round again.",
      "tags": ["catan", "specific game"]
    },
    {
      "title": "Board games",
      "explanation": "Casual strategy, low-pressure.",
      "tags": ["board games", "strategy"]
    }
  ],
  "activities": [
    { "title": "Start a Catan round", ... }
  ]
}
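The client validates this response with zod. To stay dependency-free, the sketch below swaps in hand-written type guards that check the same `topics` shape; the names `isTopic` and `parseTopics` are hypothetical, and only the `topics` array is checked here.

```typescript
interface Topic {
  title: string;
  explanation: string;
  tags: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function isTopic(value: unknown): value is Topic {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate["title"] === "string" &&
    typeof candidate["explanation"] === "string" &&
    isStringArray(candidate["tags"])
  );
}

// Parse raw model output. Returns null rather than throwing, so the
// caller can decide whether to retry or show an error.
function parseTopics(raw: string): Topic[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const topics = (data as Record<string, unknown>)["topics"];
  return Array.isArray(topics) && topics.every(isTopic) ? topics : null;
}
```

Even with `response_format` set to `json_object`, the model can return valid JSON in the wrong shape, so a check like this is still worth running before anything is written to the graph.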

03 / Pipeline

Topics → People

Interests are many-to-many, friendship is sparse, and avoid-pairs are private. That's a graph problem, not a SQL one.

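Interactive graph (figure). Activities: Catan night (Thu 19:30), Photo walk (Sat 14:00), Cook together (Fri 18:30). Topics: catan, board games, strategy, photography, walks, cooking, Korean food. Users: Anna, Bram, Clara, Daan, Eva, Finn, Gia, Hugo, Iris, Jonas, Kira, Liam. Time slots: Thursday eve, Saturday 14:00, Friday eve.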

The graph here has 25 nodes and 59 edges. Real graph queries walk these same edges under the hood.

Legend

Activity
Topic
User
TimeSlot
:REQUIRES — activity → topic
:LIKES — user → topic
:AVAILABLE_AT — user → time
:SCHEDULED_AT — activity → time
:AVOID — private user ↔ user
Cypher · match query
MATCH (a:Activity {id: $aid})-[:REQUIRES]->(t:Topic)
      <-[:LIKES]-(u:User)
WHERE u.id <> $creator
  AND any(l IN a.languages WHERE l IN u.languages)
  AND NOT (u)-[:AVOID]-(:User {id: $creator})
  AND (u)-[:AVAILABLE_AT]->(:TimeSlot {day: a.day})
RETURN u.id, count(DISTINCT t) AS score
ORDER BY score DESC
LIMIT 5;
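To make the query's logic easier to follow, here is the same filtering and scoring written over plain in-memory arrays. It is an illustration of what the Cypher expresses, not how the graph database executes it; the `User` and `Activity` shapes and the function name `matchCandidates` are hypothetical.

```typescript
interface User {
  id: string;
  languages: string[];
  likes: string[];         // topic ids (:LIKES)
  availableDays: string[]; // days with an :AVAILABLE_AT slot
  avoids: string[];        // user ids (:AVOID, private)
}

interface Activity {
  id: string;
  day: string;
  languages: string[];
  requires: string[];      // topic ids (:REQUIRES)
}

function matchCandidates(
  activity: Activity,
  users: User[],
  creatorId: string,
  limit = 5
): { id: string; score: number }[] {
  const creator = users.find((u) => u.id === creatorId);
  const required = new Set(activity.requires);
  return users
    .filter((u) => u.id !== creatorId)
    // At least one shared language.
    .filter((u) => u.languages.some((l) => activity.languages.includes(l)))
    // :AVOID is matched in either direction and only used as a filter.
    .filter(
      (u) =>
        !u.avoids.includes(creatorId) &&
        !(creator?.avoids.includes(u.id) ?? false)
    )
    .filter((u) => u.availableDays.includes(activity.day))
    // Score = number of distinct required topics the user likes.
    .map((u) => ({
      id: u.id,
      score: new Set(u.likes.filter((t) => required.has(t))).size,
    }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Note that the result contains only ids and scores. Who avoided whom never appears in the output, which mirrors how the query treats :AVOID edges.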

Multi-hop without joins

Friend-of-friend pathing is one extra hop, not a recursive CTE.

Private edges, real privacy

:AVOID edges are user-scoped — never exposed in match output, just filtered.

Sparse and dense both fast

Most users like few topics; index lookups beat join planning.

04 / Boundaries

Where the data lives

We log only what makes the next match better. ID documents and selfies never touch our storage.

On-device

Raw audio recording

MediaRecorder Blob, dropped after upload

On device

Transcription model (prod)

WebAssembly Whisper in browser

On device

Topic edits

Local state until 'Looks right'

On device

Server

Atomic topics (text)

Graph :Topic nodes attached to your :User

Server

Activity records

Graph :Activity nodes with :REQUIRES edges

Server

Verified-true flag

One boolean on your :User node

Server

Never stored

ID document

Provider returns true/false only

Never stored

Selfie photo

Liveness check happens on-device

Never stored

Full legal name

We keep first name; nothing else

Never stored

05 / Principles

What HOMING refuses to do

Each refusal is a design choice. Together they're why this enables human connection instead of replacing or surveilling it — the BCG X brief made flesh.

Won't do

Profile browsing or swiping

Instead

Activity-first matching — you choose what to do, the graph finds people who said they want the same thing.

Won't do

Public popularity scores or 'social capital'

Instead

Match scores stay internal to the matching service. Users never rank or rate each other.

Won't do

An AI chatbot that replaces conversation

Instead

Homi can draft the first message; you read it, edit it, send it. The chat is between humans from line one.

Won't do

Notifying people that someone declined them

Instead

Declines are private and invisible to the declined side. No signal, no read-receipt, no shame.

Won't do

Engagement-time as the success metric

Instead

Success is the group meeting without us. The /group screen literally invites you to start a WhatsApp.

Won't do

Mixing 16- and 17-year-olds with adults in the same pool

Instead

EUR pilot is 18-29 only. A younger track needs separate safeguarding before it ships.

The boundary isn't a feature list. It's the product.

06 / Compliance

GDPR and the AI Act, by design

The architecture above is also the compliance story. Privacy by design means we satisfy these obligations as a side-effect of how the system is built — not as a bolt-on.

GDPR

Art 6(1)(a) · Explicit consent

Granular, revocable per data class — voice recording, topics, matching, availability are separate toggles.

Art 17 · Right to erasure

Deleting a :User node cascades :LIKES, :AVAILABLE_AT and :AVOID edges in one transaction.

Art 20 · Portability

JSON export of every node and edge the user authored, downloadable from settings.

Art 25 · Privacy by design

On-device transcription, anonymous-until-verified flow, no profile browsing.
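The erasure and portability items above can be illustrated on a tiny in-memory graph. This is a sketch under stated assumptions, not the production data layer: the `Graph` and `Edge` types and the functions `eraseUser` and `exportUser` are hypothetical, and returning a fresh graph stands in for the all-or-nothing behaviour of a database transaction.

```typescript
type EdgeType = "LIKES" | "AVAILABLE_AT" | "AVOID";

interface Edge {
  type: EdgeType;
  from: string; // user id
  to: string;   // topic, time slot, or user id
}

interface Graph {
  users: string[];
  edges: Edge[];
}

// Art 17: remove the user and every edge that touches them, including
// :AVOID edges that other users pointed at them. The input graph is
// left untouched, so the change is applied as a whole or not at all.
function eraseUser(graph: Graph, userId: string): Graph {
  return {
    users: graph.users.filter((id) => id !== userId),
    edges: graph.edges.filter((e) => e.from !== userId && e.to !== userId),
  };
}

// Art 20: export only what the user authored, i.e. edges starting at them.
function exportUser(graph: Graph, userId: string): string {
  const authored = graph.edges.filter((e) => e.from === userId);
  return JSON.stringify({ user: userId, edges: authored }, null, 2);
}
```

In a Cypher-speaking database the erasure half is typically a single `DETACH DELETE` on the matched node, which removes the node together with its relationships.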

AI Act

Risk tier · Limited-risk system

Used to assist matching, not to grade or rank humans. Final action is always the user's.

Art 50 · Transparency

Every Homi suggestion is labeled. Users see the topics extracted and can edit them before anything is used.

Provider · Zero data retention

LLM provider contract: no training on prompts, no log retention beyond 24h.

Bias audit · Quarterly review

Match outcomes audited by sex / language / faculty cohort. Findings published.

Operations

Role · DPO from day one

Designated Data Protection Officer reports directly to the founders.

DPIA · Before public launch

Data Protection Impact Assessment, reviewed with the EUR data office.

Audit · Annual security audit

Third-party penetration test + report; remediation tracked publicly.

List · Public sub-processor list

Every vendor that touches data is named in the privacy policy with a 30-day change notice.

We classify as limited-risk under the AI Act because the model assists with matching but the final action — accept, decline, verify, meet — is always a human decision. We publish the DPIA, the bias audit, and the sub-processor list before opening enrollment.

07 / Delivery

€2M, 18 months, EUR-first

MVP at month 3. Closed EUR pilot at month 6. Three to four Dutch universities by month 18 — under budget and politically survivable.

€2.0M

Total budget

18mo

To 4-uni rollout

6 FTE

Eng · DPO · design

4 uni

Dutch network · M18

Budget · 18 months

Team · 4 eng + DPO + designer × 18mo · €1.1M · 55%
Compliance · DPO, DPIA, legal, audits · €250K · 13%
Infra · cloud, LLM, verification provider · €150K · 8%
Pilot · launch, content, EUR partnership · €200K · 10%
Reserve · contingency + edge-case scope · €300K · 15%

Total

€2.0M

Roadmap

M1–3

Production MVP

The flow you saw — voice in, topics, match, verify — hardened, monitored, documented. No public users yet.

M4–6

Closed EUR pilot · 50 students

Hand-picked across faculties. DPIA finalised. Weekly cadence. We measure outcomes, not screens.

M7–12

EUR open + audited

Public to all EUR students. DPIA + bias audit published. Security review by a third party.

M13–18

3–4 Dutch universities

TU Delft, Leiden, Utrecht, Wageningen. Same product, university-scoped graphs.

Built for the EUR hackathon

Activity-first. Profile never. Built to be deleted.

Next 15 · TS · Tailwind v4 · Ollama · ElevenLabs · graph DB