Lead Data Engineer

Knowledge Gate Group • Full-time • Copenhagen and surrounding area, DK • 16h ago

Why we're hiring a Lead Data Engineer

We're building the expert intelligence layer for scientific research: a knowledge graph that connects the world to leading experts based on publications & clinical trials in precise ontologies. You'll design pipelines that ingest millions of life-science records, shaping a graph of how scientific knowledge is modelled, enriched, & served.

This is true greenfield work. Your decisions will lay the data foundations for our entire expert intelligence platform.

What You'll Do

You will be working at the intersection of science, data engineering & AI to build expert intelligence.

Own data end-to-end: design & run data pipelines that turn millions of scientific records into a knowledge graph.
Implement precision entity resolution & enrichment: disambiguate & enrich experts from noisy data sources.
Utilise LLM workflows where they make sense: for entity extraction, relationship inference & quality validation.
Develop vector embeddings & semantic search capabilities to power expert discovery & similarity matching.
Model life-science entities & relationships: ontologies, author networks, publication & clinical trial metadata.
Build graph & vector data access that is performant, accessible, reliable, observable & testable.
Move fast & ship value incrementally: done-and-iterating beats perfect-and-pending.
Radiate intent & document your thinking openly, collaborating async-first in a hybrid environment.
Lead when you're the expert, follow when someone else is, & challenge assumptions when necessary.
Use AI as a daily force multiplier across coding, schema design, debugging, optimisation & validation.
Destroy your colleagues at Geoguessr (optional but strongly encouraged).

What You'll Need

Technical Skills

Graph Databases: Neo4j, ArangoDB, Neptune; schema design, relationship modelling, query optimisation.
Python Data Engineering: ETL development; pandas/polars; distributed processing with Spark or Dask.
Entity Resolution: Deduplication, merging, enrichment across heterogeneous scientific data sources.
AI-Assisted Data Extraction: LLM entity extraction, schema generation & quality validation.
Vector Search: Experience with Pinecone, FAISS, Qdrant, or Weaviate; embeddings, hybrid retrieval.
Workflow Orchestration: Robust, observable pipelines using Airflow or Dagster.
Data Formats & Standards: Parquet, JSONL, RDF/Turtle; selecting formats for graph & semantic use cases.
Embedding Models: Understanding of HuggingFace/OpenAI models, dimensionality tradeoffs & cost.

Executive Skills

Ownership mindset: Treat data & schemas as products powering multiple domains.
Strategic evaluation: Choose tech aligned with our scale, latency expectations, & roadmap needs.
Process engineering: Build reliable, repeatable & maintainable workflows.
Cross-functional communication: Bridge product engineers & scientific domain teams.
Comfort with scientific data realities: Deep rabbit holes of sprawling complexity.

Strong Bonus

Life Sciences familiarity: Publication, clinical trial & institutional data; ontologies (MeSH, SNOMED, Gene Ontology).
Hands-on with scientific datasets: OpenAlex, PubMed/MEDLINE, ORCID, Semantic Scholar, ClinicalTrials.gov.

Why You Might Hate It Here

You want predictability & routine.
You dislike documenting or sharing your thinking openly.
You see AI as a threat rather than an amplifier.
You're looking for a "safe" corporate environment - we're not that.

We mean this sincerely: if those points describe you, you'll be happier elsewhere.

Why You'll Love Working Here

Real Autonomy: You'll own outcomes, not tickets. This is your domain - you'll define data strategy.
Greenfield Opportunity: Build from scratch. Your decisions shape our data capabilities for years.
Mission That Matters: Your work directly enables research - accelerating scientific breakthroughs.
AI-First Culture: We use AI as a creative & operational partner across every function.
High Impact: Every domain depends on what you build. Expert coverage directly drives our success.

Success Metrics (6-month target)

Expert Coverage: Knowledge graph spans 1+ million experts with rich profile data & relationships.
AI & Platform Enablement: AI & other domains consuming knowledge graph insights.