Working models, datasets, and tools. Most are public; all are open to forking, critique, and conversation.

  1.

    BookBack — Reclaim the Commons

    Public-domain reclamation project
    • Indigenous Data
    • Data Justice
    • Provenance

    A project for restoring public-domain works to public access in the face of extractive scraping. Built around the principle that AI training data should honor provenance, not erase it.

    View on GitHub
  2.

    The Demographics of Faerûn

    D&D dataset for data-science education
    • Data Science Education
    • Tabletop Gaming
    • Synthetic Data

    A dynamic fictional dataset built on the Forgotten Realms setting, designed to make data-science pedagogy more engaging and immersive. Used in classroom contexts to teach analysis, visualization, and modeling.

    View on GitHub
  3.

    Psychedelic Trip Report LLM

    Large-language-model research tool
    • NLP
    • LLM
    • Health Research

    Built on 70,000 entries from the Erowid dataset, this tool uses large language models to assess subjective elements in psychedelic experiences. Designed for applications in synthetic drug discovery and qualitative health research.

    View on GitHub
  4.

    Video Game Review Analysis Tool

    Sentiment analysis & classification
    • NLP
    • Sentiment Analysis
    • Gaming

    A sentiment-analysis tool trained on 30,000 Steam reviews for Hades by Supergiant Games. Combines unsupervised learning and multi-class classification to surface patterns in player feedback.

    View on GitHub
  5.

    Personality & Psychedelic Use Analysis

    Behavioral data analysis
    • Statistics
    • Psychology
    • Drug Research

    An analysis of the correlation between the 'Openness to experience' personality factor and psychedelic drug use. Designed to help researchers and companies identify participants for research studies.

    View on GitHub