Working Workshop on Benchmarks for Human Flourishing with AI
Workshop dates: September 17-18, 2025
Application deadline: August 1, 2025
Comprehensive Description
The "Benchmarks for Human Flourishing with AI" initiative is a collaborative effort organized by the MIT Media Lab's Advancing Humans with AI (AHA) research program and collaborators. This action-oriented workshop prioritizes working group collaboration and tangible deliverables over traditional presentations, with the goal of developing rigorous assessment frameworks that measure how AI systems contribute to human flourishing across six key dimensions:
- Comprehension & Agency: Measuring how AI systems improve human understanding while preserving autonomy and decision-making capacity.
- Curiosity & Learning: Assessing how AI supports continuous learning, intellectual growth, and curiosity-driven exploration.
- Creativity & Expression: Evaluating how AI enhances human creative processes and expressive capabilities rather than replacing them.
- Physical & Mental Wellbeing: Measuring AI's impact on health outcomes, stress reduction, and overall mental wellness.
- Healthy Social Lives: Assessing how AI affects social connections, community engagement, and relationship quality.
- Sense of Purpose: Evaluating how AI supports meaningful goal pursuit, personal values alignment, and life satisfaction.
Methodological Approaches for Human-AI Flourishing Evaluation
Our workshop will explore three complementary methodological approaches to develop robust benchmarks for evaluating AI's impact on human flourishing:
Interactive Human-AI Behavior Classification
This approach involves systematic classification and analysis of interaction patterns between humans and AI systems. Key elements include:
- Development of taxonomies for classifying both human and AI behaviors during interaction
- Real-time monitoring of interaction dynamics to identify patterns associated with positive and negative flourishing outcomes
- Multi-modal analysis incorporating linguistic, behavioral, and physiological signals
- Creation of interaction profiles that correlate specific behavioral patterns with flourishing dimensions
- Deployment of observational studies in natural environments to capture authentic interaction patterns
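As a minimal sketch of what such a classification pipeline might look like, the snippet below tags interaction turns with behavior codes from a toy taxonomy and aggregates them into an interaction profile. The cue patterns, behavior codes, and transcript are illustrative placeholders, not an established coding scheme; a real system would use validated taxonomies and multi-modal signals rather than keyword matching.

```python
import re
from collections import Counter

# Hypothetical taxonomy: map linguistic cues to behavior codes.
# Both the patterns and the codes are illustrative placeholders.
TAXONOMY = {
    "seeks_clarification": re.compile(r"\b(why|how|what do you mean)\b", re.I),
    "delegates_decision": re.compile(r"\b(you decide|just tell me|whatever you think)\b", re.I),
    "expresses_curiosity": re.compile(r"\b(interesting|tell me more|curious)\b", re.I),
}

def classify_turn(utterance: str) -> list[str]:
    """Return all behavior codes whose cue pattern matches the utterance."""
    return [code for code, pattern in TAXONOMY.items() if pattern.search(utterance)]

def interaction_profile(transcript: list[str]) -> Counter:
    """Aggregate behavior codes across a transcript into a frequency profile."""
    profile = Counter()
    for turn in transcript:
        profile.update(classify_turn(turn))
    return profile

transcript = [
    "Why does the model recommend that option?",
    "Interesting, tell me more about the tradeoffs.",
    "Just tell me which one to pick.",
]
print(interaction_profile(transcript))
```

Profiles like this could then be correlated with flourishing measures, e.g. whether frequent decision delegation tracks with reduced sense of agency.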
Randomized Controlled Trials (RCTs)
RCTs provide the gold standard for establishing causal relationships between AI use and human flourishing outcomes:
- Design of controlled experiments with random assignment to different AI interaction conditions
- Longitudinal assessment tracking changes in flourishing metrics over extended periods
- Inclusion of diverse participant populations to ensure generalizable findings
- Implementation of validated psychological and behavioral measures aligned with flourishing dimensions
- Analysis of differential impacts based on individual differences, context, and use patterns
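The core mechanics of random assignment and between-arm comparison can be sketched in a few lines. Everything below is hypothetical scaffolding: the condition names, participant IDs, and wellbeing scores are invented for illustration, and a real trial would add pre-registered models, confidence intervals, and validated instruments rather than a bare difference in means.

```python
import random
import statistics

def randomize(participants, conditions, seed=0):
    """Randomly assign participants to conditions in (near-)equal blocks."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

def mean_difference(scores, assignment, treat="ai_assistant", control="no_ai"):
    """Point estimate of the treatment effect: difference in mean scores."""
    treat_scores = [s for p, s in scores.items() if assignment[p] == treat]
    control_scores = [s for p, s in scores.items() if assignment[p] == control]
    return statistics.mean(treat_scores) - statistics.mean(control_scores)

participants = [f"p{i}" for i in range(8)]
assignment = randomize(participants, ["ai_assistant", "no_ai"])
# Hypothetical post-study wellbeing scores on a 0-10 scale.
scores = {p: (7.0 if assignment[p] == "ai_assistant" else 6.0) for p in participants}
print(mean_difference(scores, assignment))
```

Seeded randomization as shown also makes the assignment reproducible for audit, which matters when trials feed into shared benchmarks.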
Human-AI Interaction Simulation
This approach uses controlled simulations to anticipate potential flourishing impacts:
- Development of computational models that simulate human-AI interactions across various scenarios
- Creation of synthetic interaction datasets to test hypotheses before studies with human participants
- Use of agent-based modeling to explore emergent patterns in human-AI ecosystems
- Incorporation of validated psychological models of human behavior and cognitive processing
- Iterative refinement through comparison with real-world interaction data
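A toy agent-based model illustrates the simulation approach. The agents, their single "flourishing" state, and the interaction rule below are all invented assumptions for the sketch; real simulations would incorporate validated psychological models and be calibrated against observed interaction data.

```python
import random
from dataclasses import dataclass

@dataclass
class SimulatedUser:
    """Toy agent with a scalar 'flourishing' state in [0, 1]."""
    flourishing: float = 0.5
    curiosity: float = 0.5  # propensity to initiate an interaction

def ai_response_quality(rng):
    """Stand-in for an AI policy; a study would plug in system-specific models."""
    return rng.uniform(0.3, 0.9)

def simulate(n_agents=100, steps=50, seed=42):
    """Run the toy ecosystem and return mean flourishing after all steps."""
    rng = random.Random(seed)
    agents = [SimulatedUser(flourishing=rng.random(), curiosity=rng.random())
              for _ in range(n_agents)]
    for _ in range(steps):
        for agent in agents:
            if rng.random() < agent.curiosity:  # agent chooses to interact
                quality = ai_response_quality(rng)
                # Each interaction nudges flourishing toward response quality.
                agent.flourishing += 0.1 * (quality - agent.flourishing)
    return sum(a.flourishing for a in agents) / n_agents

print(round(simulate(), 3))
```

Even at this level of abstraction, sweeping parameters such as response quality or interaction frequency can surface emergent population-level patterns worth testing in subsequent human studies.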
By integrating these three methodological approaches, our benchmark development will combine the ecological validity of real-world observation, the causal precision of experimental designs, and the exploratory potential of simulation. Each working group will consider how these approaches can be adapted to its specific flourishing dimension, potentially developing hybrid methodologies that leverage the strengths of multiple approaches.
Two-Day Workshop Plan
Pre-Workshop Requirements
Each working group must meet virtually 2-3 weeks before the workshop to:
- Review relevant literature in their assigned dimension
- Draft preliminary benchmark proposals and assessment methodologies
- Identify key challenges and research gaps
- Prepare a brief (3-slide) overview of their approach for Day 1
- Submit a one-page summary document for distribution to all participants
Day 1: Working Groups & Framework Development
8:30 - 9:00 AM: Registration & Welcome Coffee
9:00 - 9:30 AM: Opening Introduction
- Brief overview of initiative goals and expected outcomes
- Introduction to working group structure and process
- Review of pre-workshop preparation outcomes
9:30 - 10:15 AM: Targeted Keynote - "From Evaluation to Flourishing: A New Paradigm"
- Focus on actionable frameworks rather than theoretical perspectives
- Emphasis on experimental design challenges and solutions
10:15 - 10:30 AM: Break
10:30 AM - 12:30 PM: Working Group Session 1 - Framework Development
Participants work in six pre-assigned dimension groups:
- Team 1: Comprehension & Agency
- Team 2: Curiosity & Learning
- Team 3: Creativity & Expression
- Team 4: Physical & Mental Wellbeing
- Team 5: Healthy Social Lives
- Team 6: Sense of Purpose
Each team will:
- Refine the measurement framework for their dimension
- Outline specific metrics and experimental protocols
- Develop assessment tools for both laboratory and ecological settings
- Identify implementation challenges and solutions
12:30 - 1:30 PM: Working Lunch with Cross-Group Exchange
- Representatives rotate to neighboring groups to share initial ideas
- Structured feedback gathering using provided templates
- Identification of overlap and integration opportunities
1:30 - 3:00 PM: Working Group Session 2 - Framework Refinement
- Groups incorporate feedback from lunch exchange
- Develop detailed assessment protocols and measurement tools
- Begin drafting validation approaches
3:00 - 3:15 PM: Break
3:15 - 4:45 PM: Structured Feedback Session
- Each group gives a 10-minute work-in-progress presentation (polish is not expected)
- Designated respondents provide 5 minutes of structured feedback per group
- Brief open Q&A focused on constructive improvement
4:45 - 5:30 PM: Integration Workshop
- Interactive session identifying connections between dimensions
- Mapping of potential composite metrics
- Discussion of implementation priorities
5:30 - 7:00 PM: Working Dinner with Facilitated Discussion Tables
- Themed tables focused on cross-cutting challenges
- Facilitated problem-solving discussions
- Documentation of insights for Day 2
Day 2: Benchmark Refinement & Action Planning
8:30 - 9:00 AM: Morning Coffee & Reflection Exercise
- Structured reflection on Day 1 outcomes
- Identification of overnight insights
- Prioritization of Day 2 objectives
9:00 - 9:20 AM: Day 2 Direction Setting
- Brief synthesis of Day 1 outcomes
- Clarification of Day 2 deliverables
- Working group adjustments if needed
9:20 - 11:00 AM: Working Group Session 3 - Validation & Implementation
Groups continue work with focus on:
- Validation methodologies for proposed metrics
- Implementation pathways for different contexts
- Resource requirements and practical constraints
- Ethical considerations and safeguards
11:00 - 11:15 AM: Break
11:15 AM - 12:15 PM: Cross-Cutting Working Groups
Participants reorganize into mixed teams addressing:
- Team A: Integration of cognitive and emotional metrics
- Team B: Short vs. long-term assessment approaches
- Team C: Implementation in high-stakes vs. everyday AI systems
- Team D: Inclusivity and cultural considerations in benchmarking
12:15 - 1:15 PM: Working Lunch with Progress Documentation
- Structured documentation of morning accomplishments
- Preparation for afternoon feedback
1:15 - 2:45 PM: Comprehensive Feedback Workshop
- Gallery walk format with visual documentation of all frameworks
- Rotating feedback stations with structured prompts
- Digital capture of feedback for immediate integration
2:45 - 3:00 PM: Break
3:00 - 4:30 PM: Final Working Group Session - Action Planning
Original dimension groups reconvene to:
- Finalize benchmark frameworks incorporating all feedback
- Develop concrete implementation plans
- Establish validation study designs
- Create 90-day action plans for post-workshop progress
4:30 - 5:30 PM: Commitment Workshop
- Each group presents 5-minute action plans
- Structured commitment process for next steps
- Formation of ongoing working relationships
- Resource allocation and support needs identification
5:30 - 6:00 PM: Closing Integration & Next Steps
- Documentation of workshop achievements
- Clear assignment of post-workshop responsibilities
- Timeline for continued collaboration
Post-Workshop Working Group Deliverables:
- Comprehensive benchmark framework document for each dimension
- Detailed experimental protocols for validation studies
- Implementation guidelines for researchers and developers
- Open-source assessment tools and methodologies
- 90-day progress tracking system
- Quarterly virtual working group meetings schedule
- Plan for publication and dissemination of frameworks
This workshop emphasizes active collaboration, structured feedback, and concrete deliverables over traditional presentations. By requiring pre-work and focusing on working sessions, we aim to produce actionable benchmark frameworks that can immediately advance the field's approach to measuring AI's contribution to human flourishing.