Using MotherDuck with Hypermode

MotherDuck

Serverless analytics platform powered by DuckDB

Overview

MotherDuck is a serverless analytics platform that combines the speed of DuckDB with cloud convenience. This guide will walk you through connecting your Hypermode agent to MotherDuck, enabling powerful data analytics, complex queries, and insights generation using both your own data and MotherDuck’s rich sample datasets.

Prerequisites

Before connecting MotherDuck to Hypermode, you’ll need:

A MotherDuck account
MotherDuck API token for authentication
A Hypermode workspace

Setting up MotherDuck

Step 1: Create your MotherDuck account

If you haven’t already, sign up for a MotherDuck account to access serverless analytics capabilities.

Step 2: Access your workspace

Once logged in, you’ll see your MotherDuck workspace with access to:

Your personal databases
Sample datasets (NYC Taxi, TPC-H, etc.)
Query editor and analytics tools

Step 3: Generate an API token

Create an API token for secure access:

Navigate to Settings → Access Tokens in your MotherDuck dashboard
Click Create Token
Give your token a descriptive name (for example, “Hypermode Integration”)
Copy the generated token immediately

Store your API token securely! You won’t be able to see it again after creation, and it provides full access to your MotherDuck workspace.

Creating your MotherDuck agent

Step 1: Create a new agent

From the Hypermode interface, create a new agent manually:

Click the agent dropdown menu
Select “Create new Agent”

Step 2: Configure agent settings

Use these recommended settings for your MotherDuck agent:

Agent Name: AnalyticsAgent
Agent Title: Connects to MotherDuck Analytics
Description: AnalyticsAgent performs data analysis and generates insights
Instructions: You have a connection to MotherDuck and various other developer tools to perform advanced analytics, generate insights, and answer data questions. You can query large datasets, perform aggregations, and create data visualizations. You have access to sample datasets including NYC Taxi data, TPC-H benchmark data, and more.
Model: GPT-4.1

Step 3: View your agent profile

Once created, navigate to your agent’s settings page to see the profile:

Connecting to MotherDuck

Step 1: Add the MotherDuck connection

Navigate to the Connections tab and add MotherDuck:

Click “Add connection”
Select “MotherDuck” from the dropdown

Step 2: Configure credentials

Enter your MotherDuck credentials:

API Token: Your MotherDuck API token created in the previous step

Verifying the MotherDuck connection

Step 1: Test basic connectivity

Start a new thread and test the connection:

What databases are available in my MotherDuck workspace?

Expected result Your agent will show the available databases:

md_information_schema - System metadata
my_db - Your personal database
sample_data - Rich sample datasets

Step 2: Explore sample datasets

Test access to the sample datasets:

What sample datasets are available? Can you show me the schemas and table structures?

Expected result Your agent will discover these sample datasets:

NYC data (sample_data.nyc): taxi trips, service requests, rideshare data
Hacker News (sample_data.hn): 3.8M posts and comments
Movies (sample_data.kaggle): 41K movies with embeddings
WHO Air Quality (sample_data.who): Global ambient air quality data
Stack Overflow Survey (sample_data.stackoverflow_survey): Developer survey results

Step 3: Query sample data

Test with a simple query on the NYC taxi dataset:

Show me 3 sample taxi trips from the NYC dataset with their basic details

Expected result Your agent will return something like:

VendorID: 2, Pickup: 2022-11-04 23:13:01, Dropoff: 2022-11-04 23:25:38
Trip Distance: 1.78 miles, Total Amount: $16.56, Passengers: 4

Exploring built-in sample datasets

NYC taxi dataset (3.2M records)

The NYC taxi dataset contains detailed trip information with 19 fields including timestamps, locations, fares, and payment details:

Analyze NYC taxi usage patterns by hour of day - when are the busiest times?

Real results from the data:

Peak hours: 12-6 PM (157K-223K trips per hour)
Quietest time: 3-4 AM (23K trips)
Highest average fares: Early morning 4-6 AM ($27-30) due to airport trips
Lowest average fares: Mid-morning 9-11 AM (~$20-21)

Hacker News dataset (3.8M posts)

Complete Hacker News data including stories, comments, scores, and timestamps:

What are the highest-scoring Hacker News stories of all time?

Real top stories include:

“Mechanical Watch” - 4,298 points by todsacerdoti
“Google Search Is Dying” - 3,636 points by dbrereton
“My First Impressions of Web3” - 3,393 points by natdempk
Major news: “Queen Elizabeth II has died” - 2,827 points

Movies dataset (41K movies with embeddings)

Rich movie data with titles, overviews, and pre-computed embeddings for similarity search:

Find movies similar to "Toy Story" using semantic search

Dataset includes:

Movie titles and plot overviews
512-dimensional embeddings for semantic similarity
Popular films from Toy Story to modern releases

WHO air quality dataset

Global ambient air quality measurements with PM2.5, PM10, and NO2 concentrations:

Which cities have the highest air pollution levels?

Features:

PM2.5 and PM10 particulate matter concentrations
NO2 (nitrogen dioxide) levels
Geographic coordinates for mapping
Multi-year temporal data

Setting up your data environment

Step 1: Understand the data structure

Get familiar with the complete sample dataset structure:

Give me a detailed overview of all sample datasets including row counts, key fields, and potential analysis opportunities

Your agent will discover:

NYC Taxi: 3.2M trips with fare, location, and time data
Hacker News: 3.8M posts with scores, authors, and content
Movies: 41K films with embeddings for ML applications
Air Quality: WHO data with pollution measurements globally

Step 2: Create your own database (Optional)

You can create custom databases alongside the sample data:

Create a new database called 'my_analytics' for custom analysis

Step 3: Update agent instructions

Enhance your agent’s capabilities with specific dataset knowledge:

You are connected to MotherDuck with access to these verified sample datasets:

**NYC Data (sample_data.nyc.taxi)**: 3,252,717 taxi trip records from 2022
- Fields: pickup/dropoff times, locations (PULocationID/DOLocationID), fares, distances
- Great for: Time series analysis, location-based insights, revenue analysis

**Hacker News (sample_data.hn.hacker_news)**: 3,866,740 posts and comments
- Fields: title, score, author, timestamp, type (story/comment)
- Great for: Content analysis, trending topics, user behavior

**Movies (sample_data.kaggle.movies)**: 41,371 movies with embeddings
- Fields: title, overview, 512-dimensional embeddings
- Great for: Semantic similarity, recommendation systems

**WHO Air Quality (sample_data.who.ambient_air_quality)**: Global pollution data
- Fields: PM2.5/PM10/NO2 concentrations, coordinates, population
- Great for: Environmental analysis, geographic patterns

Always use the full table names with schema prefixes (e.g., sample_data.nyc.taxi).

Testing analytical operations

Test 1: Time series analysis with real data

Analyze actual taxi usage patterns:

Show me NYC taxi demand by hour with average fares - what patterns do you see?

Real insights from the data:

Clear daily patterns with rush hour peaks
Early morning premium pricing (airport runs)
Lowest activity 2-4 AM, highest 5-7 PM
Weekend vs weekday patterns visible in the data

Test 2: Content analysis

Explore Hacker News trending topics:

What topics generate the highest engagement on Hacker News based on story titles and scores?

Real findings:

Tech criticism (“Google Search Is Dying”) scores highly
Major news events (Queen Elizabeth, Musk/Twitter) get massive engagement
Technical deep-dives (“Mechanical Watch”) resonate with the community

Test 3: Geospatial analysis potential

While location IDs need lookup tables, the data structure supports location analysis:

Analyze pickup and dropoff location patterns in the taxi data

Test 4: Semantic similarity with embeddings

Leverage the pre-computed movie embeddings:

Using the movie embeddings, find films similar to popular titles

The movies dataset includes 512-dimensional embeddings perfect for similarity analysis and recommendation systems.

What you can do

With your MotherDuck connection established, your agent can:

Perform complex analytics on millions of records with DuckDB’s speed
Generate business insights from real-world datasets
Run time series analysis on temporal data (taxi trips, HN posts)
Execute semantic search using pre-computed embeddings
Analyze geographic patterns with coordinate data
Process text data from movie overviews and HN content
Handle big data efficiently with columnar processing
Create statistical summaries and trend analysis
Integrate with other tools for comprehensive workflows

Advanced analytical capabilities with real examples

Window functions and analytical SQL

Calculate running totals and moving averages for taxi revenue by day using the actual NYC data

Text analysis on real content

Analyze Hacker News story titles to identify trending topics and sentiment patterns

Embeddings and similarity search

Use the movie embeddings to build a recommendation system - find movies similar to "Toy Story"

Environmental data analysis

Identify the most polluted cities globally using WHO air quality measurements

Best practices for the sample datasets

NYC Taxi Data: Always filter by date ranges for performance; use appropriate aggregations for time-based analysis
Hacker News: Consider story vs comment types; use score thresholds for quality filtering
Movies: Leverage embeddings for similarity; combine with text analysis of overviews
Air Quality: Account for different measurement standards; filter by data quality indicators

Sample analytical workflows with real data

Peak demand prediction

Using historical taxi patterns, predict optimal driver deployment times

Real insight The data shows 4-6 AM has highest fares but lowest volume - perfect for premium positioning.

Community engagement analysis

Identify what content types perform best on Hacker News

Real finding Technical deep-dives and industry criticism generate highest engagement scores.

Environmental health correlation

Correlate air quality data with population density to identify at-risk areas

Content recommendation engine

Build a semantic movie recommendation system using the pre-computed embeddings

Advanced analytics examples

Seasonal trend analysis

Identify seasonal patterns in taxi usage and predict demand for different times of year

A/B testing analysis

Design and analyze A/B tests using statistical functions to determine significance of results

Predictive analytics preparation

Create features and prepare datasets for machine learning models to predict taxi demand or customer behavior

Troubleshooting

Common query issues with sample data

Table not found errors: Always use full schema names like sample_data.nyc.taxi
Performance with large datasets: Use LIMIT and appropriate WHERE clauses
Memory constraints: The taxi dataset has 3.2M rows - be mindful of result sizes
Embedding queries: Movie embeddings are 512-dimensional arrays - use appropriate similarity functions

Query performance optimization

Filter early: NYC taxi data benefits from date/time filtering
Use appropriate indexes: MotherDuck optimizes automatically but be mindful of query patterns
Batch large operations: For analysis across millions of HN posts, consider sampling strategies

The sample datasets are production-ready for analysis and perfect for learning advanced SQL patterns. Start with the NYC taxi data for time-series practice, use Hacker News for text analysis, leverage movie embeddings for ML applications, and explore WHO data for geospatial analysis.

The combination of real data scale (millions of records) with advanced DuckDB features makes MotherDuck perfect for production-grade analytics learning and development.

Learn more

Combine MotherDuck with other Hypermode connections to create powerful data-driven workflows. For example, use Slack to share analytical insights, GitHub to version control your analytical queries, or Stripe to combine payment data with other business metrics for comprehensive reporting.

Hypermode

Agents

Resources

​MotherDuck

​Overview

​Prerequisites

​Setting up MotherDuck

​Step 1: Create your MotherDuck account

​Step 2: Access your workspace

​Step 3: Generate an API token

​Creating your MotherDuck agent

​Step 1: Create a new agent

​Step 2: Configure agent settings

​Step 3: View your agent profile

​Connecting to MotherDuck

​Step 1: Add the MotherDuck connection

​Step 2: Configure credentials

​Verifying the MotherDuck connection

​Step 1: Test basic connectivity

​Step 2: Explore sample datasets

​Step 3: Query sample data

​Exploring built-in sample datasets

​NYC taxi dataset (3.2M records)

​Hacker News dataset (3.8M posts)

​Movies dataset (41K movies with embeddings)

​WHO air quality dataset

​Setting up your data environment

​Step 1: Understand the data structure

​Step 2: Create your own database (Optional)

​Step 3: Update agent instructions

​Testing analytical operations

​Test 1: Time series analysis with real data

​Test 2: Content analysis

​Test 3: Geospatial analysis potential

​Test 4: Semantic similarity with embeddings

​What you can do

​Advanced analytical capabilities with real examples

​Window functions and analytical SQL

​Text analysis on real content

​Embeddings and similarity search

​Environmental data analysis

​Best practices for the sample datasets

​Sample analytical workflows with real data

​Peak demand prediction

​Community engagement analysis

​Environmental health correlation

​Content recommendation engine

​Advanced analytics examples

​Seasonal trend analysis

​A/B testing analysis

​Predictive analytics preparation

​Troubleshooting

​Common query issues with sample data

​Query performance optimization

​Learn more

MotherDuck

Overview

Prerequisites

Setting up MotherDuck

Step 1: Create your MotherDuck account

Step 2: Access your workspace

Step 3: Generate an API token

Creating your MotherDuck agent

Step 1: Create a new agent

Step 2: Configure agent settings

Step 3: View your agent profile

Connecting to MotherDuck

Step 1: Add the MotherDuck connection

Step 2: Configure credentials

Verifying the MotherDuck connection

Step 1: Test basic connectivity

Step 2: Explore sample datasets

Step 3: Query sample data

Exploring built-in sample datasets

NYC taxi dataset (3.2M records)

Hacker News dataset (3.8M posts)

Movies dataset (41K movies with embeddings)

WHO air quality dataset

Setting up your data environment

Step 1: Understand the data structure

Step 2: Create your own database (Optional)

Step 3: Update agent instructions

Testing analytical operations

Test 1: Time series analysis with real data

Test 2: Content analysis

Test 3: Geospatial analysis potential

Test 4: Semantic similarity with embeddings

What you can do

Advanced analytical capabilities with real examples

Window functions and analytical SQL

Text analysis on real content

Embeddings and similarity search

Environmental data analysis

Best practices for the sample datasets

Sample analytical workflows with real data

Peak demand prediction

Community engagement analysis

Environmental health correlation

Content recommendation engine

Advanced analytics examples

Seasonal trend analysis

A/B testing analysis

Predictive analytics preparation

Troubleshooting

Common query issues with sample data

Query performance optimization

Learn more