Build your first Dagster pipeline
Welcome to Dagster! In this guide, we'll cover:
- Setting up a basic Dagster project using Dagster OSS for local development
- Creating a single Dagster asset that encapsulates the entire Extract, Transform, and Load (ETL) process
- Using Dagster's UI to monitor and execute your pipeline
- Deploying your changes to the cloud
If you have created a project through the Dagster+ Serverless UI, see the Dagster+ Serverless quickstart guide instead.
Prerequisites
Before getting started, make sure you have installed the following prerequisites:
- Python 3.9+
- If using `uv` as your package manager (recommended), you will need to install `uv`.
- If using `pip` as your package manager, you will need to install the `create-dagster` CLI with Homebrew, `curl`, or `pip`.
For detailed instructions, see the Installation guide.
Step 1: Scaffold a new Dagster project
If you are using uv:

1. Open your terminal and scaffold a new Dagster project: `uvx create-dagster@latest project dagster-quickstart`
2. Respond `y` to the prompt to run `uv sync` after scaffolding.
3. Change to the `dagster-quickstart` directory: `cd dagster-quickstart`
4. Activate the virtual environment:
   - MacOS/Unix: `source .venv/bin/activate`
   - Windows: `.venv\Scripts\activate`
5. Install the required dependencies in the virtual environment: `uv add pandas`
If you are using pip:

1. Open your terminal and scaffold a new Dagster project: `create-dagster project dagster-quickstart`
2. Change to the `dagster-quickstart` directory: `cd dagster-quickstart`
3. Create and activate a virtual environment:
   - MacOS/Unix: `python -m venv .venv`, then `source .venv/bin/activate`
   - Windows: `python -m venv .venv`, then `.venv\Scripts\activate`
4. Install the required dependencies: `pip install pandas`
5. Install your project as an editable package: `pip install --editable .`
Your new Dagster project should have the following structure:
With uv:
.
└── dagster-quickstart
   ├── pyproject.toml
   ├── src
   │   └── dagster_quickstart
   │       ├── __init__.py
   │       ├── definitions.py
   │       └── defs
   │           └── __init__.py
   ├── tests
   │    └── __init__.py
   └── uv.lock
With pip:
.
└── dagster-quickstart
   ├── pyproject.toml
   ├── src
   │   └── dagster_quickstart
   │       ├── __init__.py
   │       ├── definitions.py
   │       └── defs
   │           └── __init__.py
   └── tests
      └── __init__.py
Step 2: Scaffold an assets file
Use the `dg scaffold defs` command to generate an assets file on the command line:
dg scaffold defs dagster.asset assets.py
This will add a new file, `assets.py`, to the `defs` directory:
src
└── dagster_quickstart
   ├── __init__.py
   └── defs
      ├── __init__.py
      └── assets.py
Step 3: Add data
Next, create a `sample_data.csv` file. This file will act as the data source for your Dagster pipeline:
mkdir src/dagster_quickstart/defs/data && touch src/dagster_quickstart/defs/data/sample_data.csv
In your preferred editor, copy the following data into this file:
id,name,age,city
1,Alice,28,New York
2,Bob,35,San Francisco
3,Charlie,42,Chicago
4,Diana,31,Los Angeles
Step 4: Define the asset
To define the asset for the ETL pipeline, open the `src/dagster_quickstart/defs/assets.py` file in your preferred editor and copy in the following code:
import pandas as pd

import dagster as dg

sample_data_file = "src/dagster_quickstart/defs/data/sample_data.csv"
processed_data_file = "src/dagster_quickstart/defs/data/processed_data.csv"


@dg.asset
def processed_data():
    ## Read data from the CSV
    df = pd.read_csv(sample_data_file)

    ## Add an age_group column based on the value of age
    df["age_group"] = pd.cut(
        df["age"], bins=[0, 30, 40, 100], labels=["Young", "Middle", "Senior"]
    )

    ## Save processed data
    df.to_csv(processed_data_file, index=False)
    return "Data loaded successfully"
At this point, you can list the Dagster definitions in your project with `dg list defs`. You should see the asset you just created:
dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions                                               ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets  │ ┏━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓ │
│         │ ┃ Key            ┃ Group   ┃ Deps ┃ Kinds ┃ Description ┃ │
│         │ ┡━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩ │
│         │ │ processed_data │ default │      │       │             │ │
│         │ └────────────────┴─────────┴──────┴───────┴─────────────┘ │
└─────────┴───────────────────────────────────────────────────────────┘
You can also load and validate your Dagster definitions with `dg check defs`:
dg check defs
All component YAML validated successfully.
All definitions loaded successfully.
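Because assets are plain Python definitions, you can also materialize this one in-process from a test. The sketch below assumes a new file, `tests/test_assets.py` (the file name and test body are ours, not part of the scaffold), and should be run from the project root so the relative CSV paths resolve:

# tests/test_assets.py -- a minimal sketch, not part of the scaffold
import dagster as dg

from dagster_quickstart.defs.assets import processed_data


def test_processed_data():
    # Materialize the asset in-process; the run must complete without errors
    result = dg.materialize([processed_data])
    assert result.success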
Step 5: Run your pipeline
1. In the terminal, navigate to your project's root directory and run: `dg dev`
2. Open your web browser and navigate to http://localhost:3000, where you should see the Dagster UI.
3. In the top navigation, click the Assets tab, then click View lineage.
4. To run the pipeline, click Materialize.
5. To view the run as it executes, click the Runs tab, then on the right side of the page, click View. To change how the run is displayed, you can use the view buttons in the top left corner of the page.
You can also run the pipeline by using the `dg launch --assets` command and passing an asset selection:
dg launch --assets "*"
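Here, `"*"` is an asset selection that matches every asset in the project; you could instead pass a single key, such as `processed_data`, to launch only that asset.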
Step 6: Verify the results
In your terminal, run:
cat src/dagster_quickstart/defs/data/processed_data.csv
You should see the transformed data, including the new age_group column:
id,name,age,city,age_group
1,Alice,28,New York,Young
2,Bob,35,San Francisco,Middle
3,Charlie,42,Chicago,Senior
4,Diana,31,Los Angeles,Middle
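If you prefer to verify the output programmatically, a small pandas sketch (the assertions are ours) confirms that every row received an age group:

import pandas as pd

# Read the file the asset wrote, relative to the project root
df = pd.read_csv("src/dagster_quickstart/defs/data/processed_data.csv")

# Every row should have a non-null age_group value
assert "age_group" in df.columns
assert df["age_group"].notna().all()
print(df)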
Step 7: Deploy to production
Once you have run your pipeline locally, you can optionally deploy it to production.
For OSS:

To deploy to OSS production, see the OSS deployment docs. If you have already set up a production OSS deployment with an existing project, you will need to create a `workspace.yaml` file to tell your deployment where to find each project (also known as a code location).
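For example, a minimal `workspace.yaml` for this project might look like the sketch below (the module path assumes the scaffolded layout above; adjust it to your deployment):

# workspace.yaml -- a minimal sketch; adjust to your deployment
load_from:
  - python_module:
      module_name: dagster_quickstart.definitions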
For Dagster+ Hybrid:

1. Set up a Hybrid deployment, if you haven't already.
2. In the root directory of your project, run `dg scaffold build-artifacts` to create a `build.yaml` deployment configuration file and a Dockerfile.
3. To deploy to the cloud, you can either:
   - Perform a one-time deployment with the `dagster-cloud` CLI
   - Set up CI/CD for continuous deployment

With Dagster+ Hybrid, you can also use branch deployments to safely test your changes against production data.
Next steps
Congratulations! You've just built and run your first pipeline with Dagster. Next, you can:
- Follow the Tutorial to learn how to build a more complex ETL pipeline
- Check out our Python primer series for an in-depth tour of Python modules, packages, and imports
- Create your own Dagster project, add assets and integrations, and automate your pipeline
- Test your pipelines with asset checks and debug them in real time with pdb