Dataset class, which provides a unified interface for handling experimental measurements, simulation results, and synthetic data. This guide covers the essential workflows for creating, importing, manipulating, and exporting datasets in various formats commonly used in biochemical research.
Understanding Dataset Structure
The Dataset class is the central data container in Catalax. It handles the complexities of biochemical data behind a clean, consistent interface, so understanding its structure is essential for working with experimental and computational data.
Core Components
A Dataset contains several key components that work together to organize and manage your data:
states: A list of state names that defines which states, such as molecules, proteins, or process variables, are tracked in this dataset. This serves as the schema that ensures consistency across all measurements.
measurements: A list of Measurement objects, where each measurement represents one experimental condition, simulation run, or data point in your study.
name, description: Metadata fields that help organize and document your datasets for reproducibility, sharing, and long-term data management.
id: A unique identifier that distinguishes this dataset from others, automatically generated to ensure uniqueness.
type: Classification of the dataset (measurement, simulation, or prediction) that helps organize different types of data in your research workflow.
Measurement Structure
Each individual Measurement within a dataset contains the detailed information for one experimental condition or simulation run:
initial_conditions: A dictionary mapping state names to their initial concentrations, which serves as the starting point for simulation or represents the experimental setup conditions.
time: An array of time points at which measurements were taken. This can be None for datasets that only contain initial conditions (such as when setting up simulations).
data: A dictionary that maps each state name to its complete concentration time series, providing the full temporal evolution of the system under the given conditions.
id: A unique identifier for the individual measurement, allowing precise referencing and data retrieval.
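The two structures described above can be pictured with a small stand-in built from plain Python dataclasses. This is only a sketch of the layout, not Catalax's actual classes: the names MeasurementSketch and DatasetSketch, the validate() helper, the use of UUID strings for ids, and the concentration values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import uuid4


@dataclass
class MeasurementSketch:
    """Hypothetical stand-in for one experimental condition or simulation run."""
    initial_conditions: Dict[str, float]
    time: Optional[List[float]] = None  # None when only initial conditions exist
    data: Dict[str, List[float]] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid4()))  # assumed id format


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for the dataset container."""
    states: List[str]
    measurements: List[MeasurementSketch] = field(default_factory=list)
    name: str = ""
    description: str = ""
    type: str = "measurement"  # or "simulation" / "prediction"
    id: str = field(default_factory=lambda: str(uuid4()))  # assumed id format

    def validate(self) -> None:
        """Check each measurement against the states schema (illustrative helper)."""
        allowed = set(self.states)
        for m in self.measurements:
            unknown = (set(m.initial_conditions) | set(m.data)) - allowed
            if unknown:
                raise ValueError(f"states not in schema: {sorted(unknown)}")
            if m.time is not None:
                for state, series in m.data.items():
                    if len(series) != len(m.time):
                        raise ValueError(f"{state}: series length differs from time")


# Made-up values: one measurement with two states sampled at three time points.
meas = MeasurementSketch(
    initial_conditions={"S": 100.0, "P": 0.0},
    time=[0.0, 1.0, 2.0],
    data={"S": [100.0, 80.0, 65.0], "P": [0.0, 20.0, 35.0]},
)
dataset = DatasetSketch(states=["S", "P"], measurements=[meas], name="demo")
dataset.validate()
```

The point of the sketch is the role of states as a schema: every key in initial_conditions and data must be one of the declared state names, and each time series must line up with the time array.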
Creating Datasets
From Models
The most common way to create a new dataset is from an existing model, which automatically sets up the correct states structure.
Adding Initial Conditions
Once you have a dataset structure, you can add initial conditions that represent different experimental scenarios or simulation starting points. add_initial() creates a new Measurement object with the specified initial conditions, so you can build datasets that represent complex experimental designs with multiple conditions and replicates.
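The workflow of deriving a dataset from a model and then adding initial conditions can be sketched with self-contained stand-in classes. Nothing here is Catalax's real API: ToyModel, ToyMeasurement, and ToyDataset are hypothetical, and the keyword-argument form of add_initial(), its return value, and the rule that every state must receive a value are assumptions chosen for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import uuid4


@dataclass
class ToyModel:
    """Hypothetical model stand-in; only its state names matter here."""
    name: str
    states: List[str]


@dataclass
class ToyMeasurement:
    initial_conditions: Dict[str, float]
    time: Optional[List[float]] = None  # stays None until data is attached
    data: Dict[str, List[float]] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid4()))


@dataclass
class ToyDataset:
    states: List[str]
    name: str = ""
    measurements: List[ToyMeasurement] = field(default_factory=list)

    @classmethod
    def from_model(cls, model: ToyModel) -> "ToyDataset":
        # Copy the state names so the dataset schema matches the model.
        return cls(states=list(model.states), name=model.name)

    def add_initial(self, **initial_conditions: float) -> ToyMeasurement:
        # Assumption: every state needs exactly one initial value.
        if set(initial_conditions) != set(self.states):
            raise ValueError(f"expected values for exactly {self.states}")
        measurement = ToyMeasurement(initial_conditions=dict(initial_conditions))
        self.measurements.append(measurement)
        return measurement


# Made-up example: two starting substrate concentrations for a two-state model.
model = ToyModel(name="toy_model", states=["S", "P"])
dataset = ToyDataset.from_model(model)
dataset.add_initial(S=100.0, P=0.0)
dataset.add_initial(S=50.0, P=0.0)
```

Each call appends one measurement whose time is None and whose data is empty, which matches the description of datasets that only hold initial conditions ahead of a simulation.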