How to Write Efficient Python Data Classes
Writing efficient Python data classes cuts boilerplate while keeping your code clean. This article shows you how.
# Introduction
Standard Python classes store attributes in per-instance dictionaries and compare and hash instances by identity, so value-based equality and hashing must be written by hand. A data class with default settings compares every field and, as a result, is not hashable. These defaults are sensible but not optimized for applications that create many instances or need objects as cache keys.
Data classes address these limitations through configuration rather than custom code. You can use parameters to change how instances behave and how much memory they use. Field-level settings also allow you to exclude attributes from comparisons, define safe defaults for mutable values, or control how initialization works.
This article focuses on the key data class capabilities that improve efficiency and maintainability without adding complexity.
You can find the code on GitHub.
# 1. Frozen Data Classes for Hashability and Safety
Making your data classes immutable provides hashability. This allows you to use instances as dictionary keys or store them in sets, as shown below:
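The following is a minimal sketch of the idea. The `CacheKey` class and its fields are hypothetical names chosen for illustration, not taken from the original code.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class CacheKey:
    # Hypothetical fields, chosen only for illustration
    user_id: int
    resource: str


key = CacheKey(user_id=42, resource="profile")

# Instances with equal field values hash equally, so a fresh instance finds the entry
cache = {key: "cached profile data"}
lookup = cache[CacheKey(user_id=42, resource="profile")]

# Sets deduplicate instances whose fields are equal
unique_keys = {key, CacheKey(42, "profile"), CacheKey(7, "settings")}

# Assigning to a field after initialization raises FrozenInstanceError
try:
    key.user_id = 99
    mutated = True
except FrozenInstanceError:
    mutated = False
```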
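The frozen=True parameter blocks assignment to fields after initialization and, together with the default eq=True, makes the decorator generate __hash__() from the fields. Without it, a default data class sets __hash__ to None, so you would encounter a TypeError when trying to use instances as dictionary keys. Note that frozen only prevents rebinding a field: a mutable value stored in a field, such as a list, can still be changed in place.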
This pattern is essential for building caching layers, deduplication logic, or any data structure requiring hashable types. The immutability also prevents entire categories of bugs where state gets modified unexpectedly.
# 2. Slots for Memory Efficiency
When you instantiate thousands of objects, memory overhead compounds quickly. Here is an example:
The slots=True parameter (available from Python 3.10) eliminates the per-instance __dict__ that Python normally creates. Instead of storing attributes in a dictionary, slotted instances keep them in fixed positions inside the object itself.
For a simple data class like this, you avoid the memory cost of a dictionary for every instance and typically get slightly faster attribute access. The tradeoff is that you cannot add new attributes dynamically.
# 3. Custom Equality with Field Parameters
You often do not need every field to participate in equality checks. This is especially true when dealing with metadata or timestamps, as in the following example:
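Here is a small sketch of such a class. The `User` class, its field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class User:
    user_id: int
    email: str
    # Tracking metadata: excluded from the generated __eq__()
    last_login: datetime = field(compare=False, default_factory=datetime.now)
    login_count: int = field(compare=False, default=0)


first = User(1, "alice@example.com", datetime(2024, 1, 1), login_count=5)
second = User(1, "alice@example.com", datetime(2024, 6, 1), login_count=42)
other = User(2, "bob@example.com")

print(first == second)  # True: same ID and email, different metadata
print(first == other)   # False: ID and email differ
```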
The compare=False parameter on a field excludes it from the auto-generated __eq__() method.
Here, two users are considered equal if they share the same ID and email, regardless of when they logged in or how many times. This prevents spurious inequality when comparing objects that represent the same logical entity but have different tracking metadata.
# 4. Factory Functions with Default Factory
Using mutable defaults in function signatures is a Python gotcha. Data classes provide a clean solution:
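A minimal sketch, assuming a hypothetical `ShoppingCart` class with an `items` list and a `quantities` dict:

```python
from dataclasses import dataclass, field


@dataclass
class ShoppingCart:
    owner: str
    # Each instance gets its own fresh list and dict
    items: list[str] = field(default_factory=list)
    quantities: dict[str, int] = field(default_factory=dict)


cart_a = ShoppingCart("alice")
cart_b = ShoppingCart("bob")

# Mutating one cart leaves the other untouched
cart_a.items.append("book")
cart_a.quantities["book"] = 2
```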
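The default_factory parameter takes a callable that generates a new default value for each instance. Without it, writing items: list = [] does not even define the class: the dataclass decorator raises a ValueError for list, dict, and set defaults, precisely to protect you from a single list shared across all instances, the classic mutable default gotcha.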
This pattern works for lists, dicts, sets, or any mutable type. You can also pass custom factory functions for more complex initialization logic.
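As a sketch of a custom factory, the example below uses a hypothetical `new_request_id` helper and a lambda. The `Request` class and its fields are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass, field


def new_request_id() -> str:
    # Hypothetical helper: returns a random 32-character hex identifier
    return uuid.uuid4().hex


@dataclass
class Request:
    path: str
    # Named function as the factory: called once per instance
    request_id: str = field(default_factory=new_request_id)
    # A lambda works when the default needs pre-filled contents
    headers: dict[str, str] = field(
        default_factory=lambda: {"Accept": "application/json"}
    )


r1 = Request("/users")
r2 = Request("/orders")
```

Any zero-argument callable works as a factory, so the same approach extends to timestamps, counters, or nested data classes.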
# 5. Post-Initialization Processing
Sometimes you need to derive fields or validate data after the auto-generated __init__() runs. Here is how you can achieve this using the __post_init__() hook:
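The sketch below validates its inputs and computes a derived field. The `Rectangle` class and its validation rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    # init=False keeps area out of __init__(); it is computed afterwards
    area: float = field(init=False)

    def __post_init__(self):
        # Called automatically at the end of the generated __init__()
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        self.area = self.width * self.height


rect = Rectangle(3.0, 4.0)
print(rect.area)  # 12.0

try:
    Rectangle(-1.0, 2.0)
    rejected = False
except ValueError:
    rejected = True
```

Marking the derived field with init=False means callers cannot pass it, so it always reflects the validated inputs at construction time.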