How to use Python dataclasses

Wednesday, October 22, 2025, 11:00, by InfoWorld

Everything in Python is an object, or so the saying goes. If you want to create your own custom objects, with their own properties and methods, you use Python’s class object to do it. But creating classes in Python sometimes means writing loads of repetitive, boilerplate code; for example, to set up the class instance from the parameters passed to it or to create common functions like comparison operators.

Dataclasses, introduced in Python 3.7 (and available for Python 3.6 through a backport package on PyPI), provide a handy, less verbose way to create classes. Many of the common things you do in a class, like initializing properties from the arguments passed to the class, can be reduced to a few basic instructions by using dataclasses.

The backstage power of Python dataclasses

Consider this example of a conventional class in Python:

class Book:
    '''Object for tracking physical books in a collection.'''
    def __init__(self, name: str, weight: float, shelf_id: int = 0):
        self.name = name
        self.weight = weight  # in grams, for calculating shipping
        self.shelf_id = shelf_id
    def __repr__(self):
        return (f'Book(name={self.name!r}, '
                f'weight={self.weight!r}, shelf_id={self.shelf_id!r})')

The biggest headache here is that you must copy each of the arguments passed to __init__ to the object’s properties. This isn’t so bad if you’re only dealing with Book, but what if you have additional classes—say, a Bookshelf, Library, Warehouse, and so on? Plus, typing all that code by hand increases your chances of making a mistake.

Here’s the same class implemented as a Python dataclass:

from dataclasses import dataclass

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float
    shelf_id: int = 0

When you specify properties, called fields, in a dataclass, the @dataclass decorator automatically generates all the code needed to initialize them. It also preserves the type information for each field, so if you use a linting tool or type checker that reads type hints (mypy, for example), it can flag places where you pass the wrong kinds of values to the class constructor. Python itself does not enforce these annotations at runtime.

Another thing @dataclass does behind the scenes is automatically create common dunder methods for the class; by default, these are __init__, __repr__, and __eq__. In the conventional class above, we had to write our own __repr__. In the dataclass, the @dataclass decorator generates the __repr__ for you. You can still override the generated methods, but you don't need to write code by hand for the most common cases.
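
The sketch below exercises the dataclass version of Book to show the generated methods at work. The titles and weights are made-up sample values, and the repr shown in the comments is what the generated __repr__ produces for a class defined at module level.

```python
from dataclasses import dataclass

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float
    shelf_id: int = 0

# The generated __init__ takes the fields in declaration order;
# shelf_id falls back to its default of 0.
book = Book("Dune", 540.0)

# The generated __repr__ lists every field by name.
print(book)  # Book(name='Dune', weight=540.0, shelf_id=0)

# The generated __eq__ compares instances field by field.
print(book == Book("Dune", 540.0))     # True
print(book == Book("Dune", 540.0, 3))  # False

# Annotations are not checked at runtime; only a type checker would flag this.
odd = Book(name=42, weight="heavy")
print(odd)  # Book(name=42, weight='heavy', shelf_id=0)
```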

Once a dataclass is created, it is functionally identical to a regular class, and working with its instances carries no performance penalty. The only cost is a small, one-time one paid when the class definition is executed and the @dataclass decorator generates its methods, not each time you create an instance.
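
One way to check that a dataclass really is an ordinary class is to inspect it with standard tools. This minimal sketch uses only the standard library's inspect and dataclasses modules; it confirms there is no special metaclass or base class and that the generated __init__ has a normal signature.

```python
import inspect
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Book:
    name: str
    weight: float
    shelf_id: int = 0

# @dataclass returns the same class it was given, with methods added;
# there is no special metaclass or base class involved.
assert type(Book) is type
assert Book.__mro__ == (Book, object)

# The generated __init__ is an ordinary method with a normal signature.
print(inspect.signature(Book))

# The field metadata recorded by the decorator stays available for introspection.
print([f.name for f in fields(Book)])  # ['name', 'weight', 'shelf_id']
```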

Advanced Python dataclass initialization

The dataclass decorator can take initialization options of its own. Most of the time, you won’t need to supply them, but they can come in handy for certain edge cases. Here are some of the most useful ones (they’re all True/False):

frozen: Generates class instances that are read-only. Once an instance is created, assigning to any of its fields raises dataclasses.FrozenInstanceError. This is useful if instances of the dataclass are intended to be hashable, which allows them (among other things) to be used as dictionary keys or set members. If you set frozen=True (and leave eq at its default of True), the generated dataclass also automatically gets a __hash__ method. (You can also use unsafe_hash=True to generate a __hash__ method regardless of whether the dataclass is read-only, but that is unsafe for mutable instances: if a field changes after an instance has been used as a dictionary key, its hash changes and lookups can fail.)

slots: Generates a __slots__ declaration for the class (Python 3.10 and later), so instances store only the fields defined in the class and have no per-instance __dict__; you can't add other attributes to them. This reduces memory use, but the savings really only show up at scale, for example when you create thousands of instances or more. If you're only creating a handful of dataclass instances at a time, it probably isn't worth it.

kw_only: Makes all fields of the class keyword-only (Python 3.10 and later), so they must be passed as keyword arguments rather than positional arguments. This makes call sites more explicit, and it fits naturally with supplying a dataclass instance's arguments from a dictionary, as in Book(**data).

Customizing Python dataclass fields

The default behavior of dataclasses is fine for most use cases. Sometimes, though, you need to fine-tune how the fields in your dataclass are initialized. The following code sample demonstrates how to use the field function to do that:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: str = field(compare=False)
    weight: float = field(default=0.0, repr=False)
    shelf_id: int = 0
    chapters: List[str] = field(default_factory=list)

When you assign a call to field() as a field's default value, it changes how the field is set up, depending on the parameters you pass. These are the most commonly used options for field (though there are others):

default: Sets the default value for the field. Because the call to field() takes the spot where a plain default value would normally go, use default when you a) use field to change any other parameters of the field, and b) also want to give the field a default value. In the above example, we used default to set weight to 0.0.

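default_factory: Takes a callable with no parameters (such as a function, or a class like list) that returns the object to use as the field's default value. The factory is called for each new instance, so every instance gets its own object. In the example, we wanted chapters to default to a new, empty list; writing chapters: List[str] = [] directly would make @dataclass raise a ValueError, since a single mutable default would otherwise be shared by all instances.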

repr: Controls whether the field appears in the automatically generated __repr__ for the dataclass (True by default). In this case, we didn't want the book's weight shown in the __repr__, so we used repr=False to omit it.

compare: Controls whether the field is included in the comparison methods automatically generated for the dataclass (True by default). Here, we didn't want condition to be used when comparing two books, so we set compare=False.

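Note that we ordered the fields so the non-default fields appear first. In the generated __init__, a field without a default can't follow one that has a default; if it does, @dataclass raises a TypeError.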
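
Here is a short sketch that puts the field options from the example to work. The book titles and conditions are made-up sample values, and the Shelf and BadBook classes are hypothetical, defined only to show the two errors @dataclass raises at class-definition time.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: str = field(compare=False)
    weight: float = field(default=0.0, repr=False)
    shelf_id: int = 0
    chapters: List[str] = field(default_factory=list)

a = Book("Dune", "Good", 540.0)
b = Book("Dune", "Worn", 540.0)

# repr=False: weight is left out of the generated __repr__.
print(a)  # Book(name='Dune', condition='Good', shelf_id=0, chapters=[])

# compare=False: condition is ignored by __eq__, so these compare equal.
print(a == b)  # True

# default_factory=list: each instance gets its own, separate list.
a.chapters.append("Book One")
print(b.chapters)  # []

# A bare mutable default is rejected when the class is defined.
try:
    @dataclass
    class Shelf:  # hypothetical
        books: List[str] = []
except ValueError as exc:
    print(exc)

# So is a field without a default placed after one that has a default.
try:
    @dataclass
    class BadBook:  # hypothetical
        shelf_id: int = 0
        name: str
except TypeError as exc:
    print(exc)
```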

Controlling Python dataclass initialization

At this point, you might be wondering, “How do I get finer-grained control over the init process if the __init__ method of a dataclass is generated automatically?” In these cases, you can use the __post_init__ method or the InitVar type.

__post_init__

If you include the __post_init__ method in your dataclass definition, you can provide instructions for modifying fields or other instance data:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)
    chapters: List[str] = field(default_factory=list)
    condition: str = field(default='Good', compare=False)

    def __post_init__(self):
        if self.condition == 'Discarded':
            self.shelf_id = None
        else:
            self.shelf_id = 0

In this example, we've created a __post_init__ method that sets shelf_id to None if the book's condition is initialized as 'Discarded', and to 0 otherwise. Note how we use field to declare shelf_id, and pass init=False to field. This means shelf_id isn't a parameter of the generated __init__, but it is still registered as a field of the dataclass, with type information, and __post_init__ gives it a value.
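
To see __post_init__ in action, the sketch below uses a trimmed copy of the Book class above that keeps only the fields this demonstration needs; the sample titles are made up.

```python
from dataclasses import dataclass, field
from typing import Optional

# Trimmed copy of the Book class above, keeping only the fields this demo needs.
@dataclass
class Book:
    name: str
    shelf_id: Optional[int] = field(init=False)
    condition: str = field(default='Good', compare=False)

    def __post_init__(self):
        # Called at the end of the generated __init__, after the
        # arguments passed to __init__ have been assigned.
        if self.condition == 'Discarded':
            self.shelf_id = None
        else:
            self.shelf_id = 0

kept = Book("Dune")
tossed = Book("Old Atlas", condition="Discarded")
print(kept)    # Book(name='Dune', shelf_id=0, condition='Good')
print(tossed)  # Book(name='Old Atlas', shelf_id=None, condition='Discarded')

# shelf_id is not a parameter of __init__, so passing it fails.
try:
    Book("Dune", shelf_id=5)
except TypeError as exc:
    print(exc)
```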

InitVar

Another way to customize Python dataclass setup is to use the InitVar type. This lets you specify a field that will be passed to __init__ and then to __post_init__, but won’t be stored in the class instance.

By using InitVar, you can take in parameters when setting up the dataclass that are only used during initialization. Here’s an example:

from dataclasses import dataclass, field, InitVar
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: InitVar[str] = 'Good'
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)
    chapters: List[str] = field(default_factory=list)

    def __post_init__(self, condition):
        if condition == 'Unacceptable':
            self.shelf_id = None
        else:
            self.shelf_id = 0

Setting a field's type to InitVar (with the actual field type as its type parameter, as in InitVar[str]) tells @dataclass not to make that field a regular dataclass field, but to pass its value along to __post_init__ as an argument.

In this version of our Book class, condition is used only during the initialization phase. If it was set to 'Unacceptable', __post_init__ sets shelf_id to None; either way, condition itself is not stored in the class instance.
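
The sketch below uses a trimmed copy of the InitVar version of Book to show that condition reaches __post_init__ but never becomes a field. The sample titles are made up.

```python
from dataclasses import dataclass, field, fields, InitVar
from typing import Optional

# Trimmed copy of the InitVar version of Book above.
@dataclass
class Book:
    name: str
    condition: InitVar[str] = 'Good'
    shelf_id: Optional[int] = field(init=False)

    def __post_init__(self, condition):
        # condition arrives here as an argument; it is never assigned to self.
        self.shelf_id = None if condition == 'Unacceptable' else 0

ok = Book("Dune")
rejected = Book("Old Atlas", condition="Unacceptable")
print(rejected)  # Book(name='Old Atlas', shelf_id=None)

# condition is an __init__ parameter but not a field: fields() skips it,
# and it is not stored in the instance's attribute dictionary.
print([f.name for f in fields(Book)])  # ['name', 'shelf_id']
print('condition' in vars(rejected))   # False
```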

When to use Python dataclasses, and when not to

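One common scenario for using dataclasses is replacing namedtuples. Dataclasses offer most of the same behaviors and more, and they can be made immutable (as namedtuples are) simply by using @dataclass(frozen=True) as the decorator. One difference to keep in mind: dataclass instances aren't tuples, so they can't be indexed or unpacked by position unless you first convert them with dataclasses.astuple().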
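
A minimal side-by-side sketch, assuming a hypothetical BookTuple namedtuple and a frozen BookRecord dataclass with the same fields, shows where the two line up and where the dataclass needs a helper from the dataclasses module.

```python
from collections import namedtuple
from dataclasses import dataclass, astuple, replace, FrozenInstanceError

BookTuple = namedtuple('BookTuple', ['name', 'weight', 'shelf_id'])

@dataclass(frozen=True)
class BookRecord:
    name: str
    weight: float
    shelf_id: int = 0

t = BookTuple('Dune', 540.0, 0)
r = BookRecord('Dune', 540.0)

# Both are read-only once created.
try:
    r.shelf_id = 7
except FrozenInstanceError:
    print('BookRecord is frozen')

# A namedtuple unpacks by position directly; a dataclass needs astuple() first.
name, weight, shelf_id = t
name, weight, shelf_id = astuple(r)

# replace() returns a modified copy, much like namedtuple's _replace().
moved = replace(r, shelf_id=7)
print(moved)  # BookRecord(name='Dune', weight=540.0, shelf_id=7)
```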

Another possible use case is replacing nested dictionaries (which can be clumsy) with nested instances of dataclasses. For instance, a Library dataclass could hold a list of ReadingRoom dataclass instances, each with its own list of shelves, and you could add methods that make it easy to reach nested items (e.g., a book on a shelf in a particular room).
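
Here is one way that nesting might look. It is a sketch under assumptions: the Shelf class, the field names, and the find_book method are hypothetical illustrations, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical classes for illustration.
@dataclass
class Shelf:
    shelf_id: int
    books: List[str] = field(default_factory=list)

@dataclass
class ReadingRoom:
    name: str
    shelves: List[Shelf] = field(default_factory=list)

@dataclass
class Library:
    name: str
    rooms: List[ReadingRoom] = field(default_factory=list)

    def find_book(self, title: str) -> Optional[Tuple[str, int]]:
        '''Return (room name, shelf id) for the first shelf holding title, or None.'''
        for room in self.rooms:
            for shelf in room.shelves:
                if title in shelf.books:
                    return room.name, shelf.shelf_id
        return None

library = Library("Main", rooms=[
    ReadingRoom("East", shelves=[Shelf(1, ["Dune"]), Shelf(2, ["Emma"])]),
    ReadingRoom("West", shelves=[Shelf(3, ["Ulysses"])]),
])

# Attribute access replaces chains like data["rooms"][0]["shelves"][1]["books"].
print(library.rooms[0].shelves[1].books)  # ['Emma']
print(library.find_book("Ulysses"))       # ('West', 3)
```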

It’s also important to note, though, that not every Python class needs to be a dataclass. If you’re creating a class mainly to group together a bunch of static methods, rather than as a container for data, you don’t need to make it a dataclass. For instance, a common pattern with parsers is to have a class that takes in an abstract syntax tree, walks the tree, and dispatches calls to different methods in the class based on the node type. Because the parser class has very little data of its own, a dataclass isn’t useful here.
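
For a concrete example of the tree-walking pattern described above, the standard library's ast.NodeVisitor dispatches to visit_<NodeType> methods based on each node's class. The FunctionCounter class below is a hypothetical sketch: it is almost all behavior and holds a single counter, so making it a dataclass would buy very little.

```python
import ast

class FunctionCounter(ast.NodeVisitor):
    '''Counts plain `def` statements (not `async def`) in a syntax tree.'''

    def __init__(self) -> None:
        self.count = 0

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # NodeVisitor.visit() routes FunctionDef nodes here.
        self.count += 1
        self.generic_visit(node)  # keep walking into nested functions

source = '''
def outer():
    def inner():
        pass
    return inner
'''

counter = FunctionCounter()
counter.visit(ast.parse(source))
print(counter.count)  # 2
```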
