Beyond dict and list: A Practical Deep Dive into Python's collections Module
Learn how Python's built-in collections module replaces fiddly boilerplate with purpose-built containers — Counter, defaultdict, deque, namedtuple, OrderedDict, and ChainMap — through runnable examples and the pitfalls to watch for.
Python's built-in dict, list, and tuple get you a long way, but a surprising amount of everyday code is just boilerplate working around their limitations: checking whether a key exists before appending to it, hand-rolling a tally loop, or popping from the front of a list and quietly paying a performance penalty. The standard-library collections module exists precisely to delete that boilerplate. Its specialized container types are battle-tested, efficient, and ship with every standard Python installation, so there is nothing extra to install. This guide walks through the six you are most likely to reach for, with runnable examples. (The module also offers the UserDict, UserList, and UserString wrappers for subclassing.) Along the way it points out the pitfalls that trip people up.
Counter: tallying made trivial
Counting how often things occur is one of the most common tasks in programming, and Counter turns it into a one-liner. It is a dict subclass that maps each element to its count, and crucially it returns 0 for missing keys instead of raising KeyError.
```python
from collections import Counter

words = "the quick brown fox the lazy dog the end".split()
c = Counter(words)
print(c["the"])          # 3
print(c["missing"])      # 0 (no KeyError)
print(c.most_common(2))  # [('the', 3), ('quick', 1)]

# Feed it more data later
c.update(["fox", "fox"])
print(c["fox"])          # 3
```
Counters also support arithmetic, which is perfect for combining or comparing tallies. Addition sums counts, subtraction removes them (dropping anything that hits zero or below), and the & and | operators take the element-wise minimum and maximum:
```python
a = Counter(apple=3, banana=2)
b = Counter(apple=1, cherry=5)
print(a + b)  # Counter({'cherry': 5, 'apple': 4, 'banana': 2})
print(a - b)  # Counter({'apple': 2, 'banana': 2})  (cherry dropped)
print(a & b)  # Counter({'apple': 1})  element-wise minimum
print(a | b)  # Counter({'cherry': 5, 'apple': 3, 'banana': 2})  maximum
```
One pitfall worth knowing: the subtraction operator keeps only positive results, but the subtract() method mutates in place and can leave zero or negative counts. Reach for the operator when you want a clean tally and the method when you genuinely need negatives.
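The difference is easiest to see side by side. The stock and order tallies below are made-up illustration data. The operator returns a new, clean Counter. subtract() changes its receiver in place and keeps the negatives. Unary + strips zero and negative entries afterwards.

```python
from collections import Counter

stock = Counter(apple=3, banana=1)
order = Counter(apple=1, banana=2, cherry=1)

# The - operator returns a new Counter and keeps only positive counts
remaining = stock - order
print(remaining)  # Counter({'apple': 2})

# subtract() mutates in place and keeps zero and negative counts
ledger = Counter(stock)  # copy first so `stock` is left untouched
ledger.subtract(order)
print(ledger)  # Counter({'apple': 2, 'banana': -1, 'cherry': -1})

# Unary + drops zero and negative entries, matching the operator's result
print(+ledger)  # Counter({'apple': 2})
```

The negative entries in `ledger` are useful when you want to know what is short, such as how many bananas are missing. The operator form is the one to use when you only care about what is left.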
defaultdict: no more "does this key exist?"
How often have you written if key not in d: d[key] = [] before appending? defaultdict removes that ceremony. You give it a factory — a zero-argument callable — and it calls that factory automatically the first time a missing key is accessed.
```python
from collections import defaultdict

# Group names by department
dd = defaultdict(list)
for name, dept in [("Ann", "Eng"), ("Bob", "Eng"), ("Cara", "Sales")]:
    dd[dept].append(name)
print(dict(dd))  # {'Eng': ['Ann', 'Bob'], 'Sales': ['Cara']}

# Count characters with an int factory (default 0)
counts = defaultdict(int)
for ch in "mississippi":
    counts[ch] += 1
print(dict(counts))  # {'m': 1, 'i': 4, 's': 4, 'p': 2}
```
The factory can be any zero-argument callable. That includes a function that returns a new defaultdict built with that same function, and this recursive trick lets you build arbitrarily deep nested dictionaries without pre-creating each level:
```python
def tree():
    return defaultdict(tree)

root = tree()
root["a"]["b"]["c"] = 1     # every level springs into existence
print(root["a"]["b"]["c"])  # 1
```
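A nested tree like this prints as a wall of `defaultdict(...)` reprs. It also keeps auto-creating levels on every lookup. One way to freeze it into ordinary dictionaries once you have finished building it is a small recursive converter. `to_dict` below is a hypothetical helper, not part of the collections module:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_dict(node):
    """Hypothetical helper: recursively convert nested defaultdicts into plain dicts."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node  # leaf values are returned unchanged

root = tree()
root["a"]["b"]["c"] = 1
root["a"]["x"] = 2
print(to_dict(root))  # {'a': {'b': {'c': 1}, 'x': 2}}
```

With the plain copy, a mistyped key raises KeyError instead of silently growing a new branch.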
The classic gotcha: simply reading a missing key creates it. Calling dd["new"] to "check" a value silently inserts an empty list. If you want a non-mutating lookup, use dd.get("new") instead.
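The gotcha is easy to reproduce with a fresh defaultdict(list). Subscripting a missing key inserts it. get() and the in operator never call the factory, so they leave the dictionary unchanged:

```python
from collections import defaultdict

dd = defaultdict(list)

# Subscripting a missing key calls the factory AND stores the result
_ = dd["new"]
print(dict(dd))  # {'new': []}

# get() and the in operator never call the factory
print(dd.get("other"))  # None
print("other" in dd)    # False
print(dict(dd))         # {'new': []}  (unchanged)
```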
deque: fast appends and pops at both ends
A Python list is great for appending at the end, but list.pop(0) and list.insert(0, x) are O(n) because every other element has to shift. A deque (double-ended queue) gives you O(1) operations at both ends, making it the right tool for queues, sliding windows, and breadth-first traversals.
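Breadth-first traversal is the textbook case. The sketch below assumes the graph is a plain dict mapping each node to a list of its neighbors. Nodes are appended at the back of the deque and taken from the front with popleft(), which is the O(1) replacement for list.pop(0):

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order from `start`.

    Assumes `graph` is a dict mapping each node to a list of neighbors.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
```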
```python
from collections import deque

dq = deque([1, 2, 3], maxlen=3)
dq.append(4)      # 1 falls off the left
print(dq)         # deque([2, 3, 4], maxlen=3)
dq.appendleft(0)  # 4 falls off the right
print(dq)         # deque([0, 2, 3], maxlen=3)
dq.rotate(1)      # shift everything right by one
print(dq)         # deque([3, 0, 2], maxlen=3)
```
The maxlen argument is a hidden gem: once the deque is full, adding to one end automatically discards from the other. That makes a fixed-size deque an elegant way to keep, say, the last N log lines or a rolling window of recent values with zero manual trimming.
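As one illustration of the rolling window, here is a moving average over the last few values. `moving_averages` is a hypothetical helper name. The deque's maxlen does all the trimming:

```python
from collections import deque

def moving_averages(values, window):
    """Yield the mean of the most recent `window` values after each new value."""
    recent = deque(maxlen=window)  # the oldest value drops off automatically
    for v in values:
        recent.append(v)
        yield sum(recent) / len(recent)

print(list(moving_averages([10, 20, 30, 40, 50], window=3)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

Summing the window on every step costs O(window). That is fine for small windows, but for large ones you might keep a running total instead, subtracting the value that is about to fall off before each append.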
namedtuple: tuples with self-documenting fields
Returning a bare tuple like (3, 4) forces every caller to remember what each position means. A namedtuple gives those positions names while remaining a lightweight, immutable tuple under the hood. It is still unpackable and indexable, and it is hashable as long as its field values are.
```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p.y)      # 3 4  access by name
print(p[0])          # 3    still indexable
print(p._asdict())   # {'x': 3, 'y': 4}

# Immutable, so "edit" by making a copy
p2 = p._replace(y=10)
print(p2)            # Point(x=3, y=10)
```
Because it is immutable, a namedtuple whose fields are hashable is safe to use as a dictionary key or set member. One holding a list is not. If you want type annotations, the closely related typing.NamedTuple gives you the same immutable record with a class-based syntax. Note that the annotations are not enforced at runtime. When you need mutable fields, dataclasses.dataclass is the modern choice. But for a quick, frozen record, the original namedtuple is hard to beat.
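A short comparison makes these trade-offs concrete. This sketch assumes Python 3.7 or later. `Point` here is the typing.NamedTuple version, and `MutablePoint` is a hypothetical dataclass added for contrast:

```python
from dataclasses import dataclass
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int = 0  # class syntax allows default values

# Fields are ints, so the tuple is hashable and works as a dict key
distances = {Point(0, 0): 0.0, Point(3, 4): 5.0}
print(distances[Point(3, 4)])  # 5.0
print(Point(7))                # Point(x=7, y=0)

# Annotations are not checked, and a list field makes the tuple unhashable
try:
    hash(Point([1, 2], 3))
except TypeError as exc:
    print("unhashable:", exc)

@dataclass
class MutablePoint:  # hypothetical class for contrast
    x: int
    y: int = 0

mp = MutablePoint(1, 2)
mp.y = 10   # allowed: dataclasses are mutable by default
print(mp)   # MutablePoint(x=1, y=10)
```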
OrderedDict: when order is the whole point
Since Python 3.7, a regular dict is guaranteed to remember insertion order, so you no longer need OrderedDict just to preserve sequence. Two things keep it relevant. The first is its order-aware extras, chiefly move_to_end(). The second is that equality between two OrderedDicts is order-sensitive, although comparing one with a plain dict is not.
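The equality rules fit in three comparisons:

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])

print(a == b)              # False: OrderedDict vs OrderedDict checks order
print(dict(a) == dict(b))  # True: plain dicts ignore order
print(a == dict(b))        # True: comparing with a plain dict ignores order
```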
```python
from collections import OrderedDict

od = OrderedDict.fromkeys("abcde")
od.move_to_end("a")               # push 'a' to the back
print(list(od))                   # ['b', 'c', 'd', 'e', 'a']
od.move_to_end("e", last=False)   # pull 'e' to the front
print(list(od))                   # ['e', 'b', 'c', 'd', 'a']
```
This makes OrderedDict a tidy foundation for an LRU cache: on each access, move the key to the end, and when you need to evict, pop from the front with popitem(last=False). (For a ready-made cache, functools.lru_cache is usually the better answer, but understanding the mechanism is valuable.)
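Here is a minimal sketch of that mechanism. `LRUCache` is a hypothetical illustrative class, not a standard-library type, and it makes no attempt at thread safety:

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical minimal least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used
cache.put("c", 3)  # over capacity, so "b" is evicted
print("b" in cache)  # False
```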
ChainMap: layered lookups without merging
When you have several dictionaries representing layers of configuration — command-line flags over environment over defaults — you often want to search them in priority order without physically merging them. ChainMap groups multiple mappings into a single view and searches them left to right.
```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "anton"}
cm = ChainMap(overrides, defaults)
print(cm["user"])   # anton (found in overrides first)
print(cm["color"])  # red (falls through to defaults)

# Writes always go to the FIRST mapping
cm["theme"] = "dark"
print(overrides)    # {'user': 'anton', 'theme': 'dark'}
```
The key behaviour to internalize is that reads cascade through every layer, but writes, updates, and deletes only ever touch the first dictionary. The underlying dictionaries stay live, so a later change to defaults is immediately visible through the ChainMap — no rebuild required.
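The sketch below rebuilds the same two layers to show three of these properties. Changes to a lower dict are visible immediately. A delete removes the key only from the first mapping, letting the default show through. The key must exist in the first mapping, or a KeyError is raised. Finally, new_child() pushes an extra layer on top for temporary overrides:

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "anton"}
cm = ChainMap(overrides, defaults)

# The underlying dicts are live: changes show up without rebuilding
defaults["color"] = "blue"
print(cm["color"])  # blue

# Deleting only affects the first mapping; the default then shows through
del cm["user"]
print(cm["user"])   # guest

# new_child() returns a new ChainMap with an extra layer in front
scoped = cm.new_child({"color": "green"})
print(scoped["color"])  # green
print(cm["color"])      # blue (the original chain is untouched)
```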
Wrap-up and next steps
Each of these containers replaces a recurring scrap of boilerplate with something clearer and usually faster. Reach for Counter whenever you are tallying, defaultdict when you are grouping or accumulating, deque when you need a real queue or a fixed-size window, namedtuple for lightweight immutable records, OrderedDict when ordering operations matter, and ChainMap for layered configuration. A good next step is to scan a project you already have for the tell-tale patterns — a key not in d guard, a pop(0), a manual count dictionary — and swap in the purpose-built type. The code gets shorter, the intent gets clearer, and you lean on well-tested standard-library implementations instead of reinventing them.
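As a small before-and-after illustration of that refactoring, here are three tell-tale patterns rewritten with the purpose-built types. The event data is made up for the example. The final checks confirm that both versions produce the same results:

```python
from collections import Counter, defaultdict, deque

events = [("eng", "deploy"), ("ops", "alert"), ("eng", "review"), ("eng", "deploy")]

# Before: a "key not in d" guard and a manual count dictionary
grouped_old = {}
for team, action in events:
    if team not in grouped_old:
        grouped_old[team] = []
    grouped_old[team].append(action)

counts_old = {}
for _, action in events:
    counts_old[action] = counts_old.get(action, 0) + 1

queue_old = [1, 2, 3]
first_old = queue_old.pop(0)  # shifts every remaining element

# After: same behavior, clearer intent
grouped_new = defaultdict(list)
for team, action in events:
    grouped_new[team].append(action)

counts_new = Counter(action for _, action in events)

queue_new = deque([1, 2, 3])
first_new = queue_new.popleft()  # O(1)

print(dict(grouped_new) == grouped_old)  # True
print(dict(counts_new) == counts_old)    # True
print(first_new == first_old)            # True
```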