Fixing inefficient Django ORM calls in brownfield projects

Code Review Doctor
2 min readNov 11, 2020

--

One of the quickest wins for improving the speed of a Django app is to fix the “oops didn’t mean to” inefficient database reads that slipped into a codebase unnoticed over the years. Indeed, as the codebase matures and the team members change and code gather dust old inefficient ORM queries can be overlooked. Simple ORM queries like:

def check_hounds():
queryset = HoundsModel.objects.all()
if len(queryset) > 2:
return "oh no. Run!"

This is reading every record from the database and checking at application level. Much less efficient than:

def check_hounds():
if HoundsModel.objects.count() > 2:
return "oh no. Run!"

It’s easy to scoff at such inefficiencies. “I would never do that”. Well, maybe not. But what about some dev in the past that was rushing to meet a deadline, an 11pm code reviewer equally surviving on coffee during crunch time? And you’re now responsible for that code!

Slightly less textbook, but also just as easy to both overlook and improve:

def write_condolence_letters(visited_coops):
queryset = ChickenModel.objects.all()
for chicken in queryset:
if chicken.coop.pk in visited_coops:
return f"dear {chicken.coop.owner_name}..."
else:
...

Spot the problems? Well, looping over all() will read everything from the db in one go. If there are billions of chickens then expect performance issues. For that we can use to chunk the reads 2000 (default) records pulled from the db at a time. Additionally, chicken.coop.pk does an additional database read for each chicken because relationships are lazily evaluated by default: coop is read from the db when only it was accessed via chicken.coop. For this particular field we can use Django's field caching: Django creates an _id field for related field. So this can be:

def write_condolence_letters(visited_coops):
queryset = ChickenModel.objects.all()
for chicken in queryset.iterator():
if chicken.coop_pk in visited_coops:
return f"dear {chicken.coop.owner_name}..."
else:
...

What if we’re working with a field on a related model other than the pk? Sure coop_pk is created by Django for us, but what if we needed to access other fields such as chicken.coop.owner_name? For that we an use select_related or prefetch_related (depending on if it's a ForeignKey or OneToOne etc relationship):

def write_condolence_letters(visited_coops):
queryset = ChickenModel.objects.all()
for chicken in queryset.select_related('coop'):
if chicken.coop.pk in visited_coops:
return f"dear {chicken.coop.owner_name}..."
else:
...

Now the related coop will be pulled out at during the same read as chicken. Notice though iterator is no longer used because iterator is useless with select_related and prefetch_related: iterator nullifies their effects. So these are mutually exclusive so only with context can we really say specifically what performance improvement strategy is needed.

Does your codebase have old inefficient ORM lookups?

It’s easier to spot new inefficient ORM lookups during code review but what about those already in your codebase? I can help with that: I’m a bot that can review your code in seconds instead of you spending hours:

If you would prefer inefficient ORM calls to not make it into your codebase, I also review pull requests:

See the GitHub PR bot and reduce dev effort of improving your code.

Sign up to discover human stories that deepen your understanding of the world.

--

--

Code Review Doctor
Code Review Doctor

Written by Code Review Doctor

I’m a GitHub bot that automatically improves your Python and Django. https://codereview.doctor

No responses yet

Write a response