Reduce query counts #810

Open
cpoppema wants to merge 15 commits into furlongm:main from cpoppema:reduce-query-counts

@cpoppema

Hello there!

We've been running patchman for a while, and we already run a slightly patched version with some query optimizations. After an upstream update I was pleasantly surprised to see a built-in solution that gets rid of most N+1 queries, so thanks for that!

In this PR I created (mostly) separate commits per module that make smarter use of select_related/prefetch_related, plus a few other changes, to either reduce the query count or shorten query duration.

Please let me know if you want anything changed or would like some comparisons. This PR exclusively targets views. I can test against Postgres and MySQL. Current query counts/durations are against out-of-the-box Docker containers on my local machine (so no network latency).

Some observations while making my changes:

  • after adding .annotate() to some querysets, the Meta.ordering was sometimes ignored
  • with .annotate(), the COUNT(*) queries kept all the joined tables, which slowed them down quite a bit for bigger tables

(I only realised this halfway through, so I might not have updated every queryset optimally.)
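
The second observation can be illustrated at SQL level. The following is a minimal sketch using Python's sqlite3 module and simplified, hypothetical tables (not the real patchman schema); the exact SQL Django emits for .count() on an annotated queryset depends on the Django version. It shows that counting through the joined, grouped query yields the same total as counting the base table alone, which is why the cheaper query can serve a paginator total when the annotation does not filter rows.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE host_packages (host_id INTEGER, package_id INTEGER);
    INSERT INTO package (id, name) VALUES (1, 'bash'), (2, 'curl'), (3, 'zsh');
    INSERT INTO host_packages VALUES (1, 1), (2, 1), (1, 2);
""")

# Count taken over the annotated query: the join and GROUP BY are kept.
count_with_joins = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT p.id, COUNT(hp.host_id) AS host_count
          FROM package p
          LEFT JOIN host_packages hp ON hp.package_id = p.id
         GROUP BY p.id
    )
""").fetchone()[0]

# Count taken over the base table only: no joins involved.
count_plain = conn.execute("SELECT COUNT(*) FROM package").fetchone()[0]

print(count_with_joins, count_plain)
```

The equality only holds because the join is a LEFT JOIN grouped by the primary key; a filter on the annotated value would change the total.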

Here's an overview of my database and the changes per module. I tested the majority using Postgres.

table count
arch_machinearchitecture 1
arch_packagearchitecture 10
auth_group 0
auth_group_permissions 0
auth_permission 176
auth_user 2
auth_user_groups 0
auth_user_user_permissions 0
django_admin_log 3347
django_celery_beat_clockedschedule 0
django_celery_beat_crontabschedule 0
django_celery_beat_intervalschedule 0
django_celery_beat_periodictask 0
django_celery_beat_periodictasks 1
django_celery_beat_solarschedule 0
django_content_type 42
django_migrations 130
django_session 89
django_site 1
domains_domain 3
errata_erratum 1932
errata_erratum_affected_packages 39
errata_erratum_cves 14091
errata_erratum_fixed_packages 45853
errata_erratum_osreleases 2305
errata_erratum_references 1932
hosts_host 322
hosts_host_errata 1817
hosts_host_modules 0
hosts_host_packages 192377
hosts_host_updates 2814
hosts_hostrepo 6757
modules_module 0
modules_module_packages 0
operatingsystems_osrelease 4
operatingsystems_osrelease_repos 19
operatingsystems_osvariant 2
packages_package 184100
packages_packagecategory 0
packages_packagename 80731
packages_packageupdate 126
reports_report 970
repos_mirror 91
repos_mirrorpackage 578980
repos_repository 61
rest_framework_api_key_apikey 0
security_cve 8914
security_cve_cvss_scores 6623
security_cve_cwes 2506
security_cve_references 0
security_cvss 859
security_cwe 287
security_reference 48468
tagging_tag 7
tagging_taggeditem 336
taggit_tag 7
taggit_taggeditem 322

Impact of changes in each module:

/hosts/

hosts/admin.py: select_related for HostRepoAdmin

| path | query count before | query count after |
| --- | --- | --- |
| /admin/hosts/hostrepo/ | 222 | 22 |

hosts/managers.py: select_related osvariant__arch

| path | query count before | query count after |
| --- | --- | --- |
| /hosts/ | 416 | 366 |

hosts/views.py: select_related arch to OSVariant

| path | query count before | query count after |
| --- | --- | --- |
| /hosts/ | 366 | 44 |

hosts/views.py: select_related repo to HostRepo

| path | query count before | query count after |
| --- | --- | --- |
| /hosts/$hostname/ | 73 | 47 |

hosts/views.py: select_related __name and __arch to updates_by_package

| path | query count before | query count after |
| --- | --- | --- |
| /hosts/$hostname/ | 47 | 31 |
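
For readers less familiar with what select_related changes at the database level, here is a minimal sketch in plain sqlite3. The tables and the `run` helper are hypothetical and heavily simplified. It contrasts one lookup per row (the N+1 pattern) with a single JOIN, which is roughly the SQL shape select_related produces for a foreign key.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hostrepo (id INTEGER PRIMARY KEY, repo_id INTEGER REFERENCES repo(id));
    INSERT INTO repo (id, name) VALUES (1, 'base'), (2, 'updates');
    INSERT INTO hostrepo (id, repo_id) VALUES (1, 1), (2, 2), (3, 1);
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Execute a statement and count it, like a query-count debug panel would."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# N+1 pattern: one query for the list, then one extra query per row.
rows = run("SELECT id, repo_id FROM hostrepo ORDER BY id")
lazy = [(hid, run("SELECT name FROM repo WHERE id = ?", (rid,))[0][0])
        for hid, rid in rows]
n_plus_one_queries = counter["queries"]

# JOIN pattern: the related row arrives with the list in a single query.
counter["queries"] = 0
joined = run("""
    SELECT h.id, r.name
      FROM hostrepo h
      JOIN repo r ON r.id = h.repo_id
     ORDER BY h.id
""")
join_queries = counter["queries"]

print(n_plus_one_queries, join_queries)
```

Both approaches return the same data; only the number of round trips differs, which matches the kind of drop seen in the tables above.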

/operatingsystems/

operatingsystems/tables.py + views.py: use xyz_count / prefetch_related for caching nested m2m .count queries

| path | query count before | query count after |
| --- | --- | --- |
| /os/releases/ | 55 | 43 |
| /os/releases/6/ | 76 | 62 |
| /os/variants/ | 46 | 41 |
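
The xyz_count idea can be sketched the same way. Assuming simplified, hypothetical tables and a hypothetical `run` helper, the example below replaces one COUNT query per rendered row with a single grouped query, which is roughly what annotating a queryset with a Count does.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE osrelease (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE osrelease_repos (osrelease_id INTEGER, repo_id INTEGER);
    INSERT INTO osrelease (id, name) VALUES (1, 'r1'), (2, 'r2'), (3, 'r3');
    INSERT INTO osrelease_repos VALUES (1, 10), (1, 11), (2, 10);
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Execute a statement and count it."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# One COUNT(*) per row, as happens when a table column calls .count() per object.
releases = run("SELECT id FROM osrelease ORDER BY id")
per_row = {
    rid: run("SELECT COUNT(*) FROM osrelease_repos WHERE osrelease_id = ?", (rid,))[0][0]
    for (rid,) in releases
}
per_row_queries = counter["queries"]

# All counts computed in one grouped query.
counter["queries"] = 0
annotated = dict(run("""
    SELECT r.id, COUNT(rr.repo_id)
      FROM osrelease r
      LEFT JOIN osrelease_repos rr ON rr.osrelease_id = r.id
     GROUP BY r.id
"""))
annotated_queries = counter["queries"]

print(per_row_queries, annotated_queries)
```

The LEFT JOIN keeps releases without repos in the result with a count of 0, so the table still renders every row.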

/packages/

packages/admin.py: select_related for PackageUpdateAdmin

| path | query count before | query count after |
| --- | --- | --- |
| /admin/packages/packageupdate/ | 622 | 22 |

packages/views.py: avoid table joining to query the count() for PackageTable:

| path | COUNT(*) query duration before | COUNT(*) query duration after |
| --- | --- | --- |
| /packages/id/ | 500ms | 7ms |
| /packages/name/ | 160ms | 2ms |

packages/views.py: no unnecessary distinct() for arch + packagetype filters (local fields)

| path | packages query duration before | packages query duration after |
| --- | --- | --- |
| /packages/id/? | 50ms | 50ms |
| /packages/id/?provides_fix_in_erratum=false | 3500ms | unchanged |
| /packages/id/?available_in_repos=false | 550ms | unchanged |
| /packages/id/?packagetype=D | 2000ms | 50ms |
| /packages/id/?arch_id=2 | 700ms | 50ms |
| /packages/name/?packagetype=D | 252ms | 7ms |
| /packages/name/?arch_id=2 | 93ms | 7ms |

| path | query count before | query count after |
| --- | --- | --- |
| /packages/name/ | 89 | 39 |

/repos/

repos/views.py: use xyz_count

| path | query count before | query count after |
| --- | --- | --- |
| /repos/ | 91 | 41 |

/security/

security/views.py: use xyz_count / prefetch_related for caching nested m2m .count queries

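| path | query count before | query count after |
| --- | --- | --- |
| /security/cwes | 88 | 38 |
| /security/cves | 138 | 40 |
| /security/references/ | 89 | 39 |

| path | Reference Filter query duration before | Reference Filter query duration after |
| --- | --- | --- |
| /security/references/?ref_type=Package | 310ms | 15ms |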

/dashboard/

utils/views.py + dashboard.html

| path | query count before | query count after |
| --- | --- | --- |
| /dashboard/ | 173 | 81 |

MySQL in particular is very slow here, roughly 3x the duration of Postgres.

Postgres /dashboard/

| query | query duration before | query duration after |
| --- | --- | --- |
| norepo_packages.count | 380ms | 30ms |
| norepo_packages (.all) | 380ms | 30ms |
| orphaned_packages.count | 111ms | 75ms |
| orphaned_packages (.all) | 231ms | 180ms |

MySQL /dashboard/

| query | query duration before | query duration after |
| --- | --- | --- |
| norepo_packages.count | 1100ms | 30ms |
| norepo_packages (.all) | 1250ms | 225ms |
| orphaned_packages.count | 980ms | 265ms |
| orphaned_packages (.all) | 1500ms | 700ms |

* (every view)

context_processors.py: context processors defined in settings are executed for every render, including the sub-renders done by django-tables2, so their queries add up:

| path | query count before | query count after |
| --- | --- | --- |
| /admin/* | N | N - 17 |
| /dashboard/ | 83 | 44 |
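
One common way to keep a context processor from repeating its work on every (sub-)render is to cache its result on the request object. The sketch below is an assumption about the general technique, not necessarily what this PR does; `expensive_lookup`, `issues_count` and the `_issues_context` attribute are hypothetical names, and a SimpleNamespace stands in for Django's HttpRequest.

```python
from types import SimpleNamespace

calls = {"count": 0}

def expensive_lookup():
    """Stand-in for the database queries a context processor might run."""
    calls["count"] += 1
    return {"issues_count": 3}

def issues_count(request):
    """Context processor that computes its values once per request."""
    cached = getattr(request, "_issues_context", None)
    if cached is None:
        cached = expensive_lookup()
        # Store the result on the request so later renders reuse it.
        request._issues_context = cached
    return cached

# Simulate one request whose template triggers several sub-renders.
request = SimpleNamespace()
for _ in range(5):
    context = issues_count(request)

print(calls["count"], context)
```

Because the cache lives on the request, it is discarded with it, so values never leak between requests.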

A bonus gain, most noticeable on MySQL:

reports/models.py: adding index for Report.created

| path | reports query duration before | reports query duration after |
| --- | --- | --- |
| /reports/ | 300ms (filesort, before query cache) | 2ms (backward index scan) |
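
The effect of such an index can be reproduced in miniature with SQLite's EXPLAIN QUERY PLAN. This is only an analogy: the table, the index name `report_created_idx` and the `plan` helper are hypothetical, and SQLite's planner output differs from MySQL's filesort and backward index scan wording. The sketch shows the sort step disappearing once the ordering column is indexed.

```python
import sqlite3

# Hypothetical, simplified table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, host TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO report (host, created) VALUES (?, ?)",
    [(f"host{i}", f"2024-01-{i % 28 + 1:02d}") for i in range(1000)],
)

# A paginated list ordered by creation date, newest first.
query = "SELECT id, host, created FROM report ORDER BY created DESC LIMIT 25"

def plan(sql):
    """Return the planner's description of how it will run the statement."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # full scan plus a temporary B-tree for the ORDER BY
conn.execute("CREATE INDEX report_created_idx ON report (created)")
after = plan(query)   # rows are read in index order, no separate sort

print(before)
print(after)
```

In Django, an index like this is typically declared on the model (for example with db_index=True on the field or an entry in Meta.indexes) and created by a migration.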

Comment thread packages/views.py Outdated
Comment thread security/views.py

     if 'ref_type' in request.GET:
-        refs = refs.filter(ref_type=request.GET['ref_type']).distinct()
+        refs = refs.filter(ref_type=request.GET['ref_type'])
Owner

Also need to restore distinct() here?

Author

I'm pretty certain distinct() has no effect on the result set here other than slowing down the query. 'ref_type' is a local field, unlike the 'package__' filter in packages/views.py.
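
The distinction between a local-field filter and a filter across a relation can be checked with a small sqlite3 sketch. The tables are hypothetical and simplified: filtering on a column of the table itself can never duplicate rows, whereas filtering through a join can return the same row once per match.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference (id INTEGER PRIMARY KEY, ref_type TEXT, url TEXT);
    CREATE TABLE erratum_references (erratum_id INTEGER, reference_id INTEGER);
    INSERT INTO reference VALUES
        (1, 'Link', 'https://example.com/a'),
        (2, 'Package', 'https://example.com/b'),
        (3, 'Package', 'https://example.com/c');
    INSERT INTO erratum_references VALUES (10, 2), (11, 2), (12, 3);
""")

def ids(sql):
    """Run a query and return the selected ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Filter on a local column: DISTINCT changes nothing.
local_plain = ids("SELECT id FROM reference WHERE ref_type = 'Package'")
local_distinct = ids("SELECT DISTINCT id FROM reference WHERE ref_type = 'Package'")

# Filter through a join: reference 2 matches two errata and appears twice.
join_sql = """
    SELECT {distinct} r.id
      FROM reference r
      JOIN erratum_references er ON er.reference_id = r.id
     WHERE er.erratum_id >= 10
"""
joined_plain = ids(join_sql.format(distinct=""))
joined_distinct = ids(join_sql.format(distinct="DISTINCT"))

print(local_plain, local_distinct, joined_plain, joined_distinct)
```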

Comment thread security/views.py
     filter_list = []
     filter_list.append(Filter(request, 'Reference Type', 'ref_type',
-                              Reference.objects.values_list('ref_type', flat=True).distinct()))
+                              Reference.objects.values_list('ref_type', flat=True)))
Owner

Or here?

Author

I did expect distinct() to be useful here, since ref_type is shared by many References with different URLs. I'm not sure where the values are deduplicated, but running the query without distinct() saves quite a bit of time and the "Filter by" panel on the side does not show any duplicates.

With .distinct() it takes 325ms:

SELECT DISTINCT "security_reference"."ref_type",
       "security_reference"."url"
  FROM "security_reference"
 ORDER BY "security_reference"."ref_type" ASC,
          "security_reference"."url" ASC

Without .distinct() it's only 17ms:

SELECT "security_reference"."ref_type"
  FROM "security_reference"
 ORDER BY "security_reference"."ref_type" ASC,
          "security_reference"."url" ASC

Running it in ./manage.py dbshell reveals that it returns many more rows (i.e. all of them); it just seems to be computationally cheaper. I can also revert this if you want me to.
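
A possible reading of the two statements above: the DISTINCT version also selects url, presumably because the model's default ordering includes it, so DISTINCT operates on (ref_type, url) pairs and cannot collapse the ref_type values. The sqlite3 sketch below uses a hypothetical, simplified table to compare the row counts of both statements with a third form that selects only ref_type. In Django, calling order_by() with no arguments before distinct() clears the ordering; whether that fits this view is an assumption.

```python
import sqlite3

# Hypothetical, simplified table for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference (id INTEGER PRIMARY KEY, ref_type TEXT, url TEXT);
    INSERT INTO reference (ref_type, url) VALUES
        ('Link', 'https://example.com/1'),
        ('Link', 'https://example.com/2'),
        ('Package', 'https://example.com/3'),
        ('Package', 'https://example.com/4'),
        ('Package', 'https://example.com/5');
""")

# DISTINCT over (ref_type, url): every pair is unique, nothing is collapsed.
distinct_with_url = conn.execute(
    "SELECT DISTINCT ref_type, url FROM reference ORDER BY ref_type, url"
).fetchall()

# No DISTINCT: one row per reference.
no_distinct = conn.execute(
    "SELECT ref_type FROM reference ORDER BY ref_type, url"
).fetchall()

# DISTINCT over ref_type only: one row per type.
distinct_type_only = conn.execute(
    "SELECT DISTINCT ref_type FROM reference ORDER BY ref_type"
).fetchall()

print(len(distinct_with_url), len(no_distinct), len(distinct_type_only))
```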

Comment thread operatingsystems/views.py Outdated
@furlongm
Owner

Thanks - left a few comments but otherwise looks good!

I have another conflicting PR coming in though, so trying to figure out the best way to handle both.

@cpoppema
Author

cpoppema commented Apr 25, 2026

> Thanks - left a few comments but otherwise looks good!
>
> I have another conflicting PR coming in though, so trying to figure out the best way to handle both.

Thanks for taking a look and no worries, I don't mind rebasing afterwards.
