Reduce query counts #810
      if 'ref_type' in request.GET:
-         refs = refs.filter(ref_type=request.GET['ref_type']).distinct()
+         refs = refs.filter(ref_type=request.GET['ref_type'])
Also need to restore distinct() here?
I'm pretty certain distinct() has no effect on the result set here other than slowing down the query. 'ref_type' is a local field, unlike the 'package__' filter in packages/views.py, so filtering on it adds no join that could produce duplicate rows.
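To illustrate the distinction drawn in this comment, here is a minimal sketch using plain sqlite3 rather than the Django ORM. The tables (`reference`, `cve`, `cve_references`) and their rows are hypothetical stand-ins, not the real patchman schema. It shows that a filter on a local column cannot duplicate rows, while a filter that joins across a many-to-many link table can.

```python
import sqlite3

# Hypothetical miniature schema; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference (id INTEGER PRIMARY KEY, ref_type TEXT, url TEXT);
    CREATE TABLE cve (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cve_references (cve_id INTEGER, reference_id INTEGER);
    INSERT INTO reference VALUES (1, 'Link', 'https://example.org/a'),
                                 (2, 'Patch', 'https://example.org/b');
    INSERT INTO cve VALUES (1, 'CVE-A'), (2, 'CVE-B');
    -- reference 1 is linked from two different CVEs
    INSERT INTO cve_references VALUES (1, 1), (2, 1);
""")

# Filter on a local column: each reference row can match at most once.
local_ids = [row[0] for row in conn.execute(
    "SELECT r.id FROM reference r WHERE r.ref_type = 'Link'")]

# Filter through the link table: one output row per matching join partner.
join_sql = """
    SELECT {distinct} r.id
    FROM reference r
    JOIN cve_references cr ON cr.reference_id = r.id
    JOIN cve c ON c.id = cr.cve_id
    WHERE c.name LIKE 'CVE-%'
"""
joined_ids = [row[0] for row in conn.execute(join_sql.format(distinct=""))]
distinct_ids = [row[0] for row in conn.execute(join_sql.format(distinct="DISTINCT"))]

print(local_ids)     # the local filter never needs DISTINCT
print(joined_ids)    # the joined filter repeats reference 1
print(distinct_ids)  # DISTINCT collapses the repeats
```

Under these assumptions, DISTINCT only changes the result of the joined query, which matches the reasoning given for dropping it on the 'ref_type' filter.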
      filter_list = []
      filter_list.append(Filter(request, 'Reference Type', 'ref_type',
-                               Reference.objects.values_list('ref_type', flat=True).distinct()))
+                               Reference.objects.values_list('ref_type', flat=True)))
I did expect distinct() to be useful here, since ref_type is shared by many References with different URLs. I'm not sure where the values are deduplicated, but dropping distinct() saves quite a bit of time, and the "Filter by" panel on the side does not show any duplicates.
With .distinct() it takes 325ms:
SELECT DISTINCT "security_reference"."ref_type",
                "security_reference"."url"
FROM "security_reference"
ORDER BY "security_reference"."ref_type" ASC,
         "security_reference"."url" ASC
Without .distinct() it's only 17ms:
SELECT "security_reference"."ref_type"
FROM "security_reference"
ORDER BY "security_reference"."ref_type" ASC,
         "security_reference"."url" ASC
Running it in ./manage.py dbshell reveals that it returns many more rows (i.e. all of them); it just seems to be computationally cheaper. I can also revert this if you want me to.
Thanks - left a few comments but otherwise looks good! I have another conflicting PR coming in though, so trying to figure out the best way to handle both.
Thanks for taking a look and no worries, I don't mind rebasing afterwards.
Hello there!
We've been running patchman for a bit, and we are already running a slightly patched version with some query optimizations. After an upstream update I was pleasantly surprised to see a built-in solution that gets rid of most N+1 queries, so thanks for that!
In this PR I created separate commits (mostly one per module) that make smarter use of select_related/prefetch_related and a few other techniques, either to reduce the query count or to shorten query duration.
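For readers less familiar with the N+1 pattern that select_related addresses, here is a minimal sketch in plain sqlite3. The `host` and `osvariant` tables and the `CountingConnection` helper are hypothetical and only loosely modelled on the names used in this PR. The naive loop issues one query per host, while the joined version fetches the same data in a single query, which is roughly what select_related does for a foreign key.

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: wraps a connection and counts issued queries."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


raw = sqlite3.connect(":memory:")
raw.executescript("""
    CREATE TABLE osvariant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE host (id INTEGER PRIMARY KEY, hostname TEXT,
                       osvariant_id INTEGER REFERENCES osvariant(id));
    INSERT INTO osvariant VALUES (1, 'Debian 12'), (2, 'Rocky 9');
    INSERT INTO host VALUES (1, 'web1', 1), (2, 'web2', 1), (3, 'db1', 2);
""")

# N+1: one query for the hosts, then one more per host for its OS variant.
naive_db = CountingConnection(raw)
naive = []
for hostname, variant_id in naive_db.query(
        "SELECT hostname, osvariant_id FROM host ORDER BY id"):
    (variant_name,), = naive_db.query(
        "SELECT name FROM osvariant WHERE id = ?", (variant_id,))
    naive.append((hostname, variant_name))

# Joined: the related row arrives with the host row in a single query.
joined_db = CountingConnection(raw)
joined = joined_db.query(
    "SELECT h.hostname, v.name FROM host h "
    "JOIN osvariant v ON v.id = h.osvariant_id ORDER BY h.id")

print(naive_db.count, joined_db.count)
print(naive == joined)
```

The saving grows with the number of rows on the page, which is why list views and admin changelists tend to benefit most.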
Please let me know if you want anything changed or would like some comparisons. This PR exclusively targets views. I can test against Postgres and MySQL. The current query counts/durations were measured against out-of-the-box Docker containers on my local machine (so no network latency).
Some observations while making my changes:
(I only realised this halfway through, so I might not have updated every queryset most optimally)
Here's an overview of my database and changes per module. I tested the majority using postgres.
arch_machinearchitecture
arch_packagearchitecture
auth_group
auth_group_permissions
auth_permission
auth_user
auth_user_groups
auth_user_user_permissions
django_admin_log
django_celery_beat_clockedschedule
django_celery_beat_crontabschedule
django_celery_beat_intervalschedule
django_celery_beat_periodictask
django_celery_beat_periodictasks
django_celery_beat_solarschedule
django_content_type
django_migrations
django_session
django_site
domains_domain
errata_erratum
errata_erratum_affected_packages
errata_erratum_cves
errata_erratum_fixed_packages
errata_erratum_osreleases
errata_erratum_references
hosts_host
hosts_host_errata
hosts_host_modules
hosts_host_packages
hosts_host_updates
hosts_hostrepo
modules_module
modules_module_packages
operatingsystems_osrelease
operatingsystems_osrelease_repos
operatingsystems_osvariant
packages_package
packages_packagecategory
packages_packagename
packages_packageupdate
reports_report
repos_mirror
repos_mirrorpackage
repos_repository
rest_framework_api_key_apikey
security_cve
security_cve_cvss_scores
security_cve_cwes
security_cve_references
security_cvss
security_cwe
security_reference
tagging_tag
tagging_taggeditem
taggit_tag
taggit_taggeditem

Impact of changes in each module:
/hosts/
hosts/admin.py: select_related for HostRepoAdmin
/admin/hosts/hostrepo/
hosts/managers.py: select_related osvariant__arch
/hosts/
host/views.py: select_related arch to OSVariant
/hosts/
host/views.py: select_related repo to HostRepo
/hosts/$hostname/
host/views.py: select_related __name and __arch to updates_by_package
/hosts/$hostname/

/operatingsystems/
operatingsystems/tables.py + views.py: use xyz_count / prefetch_related for caching nested m2m .count queries
/os/releases/
/os/releases/6/
/os/variants/

/packages/
packages/admin.py: select_related for PackageUpdateAdmin
/admin/packages/packageupdate/
packages/views.py: avoid table joining to query the count() for PackageTable:
/packages/id/
/packages/name/
packages/views.py: no unnecessary distinct() for arch + packagetype filters (local fields)
/packages/id/?
/packages/id/?provides_fix_in_erratum=false
/packages/id/?available_in_repos=false
/packages/id/?packagetype=D
/packages/id/?arch_id=2
/packages/name/?packagetype=D
/packages/name/?arch_id=2
/package/name/

/repos/
repos/views.py: use xyz_count
/repos/

/security/
security/views.py: use xyz_count / prefetch_related for caching nested m2m .count queries
/security/cwes
/security/cves
/security/references/
/security/references/?ref_type=Package

/dashboard/
utils/views.py + dashboard.html
/dashboard/
In particular, MySQL is very slow here, about three times the duration of Postgres.
Postgres
/dashboard/
norepo_packages.count
norepo_packages (.all)
orphaned_packages.count
orphaned_packages (.all)
MySQL
/dashboard/
norepo_packages.count
norepo_packages (.all)
orphaned_packages.count
orphaned_packages (.all)

* (every view)
context_processors.py: since context processors defined in settings are executed for every render including subrenders by django-tables2:
/admin/*
/dashboard/

A bonus gain, most noticeable on MySQL:
reports/models.py: adding index for Report.created
/reports/
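As a rough illustration of why an index on Report.created can help, here is a sketch with sqlite3. It assumes the reports list is ordered by the created column, which this description does not state explicitly. The index name and sample rows are made up, and MySQL and Postgres planners will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports_report (id INTEGER PRIMARY KEY, host TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO reports_report (host, created) VALUES (?, ?)",
    [(f"host{i}", f"2024-01-{(i % 28) + 1:02d}") for i in range(1000)])

# Assumed shape of the list query: newest reports first, one page at a time.
QUERY = "SELECT * FROM reports_report ORDER BY created DESC LIMIT 50"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(QUERY)

# Hypothetical index name; in Django this could come from db_index=True
# or an entry in Meta.indexes on the model.
conn.execute(
    "CREATE INDEX reports_report_created_idx ON reports_report (created)")
plan_after = plan(QUERY)

print(plan_before)  # full scan plus a temporary sort step
print(plan_after)   # walks the index in order, no separate sort
```

Without the index the database has to sort every row before it can return the first page. With it, it can read the first 50 entries straight from the index.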