


asantoni commented Sep 26, 2025

Hi all, I just quickly hacked in support for SAMPLE BY when creating models, added a missing datetime function, and implemented support for the SAMPLE clause in queries.

This adds a new "sample_by" parameter to the constructor of BaseMergeTree, which can be used to enable sampling on your table.

from django.utils import timezone

from clickhouse_backend import models
from clickhouse_backend.models.functions.hashes import farmFingerprint64
from clickhouse_backend.models.functions.datetime import toStartOfDay, toStartOfMonth

class DemoLog(models.ClickhouseModel):
    timestamp = models.DateTimeField(default=timezone.now)
    ip = models.GenericIPAddressField(default="::")

    class Meta:
        engine = models.MergeTree(
            primary_key=("timestamp", toStartOfDay("timestamp"), farmFingerprint64("ip")),
            order_by=("timestamp", toStartOfDay("timestamp"), farmFingerprint64("ip")),
            partition_by=toStartOfMonth("timestamp"),
            # ClickHouse requires the sampling expression to be part of the primary key
            # and to return an unsigned integer (farmFingerprint64 returns UInt64).
            sample_by=(farmFingerprint64("ip"),),
            index_granularity=8192,
        )
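As a rough illustration of what the new parameter is meant to produce on the ClickHouse side, here is a hypothetical helper (`render_engine_clause` is not part of clickhouse_backend or this PR) that turns SQL expression strings into a MergeTree ENGINE clause with a SAMPLE BY line. It assumes the clause order shown in the ClickHouse MergeTree docs and checks ClickHouse's rule that the sampling expression must be contained in the primary key, which defaults to the ORDER BY key. It does not check that the expression returns an unsigned integer, which ClickHouse also requires.

```python
def render_engine_clause(order_by, partition_by=None, primary_key=None,
                         sample_by=None, index_granularity=None):
    """Hypothetical helper (not part of clickhouse_backend): build a MergeTree
    ENGINE clause from SQL expression strings."""

    def as_tuple(exprs):
        # One expression is emitted bare; several are wrapped in a tuple.
        exprs = list(exprs)
        return exprs[0] if len(exprs) == 1 else "(" + ", ".join(exprs) + ")"

    # ClickHouse falls back to the sorting key when no PRIMARY KEY is given.
    key = list(primary_key if primary_key is not None else order_by)

    parts = ["ENGINE = MergeTree()", f"ORDER BY {as_tuple(order_by)}"]
    if partition_by is not None:
        parts.append(f"PARTITION BY {partition_by}")
    if primary_key is not None:
        parts.append(f"PRIMARY KEY {as_tuple(primary_key)}")
    if sample_by is not None:
        # Assumed: exactly one sampling expression, passed as a 1-tuple like in the PR.
        # Unpacking raises ValueError if there are zero or several.
        (sample_expr,) = sample_by
        if sample_expr not in key:
            # ClickHouse requires the sampling expression to be part of the primary key.
            raise ValueError(f"SAMPLE BY {sample_expr} is not in the primary key")
        parts.append(f"SAMPLE BY {sample_expr}")
    if index_granularity is not None:
        parts.append(f"SETTINGS index_granularity = {index_granularity}")
    return "\n".join(parts)


sort_key = ["timestamp", "toStartOfDay(timestamp)", "farmFingerprint64(ip)"]
print(render_engine_clause(
    order_by=sort_key,
    partition_by="toStartOfMonth(timestamp)",
    primary_key=sort_key,
    sample_by=["farmFingerprint64(ip)"],
    index_granularity=8192,
))
```

Raising early on a sampling expression that is missing from the key mirrors the error ClickHouse itself would return at CREATE TABLE time, just with a clearer message.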

You can run queries with the SAMPLE clause using the new .sample() QuerySet method like so:

from django.db.models import Count
session_count_estimate = DemoLog.objects.filter(timestamp__gte=time_start, timestamp__lte=time_end).sample(0.1).aggregate(session_count=Count('id') * 10)
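The `* 10` in the query is simply 1 / 0.1: with SAMPLE 0.1, ClickHouse reads roughly a tenth of the rows, so an additive aggregate such as a count has to be scaled back up. A minimal sketch of that arithmetic, written as a hypothetical helper (`scale_to_population` is not part of the PR or clickhouse_backend); it applies to counts and sums, whereas averages and ratios computed over the sample need no scaling.

```python
from fractions import Fraction


def scale_to_population(sampled_value, sample_fraction):
    """Hypothetical helper: extrapolate an additive aggregate (count, sum)
    computed over a SAMPLE k subset back to the full table."""
    # Parse via str() so 0.1 becomes exactly 1/10 rather than a binary float.
    frac = Fraction(str(sample_fraction))
    if not 0 < frac <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    # Each sampled row stands in for roughly 1 / k rows of the full table.
    return float(sampled_value / frac)


print(scale_to_population(1234, 0.1))  # 12340.0
```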

The new sample() method takes two parameters:

    def sample(self, sample_fraction, sample_offset=None):

It generates either a SAMPLE k or a SAMPLE k OFFSET m clause, as described in the ClickHouse docs on SAMPLE.
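To make the two clause shapes concrete, here is a minimal sketch of the string such a method could produce. `render_sample_clause` is a hypothetical stand-alone helper, not the PR's actual code. The ranges it enforces follow the ClickHouse docs, which describe k and m as numbers from 0 to 1, and it only covers that relative-coefficient form, not ClickHouse's SAMPLE n row-count form.

```python
from fractions import Fraction


def render_sample_clause(sample_fraction, sample_offset=None):
    """Hypothetical helper: render 'SAMPLE k' or 'SAMPLE k OFFSET m'."""
    # k is a relative coefficient: the fraction of the data to read.
    k = Fraction(str(sample_fraction))
    if not 0 < k <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    clause = f"SAMPLE {sample_fraction}"
    if sample_offset is not None:
        # m is also relative: the sampled slice is offset by a fraction m of the data.
        m = Fraction(str(sample_offset))
        if not 0 <= m <= 1:
            raise ValueError("sample_offset must be in [0, 1]")
        clause += f" OFFSET {sample_offset}"
    return clause


print(render_sample_clause(0.1))       # SAMPLE 0.1
print(render_sample_clause(0.1, 0.5))  # SAMPLE 0.1 OFFSET 0.5
```

Passing a `Fraction(1, 10)` renders as `SAMPLE 1/10`, which ClickHouse also accepts as a rational coefficient.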

I didn't include unit tests because I'm too lazy to spin up all the Docker stuff and I'm in a hurry to get a bare-minimum version working. I hope this PR is useful for others and for the project. Thanks!

jayvynl (Owner) commented Oct 22, 2025

Nice job.
