enh: stable cache tag filter(equal condition) results in stream #33508
base: main
Conversation
Summary of Changes

Hello @Tony2h, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant enhancement to the system's caching infrastructure by adding a specialized cache for stable tag filter conditions. The new cache is designed to optimize queries that involve equality-based tag filtering, thereby reducing redundant computations and accelerating data retrieval. The changes encompass new data structures, API extensions, and comprehensive cache management logic to ensure data consistency across various metadata operations.

Highlights
Code Review
This pull request introduces a new caching mechanism for tag filter queries with EQUAL conditions, which is a good performance enhancement. The implementation spans both the query execution and metadata caching layers. I've found several critical issues related to memory management and correctness, such as a use-after-free, potential for memory allocation with a negative size, and unsafe iterator usage during removal. There are also some medium-severity issues like missing NULL checks, code duplication, and minor code style problems. Addressing these issues is important for the stability and correctness of this new feature.
source/libs/executor/src/executil.c
Outdated
```c
for (int32_t i = 0; i < taosArrayGetSize(pIdWithVal); ++i) {
  STagDataEntry* pEntry = taosArrayGet(pIdWithVal, i);
  len += sizeof(col_id_t) +
         ((SValueNode*)pEntry->pValueNode)->node.resType.bytes;
}
```
The length calculation for the digest payload is incorrect for variable-length data types. ((SValueNode*)pEntry->pValueNode)->node.resType.bytes will return the size of a pointer for types like VARCHAR, not the actual length of the string data. This will result in an incorrect digest and cache misses. You should use a logic similar to buildTagDataEntryKey to correctly calculate the length for variable-length types.
```c
for (int32_t i = 0; i < taosArrayGetSize(pIdWithVal); ++i) {
  STagDataEntry* pEntry = taosArrayGet(pIdWithVal, i);
  SValueNode* pValueNode = (SValueNode*)pEntry->pValueNode;
  len += sizeof(col_id_t);
  if (IS_VAR_DATA_TYPE(pValueNode->node.resType.type)) {
    if (pValueNode->node.resType.type == TSDB_DATA_TYPE_JSON) {
      len += getJsonValueLen(pValueNode->datum.p);
    } else {
      len += varDataLen(pValueNode->datum.p);
    }
  } else {
    len += pValueNode->node.resType.bytes;
  }
}
```

```c
*tagCondKeyLen =
    (int32_t)(taosArrayGetSize(*pTagColIds) * (sizeof(col_id_t) + 1) - 1);
*pTagCondKey = (char*)taosMemoryCalloc(1, *tagCondKeyLen);
```
If pTagColIds is empty, taosArrayGetSize(*pTagColIds) will be 0. This makes *tagCondKeyLen equal to (0 * (sizeof(col_id_t) + 1) - 1), which is -1. Calling taosMemoryCalloc(1, -1) on the next line is undefined behavior and will likely cause a crash or memory corruption. You should handle the case where pTagColIds is empty.
```c
    code = taosHashRemove((*pTagConds)->set, key, keyLen);
    TSDB_CHECK_CODE(code, lino, _end);
  }
  pIter = taosHashIterate((*pTagConds)->set, pIter);
```
The iterator pIter might be invalidated by taosHashRemove on line 1112, but it is used again on line 1115 to get the next element. This can lead to undefined behavior like skipping elements or crashing. You should get the next iterator before the potential removal. A safer pattern would be:

```c
void* pNextIter = taosHashIterate((*pTagConds)->set, pIter);
if (found) {
  // ...
  taosHashRemove(...);
}
pIter = pNextIter;
```
source/libs/executor/src/executil.c
Outdated
```c
int32_t tagCondKeyLen;
SArray* pTagColIds = NULL;
buildTagCondKey(pTagCond, &pTagCondKey, &tagCondKeyLen, &pTagColIds);
taosArrayDestroy(pTagColIds);
```
```c
STagConds** pTagConds =
    (STagConds**)taosHashGet(pTableEntry, &suid, sizeof(uint64_t));
if (pTagConds == NULL) {
  // add new (suid -> tag conds) entry
  STagConds* pEntry = (STagConds*)taosMemoryMalloc(sizeof(STagConds));
  TSDB_CHECK_NULL(pEntry, code, lino, _end, terrno);

  pEntry->hitTimes = 0;
  pEntry->set = taosHashInit(
      1024, taosGetDefaultHashFunction(TSDB_DATA_TYPE_BINARY),
      false, HASH_NO_LOCK);
  taosHashSetFreeFp(pEntry->set, freeTagFilterEntryFp);
  TSDB_CHECK_NULL(pEntry->set, code, lino, _end, terrno);

  code = taosHashPut(
      pTableEntry, &suid, sizeof(uint64_t), &pEntry, POINTER_BYTES);
  TSDB_CHECK_CODE(code, lino, _end);

  pTagConds = (STagConds**)taosHashGet(pTableEntry, &suid, sizeof(uint64_t));
  TSDB_CHECK_NULL(pTagConds, code, lino, _end, terrno);
}

STagCondFilterEntry** pFilterEntry =
    (STagCondFilterEntry**)taosHashGet(
        (*pTagConds)->set, pTagCondKey, tagCondKeyLen);
if (pFilterEntry == NULL) {
  // add new (tag cond -> filter entry) entry
  STagCondFilterEntry* pEntry =
      (STagCondFilterEntry*)taosMemoryMalloc(sizeof(STagCondFilterEntry));
  TSDB_CHECK_NULL(pEntry, code, lino, _end, terrno);

  pEntry->hitTimes = 0;
  pEntry->set = taosHashInit(
      1024, taosGetDefaultHashFunction(TSDB_DATA_TYPE_BINARY),
      false, HASH_NO_LOCK);
  taosHashSetFreeFp(pEntry->set, freeSArrayPtr);
  TSDB_CHECK_NULL(pEntry->set, code, lino, _end, terrno);

  code = taosHashPut(
      (*pTagConds)->set, pTagCondKey, tagCondKeyLen, &pEntry, POINTER_BYTES);
  TSDB_CHECK_CODE(code, lino, _end);

  pFilterEntry = (STagCondFilterEntry**)taosHashGet(
      (*pTagConds)->set, pTagCondKey, tagCondKeyLen);
  TSDB_CHECK_NULL(pFilterEntry, code, lino, _end, terrno);
  (*pFilterEntry)->pColIds = taosArrayDup(pTagColIds, NULL);
} else {
  // pColIds is already set, so we can destroy the new one
  taosArrayDestroy(pTagColIds);
}
```
The code for creating a new STagConds entry (lines 834-852) and a new STagCondFilterEntry (lines 857-881) is quite verbose and contains similar patterns of allocation, initialization, and insertion into a hash map. This could be refactored into helper functions to improve readability and maintainability. For example, you could create get_or_create_tag_conds and get_or_create_filter_entry functions.
```c
pFilterEntry = (STagCondFilterEntry**)taosHashGet(
    (*pTagConds)->set, pTagCondKey, tagCondKeyLen);
TSDB_CHECK_NULL(pFilterEntry, code, lino, _end, terrno);
(*pFilterEntry)->pColIds = taosArrayDup(pTagColIds, NULL);
```
The return value of taosArrayDup is not checked. If memory allocation fails, it can return NULL, which would lead to a NULL pointer being assigned to (*pFilterEntry)->pColIds. This would likely cause a crash later. You should check the return value and handle the error.
```c
(*pFilterEntry)->pColIds = taosArrayDup(pTagColIds, NULL);
TSDB_CHECK_NULL((*pFilterEntry)->pColIds, code, lino, _end, terrno);
```

```c
}

// add to cache.
SArray* pPayload = taosArrayDup(pUidList, NULL);
```
The return value of taosArrayDup is not checked. If memory allocation fails, it can return NULL, which would lead to a NULL pointer being assigned to pPayload. This would likely cause a crash later. You should check the return value and handle the error.
```c
SArray* pPayload = taosArrayDup(pUidList, NULL);
TSDB_CHECK_NULL(pPayload, code, lino, _end, terrno);
```

```c
code = taosThreadRwlockWrlock(pRwlock);
TSDB_CHECK_CODE(code, lino, _end);

tb_uid_t suid = pDroppedTable->ctbEntry.suid;;
```
The trailing `;;` is a stray duplicate semicolon and should be a single `;`.
```c
int32_t keyLen = 0;
char* pKey = NULL;
```
```c
    pFilterEntry->set, context.digest, tListLen(context.digest));
if (pArray != NULL) {
  // check and remove the dropped table uid from the array
  // TODO(Tony Zhang): optimize this scan
```
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #33508        +/-   ##
============================================
+ Coverage    54.05%   71.27%    +17.22%
============================================
  Files          472      534        +62
  Lines       294900   345316     +50416
  Branches     99139        0     -99139
============================================
+ Hits        159397   246137     +86740
- Misses       83784    99179     +15395
+ Partials     51719        0     -51719
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Force-pushed e7c4db8 to 60ba3b2.
Force-pushed 60ba3b2 to 9398411.
Description
Please briefly describe the code changes in this pull request.
Checklist
Please check the items in the checklist if applicable.