Skip to content

Commit b0c4ac3

Browse files
authored
unwind tail recursion for characters (#7477)
1 parent 4f8695f commit b0c4ac3

File tree

3 files changed

+9
-3
lines changed

3 files changed

+9
-3
lines changed

NEWS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -338,7 +338,7 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
338338
339339
19. Ellipsis elements like `..1` are correctly excluded when searching for variables in "up-a-level" syntax inside `[`, [#5460](https://github.com/Rdatatable/data.table/issues/5460). Thanks @ggrothendieck for the report and @MichaelChirico for the fix.
340340
341-
20. `forderv` could segfault on keys with long runs of identical bytes (e.g., many duplicate columns) because the single-group branch tail-recursed radix-by-radix until the C stack ran out, [#4300](https://github.com/Rdatatable/data.table/issues/4300). This is a major problem since sorting is extensively used in `data.table`. Thanks @quantitative-technologies for the report and @ben-schwen for the fix.
341+
20. `forderv` could segfault on keys with long runs of identical bytes because the single-group branch tail-recursed radix-by-radix until the C stack ran out. This affected both integer/numeric sorting with many duplicate columns ([#4300](https://github.com/Rdatatable/data.table/issues/4300)) and character sorting with long common prefixes ([#7462](https://github.com/Rdatatable/data.table/issues/7462)). This is a major problem since sorting is extensively used in `data.table`. Thanks @quantitative-technologies and @DavisVaughan for the reports, and @ben-schwen for the fix.
342342
343343
21. `[` now preserves existing key(s) when new columns are added before them, instead of incorrectly setting a new column as key, [#7364](https://github.com/Rdatatable/data.table/issues/7364). Thanks @czeildi for the bug report and the fix.
344344

inst/tests/tests.Rraw

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21839,6 +21839,9 @@ DT[, V1000 := 20:1]
2183921839
test(2343.1, forderv(DT, by=names(DT), sort=FALSE, retGrp=TRUE), forderv(DT, by=c("V1", "V1000"), sort=FALSE, retGrp=TRUE))
2184021840
x = c(rep(0, 7e5), 1e6)
2184121841
test(2343.2, forderv(list(x)), integer(0))
21842+
# resolve tail recursion on long character vectors with common prefix #7462
21843+
x = paste0(strrep("a", 1e5L), 2:1)
21844+
test(2343.3, forderv(x), INT(2, 1))
2184221845

2184321846
# Keep key when new column added before existing key in j
2184421847
# Incorrect key can lead to incorrect join result #7364

src/forder.c

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -246,16 +246,17 @@ static void cradix_r(SEXP *xsub, int n, int radix)
246246
// 3) no need to maintain o. Just simply reorder x. No grps or push.
247247
{
248248
if (n<=1) return;
249+
while (TRUE) {
249250
int *thiscounts = cradix_counts + radix*256;
250251
uint8_t lastx = 0; // the last x is used to test its bin
251252
for (int i=0; i<n; i++) {
252253
lastx = radix<LENGTH(xsub[i]) ? (uint8_t)(CHAR(xsub[i])[radix]) : 1; // no NA_STRING present, 1 for "" (could use 0 too maybe since NA_STRING not present)
253254
thiscounts[ lastx ]++;
254255
}
255256
if (thiscounts[lastx]==n && radix<ustr_maxlen-1) {
256-
cradix_r(xsub, n, radix+1);
257257
thiscounts[lastx] = 0; // all x same value, the rest must be 0 already, save the memset
258-
return;
258+
radix++; // tail recursion eliminated to avoid segfault when prefixes are long
259+
continue;
259260
}
260261
int itmp = thiscounts[0];
261262
for (int i=1; i<256; i++) {
@@ -280,6 +281,8 @@ static void cradix_r(SEXP *xsub, int n, int radix)
280281
thiscounts[i] = 0; // set to 0 now since we're here, saves memset afterwards. Important to do this ready for this memory's reuse
281282
}
282283
if (itmp<n-1) cradix_r(xsub+itmp, n-itmp, radix+1); // final group
284+
break;
285+
}
283286
}
284287

285288
static void cradix(SEXP *x, int n)

0 commit comments

Comments
 (0)