Commit 2b65189

Merge pull request #70 from cloudkite-io/feat/mssql-diff-backups
Add support to MSSQL differential backups
2 parents 1afc8ce + b3ec0fe commit 2b65189

5 files changed, +481 -77 lines changed

mssql/Chart.yaml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -2,5 +2,5 @@ apiVersion: v2
22
name: mssql
33
description: A Helm chart for Kubernetes
44
type: application
5-
version: 0.1.9
6-
appVersion: "0.1.9"
5+
version: 0.2.1
6+
appVersion: "0.2.1"

mssql/README.md

Lines changed: 333 additions & 0 deletions
@@ -2,3 +2,336 @@
22
This chart is inspired by https://gitlab.com/xrow-public/helm-mssql with the main modification being able to run multiple MSSQL servers under one chart.
33

44
The image that we are using (`gcr.io/cloudkite-public/mssql:2022`) is a copy of `registry.gitlab.com/xrow-public/helm-mssql/mssql:1.2.0` that has been pulled without any modification.
5+
6+
## Backup Configuration
7+
8+
This chart supports a multi-tier backup strategy for MSSQL databases that combines Full, Differential, and Transaction Log backups. All backups are written directly to AWS S3 using SQL Server's native S3 backup functionality (available from SQL Server 2022).
9+
10+
**Note:** Backup scripts are built into the custom Docker image.
11+
12+
### Understanding MSSQL Backup Types
13+
14+
#### 1. Full Backup
15+
A **Full Backup** creates a complete copy of the entire database, including all data files and objects (tables, procedures, functions, views, indexes, etc.).
16+
17+
**Characteristics:**
18+
- Captures everything in the database at a point in time
19+
- Establishes a "base" for differential backups
20+
- Largest backup size, takes the longest time
21+
- Essential for any restore operation
22+
- Uses T-SQL command: `BACKUP DATABASE [db] TO URL = '...' WITH FORMAT, COMPRESSION`
23+
24+
**When to use:**
25+
- As the foundation of any backup strategy (required)
26+
- Weekly or daily, depending on database size and criticality
27+
- After major schema changes or data migrations
28+
29+
#### 2. Differential Backup
30+
A **Differential Backup** captures only the data that has changed since the last FULL backup.
31+
32+
**Characteristics:**
33+
- Much faster and smaller than full backups
34+
- Each differential backup grows larger as more changes accumulate
35+
- Relies on the most recent full backup as its "base"
36+
- Uses T-SQL command: `BACKUP DATABASE [db] TO URL = '...' WITH DIFFERENTIAL, COMPRESSION`
37+
38+
**How it works:**
39+
- SQL Server tracks changed extents (groups of 8 pages) using a differential bitmap
40+
- When you run a full backup, the bitmap resets
41+
- Each differential backup captures all changes since that full backup
42+
- Each subsequent differential is cumulative (includes all changes since the last full)
43+
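The cumulative behaviour described above can be illustrated with a toy model. This is only a sketch: `ToyDatabase` and its methods are hypothetical names, extents are represented by plain integers, and nothing here reflects SQL Server's actual on-disk structures.

```python
# Toy model of differential backups (not SQL Server internals).
# The set plays the role of the differential bitmap: it records which
# extents changed since the last full backup.


class ToyDatabase:
    def __init__(self):
        self.changed_since_full = set()

    def write(self, extent_ids):
        """Record that these extents were modified."""
        self.changed_since_full.update(extent_ids)

    def full_backup(self):
        """A full backup resets the change tracking."""
        self.changed_since_full.clear()

    def differential_backup(self):
        """A differential copies every extent changed since the last full.

        It does not reset the tracking, which is why differentials grow.
        """
        return sorted(self.changed_since_full)


db = ToyDatabase()
db.full_backup()

db.write({1, 2})
diff_monday = db.differential_backup()

db.write({2, 7})
diff_tuesday = db.differential_backup()  # also contains Monday's changes

db.full_backup()  # new base: earlier changes are no longer tracked
db.write({9})
diff_after_new_full = db.differential_backup()
```

In this model `diff_tuesday` is a superset of `diff_monday`, which is why only the latest differential is needed at restore time.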
44+
**When to use:**
45+
- Between full backups for more frequent protection
46+
- Daily or every few hours for production databases
47+
- When you want faster backups than full but still need database-level recovery
48+
49+
**Important:** Once you take a new full backup, the differential base resets. Earlier differential backups cannot be applied on top of the new full backup; they remain usable only together with the older full backup they were based on.
50+
51+
#### 3. Transaction Log Backup
52+
A **Transaction Log Backup** captures all database transactions since the last log backup.
53+
54+
**Characteristics:**
55+
- Enables point-in-time recovery (restore to a specific moment, e.g., "5 minutes before the incident")
56+
- Only works with databases in FULL or BULK_LOGGED recovery model (see below)
57+
- Very small and fast (only captures transaction log records)
58+
- Sequential - must be applied in order during restore
59+
- Uses T-SQL command: `BACKUP LOG [db] TO URL = '...' WITH COMPRESSION`
60+
61+
**When to use:**
62+
- For critical databases requiring minimal data loss (RPO < 1 hour)
63+
- When you need the ability to restore to an exact moment in time (e.g., right before a bad deployment or accidental data deletion)
64+
- Every 15-30 minutes for high-availability scenarios
65+
66+
**Understanding Recovery Models:**
67+
68+
SQL Server databases operate in one of three recovery models that determine how transaction logs are managed:
69+
70+
1. **SIMPLE Recovery Model** (the default only when the `model` system database uses SIMPLE, as on Express edition)
71+
- Transaction logs are automatically truncated after each checkpoint
72+
- **You CANNOT take transaction log backups** (the backup command will fail)
73+
- Can only restore to the point of the last full or differential backup
74+
- Use when: Development, staging, or databases where some data loss is acceptable
75+
- **Our recommendation:** Use Full + Differential backups only (disable log backups)
76+
77+
2. **FULL Recovery Model** (Recommended for production)
78+
- Transaction logs are preserved until you back them up
79+
- Allows point-in-time recovery to any moment between backups
80+
- **You MUST take regular transaction log backups** or the log file will grow indefinitely and fill up disk space
81+
- Use when: Production databases where minimal data loss is critical
82+
- **Our recommendation:** Enable Full + Differential + Transaction Log backups
83+
84+
3. **BULK_LOGGED Recovery Model** (Advanced use case)
85+
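- Similar to FULL, but bulk operations such as bulk imports are minimally logged to reduce log growth; a log backup that contains such operations cannot be restored to a point in time within it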
86+
- Allows transaction log backups like FULL model
87+
88+
**How to check the database's recovery model:**
89+
```sql
90+
SELECT name, recovery_model_desc
91+
FROM sys.databases
92+
WHERE name = 'the_database_name';
93+
```
94+
95+
**Why transaction log backups are disabled by default:**
96+
- Databases in SIMPLE recovery model cannot take log backups, so the jobs would fail until the recovery model is changed
97+
- Transaction log backups require more complex restore procedures
98+
- They generate more frequent backup jobs and storage costs
99+
- Full + Differential backups are sufficient for most use cases (where an RPO of a few hours is acceptable)
100+
101+
**When you SHOULD enable transaction log backups:**
102+
- The database is in FULL recovery model (or you're willing to change it)
103+
- You need to recover to a specific point in time (e.g., "right before that DELETE statement ran")
104+
- You cannot tolerate losing more than 15-30 minutes of data
105+
- You have critical production data where every transaction matters
106+
107+
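**Important:** If you enable transaction log backups, you must also switch the database to the FULL recovery model and then take a full backup, because the log backup chain only starts after that full backup. To switch the recovery model: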
108+
```sql
109+
ALTER DATABASE [the_database_name] SET RECOVERY FULL;
110+
```
111+
112+
### Backup Strategy Patterns
113+
114+
The chart supports three common backup patterns:
115+
116+
#### Pattern 1: Basic (Small/Medium databases, Dev/Staging)
117+
```yaml
118+
backup:
119+
  strategy:
120+
    full:
121+
      enabled: true
122+
      schedule: "0 2 * * 0"  # Weekly on Sunday at 2 AM
123+
      retentionDays: 90
124+
    differential:
125+
      enabled: true
126+
      schedule: "0 2 * * 1-6"  # Daily Mon-Sat at 2 AM
127+
      retentionDays: 30
128+
    log:
129+
      enabled: false
130+
```
131+
- **RPO (Recovery Point Objective):** ~24 hours
132+
- **RTO (Recovery Time Objective):** Fast (restore 1 full + 1 differential)
133+
- **Storage usage:** Low
134+
- **Best for:** Non-critical databases, development, staging environments
135+
136+
#### Pattern 2: Standard (Production databases)
137+
```yaml
138+
backup:
139+
  strategy:
140+
    full:
141+
      enabled: true
142+
      schedule: "0 2 * * *"  # Daily at 2 AM
143+
      retentionDays: 90
144+
    differential:
145+
      enabled: true
146+
      schedule: "0 */6 * * *"  # Every 6 hours
147+
      retentionDays: 30
148+
    log:
149+
      enabled: false
150+
```
151+
- **RPO:** ~6 hours
152+
- **RTO:** Fast
153+
- **Storage usage:** Medium
154+
- **Best for:** Most production databases with standard availability requirements
155+
156+
#### Pattern 3: High Availability (Critical databases)
157+
```yaml
158+
backup:
159+
  strategy:
160+
    full:
161+
      enabled: true
162+
      schedule: "0 2 * * *"  # Daily at 2 AM
163+
      retentionDays: 90
164+
    differential:
165+
      enabled: true
166+
      schedule: "0 */3 * * *"  # Every 3 hours
167+
      retentionDays: 30
168+
    log:
169+
      enabled: true
170+
      schedule: "*/15 * * * *"  # Every 15 minutes
171+
      retentionDays: 7
172+
```
173+
- **RPO:** ~15 minutes (can restore to any point in time)
174+
- **RTO:** Moderate (requires replaying transaction logs)
175+
- **Storage usage:** Higher
176+
- **Best for:** Critical databases with strict availability SLAs
177+
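To make the storage and RPO trade-off of Pattern 3 concrete, the following sketch estimates how many backup files each tier keeps at steady state. It assumes that files older than `retentionDays` are removed and that every scheduled run succeeds; `Tier` and its fields are hypothetical names, and the intervals are read off the cron expressions in the Pattern 3 YAML.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    interval_minutes: int  # time between scheduled runs
    retention_days: int

    def retained_objects(self) -> int:
        """Approximate number of backup files kept at steady state."""
        runs_per_day = (24 * 60) / self.interval_minutes
        return round(runs_per_day * self.retention_days)


# Values copied from the Pattern 3 example.
pattern_3 = [
    Tier("full", interval_minutes=24 * 60, retention_days=90),
    Tier("differential", interval_minutes=3 * 60, retention_days=30),
    Tier("log", interval_minutes=15, retention_days=7),
]

counts = {tier.name: tier.retained_objects() for tier in pattern_3}

# Worst-case data loss is bounded by the most frequent backup tier.
worst_case_rpo_minutes = min(tier.interval_minutes for tier in pattern_3)
```

Under these assumptions the log tier holds the most files even though it has the shortest retention, while the full tier usually dominates the bytes stored.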
178+
### S3 Organization
179+
180+
Backups are organized in S3 with the following structure:
181+
182+
```
183+
s3://the-backup-bucket/
184+
└── backups/
185+
    └── {database_name}/
186+
        ├── full/
187+
        │   ├── db_a-full-2025-01-20-02-00.bak
188+
        │   ├── db_a-full-2025-01-27-02-00.bak
189+
        │   └── db_a-full-2025-02-03-02-00.bak
190+
        ├── differential/
191+
        │   ├── db_a-diff-2025-01-21-02-00.bak
192+
        │   ├── db_a-diff-2025-01-22-02-00.bak
193+
        │   └── db_a-diff-2025-01-26-02-00.bak
194+
        └── log/
195+
            ├── db_a-log-2025-01-26-12-00.trn
196+
            ├── db_a-log-2025-01-26-12-15.trn
197+
            └── db_a-log-2025-01-26-12-30.trn
198+
```
199+
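The naming scheme in the listing above can be expressed as a small helper. This is a sketch inferred from the example file names only; `backup_key` is a hypothetical function, and the backup scripts in the image may build their paths differently.

```python
from datetime import datetime

# File-name suffix and extension per backup type, inferred from the listing.
_LAYOUT = {
    "full": ("full", "bak"),
    "differential": ("diff", "bak"),
    "log": ("log", "trn"),
}


def backup_key(database: str, backup_type: str, taken_at: datetime) -> str:
    """Build the object key (path inside the bucket) for one backup file."""
    suffix, extension = _LAYOUT[backup_type]
    stamp = taken_at.strftime("%Y-%m-%d-%H-%M")
    return f"backups/{database}/{backup_type}/{database}-{suffix}-{stamp}.{extension}"


key = backup_key("db_a", "differential", datetime(2025, 1, 21, 2, 0))
```

Because the timestamp is zero-padded and ordered from year to minute, sorting keys alphabetically within one folder also sorts them chronologically.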
200+
### Restore Process
201+
202+
#### Restore from Full + Differential Backup
203+
204+
1. **Identify the backups needed:**
205+
- Most recent FULL backup before the desired restore point
206+
- Most recent DIFFERENTIAL backup taken after that full backup and before the desired restore point
207+
208+
2. **Restore the full backup:**
209+
```sql
210+
RESTORE DATABASE [db_a]
211+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/full/db_a-full-2025-01-20-02-00.bak'
212+
WITH NORECOVERY,
213+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
214+
```
215+
216+
3. **Restore the differential backup:**
217+
```sql
218+
RESTORE DATABASE [db_a]
219+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/differential/db_a-diff-2025-01-26-02-00.bak'
220+
WITH RECOVERY,
221+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
222+
```
223+
224+
**Note:** You only need the latest differential backup. All previous differential backups can be ignored as each differential is cumulative.
225+
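The selection rule from step 1 can be sketched as follows. It assumes the timestamps in the file names are a good enough proxy for when each backup was taken; `pick_restore_chain` is a hypothetical helper, not part of the chart.

```python
from datetime import datetime


def pick_restore_chain(fulls, diffs, target):
    """Return (full, differential_or_None) timestamps needed to reach `target`."""
    eligible_fulls = [t for t in fulls if t <= target]
    if not eligible_fulls:
        raise ValueError("no full backup exists at or before the restore point")
    base = max(eligible_fulls)

    # Only differentials taken after the chosen full backup are based on it.
    eligible_diffs = [t for t in diffs if base < t <= target]
    latest_diff = max(eligible_diffs) if eligible_diffs else None
    return base, latest_diff


# Timestamps taken from the example S3 listing.
fulls = [datetime(2025, 1, 20, 2), datetime(2025, 1, 27, 2)]
diffs = [datetime(2025, 1, 21, 2), datetime(2025, 1, 22, 2), datetime(2025, 1, 26, 2)]

chain = pick_restore_chain(fulls, diffs, target=datetime(2025, 1, 26, 9))
```

For a restore point on 2025-01-26 at 09:00 this picks the 2025-01-20 full backup and the 2025-01-26 differential, matching the statements above. A restore point shortly after the 2025-01-27 full backup would need no differential at all.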
226+
#### Restore from Full + Differential + Transaction Logs
227+
228+
1. **Restore the full backup with NORECOVERY:**
229+
```sql
230+
RESTORE DATABASE [db_a]
231+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/full/db_a-full-2025-01-20-02-00.bak'
232+
WITH NORECOVERY,
233+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
234+
```
235+
236+
2. **Restore the differential backup with NORECOVERY:**
237+
```sql
238+
RESTORE DATABASE [db_a]
239+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/differential/db_a-diff-2025-01-26-02-00.bak'
240+
WITH NORECOVERY,
241+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
242+
```
243+
244+
3. **Restore transaction log backups in sequence:**
245+
```sql
246+
-- Restore every log backup taken after the differential, in chronological order (only the last three are shown)
247+
RESTORE LOG [db_a]
248+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/log/db_a-log-2025-01-26-12-00.trn'
249+
WITH NORECOVERY,
250+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
251+
252+
RESTORE LOG [db_a]
253+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/log/db_a-log-2025-01-26-12-15.trn'
254+
WITH NORECOVERY,
255+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
256+
257+
-- Last log restore with RECOVERY to bring database online
258+
RESTORE LOG [db_a]
259+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/log/db_a-log-2025-01-26-12-30.trn'
260+
WITH RECOVERY,
261+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
262+
```
263+
264+
4. **Or restore to a specific point in time:**
265+
```sql
266+
RESTORE LOG [db_a]
267+
FROM URL = 's3://deverus-mssql-backups-bucket/backups/db_a/log/db_a-log-2025-01-26-12-30.trn'
268+
WITH RECOVERY, STOPAT = '2025-01-26 12:25:00',
269+
CREDENTIAL = 's3://deverus-mssql-backups-bucket.s3.us-east-1.amazonaws.com';
270+
```
271+
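The ordering rules for log restores (every log after the last data backup, `NORECOVERY` until the final one, `STOPAT` on the log that covers the target time) can be sketched like this. `plan_log_restores` is a hypothetical helper, and it assumes each log backup contains the records up to the timestamp in its file name.

```python
from datetime import datetime


def plan_log_restores(log_times, last_data_backup, stop_at):
    """Return [(log_time, restore_options), ...] in the order to apply them."""
    plan = []
    for taken in sorted(t for t in log_times if t > last_data_backup):
        if taken >= stop_at:
            # First log backup that covers the target time: stop inside it
            # and bring the database online.
            options = f"RECOVERY, STOPAT = '{stop_at:%Y-%m-%d %H:%M:%S}'"
            plan.append((taken, options))
            return plan
        plan.append((taken, "NORECOVERY"))
    raise ValueError("no log backup covers the requested point in time")


# Only the three log backups from the example listing are used here.
logs = [
    datetime(2025, 1, 26, 12, 30),
    datetime(2025, 1, 26, 12, 0),
    datetime(2025, 1, 26, 12, 15),
]
plan = plan_log_restores(
    logs,
    last_data_backup=datetime(2025, 1, 26, 2, 0),
    stop_at=datetime(2025, 1, 26, 12, 25),
)
options_in_order = [options for _, options in plan]
```

Skipping a log in the middle of the sequence breaks the chain, so a real plan would list every log backup taken since the differential, not just the last three.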
272+
### Configuration Example
273+
274+
```yaml
275+
databases:
276+
  - name: db-a
277+
    database: db_a
278+
    backup:
279+
      enabled: true
280+
      s3Bucket: "the-primary-backup-bucket"
281+
      s3Region: "us-east-1"
282+
283+
      # Backup strategy configuration
284+
      strategy:
285+
        # Full backup schedule (required as base for differential)
286+
        full:
287+
          enabled: true
288+
          schedule: "0 2 * * 0"  # Weekly on Sunday at 2:00 AM UTC
289+
          retentionDays: 90  # Keep for 3 months
290+
291+
        # Differential backup schedule (requires full backups)
292+
        differential:
293+
          enabled: true
294+
          schedule: "0 2 * * 1-6"  # Daily Mon-Sat at 2:00 AM UTC
295+
          retentionDays: 30  # Keep for 1 month
296+
297+
        # Transaction log backup (optional, for point-in-time recovery)
298+
        log:
299+
          enabled: false
300+
          schedule: "*/30 * * * *"  # Every 30 minutes
301+
          retentionDays: 7  # Keep for 1 week
302+
303+
    image:
304+
      repository: gcr.io/cloudkite-public/mssql
305+
      tag: "2022"
306+
      pullPolicy: IfNotPresent
307+
308+
    serviceAccount:
309+
      create: true
310+
      name: "db-a-backup-sa"
311+
      annotations:
312+
        eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/mssql-backup-role-for-db-a"
313+
```
314+
315+
### IAM Permissions Required
316+
317+
The service account's IAM role needs the following S3 permissions:
318+
319+
```json
320+
{
321+
  "Version": "2012-10-17",
322+
  "Statement": [
323+
    {
324+
      "Effect": "Allow",
325+
      "Action": [
326+
        "s3:PutObject",
327+
        "s3:GetObject",
328+
        "s3:ListBucket"
329+
      ],
330+
      "Resource": [
331+
        "arn:aws:s3:::the-backup-bucket",
332+
        "arn:aws:s3:::the-backup-bucket/*"
333+
      ]
334+
    }
335+
  ]
336+
}
337+
```

mssql/sample.values.yaml

Lines changed: 22 additions & 3 deletions
@@ -1,5 +1,4 @@
11
loadBalancerName: mssql-load-balancer
2-
32
# Global image configuration
43
image:
54
repository: gcr.io/cloudkite-public/mssql
@@ -30,12 +29,32 @@ databases:
3029
memory: 4Gi
3130
backup:
3231
enabled: true
33-
schedule: "0 2 * * *" # Daily at 2:00 AM UTC
3432
s3Bucket: "your-primary-backup-bucket"
3533
s3Region: "us-east-1"
34+
35+
# Backup strategy configuration
36+
strategy:
37+
# Full backup schedule (required as base for differential)
38+
full:
39+
enabled: true
40+
schedule: "0 2 * * 0" # Weekly on Sunday at 2:00 AM UTC
41+
retentionDays: 90 # Keep for 3 months
42+
43+
# Differential backup schedule (requires full backups)
44+
differential:
45+
enabled: true
46+
schedule: "0 2 * * 1-6" # Daily Mon-Sat at 2:00 AM UTC
47+
retentionDays: 30 # Keep for 1 month
48+
49+
# Transaction log backup (optional, for point-in-time recovery)
50+
log:
51+
enabled: false
52+
schedule: "*/30 * * * *" # Every 30 minutes
53+
retentionDays: 7 # Keep for 1 week
54+
3655
image:
3756
repository: gcr.io/cloudkite-public/mssql
38-
tag: "2022"
57+
tag: "v0.3.3"
3958
pullPolicy: IfNotPresent
4059
serviceAccount:
4160
create: true
