Commit f798751

Refactor AWS Glue modules documentation and outputs
- Updated MODULE.md for aws-glue-data-lake-catalog to improve formatting and clarify input/output descriptions.
- Renamed output variable from `raw_zone_database_name` to `raw_database_name` for consistency in aws-glue-data-lake-catalog.
- Enhanced MODULE.md for aws-glue-jobs with requirements, providers, modules, inputs, and outputs sections.
- Simplified variable definitions in aws-glue-jobs to remove default values for optional fields.
- Improved naming conventions for job tags in aws-glue-jobs to include resource prefix.
- Added default_run_properties to workflows in aws-glue-workflow for better job configuration.
- Updated regex checks in outputs for aws-glue-workflow to ensure proper trigger name matching.
1 parent b144590 commit f798751

11 files changed

+309
-34
lines changed

.pre-commit-config.yaml

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
+# Pre-commit configuration for Terraform Infrastructure
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+
+default_stages: [pre-commit]
+default_language_version:
+  python: python3
+
+repos:
+  # Core pre-commit hooks for general file hygiene
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      # File formatting and structure
+      - id: trailing-whitespace
+        args: [--markdown-linebreak-ext=md]
+      - id: end-of-file-fixer
+      - id: check-yaml
+        args: [--allow-multiple-documents]
+        exclude: \.ya?ml\.j2$
+      - id: check-json
+      - id: check-merge-conflict
+      - id: check-added-large-files
+        args: [--maxkb=1024]
+
+      # Git and file system checks
+      - id: check-case-conflict
+      - id: check-symlinks
+      - id: destroyed-symlinks
+
+      # Content validation
+      - id: check-executables-have-shebangs
+      - id: check-shebang-scripts-are-executable
+
+  # Terraform-specific hooks using the most popular pre-commit-terraform
+  - repo: https://github.com/antonbabenko/pre-commit-terraform
+    rev: v1.99.5 # Latest as of Jan 2025
+    hooks:
+      # Core Terraform formatting and validation
+      - id: terraform_fmt
+        args:
+          - --args=-recursive
+          - --args=-diff
+
+      - id: terraform_validate
+        args:
+          - --hook-config=--retry-once-with-cleanup=true
+
+      # Security scanning with Trivy (modern replacement for tfsec)
+      - id: terraform_trivy
+        args:
+          - --args=--format=compact
+          - --args=--exit-code=1
+          - --args=--severity=HIGH,CRITICAL
+          - --args=--skip-dirs="**/.terraform"
+
+      # Documentation generation
+      - id: terraform_docs
+        args:
+          - --hook-config=--path-to-file=docs/MODULE.md
+          - --hook-config=--add-to-existing-file=true
+          - --hook-config=--create-file-if-not-exist=true
+
+      # Provider lock file management
+      - id: terraform_providers_lock
+        args:
+          - --hook-config=--mode=only-check-is-current-lockfile-cross-platform
+          - --args=-platform=linux_amd64
+          - --args=-platform=linux_arm64
+          - --args=-platform=darwin_amd64
+          - --args=-platform=darwin_arm64
+
+  # Markdown linting for better documentation
+  - repo: https://github.com/DavidAnson/markdownlint-cli2
+    rev: v0.15.0
+    hooks:
+      - id: markdownlint-cli2
+        args:
+          - --config=.markdownlint.json
+
+  # YAML formatting and linting
+  - repo: https://github.com/adrienverge/yamllint
+    rev: v1.37.0
+    hooks:
+      - id: yamllint
+        args: [-c=.yamllint]
+
+  # Shell script linting
+  - repo: https://github.com/shellcheck-py/shellcheck-py
+    rev: v0.10.0.1
+    hooks:
+      - id: shellcheck
+        args: [--severity=warning]
+
+  # Secret scanning for security
+  - repo: https://github.com/gitleaks/gitleaks
+    rev: v8.21.2
+    hooks:
+      - id: gitleaks
+
+  # Typo and spelling checks
+  - repo: https://github.com/crate-ci/typos
+    rev: v1.28.3
+    hooks:
+      - id: typos
+        args: [--config=.typos.toml]
+        exclude: |
+          (?x)^(
+            .*\.lock.*|
+            .*\.tfstate.*|
+            .git/.*
+          )$
+
+# Configuration for excluding files/directories
+exclude: |
+  (?x)^(
+    \.terraform/.*|
+    \.terraform\.lock\.hcl$|
+    .*\.tfstate.*|
+    .*\.terraform\.lock\.hcl$|
+    node_modules/.*|
+    \.venv/.*|
+    __pycache__/.*
+  )$

modules/aws-glue-code-registry/docs/MODULE.md

Lines changed: 74 additions & 3 deletions
@@ -1,3 +1,74 @@
+## Requirements
+
+| Name | Version |
+|------|---------|
+| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.5.0 |
+| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 5.0 |
+
+## Providers
+
+| Name | Version |
+|------|---------|
+| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 5.0 |
+
+## Modules
+
+| Name | Source | Version |
+|------|--------|---------|
+| <a name="module_code_artifacts_bucket"></a> [code\_artifacts\_bucket](#module\_code\_artifacts\_bucket) | terraform-aws-modules/s3-bucket/aws | 5.2.0 |
+
+## Resources
+
+| Name | Type |
+|------|------|
+| [aws_cloudwatch_log_group.code_registry](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group) | resource |
+| [aws_iam_role.code_registry_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
+| [aws_iam_role_policy.additional_s3_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
+| [aws_iam_role_policy.cloudwatch_logs](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
+| [aws_iam_role_policy.s3_code_artifacts_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
+| [aws_s3_bucket_notification.code_artifacts_notification](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_notification) | resource |
+| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
+| [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source |
+
+## Inputs
+
+| Name | Description | Type | Default | Required |
+|------|-------------|------|---------|:--------:|
+| <a name="input_additional_s3_bucket_arns"></a> [additional\_s3\_bucket\_arns](#input\_additional\_s3\_bucket\_arns) | List of additional S3 bucket ARNs that the code registry IAM role needs access to | `list(string)` | `[]` | no |
+| <a name="input_cloudwatch_kms_key_id"></a> [cloudwatch\_kms\_key\_id](#input\_cloudwatch\_kms\_key\_id) | KMS key ID for CloudWatch Logs encryption | `string` | `null` | no |
+| <a name="input_create_iam_role"></a> [create\_iam\_role](#input\_create\_iam\_role) | Whether to create IAM role for code registry access | `bool` | `true` | no |
+| <a name="input_create_s3_bucket"></a> [create\_s3\_bucket](#input\_create\_s3\_bucket) | Whether to create a new S3 bucket for code artifacts | `bool` | `true` | no |
+| <a name="input_enable_cloudwatch_logging"></a> [enable\_cloudwatch\_logging](#input\_enable\_cloudwatch\_logging) | Whether to enable CloudWatch logging for code registry | `bool` | `true` | no |
+| <a name="input_enable_s3_notifications"></a> [enable\_s3\_notifications](#input\_enable\_s3\_notifications) | Whether to enable S3 bucket notifications for code artifact uploads | `bool` | `false` | no |
+| <a name="input_existing_s3_bucket_name"></a> [existing\_s3\_bucket\_name](#input\_existing\_s3\_bucket\_name) | Name of existing S3 bucket to use for code artifacts (when create\_s3\_bucket is false) | `string` | `null` | no |
+| <a name="input_force_destroy"></a> [force\_destroy](#input\_force\_destroy) | Whether to allow destruction of S3 bucket with objects (use with caution in production) | `bool` | `false` | no |
+| <a name="input_log_retention_days"></a> [log\_retention\_days](#input\_log\_retention\_days) | Number of days to retain CloudWatch logs | `number` | `14` | no |
+| <a name="input_max_session_duration"></a> [max\_session\_duration](#input\_max\_session\_duration) | Maximum session duration for the code registry access role (in seconds) | `number` | `3600` | no |
+| <a name="input_name"></a> [name](#input\_name) | Name to be used as prefix for all resources created by this module | `string` | n/a | yes |
+| <a name="input_s3_bucket_name"></a> [s3\_bucket\_name](#input\_s3\_bucket\_name) | Name of the S3 bucket for code artifacts. If not provided, will be auto-generated | `string` | `null` | no |
+| <a name="input_s3_kms_key_id"></a> [s3\_kms\_key\_id](#input\_s3\_kms\_key\_id) | KMS key ID for S3 bucket encryption. If not provided, AES256 encryption will be used | `string` | `null` | no |
+| <a name="input_s3_lifecycle_rules"></a> [s3\_lifecycle\_rules](#input\_s3\_lifecycle\_rules) | S3 bucket lifecycle rules for cost optimization | `any` | <pre>[<br/> {<br/> "abort_incomplete_multipart_upload": {<br/> "days_after_initiation": 7<br/> },<br/> "id": "delete_old_versions",<br/> "noncurrent_version_expiration": {<br/> "days": 90<br/> },<br/> "status": "Enabled"<br/> }<br/>]</pre> | no |
+| <a name="input_s3_notification_configurations"></a> [s3\_notification\_configurations](#input\_s3\_notification\_configurations) | List of S3 notification configurations for code artifact uploads | <pre>list(object({<br/> id = string<br/> events = list(string)<br/> filter_prefix = optional(string)<br/> filter_suffix = optional(string)<br/> }))</pre> | <pre>[<br/> {<br/> "events": [<br/> "s3:ObjectCreated:*"<br/> ],<br/> "filter_suffix": ".jar",<br/> "id": "code-upload-notification"<br/> },<br/> {<br/> "events": [<br/> "s3:ObjectCreated:*"<br/> ],<br/> "filter_suffix": ".whl",<br/> "id": "wheel-upload-notification"<br/> }<br/>]</pre> | no |
+| <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to assign to all resources | `map(string)` | `{}` | no |
+
+## Outputs
+
+| Name | Description |
+|------|-------------|
+| <a name="output_code_artifacts_bucket_arn"></a> [code\_artifacts\_bucket\_arn](#output\_code\_artifacts\_bucket\_arn) | ARN of the S3 bucket used for code artifacts |
+| <a name="output_code_artifacts_bucket_domain_name"></a> [code\_artifacts\_bucket\_domain\_name](#output\_code\_artifacts\_bucket\_domain\_name) | Domain name of the S3 bucket used for code artifacts |
+| <a name="output_code_artifacts_bucket_id"></a> [code\_artifacts\_bucket\_id](#output\_code\_artifacts\_bucket\_id) | ID of the S3 bucket used for code artifacts |
+| <a name="output_code_artifacts_bucket_name"></a> [code\_artifacts\_bucket\_name](#output\_code\_artifacts\_bucket\_name) | Name of the S3 bucket used for code artifacts |
+| <a name="output_code_artifacts_bucket_regional_domain_name"></a> [code\_artifacts\_bucket\_regional\_domain\_name](#output\_code\_artifacts\_bucket\_regional\_domain\_name) | Regional domain name of the S3 bucket used for code artifacts |
+| <a name="output_code_registry_role_arn"></a> [code\_registry\_role\_arn](#output\_code\_registry\_role\_arn) | ARN of the code registry access IAM role |
+| <a name="output_code_registry_role_id"></a> [code\_registry\_role\_id](#output\_code\_registry\_role\_id) | ID of the code registry access IAM role |
+| <a name="output_code_registry_role_name"></a> [code\_registry\_role\_name](#output\_code\_registry\_role\_name) | Name of the code registry access IAM role |
+| <a name="output_code_registry_summary"></a> [code\_registry\_summary](#output\_code\_registry\_summary) | Summary of code registry configuration |
+| <a name="output_common_tags"></a> [common\_tags](#output\_common\_tags) | Common tags applied to all resources |
+| <a name="output_log_group_arns"></a> [log\_group\_arns](#output\_log\_group\_arns) | ARNs of the CloudWatch log groups for code registry |
+| <a name="output_log_group_names"></a> [log\_group\_names](#output\_log\_group\_names) | Names of the CloudWatch log groups for code registry |
+| <a name="output_resource_prefix"></a> [resource\_prefix](#output\_resource\_prefix) | Resource prefix used for naming |
+
 <!-- BEGIN_TF_DOCS -->
 ## Requirements

@@ -48,8 +119,8 @@
 | <a name="input_name"></a> [name](#input\_name) | Name to be used as prefix for all resources created by this module | `string` | n/a | yes |
 | <a name="input_s3_bucket_name"></a> [s3\_bucket\_name](#input\_s3\_bucket\_name) | Name of the S3 bucket for code artifacts. If not provided, will be auto-generated | `string` | `null` | no |
 | <a name="input_s3_kms_key_id"></a> [s3\_kms\_key\_id](#input\_s3\_kms\_key\_id) | KMS key ID for S3 bucket encryption. If not provided, AES256 encryption will be used | `string` | `null` | no |
-| <a name="input_s3_lifecycle_rules"></a> [s3\_lifecycle\_rules](#input\_s3\_lifecycle\_rules) | S3 bucket lifecycle rules for cost optimization | `any` | <pre>[<br> {<br> "abort_incomplete_multipart_upload": {<br> "days_after_initiation": 7<br> },<br> "id": "delete_old_versions",<br> "noncurrent_version_expiration": {<br> "days": 90<br> },<br> "status": "Enabled"<br> }<br>]</pre> | no |
-| <a name="input_s3_notification_configurations"></a> [s3\_notification\_configurations](#input\_s3\_notification\_configurations) | List of S3 notification configurations for code artifact uploads | <pre>list(object({<br> id = string<br> events = list(string)<br> filter_prefix = optional(string)<br> filter_suffix = optional(string)<br> }))</pre> | <pre>[<br> {<br> "events": [<br> "s3:ObjectCreated:*"<br> ],<br> "filter_suffix": ".jar",<br> "id": "code-upload-notification"<br> },<br> {<br> "events": [<br> "s3:ObjectCreated:*"<br> ],<br> "filter_suffix": ".whl",<br> "id": "wheel-upload-notification"<br> }<br>]</pre> | no |
+| <a name="input_s3_lifecycle_rules"></a> [s3\_lifecycle\_rules](#input\_s3\_lifecycle\_rules) | S3 bucket lifecycle rules for cost optimization | `any` | <pre>[<br/> {<br/> "abort_incomplete_multipart_upload": {<br/> "days_after_initiation": 7<br/> },<br/> "id": "delete_old_versions",<br/> "noncurrent_version_expiration": {<br/> "days": 90<br/> },<br/> "status": "Enabled"<br/> }<br/>]</pre> | no |
+| <a name="input_s3_notification_configurations"></a> [s3\_notification\_configurations](#input\_s3\_notification\_configurations) | List of S3 notification configurations for code artifact uploads | <pre>list(object({<br/> id = string<br/> events = list(string)<br/> filter_prefix = optional(string)<br/> filter_suffix = optional(string)<br/> }))</pre> | <pre>[<br/> {<br/> "events": [<br/> "s3:ObjectCreated:*"<br/> ],<br/> "filter_suffix": ".jar",<br/> "id": "code-upload-notification"<br/> },<br/> {<br/> "events": [<br/> "s3:ObjectCreated:*"<br/> ],<br/> "filter_suffix": ".whl",<br/> "id": "wheel-upload-notification"<br/> }<br/>]</pre> | no |
 | <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to assign to all resources | `map(string)` | `{}` | no |
 | <a name="input_workload_account_ids"></a> [workload\_account\_ids](#input\_workload\_account\_ids) | List of AWS account IDs that should have cross-account access to the code artifacts bucket | `list(string)` | `[]` | no |

@@ -70,4 +141,4 @@
 | <a name="output_log_group_arns"></a> [log\_group\_arns](#output\_log\_group\_arns) | ARNs of the CloudWatch log groups for code registry |
 | <a name="output_log_group_names"></a> [log\_group\_names](#output\_log\_group\_names) | Names of the CloudWatch log groups for code registry |
 | <a name="output_resource_prefix"></a> [resource\_prefix](#output\_resource\_prefix) | Resource prefix used for naming |
-<!-- END_TF_DOCS -->
+<!-- END_TF_DOCS -->
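
For orientation, a caller of the aws-glue-code-registry module might look roughly like the sketch below. It uses only inputs and outputs listed in the tables above. The relative `source` path, the module label, the account ID, and all values are hypothetical, and it assumes `name` is the only required input, since the hunks shown do not cover the full input table.

```hcl
# Hypothetical root-module usage of aws-glue-code-registry.
# The source path and every value below are illustrative assumptions.
module "glue_code_registry" {
  source = "../../modules/aws-glue-code-registry"

  # Required: prefix for all resources created by the module.
  name = "example-code-registry"

  # Optional inputs; anything omitted falls back to the documented default.
  create_s3_bucket     = true
  force_destroy        = false
  log_retention_days   = 30
  workload_account_ids = ["111122223333"] # placeholder account ID

  tags = {
    Environment = "develop"
  }
}

# Re-export one of the documented module outputs.
output "code_artifacts_bucket_arn" {
  description = "ARN of the S3 bucket used for code artifacts"
  value       = module.glue_code_registry.code_artifacts_bucket_arn
}
```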

modules/aws-glue-data-lake-catalog/docs/MODULE.md

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
# aws-glue-data-lake-catalog
2+
13
<!-- BEGIN_TF_DOCS -->
24
## Requirements
35

@@ -26,14 +28,14 @@ No modules.

 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| <a name="input_additional_databases"></a> [additional\_databases](#input\_additional\_databases) | Map of additional databases to create outside of the standard data lake layers | <pre>map(object({<br> description = string<br> location = optional(string, null)<br> parameters = optional(map(string), {})<br> }))</pre> | `{}` | no |
+| <a name="input_additional_databases"></a> [additional\_databases](#input\_additional\_databases) | Map of additional databases to create outside of the standard data lake layers | <pre>map(object({<br/> description = string<br/> location = optional(string, null)<br/> parameters = optional(map(string), {})<br/> }))</pre> | `{}` | no |
 | <a name="input_catalog_id"></a> [catalog\_id](#input\_catalog\_id) | The ID of the Glue Catalog. If not provided, the AWS account ID will be used | `string` | `null` | no |
 | <a name="input_create_export_database"></a> [create\_export\_database](#input\_create\_export\_database) | Whether to create an export database for final outputs | `bool` | `true` | no |
 | <a name="input_create_shared_databases"></a> [create\_shared\_databases](#input\_create\_shared\_databases) | Whether to create shared databases (shared, export) | `bool` | `true` | no |
-| <a name="input_data_lake_paths"></a> [data\_lake\_paths](#input\_data\_lake\_paths) | Map of layer names to their S3 path prefixes | `map(string)` | <pre>{<br> "bronze": "iceberg-warehouse/bronze",<br> "gold": "iceberg-warehouse/gold",<br> "silver": "iceberg-warehouse/silver"<br>}</pre> | no |
+| <a name="input_data_lake_paths"></a> [data\_lake\_paths](#input\_data\_lake\_paths) | Map of layer names to their S3 path prefixes | `map(string)` | <pre>{<br/> "bronze": "iceberg-warehouse/bronze",<br/> "gold": "iceberg-warehouse/gold",<br/> "silver": "iceberg-warehouse/silver"<br/>}</pre> | no |
 | <a name="input_data_lake_sublayers"></a> [data\_lake\_sublayers](#input\_data\_lake\_sublayers) | Configuration for sublayers within each data lake layer. Each layer can have multiple sublayers (e.g., source systems, processing stages) | `map(list(string))` | `{}` | no |
 | <a name="input_database_prefix"></a> [database\_prefix](#input\_database\_prefix) | Prefix for all Glue database names. Should follow the pattern: namespace-short\_domain-account\_name (e.g., dwh-wl-workloads-data-lake-develop) | `string` | n/a | yes |
-| <a name="input_layers"></a> [layers](#input\_layers) | List of data lake layers to create databases for | `list(string)` | <pre>[<br> "bronze",<br> "silver",<br> "gold"<br>]</pre> | no |
+| <a name="input_layers"></a> [layers](#input\_layers) | List of data lake layers to create databases for | `list(string)` | <pre>[<br/> "bronze",<br/> "silver",<br/> "gold"<br/>]</pre> | no |
 | <a name="input_s3_bucket_uri"></a> [s3\_bucket\_uri](#input\_s3\_bucket\_uri) | The S3 bucket URI where data lake data is stored (e.g., s3://bucket-name) | `string` | n/a | yes |
 | <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to assign to all resources created by this module | `map(string)` | `{}` | no |

@@ -48,7 +50,7 @@ No modules.
 | <a name="output_databases_by_sublayer"></a> [databases\_by\_sublayer](#output\_databases\_by\_sublayer) | Map of databases organized by sublayer |
 | <a name="output_export_database"></a> [export\_database](#output\_export\_database) | Export database details |
 | <a name="output_gold_database_name"></a> [gold\_database\_name](#output\_gold\_database\_name) | Name of the first gold database (for backward compatibility) |
-| <a name="output_raw_zone_database_name"></a> [raw\_zone\_database\_name](#output\_raw\_zone\_database\_name) | Name of the first raw\_zone database (for backward compatibility) |
+| <a name="output_raw_database_name"></a> [raw\_database\_name](#output\_raw\_database\_name) | Name of the first raw\_zone database (for backward compatibility) |
 | <a name="output_shared_database"></a> [shared\_database](#output\_shared\_database) | Shared database details |
 | <a name="output_silver_database_name"></a> [silver\_database\_name](#output\_silver\_database\_name) | Name of the first silver database (for backward compatibility) |
-<!-- END_TF_DOCS -->
+<!-- END_TF_DOCS -->
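
Because the output rename changes the module's public interface, callers that still reference `raw_zone_database_name` would need updating. The sketch below shows one possible caller after the rename, using only inputs documented in the table above. The `source` path, module label, bucket name, sublayer names, and the `sandbox` database are hypothetical. What `raw_database_name` resolves to depends on the configured layers, and that expression is not fully visible in this diff.

```hcl
# Hypothetical root-module usage of aws-glue-data-lake-catalog.
# The source path and every value below are illustrative assumptions.
module "data_lake_catalog" {
  source = "../../modules/aws-glue-data-lake-catalog"

  # Required inputs.
  database_prefix = "dwh-wl-workloads-data-lake-develop"
  s3_bucket_uri   = "s3://example-data-lake-bucket"

  # Optional: sublayers per layer, typed map(list(string)).
  data_lake_sublayers = {
    bronze = ["crm", "erp"]
  }

  # Optional: extra databases outside the standard layers.
  # Only description is required; location and parameters are optional.
  additional_databases = {
    sandbox = {
      description = "Scratch database for ad hoc analysis"
    }
  }
}

# Before this commit the reference was:
#   module.data_lake_catalog.raw_zone_database_name
output "raw_database_name" {
  description = "Legacy output, renamed in commit f798751"
  value       = module.data_lake_catalog.raw_database_name
}
```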

modules/aws-glue-data-lake-catalog/outputs.tf

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,7 @@ output "export_database" {
6565
}
6666

6767
# Legacy outputs for backward compatibility (pick the first database for each layer)
68-
output "raw_zone_database_name" {
68+
output "raw_database_name" {
6969
description = "Name of the first raw_zone database (for backward compatibility)"
7070
value = length([
7171
for key, db in aws_glue_catalog_database.databases : db.name
