Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redshift-alpha: Rationalization of IAM roles creation for Lambdas execution #32089

Closed
2 tasks
matteo-pirola opened this issue Nov 11, 2024 · 3 comments · Fixed by #32363
Closed
2 tasks

redshift-alpha: Rationalization of IAM roles creation for Lambdas execution #32089

matteo-pirola opened this issue Nov 11, 2024 · 3 comments · Fixed by #32363
Assignees
Labels
@aws-cdk/aws-redshift Related to Amazon Redshift bug This issue is a bug. effort/medium Medium work item – several days of effort p1

Comments

@matteo-pirola
Copy link

Describe the feature

Overview

I suggest a rationalization of the IAM roles created and used to execute the Lambda functions responsible of generating and executing Redshift queries to create or update resources.

As of today, the framework generates many "single-scope" roles leading to an explosion on the number of IAM roles in an account, making it easy to reach the IAM quota on roles per account (1000).

AS IS

Currently, when using the Table construct to create Redshift tables, a CloudFormation Stack with the following is generated:

  • A Resource called Query Redshift Database XYZ, consisting of a Lambda function and an IAM role for its execution
  • For each table, a Lambda function and an "associated" IAM role for its execution; each role has permission to only execute the function it is "associated" to

This means that, if I want to create n tables I will obtain a total of at least n+1 IAM roles (if I put them all in the same Stack).

Example code that I am using to create the tables:

import aws_cdk as cdk
from aws_cdk import Stack
from aws_cdk import aws_secretsmanager as secretsmanager
from aws_cdk.aws_redshift_alpha import Cluster, Column, Table, TableDistStyle, TableSortStyle
import os
import json
from constructs import Construct

class MyRedshiftTablesStack(Stack):
 
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
 
        secret = secretsmanager.Secret.from_secret_name_v2(
            self, "MySecretId", <secret_name>
        )
 

        cluster = Cluster.from_cluster_attributes(
            self,
            "MyCluster",
            cluster_name=<cluster_name>,
            cluster_endpoint_address=<cluster_endpoint_address>,
            cluster_endpoint_port=<cluster_endpoint_port>,
        )
 
        table_columns = [
            Column(
                name="column_1",
                data_type="integer",
                dist_key=False,
                comment="My first column"
            )
        ]

        dist_style = TableDistStyle["KEY"]
        removal_policy = cdk.RemovalPolicy["DESTROY"]
        sort_style = TableSortStyle["AUTO"]
 
        Table(
            self,
            "MyTableName",
            cluster=cluster,
            database_name="my_db",
            admin_user=secret,
            table_columns=table_columns,
            dist_style=dist_style,
            removal_policy=removal_policy,
            sort_style=sort_style,
            table_name="my_schema.my_table",
            table_comment="My first table"
        )

TO BE

I suggest allowing the developer choosing existing roles as defaults for the execution of the Lambda functions whenever possible.
This way it would be possible to find a good balance between segregation of permissions and role quantity on a case-by-case basis.

Use Case

This feature would be crucial whenever the number of Redshift tables created with cdk-redshift-alpha gets bigger. Indeed, having a number of IAM roles which linearly scales with the number of tables is not sustainable and will eventually lead to the reach of IAM quotas, also providing very few benefits overall.

Proposed Solution

I suggest allowing the developer use existing roles for the execution of the Lambda functions.
In order to achieve such thing one should modify the underlying Lambdas creation to allow for external role selection (e.g. passing it via the Table construct of the library).

Other Information

If the proposed suggestion is not technically possible, I suggest to rethink about the number of roles generated by the library e.g. by sharing a single IAM role among the Lambda functions created in the same Stack.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.165.0

Environment details (OS name and version, etc.)

Linux EC2 - aws/codebuild/standard:7.0

@matteo-pirola matteo-pirola added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Nov 11, 2024
@github-actions github-actions bot added the @aws-cdk/aws-redshift Related to Amazon Redshift label Nov 11, 2024
@pahud pahud self-assigned this Nov 11, 2024
@pahud
Copy link
Contributor

pahud commented Nov 11, 2024

Yes you are right.

My PoC

export class IssueTriageStack extends Stack {
  readonly cluster: redshift.Cluster;
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // default vpc
    const vpc = ec2.Vpc.fromLookup(this, 'DefaultVPC', { isDefault: true });

    // create redshift.Cluster
    this.cluster = new redshift.Cluster(this, 'Redshift', {
      masterUser: {
        masterUsername: 'admin',
      },
      vpc,
    });

    // create redshift.Table
    for (let i = 0; i < 10; i++) {
      this.createTable(`Table${i}`);
    }
  }
  private createTable(tableId: string) {
    const table = new redshift.Table(this, tableId, {
      cluster: this.cluster,
      databaseName: 'my_database',
      tableName: 'my_table',
      tableColumns: [
        { name: 'id', dataType: 'integer' },
      ],
    });
  }
}

And it's going to create 11 iam Roles which should be avoided.

% npx cdk diff 2>&1 | grep -c "AWS::IAM::Role"
11

Tracking down the source code, I've found when a Table is created, a DatabaseQuery would be created, which creates a lambda SingletonFunction and I assume both the function and role should be singleton and shared. Not sure why it's creating a new role for every function.

I am tentative making it a p2 but will reach out to the team for inputs.

@pahud pahud added bug This issue is a bug. p2 and removed feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Nov 11, 2024
@pahud pahud removed their assignment Nov 11, 2024
@pahud pahud added the effort/medium Medium work item – several days of effort label Nov 11, 2024
@pahud pahud changed the title @aws-cdk/aws-redshift-alpha: Rationalization of IAM roles creation for Lambdas execution redshift-alpha: Rationalization of IAM roles creation for Lambdas execution Nov 11, 2024
@pahud pahud added p1 and removed p2 labels Nov 13, 2024
@5d 5d self-assigned this Nov 28, 2024
@mergify mergify bot closed this as completed in #32363 Jan 3, 2025
@mergify mergify bot closed this as completed in db950b3 Jan 3, 2025
Copy link

github-actions bot commented Jan 3, 2025

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

1 similar comment
Copy link

github-actions bot commented Jan 3, 2025

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 3, 2025
iankhou pushed a commit that referenced this issue Jan 13, 2025
…ion (#32363)

### Issue # (if applicable)

Closes #32089.

### Reason for this change



The Redshift tables use a singleton function as the invoker for various custom resource onEvent Lambda functions. Currently, each custom resource lambda function has a dedicated IAM role to assume. However, since it’s the same singleton function, a shared role could achieve the same outcome.

### Description of changes



Use the same IAM role for the singleton invoker function to assume.

### Description of how you validated changes



deployed to my local stack

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
@aws-cdk/aws-redshift Related to Amazon Redshift bug This issue is a bug. effort/medium Medium work item – several days of effort p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants