Welcome! Please use this repository to discover and introduce Matillion Best Practices within your Data Productivity Cloud environment. Many of these pipelines can be found on the Matillion Exchange. After loading, the pipelines can then be edited to match your given use case. Learn more about Data Productivity Cloud here.
- Export & Import Files
- On this page, click Code > Download ZIP.
- Import the downloaded files directly into your Data Productivity Cloud project.
OR
- Connect your Data Productivity Cloud project to a GitHub repository containing these pipelines.
- Prerequisites
- Directions
- Fork this repository (referred to as the "forked repository" in the following steps), making it available to your GitHub account.
- Create a new Matillion project, setting it up with Advanced settings to allow the connection to a remote GitHub repository.
- Set the forked repository as the connected repository.
- From within the Designer UI, select Git > Pull remote changes.
- Your Pipelines should populate with Best Practice pipelines, available for editing according to your use case.
- The forked repository lets you track changes to the DPC Best Practice Pipelines repository, so you can merge the latest code easily.
- AI
- Barista Reviews
- Use a Large Language Model (LLM) to process unstructured data with this set of Data Productivity Cloud pipelines. The pipelines demonstrate how to use the Data Productivity Cloud AI Prompt components to convert unstructured text into structured data, to make it ready for analysis. The sample data is a set of imaginary barista coffee reviews.
- Olympic Athlete Data Analysis
- Analyze historical Olympic athlete data with this set of Data Productivity Cloud pipelines. These pipelines find the most successful athletes, by medal count, in a historical Olympic Games of your choice.
- Sales Sentiment Analysis
- Run sentiment analysis on unstructured conversation transcripts integrated from three different sales systems.
- Sentiment Analysis with Anthropic Claude 3 Sonnet using Amazon Bedrock
- Perform sentiment analysis on your data using Anthropic Claude 3 Sonnet via Amazon Bedrock. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the Anthropic Claude 3 Sonnet Large Language Model (LLM).
- Sentiment Analysis with Titan Text Express using Amazon Bedrock
- Perform sentiment analysis on your data using Titan Text Express via Amazon Bedrock. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the Titan Text Express Large Language Model (LLM).
- Sentiment Analysis with Cohere Command using Amazon Bedrock
- Perform sentiment analysis on your data using Cohere Command via Amazon Bedrock. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the Cohere Command Large Language Model (LLM).
- Sentiment Analysis with Meta Llama3 70B using Amazon Bedrock
- Perform sentiment analysis on your data using Meta Llama3 70B via Amazon Bedrock. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the Meta Llama3 70B Large Language Model (LLM).
- Sentiment Analysis with Mistral 7B Instruct using Amazon Bedrock
- Perform sentiment analysis on your data using Mistral 7B Instruct via Amazon Bedrock. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the Mistral 7B Instruct Large Language Model (LLM).
- Sentiment Analysis with OpenAI GPT 3.5 Turbo
- Perform sentiment analysis on your data using OpenAI GPT 3.5 Turbo. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the OpenAI GPT 3.5 Turbo Large Language Model (LLM).
- Sentiment Analysis with OpenAI GPT 4
- Perform sentiment analysis on your data using OpenAI GPT 4. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the OpenAI GPT 4 Large Language Model (LLM).
- Sentiment Analysis with OpenAI GPT 4 Turbo
- Perform sentiment analysis on your data using OpenAI GPT 4 Turbo. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the OpenAI GPT 4 Turbo Large Language Model (LLM).
- Sentiment Analysis with OpenAI GPT 4o
- Perform sentiment analysis on your data using OpenAI GPT 4o. This set of Data Productivity Cloud pipelines performs sentiment analysis on a set of product review data using the OpenAI GPT 4o Large Language Model (LLM).
- Snowflake Cortex Components
- Use Snowflake Cortex generative AI capabilities in a Data Productivity Cloud transformation pipeline. The example is from a travel company, which collects hotel stay reviews.
- Unstructured text classification - Job Titles
- Classify job titles using zero-shot and few-shot learning with a Large Language Model (LLM) in a Data Productivity Cloud pipeline. These pipelines extract job titles from Salesforce, and run them through an LLM to:
- Standardize them into a defined set of job titles
- Categorize them as ‘IC’ (Individual Contributor) or ‘Manager’
- The results are saved into a lookup table for quick reference in future.
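The difference between zero-shot and few-shot classification comes down to whether the prompt carries worked examples. The following is a minimal Python sketch of that idea, not the prompt used by these pipelines: the title list, the example pairs, the answer format, and the function names `build_prompt` and `parse_answer` are all hypothetical.

```python
# Hypothetical label set and worked examples; not taken from the repository.
STANDARD_TITLES = ["Data Engineer", "Sales Manager", "Account Executive"]

FEW_SHOT_EXAMPLES = [
    ("Sr. Data Eng.", "Data Engineer", "IC"),
    ("Head of Sales, EMEA", "Sales Manager", "Manager"),
]


def build_prompt(raw_title, examples=()):
    """Build an LLM prompt. No examples gives zero-shot; examples give few-shot."""
    lines = [
        "Standardize the job title into one of: " + ", ".join(STANDARD_TITLES) + ".",
        "Also categorize it as 'IC' or 'Manager'.",
        "Answer in the form: <standard title> | <category>",
    ]
    # Each worked example shows the model the expected input and answer shape.
    for raw, standard, category in examples:
        lines.append(f"Title: {raw}")
        lines.append(f"Answer: {standard} | {category}")
    lines.append(f"Title: {raw_title}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(answer):
    """Split an answer such as 'Sales Manager | Manager' into its two parts."""
    standard, category = (part.strip() for part in answer.split("|", 1))
    return standard, category
```

Calling `build_prompt("VP Sales")` produces a zero-shot prompt, while `build_prompt("VP Sales", FEW_SHOT_EXAMPLES)` adds the worked examples. The parsed pairs could then be written to a lookup table keyed by the raw title.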
- Connectors
- Salesforce
- Account First Opportunity
- Account Most Recent Opportunity
- Accounts With Account Teams
- Accounts With Contacts
- Accounts With Primary Contact
- Campaigns with Contacts or Leads
- Contacts with Cases
- Opportunities With Account
- Opportunities With Primary Campaign Source
- Opportunity Roles And Contacts All
- Opportunity Roles And Contacts All Primary
- Opportunity Roles And Contacts One Primary
- Unite Lead And Contact
- Unite Task And Event
- Data Engineering
- Data Transposing / Pivoting
- Transpose or pivot data between wide and narrow representations. These three Data Productivity Cloud pipelines demonstrate techniques for transposing (pivoting) data in both directions. Open "example transpose extract and load" first.
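To make the wide and narrow representations concrete, here is a small Python sketch of both directions. It only illustrates the idea and is not how the pipelines implement it; the column names `metric` and `value` and both function names are assumptions.

```python
def wide_to_narrow(rows, key, value_columns):
    """Unpivot: emit one output row per (key, column) pair."""
    narrow = []
    for row in rows:
        for column in value_columns:
            narrow.append({key: row[key], "metric": column, "value": row[column]})
    return narrow


def narrow_to_wide(rows, key):
    """Pivot: collect all metrics for the same key back into a single row."""
    wide = {}
    for row in rows:
        wide.setdefault(row[key], {key: row[key]})[row["metric"]] = row["value"]
    return list(wide.values())
```

With a wide row such as `{"region": "EU", "q1": 10, "q2": 12}`, `wide_to_narrow(rows, "region", ["q1", "q2"])` yields one row per quarter, and `narrow_to_wide` reverses the operation.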
- Data Vault 2.0 Pipelines
- Template your Hub, Link and Satellite Data Vault 2.0 tables with these Data Productivity Cloud pipelines.
- Externally Managed Script
- Run the contents of an externally managed SQL script with this Data Productivity Cloud pipeline.
- Extract from an XML API using a variable
- Extract and load data from an XML based REST API using variables in Data Productivity Cloud pipelines.
- Full Load Strategy Medallion Schema
- Full data refresh strategy with a Medallion data architecture, demonstrated using Salesforce accounts.
- Incremental Load Data Replication Strategy in a Medallion data architecture
- Incremental load data replication is a simple strategy which involves copying changed records from a source to a target system.
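One common way to find the changed records is a high-water mark on a last-modified timestamp. The Python sketch below assumes that approach; the field names `id` and `updated_at`, the in-memory `target` dictionary, and the function name are hypothetical, and the pipelines themselves may detect changes differently.

```python
def incremental_load(source_rows, target, high_water_mark):
    """Copy rows changed after high_water_mark into target and return the new mark.

    target is a dict keyed by record id, standing in for the target table.
    Timestamps are ISO 8601 strings, which sort correctly as plain text.
    """
    new_mark = high_water_mark
    for row in source_rows:
        if row["updated_at"] > high_water_mark:
            target[row["id"]] = row  # insert new records, overwrite changed ones
            new_mark = max(new_mark, row["updated_at"])
    return new_mark
```

Storing the returned mark and passing it to the next run means only records changed since the previous run are copied.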
- XML API data in Snowflake using a File Format
- Load, flatten and transform XML data using a Snowflake File Format. These Data Productivity Cloud pipelines extract XML data from a public RSS feed, load it into Snowflake using a File Format, and then transform and flatten it. An RSS feed is a simple REST API.
- XML API data in Snowflake using a UDF to convert to JSON
- Load, flatten and transform XML data using a Snowflake UDF to convert XML to JSON. These Data Productivity Cloud pipelines extract XML data from a public RSS feed, load it into Snowflake using a File Format, and then transform and flatten it by converting the XML to JSON using a User Defined Function (UDF). An RSS feed is a simple REST API.
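As a rough illustration of what an XML to JSON conversion involves, here is a standalone Python sketch using only the standard library. It is not the Snowflake UDF from these pipelines; the function names are hypothetical, and XML attributes are ignored for brevity.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an element; repeated child tags become lists."""
    children = list(element)
    if not children:
        # Leaf element: keep only its text content.
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Second or later occurrence of this tag: promote to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Parse an XML string and return it as a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})
```

Repeated elements, such as the `item` entries in an RSS feed, come out as JSON arrays, which is the shape a later flatten step would typically iterate over.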
- DevOps
- Check Network Access
- Check connectivity between your Data Productivity Cloud agent and a network data source.
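A connectivity check of this kind usually amounts to attempting a TCP connection to the data source's host and port. The Python sketch below shows that idea only; it is not the pipeline's implementation, the function name `can_connect` is hypothetical, and it tests connectivity from wherever the script runs rather than from the agent.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A successful result only shows that the port is reachable. Authentication and TLS problems would still need to be checked separately.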
- Find Agent IP Address
- Find the public IP address of your Data Productivity Cloud agent. Note: Customer Hosted agents only.
- Labs
- Powered Up Pipelines
- Best practices webinar taking place on 25 June 2024.