Analytics Engineering Workflow (Trial Server Beta Functionality)

This document covers how to use the Analytics Engineering Workflow in Ellie. The workflow is currently available on Ellie’s trial server as a beta functionality. If you have any feedback or questions, please e-mail support@ellie.ai. Please note, that this functionality is still in development and may change in future releases.

Prerequisites: 

  1. Snowflake environment, with write user access to be able to create views.

  2. Dbt project initialized in dbt core or dbt cloud (with Snowlake authentication).

See this dbt guide: https://docs.getdbt.com/docs/core/connect-data-platform/snowflake-setup

  1. Github Organization Account with your dbt project in a repository. (Note: Ellie’s GitHub integration currently only supports organization accounts and doesn’t work with personal user repositories.)

Why is this valuable?

You can now use Ellie as a central platform where you store documentation on your data warehouses. This makes it easy for analytics engineers to discover and document data in a scalable way.

It also enables them to kick off their analytics projects without having to dive deep into the database.

Plus, the documentation is visible to the whole organization — enabling knowledge sharing across the team and the enterprise (when necessary). Ellie’s user types can support multiple levels of access.

We leverage genAI to create the first draft of the documentation, and all AI-generated suggestions are highlighted in blue.

If a user edits this metadata, it counts as “human-verified” and changes color to default, black.

TIP: We recommend you only have one source model per database schema to ensure a unified documentation layer in Ellie. You can then arrange your source models as it is in your Snowflake environment.

Keep in mind that you can’t make any changes to the database schema within the source model environment. You can edit the metadata that’s created in Ellie, such as the semantic name of the table, table description, and column descriptions.

Instructions how to set up Ellie:

Step 1. (Optional) Create a folder for your source models

 

Step 2. Load your source model from Snowflake into Ellie

 

 

image-20240927-064055.png
Create a source model


Here’s an example of the credentials you need to enter:
Account: “ACHG0I8.QJ30302”
Username: “JDOE”
Password: “PASSWORD2”
Role: “SYSADMIN”

You can now choose to save your connection, by entering a connection name (optional).
Connect to your Snowflake account and select the Warehouse, Database, and Schema parameters.

When uploading the source model, make sure to tick the “generate groups” and “generate metadata” checkboxes to generate additional AI context for your tables.

“Generate groups”, as the name suggests, groups the various tables. This enables you to filter connected tables, as identified by genAI.

“Generate metadata” generates additional metadata for each table. This includes:

  1. A semantic name for the table

  2. A description of the table

  3. Descriptions for each column in the table

  4. Logical groups that group related columns together

Step 3. Wait for Ellie to generate metadata 

Note: Ellie does not store any real data that’s in your data warehouse. This means you will be prompted to enter your Snowflake credentials every time genAI has to generate metadata because the data needs to be fetched fresh.

Important data safety notice: As a part of this process, Ellie will fetch a 1000 rows of sample data and share it with the Large Language Model (LLM) for analysis. No data is stored in Ellie. If you are an OpenAI customer, let the Ellie team know. We can switch the workflow so it goes through the customer’s OpenAI environment.

Note: If the genAI fails to generate metadata for one of the tables, open the table view and press the “regenerate” button. See a case bellow:

 

Step 4. Browse through the tables aand select data for your transformation project

You can choose the tables using the checkboxes in the top right corner of each entity.

Note: If you need help finding the data you need, you can use the grouping option or ask our chatbot to identify tables with the relevant data.

Groups:

AI Chatbot:

Step 5. Create a “transformation project”

Once you’ve selected the tables, press the “Transform Selected” button and select “Create Transformation Project” to enter the details of your project.

Tip: You can always add more tables to your transformation project from the same “Source Model”. Open the Source Model, select the tables you’d like to add or update, press “Transform Selected” and select “Update Transformation Project”. The additional tables you chose will be added to the transformation process.

 

Step 6. Working with the transformation project in Ellie

You’ll see the tables you selected from the Source Model when you open the transformation project.

What you see in the Transformation Project are your dbt staging models. All the metadata from the source model is brought to the transformation project, and you can start preparing your staging tables.

The idea of this canvas is to make the mundane task of selecting and preparing the columns you need for your dbt project much easier.

You no longer need to code every single line of your staging models. The code is instead generated automatically.

To prepare the table, open an entity (a table).

You can still edit the metadata in this view just like in the source model, but you can now also do dbt stage model preparations:

  • Select the columns you want for your project

  • Change the data type

  • Change the alias for the table

In future releases, we will also add dbt parameters like staging and tests. This would enable users to do even more things without having to code.

As you make the selections and changes to your columns, Ellie will generate the code for your staging models.

You can see the code by clicking on the SQL tab at the top of the table view.

You can copy the raw code directly from here and change to different dialects. We recommend that you download the files or push the project to the relevant Github repository.

Once you are happy with a certain stage model, you can mark it as “Ready”. You will now see a visual check mark icon indicating that the stage model has been finalized.

Here’s an example:

Step 7. Bring the configurations you have defined in Ellie to your dbt project.

Currently Ellie does the following things:

  1. Create staging SQL files for all the tables in your transformation project

  2. Create a sources.yml file

Ellie uses file names and folder hierarchy based on dbt’s documentation best practices.

See example:

To bring the information generated by Ellie into your dbt project, you can do one of two things from the “Project Configuration” tab.

  1.  Download the files from Ellie and bring them to your dbt project manually:

    • Press the “Download” button in the Project Configuration tab

    • Unzip the contents of the downloaded compressed file

    • Bring them to your dbt project folder

  2. Push the code to your GitHub Repository:
    Note: For this option to be available, you need to authorize your GitHub Organization Account in the “Admin Tools -> Connectors” page.

    • Press the “Push to Version Control” button in the Project Configuration tab

    • Select the repository to which you’d like to make the commit

    • Select the branch to which you’d like to make the commit

    • Enter the commit message

    • Press “Push”

    • Upon success, you’ll be able to see the commit in your Github repository and pull it to your dbt core or dbt cloud project

How it looks in a Github Repository:

Step 8. How to validate that the dbt files work

Make sure that all prerequisites have been met and that you have brought Ellie’s transformation project files to your dbt project.

To validate that the project is set up as intended, choose “dbt run”. If you get a success message, then all the files have been set up correctly.

If you get a success message, you should also get views created in your snowflake environment, see the example:

Step 9. You can now proceed with your transformations in dbt!

 

If you have any questions, please reach out to support@ellie.ai and we’ll be glad to help.