Quickstart

Wrgl works much like Git, but for data (primarily CSV data for now). Each Wrgl commit tracks a single CSV table instead of an entire directory, which allows Wrgl to compute data changes down to the cell level. While it takes up a little more disk space than Git, it lets you answer questions such as: How many rows were modified versus added? What are the ids of the rows that were removed? Which columns were removed or added?

Wrgl's concepts such as commit, reference, branch, and remote work much like they do in Git. Because branches can diverge and need to be merged, Wrgl also comes with a built-in graphical difftool and mergetool. You can even host your own repository with wrgld, which comes with its own authentication system. But enough introduction for now.

Content

    Installation

    See installation page for details, but for the impatient:

    sudo bash -c 'curl -L https://github.com/wrgl/wrgl/releases/latest/download/install.sh | bash'
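
    If you would rather read the installer before running it as root, the one-liner above can be split into a download step and a run step. This is a minimal sketch, assuming the installer behaves the same when run from a file as when piped through stdin; the function name fetch_wrgl_installer is made up for illustration.

```shell
# Sketch: download the installer to a temporary file instead of piping it to bash.
# Assumption: the script does not depend on being read from stdin.
set -eu

INSTALL_URL='https://github.com/wrgl/wrgl/releases/latest/download/install.sh'

# fetch_wrgl_installer (hypothetical helper) prints the path of the downloaded script.
fetch_wrgl_installer() {
  dest=$(mktemp)
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL "$INSTALL_URL" -o "$dest"
  printf '%s\n' "$dest"
}

# Usage (left as comments because it needs network access and sudo):
#   script=$(fetch_wrgl_installer)
#   less "$script"       # read what will run as root
#   sudo bash "$script"
```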

    Commit a CSV

    First, let's initialize a repository with wrgl init:

    wrgl init

    This command creates a repository in the .wrgl sub-directory of the current directory. This directory houses all the configuration and data of your local repository. Before you can commit, you need to tell the repository who you are with wrgl config:

    wrgl config set user.name "{your name}" --global
    wrgl config set user.email "{your email}" --global

    The secret sauce of Wrgl is that each commit can have an optional primary key, which defines the list of columns used to identify individual rows. If your CSV has one or more columns that would be considered a primary key in a traditional SQL database, chances are they are also the primary key in Wrgl. Here's how to commit a file with wrgl commit:

    wrgl commit main data.csv "initial commit" -p id

    Here we're creating a commit under branch main with the data from data.csv, having "initial commit" as the commit message, with id as the primary key. From now on, Wrgl will be able to accurately detect which rows were changed versus being added or removed, thanks to the primary key. If you don't specify a primary key then all columns will be used as the primary key.
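
    A primary key can only identify rows reliably if its values are unique, so it can be worth checking a candidate column before passing it to -p. The sketch below is one way to do that with awk. It assumes a simple CSV (header on the first line, no quoted commas, no embedded newlines), and the helper name pk_duplicates and the sample data are made up for illustration. How Wrgl itself treats duplicate key values is not covered here.

```shell
# Sketch: list values of a candidate primary key column that occur more than once.
# Assumption: simple CSV without quoted commas or embedded newlines.
set -eu

# pk_duplicates FILE COLUMN (hypothetical helper) prints each duplicated value once.
pk_duplicates() {
  awk -F, -v col="$2" '
    NR == 1 {
      # Find the position of the requested column in the header.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
      next
    }
    # True only on the second occurrence of a value, so each duplicate prints once.
    seen[$idx]++ == 1 { print $idx }
  ' "$1"
}

# Hypothetical sample data in a temporary directory; id 2 appears twice.
workdir=$(mktemp -d)
printf 'id,name,age\n1,Alice,30\n2,Bob,25\n2,Robert,26\n' > "$workdir/data.csv"

dupes=$(pk_duplicates "$workdir/data.csv" id)
if [ -n "$dupes" ]; then
  echo "duplicate id values: $dupes"   # prints: duplicate id values: 2
else
  echo "id is unique"
fi
```

    With your own file, call pk_duplicates data.csv id and commit once it prints nothing.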

    Look at your committed data

    wrgl log works much like git log. Here's how to list commits under branch main:

    wrgl log main

    You can also preview the data in a graphical interface with wrgl preview:

    wrgl preview main

    Or get back the data as plain CSV with wrgl export:

    wrgl export main > main.csv
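
    Once the data is back in plain CSV, ordinary tools can compare it against a newer version of the file. The sketch below classifies rows as added, removed, or modified by matching on the first column, which illustrates the kind of question a primary key makes answerable. It is not how Wrgl computes its diffs. It assumes simple CSVs (no quoted commas, no embedded newlines) with the key in column 1, and the helper name csv_row_changes and the sample data are made up for illustration.

```shell
# Sketch: classify row changes between two CSV files, keyed on the first column.
# Assumptions: key is column 1; no quoted commas or embedded newlines; both files non-empty.
set -eu

# csv_row_changes OLD NEW (hypothetical helper) prints "<status> <key>" per changed row.
csv_row_changes() {
  awk -F, '
    FNR == 1  { next }                    # skip the header line of each file
    NR == FNR { old[$1] = $0; next }      # first file: remember each row by key
    {
      seen[$1] = 1
      if (!($1 in old))       print "added", $1
      else if (old[$1] != $0) print "modified", $1
    }
    END { for (k in old) if (!(k in seen)) print "removed", k }
  ' "$1" "$2" | sort                      # sort for a stable output order
}

# Hypothetical sample data: main.csv stands in for the export, data.csv for a newer file.
workdir=$(mktemp -d)
printf 'id,name,age\n1,Alice,30\n2,Bob,25\n3,Carol,41\n' > "$workdir/main.csv"
printf 'id,name,age\n1,Alice,31\n3,Carol,41\n4,Dan,19\n' > "$workdir/data.csv"

changes=$(csv_row_changes "$workdir/main.csv" "$workdir/data.csv")
printf '%s\n' "$changes"
# prints:
#   added 4
#   modified 1
#   removed 2
```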

    Authenticate to WrglHub

    When it comes time to share your repository with the world or your colleagues, you need a hosted repository. There are two ways to host a Wrgl repository: run the open-source wrgld binary yourself, or host it on WrglHub, which spares you the hassle of running your own server.

    Create a free account at WrglHub then acquire an access token with wrgl credentials:

    wrgl credentials authenticate https://hub.wrgl.co/api

    Push your changes

    Create a new repository on WrglHub:

    wrgl hub repo create {repo-name} --set-remote origin

    Now add the new repository as a remote using wrgl remote (here we're naming the remote "origin" to keep with Git's tradition):

    wrgl remote add origin https://hub.wrgl.co/api/users/{your-username}/repos/{repo-name}/

    Now you can wrgl push much like git push:

    wrgl push origin refs/heads/main:main --set-upstream

    Now you should be able to see the commit live if you visit https://hub.wrgl.co/@{your-username}/r/{repo-name}/refs/heads/main
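
    The remote URL and the web page URL have different shapes, so it can help to build both from the same two values. This is a minimal sketch: the helper names and the sample username and repository name are made up, and it assumes the placeholders in the URLs above are replaced by the bare username and repository name.

```shell
# Sketch: build the WrglHub URLs used above from a username and a repository name.
# Assumption: placeholders are substituted with the bare values, without braces.
set -eu

HUB_USER='alice'         # hypothetical username
REPO_NAME='sales-data'   # hypothetical repository name

# URL passed to `wrgl remote add origin ...`
wrgl_remote_url() {
  printf 'https://hub.wrgl.co/api/users/%s/repos/%s/\n' "$1" "$2"
}

# Web page showing a branch of the repository
wrgl_branch_page() {
  printf 'https://hub.wrgl.co/@%s/r/%s/refs/heads/%s\n' "$1" "$2" "$3"
}

echo "remote: $(wrgl_remote_url "$HUB_USER" "$REPO_NAME")"
echo "page:   $(wrgl_branch_page "$HUB_USER" "$REPO_NAME" main)"
```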

    Other useful commands