Quickstart
Wrgl works much like Git, but for data (primarily CSV data for now). Each Wrgl commit tracks a single CSV table instead of an entire directory, which allows Wrgl to compute data changes down to the cell level. While it does take up a little more disk space than Git, it lets you answer questions such as: How many rows were modified versus added? What are the ids of the rows that were removed? Which columns were removed or added?
Wrgl's concepts such as commit, reference, branch, and remote work much like they do in Git, so a difftool and a mergetool are necessary, and Wrgl ships with graphical versions of both built in. You can even host your own repository with wrgld, which comes with its own authentication system. But enough introduction for now.
Installation
See the installation page for details, but for the impatient:
sudo bash -c 'curl -L https://github.com/wrgl/wrgl/releases/latest/download/install.sh | bash'
Commit a CSV
First, let's initialize a repository with wrgl init:
wrgl init
This command creates a repository in the .wrgl sub-directory of the current directory. This directory houses all configs and data of your local repository. Before you can commit, you need to tell the repository who you are with wrgl config:
wrgl config set user.name "{your name}" --global
wrgl config set user.email "{your email}" --global
The secret sauce of Wrgl is that each commit can have an optional primary key, which defines the list of columns that can be used to identify individual rows. If your CSV has one or more columns that would be considered a primary key in a traditional SQL database, chances are they are also the primary key in Wrgl. Here's how to commit a file with wrgl commit:
wrgl commit main data.csv "initial commit" -p id
Here we're creating a commit under branch main with the data from data.csv, having "initial commit" as the commit message, with id as the primary key. From now on, Wrgl will be able to accurately detect which rows were changed versus being added or removed, thanks to the primary key. If you don't specify a primary key, then all columns will be used as the primary key.
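To see why the primary key matters, here is a minimal Python sketch of key-based row comparison. It is only an illustration of the idea, not Wrgl's actual implementation: the helper names index_rows and diff_rows and the sample data are hypothetical, and the only assumption taken from the text above is that rows are identified by the primary key columns, or by all columns when no key is given.

```python
import csv
import io


def index_rows(csv_text, primary_key=None):
    """Map each row's key tuple to the full row (a dict of column -> value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # With no primary key given, every column becomes part of the key,
    # mirroring the default described above.
    key_columns = primary_key or reader.fieldnames
    return {tuple(row[col] for col in key_columns): row for row in reader}


def diff_rows(old_text, new_text, primary_key=None):
    """Classify row keys as added, removed, or modified between two tables."""
    old = index_rows(old_text, primary_key)
    new = index_rows(new_text, primary_key)
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        # Same key on both sides but different cell values: a modified row.
        "modified": sorted(k for k in old if k in new and old[k] != new[k]),
    }


# Hypothetical sample data: row 2 changes city, row 3 is dropped, row 4 is new.
OLD = "id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n3,Cy,Lima\n"
NEW = "id,name,city\n1,Ann,Oslo\n2,Bob,Paris\n4,Di,Kyiv\n"

with_key = diff_rows(OLD, NEW, primary_key=["id"])
without_key = diff_rows(OLD, NEW)

print("with id as key:", with_key)
print("with all columns as key:", without_key)
```

In this sketch, using id as the key reports row 2 as modified, row 3 as removed, and row 4 as added. With all columns as the key, the change to row 2 can no longer be recognised as a modification and shows up as one removed row plus one added row.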
Look at your committed data
wrgl log works much like git log. Here's how to list commits under branch main:
wrgl log main
You can also preview the data in a graphical interface with wrgl preview:
wrgl preview main
Or get back the data as plain CSV with wrgl export:
wrgl export main > main.csv
Other useful commands
- wrgl diff: shows changes between commits.
- wrgl merge: merges two branches into one.
- wrgl fetch: fetches changes from a remote.
- wrgl pull: pulls changes and merges them into a local branch.