HTTP API
You can run wrgld over a repository to start an HTTP server that provides remote access. By default, only registered users who have been granted specific scopes can access this HTTP API. To learn more about users and scopes, see wrgl auth.
This API aims to be RESTful. Most endpoints can be thought of as resource URIs that handle HTTP verbs according to their conventional meaning. Most API calls are stateless and use JSON payloads. The exceptions are the /receive-pack/ and /upload-pack/ endpoints, which use cookies and are stateful.
Repositories hosted on WrglHub can also be accessed via this API, except for the /authenticate/ endpoint (see the Authentication section). The API for each hosted repository is available at:
https://hub.wrgl.co/api/users/{username}/repos/{reponame}/
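As a small illustration, the hosted API root can be assembled from a username and a repository name. This is only a sketch: `hub_api_root` is a hypothetical helper, not part of any Wrgl client library, and percent-encoding the two path segments is a precaution assumed here rather than a documented requirement.

```python
from urllib.parse import quote

HUB_API = "https://hub.wrgl.co/api"


def hub_api_root(username: str, reponame: str) -> str:
    """Return the API root URL of a repository hosted on WrglHub."""
    # Percent-encode each segment so unusual characters cannot alter the path.
    user = quote(username, safe="")
    repo = quote(reponame, safe="")
    return f"{HUB_API}/users/{user}/repos/{repo}/"


print(hub_api_root("john", "my-repo"))
```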
wrgld
Starts an HTTP server providing access to the repository at <working_dir>/.wrgl, or at the WRGL_DIR folder if one is given.
wrgld [WRGL_DIR] [flags]
Flags
--badger-log
set Badger log level, valid options are "error", "warning", "debug", and "info" (defaults to "error")
--config-file
read config from file
-h, --help
help for wrgld (default false)
--log-verbosity
verbosity level. Higher means more logs (default 0)
-p, --port
port number to listen to (default 80)
--proxy
make all outgoing requests through this proxy
--read-timeout
request read timeout as described at https://pkg.go.dev/net/http#Server.ReadTimeout (default 30s)
--resource-id
UMA resource id created in keycloak. If not given, the server will attempt to create the resource when authorization is required.
--write-timeout
response write timeout as described at https://pkg.go.dev/net/http#Server.WriteTimeout (default 30s)
Examples
# starts HTTP API over <working_dir>/.wrgl at port 80
wrgld
# starts HTTP API over directory my-repo at port 4000
wrgld ./my-repo -p 4000
# increase read and write timeout
wrgld --read-timeout 60s --write-timeout 60s
Authentication
To access this API, log in at the /authenticate/ endpoint to obtain a JWT token, then pass it with each request via the Authorization header:
Authorization: Bearer {JWT token}
For repositories hosted on WrglHub, authenticate via the https://hub.wrgl.co/api/authenticate/ endpoint instead. This endpoint works the same way, and the resulting access token can be used for all hosted repositories.
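For clients written in Python, attaching the token might look like the sketch below. It only builds the request object and sends nothing; `authorized_request` is a hypothetical helper name, and the token value is a placeholder.

```python
import urllib.request


def authorized_request(url: str, id_token: str) -> urllib.request.Request:
    """Build a GET request carrying the JWT in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {id_token}"},
    )


# Placeholder token; a real one comes from the /authenticate/ endpoint.
req = authorized_request("https://repository/refs/", "my-token")
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual call.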
Error handling
This API may respond with the following status codes:
Code | Description |
---|---|
200 | The operation was a success. |
3XX | Redirect. |
4XX | A common error has occurred either with the URI or the payload. More details are available in the accompanying JSON payload. |
500 | An internal server error has occurred. |
If the status code is 4XX, the following JSON payload is sent:
Name | Description |
---|---|
message | The error message. |
csv | If the error is a CSV-parsing error, this is an object that contains further details such as startLine, line, and column where the error occurred. |
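A client could turn the 4XX payload described above into an exception along these lines. This is a sketch under assumptions: `APIError` and `parse_error` are hypothetical names, and the sample body is made up for illustration rather than captured from a real server.

```python
import json
from typing import Optional


class APIError(Exception):
    """Hypothetical exception carrying the fields of a 4XX error payload."""

    def __init__(self, status: int, message: str, csv: Optional[dict] = None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.csv = csv  # present only for CSV-parsing errors


def parse_error(status: int, body: bytes) -> Optional[APIError]:
    """Return an APIError for 4XX responses and None for any other status."""
    if not 400 <= status < 500:
        return None
    payload = json.loads(body)
    return APIError(status, payload.get("message", ""), payload.get("csv"))


# Made-up example body, shaped like the table above.
sample = b'{"message": "bad csv", "csv": {"startLine": 3, "line": 3, "column": 7}}'
err = parse_error(400, sample)
print(err.message, err.csv["line"], err.csv["column"])
```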
/authenticate/
POST
Exchange email/password for a JWT token which can be used to access the rest of this API.
Request payload
Name | Description |
---|---|
email | user's email |
password | user's password |
Response payload
Name | Description |
---|---|
idToken | the JWT token to access the rest of this API |
Examples
curl https://repository/authenticate/ -XPOST \
-H 'Content-Type: application/json' \
--data '{"email": "john.doe@domain.com", "password": "password"}'
Response:
{
  "idToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU"
}
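The same exchange can be sketched in Python. The helper names `build_authenticate_request` and `extract_id_token` are assumptions made for this example; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_authenticate_request(base_url: str, email: str, password: str) -> urllib.request.Request:
    """Build the POST /authenticate/ request with a JSON body."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/authenticate/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_id_token(response_body: bytes) -> str:
    """Pull the JWT out of the JSON response payload."""
    return json.loads(response_body)["idToken"]


req = build_authenticate_request("https://repository", "john.doe@domain.com", "password")
print(req.get_method(), req.full_url)
```

Sending `req` with `urllib.request.urlopen` and passing the response bytes to `extract_id_token` would yield the token to use in later calls.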
/refs/
GET
Required scope: repo.read
Get all references
Response payload
Name | Description |
---|---|
refs | A mapping between references and head commits. |
Examples
curl https://repository/refs/ \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
  "refs": {
    "heads/main": "67026b17716d98a3b52ed9f82737e990",
    "heads/draft": "e5c898974cc59dc9ca15cba516eb79bf"
  }
}
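Since the refs mapping is keyed by full reference names such as heads/main, a client may want plain branch names. A minimal sketch, assuming a hypothetical helper `branch_heads` and the sample payload shown above:

```python
def branch_heads(refs_payload: dict) -> dict:
    """Map branch name to head commit sum, keeping only refs under heads/."""
    prefix = "heads/"
    return {
        ref[len(prefix):]: commit_sum
        for ref, commit_sum in refs_payload["refs"].items()
        if ref.startswith(prefix)
    }


payload = {
    "refs": {
        "heads/main": "67026b17716d98a3b52ed9f82737e990",
        "heads/draft": "e5c898974cc59dc9ca15cba516eb79bf",
    }
}
print(branch_heads(payload))
```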
/refs/heads/:branch/
GET
Required scope: repo.read
Get the head commit of a branch.
Response payload
Name | Description |
---|---|
sum | Checksum of the binary presentation of this commit. This also serves as the identifier for this commit. |
authorName | Name of the author. |
authorEmail | Email of the author. |
message | The commit message |
time | The commit time |
parents | List of parent commits |
table.sum | Checksum of the underlying table. |
table.columns | List of column names. |
table.rowsCount | Number of rows. |
table.pk | Indices of primary key columns. |
Examples
curl https://repository/refs/heads/main/ \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
  "sum": "b82a7bc38f81c93397a1468167c1cd2a",
  "authorName": "John Doe",
  "authorEmail": "john.doe@domain.com",
  "message": "second commit",
  "table": {
    "sum": "86c76d68e453f322ce40a348ec033e0d",
    "columns": [
      "id",
      "name",
      "address",
      "city"
    ],
    "pk": [
      0
    ],
    "rowsCount": 18278
  },
  "time": "2021-09-27T19:05:09+07:00",
  "parents": [
    "67026b17716d98a3b52ed9f82737e990"
  ]
}
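Because table.pk holds column indices rather than names, resolving them against table.columns is a common first step. A short sketch, where `primary_key_columns` is a hypothetical helper name:

```python
def primary_key_columns(commit: dict) -> list:
    """Translate the table.pk indices of a commit object into column names."""
    table = commit["table"]
    return [table["columns"][i] for i in table["pk"]]


commit = {
    "sum": "b82a7bc38f81c93397a1468167c1cd2a",
    "table": {
        "columns": ["id", "name", "address", "city"],
        "pk": [0],
        "rowsCount": 18278,
    },
}
print(primary_key_columns(commit))
```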
/commits/
POST
Required scope: repo.write
Commit a CSV file under a branch. Unlike most other requests, this request uses the multipart/form-data format to upload the CSV file along with other fields.
Request payload
Name | Description |
---|---|
branch | The branch name. Valid branch name can only consist of alphanumerics, underscore ('_'), and dash ('-'). |
message | The commit message. |
file | The CSV file to process. If the filename ends with ".gz", the file is decompressed before processing. |
primaryKey | Comma-separated list of primary key columns. |
Response payload
Name | Description |
---|---|
sum | Checksum of the commit. |
table | Checksum of the underlying table. |
Examples
curl https://repository/commits/ -XPOST \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
-F branch=main \
-F message="new data" \
-F file=@data.csv \
-F primaryKey=id
Response:
{
  "sum": "dc6a85da7b691f444ef7aff844038efd",
  "table": "752da8a3f5138c2c2ee44f825bf8d8f4"
}
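curl's -F flag assembles the multipart/form-data body automatically. For clients without such a helper, the sketch below shows roughly what that body looks like on the wire. `encode_commit_form` is a hypothetical helper; treating "alphanumerics" as ASCII-only and labelling the file part as text/csv are assumptions of this example, not documented behavior.

```python
import re
import uuid


def encode_commit_form(branch, message, primary_key, filename, csv_bytes, boundary=None):
    """Return (content_type, body) for a POST /commits/ request."""
    # Assumption: "alphanumerics" means ASCII letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", branch):
        raise ValueError(f"invalid branch name: {branch!r}")
    boundary = boundary or uuid.uuid4().hex
    fields = [
        ("branch", branch),
        ("message", message),
        ("primaryKey", ",".join(primary_key)),  # comma-separated column names
    ]
    chunks = []
    for name, value in fields:
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + csv_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


content_type, body = encode_commit_form(
    "main", "new data", ["id"], "data.csv", b"id,name\n1,a\n"
)
print(content_type.split(";")[0], len(body) > 0)
```

The returned content type goes into the Content-Type header and the bytes become the request body, alongside the usual Authorization header.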
GET
Return the commit tree rooted at the commit with the given checksum.
Query parameters
Name | Description |
---|---|
head | The root commit. This could be a reference or a commit sum. |
maxDepth | The maximum depth of the returned tree. Defaults to 20 if not specified. |
Response payload
Name | Description |
---|---|
sum | Checksum of the root commit. |
root | The root commit object. Each commit object contains a nested tree of ancestor commits. |
Commit object fields:
Name | Description |
---|---|
authorName | Name of the author. |
authorEmail | Email of the author. |
message | The commit message |
time | The commit time |
parents | List of parent checksums. |
table.sum | Checksum of the underlying table. |
parentCommits | Mapping between parent checksum and commit objects. |
Examples
curl 'https://repository/commits/?head=heads%2Fmain' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
  "sum": "8c958f57cb85a6da985daf72cecc41d4",
  "root": {
    "authorName": "John Doe",
    "authorEmail": "john.doe@domain.com",
    "message": "second commit",
    "table": {
      "sum": "0dfffa2f481252602f4194b97a3616fd"
    },
    "time": "2021-09-27T19:05:07+07:00",
    "parents": [
      "67026b17716d98a3b52ed9f82737e990"
    ],
    "parentCommits": {
      "67026b17716d98a3b52ed9f82737e990": {
        "authorName": "John Doe",
        "authorEmail": "john.doe@domain.com",
        "message": "first commit",
        "table": {
          "sum": "43a5f3447e82b53a2574ef5af470df96"
        },
        "time": "2021-09-27T19:04:58+07:00"
      }
    }
  }
}
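The nested parentCommits structure lends itself to a recursive walk. The following sketch flattens the tree into (depth, sum, message) tuples; `iter_commits` is a hypothetical helper, and the payload is a trimmed copy of the example response.

```python
def iter_commits(commit_sum: str, commit: dict, depth: int = 0):
    """Yield (depth, sum, message) for a commit and all nested ancestors."""
    yield depth, commit_sum, commit["message"]
    # Commits at the depth limit (or without parents) have no parentCommits key.
    for parent_sum, parent in commit.get("parentCommits", {}).items():
        yield from iter_commits(parent_sum, parent, depth + 1)


payload = {
    "sum": "8c958f57cb85a6da985daf72cecc41d4",
    "root": {
        "message": "second commit",
        "parents": ["67026b17716d98a3b52ed9f82737e990"],
        "parentCommits": {
            "67026b17716d98a3b52ed9f82737e990": {"message": "first commit"}
        },
    },
}
for depth, commit_sum, message in iter_commits(payload["sum"], payload["root"]):
    print("  " * depth + commit_sum[:7], message)
```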
/commits/:sum/
GET
Required scope: repo.read
Get the commit object with the given checksum.
Response payload
The same payload as GET /refs/heads/:branch/.
Examples
curl https://repository/commits/b82a7bc38f81c93397a1468167c1cd2a/ \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
  "sum": "b82a7bc38f81c93397a1468167c1cd2a",
  "authorName": "John Doe",
  "authorEmail": "john.doe@domain.com",
  "message": "second commit",
  "table": {
    "sum": "86c76d68e453f322ce40a348ec033e0d",
    "columns": [
      "id",
      "name",
      "address",
      "city"
    ],
    "pk": [
      0
    ],
    "rowsCount": 18278
  },
  "time": "2021-09-27T19:05:09+07:00",
  "parents": [
    "67026b17716d98a3b52ed9f82737e990"
  ]
}
/commits/:sum/profile/
GET
Required scope: repo.read
Get the data summary of a commit.
Response payload
Name | Description |
---|---|
rowsCount | Number of rows |
columns | Summaries of each column. Each summary object is described below. |
Column summary object:
Name | Description |
---|---|
name | Name of the column |
naCount | Number of empty values |
minStrLen | Minimum length of value |
maxStrLen | Maximum length of value |
avgStrLen | Average length of value |
min | Minimum value, available if all values are numeric |
max | Maximum value, available if all values are numeric |
mean | Mean value, available if all values are numeric |
median | Median value, available if all values are numeric |
stdDeviation | Standard deviation, available if all values are numeric |
percentiles | List of percentile values from the 5th to the 95th percentile, available if all values are numeric |
topValues | List of top 20 values that occur most frequently and their counts. Unavailable if all values in this column are unique. |
Examples
curl https://repository/commits/86c76d68e453f322ce40a348ec033e0d/profile/ \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
"rowsCount": 29568,
"columns": [
{
"name": "complaint_uid",
"naCount": 0,
"minStrLen": 32,
"maxStrLen": 32,
"avgStrLen": 32
},
{
"name": "tracking_number",
"naCount": 891,
"minStrLen": 3,
"maxStrLen": 14,
"avgStrLen": 10,
"topValues": [
{
"v": "2015-HS ",
"c": 137
},
{
"v": "2013-HS ",
"c": 89
},
{
"v": "2010-HS ",
"c": 84
},
{
"v": "2011-HS ",
"c": 57
},
{
"v": "10-",
"c": 40
},
{
"v": "2012-HS ",
"c": 36
},
{
"v": "2007-0214-R",
"c": 35
},
{
"v": "2014-HS ",
"c": 34
},
{
"v": "09-",
"c": 29
},
{
"v": "2011-0578-C",
"c": 28
},
{
"v": "12-",
"c": 26
},
{
"v": "2006-0030-C",
"c": 25
},
{
"v": "2010-1507-C",
"c": 25
},
{
"v": "2010-0999-R",
"c": 24
},
{
"v": "2013-0287-C",
"c": 21
},
{
"v": "2016-0464-P",
"c": 21
},
{
"v": "2011-1212-R",
"c": 18
},
{
"v": "2014-0775-R",
"c": 18
},
{
"v": "L-035-19",
"c": 18
},
{
"v": "11-",
"c": 17
}
]
},
{
"name": "data_production_year",
"naCount": 976,
"min": 2016,
"max": 2021,
"mean": 1953.35,
"median": 2020,
"stdDeviation": 65.57,
"minStrLen": 6,
"maxStrLen": 6,
"avgStrLen": 6,
"topValues": [
{
"v": "2020.0",
"c": 24642
},
{
"v": "2021.0",
"c": 2505
},
{
"v": "2019.0",
"c": 1062
},
{
"v": "2018.0",
"c": 378
},
{
"v": "2016.0",
"c": 5
}
],
"percentiles": [
2019,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2020,
2021
]
}
]
}
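The percentiles list carries no labels of its own. The example response contains 19 values, which suggests one value per 5-point step from the 5th to the 95th percentile. The sketch below relies on that reading, which is an assumption and not something the API states explicitly; `labelled_percentiles` is a hypothetical helper.

```python
def labelled_percentiles(column: dict) -> dict:
    """Map percentile rank (5, 10, ..., 95) to its value for a numeric column."""
    values = column.get("percentiles")
    if not values:
        return {}  # non-numeric columns carry no percentiles
    # Assumption: exactly 19 values, spaced 5 percentile points apart.
    if len(values) != 19:
        raise ValueError(f"expected 19 percentile values, got {len(values)}")
    return {5 * (i + 1): value for i, value in enumerate(values)}


column = {
    "name": "data_production_year",
    "percentiles": [2019] + [2020] * 17 + [2021],
}
ranks = labelled_percentiles(column)
print(ranks[5], ranks[50], ranks[95])
```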
/tables/:sum/
GET
Required scope: repo.read
Get the table object with the given checksum.
Response payload
Name | Description |
---|---|
columns | List of column names. |
rowsCount | Number of rows. |
pk | Indices of primary key columns. |
Examples
curl https://repository/tables/86c76d68e453f322ce40a348ec033e0d/ \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
  "columns": [
    "id",
    "name",
    "address",
    "city"
  ],
  "pk": [
    0
  ],
  "rowsCount": 18278
}
/tables/:sum/blocks/
GET
Required scope: repo.read
Download a range of blocks from a table.
Query parameters
Name | Description |
---|---|
start | First index (zero-indexed, inclusive) of the blocks to download. Defaults to 0. |
end | Last index (non-inclusive) of the blocks to download. If not specified, download to the last block. |
format | The payload format, either CSV or binary. Defaults to csv. |
columns | If format=csv and columns=true then the first row is the column names. This is a great way to get back the original CSV. |
Response payload
Whether the payload is CSV or binary, it is always gzip-encoded (with header Content-Encoding: gzip).
Examples
# Download the entire table as CSV
curl 'https://repository/tables/86c76d68e453f322ce40a348ec033e0d/blocks/?columns=true' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
| gunzip - > data.csv
# Download the first block only
curl 'https://repository/tables/86c76d68e453f322ce40a348ec033e0d/blocks/?end=1' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
| gunzip - > data.csv
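In Python, the query string and the gzip decoding step might be handled as sketched below. `blocks_url` and `decode_csv_payload` are hypothetical helpers. The sketch assumes the HTTP client hands back the raw gzip bytes, as `urllib.request` does; clients that transparently decode Content-Encoding would skip the decompression step.

```python
import csv
import gzip
import io
from urllib.parse import urlencode


def blocks_url(base_url, table_sum, start=None, end=None, columns=False):
    """Build the /tables/:sum/blocks/ URL, omitting parameters left at their defaults."""
    params = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if columns:
        params["columns"] = "true"
    url = f"{base_url.rstrip('/')}/tables/{table_sum}/blocks/"
    return f"{url}?{urlencode(params)}" if params else url


def decode_csv_payload(body: bytes) -> list:
    """Decompress a gzip-encoded CSV payload and parse it into rows."""
    text = gzip.decompress(body).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


print(blocks_url("https://repository", "86c76d68e453f322ce40a348ec033e0d", end=1))
# Simulate a server response by compressing a small CSV locally.
print(decode_csv_payload(gzip.compress(b"id,name\n1,alice\n")))
```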
/tables/:sum/rows/
GET
Required scope: repo.read
Get rows from a table at specified offsets.
Query parameters
Name | Description |
---|---|
offsets | Comma-separated offsets (zero-indexed) of rows to download. If not specified, the server responds with a 307 redirect to /tables/:sum/blocks/ to download all rows. |
Response payload
Rows will be downloaded as gzipped CSV.
Examples
# Download rows with odd offset from 1 to 9
curl 'https://repository/tables/86c76d68e453f322ce40a348ec033e0d/rows/?offsets=1,3,5,7,9' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
| gunzip - > data.csv
/tables/:sum/profile/
GET
Required scope: repo.read
Get the data summary of a table.
Response payload
The same payload as GET /commits/:sum/profile/.
/blocks/
GET
Required scope: repo.read
Download a range of blocks from the table underlying a commit.
Query parameters
Name | Description |
---|---|
head | Commit hash (e.g. 639c229dd42c53e03d716eaa0829916b) or ref name (e.g. heads/main) |
start | First index (zero-indexed, inclusive) of the blocks to download. Defaults to 0. |
end | Last index (non-inclusive) of the blocks to download. If not specified, download to the last block. |
format | The payload format, either CSV or binary. Defaults to csv. |
columns | If format=csv and columns=true then the first row is the column names. This is a great way to get back the original CSV. |
Response payload
Whether the payload is CSV or binary, it is always gzip-encoded (with header Content-Encoding: gzip).
Examples
# Download the entire table as CSV
curl 'https://repository/blocks/?head=86c76d68e453f322ce40a348ec033e0d&columns=true' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
| gunzip - > data.csv
# Download the first block only
curl 'https://repository/blocks/?head=heads%2Fmain&end=1' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
| gunzip - > data.csv
/rows/
GET
Required scope: repo.read
Get rows from the table underlying a commit at specified offsets.
Query parameters
Name | Description |
---|---|
head | Commit hash (e.g. 639c229dd42c53e03d716eaa0829916b) or ref name (e.g. heads/main) |
offsets | Comma-separated offsets (zero-indexed) of rows to download. If not specified, the server responds with a 307 redirect to /tables/:sum/blocks/ to download all rows. |
Response payload
Rows will be downloaded as gzipped CSV.
Examples
# Download rows with odd offset from 1 to 9
curl 'https://repository/rows/?head=86c76d68e453f322ce40a348ec033e0d&offsets=1,3,5,7,9' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU' \
| gunzip - > data.csv
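When head is a ref name such as heads/main, the slash must be percent-encoded in the query string, while the commas between offsets can stay literal. A minimal sketch of building such a URL, with `rows_url` as a hypothetical helper name:

```python
from urllib.parse import urlencode


def rows_url(base_url: str, head: str, offsets) -> str:
    """Build a /rows/ URL for the given commit (or ref) and row offsets."""
    params = {
        "head": head,
        "offsets": ",".join(str(offset) for offset in offsets),
    }
    # safe="," keeps the commas readable; "/" in ref names is still encoded.
    return f"{base_url.rstrip('/')}/rows/?{urlencode(params, safe=',')}"


print(rows_url("https://repository", "heads/main", [1, 3, 5, 7, 9]))
```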
/diff/:sum1/:sum2/
GET
Required scope: repo.read
Compare two commits at sum1 and sum2.
Response payload
By convention, field names treat the first commit as the "newer" commit and the second commit as the "older" commit. This does not imply any chronological relationship between the two commits; it only keeps the naming consistent with the diff UI, which also calls the first commit the "newer" commit.
Name | Description |
---|---|
tableSum | Table sum of the first commit. |
oldTableSum | Table sum of the second commit. |
columns | Column names of the first commit. |
oldColumns | Column names of the second commit. |
pk | Indices of primary key columns of the first commit. |
oldPK | Indices of primary key columns of the second commit. |
rowDiff | List of row-level changes. Row-level changes are only computed and presented when the primary keys of both commits are the same. |
Examples
curl 'https://repository/diff/8c958f57cb85a6da985daf72cecc41d4/67026b17716d98a3b52ed9f82737e990/' \
-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NDA3NDIwOTUsImlhdCI6MTYzMjk2NjA5NSwiaXNzIjoiV3JnbGQiLCJlbWFpbCI6ImpvaG4uZG9lQGRvbWFpbi5jb20iLCJuYW1lIjoiSm9obiBEb2UifQ.QVTB-BdQn1TFijaNNJvc-KTjcgeD5SsvsClzZj5rUgU'
Response:
{
  "tableSum": "0dfffa2f481252602f4194b97a3616fd",
  "oldTableSum": "43a5f3447e82b53a2574ef5af470df96",
  "oldPK": [
    0
  ],
  "pk": [
    0
  ],
  "oldColumns": [
    "id",
    "name",
    "address",
    "city",
    "employment"
  ],
  "columns": [
    "id",
    "name",
    "address",
    "city",
    "employment"
  ],
  "rowDiff": [
    {
      "off1": 0
    },
    {
      "off1": 1,
      "off2": 0
    },
    {
      "off2": 3
    }
  ],
}
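A diff payload can be summarized with a few lines of Python. The column comparison below follows directly from the columns and oldColumns fields. The row classification, however, rests on an assumption that the source does not spell out: that off1 is a row offset in the first ("newer") table and off2 a row offset in the second ("older") table, so an entry with only off1 is an added row, one with only off2 a removed row, and one with both a modified row. `summarize_diff` is a hypothetical helper.

```python
def summarize_diff(diff: dict) -> dict:
    """Summarize column and row changes in a /diff/ response payload."""
    columns, old_columns = diff["columns"], diff["oldColumns"]
    summary = {
        "addedColumns": [c for c in columns if c not in old_columns],
        "removedColumns": [c for c in old_columns if c not in columns],
        "addedRows": 0,
        "removedRows": 0,
        "modifiedRows": 0,
    }
    # Assumed meaning: off1 = offset in the first table, off2 = in the second.
    for entry in diff.get("rowDiff", []):
        in_new, in_old = "off1" in entry, "off2" in entry
        if in_new and in_old:
            summary["modifiedRows"] += 1
        elif in_new:
            summary["addedRows"] += 1
        elif in_old:
            summary["removedRows"] += 1
    return summary


diff = {
    "columns": ["id", "name", "address", "city", "employment"],
    "oldColumns": ["id", "name", "address", "city", "employment"],
    "rowDiff": [{"off1": 0}, {"off1": 1, "off2": 0}, {"off2": 3}],
}
print(summarize_diff(diff))
```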