Delta Fetch is an HTTP API for Delta Lake. It lets you configure HTTP endpoints that retrieve rows from Delta Lake tables.
After applying the methods described in the storage recommendations, you can achieve fetch times of around 1 s regardless of table size, since the service scales horizontally well.
As an example, take a Delta table stored in S3 that has 24,498,176 records. Here are a few examples of the time it took to serve requests against it (using the file index cache):
```
time curl http://localhost:8080/api/data/disable_optimize_ordered/872480210503_234678
{"version":5,"data":{"user_id":"872480210503_234678","sub_type":"PREPAID","activation_date":"2018-09-01","status":"ACTIVE","deactivation_date":"9999-01-01"}}
curl  0.00s user 0.01s system 1% cpu 0.982 total

time curl http://localhost:8080/api/data/disable_optimize_ordered/579520210231_237911
{"version":5,"data":{"user_id":"579520210231_237911","sub_type":"PREPAID","activation_date":"2018-06-24","status":"ACTIVE","deactivation_date":"9999-01-01"}}
curl  0.00s user 0.01s system 0% cpu 1.250 total

time curl http://localhost:8080/api/data/disable_optimize_ordered/875540210000_245810
{"version":2,"data":{"user_id":"875540210000_245810","sub_type":"PREPAID","activation_date":"2018-09-01","status":"ACTIVE","deactivation_date":"9999-01-01"}}
curl  0.00s user 0.01s system 1% cpu 0.870 total
```
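The file index cache used in these measurements can be pictured as a list of per-file min/max ranges for the lookup column (Delta Lake records such min/max statistics per data file in its transaction log). A lookup then opens only the Parquet files whose range could contain the requested value. The Python sketch below is a hypothetical illustration of that pruning idea, not Delta Fetch's code; `FileRange`, `candidate_files`, and the file names and ranges are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileRange:
    """Hypothetical index entry: min/max of one column within a single Parquet file."""
    path: str
    min_value: str
    max_value: str


def candidate_files(index: List[FileRange], value: str) -> List[str]:
    """Return the files whose [min, max] range could contain `value`.

    Files outside the range are skipped without being opened. Files inside
    the range still have to be read, because the range only says the value
    *may* be present.
    """
    return [f.path for f in index if f.min_value <= value <= f.max_value]


# Hypothetical in-memory index, built once and reused by later requests.
index = [
    FileRange("part-0000.parquet", "100000000000_000000", "399999999999_999999"),
    FileRange("part-0001.parquet", "400000000000_000000", "699999999999_999999"),
    FileRange("part-0002.parquet", "700000000000_000000", "999999999999_999999"),
]

print(candidate_files(index, "579520210231_237911"))  # ['part-0001.parquet']
```

Under this model, ranges overlap less when the table is sorted by the lookup column, so each request matches fewer files.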
The service caches the Delta table index (per-file value ranges) after the first request to a resource API.
Subsequent requests use this in-memory index to find the Parquet files
that may contain the requested value. You can force the service to refresh the index by adding the `?exact=true` query
parameter to an HTTP request. You can also enable a background process that refreshes the cached index
at a specified interval:
```yaml
app:
  cache-update-interval: 10m
```

Resources can be configured in the following way:
```yaml
app:
  resources:
    - path: /api/data/{table}/{identifier}
      schema-path: /api/schemas/{table}/{identifier}
      delta-path: s3a://bucket/delta/{table}/
      response-type: SINGLE
      filter-variables:
        - column: id
          path-variable: identifier
```

- `path` defines the API path used to query your Delta tables. Path variables are defined with curly braces, as shown in the example.
- `schema-path` (optional) defines the API path for the Delta table schema.
- `delta-path` defines the S3 path of your Delta table. Path variables in this path are filled in from the variables provided in the API path.
- `response-type` (optional, default: `SINGLE`) defines whether to search for a single resource or for multiple resources. Use the `LIST` type for multiple resources.
- `max-results` (optional, default: `100`) is the maximum number of rows that can be returned for the `LIST` response type.
- `filter-variables` (optional) are additional filters applied to the Delta table; each entry maps a path variable to a table column.
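To make the path-variable mapping concrete, here is a hypothetical Python sketch of how a request path could be matched against the `path` template, and how the captured variables could then fill in `delta-path` and `filter-variables`. The function names and the regex-based matching are assumptions for illustration, not the service's implementation.

```python
import re
from typing import Dict, Optional


def match_template(template: str, path: str) -> Optional[Dict[str, str]]:
    """Match `path` against a template such as /api/data/{table}/{identifier}.

    Each {name} matches exactly one path segment. Returns the captured
    variables, or None if the path does not fit the template.
    """
    pattern = ""
    # re.split with a capturing group puts the placeholder names at odd indices.
    for i, part in enumerate(re.split(r"\{(\w+)\}", template)):
        pattern += f"(?P<{part}>[^/]+)" if i % 2 else re.escape(part)
    m = re.fullmatch(pattern, path)
    return m.groupdict() if m else None


def fill_template(template: str, variables: Dict[str, str]) -> str:
    """Replace {name} placeholders (e.g. in delta-path) with captured values."""
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], template)


# Mirrors the example resource configuration above.
resource = {
    "path": "/api/data/{table}/{identifier}",
    "delta-path": "s3a://bucket/delta/{table}/",
    "filter-variables": [{"column": "id", "path-variable": "identifier"}],
}

variables = match_template(resource["path"], "/api/data/disable_optimize_ordered/872480210503_234678")
table_path = fill_template(resource["delta-path"], variables)
filters = {f["column"]: variables[f["path-variable"]] for f in resource["filter-variables"]}
print(table_path)  # s3a://bucket/delta/disable_optimize_ordered/
print(filters)     # {'id': '872480210503_234678'}
```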
Delta Fetch uses dedicated thread pools for handling API requests and Parquet file reading:
```yaml
micronaut:
  executors:
    api:
      type: FIXED
      nThreads: ${API_EXECUTOR_THREADS:24}
    parquet-reader:
      type: FIXED
      nThreads: ${PARQUET_READER_THREADS:30}
```

- The `api` executor handles incoming HTTP requests. Default: 24 threads (override with `API_EXECUTOR_THREADS`).
- The `parquet-reader` executor reads Parquet files in parallel for faster response times. Default: 30 threads (override with `PARQUET_READER_THREADS`).
You can tune these values based on your workload and available CPU cores. The parquet-reader pool is shared across all concurrent API requests, so size it accordingly if you expect high concurrency.
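The effect of a shared reader pool can be illustrated with Python's `concurrent.futures`: each request fans its candidate files out to one pool that all requests share, then merges the results. This is a hypothetical sketch of the pattern, not Delta Fetch's code; `read_matching_rows` stands in for a real Parquet reader and only filters in-memory lists, and `FAKE_FILES` is made-up data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical stand-in for Parquet files: file name -> rows.
FAKE_FILES: Dict[str, List[Row]] = {
    "part-0000.parquet": [{"id": "a", "status": "ACTIVE"}],
    "part-0001.parquet": [{"id": "b", "status": "ACTIVE"}, {"id": "c", "status": "INACTIVE"}],
}

# One reader pool shared by every request (the role `parquet-reader` plays).
PARQUET_READER = ThreadPoolExecutor(max_workers=30)


def read_matching_rows(file_name: str, column: str, value: str) -> List[Row]:
    """Hypothetical reader: return the rows of one file where column == value."""
    return [row for row in FAKE_FILES[file_name] if row.get(column) == value]


def handle_request(files: List[str], column: str, value: str) -> List[Row]:
    """Read all candidate files in parallel on the shared pool and merge the rows."""
    futures = [PARQUET_READER.submit(read_matching_rows, f, column, value) for f in files]
    rows: List[Row] = []
    for future in futures:  # preserves the order of `files` in the result
        rows.extend(future.result())
    return rows


print(handle_request(list(FAKE_FILES), "id", "c"))
```

Because every in-flight request submits work to the same pool, many concurrent requests that each touch several files will queue behind one another. That is why the reader pool should be sized for the expected concurrency rather than for a single request.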
Delta Fetch currently supports two authorization mechanisms.
Example of basic authentication:
```yaml
app:
  security:
    enabled: true
    basic:
      enabled: true
      username: username
      password: password
```

We also support OAuth2 with JWT tokens, which are verified using JWKs (OpenID Connect):
```yaml
app:
  security:
    enabled: true
    oauth2:
      enabled: true
      claims-validators:
        issuer: https://issuer.url/oauth/token
      jwks:
        url: https://stagecat.exacaster.com/uaa/token_keys
      allowed-scopes:
        - ws.117.owner
```

If the JWT token has any of the scopes defined in `allowed-scopes`, the user is allowed to access the API.
Path variables can also be used in the scope list, for example: `ws.{workspace}.owner`.
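The scope rule can be sketched as: substitute the request's path variables into each `allowed-scopes` entry, then grant access if any resulting scope is among the token's scopes. This hypothetical Python sketch assumes the token has already been verified against the JWKS and its scopes extracted into a list; `is_allowed`, the example resource path, and the choice to skip entries whose placeholders cannot be resolved are assumptions, not documented behavior.

```python
import re
from typing import Dict, Iterable


def is_allowed(token_scopes: Iterable[str], allowed_scopes: Iterable[str],
               path_variables: Dict[str, str]) -> bool:
    """Return True if the (already verified) token holds any allowed scope.

    Placeholders such as {workspace} in `allowed_scopes` are replaced with the
    values captured from the request path before comparing. Assumption: an
    entry whose placeholder has no matching path variable is skipped.
    """
    granted = set(token_scopes)
    for template in allowed_scopes:
        try:
            scope = re.sub(r"\{(\w+)\}", lambda m: path_variables[m.group(1)], template)
        except KeyError:
            continue  # unresolved placeholder: this entry cannot match
        if scope in granted:
            return True
    return False


# Hypothetical resource path: /api/data/{workspace}/{table}/{identifier}
variables = {"workspace": "117", "table": "users", "identifier": "42"}
print(is_allowed(["ws.117.owner", "openid"], ["ws.{workspace}.owner"], variables))  # True
print(is_allowed(["ws.118.owner"], ["ws.{workspace}.owner"], variables))            # False
```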
To configure credentials for the S3 connection, use these properties:

```yaml
app:
  hadoop-props:
    "fs.s3a.access.key": XXX
    "fs.s3a.secret.key": XXXXXX
```

Delta Fetch is MIT licensed.
