
Commit 7ae57f6

killme2008, Copilot, and nicecui authored
docs: improve quick start (#2129)
Signed-off-by: Dennis Zhuang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Yiran <[email protected]>
1 parent 9ae5df7 commit 7ae57f6

File tree

4 files changed (+744, -188 lines)


docs/getting-started/quick-start.md

Lines changed: 186 additions & 47 deletions
@@ -9,6 +9,14 @@ Before proceeding, please ensure you have [installed GreptimeDB](./installation/
 
 This guide will walk you through creating a metric table and a log table, highlighting the core features of GreptimeDB.
 
+You’ll learn (10–15 minutes):
+* Start and connect to GreptimeDB locally
+* Create metrics and logs tables and insert sample data
+* Query and aggregate data
+* Compute p95 latency and ERROR counts in 5-second windows and align them
+* Join metrics with logs to spot anomalous hosts and time periods
+* Combine SQL and PromQL to query data
+
 ## Connect to GreptimeDB
 
 GreptimeDB supports [multiple protocols](/user-guide/protocols/overview.md) for interacting with the database.
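Once connected with any SQL-capable client (for example over the MySQL or PostgreSQL protocol of a local instance), a quick sanity check is to list the databases and tables. The snippet below is illustrative and not part of this commit:

```sql
-- Illustrative sanity check (not part of this commit)
SHOW DATABASES;
SHOW TABLES;
```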
@@ -38,25 +46,25 @@ Suppose you have an event table named `grpc_latencies` that stores the gRPC serv
 The table schema is as follows:
 
 ```sql
+-- Metrics: gRPC call latency in milliseconds
 CREATE TABLE grpc_latencies (
   ts TIMESTAMP TIME INDEX,
   host STRING INVERTED INDEX,
   method_name STRING,
   latency DOUBLE,
   PRIMARY KEY (host, method_name)
-) with('append_mode'='true');
+);
 ```
 
 - `ts`: The timestamp when the metric was collected. It is the time index column.
 - `host`: The hostname of the application server, enabling [inverted index](/user-guide/manage-data/data-index.md#inverted-index).
 - `method_name`: The name of the RPC request method.
 - `latency`: The latency of the RPC request.
 
-And it's [append only](/user-guide/deployments-administration/performance-tuning/design-table.md#when-to-use-append-only-tables) by setting `append_mode` to true, which is good for performance.
-
 Additionally, there is a table `app_logs` for storing application logs:
 
 ```sql
+-- Logs: application logs
 CREATE TABLE app_logs (
   ts TIMESTAMP TIME INDEX,
   host STRING INVERTED INDEX,
@@ -73,7 +81,8 @@ CREATE TABLE app_logs (
 - `log_level`: The log level of the log entry.
 - `log_msg`: The log message, enabling [fulltext index](/user-guide/manage-data/data-index.md#fulltext-index).
 
-It's append only, too.
+It's [append only](/user-guide/deployments-administration/performance-tuning/design-table.md#when-to-use-append-only-tables), created with `append_mode` set to `true`, which is good for write performance. Other table options, such as data retention, are supported as well.
+
 ::::tip
 We use SQL to ingest the data below, so we need to create the tables manually. However, GreptimeDB is [schemaless](/user-guide/ingest-data/overview.md#automatic-schema-generation) and can automatically generate schemas when using other ingestion methods.
 ::::
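For reference, SQL ingestion into these tables uses plain `INSERT` statements. The rows below are an illustrative sketch and not part of this commit; the actual quick start inserts a larger sample set, and the `api_path` and message values here are placeholders:

```sql
-- Illustrative sample rows (placeholders; not part of this commit)
INSERT INTO grpc_latencies (ts, host, method_name, latency) VALUES
  ('2024-07-11 20:00:06', 'host1', 'GetUser', 103.0),
  ('2024-07-11 20:00:06', 'host2', 'GetUser', 113.0);

INSERT INTO app_logs (ts, host, api_path, log_level, log_msg) VALUES
  ('2024-07-11 20:00:10', 'host1', '/api/v1/resource', 'ERROR', 'Connection timeout');
```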
@@ -158,7 +167,7 @@ INSERT INTO app_logs (ts, host, api_path, log_level, log_msg) VALUES
 
 ### Filter by tags and time index
 
-You can filter data using the WHERE clause.
+You can filter data using the `WHERE` clause.
 For example, to query the latency of `host1` after `2024-07-11 20:00:15`:
 
 ```sql
@@ -206,7 +215,15 @@ GROUP BY host;
 
 Filter the log messages by keyword `timeout`:
 ```sql
-SELECT * FROM app_logs WHERE lower(log_msg) @@ 'timeout' AND ts > '2024-07-11 20:00:00';
+SELECT
+  *
+FROM
+  app_logs
+WHERE
+  lower(log_msg) @@ 'timeout'
+  AND ts > '2024-07-11 20:00:00'
+ORDER BY
+  ts;
 ```
 
 ```sql
@@ -228,92 +245,214 @@ You can use [range queries](/reference/sql/range.md#range-query) to monitor late
 For example, to calculate the p95 latency of requests using a 5-second window:
 
 ```sql
-SELECT
-  ts,
-  host,
-  approx_percentile_cont(0.95) WITHIN GROUP (ORDER BY latency) RANGE '5s' AS p95_latency
-FROM
+SELECT
+  ts,
+  host,
+  approx_percentile_cont(0.95) WITHIN GROUP (ORDER BY latency)
+    RANGE '5s' AS p95_latency
+FROM
   grpc_latencies
-ALIGN '5s' FILL PREV;
+ALIGN '5s' FILL PREV
+ORDER BY
+  host, ts;
 ```
 
 ```sql
 +---------------------+-------+-------------+
 | ts                  | host  | p95_latency |
 +---------------------+-------+-------------+
-| 2024-07-11 20:00:05 | host2 |         114 |
-| 2024-07-11 20:00:10 | host2 |         111 |
-| 2024-07-11 20:00:15 | host2 |         115 |
-| 2024-07-11 20:00:20 | host2 |          95 |
 | 2024-07-11 20:00:05 | host1 |       104.5 |
 | 2024-07-11 20:00:10 | host1 |        4200 |
 | 2024-07-11 20:00:15 | host1 |        3500 |
 | 2024-07-11 20:00:20 | host1 |        2500 |
+| 2024-07-11 20:00:05 | host2 |         114 |
+| 2024-07-11 20:00:10 | host2 |         111 |
+| 2024-07-11 20:00:15 | host2 |         115 |
+| 2024-07-11 20:00:20 | host2 |          95 |
 +---------------------+-------+-------------+
 8 rows in set (0.06 sec)
 ```
 
+Range queries are very powerful for querying and aggregating data over time windows. Please read the [manual](/reference/sql/range.md#range-query) to learn more.
+
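Other aggregate functions can be used in a range query the same way. The following sketch is illustrative and not part of this commit; it computes the maximum latency per host over 10-second windows:

```sql
-- Illustrative variation (not part of this commit): max latency per host in 10-second windows
SELECT
  ts,
  host,
  max(latency) RANGE '10s' AS max_latency
FROM
  grpc_latencies
ALIGN '10s' FILL PREV
ORDER BY
  host, ts;
```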
 ### Correlate Metrics and Logs
 
 By combining the data from the two tables,
 you can easily and quickly determine the time of failure and the corresponding logs.
 The following SQL query uses the `JOIN` operation to correlate the metrics and logs:
 
 ```sql
--- CTE using Range Query to query metrics and logs with aligned time windows
+-- Align metrics and logs into 5s buckets, then join
 WITH
+  -- metrics: per-host p95 latency in 5s buckets
   metrics AS (
-    SELECT
-      ts,
-      host,
-      approx_percentile_cont(0.95) WITHIN GROUP (ORDER BY latency) RANGE '5s' AS p95_latency
-    FROM
-      grpc_latencies
+    SELECT
+      ts,
+      host,
+      approx_percentile_cont(0.95) WITHIN GROUP (ORDER BY latency) RANGE '5s' AS p95_latency
+    FROM grpc_latencies
     ALIGN '5s' FILL PREV
-  ),
+  ),
+  -- logs: per-host ERROR counts in the same 5s buckets
   logs AS (
-    SELECT
-      ts,
+    SELECT
+      ts,
       host,
-      count(log_msg) RANGE '5s' AS num_errors,
-    FROM
-      app_logs
-    WHERE
-      log_level = 'ERROR'
+      count(log_msg) RANGE '5s' AS num_errors
+    FROM app_logs
+    WHERE log_level = 'ERROR'
     ALIGN '5s'
-  )
--- Analyze and correlate metrics and logs
-SELECT
-  metrics.ts,
-  p95_latency,
-  coalesce(num_errors, 0) as num_errors,
-  metrics.host
-FROM
-  metrics
-  LEFT JOIN logs ON metrics.host = logs.host
-    AND metrics.ts = logs.ts
-ORDER BY
-  metrics.ts;
+  )
+SELECT
+  m.ts,
+  m.p95_latency,
+  COALESCE(l.num_errors, 0) AS num_errors,
+  m.host
+FROM metrics m
+LEFT JOIN logs l
+  ON m.host = l.host AND m.ts = l.ts
+ORDER BY m.ts, m.host;
 ```
 
 
 ```sql
 +---------------------+-------------+------------+-------+
 | ts                  | p95_latency | num_errors | host  |
 +---------------------+-------------+------------+-------+
-| 2024-07-11 20:00:05 |         114 |          0 | host2 |
 | 2024-07-11 20:00:05 |       104.5 |          0 | host1 |
+| 2024-07-11 20:00:05 |         114 |          0 | host2 |
 | 2024-07-11 20:00:10 |        4200 |         10 | host1 |
 | 2024-07-11 20:00:10 |         111 |          0 | host2 |
-| 2024-07-11 20:00:15 |         115 |          0 | host2 |
 | 2024-07-11 20:00:15 |        3500 |          4 | host1 |
-| 2024-07-11 20:00:20 |         110 |          0 | host2 |
+| 2024-07-11 20:00:15 |         115 |          0 | host2 |
 | 2024-07-11 20:00:20 |        2500 |          0 | host1 |
+| 2024-07-11 20:00:20 |          95 |          0 | host2 |
 +---------------------+-------------+------------+-------+
 8 rows in set (0.02 sec)
 ```
 
 We can see that during the time window when the gRPC latencies increases, the number of error logs also increases significantly, and we've identified that the problem is on `host1`.
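A natural follow-up, sketched here as an illustration rather than as part of this commit, is to pull the ERROR logs for `host1` in the suspect window:

```sql
-- Illustrative drill-down (not part of this commit): host1's ERROR logs around the latency spike
SELECT
  ts,
  api_path,
  log_msg
FROM
  app_logs
WHERE
  host = 'host1'
  AND log_level = 'ERROR'
  AND ts BETWEEN '2024-07-11 20:00:10' AND '2024-07-11 20:00:20'
ORDER BY
  ts;
```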
+
+### Query data via PromQL
+
+GreptimeDB supports [Prometheus Query Language and its APIs](/user-guide/query-data/promql.md), allowing you to query metrics using PromQL. For example, you can retrieve the p95 latency over the last 1 minute per host with this query:
+
+```promql
+quantile_over_time(0.95, grpc_latencies{host!=""}[1m])
+```
+
+To test this, use the following curl command:
+```bash
+curl -X POST \
+  -H 'Authorization: Basic {{authorization if exists}}' \
+  --data-urlencode 'query=quantile_over_time(0.95, grpc_latencies{host!=""}[1m])' \
+  --data-urlencode 'start=2024-07-11 20:00:00Z' \
+  --data-urlencode 'end=2024-07-11 20:00:20Z' \
+  --data-urlencode 'step=1m' \
+  'http://localhost:4000/v1/prometheus/api/v1/query_range'
+```
+
+We set the `step` to 1 minute.
+
+Output:
+```json
+{
+  "status": "success",
+  "data": {
+    "resultType": "matrix",
+    "result": [
+      {
+        "metric": {
+          "__name__": "grpc_latencies",
+          "host": "host1",
+          "method_name": "GetUser"
+        },
+        "values": [
+          [
+            1720728000.0,
+            "103"
+          ]
+        ]
+      },
+      {
+        "metric": {
+          "__name__": "grpc_latencies",
+          "host": "host2",
+          "method_name": "GetUser"
+        },
+        "values": [
+          [
+            1720728000.0,
+            "113"
+          ]
+        ]
+      }
+    ]
+  }
+}
+```
+
+Even more powerfully, you can execute PromQL from within SQL and mix the two. For example:
+```sql
+TQL EVAL ('2024-07-11 20:00:00Z', '2024-07-11 20:00:20Z', '1m')
+  quantile_over_time(0.95, grpc_latencies{host!=""}[1m]);
+```
+
+This SQL query will produce:
+```sql
++---------------------+---------------------------------------------------------+-------+-------------+
+| ts                  | prom_quantile_over_time(ts_range,latency,Float64(0.95)) | host  | method_name |
++---------------------+---------------------------------------------------------+-------+-------------+
+| 2024-07-11 20:00:00 |                                                     113 | host2 | GetUser     |
+| 2024-07-11 20:00:00 |                                                     103 | host1 | GetUser     |
++---------------------+---------------------------------------------------------+-------+-------------+
+```
+
+You can also rewrite the correlation example above with TQL:
+```sql
+WITH
+  metrics AS (
+    TQL EVAL ('2024-07-11 20:00:00Z', '2024-07-11 20:00:20Z', '5s')
+      quantile_over_time(0.95, grpc_latencies{host!=""}[5s])
+  ),
+  logs AS (
+    SELECT
+      ts,
+      host,
+      COUNT(log_msg) RANGE '5s' AS num_errors
+    FROM app_logs
+    WHERE log_level = 'ERROR'
+    ALIGN '5s'
+  )
+SELECT
+  m.*,
+  COALESCE(l.num_errors, 0) AS num_errors
+FROM metrics AS m
+LEFT JOIN logs AS l
+  ON m.host = l.host
+  AND m.ts = l.ts
+ORDER BY
+  m.ts,
+  m.host;
+```
+
+```sql
++---------------------+---------------------------------------------------------+-------+-------------+------------+
+| ts                  | prom_quantile_over_time(ts_range,latency,Float64(0.95)) | host  | method_name | num_errors |
++---------------------+---------------------------------------------------------+-------+-------------+------------+
+| 2024-07-11 20:00:05 |                                                     103 | host1 | GetUser     |          0 |
+| 2024-07-11 20:00:05 |                                                     113 | host2 | GetUser     |          0 |
+| 2024-07-11 20:00:10 |                                      140.89999999999998 | host1 | GetUser     |         10 |
+| 2024-07-11 20:00:10 |                                                   113.8 | host2 | GetUser     |          0 |
+| 2024-07-11 20:00:15 |                                                    3400 | host1 | GetUser     |          4 |
+| 2024-07-11 20:00:15 |                                                     114 | host2 | GetUser     |          0 |
+| 2024-07-11 20:00:20 |                                                    3375 | host1 | GetUser     |          0 |
+| 2024-07-11 20:00:20 |                                                     115 | host2 | GetUser     |          0 |
++---------------------+---------------------------------------------------------+-------+-------------+------------+
+```
+
+By using [TQL](/reference/sql/tql.md) commands, you can combine the power of SQL and PromQL, making correlation analysis and complex queries much easier.
+
 <!-- TODO need to fix bug
### Continuous aggregation

