
Commit 59e769a

Add simple tutorial for Quickwit on Lambdas (#4418)
* Add simple tutorial for Quickwit on Lambdas
* Fix title and description of lambda tutorial.
* Tutorial aws update.
1 parent 1088e57 commit 59e769a

3 files changed, 136 additions and 3 deletions

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
1+
---
2+
title: Search with AWS Lambda
3+
description: Index and search using AWS Lambda on 20 million log entries
4+
tags: [aws, integration]
5+
icon_url: /img/tutorials/aws-logo.png
6+
sidebar_position: 4
7+
---
8+
9+
In this tutorial, we will index and search about 20 million log entries (7 GB decompressed) located on AWS S3 with Quickwit Lambda.
10+
11+
Concretely, we will deploy an AWS CloudFormation stack containing the Quickwit Lambdas and two buckets: a staging bucket that hosts the gzipped newline-delimited JSON files to be indexed, and a bucket that hosts the index data. The staging bucket is optional, as the Quickwit indexer can read data from any S3 file it has access to.
12+
13+
![Tutorial stack overview](../../assets/images/quickwit-lambda-service.svg)
14+
15+
## Install
16+
17+
### Install AWS CDK
18+
19+
We will use [AWS CDK](https://aws.amazon.com/cdk/) for our infrastructure automation script. Install it using [npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm):
20+
```bash
21+
npm install -g aws-cdk
22+
```
23+
You also need AWS credentials properly configured in your shell. One way to do this is with the [credentials file](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
24+
25+
Finally, clone the Quickwit tutorials repository:
26+
```bash
27+
git clone https://github.com/quickwit-oss/tutorials.git
28+
cd tutorials/simple-lambda-stack
29+
```
30+
31+
### Set up the Python environment
32+
33+
We use Python 3.10 to define the AWS CloudFormation stack we need to deploy, and a Python CLI to invoke the Lambdas.
34+
Let's install the few packages we need (boto3, aws-cdk-lib, click, pyyaml).
35+
36+
```bash
37+
# Install pipenv if needed.
38+
pip install --user pipenv
39+
pipenv shell
40+
pipenv install
41+
```
42+
43+
### Download Quickwit Lambdas
44+
45+
```bash
46+
mkdir -p cdk.out
47+
wget -P cdk.out https://github.com/quickwit-oss/quickwit/releases/download/aws-lambda-beta-01/quickwit-lambda-indexer-beta-01-x86_64.zip
48+
wget -P cdk.out https://github.com/quickwit-oss/quickwit/releases/download/aws-lambda-beta-01/quickwit-lambda-searcher-beta-01-x86_64.zip
49+
```
50+
51+
### Bootstrap and deploy
52+
53+
Configure the AWS region and [account ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html) where you want to deploy the stack:
54+
55+
```bash
56+
export CDK_ACCOUNT=123456789012
57+
export CDK_REGION=us-east-1
58+
```
59+
60+
If this region/account pair has not been bootstrapped by CDK yet, run:
61+
```bash
62+
cdk bootstrap aws://$CDK_ACCOUNT/$CDK_REGION
63+
```
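
The bootstrap target above has the form `aws://ACCOUNT/REGION`. As a minimal sketch (not part of the tutorial's CLI), the following Python helper builds that string and rejects a malformed account ID before any AWS call is made. It relies on AWS account IDs being 12 digits; the function name `cdk_environment` and the sample values are hypothetical.

```python
import re


def cdk_environment(account: str, region: str) -> str:
    """Return the environment string passed to `cdk bootstrap`."""
    # AWS account IDs are 12 digits; reject anything else early.
    if not re.fullmatch(r"\d{12}", account):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account!r}")
    if not region:
        raise ValueError("region must not be empty")
    return f"aws://{account}/{region}"


# Placeholder values; in practice, read os.environ["CDK_ACCOUNT"] and
# os.environ["CDK_REGION"] instead.
print(cdk_environment("123456789012", "us-east-1"))
```

Validating the values up front is a design choice: a typo in the account ID otherwise only surfaces once CDK tries to reach AWS.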
64+
65+
This initializes some basic resources to host artifacts such as Lambda packages.

Once bootstrapping is done, deploy the stack. Assuming the same app entry point as the one used in the cleanup step of this tutorial, the command should be `cdk deploy -a cdk/app.py`.
66+
67+
## Index the HDFS logs dataset
68+
69+
Here is an example log entry from the dataset:
70+
```json
71+
{
72+
  "timestamp": 1460530013,
73+
  "severity_text": "INFO",
74+
  "body": "PacketResponder: BP-108841162-10.10.34.11-1440074360971:blk_1074072698_331874, type=HAS_DOWNSTREAM_IN_PIPELINE terminating",
75+
  "resource": {
76+
    "service": "datanode/01"
77+
  },
78+
  "attributes": {
79+
    "class": "org.apache.hadoop.hdfs.server.datanode.DataNode"
80+
  },
81+
  "tenant_id": 58
82+
}
83+
```
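
To make the field layout concrete, here is a minimal Python sketch that parses one such newline-delimited JSON line and converts its `timestamp` into a UTC datetime. It assumes the timestamp is expressed in seconds since the Unix epoch; the function name `parse_log_line` is hypothetical and not part of the tutorial's CLI.

```python
import json
from datetime import datetime, timezone

# The sample entry shown above, serialized as a single NDJSON line.
SAMPLE_LINE = json.dumps({
    "timestamp": 1460530013,
    "severity_text": "INFO",
    "body": "PacketResponder: BP-108841162-10.10.34.11-1440074360971:"
            "blk_1074072698_331874, type=HAS_DOWNSTREAM_IN_PIPELINE terminating",
    "resource": {"service": "datanode/01"},
    "attributes": {"class": "org.apache.hadoop.hdfs.server.datanode.DataNode"},
    "tenant_id": 58,
})


def parse_log_line(line: str) -> dict:
    """Parse one NDJSON log line and attach a timezone-aware datetime."""
    entry = json.loads(line)
    # Assumption: `timestamp` is in seconds since the Unix epoch.
    entry["datetime"] = datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc)
    return entry


entry = parse_log_line(SAMPLE_LINE)
print(entry["severity_text"], entry["resource"]["service"], entry["datetime"].isoformat())
```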
84+
85+
If you have a few minutes to spare, you can index the whole dataset, which is available in our public S3 bucket.
86+
87+
```bash
88+
python cli.py index s3://quickwit-datasets-public/hdfs-logs-multitenants.json.gz
89+
```
90+
91+
If not, just index the 10,000-document dataset:
92+
93+
```bash
94+
python cli.py index s3://quickwit-datasets-public/hdfs-logs-multitenants-10000.json
95+
```
96+
97+
## Execute search queries
98+
99+
Let's start with a query on the `severity_text` field that looks for errors, `severity_text:ERROR`:
100+
101+
```bash
102+
python cli.py search '{"query":"severity_text:ERROR"}'
103+
```
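
Quoting JSON by hand on the command line is error-prone. As a hedged sketch, the helper below builds the request with `json.dumps` and quotes it for the shell with `shlex.quote`. The function name `build_search_command` is hypothetical, and only the `query` and `max_hits` keys that appear in this tutorial are used.

```python
import json
import shlex
from typing import Optional


def build_search_command(query: str, max_hits: Optional[int] = None) -> str:
    """Return a shell-safe `python cli.py search ...` command line."""
    payload = {"query": query}
    if max_hits is not None:
        payload["max_hits"] = max_hits
    # Compact separators keep the JSON on one short line.
    body = json.dumps(payload, separators=(",", ":"))
    return "python cli.py search " + shlex.quote(body)


print(build_search_command("severity_text:ERROR"))
print(build_search_command("severity_text:INFO", max_hits=5))
```

The first call reproduces the command shown above; the second is the `INFO` variant with an explicit hit limit.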
104+
105+
It should respond in under 1 second and return 10 hits out of 345 if you indexed the whole dataset. If you indexed only the first 10,000 documents, you won't get any hits; try querying `INFO` logs instead.
106+
107+
108+
Let's now run a more advanced query: a date histogram with a term aggregation on the `severity_text` field:
109+
110+
```bash
111+
python cli.py search '{ "query": "*", "max_hits": 0, "aggs": { "events": { "date_histogram": { "field": "timestamp", "fixed_interval": "30d" }, "aggs": { "log_level": { "terms": { "size": 10, "field": "severity_text", "order": { "_count": "desc" } } } } } } }'
112+
```
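
The one-line JSON above is hard to read. The sketch below expresses the same request as a nested Python dictionary, which makes the structure visible: a `date_histogram` on `timestamp` with 30-day buckets, and inside each bucket a `terms` sub-aggregation on `severity_text`. The names `AGG_REQUEST` and `to_cli_argument` are hypothetical helpers, not part of the tutorial's CLI.

```python
import json

AGG_REQUEST = {
    "query": "*",      # match every document
    "max_hits": 0,     # return aggregation results only, no hits
    "aggs": {
        "events": {
            # Outer aggregation: one bucket per 30-day interval.
            "date_histogram": {"field": "timestamp", "fixed_interval": "30d"},
            "aggs": {
                # Inner aggregation: top 10 severities per bucket, by count.
                "log_level": {
                    "terms": {
                        "size": 10,
                        "field": "severity_text",
                        "order": {"_count": "desc"},
                    }
                }
            },
        }
    },
}


def to_cli_argument(request: dict) -> str:
    """Serialize the request to the JSON string passed to `cli.py search`."""
    return json.dumps(request)


print(to_cli_argument(AGG_REQUEST))
```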
113+
114+
It should respond in under 2 seconds and return the top log levels for each 30-day interval.
115+
116+
117+
### Cleaning up
118+
119+
First, delete the files created in your S3 buckets, since a bucket that still contains objects generally cannot be deleted along with the stack.
120+
Once done, you can delete the stack.
121+
122+
```bash
123+
cdk destroy -a cdk/app.py
124+
rm -rf cdk.out
125+
```
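
The first cleanup step, emptying the buckets, can also be scripted. The following is a minimal sketch using boto3 (installed earlier with pipenv), not the tutorial's own tooling. The helper names are hypothetical, and so are the bucket names in the usage comment: use the names of the buckets created by your stack.

```python
def open_bucket(name: str):
    """Return a boto3 Bucket resource for the given bucket name."""
    import boto3  # imported lazily so the helper below has no AWS dependency

    return boto3.resource("s3").Bucket(name)


def empty_bucket(bucket) -> None:
    """Delete every object, then every remaining version, in the bucket."""
    # Removes all current objects (batched delete requests under the hood).
    bucket.objects.all().delete()
    # Removes old versions and delete markers if versioning was ever enabled.
    bucket.object_versions.delete()


# Usage (hypothetical bucket names, run before `cdk destroy`):
#   for name in ("my-staging-bucket", "my-index-bucket"):
#       empty_bucket(open_bucket(name))
```

This deletion is irreversible, so double-check the bucket names before running it.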
126+
127+
Congrats! You have finished this tutorial. You can level up with the following tutorials to discover all of Quickwit's features.
128+
129+
## Next steps
130+
131+
- [Advanced Lambda tutorial](tutorial-aws-lambda.md), which covers an end-to-end use case
132+
- [Search REST API](/docs/reference/rest-api)
133+
- [Query language](/docs/reference/query-language)

docs/get-started/tutorials/tutorial-aws-lambda.md

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
11
---
2-
title: Serverless Search on AWS Lambda
2+
title: Serverless E2E with Lambda
33
description: Index and search using AWS Lambda based on an end-to-end use case.
44
tags: [aws, integration]
55
icon_url: /img/tutorials/aws-logo.png
6-
sidebar_position: 3
6+
sidebar_position: 5
77
---
88

99
In this tutorial, we’ll show you how to run Quickwit on Lambda on a complete use case. We’ll present the associated cloud resources, a cost estimate, and how to deploy the whole stack using AWS CDK.

docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: Distributed search on AWS S3
33
description: Index log entries on AWS S3 using an EC2 instance and launch a distributed cluster.
44
tags: [aws, integration]
55
icon_url: /img/tutorials/aws-logo.png
6-
sidebar_position: 4
6+
sidebar_position: 6
77
---
88

99
In this guide, we will index about 40 million log entries (13 GB decompressed) on AWS S3 using an EC2 instance and launch a three-node distributed search cluster.
