Store data in Elasticsearch using PutElasticsearchJson
This article is part of Vector Search using Spring AI Elastic Search and Apache Nifi 2.
Now that the data has been converted to JSON, we will store it in Elasticsearch.
Use the docker-compose file below to start Kibana and Elasticsearch 8.
name: es-kibana
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.18.1
    ports:
      - 127.0.0.1:9200:9200
      - 127.0.0.1:9300:9300
    environment:
      - 'ES_JAVA_OPTS=-Xms1g -Xmx1g'
      - 'discovery.type=single-node'
      - 'cluster.routing.allocation.disk.threshold_enabled=false'
      - 'xpack.security.enabled=false'
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=10s']
      interval: 5s
      timeout: 10s
      retries: 10
  kibana:
    depends_on:
      - elasticsearch
    image: docker.elastic.co/kibana/kibana:8.13.4
    ports:
      - 127.0.0.1:5601:5601
    environment:
      - ELASTICSEARCH_HOSTS=["http://elasticsearch:9200"]
Start the containers using
docker compose -f elasticsearch.yml up
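Before starting the NiFi flow, it can help to confirm that Elasticsearch is actually accepting requests. The following is a minimal Python sketch, not part of the original setup. It assumes the port mapping from the compose file (localhost:9200) and disabled security; the function names `is_ready` and `wait_for_elasticsearch` are illustrative. It treats both green and yellow as usable, since a single-node cluster can report yellow once an index with replicas exists.

```python
import json
import time
import urllib.request

# Assumption: port 9200 is published on localhost, as in the compose file.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def is_ready(health: dict) -> bool:
    """Treat a green or yellow cluster status as ready for indexing."""
    return health.get("status") in ("green", "yellow")


def wait_for_elasticsearch(url: str = HEALTH_URL, attempts: int = 10,
                           delay: float = 5.0) -> bool:
    """Poll the cluster health endpoint until it reports a usable status."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                if is_ready(json.load(response)):
                    return True
        except (OSError, ValueError):
            pass  # Not reachable yet or not valid JSON; retry after the delay.
        time.sleep(delay)
    return False
```

Calling `wait_for_elasticsearch()` returns `True` as soon as the health endpoint reports green or yellow, and `False` if all attempts are used up.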
Now that the JSON is ready and the Elasticsearch server is up, we have a small modification to make to our Apache NiFi 2 flow so that every write becomes an upsert instead of an insert.
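To make the difference concrete, here is a rough sketch of what an upsert looks like at the Elasticsearch Bulk API level: an `update` action followed by a body with `doc_as_upsert` set to true. Whether PutElasticsearchJson emits exactly this payload is an assumption. The helper name `bulk_upsert_lines` and the sample product are hypothetical; only the index name `amazon-products` comes from this article.

```python
import json


def bulk_upsert_lines(index: str, doc_id: str, document: dict) -> str:
    """Build the two NDJSON lines of a Bulk API upsert for one document."""
    action = {"update": {"_index": index, "_id": doc_id}}
    # doc_as_upsert: update the document if the id exists, otherwise insert it.
    payload = {"doc": document, "doc_as_upsert": True}
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


# Hypothetical product document; the field values are for illustration only.
body = bulk_upsert_lines(
    "amazon-products", "B000123", {"id": "B000123", "title": "Example"}
)
print(body, end="")
```

Because the action carries an explicit `_id`, running the same flow twice updates the existing documents rather than creating duplicates, which is why the id has to be available before the write.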
We will use the EvaluateJsonPath processor to read the id from the JSON and assign it to a FlowFile attribute, as shown below.
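As a rough illustration of what this step does, the Python sketch below mimics the effect of EvaluateJsonPath. It assumes the destination is set to FlowFile attributes and that a dynamic property named `id` uses the JsonPath `$.id`. The function name `evaluate_id` and the sample content are hypothetical, and the real processor configuration may differ.

```python
import json


def evaluate_id(flowfile_content: str, attributes: dict) -> dict:
    """Copy the top-level "id" field of the JSON content into the attributes."""
    record = json.loads(flowfile_content)
    updated = dict(attributes)  # Leave the caller's attributes untouched.
    updated["id"] = str(record["id"])  # FlowFile attributes are strings.
    return updated


# Hypothetical FlowFile content and attributes.
attrs = evaluate_id('{"id": 42, "title": "Example"}', {"filename": "a.json"})
print(attrs)
```

The resulting `id` attribute can then be referenced by the downstream processor as the document identifier.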
Now when you run all the NiFi processors, the data should be ingested. The documents should contain the embedding field alongside the other JSON data, and the index mapping should show dense_vector as its type when we check it:
http get http://localhost:9200/amazon-products/_mappings | grep "embedding"
{
  "amazon-products": {
    "mappings": {
      "properties": {
        "@@timestamp": {
          "type": "date"
        },
        ...
        "embedding": {
          "type": "dense_vector",
          "dims": 3072,
          "index": true,
          "similarity": "cosine",
          "index_options": {
            "type": "int8_hnsw",
            "m": 16,
            "ef_construction": 100
          }
        }
        ...
      }
    }
  }
}
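The same check can be done programmatically on the parsed response. The sketch below is one way to do it, assuming the response has the shape shown above. The helper names `embedding_mapping` and `is_dense_vector` are illustrative, and the sample is a trimmed copy of that response.

```python
def embedding_mapping(response: dict, index: str, field: str = "embedding") -> dict:
    """Return the mapping of one field from a GET <index>/_mappings response."""
    return response[index]["mappings"]["properties"][field]


def is_dense_vector(field_mapping: dict, expected_dims: int) -> bool:
    """Check that the field is a dense_vector with the expected dimensions."""
    return (
        field_mapping.get("type") == "dense_vector"
        and field_mapping.get("dims") == expected_dims
    )


# Trimmed copy of the mapping response shown above.
sample = {
    "amazon-products": {
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "dense_vector",
                    "dims": 3072,
                    "index": True,
                    "similarity": "cosine",
                }
            }
        }
    }
}

field = embedding_mapping(sample, "amazon-products")
print(is_dense_vector(field, 3072))
```

A mismatch in `dims` is worth catching early, because vectors sent at query time must have the same dimension as the indexed field.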
In the next article, we will write a Spring Boot application using spring-ai to search the same Elasticsearch index.