Neural Search Code Example
Vector search over embeddings is sometimes called neural search. This post shows an example of retrieving documents with a vector database, Qdrant. Qdrant provides a Python client (and a REST API) for accessing the data. To fetch the top-5 closest vectors with the Python client:
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

def search(
    collection_name: str,
    vector: list[float],
    top_k: int = 5,
) -> list[dict]:
    search_result = client.search(
        collection_name=collection_name,
        query_vector=vector,
        query_filter=None,
        limit=top_k,
    )
    # each hit carries the stored payload (the document metadata)
    payloads = [hit.payload for hit in search_result]
    return payloads
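Conceptually, a top-k vector search ranks every stored vector by its similarity to the query and returns the best matches. The following brute-force sketch in plain Python illustrates that idea with cosine similarity; it is illustrative only (Qdrant uses an approximate index rather than an exhaustive scan), and the toy documents and 2-dimensional vectors are made up for the example:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # cosine similarity: dot product divided by the product of norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def brute_force_search(query: list[float], points, top_k: int = 5):
    # points: list of (payload, vector) pairs
    scored = [(cosine(query, vec), payload) for payload, vec in points]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [payload for _, payload in scored[:top_k]]

# toy 2-dimensional "embeddings"
docs = [
    ({"text": "cat"}, [1.0, 0.0]),
    ({"text": "dog"}, [0.9, 0.1]),
    ({"text": "car"}, [0.0, 1.0]),
]
print(brute_force_search([1.0, 0.05], docs, top_k=2))
```

This linear scan is O(n) per query; a vector database exists precisely to avoid it at scale.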
There are several ways to obtain embeddings: using embedding models available on Hugging Face, or calling the OpenAI API. The following shows how to get an embedding from OpenAI:
import openai

def get_embedding(
    text: str,
    openai_model: str = "text-embedding-ada-002",
    openai_key: str = "",
) -> list[float]:
    openai.api_key = openai_key
    # replace newlines, which can affect embedding quality
    text = text.replace("\n", " ")
    return openai.Embedding.create(
        input=[text],
        model=openai_model,
    )['data'][0]['embedding']
The “text-embedding-ada-002” model uses the “cl100k_base” tokenizer, with a maximum input of 8191 tokens and an output embedding dimension of 1536. Since 1536 dimensions is relatively high, dimensionality reduction is an interesting research direction for devising cheaper or faster search methods.
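As a minimal sketch of that dimensionality-reduction idea, a Gaussian random projection (Johnson-Lindenstrauss style) maps 1536-dimensional vectors to a smaller space while roughly preserving angles and distances. The dimensions and data below are made up for illustration; real pipelines would tune the target dimension against retrieval quality:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(vectors: np.ndarray, target_dim: int) -> np.ndarray:
    """Project rows of `vectors` to `target_dim` with a Gaussian random
    matrix; pairwise geometry is approximately preserved."""
    source_dim = vectors.shape[1]
    proj = rng.normal(size=(source_dim, target_dim)) / np.sqrt(target_dim)
    return vectors @ proj

# simulate 100 embeddings of dimension 1536, reduce to 256
emb = rng.normal(size=(100, 1536))
reduced = random_projection(emb, 256)
print(reduced.shape)  # (100, 256)
```

The reduced vectors can then be indexed in a collection created with a smaller `size`, trading some accuracy for memory and speed.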
Create a collection
from qdrant_client import QdrantClient
from qdrant_client.http import models
def create(
    collection_name: str,
    size: int = 1536,  # must match the embedding dimension (1536 for ada-002)
    host: str = "localhost",
    port: int = 6333,
):
    client = QdrantClient(
        host,
        port=port,
    )
    client.recreate_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=size,
            distance=models.Distance.COSINE,
        ),
    )