October 19, 2024

Neural Search Code Example

Vector search over embeddings is sometimes called neural search. Here is an example of how to retrieve documents using the vector database Qdrant. Qdrant provides a Python client (and a REST API) to access collections. To retrieve the top-5 closest vectors with the Python client:

from qdrant_client import QdrantClient

def search(
    client: QdrantClient,
    collection_name: str,
    vector: list[float],
    top_k: int = 5
):
    # Retrieve the top_k nearest vectors under the collection's distance metric
    search_result = client.search(
        collection_name=collection_name,
        query_vector=vector,
        query_filter=None,
        limit=top_k
    )

    # Each hit carries the payload (document metadata) stored with the point
    payloads = [hit.payload for hit in search_result]
    return payloads

There are several ways to obtain embeddings: using the embedding models available on Hugging Face, or calling the OpenAI API. The following is a method that uses OpenAI's embedding endpoint.

import openai

def get_embedding(
    text: str,
    openai_model: str = "text-embedding-ada-002",
    openai_key: str = ""
):
    openai.api_key = openai_key

    # Newlines can degrade embedding quality, so replace them with spaces
    text = text.replace("\n", " ")
    return openai.Embedding.create(
        input=[text],
        model=openai_model
    )['data'][0]['embedding']

The “text-embedding-ada-002” model uses the “cl100k_base” tokenizer, with a maximum input of 8191 tokens and an output embedding dimension of 1536. Since 1536 dimensions is relatively high, dimensionality reduction is an intriguing research avenue for devising faster search methods.
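As a sketch of one such direction, a Johnson–Lindenstrauss-style random projection can compress a 1536-dimensional embedding into far fewer dimensions while approximately preserving pairwise distances. The code below is illustrative only (the matrix shape, output dimension, and function name are my own choices, not part of Qdrant or OpenAI):

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, OUT_DIM = 1536, 256

# One fixed Gaussian projection matrix shared by all vectors, so that
# relative distances are approximately preserved across the whole corpus.
# Scaling by 1/sqrt(OUT_DIM) keeps expected vector norms comparable.
PROJ = rng.normal(size=(OUT_DIM, IN_DIM)) / np.sqrt(OUT_DIM)

def reduce_dim(vec) -> np.ndarray:
    """Project a 1536-d embedding down to OUT_DIM dimensions."""
    return PROJ @ np.asarray(vec)

embedding = rng.normal(size=IN_DIM)  # stand-in for a real ada-002 embedding
reduced = reduce_dim(embedding)
print(reduced.shape)
```

Reduced vectors would then be stored in a collection created with the smaller `size`, trading a little recall for lower memory and faster distance computation.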

Create a collection

from qdrant_client import QdrantClient
from qdrant_client.http import models

def create(
    collection_name: str,
    size: int=100,
    host: str="localhost",
    port: int=6333
):
    client = QdrantClient(
        host,
        port=port
    )

    client.recreate_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=size,
            distance=models.Distance.COSINE
        ),
    )

https://qdrant.tech/documentation/concepts/collections/