Semantic Product Search Using Embeddings

Rohan Das
3 min read · Mar 1, 2025


In our previous blog post Understanding Embeddings: A Simple Explanation and Hands-on, we explored how embeddings transform textual data into numerical vectors, capturing semantic relationships between words and phrases. Building upon that foundation, this project demonstrates how embeddings can power a Semantic Product Search to provide more relevant search results based on user intent.


Step 1: Initialising Dummy Data

To begin, we set up a sample product catalog with a variety of items across different categories. Each product consists of:

  • A unique ID
  • A product title
  • A short description

Example dataset:

product_data = {
    0: ["Wireless Earbuds", "High-quality sound with noise cancellation, Bluetooth 5.0, and 24-hour battery life."],
    1: ["Gaming Laptop", "15.6-inch display, Intel i7 processor, NVIDIA RTX 3060, 16GB RAM, 512GB SSD."],
    2: ["Kids Smartwatch", "GPS tracking, parental controls, and fun games for kids."],
    # More products...
}

Step 2: Generating Embeddings

Since traditional keyword-based search does not capture the true meaning behind user queries, we use embeddings to encode product descriptions into numerical vectors. This helps compare their semantic similarity.

We generate embeddings for each product using OpenAI’s API (or any preferred model):

for product in product_data.values():
    # Embed the title and description together as a single string
    embd = get_openai_embedding(f"{product[0]}: {product[1]}")
    product.append(embd)

Each product now has an additional field that represents its vector embedding.

Step 3: Implementing a Simple Search System

We now create a function that takes a user query, converts it into an embedding, and finds the most relevant products by computing cosine similarity.
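The cosine_similarity function used below is also left undefined; with NumPy it is a one-liner (a sketch, assuming both embeddings are plain lists or arrays of equal length):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```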

def search(prompt):
    user_embd = get_openai_embedding(prompt)
    similarity_scores = []

    for product_id, product_info in product_data.items():
        product_embd = product_info[2]
        similarity = cosine_similarity(user_embd, product_embd)
        similarity_scores.append((similarity, product_id))

    # Sort by similarity, highest first
    similarity_scores.sort(key=lambda x: x[0], reverse=True)

    # Return the top three matches
    result = []
    for _, product_id in similarity_scores[:3]:
        product_info = product_data[product_id]
        result.append({"title": product_info[0], "description": product_info[1]})

    return result

Now, if a user searches for “fitness tracker for children”, the system might return:

Kids Fitness Tracker: Bright colors, parental tracking, and interactive activity challenges.
Kids Smartwatch: GPS tracking, parental controls, and fun games for kids.
Compact Fitness Tracker: Slim and lightweight, perfect for 24/7 wear.

Step 4: What is a Vector Database?

A Vector Database is a specialised database designed for efficiently storing and searching high-dimensional vectors, such as embeddings. Unlike traditional relational databases that rely on structured queries, vector databases use mathematical techniques like cosine similarity to perform searches based on meaning rather than exact matches.

Why use a Vector Database?

  • Efficiency: Traditional searches scan the entire dataset (O(n)), whereas a vector database uses approximate nearest-neighbour indexes (such as HNSW or IVF) to find matches in sub-linear time.
  • Scalability: It can handle millions of high-dimensional vectors efficiently.
  • Better Search Results: Captures semantic meaning rather than just matching keywords.
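For intuition, the brute-force approach from Step 3 boils down to exactly the O(n) scan described above; a generic top-k version using a heap looks like this (a sketch with toy 2-D vectors — a vector index replaces precisely this loop):

```python
import heapq
import numpy as np

def brute_force_top_k(query, vectors, k=3):
    """Score every vector against the query (O(n)) and keep the k best."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = ((cosine(query, v), i) for i, v in enumerate(vectors))
    # heapq.nlargest keeps only k items while scanning the full list once
    return heapq.nlargest(k, scores)

# Toy example: three 2-D "embeddings", query closest to the first one
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = brute_force_top_k([1.0, 0.1], vectors, k=2)
```

Every query touches every vector, which is fine for a handful of products but untenable for millions — that is the gap a vector database closes.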

To optimise our search, we use Milvus, an open-source vector database:

!pip3 install -U pymilvus

from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")

# Recreate the collection from scratch on each run
if client.has_collection(collection_name="demo_collection"):
    client.drop_collection(collection_name="demo_collection")

client.create_collection(
    collection_name="demo_collection",
    dimension=1536,  # must match the embedding model's output size
)

Now, we insert our product embeddings into the collection:

data = []
for product_id, prd in product_data.items():
    doc = {
        "id": product_id,
        "vector": prd[2],
        "text": prd[0] + ": " + prd[1],
        "subject": prd[0],
    }
    data.append(doc)

res = client.insert(collection_name="demo_collection", data=data)
print(res)

Step 5: Searching the Vector Database

With the data stored, we now perform semantic searches efficiently:

def search(prompt):
    query_vector = get_openai_embedding(prompt)
    res = client.search(
        collection_name="demo_collection",
        data=[query_vector],
        limit=3,
        output_fields=["text", "subject"],
    )

    # Map the returned IDs back to the product catalogue
    result = []
    for hit in res[0]:
        prod = product_data[hit["id"]]
        result.append(f"{prod[0]}: {prod[1]}")

    return result

res = search("I need a smartwatch to monitor my child’s health")
for product in res:
    print("\n" + product)

The output might be:

Kids Fitness Tracker: Bright colors, parental tracking, and interactive activity challenges.
Kids Smartwatch: GPS tracking, parental controls, and fun games for kids.
Advanced Fitness Tracker: Built-in GPS, SpO2 monitor, and AI-based fitness insights.

Conclusion

By leveraging embeddings and a vector database, we have built an intelligent Semantic Product Search that understands the meaning behind user queries instead of just matching keywords. This approach makes product searches more relevant and enhances the search experience significantly.

For a deeper understanding of how embeddings work, check out our previous blog post: Understanding Embeddings: A Simple Explanation and Hands-on.
