Introduction
If you’re building AI-powered features like semantic search or working with Large Language Models (LLMs), you’ve probably encountered terms like “vectors,” “embeddings,” “token embeddings,” and “neural network weights.” These concepts are often confusing because they’re related but serve very different purposes.
This guide will clarify:
- What embedding vectors are and how they’re used for search
- The difference between embedding vectors and token embeddings
- How LLMs actually generate text (spoiler: they don’t “reverse” vectors)
- What neural network weights are
- Why you can’t convert a vector back to original text
Part 1: Embedding Vectors (For Semantic Search)
What Is a Vector? (The General Term)
A vector is simply a list of numbers. In mathematics, it’s an array of numerical values.
Common Misconception: “Vectors are always 3D (three numbers) representing points in 3D space”
Reality: Vectors can have any number of dimensions (any number of values), not just 3!
Examples:
- `[5]` – a 1-dimensional vector (just one number)
- [1, 2]` – a 2-dimensional vector (two numbers, like x, y coordinates)
- [1, 2, 3]` – a 3-dimensional vector (three numbers, like x, y, z in 3D space)
- `[0.5, -0.2, 0.8, 0.1]` – a 4-dimensional vector (four numbers)
- [0.123, -0.456, 0.789, …, 0.234]` – a 768-dimensional vector (768 numbers!)
Why the Confusion?
3D graphics (video games, 3D modeling) popularized the concept of vectors as `[x, y, z]` coordinates, but that’s just **one use case** of vectors.
Vectors are used everywhere in computing:
- Graphics: `[x, y, z]` represents a 3D position (3 dimensions)
- 2D Graphics: `[x, y]` represents a 2D position (2 dimensions)
- Data science: `[age, height, weight, income]` represents a person’s attributes (4 dimensions)
- Machine learning: `[feature1, feature2, …, feature100]` represents data features (100 dimensions)
- Embeddings: `[0.123, -0.456, …, 0.789]` represents text meaning (768 dimensions)
The “Space” Concept
While 3D vectors represent points in 3D space, higher-dimensional vectors represent points in higher-dimensional spaces:
- 2D vector = point in 2D space (a flat plane)
- 3D vector = point in 3D space (our physical world)
- 768D vector = point in 768-dimensional space (abstract mathematical space)
You can’t visualize 768-dimensional space, but mathematically it works the same way – it’s just more dimensions!
What Is an Embedding Vector? (The Specific Term)
An embedding vector is a specific type of vector that represents text (or other data) in a way that captures its semantic meaning.
Key Point: An embedding vector IS a vector, but it’s a vector with a specific purpose – to encode meaning.
The Relationship:
- ✅ An embedding vector **is** a vector (it’s a list of numbers)
- ❌ Not all vectors are embedding vectors (vectors can represent many things)
Think of it like this:
Vector = A container (like a box)
Embedding vector = A specific type of box (one that contains meaning-encoded numbers)
What Are Embedding Vectors?
An embedding vector is a numerical representation of text that captures its semantic meaning. Think of it as converting words into a list of numbers that represent what the text “means” rather than what it “says.”
How They Work
When you vectorize text like “Government announces new climate policy,” the embedding model converts it into a list of numbers:
Original: “Government announces new climate policy”
Vector: [0.123, -0.456, 0.789, 0.234, -0.567, …] (768 numbers for nomic-embed-text)
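As a rough sketch of what that step looks like in code, here is one way to request an embedding from a locally running Ollama server with the nomic-embed-text model pulled. The URL and field names below assume Ollama’s embeddings endpoint; adjust them for whichever embedding provider you actually use:

```python
import requests

# Assumption: a local Ollama instance is running and `nomic-embed-text` has been pulled.
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embed(text: str) -> list[float]:
    """Convert text into an embedding vector (a list of floats)."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embedding"]

vector = embed("Government announces new climate policy")
print(len(vector))   # 768 numbers for nomic-embed-text
print(vector[:5])    # the first five numbers of the embedding
```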
Key Characteristics
- It’s a Vector: A list of numbers (e.g., 768 numbers for nomic-embed-text)
- One-Way Transformation: Text → Vector (lossy compression)
- Semantic Meaning: Similar meanings produce similar vectors
- Fixed Dimensions: Each model produces vectors of a fixed size (e.g., 768 numbers)
- Cannot Be Reversed: You cannot convert the vector back to the original text
Terminology Note
In practice, people often say:
- “Vector” when they mean “embedding vector” (in AI/ML context)
- “Embedding” when they mean “embedding vector”
These are usually interchangeable in conversation, but technically:
- Vector = General term (any list of numbers)
- Embedding = The process of converting data to vectors
- Embedding vector = The resulting vector from embedding
Why Can’t You Reverse a Vector?
Think of it like a fingerprint:
– A fingerprint uniquely identifies a person
– But you can’t reconstruct the entire person from just their fingerprint
– Similarly, a vector captures the “essence” of text meaning, but not the exact words
Mathematical Reason: The transformation is lossy – information is compressed and discarded. Multiple different texts could theoretically produce similar (or even identical) vectors, so reversing would be ambiguous.
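Here is a toy illustration of why a lossy transformation cannot be reversed. It uses a simple coordinate projection as an analogy, not a real embedding model:

```python
# Toy illustration of a lossy transformation (an analogy, not how embedding models work):
# project a 3D point down to 2D by dropping the z coordinate.
def project_to_2d(point_3d):
    x, y, _ = point_3d   # the z value is discarded -- information is lost
    return (x, y)

a = (1.0, 2.0, 3.0)
b = (1.0, 2.0, -7.5)

print(project_to_2d(a))  # (1.0, 2.0)
print(project_to_2d(b))  # (1.0, 2.0) -- a different input, same output

# Given only (1.0, 2.0), there is no way to know which original point produced it.
# Embedding text works on the same principle: detail is discarded during the
# transformation, so the exact wording cannot be recovered from the vector.
```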
Use Case: Semantic Search
Embedding vectors excel at finding semantically similar content:
Example:
- You search for: “climate policy changes”
- The system finds:
  - “Government announces new carbon tax legislation” (high similarity)
  - “Parliament debates environmental protection bill” (high similarity)
  - “Manchester United wins match” (low similarity – correctly excluded)
Even though these articles don’t contain the exact words “climate policy changes,” they’re semantically related.
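A minimal sketch of that ranking step, reusing the hypothetical `embed()` helper from the earlier sketch and scoring candidates with cosine similarity:

```python
import numpy as np
import requests

def embed(text: str) -> list[float]:
    # Assumption: local Ollama with nomic-embed-text, as in the earlier sketch.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["embedding"]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: higher means more similar meaning."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

articles = [
    "Government announces new carbon tax legislation",
    "Parliament debates environmental protection bill",
    "Manchester United wins match",
]

# Rank the articles by how close their meaning is to the query.
query_vector = embed("climate policy changes")
ranked = sorted(
    articles,
    key=lambda article: cosine_similarity(query_vector, embed(article)),
    reverse=True,
)
for article in ranked:
    print(article)  # the climate-related articles should rank above the football result
```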
How Similarity Works in High Dimensions:
Just like you can measure distance between two points in 3D space:
- 3D: Distance = √[(x₁-x₂)² + (y₁-y₂)² + (z₁-z₂)²]
You can measure “distance” (similarity) between two points in 768D space:
- 768D: Similarity = cosine of angle between vectors
The math works the same way, just with more dimensions!
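In code, the two calculations really are the same operations applied to longer arrays (NumPy assumed here):

```python
import numpy as np

def euclidean_distance(a, b):
    # Square root of the sum of squared differences -- the same formula in any dimension.
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 3D: distance between two points in physical space
p1, p2 = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(euclidean_distance(p1, p2))   # 5.0

# 768D: exactly the same functions, just applied to longer vectors
v1, v2 = np.random.rand(768), np.random.rand(768)
print(euclidean_distance(v1, v2))
print(cosine_similarity(v1, v2))
```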
Embedding Models Are Already Trained
Embedding models have already been trained on billions of words and how “close” those words are to each other in meaning. That’s why, when you vectorize an article (or anything else), a vector search can find semantically similar content.
Whilst exact keyword search is slightly faster, vector search lets you search by meaning, which makes it far more flexible.
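To see that trade-off in miniature, here is a sketch of a plain keyword filter over the example articles from earlier. It finds nothing, because none of the articles contain the literal query words, whereas the vector-search sketch above would still rank the first two highly:

```python
articles = [
    "Government announces new carbon tax legislation",
    "Parliament debates environmental protection bill",
    "Manchester United wins match",
]

def keyword_search(query: str, docs: list[str]) -> list[str]:
    # Exact keyword matching: fast, but it only finds literal word overlaps.
    return [doc for doc in docs if query.lower() in doc.lower()]

print(keyword_search("climate policy changes", articles))  # [] -- no literal matches
# The vector-search sketch earlier would still surface the first two articles,
# because it compares meaning rather than exact wording.
```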