Upa's Expeditions

Too much is happening in the world of AI, and it is hard to keep up.
Let's read this article to understand these concepts not just individually but also coherently! 😊
Large Language Models (LLMs) are a type of Artificial Intelligence designed to understand, summarize, and generate human-like text. They are trained on trillions of tokens of data.
Data is converted into tokens (small chunks of text, roughly 3/4 of a word for English). How text is split into tokens depends on the model.
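The "roughly 3/4 of a word" rule of thumb translates to about 4 characters per token for English. Here is a minimal sketch of that estimate; real tokenizers (such as OpenAI's tiktoken) are model-specific, so treat this purely as a budgeting heuristic.

```python
# Rough token estimate for English text: ~4 characters (about 3/4 of a word)
# per token. Real tokenizers are model-specific; this is only a heuristic.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 chars/token rule of thumb."""
    return max(1, round(len(text) / 4))

sentence = "Large Language Models convert text into tokens before processing."
print(estimate_tokens(sentence))  # → 16 (65 characters / 4, rounded)
```

A heuristic like this is handy for deciding whether a document will fit in a small context window before paying for a real API call.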
If the context window holds fewer tokens, the model has lower latency and responds faster, and suits small documents (e.g. nano, mini, flash variants). These may offer around 2000-4000 tokens (~1500-3000 words) and are often cheaper to run. For large novels or editing large files, we typically need models with bigger context windows (e.g. GPT-4.1, Gemini 2.5).
Imagine the context window as the short-term memory the LLM maintains to keep track of the conversation history. It is measured in tokens, and its size is limited and depends on the model.
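Because this short-term memory is finite, chat applications typically trim older messages so the recent ones fit the token budget. A minimal sketch of that idea, reusing the 4-characters-per-token estimate (the 20-token budget below is illustrative, not a real model limit):

```python
# Sliding context window sketch: keep only the most recent messages whose
# combined (estimated) token count fits the budget. The budget and the
# 4-chars-per-token estimate are illustrative assumptions.

def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = max(1, len(msg) // 4)        # rough token estimate
        if used + cost > max_tokens:
            break                           # older messages get dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["old message " * 10, "recent question?", "latest answer."]
print(fit_to_window(history, max_tokens=20))
# → ['recent question?', 'latest answer.'] — the long old message is dropped
```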
Transformers are the "brains" behind almost all modern AI, including Gemini, ChatGPT, and Claude.
Before Transformers were invented by Google in 2017, AI processed text one word at a time, like a person reading through a straw. Transformers changed everything by allowing the AI to look at an entire sentence or document all at once.
It is an architecture that captures context semantically and enables parallel processing, greatly increasing speed.
We can make a chatbot using the OpenAI SDK. But we might quickly realise that there are some missing pieces: storing chats, maintaining conversation history, connecting to the organisation's internal knowledge systems, and handling the possibility that the company might switch from OpenAI to Anthropic or Gemini in the future.
To handle this complexity, we have a readily implemented abstraction layer called LangChain. It helps you build AI agents using minimal code and addresses all those pain points using pre-built components and standardised interfaces.
These are pre-built components with direct access to LLMs that can be used as simple imports in our code. Without LangChain, we would need to build all this infrastructure ourselves: API management for multiple LLMs, plus the other complexities.
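The core idea of such an abstraction layer can be shown in a few lines of plain Python. This is a toy sketch only: the class names and fake responses below are invented, and LangChain's real interfaces look different, but the principle is the same, because application code depends on a common interface, the vendor can be swapped without rewriting the chatbot.

```python
# Toy illustration of the abstraction-layer idea behind LangChain.
# FakeOpenAI / FakeAnthropic are placeholders, not real SDK classes.

class FakeOpenAI:
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class FakeAnthropic:
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"

class Chatbot:
    """Application code depends on the interface, not the vendor."""
    def __init__(self, llm):
        self.llm = llm
        self.history: list[str] = []        # conversation memory

    def ask(self, prompt: str) -> str:
        self.history.append(prompt)
        return self.llm.complete(prompt)

bot = Chatbot(FakeOpenAI())
print(bot.ask("Hello"))                     # [openai] Hello
bot.llm = FakeAnthropic()                   # switch vendors, no rewrite
print(bot.ask("Hello again"))               # [anthropic] Hello again
```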
LangChain gives you access to components such as prompt templates, chains, memory, and integrations with vector stores and external tools.
Prompt Engineering is the "art and science" of crafting precise instructions (prompts) to get the best possible results from an AI model like Gemini, ChatGPT, or Claude.
Unlike a SQL DB, which stores data by value, a vector DB stores data by meaning, using embeddings. A vector DB handles embeddings at scale and provides efficient retrieval based on semantic similarity. It shifts the burden from the user searching the database to the person who sets up the database, making it easier for the user to search by meaning.
The LLM can now freely search by meaning with confidence that it will return relevant results (for both "holiday" and "vacation"). E.g. Pinecone, ChromaDB.
Earlier, in a SQL database, we had to match exact keywords to get relevant results. Embeddings solved this by making search semantic. An embedding model takes a text and converts it into a vector (e.g. 1536 numbers) that represents its meaning. The vector can be plotted in a graph; similar concepts end up with similar values along similar dimensions and similar number patterns. "Holiday" and "Vacation" now sit close to each other in the embedding space.
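"Close to each other" is usually measured with cosine similarity between vectors. A tiny sketch with made-up 3-dimensional vectors (real embedding models produce ~1536 dimensions; the numbers below are invented purely to show the ranking behaviour):

```python
# Toy 3-D "embeddings" (invented values; real models use ~1536 dims).
# Semantically similar words get similar number patterns, so cosine
# similarity ranks them as close.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

vectors = {
    "holiday":  [0.9, 0.1, 0.0],
    "vacation": [0.8, 0.2, 0.1],
    "laptop":   [0.0, 0.9, 0.4],
}
print(cosine(vectors["holiday"], vectors["vacation"]) >
      cosine(vectors["holiday"], vectors["laptop"]))   # True
```

With real embeddings the mechanics are identical, only the vectors come from a model instead of being hand-written.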
Now, let's consider a situation where the same word can have different meanings depending on the context in which it is used. That's why we can't simply map words to fixed embeddings and store them in a database.
We also need dimensionality.
Dimensionality helps us capture tone, formality, and other features, giving richness to the representation of words.
Models commonly use 1536 dimensions today, allowing depth in each search.
In SQL we used the WHERE clause. But in a vector DB, we instead look at similarity scoring:
Scoring helps differentiate between "Can I take my company laptop to Florida?" and "Does my company allow vacations to Florida?"
E.g. chunk size: 500 characters (balanced); overlap: 100 characters. Result: 40% better retrieval accuracy.
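Chunking with overlap is simple to implement: the window advances by (size − overlap) characters so neighbouring chunks share some text and no sentence is cut off without context. A sketch using a tiny string so the behaviour is visible (the 500/100 values from above would work the same way):

```python
# Fixed-size chunking with overlap: each chunk repeats the tail of the
# previous one, so meaning spanning a boundary is not lost.

def chunk(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap                  # how far the window advances
    return [text[i:i + size] for i in range(0, len(text), step)]

print(chunk("abcdefghij", size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Each chunk would then be embedded and stored in the vector DB as its own row.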
Instead of stuffing the organisation's entire database of documents into the context window, an AI assistant can retrieve just the relevant pieces, fit those into the context window, and generate output. This is Retrieval Augmented Generation (RAG).
The prompt provided is converted into an embedding and compared with the embeddings in the vector DB. As part of semantic search, the meaning and context of the query are matched against the existing database to find the most relevant "chunks" of information related to the query.
Retrieved data is injected into the prompt at runtime.
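That injection step is just string assembly. A minimal sketch, assuming the vector search has already returned the relevant chunks (the prompt wording and example policy text below are invented for illustration):

```python
# RAG "augmentation" sketch: retrieved chunks are pasted into the prompt
# before it reaches the LLM. Retrieval itself is assumed done already.

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "How many vacation days do I get?",
    ["Employees receive 25 vacation days per year."],   # invented chunk
)
print(prompt)
```

The LLM never sees the whole database, only the question plus the handful of chunks the semantic search judged relevant.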
AI assistants rely on what they learned during pre-training, which becomes outdated.
In RAG, the semantic search results are appended to the prompt and serve as augmented knowledge. The AI assistant is now given real-time data, without the need to modify or fine-tune the LLM with custom data.
The AI assistant generates a response based on the semantically relevant data retrieved from the vector database.
RAG is a very powerful technique that can instantly deepen a model's knowledge beyond its training data.
Most applications are much more complex than a simple chat app. They need to connect with the HRM system, connect to employee docs, and generate personalised responses.
LangChain has its limitations when complex multi-step workflows, conditional branching, or iterative processes need to be handled for various business requirements.
This problem is solved by LangGraph. It enables better orchestration and goes beyond simple Q&A interactions. LangGraph treats the AI application as a directed graph, consisting of nodes (individual units of computation), edges (which connect the nodes), and state (a central "memory bank" that tracks everything that has happened so far). Every node in LangGraph is like a function, and the edges represent the execution flow.
Some powerful capabilities of LangGraph include loops (cycles), conditional branching, and persistent state across steps. This makes LangGraph an essential tool for workflow automation.
In 2026, MCP (Model Context Protocol) has become the "USB-C for AI." It is an open standard introduced by Anthropic to solve the biggest problem in AI: fragmentation.
Before MCP, if you wanted an AI (like Claude or ChatGPT) to read your Google Drive, search your Slack, or query your database, developers had to write a unique, custom "connector" for every single combination of AI and tool.
Traditional APIs expose endpoints that require implementation effort and prior understanding, leading to rigid integrations tied to specific systems.
MCP not only functions like an API but also provides self-describing interfaces and tools that AI agents can understand and use autonomously. Unlike traditional APIs, it puts the integration burden on the AI agent rather than on developers.
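"Self-describing" means each tool publishes a machine-readable manifest (name, purpose, input schema) that an agent can discover at runtime instead of relying on hand-written glue code. A simplified sketch of that idea; the manifest shape below is illustrative and not the actual MCP wire protocol, and the tool name is invented:

```python
# Sketch of the "self-describing tool" idea behind MCP. The manifest
# format is simplified for illustration; real MCP messages differ.

TOOL_MANIFEST = {
    "name": "search_docs",                 # hypothetical tool name
    "description": "Semantic search over company documents",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def agent_discovers(manifest: dict) -> str:
    """An agent reads the manifest instead of custom integration code."""
    required = ", ".join(manifest["input_schema"]["required"])
    return f"Tool '{manifest['name']}' needs: {required}"

print(agent_discovers(TOOL_MANIFEST))      # Tool 'search_docs' needs: query
```

Because every tool describes itself the same way, one agent can work with any number of tools without a bespoke connector per pairing.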
AI has helped us go from manual searching by value or text (up to 30 minutes) to complex semantic document search (<30 seconds) with better accuracy, using context-aware search.
In the near future, AI will become even better with deeper implementations of these technologies. This would let AI not just answer questions but actively solve problems before users even ask.