Programming with Large Language Models is a new frontier. There are many new concepts and paradigm shifts in how we build software. This post is a brief intro to some of the tools you can use to get started building apps on top of LLMs.
OpenAI Functions
OpenAI Functions have only been around for a few months, but they are already a hugely impactful shift in how the OpenAI API can be used. Essentially, we can now include a functions array in the API request (along with the chat messages, etc.), which tells the GPT-3.5 or GPT-4 API that it may respond by "calling" one of the declared functions, returning the arguments as structured JSON.
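As a minimal sketch (using the openai npm package with its v4-style API; the getWeather function name and schema are placeholders for illustration, not from any real project), a request looks roughly like this:

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo-0613",
    messages: [{ role: "user", content: "What's the weather in Boston?" }],
    // Declaring a function tells the model it may answer by "calling" it.
    functions: [
      {
        name: "getWeather", // hypothetical function for illustration
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: {
            city: { type: "string", description: "The city to look up" },
          },
          required: ["city"],
        },
      },
    ],
  });

  const message = response.choices[0].message;
  if (message.function_call) {
    // The model decided to call the function; arguments come back as a JSON string.
    const args = JSON.parse(message.function_call.arguments);
    console.log(message.function_call.name, args);
  } else {
    // Otherwise it answered in plain text.
    console.log(message.content);
  }
}

main();
```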
openai-function-calling-tools
The best npm library I have found for OpenAI function calling is from Johann Lai: openai-function-calling-tools has plenty of examples to work from, as well as a demo site.
I used the project’s Google Search API tool in a Google clone built with Next.js to demonstrate the Functions API. A search query goes to OpenAI first, then falls back to fetching from Google if the LLM decides it’s a web search.
Langchain
There was plenty of work done in this space before we had OpenAI functions, and this is where Langchain comes in. Langchain is probably the most popular library for working with LLMs broadly. “Chains” are sequences of calls that are customized to your application’s use case. There’s a vast number of examples and concepts to absorb, which is part of why simply using OpenAI functions is so appealing.
OpenAI functions only partially eliminate the need for Langchain, since Langchain provides much more.
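For comparison, a minimal chain might look like the sketch below (import paths follow the 2023-era langchain JS package layout and may differ in newer versions; the prompt and input are made up for illustration):

```ts
import { OpenAI } from "langchain/llms/openai";
import { PromptTemplate } from "langchain/prompts";
import { LLMChain } from "langchain/chains";

// A prompt template with a single input variable.
const prompt = PromptTemplate.fromTemplate(
  "Summarize the following release notes in one sentence:\n\n{notes}"
);

// A "chain" ties the prompt and model together into a reusable call.
const chain = new LLMChain({
  llm: new OpenAI({ temperature: 0 }),
  prompt,
});

const result = await chain.call({
  notes: "Added support for OpenAI function calling.",
});
console.log(result.text);
```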
At this time there isn’t a direct replacement for OpenAI functions within a Llama server. So if you are working with Llama, you may want to explore Langchain to create a similar type of functions API.
Getting structured output
A simpler way to conceptualize LLM functions is to think of them as returning output in a structured format, like JSON or a string constrained by a regex.
Some common use cases are text classification and other simple natural language processing tasks like named entity recognition.
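For instance, a classifier can be built by declaring a function schema whose only purpose is to shape the output; the sketch below is a hedged example with the openai npm package, where the classify function name and its labels are assumptions rather than anything from a real project:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

async function classifySentiment(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo-0613",
    messages: [{ role: "user", content: text }],
    functions: [
      {
        name: "classify", // placeholder name; this function is never actually run
        description: "Classify the sentiment of the user's text",
        parameters: {
          type: "object",
          properties: {
            sentiment: {
              type: "string",
              enum: ["positive", "neutral", "negative"],
            },
          },
          required: ["sentiment"],
        },
      },
    ],
    // Force the model to respond through the function rather than free text.
    function_call: { name: "classify" },
  });

  const args = response.choices[0].message.function_call?.arguments ?? "{}";
  return JSON.parse(args).sentiment as string;
}

classifySentiment("The new release fixed all my issues!").then(console.log);
```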
Thiggle is a unique project that has set out to simplify this process, while bridging support across OpenAI, open-source, and edge LLMs. This helps fill the gap between OpenAI functions and the Llama APIs for the time being.
Retrieval augmented generation
Retrieval-augmented generation (RAG) is an AI framework that improves the quality of responses from large language models. RAG works by grounding the model on external sources of knowledge to supplement the LLM's internal representation of information.
Langchain and OpenAI Functions can help build a RAG app, which is often more suitable than fine-tuning or building a model from scratch.
Once a document is embedded into a vector store, it can be retrieved before the LLM is called. It’s not so different from an autocomplete search on a website.
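Here is a hedged sketch of that retrieve-then-generate flow, using Langchain’s in-memory vector store and OpenAI embeddings (the documents, prompt, and question are placeholders):

```ts
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { OpenAI } from "langchain/llms/openai";

// Embed a few document chunks into the in-memory store.
const store = await MemoryVectorStore.fromTexts(
  [
    "The refund policy allows returns within 30 days.",
    "Shipping takes 3-5 business days.",
  ],
  [{ source: "faq" }, { source: "faq" }],
  new OpenAIEmbeddings()
);

const question = "How long do I have to return an item?";

// Retrieval: look up the most similar chunks before calling the LLM.
const docs = await store.similaritySearch(question, 1);
const context = docs.map((d) => d.pageContent).join("\n");

// Generation: ground the answer in the retrieved context.
const llm = new OpenAI({ temperature: 0 });
const answer = await llm.call(
  `Answer using only this context:\n${context}\n\nQuestion: ${question}`
);
console.log(answer);
```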
I explored building a PDF chat app with transformers.js WASM-based embeddings and a quantized 7B Llama model running on WebGPU via Matt Rickard’s react-llm.
https://huggingface.co/spaces/matthoffner/web-llm-embed
web-llm-embed uses a simple in-memory vector store from Langchain with those embeddings. It’s a proof of concept, but useful for basic PDF chat!
Having a vector store essentially lets you create a larger-context LLM at the expense of a lookup before the LLM call, and it works with basically any API. For example, it can be used with the GitHub or GitLab API to improve code search, or with an OpenAPI doc to provide documentation examples.
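As a rough sketch of the GitHub case (the repo, path, and query below are illustrative only; this fetches a single file from the public GitHub REST API and indexes it the same way as the PDF example):

```ts
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

async function indexRepoFile(owner: string, repo: string, path: string) {
  // The GitHub contents endpoint returns file content as base64 (Node runtime assumed).
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/contents/${path}`
  );
  const json = await res.json();
  const content = Buffer.from(json.content, "base64").toString("utf-8");

  return MemoryVectorStore.fromTexts(
    [content],
    [{ source: `${owner}/${repo}/${path}` }],
    new OpenAIEmbeddings()
  );
}

const store = await indexRepoFile("vercel", "next.js", "README.md");
const hits = await store.similaritySearch("how do I configure rewrites?", 2);
console.log(hits.map((h) => h.metadata.source));
```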
LLM as the new UI
The LLM essentially becomes the user interface in terms of how we build applications. By providing it with unstructured data, we can hopefully return a response that is not only accurate but human-friendly and contextual.
There is plenty more to think about before going to production, but there’s nothing stopping you from getting started and building something interesting!