Markus Oberlehner

Building a Chatbot with Next.js Running LLaMA 2 Locally


LLaMA 2, Meta's recently released open-source language model, is a powerful tool for natural language processing tasks. In this guide, we'll build a chatbot using LLaMA 2 and Next.js, the popular React framework.

Disclaimer: This is a rough proof-of-concept implementation that you probably don't want to use in production. However, it's a solid starting point if you want to play around with LLaMA 2 running locally in your Next.js application.

Setting Up Next.js

First, let’s set up a new Next.js project by running the following command in your terminal:

npx create-next-app@latest llama-chatbot

Navigate to your new Next.js application. The code in this guide assumes you answered the create-next-app prompts with TypeScript, the App Router, and the src/ directory enabled:

cd llama-chatbot

Setting Up the LLaMA 2 Model

Before building our chatbot, we must set up the LLaMA 2 model locally. Running LLaMA 2 on your Mac involves cloning the llama.cpp repository, building it, and downloading the model weights.

For easy access within our Next.js application, we’ll clone the LLaMA project within the root directory of our Next.js project. This setup will help keep our project organized.

In your terminal, navigate to the root directory of your Next.js project and clone the LLaMA repository:

# See: https://gist.github.com/adrienbrault/b76631c56c736def9bc1bc2167b5d129
git clone https://github.com/ggerganov/llama.cpp.git llama

Navigate into the cloned directory and reset the repository to a commit that works with the approach we're using in this guide (later versions of llama.cpp switched to the GGUF file format and no longer load GGML models like the one we download below):

cd llama
git reset --hard b9b7d94fc10a8039befd1bc3af4f4b09c620c351

Build llama.cpp with the LLAMA_METAL=1 flag to enable the Metal backend for GPU-accelerated inference on the Mac:

LLAMA_METAL=1 make

Then, still inside the llama directory, download the quantized LLaMA 2 13B chat model (a download of several gigabytes):

wget "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

Or, if wget isn’t installed on your machine, you can use curl instead:

curl -LJO "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

Finally, to keep the LLaMA project and the large model file out of our Next.js project's Git history, add the directory to the .gitignore file:

cd ..
echo "/llama" >> .gitignore

With these steps, we've successfully set up the LLaMA 2 model locally in our Next.js project.
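
Before wiring the model up to Next.js, you can run it directly with a test prompt as a quick sanity check (the flags mirror the ones our API route will pass later):

cd llama
./main -m llama-2-13b-chat.ggmlv3.q4_0.bin -ngl 1 -p "Hello"
cd ..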

Building the Chatbot

Next, we’ll integrate the LLaMA 2 model into our Next.js application and build a simple chat interface.

We start by creating a new API route file, src/app/api/chat/route.js. This API endpoint will handle the communication with the LLaMA 2 model.

// src/app/api/chat/route.js
import path from "path";
import { spawn } from "child_process";

const getAnswer = ({ messages }) => {
  // Convert the chat history into the prompt format
  // the LLaMA 2 chat models expect.
  const messageString = messages
    .map((m) => {
      if (m.role === "system") {
        return `<s>[INST] <<SYS>>\n${m.content}\n<</SYS>>\n\n`;
      }
      if (m.role === "assistant") {
        return `${m.content}</s><s>[INST] `;
      }

      return `${m.content} [/INST] `;
    })
    .join("");

  // Spawn llama.cpp as a child process with the assembled prompt.
  return spawn(
    `./main`,
    [
      "-t", "8", // Number of CPU threads.
      "-ngl", "1", // Offload layers to the GPU via Metal.
      "-m", "llama-2-13b-chat.ggmlv3.q4_0.bin", // Model path, relative to cwd.
      "--color",
      "-c", "2048", // Context size in tokens.
      "--temp", "0.7", // Sampling temperature.
      "--repeat_penalty", "1.1", // Penalize repeated tokens.
      "-n", "-1", // Number of tokens to predict (-1 = until end of generation).
      "-p", messageString, // The prompt.
    ],
    {
      cwd: path.join(process.cwd(), "llama"),
    },
  );
};

const getAnswerStream = ({ messages }) => {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      const llama = getAnswer({ messages });

      // llama.cpp echoes the prompt before generating an answer, so we
      // skip all output until a chunk containing "[/INST]" appears.
      let start = false;
      llama.stdout.on("data", (data) => {
        if (data.includes("[/INST]")) {
          start = true;
          return;
        }
        if (!start) return;

        const chunk = encoder.encode(String(data));
        controller.enqueue(chunk);
      });

      llama.stderr.on("data", (data) => {
        console.log(`stderr: ${data}`);
      });

      llama.on("close", () => {
        controller.close();
      });
    },
  });
};

export async function POST(request) {
  const { messages } = await request.json();

  if (!messages) {
    return new Response("No message in the request", { status: 400 });
  }

  return new Response(getAnswerStream({ messages }));
}

The getAnswer() function first converts the chat history into the prompt format the LLaMA 2 chat models were trained on, then spawns a child process that runs the model via llama.cpp. The command-line arguments include the path to the model, the number of threads to use, and the assembled prompt.
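
For example, a conversation consisting of a system prompt, a user question, an assistant reply, and a follow-up question (the messages are made up for illustration) is flattened into a single prompt string like this:

<s>[INST] <<SYS>>
You are a philosopher.
<</SYS>>

What is truth? [/INST] Truth is a relation between thought and reality.</s><s>[INST] Can you elaborate? [/INST]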

The getAnswerStream() function wraps the child process in a ReadableStream. It listens to the process's stdout and uses the TextEncoder API to turn the emitted text into binary chunks that are enqueued on the stream, closing it once the process exits.
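
If you want to inspect the raw stream without any client library, here's a minimal sketch of a manual consumer (the message payload is made up for illustration, and the code assumes it runs inside an async function in the browser):

// Minimal manual consumer for the streaming /api/chat endpoint.
const response = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

// Read the response body chunk by chunk as tokens arrive.
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}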

The POST() function is an asynchronous function that handles POST requests to the /api/chat route. It extracts the messages from the request body and returns a streaming Response created from getAnswerStream().
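
With the development server running (npm run dev, assuming the default port 3000), you can test the endpoint directly from the terminal. The JSON payload is just an example conversation:

curl -N -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"system","content":"You are a philosopher."},{"role":"user","content":"What is truth?"}]}'

The -N flag disables curl's output buffering so you can watch the tokens arrive as they are generated.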

Building the LLaMA Chat UI with React

First, install the ai package (Vercel's AI SDK), which we'll use to build the chat interface:

npm install ai

This library provides a set of React hooks for building chat interfaces.

Finally, update your src/app/page.tsx file with the following code:

// src/app/page.tsx
"use client";
import { useChat } from "ai/react";

export default function Home() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat",
    initialMessages: [
      { id: "system", role: "system", content: "You are a philosopher." },
    ],
  });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((message) => (
        <p key={message.id}>
          {message.role}: {message.content}
        </p>
      ))}
      <input
        onChange={handleInputChange}
        value={input}
        className="text-black"
      />
      <button type="submit">Send Message</button>
    </form>
  );
}

The useChat() hook from the ai library provides the state and handlers for managing the chat interface: the list of messages, the current input value, and callbacks for input changes and form submission. The initialMessages option seeds the chat with a system message that sets the model's persona.

The handleSubmit() function sends a POST request to the /api/chat route whenever the user submits the form. The handleInputChange() function updates the state of the input field whenever the user types a message.
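
Start the development server to try out the chatbot:

npm run dev

Then open http://localhost:3000 in your browser and send a message. Keep in mind that every request spawns a fresh llama.cpp process that loads the multi-gigabyte model from disk, so it takes a moment before the first tokens start streaming in.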

Wrapping Up

That’s it! We’ve successfully built a chatbot with the LLaMA 2 model and Next.js. This chatbot can serve as a starting point for more complex applications, such as a customer service bot or a language learning assistant. Feel free to experiment and enhance your chatbot with the capabilities of the LLaMA 2 model. Happy coding!