Deploying Serge on GCP: A Step-by-Step Guide

Serge is a chat interface that lets you run Alpaca models without any API keys. It is entirely self-hosted, fits in 4GB of RAM, and runs on the CPU. In this guide, we’ll walk through the steps required to deploy Serge on Google Cloud Platform (GCP).

Before we get started, make sure you have a GCP account set up and are familiar with the basic concepts of GCP, including creating and managing virtual machines, and setting up firewall rules.

Step 1: Create a Virtual Machine

The first step in deploying Serge on GCP is to create a virtual machine. We recommend a machine type with at least 4GB of RAM; no GPU is needed, since Serge runs on the CPU. You can choose any operating system you like, but we recommend a Linux-based distribution such as Ubuntu or Debian.

To create a virtual machine in GCP, follow these steps:

  • Go to console.cloud.google.com and create a new project (e.g. serge)
  • Navigate to the Compute Engine section and click on the Create Instance button
  • Choose the region and zone for your instance, then the machine type. In this example, we pick the us-central1 region and the us-central1-c zone, and an e2-standard-8 machine (8 vCPUs, 32GB of RAM) so that we can run models with up to 30 billion parameters:

image

  • Select the operating system you want to run and configure the disk space. In this example, we use Ubuntu 22.04 LTS for the OS and a 250GB SSD as the boot disk. Click on the SELECT button:

image

  • Configure the firewall to allow HTTP/HTTPS traffic:

image

  • Leave everything else as it is and click on the CREATE button to create your virtual machine
  • Navigate to VPC networks and click on default
  • From the sidebar menu click on Firewall and then click on the Create a firewall rule button
  • Choose a name (e.g. serge-firewall) and fill in the http-server tag in the Target tags field. In the Source IPv4 ranges field, allow all addresses by filling in 0.0.0.0/0. In the Protocols and ports section, select TCP and open port 8008 (we will need it later). Click on the SAVE button:

image
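
The console steps above can also be scripted. The following is a minimal sketch using the gcloud CLI, assuming gcloud is installed and authenticated and that a project is already selected. The instance name serge and the DRY_RUN switch are illustrative choices, not part of the walkthrough itself. With DRY_RUN=1 (the default here) the commands are only printed, so you can review them first.

```shell
#!/bin/sh
# Sketch: CLI equivalent of the console steps in Step 1.
# Assumes gcloud is installed, authenticated, and a project is selected.
set -eu

INSTANCE_NAME="serge"            # assumption: any valid instance name works
ZONE="us-central1-c"
MACHINE_TYPE="e2-standard-8"
FIREWALL_RULE="serge-firewall"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ubuntu 22.04 LTS VM with a 250GB SSD boot disk. The tags are the ones
# the console adds when the HTTP/HTTPS traffic boxes are ticked.
run gcloud compute instances create "$INSTANCE_NAME" \
  --zone="$ZONE" \
  --machine-type="$MACHINE_TYPE" \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=250GB \
  --boot-disk-type=pd-ssd \
  --tags=http-server,https-server

# Open TCP port 8008 to instances tagged http-server.
run gcloud compute firewall-rules create "$FIREWALL_RULE" \
  --network=default \
  --direction=INGRESS \
  --allow=tcp:8008 \
  --source-ranges=0.0.0.0/0 \
  --target-tags=http-server
```

To actually create the resources, run the script with DRY_RUN=0.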

Step 2: Install Dependencies

To install and start Serge, follow these steps:

  • Navigate to VM Instances and click on the SSH button
  • Clone the serge repository:
    git clone https://github.com/nsarrazin/serge.git
    cd serge
    
  • Start the Docker container (this requires Docker and docker-compose on the VM):
    sudo docker-compose up -d
    
  • Download the tokenizer and the models you want to use:
    sudo docker-compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 7B
    sudo docker-compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 13B
    sudo docker-compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 30B
    

    Please note that the models occupy the following storage space: 7B requires 4.21GB, 13B requires 8.14GB, and 30B requires 20.3GB.
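
The docker-compose commands above only work if Docker and docker-compose are present on the VM, which is usually not the case on a fresh Ubuntu image. The following is a small sketch, assuming Ubuntu 22.04 and its docker.io and docker-compose apt packages, that reports what is missing before you run docker-compose up. The missing_packages helper is a hypothetical name introduced for illustration.

```shell
#!/bin/sh
# Sketch: check for Docker and docker-compose before starting Serge.
# Assumes Ubuntu 22.04, where both are available as apt packages.
set -eu

# Print the apt package names for any of the given commands that are missing.
missing_packages() {
  pkgs=""
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      case "$cmd" in
        docker) pkg="docker.io" ;;   # Ubuntu's package name for Docker
        *)      pkg="$cmd" ;;
      esac
      pkgs="${pkgs:+$pkgs }$pkg"
    fi
  done
  echo "$pkgs"
}

to_install="$(missing_packages docker docker-compose)"
if [ -n "$to_install" ]; then
  echo "Missing packages: $to_install"
  echo "Install with: sudo apt-get update && sudo apt-get install -y $to_install"
else
  echo "Docker and docker-compose are already installed."
fi
```

The script only prints the suggested install command rather than running it, so nothing on the VM changes until you choose to run that command yourself.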

Step 3: Access the API

Once you’ve installed the dependencies and started the Docker container, you can access the Serge API by following these steps:

  • Navigate to VM Instances and copy the External IP of your machine

  • Open your web browser and navigate to http://external-ip:8008/, replacing external-ip with the address you copied:

image

  • You should see the Serge homepage, which means that the API is up and running!

  • To use the API, make requests to http://external-ip:8008/api/.
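
The same check can be made from a terminal. The following is a minimal sketch, assuming curl is installed and that EXTERNAL_IP is set to the External IP copied above. The serge_url and http_status helpers are hypothetical names used only for this example, and the sketch requests the homepage only, since this guide does not list individual API endpoints.

```shell
#!/bin/sh
# Sketch: check from the command line that Serge answers on port 8008.
# Assumes curl is installed; EXTERNAL_IP must be set to the VM's External IP.
set -eu

EXTERNAL_IP="${EXTERNAL_IP:-external-ip}"   # placeholder until you set it
PORT=8008

# Build the base URL used in this guide for a given host.
serge_url() {
  echo "http://$1:$PORT/"
}

# Print the HTTP status code for a URL (typically 000 if the connection fails).
http_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1" || true
}

if [ "$EXTERNAL_IP" != "external-ip" ]; then
  status="$(http_status "$(serge_url "$EXTERNAL_IP")")"
  echo "Serge homepage returned HTTP $status"
else
  echo "Set EXTERNAL_IP to your VM's External IP before running this check."
fi
```

A status of 200 suggests the homepage is being served. A failed connection usually points to the firewall rule for port 8008 or to the container not running.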

That’s it! You’re now ready to use Serge on GCP. Happy coding!