Deploying Serge on GCP: A Step-by-Step Guide
Serge is a chat interface that lets you run Alpaca models without any API keys. It is entirely self-hosted, fits in 4GB of RAM, and runs on the CPU. In this guide, we’ll walk you through the steps required to deploy Serge on Google Cloud Platform (GCP).
Before we get started, make sure you have a GCP account set up and are familiar with the basic concepts of GCP, including creating and managing virtual machines, and setting up firewall rules.
Step 1: Create a Virtual Machine
The first step in deploying Serge on GCP is to create a virtual machine. We recommend a machine type with at least 4GB of RAM; no GPU is needed, since Serge runs on the CPU. You can choose any operating system you like, but we recommend a Linux distribution such as Ubuntu or Debian.
To create a virtual machine in GCP, follow these steps:
- Go to console.cloud.google.com and create a new project (e.g. serge)
- Navigate to the Compute Engine section and click on the Create Instance button
- Choose the machine type you want to use, and choose the region and the zone for your instance. In this example, we pick the us-central1 region and the us-central1-c zone, and use an e2-standard-8 machine (8 vCPUs, 32GB of RAM) so that we can run models with up to 30 billion parameters
- Select the operating system you want to run and configure the disk space. In this example, we use Ubuntu 22.04 LTS as the OS and a 250GB SSD as the boot disk. Click on the SELECT button
- Configure the firewall to allow HTTP/HTTPS traffic (this adds the http-server and https-server network tags to the instance)
- Leave everything else as it is and click on the CREATE button to create your virtual machine
- Navigate to VPC networks and click on default
- From the sidebar menu click on Firewall and then click on the Create a firewall rule button
- Choose a name (e.g. serge-firewall) and fill in the http-server tag in the Target tags field. In Source IPv4 ranges, enter 0.0.0.0/0 to allow traffic from any address. In the Protocols and ports section, select TCP and open port 8008 (we will need it later). Click on the SAVE button
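The console steps above can also be sketched with the gcloud CLI. This is a rough equivalent under stated assumptions, not the author's method: the instance name serge, the pd-ssd disk type, and the GCLOUD override (a hypothetical convenience for dry runs) are not from the original walkthrough, and the project is assumed to be already selected with gcloud config.

```shell
#!/bin/sh
# Sketch: create the VM and the firewall rule from Step 1 with the gcloud CLI.
# Set GCLOUD="echo gcloud" to print the commands instead of running them.
GCLOUD="${GCLOUD:-gcloud}"
INSTANCE="${INSTANCE:-serge}"      # assumed instance name
ZONE="${ZONE:-us-central1-c}"

create_vm() {
    # e2-standard-8, Ubuntu 22.04 LTS, 250GB boot disk (pd-ssd is an assumption),
    # plus the network tags that the HTTP/HTTPS checkboxes add in the console.
    $GCLOUD compute instances create "$INSTANCE" \
        --zone="$ZONE" \
        --machine-type=e2-standard-8 \
        --image-family=ubuntu-2204-lts \
        --image-project=ubuntu-os-cloud \
        --boot-disk-size=250GB \
        --boot-disk-type=pd-ssd \
        --tags=http-server,https-server
}

create_firewall_rule() {
    # Open TCP port 8008 to any source for instances tagged http-server.
    $GCLOUD compute firewall-rules create serge-firewall \
        --network=default \
        --direction=INGRESS \
        --allow=tcp:8008 \
        --source-ranges=0.0.0.0/0 \
        --target-tags=http-server
}
```

Calling create_vm and then create_firewall_rule performs both steps. With GCLOUD="echo gcloud" set beforehand, the two commands are only printed, which is a cheap way to review them first.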
Step 2: Install Dependencies
To get started with serge, follow these steps:
- Navigate to VM Instances and click on the SSH button
- Clone the serge repository:
git clone https://github.com/nsarrazin/serge.git
cd serge
- Start the Docker container:
sudo docker-compose up -d
- Download the required tokenizers:
sudo docker-compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 7B
sudo docker-compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 13B
sudo docker-compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 30B
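The commands above assume that git, Docker and Docker Compose are already installed on the VM, which a fresh Ubuntu image may not provide. As a minimal sketch, the hypothetical helper below (missing_tools is not part of serge) lists whichever of those tools are missing. The install command in the comment assumes the docker.io and docker-compose packages from the Ubuntu archive.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# List the tools this guide relies on that are not installed yet.
missing_tools git docker docker-compose

# Assumption: on Ubuntu 22.04 the distribution packages can be installed with
#   sudo apt-get update && sudo apt-get install -y git docker.io docker-compose
```

If the helper prints nothing, all three tools are available and the steps above should run as written.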
Note that the models take up the following disk space: 7B requires 4.21G, 13B requires 8.14G, and 30B requires 20.3G.
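Added together, the three models need about 4.21 + 8.14 + 20.3 = 32.65G. A quick way to confirm the boot disk has room before downloading is sketched below. The helper name has_free_gb is hypothetical, and the check assumes the models end up on the root filesystem and treats the sizes as binary gigabytes, so read it as an approximation.

```shell
#!/bin/sh
# Succeed if the filesystem holding the given path has at least
# the given number of gigabytes available.
has_free_gb() {
    path="$1"
    required_gb="$2"
    # df -Pk reports available space in 1024-byte blocks in column 4 of line 2.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# 32.65G for all three models, rounded up to 33.
if has_free_gb / 33; then
    echo "enough free space for the 7B, 13B and 30B models"
else
    echo "not enough free space for all three models"
fi
```

To check for a single model, pass its rounded-up size instead, for example has_free_gb / 5 for 7B.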
Step 3: Access the API
Once you’ve installed the dependencies and started the Docker container, you can access the serge API by following these steps:
- Navigate to VM Instances and copy the External IP of your machine
- Open your web browser and navigate to http://external-ip:8008/, replacing external-ip with the address you copied, and voilà! You should see the serge homepage, which means that the API is up and running
- To use the API, make requests to http://external-ip:8008/api/
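The same check can be made from a terminal. The sketch below builds the URL from the external IP and asks curl for the HTTP status code. The helper names serge_url and http_status are hypothetical, and 203.0.113.10 is a documentation placeholder rather than a real VM address.

```shell
#!/bin/sh
# Build the Serge base URL for an external IP (port 8008 was opened in Step 1).
serge_url() {
    echo "http://$1:8008/"
}

# Print the HTTP status code a URL returns (curl prints 000 if it cannot connect).
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Usage sketch, with a placeholder address:
#   http_status "$(serge_url 203.0.113.10)"        # homepage
#   http_status "$(serge_url 203.0.113.10)api/"    # API root
#
# The external IP can also be looked up with gcloud (instance name and zone assumed):
#   gcloud compute instances describe serge --zone=us-central1-c \
#       --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
```

A status of 000 usually points at the firewall rule or a container that is not running, so those are the first things to recheck.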
That’s it! You’re now ready to use serge on GCP. Happy coding!