
How To Install TextGen WebUI on AWS

4 minute read
Content level: Intermediate

There are so many open-source LLMs that sometimes we wish we could try several of them through a single unified interface. This guide shows you how to install text-generation-webui from oobabooga on AWS.

Text Generation Web UI is a tool which lets you effortlessly run multiple Large Language Models on the same instance. From a single UI, anyone can easily run models using Hugging Face Transformers on GPU, models reduced in size through GPTQ, or even models running on CPU (such as llama.cpp). In this step-by-step guide I will walk you through how to install it on an EC2 instance on AWS for testing different models.

TextGeneration UI with Falcon-40B

Pre-requisites

For this short tutorial, we will be using:

  • An AWS Account
  • A Deep Learning AMI based on Ubuntu 20.04 and PyTorch 2.0.
  • A GPU-based EC2 instance, such as a g5 instance, with enough disk space to hold multiple LLMs.
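
The Deep Learning AMI ID differs per region. As a sketch, you could look up the latest matching image with the AWS CLI; the name filter below is an assumption based on the usual DLAMI naming scheme, so adjust it to the AMI family you actually want:

```shell
# Assumption: the AMI name follows the usual
# "Deep Learning AMI GPU PyTorch 2.0.* (Ubuntu 20.04)*" pattern.
aws ec2 describe-images \
  --owners amazon \
  --region us-west-2 \
  --filters "Name=name,Values=Deep Learning AMI GPU PyTorch 2.0* (Ubuntu 20.04)*" \
  --query "sort_by(Images, &CreationDate)[-1].{Id: ImageId, Name: Name}" \
  --output table
```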

You can do this using the console. If you prefer to use a CloudFormation template, you can use the following:

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for launching EC2 instance with Deep Learning Image and text generation web UI setup.

Parameters:
  YourIpAddress:
    Description: IP address range that is allowed SSH access to the EC2 instance (183.13.12.3/32)
    Type: String
    MinLength: '9'
    MaxLength: '18'
    AllowedPattern: '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})'
    ConstraintDescription: Must be a valid IP CIDR range of the form x.x.x.x/x.

  AmiId:
    Description: AMI ID for the EC2 instance (default is for us-west-2 region)
    Type: String
    Default: ami-05ac04cf9d9989c1d

  InstanceType:
    Description: EC2 instance type
    Type: String
    AllowedValues:
      - g5.12xlarge
      - g5.2xlarge
    Default: g5.12xlarge

  KeyPairName:
    Description: Name of an existing EC2 key pair to enable SSH access
    Type: AWS::EC2::KeyPair::KeyName

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: !Ref AmiId
      KeyName: !Ref KeyPairName
      SecurityGroupIds:
        - !Ref MySecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            VolumeType: gp3
            VolumeSize: 1024 # 1TB
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          sudo apt update
          sudo apt upgrade -y
  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for EC2 instance with text generation web UI
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: !Ref YourIpAddress

Outputs:
  InstanceId:
    Description: The Instance ID
    Value: !Ref MyEC2Instance

  PublicDNS:
    Description: Public DNS of the EC2 instance
    Value: !GetAtt MyEC2Instance.PublicDnsName

  PublicIP:
    Description: Public IP address of the EC2 instance
    Value: !GetAtt MyEC2Instance.PublicIp
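
If you save the template above as, say, textgen.yaml (the file name, stack name, and parameter values here are just placeholders), you can deploy it from the CLI and read back the outputs:

```shell
aws cloudformation create-stack \
  --stack-name textgen-webui \
  --template-body file://textgen.yaml \
  --parameters ParameterKey=YourIpAddress,ParameterValue=183.13.12.3/32 \
               ParameterKey=KeyPairName,ParameterValue=my-key-pair

# Wait for the stack, then print the outputs (instance ID, public DNS and IP)
aws cloudformation wait stack-create-complete --stack-name textgen-webui
aws cloudformation describe-stacks --stack-name textgen-webui \
  --query "Stacks[0].Outputs"
```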

Once you have your instance set up, you can connect to it using SSH:

ssh -i "<my_key.pem>" ubuntu@<public_ip>
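
The same SSH connection can also forward the web UI's port to your machine later on, so you can browse it without opening it to the internet. The port 7860 below is an assumption (Gradio's default for text-generation-webui); once the server is running you would browse http://localhost:7860:

```shell
# Forward local port 7860 to the same port on the instance
ssh -i "<my_key.pem>" -L 7860:localhost:7860 ubuntu@<public_ip>
```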

Once you are inside the machine, you can follow the steps described in the official GitHub page:

conda create -n textgen python=3.10.9
conda init bash   # restart your shell after this so conda activate works
conda activate textgen
pip install torch torchvision torchaudio
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
# Rebuild llama-cpp-python with cuBLAS so llama.cpp models can use the GPU
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
python download-model.py facebook/opt-1.3b
python server.py

If you use the same instance type as in the example (g5.12xlarge), you can use the following to get a public web URL valid for 72 hours:

python server.py --auto-devices --gpu-memory 24 24 24 24 --cpu-memory 186 --share
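
To adapt those memory flags to other instance sizes, here is a minimal Python sketch that assembles the launch command from the GPU count and per-GPU memory. The g5.12xlarge figures (4x NVIDIA A10G with 24 GB each, 192 GB of system RAM) are the assumptions behind the example above; the helper function itself is hypothetical, not part of the project:

```python
def build_server_command(num_gpus: int, gpu_mem_gib: int,
                         cpu_mem_gib: int, share: bool = True) -> str:
    """Assemble the text-generation-webui launch command.

    --gpu-memory takes one value per GPU; --cpu-memory caps how much
    system RAM is used for offloading (leave headroom for the OS).
    """
    parts = ["python server.py", "--auto-devices", "--gpu-memory"]
    parts += [str(gpu_mem_gib)] * num_gpus
    parts += ["--cpu-memory", str(cpu_mem_gib)]
    if share:
        parts.append("--share")
    return " ".join(parts)

# g5.12xlarge: 4 GPUs x 24 GB, ~186 GB of the 192 GB RAM for offloading
print(build_server_command(4, 24, 186))
# → python server.py --auto-devices --gpu-memory 24 24 24 24 --cpu-memory 186 --share
```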

From within the UI you can now try many models from Hugging Face. These are some which I tried successfully from the TheBloke user on Hugging Face:

  • TheBloke/falcon-40b-instruct-GPTQ
  • TheBloke/guanaco-65B-GPTQ
  • TheBloke/WizardCoder-15B-1.0-GPTQ
  • TheBloke/vicuna-13b-v1.3-GPTQ
  • TheBloke/LLaMa-65B-GPTQ-3bit


If you want to verify that it is actually using the GPUs, and see how much GPU memory the models are using, you can install nvtop:

sudo apt install nvtop
nvtop

Conclusion

TextGeneration-WebUI is an interesting project which can help you test multiple LLMs using AWS infrastructure. For more advanced use cases, or to run these models in production, I would recommend checking SageMaker JumpStart and SageMaker endpoints.