Running Ollama on AWS Lambda

Picture generated by ChatGPT

Running Ollama has never been this easy. How easy? As easy as running it on an AWS Lambda function. You get a taste of both worlds: running it easily on AWS Lambda, and running it for free (or at least cheaply) thanks to Lambda's pay-per-use model.

In this post, we'll be looking at exactly how to run Ollama on AWS Lambda. Let me walk you through the insanity:

  1. I wanted to run Ollama on the web, but I did not want to run it on ECS or EC2.
  2. I wanted to run Ollama on the web, but did not want a hefty bill.
  3. I wanted to run Ollama for Development Purposes on the web, because why not?

Understanding the solution

This solution uses the following technologies:

  1. Terraform
  2. Docker
  3. Bash Scripting
  4. AWS Lambda
  5. Amazon Elastic Container Registry (ECR)

We'll be utilising the AWS Lambda Web Adapter, a component built in Rust that sits between the Lambda runtime interface and an ordinary HTTP server, which in turn allows us to run the Ollama executable / binary on AWS Lambda.
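
To make that concrete, here is a minimal sketch of the kind of Dockerfile this involves, written out via a Bash heredoc so it can sit in a deploy script. The adapter image path is the one AWS documents for the Lambda Web Adapter, but the version tag and the environment values are assumptions of mine, not the exact contents of the repository.

```bash
# Sketch only: pair the ready-built Ollama image with the AWS Lambda Web Adapter.
cat > Dockerfile <<'EOF'
# Start from the official Ollama image instead of rebuilding Ollama ourselves.
FROM ollama/ollama:latest

# Copying the adapter binary into /opt/extensions registers it as a Lambda
# extension; it proxies Lambda invocations to the HTTP server in the container.
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter

# Bind Ollama to 8080 (the adapter's default upstream port) and keep writable
# state under /tmp, the only writable path on Lambda.
ENV OLLAMA_HOST=0.0.0.0:8080 \
    OLLAMA_MODELS=/tmp/models \
    HOME=/tmp

# The base image already runs `ollama serve` as its default command.
EOF
```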

Solution

Please head to this GitHub repository and work through the README.

Understanding the solution comes down to a few different sections. The first is understanding how Ollama works. By default, Ollama exposes itself on a specific port (11434), which is neither port 80 nor port 8080. However, Ollama gives us the leeway to expose its service on a different port altogether, and we're definitely using this leeway!
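
For instance, on any machine with Ollama installed, the bind address and port can be overridden through the OLLAMA_HOST environment variable; the port below is just an illustration (it matches the Lambda Web Adapter's default of 8080):

```bash
# Ollama listens on 127.0.0.1:11434 by default; OLLAMA_HOST overrides both the
# bind address and the port.
OLLAMA_HOST=0.0.0.0:8080 ollama serve &

# Confirm the API now answers on the new port.
curl http://localhost:8080/api/tags
```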

Secondly, Ollama has its own open-source Docker image. This lets us understand how Ollama works and how it can be used as the base of another Dockerfile to achieve the results we desire. We obviously wouldn't want to rebuild the entire Ollama Docker image ourselves, so we might as well use the ready-built one.
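
With a Dockerfile like the sketch above, the image just needs to be built and pushed to ECR so Lambda can run it as a container image. The account ID, region and repository name here are placeholders, and the repository's own scripts may do this differently:

```bash
# Placeholders: substitute your own account, region and repository name.
AWS_ACCOUNT_ID=123456789012
AWS_REGION=us-east-1
REPO=ollama-on-lambda

# Authenticate Docker against the private ECR registry.
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"

# Build, tag and push the Lambda container image.
docker build -t "$REPO" .
docker tag "$REPO:latest" "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$REPO:latest"
docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$REPO:latest"
```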

Thirdly, we need to understand the methods available to invoke our AWS Lambda function: either API Gateway or Lambda Function URLs. In this scenario, we will use a Lambda Function URL, as we want the capability to stream results back to the client (in my case, I used Streamlit and the Ollama Python Library to build the client).
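
As an illustration, this is roughly what enabling streaming on a function URL looks like from the CLI. The function name, URL and model are placeholders, auth is left open purely for quick testing, and in practice the Terraform module is what wires this up:

```bash
# Create a function URL in streaming mode so tokens are flushed to the client
# as Ollama generates them. NONE auth is for quick testing only.
aws lambda create-function-url-config \
  --function-name ollama-on-lambda \
  --auth-type NONE \
  --invoke-mode RESPONSE_STREAM

# Stream a completion through the URL using Ollama's standard /api/generate
# endpoint (-N turns off curl's output buffering).
curl -N "https://<url-id>.lambda-url.us-east-1.on.aws/api/generate" \
  -d '{"model": "llama3.2", "prompt": "Say hello from Lambda!"}'
```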

Lastly, slap everything together. This will be messy and hard to manage in the beginning, but it's a good journey. In this case, set up a Terraform module with all the required services and run several rounds of initial testing to ensure everything works as expected. Then, pack it all into a "production-ready" Terraform module and prepare it for release.
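
Once the module exists, the day-to-day workflow is the usual Terraform loop; the directory name is an assumption about the repository layout rather than a guarantee:

```bash
cd terraform/       # wherever the module lives in the repository
terraform init      # download providers and module dependencies
terraform plan      # review the Lambda, ECR and IAM resources to be created
terraform apply     # create everything; the function URL should appear in the outputs
```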

Moving forward

  1. You can re-run the deployment step to pull in the latest base Ollama image (see the sketch after this list).
  2. You may extend the configuration further to accommodate any of your requirements.
  3. Anything to suggest? Please feel free to raise a PR.
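
For point 1, re-running the deployment is mostly a matter of forcing Docker to re-pull the base image and pointing the function at the new image. The flags are standard Docker / AWS CLI options, but the names are placeholders:

```bash
# --pull forces Docker to fetch the latest ollama/ollama base image before
# rebuilding. Reuses AWS_ACCOUNT_ID / AWS_REGION from the ECR step above.
docker build --pull -t ollama-on-lambda .
docker tag ollama-on-lambda:latest "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ollama-on-lambda:latest"
docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ollama-on-lambda:latest"

# Point the Lambda function at the freshly pushed image.
aws lambda update-function-code \
  --function-name ollama-on-lambda \
  --image-uri "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ollama-on-lambda:latest"
```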

Resources

GitHub - InspectorGadget/ollama-on-lambda: I wasn’t insane. I swear