Handling AI Scaling and Reducing Costs

    Article by Gunther Cox
    Posted May 9, 2025

    When it comes to the programming adage “Speed, Quality, Price: Choose Two”, I tend to prefer low price. Perhaps it’s the influence that the Zen of Python has had on me, or perhaps it’s simply the tendency of software to outpace hardware when it comes to improving efficiency.

    Now is better than never. Although never is often better than right now.

    ~ Zen of Python

    Regardless, the case I’ll make here is for a method of handling simple AI tasks at scale that favors queued results which complete eventually over immediate results that require adding resources each time demand grows.

    Scaling using a queue

    The idea here is to handle tasks submitted by users by adding them to a queue; a task executor then takes items from the queue, processing one task at a time. This makes the queue itself the primary limit on how many tasks can be handled, and adding storage to the queue is less expensive than adding additional task executors.
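
    Before reaching for any specific tooling, the pattern itself can be sketched with nothing but the Python standard library. This is purely illustrative: the one-second sleep stands in for whatever slow AI task is being queued, and none of the names below appear in the example that follows.

    # queue_sketch.py (illustrative only)

    import queue
    import threading
    import time

    task_queue = queue.Queue()  # growing the queue is cheap compared to adding executors

    def task_executor():
        """A single executor that processes one queued task at a time."""
        while True:
            task_id, payload = task_queue.get()
            time.sleep(1)  # stand-in for a slow AI task
            print(f'Task {task_id} complete: {payload}')
            task_queue.task_done()

    # One executor serves every submitter; submitting never blocks on processing.
    threading.Thread(target=task_executor, daemon=True).start()

    for task_id, payload in enumerate(['first request', 'second request', 'third request']):
        task_queue.put((task_id, payload))  # each submission returns immediately

    task_queue.join()  # results complete eventually, in the order they were queued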

    This highlights the contrast between the queued design and physically scaling by adding an additional task executor for each additional user. Of course, the users in the executor-per-user example benefit from getting immediate results, as they don’t have to wait their turn in a queue. One way we can begin to remedy this is to scale our queueing infrastructure:

    Scaling horizontally

    An example using RabbitMQ and Celery

    Diving into a practical example, we can create a message queue using RabbitMQ and a task executor using Celery.

    The following files will be needed:

    # Dockerfile
    
    FROM python:3.11-slim-bookworm
    
    COPY ./requirements.txt /code/requirements.txt
    
    RUN pip install -r /code/requirements.txt
    RUN python -m spacy download en_core_web_sm
    
    COPY ./tasks.py /code
    
    WORKDIR /code
    
    CMD ["celery", "-A", "tasks", "worker", "--loglevel=INFO"]
    
    
    # docker-compose.yml
    
    services:
    
      rabbitmq:
        image: rabbitmq:4.1
        ports:
          - "5672:5672"
    
      celery:
        build: .
        volumes:
          - ./data:/data
    
    
    # requirements.txt
    
    celery==5.5.2
    chatterbot[dev]==1.2.6
    
    
    # tasks.py
    
    from celery import Celery
    from chatterbot import ChatBot
    
    
    # Connect to the RabbitMQ broker service defined in docker-compose.yml
    app = Celery('tasks', broker='pyamqp://guest@rabbitmq//')
    
    
    @app.task
    def solve(message_id: int, message: str):
        """
        Create a simple chatbot that can solve math problems.
        """
        chatbot = ChatBot(
            'Example Bot',
            logic_adapters=[
                'chatterbot.logic.MathematicalEvaluation'
            ]
        )
    
        response = chatbot.get_response(message)
    
        # Save the response to a file in the data directory
        with open(f'/data/{message_id}-response.txt', 'w+') as f:
            f.write(response.text)
    
        return response
    
    

    Running the example

    Next, use Docker Compose to start the services.

    sudo docker compose up -d
    

    You can check to make sure the expected services are running using the ps subcommand. You should see output similar to the following:

    sudo docker compose ps
    
    NAME                           IMAGE                      COMMAND                  SERVICE    CREATED              STATUS              PORTS
    chatterbot-celery-celery-1     chatterbot-celery-celery   "celery -A tasks wor…"   celery     7 seconds ago        Up 6 seconds        
    chatterbot-celery-rabbitmq-1   rabbitmq:4.1               "docker-entrypoint.s…"   rabbitmq   About a minute ago   Up About a minute   4369/tcp, 5671/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:5672->5672/tcp, [::]:5672->5672/tcp
    

    Next, we can simulate a user submitting a task to the queue. Celery will then detect this task and start running it.

    sudo docker compose exec celery python
    
    Python 3.11.12 (main, Apr 28 2025, 22:10:55) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tasks import solve
    >>> solve.delay(1, 'What is 50 divided by 2?')
    <AsyncResult: 984463f9-603b-4242-b3f5-8d626dc4f7a6>
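
    As an aside, delay() is a shortcut for Celery’s apply_async(), which also accepts scheduling options. As a small illustration (the message id, question, and 30-second countdown below are my own additions, not part of the session above), a task can be queued immediately but held back from the workers for a while, which fits the results-that-complete-eventually philosophy:

    >>> # apply_async() takes the task arguments explicitly, plus options such
    >>> # as countdown: the number of seconds to hold the task back before a
    >>> # worker is allowed to execute it
    >>> solve.apply_async(args=[2, 'What is 100 minus 42?'], countdown=30)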
    

    Now if we inspect the contents of the data directory, we should see a new file named 1-response.txt containing the solved equation:

    50 divided by 2 = 25
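
    Because no result backend is configured in tasks.py, this response file is effectively how results are delivered, so a client can collect an answer by polling for the file. The following is a minimal sketch rather than part of the example itself: the wait_for_result helper and its timeout values are mine, and it assumes it is run from the project directory on the host, where ./data is the mount of the worker’s /data directory.

    # wait_for_result.py (illustrative only)

    import time
    from pathlib import Path


    def wait_for_result(message_id: int, timeout: float = 60.0, interval: float = 1.0) -> str:
        """
        Poll the shared data directory until the worker writes the response file.
        """
        result_file = Path('./data') / f'{message_id}-response.txt'
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if result_file.exists():
                return result_file.read_text()
            time.sleep(interval)
        raise TimeoutError(f'No response for message {message_id} after {timeout} seconds')


    print(wait_for_result(1))  # prints the contents of 1-response.txt once the task finishes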
    

    Scaling horizontally

    We can horizontally scale the task executor (Celery) layer of our infrastructure. The following command will create a second instance of our celery container, so that two are running in total.

    sudo docker compose up -d --scale celery=2
    
    [+] Running 3/3
     ✔ Container chatterbot-celery-rabbitmq-1  Running    0.0s 
     ✔ Container chatterbot-celery-celery-1    Running    0.0s 
     ✔ Container chatterbot-celery-celery-2    Started 
    

    Once the scaling is complete, both Celery workers will run in parallel, each able to take a task out of the queue for processing.
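
    One way to watch the workers share the load is to queue several tasks from a Python shell inside one of the celery containers and follow both workers’ logs with docker compose logs -f celery. The batch of questions and the starting message id below are my own illustration:

    >>> from tasks import solve
    >>> # Queue a batch of tasks; the broker hands each one to whichever worker
    >>> # is free, and every answer lands in its own file under /data
    >>> questions = [
    ...     'What is 7 plus 6?',
    ...     'What is 144 divided by 12?',
    ...     'What is 3 times 4?',
    ... ]
    >>> for message_id, question in enumerate(questions, start=3):
    ...     solve.delay(message_id, question)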

    Final notes and musings

    There is a time and a place for the kind of delayed processing described here. Interactive actions such as conversing with an AI likely require more real-time strategies for returning responses promptly. Tasks better suited to this strategy include long-running work such as AI image generation or analysis, or generating summaries of relatively static data.

    As noted in my Earth Day post from a few weeks back, AI models can consume a large amount of power to generate output. I think an interesting experiment to consider might be to set up a queue, and use a Raspberry Pi connected to a solar panel to process tasks. Perhaps in some use cases the initial investment in hardware could provide some level of ROI compared to purchasing cloud services.

    If you do happen to be using a Raspberry Pi running on solar power, one relevant detail is this example’s use of a RabbitMQ container over an alternative such as Redis: per the Celery docs, Redis is more susceptible to data loss in the event of an abrupt power failure.


    If you found this article useful and want to request similar or related content, feel free to open a ticket in this website's issue tracker on GitHub. Please use the "blog post topic" tag when doing so.

    © 2025 Gunther Cox