Contact our team to test out this image for free. Please also indicate any other images you would like to evaluate.
Chainguard Containers are regularly updated, secure-by-default container images.
For those with access, this container image is available on cgr.dev:
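For example, with Docker:

```sh
docker pull cgr.dev/ORGANIZATION/text-generation-inference:latest
```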
Be sure to replace the ORGANIZATION placeholder with the name used for your organization's private repository within the Chainguard Registry.
The text-generation-inference image is based on Hugging Face's Text Generation Inference (TGI) toolkit. This Chainguard image provides the same functionality as the upstream TGI container with the following key differences:
These differences provide enhanced security and maintainability while preserving full compatibility with TGI's production-ready capabilities for serving Large Language Models.
To use this image effectively, you should have:
For more information on TGI capabilities and requirements, refer to the official Hugging Face TGI documentation.
The text-generation-inference image provides a production-ready server for deploying Large Language Models. The following examples demonstrate common usage patterns.
Start by checking the available options:
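A minimal sketch, assuming the image keeps upstream TGI's text-generation-launcher as its entrypoint:

```sh
# Print the launcher's command-line options
docker run --rm cgr.dev/ORGANIZATION/text-generation-inference --help
```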
Verify the installed version:
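Under the same entrypoint assumption, the launcher's --version flag reports the installed TGI release:

```sh
# Print the installed TGI version
docker run --rm cgr.dev/ORGANIZATION/text-generation-inference --version
```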
To start a TGI server with a Hugging Face model, run:
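A sketch of such a command; the model ID is illustrative, and any TGI-supported model can be substituted. Mounting /data follows the upstream image's model-cache convention:

```sh
docker run -d \
  -p 8080:80 \
  -v "$PWD/data:/data" \
  cgr.dev/ORGANIZATION/text-generation-inference \
  --model-id HuggingFaceH4/zephyr-7b-beta
```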
This command starts the server in the background, maps port 8080 on your host to port 80 in the container, and mounts a local data directory for model caching.
Once the server is running, you can interact with it through multiple API endpoints. TGI provides OpenAI-compatible endpoints for easy integration with existing tools:
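For example, a chat completion request with curl. The "model" field is part of the OpenAI request schema, but TGI serves whichever model it was launched with, so "tgi" works as a placeholder:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tgi",
    "messages": [{"role": "user", "content": "What is deep learning?"}],
    "max_tokens": 100
  }'
```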
The server responds with a JSON object containing the generated text. For streaming responses, set "stream": true in the request.
You can interact with the server using the Hugging Face huggingface_hub library:
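A sketch of a client script, saved here as tgi_client.py (the filename is arbitrary), using huggingface_hub's InferenceClient pointed at the local server:

```python
from huggingface_hub import InferenceClient

# Point the client at the locally running TGI server
client = InferenceClient(base_url="http://localhost:8080")

# Send an OpenAI-style chat completion request
response = client.chat_completion(
    messages=[{"role": "user", "content": "What is deep learning?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```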
Run the Python script:
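Using the filename assumed above:

```sh
python3 tgi_client.py
```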
For production workloads requiring GPU acceleration, use the --gpus flag:
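For example (the model ID and shard count are illustrative; --shm-size follows upstream's recommendation for NCCL communication when sharding):

```sh
docker run -d --gpus all --shm-size 1g \
  -p 8080:80 \
  -v "$PWD/data:/data" \
  cgr.dev/ORGANIZATION/text-generation-inference \
  --model-id HuggingFaceH4/zephyr-7b-beta \
  --num-shard 2
```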
This distributes the model across multiple GPUs using tensor parallelism for improved performance with larger models.
The text-generation-inference image can be configured through command-line arguments and environment variables. This section demonstrates a common production configuration for serving a model with specific resource constraints.
Key configuration options include:
- --model-id: Hugging Face Hub model identifier (e.g., meta-llama/Llama-2-7b-chat-hf)
- --num-shard: Number of GPU shards for tensor parallelism (default: 1)
- --port: Server listening port (default: 80)
- --max-concurrent-requests: Maximum number of concurrent requests (default: 128)
- --max-input-length: Maximum input token length (default: 1024)
- --max-total-tokens: Maximum total tokens including input and output (default: 2048)
- --max-batch-prefill-tokens: Maximum tokens for prefill batching (default: 4096)
- --max-batch-total-tokens: Maximum tokens for total batch (default: 16384)

For a production deployment serving a Llama 2 model with custom resource limits:
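A sketch under illustrative assumptions: the token limits are chosen to fit Llama 2's 4096-token context, and because meta-llama models are gated, a Hugging Face access token is forwarded to the container:

```sh
docker run -d --gpus all --shm-size 1g \
  -p 8080:80 \
  -v "$PWD/models:/data" \
  -e HF_TOKEN="$HF_TOKEN" \
  cgr.dev/ORGANIZATION/text-generation-inference \
  --model-id meta-llama/Llama-2-7b-chat-hf \
  --num-shard 2 \
  --max-concurrent-requests 64 \
  --max-input-length 3072 \
  --max-total-tokens 4096
```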
This configuration distributes the model across 2 GPUs, limits concurrent requests to 64, and sets appropriate token limits for handling longer conversations. The model cache is persisted to the local models directory for faster subsequent startups.
Alternatively, you can configure TGI using environment variables:
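For example, mirroring the flags above; upstream's launcher reads most of its options from same-named environment variables such as MODEL_ID and NUM_SHARD:

```sh
docker run -d --gpus all --shm-size 1g \
  -p 8080:80 \
  -v "$PWD/models:/data" \
  -e MODEL_ID=meta-llama/Llama-2-7b-chat-hf \
  -e NUM_SHARD=2 \
  -e MAX_CONCURRENT_REQUESTS=64 \
  -e MAX_INPUT_LENGTH=3072 \
  -e MAX_TOTAL_TOKENS=4096 \
  cgr.dev/ORGANIZATION/text-generation-inference
```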
This approach is useful for container orchestration platforms where environment variables are easier to manage than command-line arguments.
For more information on working with Text Generation Inference and Large Language Models:
Chainguard's free tier of Starter container images is built with Wolfi, our minimal Linux undistro.
All other Chainguard Containers are built with Chainguard OS, Chainguard's minimal Linux operating system designed to produce container images that meet the requirements of a more secure software supply chain.
The main features of Chainguard Containers include:

- Minimal design, with no unnecessary software bloat
- Frequent rebuilds to ensure images are up to date and contain available security patches
- High-quality build-time SBOMs (software bills of materials) attesting the provenance of all artifacts
- Verifiable signatures provided by Sigstore
For cases where you need container images with shells and package managers to build or debug, most Chainguard Containers come paired with a development, or -dev, variant.
In all other cases, including Chainguard Containers tagged as :latest or with a specific version number, the container images include only an open-source application and its runtime dependencies. These minimal container images typically do not contain a shell or package manager.
Although the -dev container image variants have similar security features as their more minimal versions, they include additional software that is typically not necessary in production environments. We recommend using multi-stage builds to copy artifacts from the -dev variant into a more minimal production image.
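A minimal sketch of this pattern; the tags follow Chainguard's usual :latest / :latest-dev convention, and the copied artifact is a stand-in for whatever your build actually produces:

```Dockerfile
# Build stage: the -dev variant includes a shell and package manager
FROM cgr.dev/ORGANIZATION/text-generation-inference:latest-dev AS builder
RUN echo "build or fetch artifacts here" > /tmp/artifact

# Production stage: copy only the needed artifacts into the minimal image
FROM cgr.dev/ORGANIZATION/text-generation-inference:latest
COPY --from=builder /tmp/artifact /tmp/artifact
```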
To improve security, Chainguard Containers include only essential dependencies. Need more packages? Chainguard customers can use Custom Assembly to add packages, either through the Console, chainctl, or API.
To use Custom Assembly in the Chainguard Console: navigate to the image you'd like to customize in your Organization's list of images, and click on the Customize image button at the top of the page.
Refer to our Chainguard Containers documentation on Chainguard Academy. Chainguard also offers VMs and Libraries — contact us for access.
This software listing is packaged by Chainguard. The trademarks set forth in this offering are owned by their respective companies, and use of them does not imply any affiliation, sponsorship, or endorsement by such companies.
Chainguard container images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" tag of this image:
Apache-2.0
BSD-2-Clause
BSD-3-Clause
CC-BY-4.0
FTL
GCC-exception-3.1
GPL-2.0
For a complete list of licenses, please refer to this image's SBOM.