
Efficient AI Model Serving with KServe ModelMesh and Kubernetes Persistent Volumes

By Fractz - 25 Sep 2024

Efficiently serving AI models at scale is essential for maintaining high-performance applications in today's data-driven world. KServe's ModelMesh Serving framework, when paired with Kubernetes Persistent Volumes (PVs), offers a scalable and resource-efficient solution for deploying AI models. This post provides a step-by-step guide to configuring ModelMesh Serving to use PVs, reducing deployment latency and removing the dependency on cloud object storage.

1. Introduction

As AI models grow more complex and require frequent updates, the infrastructure supporting these models must evolve to keep pace. Traditional methods of storing model files on cloud object storage can introduce delays during deployment, particularly when fetching large model files. KServe’s ModelMesh Serving addresses these challenges by providing a scalable model serving framework that can dynamically load and unload models as needed. By leveraging Kubernetes Persistent Volumes, this solution further optimizes model serving by reducing latency and improving overall efficiency.

2. Step-by-Step Guide

This section provides a detailed guide on configuring KServe ModelMesh Serving with Kubernetes Persistent Volumes. Follow these steps to set up an efficient AI model serving environment.

2.1. Prerequisites

Before starting, ensure you have the following:

• A Kubernetes cluster (or Minikube) with admin privileges, at least 4 CPUs, and 8 GB of memory.

• kubectl and kustomize (v4.0.0+) installed.

• A “Quickstart” installation of ModelMesh Serving (a sketch follows this list if you still need to install it).
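
If you don't yet have ModelMesh Serving installed, a quickstart install from the kserve/modelmesh-serving repository looks roughly like this. The release branch below is an assumption; check the project README for the current one:


# Assumed release branch; verify against the repository's README
RELEASE="release-0.12"
git clone -b $RELEASE --depth 1 --single-branch https://github.com/kserve/modelmesh-serving.git
cd modelmesh-serving
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart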

2.2. Create a Persistent Volume Claim (PVC)

To begin, create a Persistent Volume Claim (PVC) in your Kubernetes cluster to allocate storage that ModelMesh can use for model files. Apply the following manifest using kubectl:


kubectl apply -f - <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "my-models-pvc"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
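
The ReadWriteMany access mode lets multiple serving-runtime pods mount the volume at once; whether a storage class supports it depends on its provisioner, so it is worth checking what your cluster offers:


# List available storage classes and their provisioners
kubectl get storageclass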

Verify that the PVC is created and bound to a persistent volume:


kubectl get pvc

# Output example:
# NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
# my-models-pvc   Bound    pvc-783726ab-9fd3-47f3-8c7d-bf7822d6d7f8   15Gi       RWX            retain-file-gold   2m
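
The capacity and storage class shown will vary by cluster; the claim only guarantees at least the 1Gi requested, which is why the bound volume above reports 15Gi. To inspect the backing persistent volume directly:


# Look up the PV that the claim is bound to
kubectl get pv "$(kubectl get pvc my-models-pvc -o jsonpath='{.spec.volumeName}')"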

2.3. Create a Pod to Access the PVC

Next, create a pod that mounts the PVC as a volume; you will use it to upload your model files to the persistent volume. Apply the following manifest using kubectl:


kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: "pvc-access"
spec:
  containers:
    - name: main
      image: ubuntu
      command: ["/bin/sh", "-ec", "sleep 10000"]
      volumeMounts:
        - name: "my-pvc"
          mountPath: "/mnt/models"
  volumes:
    - name: "my-pvc"
      persistentVolumeClaim:
        claimName: "my-models-pvc"
EOF

Confirm that the pod is running:


kubectl get pods | grep -E 'STATUS|pvc-access'

# Output example:
# NAME                 READY   STATUS    RESTARTS   AGE
# pvc-access           1/1     Running   0          2m12s
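
In scripts, you can block until the pod is ready instead of polling manually (the 60-second timeout is an arbitrary choice):


kubectl wait --for=condition=Ready pod/pvc-access --timeout=60s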

2.4. Store the Model on the Persistent Volume

Now, add your AI model to the persistent volume. In this tutorial, we use the MNIST handwritten digit recognition model trained with scikit-learn.

First, download the model file:


curl -sOL https://github.com/kserve/modelmesh-minio-examples/raw/main/sklearn/mnist-svm.joblib

Next, copy the model file onto the pvc-access pod:


kubectl cp mnist-svm.joblib pvc-access:/mnt/models/
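
Models that ship as a directory rather than a single file can be copied the same way; a hypothetical multi-file layout (my-tf-model is a placeholder name, not part of this tutorial):


# Hypothetical: one subdirectory per model on the shared volume
kubectl exec pvc-access -- mkdir -p /mnt/models/my-tf-model
kubectl cp my-tf-model/ pvc-access:/mnt/models/my-tf-model/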

Verify that the model has been successfully uploaded to the persistent volume:


kubectl exec -it pvc-access -- ls -alr /mnt/models/

# Expected output:
# total 356
# -rw-r--r-- 1    501 staff      344917 Sep 17 09:20 mnist-svm.joblib
# drwxr-xr-x 3 nobody 4294967294   4096 Sep 17 09:20 ..
# drwxr-xr-x 2 nobody 4294967294   4096 Sep 17 09:20 .

2.5. Configure ModelMesh Serving to Use the Persistent Volume Claim

To let ModelMesh Serving dynamically mount arbitrary PVCs, set the allowAnyPVC flag to true in its configuration. Create a ConfigMap with this setting and apply it using kubectl:


kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    allowAnyPVC: true
EOF
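
Note that this ConfigMap has to live in the namespace where the ModelMesh Serving controller runs (modelmesh-serving in a quickstart install). You can confirm the setting is in place with:


# Add -n modelmesh-serving if that is not your current namespace
kubectl get configmap model-serving-config -o yaml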

2.6. Deploy a New Inference Service

With the PVC configured, you can now deploy an inference service that references the persistent volume. Deploy a service backed by the mnist-svm.joblib model by applying the following InferenceService manifest using kubectl:


kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-mnist
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: mnist-svm.joblib
EOF

After a few seconds, your inference service should be ready:


kubectl get isvc

# Output example:
# NAME            URL                                               READY   PREV   LATEST   AGE
# sklearn-mnist   grpc://modelmesh-serving.modelmesh-serving:8033   True                    35s
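
If the service does not become ready, describing it usually surfaces the cause in its status conditions and events:


kubectl describe isvc sklearn-mnist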

2.7. Run an Inference Request

To test the setup, run an inference request against the deployed model. First, set up port-forwarding:


kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 &
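
If ModelMesh Serving runs in a namespace other than your current one (modelmesh-serving is the quickstart default), target it explicitly:


kubectl port-forward --address 0.0.0.0 -n modelmesh-serving service/modelmesh-serving 8008 &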

Next, use curl to send an inference request. The data array holds the 64 grayscale values of a flattened 8×8 image of the handwritten digit to be classified:


MODEL_NAME="sklearn-mnist"

curl -X POST -k "http://localhost:8008/v2/models/${MODEL_NAME}/infer" -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 3.0, 10.0, 15.0, 16.0, 2.0, 0.0, 0.0, 2.0, 14.0, 16.0, 11.0, 15.0, 7.0, 0.0, 0.0, 7.0, 16.0, 3.0, 5.0, 15.0, 4.0, 0.0, 0.0, 4.0, 14.0, 10.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 3.0, 13.0, 15.0, 12.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 15.0, 15.0, 5.0, 0.0, 0.0, 0.0, 0.0, 15.0, 15.0, 15.0, 6.0, 0.0, 0.0, 0.0, 0.0, 10.0, 12.0, 11.0, 0.0, 0.0]}]}'

The JSON response should look like this, indicating that the model correctly identified the digit "7":


{
  "model_name": "sklearn-mnist__isvc-2d5cba6382",
  "outputs": [
    {"name": "predict", "datatype": "INT64", "shape": [1], "data": [7]}
  ]
}
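
Once the model is being served from the persistent volume, the helper pod from step 2.3 is no longer needed and can be removed:


kubectl delete pod pvc-access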

3. Conclusion

By configuring KServe ModelMesh Serving to utilize Kubernetes Persistent Volumes, you can significantly improve the efficiency and performance of your AI model deployments. This approach not only reduces the latency associated with fetching models from remote storage but also enhances the overall scalability of your AI infrastructure.

