Deploying image classification (mxnet) using Mxnet Model Server (mms)
Summary
In this post, i will summarize steps required when deploying a simple image classification (mxnet) using Mxnet Model Server (mms).
Detail
Like other deeplearning framework, mxnet also provides tons a pretrained model and tools cover nearly all of machine learning task like image classification, object detection, segmentation, … These pretrained models and tools are inside the gluon-cv package.
For similarity, i’ll use the pretrained Resnet50 model. For other model please see model_zoo
Run the image classification service
- Download source code
$ git clone https://github.com/gachiemchiep/mms_example
- Re-create anaconda environment and activate it
$ cd mmx_example
# This will create environment called DL with python=3.7
$ conda env create -f environment.yml
# Activate anaconda
$ conda activate DL
- Create mms model archive file
$ mkdir mms_example/mms_example/services
$ cd mms_example/mms_example
$ model-archiver --model-name mxnet_resnet50 --model-path mxnet_resnet50 --handler img_classifier_service:handle --export-path ../model-archives/ --force
# A new file called mxnet_resnet50.mar should be created inside model-archives directory
- Start mms server
$ cd mms_example/mms_example
$ mxnet-model-server --start
- Test
$ cd mms_example/mms_example/resources
$ curl -X POST http://127.0.0.1:8080/predictions/mxnet_resnet50 -T chair.jpg
- Output should be as follow
[
{
"rocking chair": "0.4921491"
},
{
"folding chair": "0.08000506"
},
{
"wool": "0.027606748"
},
{
"cowboy hat": "0.024664406"
},
{
"velvet": "0.020736441"
}
]
- Shutdown the model server
mxnet-model-server --stop
Source code explain
At this state, our source will be as follow
mms_example/
├── environment.yml
├── LICENSE
├── mms_example
│ ├── common
│ ├── config.properties
│ ├── __init__.py
│ ├── logs : log directory
│ │ ├── access_log.log
│ │ ├── mms_log.log
│ │ ├── mms_metrics.log
│ │ ├── model_log.log
│ │ └── model_metrics.log
│ ├── model-archives
│ │ └── mxnet_resnet50.mar : model archieve file which we created
│ ├── resources
│ ├── services
│ │ ├── __init__.py
│ │ └── mxnet_resnet50
│ │ ├── img_classifier_service.py : service file
│ │ └── __init__.py
│ └── tests
└── README.md
The most important file is img_classifier_service.py, this include all of our logic from loading model, reading uploader data, inference, …
- Load model
class ResNet50Classifier():
def __init__(self):
self.ctx = None
self.net = None
self.initialized = False
def initialize(self, context):
"""
Load the model and mapping file to perform infernece.
:param context: model relevant worker information
:return:
"""
properties = context.system_properties
gpu_id = properties.get("gpu_id")
self.ctx = mx.cpu() if gpu_id is None else mx.gpu(gpu_id)
self.net = gluoncv.model_zoo.get_model("ResNet50_v1d", pretrained=True, ctx=self.ctx)
self.initialized = True
- Read uploaded data and convert to mxnet’s NDArray
def preprocess(self, data):
"""
Scales, crops, and normalizes a PIL image for a mxnet model,
:param data:
:return: ndarray
"""
img_arrs = []
for idx in range(len(data)):
img = data[idx].get("data")
if img is None:
img = data[idx].get("body")
if img is None:
img = data[idx].get("data")
if img is None or len(img) == 0:
self.error = "Empty image input"
return None
img_arr = mx.image.imdecode(img)
img_arrs.append(img_arr)
img_arrs = gluoncv.data.transforms.presets.imagenet.transform_eval(img_arrs)
return img_arrs
- Do the inferencing
def inference(self, img, topk=5):
"""
:param img:
:param topk:
:return:
"""
pred = self.net(img.as_in_context(self.ctx))
# map predicted values to probability by softmax
probs = mx.nd.softmax(pred)[0].asnumpy()
# find the 5 class indices with the highest score
inds = mx.nd.topk(pred, k=topk)[0].astype('int').asnumpy().tolist()
rets = []
for i in range(topk):
ret = dict()
ret[self.net.classes[inds[i]]] = str(probs[inds[i]])
rets.append(ret)
return [rets]
- The final logic
_service = ResNet50Classifier()
def handle(data, context):
if not _service.initialized:
_service.initialize(context)
if data is None:
return None
data = _service.preprocess(data)
data = _service.inference(data)
data = _service.postprocess(data)
return data
In the last section, we used the following command to create model-archive
$ model-archiver --model-name mxnet_resnet50 --model-path mxnet_resnet50 --handler img_classifier_service:handle --export-path ../model-archives/ --force
As we can see, the model-archive will use handle method as logic. This method is defined inside the img_classifier_service.py file.
Benchmark result
My computer has the following specs :
cpu:
AMD Ryzen 5 2600 Six-Core Processor
graphics card:
nVidia GP106 [GeForce GTX 1060 6GB]
memory:
16GB
My computer only have GTX-1060 GPU card, the benchmarking result is as follow :
# benchmarking command
$ ab -k -l -n 10000 -c 100 -T "image/jpeg" -p kitten.jpg http://127.0.0.1:8080/predictions/mxnet_resnet50
# This will send 10000 request with 100 concurrency. So please be patient. i would take around 5 minutes
# GPU usage: only 673 MB
(DL) gachiemchiep@gachiemchiep:services$ nvidia-smi
Thu Jun 13 20:55:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... On | 00000000:1C:00.0 On | N/A |
| 40% 59C P2 46W / 120W | 1702MiB / 6075MiB | 45% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1933 G /usr/lib/xorg/Xorg 699MiB |
| 0 3685 G ...quest-channel-token=3700236306674113463 214MiB |
| 0 7697 G ...p/pycharm-educational/12/jre64/bin/java 2MiB |
| 0 8227 C ...chiep/opt/miniconda2/envs/DL/bin/python 673MiB |
| 0 22775 G ...-token=CCBE267BA54B1DB7F35623396BB493CD 43MiB |
# Benchmark result
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /predictions/mxnet_resnet50
Document Length: Variable
Concurrency Level: 100
Time taken for tests: 242.090 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 4320000 bytes
Total body sent: 1111520000
HTML transferred: 1970000 bytes
Requests per second: 41.31 [#/sec] (mean)
Time per request: 2420.901 [ms] (mean)
Time per request: 24.209 [ms] (mean, across all concurrent requests)
Transfer rate: 17.43 [Kbytes/sec] received
4483.74 kb/s sent
4501.16 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.6 0 6
Processing: 53 2409 154.0 2417 2679
Waiting: 53 2409 154.0 2417 2679
Total: 58 2409 153.6 2417 2679
Percentage of the requests served within a certain time (ms)
50% 2417
66% 2445
75% 2462
80% 2477
90% 2519
95% 2572
98% 2618
99% 2636
100% 2679 (longest request)
- Note : by default configuration, mms only queue 100 jobs. So if you need more concurrency, increase the value of job_queue_size inside config.properties. With a value of job_queue_size=1000 , i can increase the -c to 200
# benchmarking command
$ ab -k -l -n 10000 -c 200 -T "image/jpeg" -p kitten.jpg http://127.0.0.1:8080/predictions/mxnet_resnet50
# Benchmark result
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /predictions/mxnet_resnet50
Document Length: Variable
Concurrency Level: 200
Time taken for tests: 241.262 seconds
Complete requests: 10000
Failed requests: 0
Keep-Alive requests: 10000
Total transferred: 4320000 bytes
Total body sent: 1111520000
HTML transferred: 1970000 bytes
Requests per second: 41.45 [#/sec] (mean)
Time per request: 4825.240 [ms] (mean)
Time per request: 24.126 [ms] (mean, across all concurrent requests)
Transfer rate: 17.49 [Kbytes/sec] received
4499.13 kb/s sent
4516.61 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 7 83.2 0 1023
Processing: 2037 4771 198.9 4794 6032
Waiting: 2037 4771 198.9 4794 6031
Total: 2043 4778 230.1 4794 7054
Percentage of the requests served within a certain time (ms)
50% 4794
66% 4830
75% 4851
80% 4864
90% 4903
95% 4936
98% 4965
99% 5006
100% 7054 (longest request)
Quite a good benchmark resulti guess.
End
Leave a comment