Please use this identifier to cite or link to this item: http://ir.lib.seu.ac.lk/handle/123456789/7575
Full metadata record
DC Field | Value | Language
dc.contributor.author | Dilushan, G. | -
dc.contributor.author | Riza, M. S. I. | -
dc.contributor.author | Jananie, J. | -
dc.date.accessioned | 2025-06-01T08:30:39Z | -
dc.date.available | 2025-06-01T08:30:39Z | -
dc.date.issued | 2024-11-06 | -
dc.identifier.citation | Conference Proceedings of the 13th Annual Science Research Session – 2024, "Empowering Innovations for Sustainable Development Through Scientific Research", November 6th 2024. Faculty of Applied Sciences, South Eastern University of Sri Lanka, Sammanthurai. pp. 32. | en_US
dc.identifier.isbn | 978-955-627-029-7 | -
dc.identifier.uri | http://ir.lib.seu.ac.lk/handle/123456789/7575 | -
dc.description.abstract | Deploying machine learning inference as batch workloads demands careful resource utilization if performance is not to degrade. On the end-user side, when cost considerations drive the choice of model inference on CPUs, setting up batching systematically becomes a challenge: determining the batch size requires analysing many settings across different resource configurations, the decision space grows large, and domain expertise may be needed to satisfy the users' cost and performance requirements. Inference performance depends mainly on the computation time of the given neural network, the task processing density (p), and the compute power of the nodes (c). An experiment comparing the estimated end-to-end inference time with measured times for AlexNet and ResNet50, over a given configuration and different batch sizes, yields different coefficients for the batch sizes suggested by the profiler we designed and implemented. The inference-time estimation formulas used by other researchers, together with our experimental results, motivated us to propose a representation for placing the right batch size on the right node: flops(NN) * p / c. The remaining challenge is how to manipulate batching through this representation efficiently without degrading performance. As a further contribution, we design an optimizer that minimizes end-to-end inference time with respect to the batch size. This research focuses on end users of cloud platforms who obtain pay-per-use configurations for their inference workloads, and the proposed representation and optimization technique could be extended to such platforms. While other batching techniques concentrate mainly on scheduling mechanisms, our framework encourages future researchers to analyse the computation and the configurations in depth. The proposed system therefore provides a configuration-oriented batching mechanism and offers mitigation techniques to improve inference performance. | en_US
dc.language.iso | en_US | en_US
dc.publisher | Faculty of Applied Sciences, South Eastern University of Sri Lanka, Sammanthurai. | en_US
dc.subject | Batch scheduling | en_US
dc.subject | CPU clusters | en_US
dc.subject | Deep learning inference | en_US
dc.subject | Quality of service (QoS) | en_US
dc.subject | Resource utilization | en_US
dc.title | QoS aware automated batch inference | en_US
dc.type | Article | en_US
Appears in Collections: 13th Annual Science Research Session

Files in This Item:
File | Description | Size | Format
QoS AWARE AUTOMATED BATCH INFERENCE.pdf |  | 9.06 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
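
The abstract above proposes placing the right batch size on the right node via the quantity flops(NN) * p / c, plus an optimizer that minimizes end-to-end inference time over the batch size under QoS requirements. The following minimal Python sketch only illustrates that idea under stated assumptions: the Node fields, the fixed per-batch overhead, the candidate batch sizes, and the QoS deadline are hypothetical and are not taken from the paper's profiler or optimizer.

# Illustrative sketch (not the authors' implementation): estimate end-to-end
# inference time from flops(NN) * p / c plus an assumed per-batch overhead,
# then pick the batch-size/node pair with the lowest per-request latency
# among configurations whose batch time respects a QoS deadline.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compute_power: float   # c: effective FLOP/s of the CPU node (assumed unit)
    batch_overhead: float  # assumed fixed per-batch setup cost, in seconds

def estimate_batch_time(flops_nn: float, batch_size: int,
                        processing_density: float, node: Node) -> float:
    """End-to-end time for one batch: overhead + batch_size * flops(NN) * p / c."""
    return node.batch_overhead + batch_size * flops_nn * processing_density / node.compute_power

def choose_batch_and_node(flops_nn: float, processing_density: float,
                          nodes: list[Node], candidate_batch_sizes: list[int],
                          qos_deadline: float):
    """Return the (batch_size, node) pair minimizing per-request time
    while keeping each batch within the QoS deadline."""
    best = None  # (per_request_time, batch_size, node)
    for node in nodes:
        for b in candidate_batch_sizes:
            batch_time = estimate_batch_time(flops_nn, b, processing_density, node)
            if batch_time > qos_deadline:
                continue  # this configuration violates the QoS requirement
            per_request = batch_time / b
            if best is None or per_request < best[0]:
                best = (per_request, b, node)
    if best is None:
        raise ValueError("No configuration satisfies the QoS deadline")
    return best[1], best[2]

if __name__ == "__main__":
    # Hypothetical numbers: an AlexNet-scale model (~0.7 GFLOPs per image) on two CPU nodes.
    nodes = [Node("cpu-node-1", 2.0e11, 0.05), Node("cpu-node-2", 3.5e11, 0.08)]
    b, n = choose_batch_and_node(flops_nn=7.0e8, processing_density=1.2,
                                 nodes=nodes, candidate_batch_sizes=[1, 4, 8, 16, 32],
                                 qos_deadline=0.5)
    print(f"Suggested batch size {b} on {n.name}")

Under this assumed linear cost model, larger batches amortize the per-batch overhead, so it is the QoS deadline that bounds the chosen batch size; the paper's profiler instead derives per-batch-size coefficients from measured AlexNet and ResNet50 runs.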