👋 Hi, I’m Vaibhav Balloli, a Ph.D. student at Realaize Lab with Prof. Elizabeth Bondi-Kelly at the University of Michigan, Ann Arbor. I’m broadly interested in the intersection of sequential decision making and multi-agent systems, with applications focused on benefiting society. Previously, I was a Research Fellow at Microsoft Research India, where I worked on HAMS, Project Vasudha, and Project VeLLM. I’m a proponent of open source and actively develop and contribute to open-source software, so do check out my work on my GitHub.
Ph.D. student, 2023
University of Michigan
B.E. in Electronics and Communication Engineering, 2020
BITS Pilani, Hyderabad Campus
September 2023: One paper accepted at NeurIPS 2023!
August 2023: Joined UMich as a Ph.D. student!
July 2023: Featured in the SARC-BPHC Alumni Unplugged podcast!
July 2023: Selected for HAIST-MAIA Intro Fellowship on AI Safety.
EnCortex: a package that provides optimization and decision making to improve the sustainability of energy producers. (Currently integrated as a product within Energy & Mobility at Microsoft.)
Real-time perception requires planned resource utilization. Computational planning in real-time perception is governed by two considerations – accuracy and latency. There exist run-time decisions (e.g. choice of input resolution) that induce tradeoffs affecting performance on a given hardware, arising from intrinsic (content, e.g. scene clutter) and extrinsic (system, e.g. resource contention) characteristics. Earlier runtime execution frameworks employed rule-based decision algorithms and operated with a fixed algorithm latency budget to balance these concerns, which is sub-optimal and inflexible. We propose Chanakya, a learned approximate execution framework that naturally derives from the streaming perception paradigm, to automatically learn decisions induced by these tradeoffs instead. Chanakya is trained via novel rewards balancing accuracy and latency implicitly, without approximating either objective. Chanakya simultaneously considers intrinsic and extrinsic context, and predicts decisions in a flexible manner. Chanakya, designed with low overhead in mind, outperforms state-of-the-art static and dynamic execution policies on public datasets on both server GPUs and edge devices.
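To make the tradeoff concrete, here is a minimal sketch (not Chanakya's actual reward or policy) of a run-time controller that picks an input resolution from both intrinsic context (scene clutter) and extrinsic context (resource contention). All config names, accuracies, latencies, and sensitivity values below are made up for illustration; the streaming-style utility simply decays accuracy as a result lags behind the frame budget, standing in for rewards that couple accuracy and latency implicitly.

```python
# Hypothetical per-resolution profiles: (accuracy, latency_ms, clutter_sensitivity).
# Lower resolutions lose more accuracy in cluttered scenes.
CONFIGS = {
    "480p":  (0.62, 18.0, 0.35),
    "720p":  (0.71, 33.0, 0.20),
    "1080p": (0.78, 40.0, 0.05),
}

def streaming_utility(accuracy, latency_ms, frame_budget_ms=33.0):
    """Streaming-style score: accuracy earned on a frame decays the longer
    the result lags behind the live stream (no explicit latency budget)."""
    staleness = max(0.0, latency_ms - frame_budget_ms)
    return accuracy * (frame_budget_ms / (frame_budget_ms + staleness))

def choose_resolution(clutter, contention):
    """Pick the config with the best utility, given intrinsic clutter in [0,1]
    and extrinsic contention in [0,1] (a loaded device inflates latency)."""
    best, best_score = None, float("-inf")
    for res, (acc, lat, sens) in CONFIGS.items():
        eff_acc = acc * (1.0 - sens * clutter)   # clutter hurts low-res most
        eff_lat = lat * (1.0 + contention)       # contention slows everything
        score = streaming_utility(eff_acc, eff_lat)
        if score > best_score:
            best, best_score = res, score
    return best
```

With these toy numbers, a cluttered scene on an idle device favors high resolution, while heavy contention pushes the controller down to the cheapest config; a learned policy replaces this hand-written scoring.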
Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs, specifically focusing on generative models. Through systematic investigation and evaluation of diverse languages using popular question-answering (QA) datasets, we present novel techniques that unlock the true potential of LLMs in a polyglot landscape. Our approach encompasses three key strategies that yield remarkable improvements in multilingual proficiency. First, by meticulously optimizing prompts tailored for polyglot LLMs, we unlock their latent capabilities, resulting in substantial performance boosts across languages. Second, we introduce a new hybrid approach that synergizes GPT generation with multilingual embeddings and achieves significant multilingual performance improvement on critical tasks like QA and retrieval. Finally, to further propel the performance of polyglot LLMs, we introduce a novel learning algorithm that dynamically selects the optimal prompt strategy, LLM model, and embeddings per query. This dynamic adaptation maximizes the efficacy of LLMs across languages, outperforming the best static and random strategies. Our results show substantial advancements in multilingual understanding and generation across a diverse range of languages.
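The per-query selection of a (prompt strategy, model, embedding) combination can be framed as a bandit problem. The paper's concrete learning algorithm is not reproduced here; the following is a generic epsilon-greedy sketch over hypothetical arm names, shown only to illustrate the dynamic-selection idea.

```python
import random

# Hypothetical arms: (prompt strategy, model, embedding) combinations.
ARMS = [
    ("translate-test", "gpt", "multilingual-embed"),
    ("native-prompt",  "gpt", "multilingual-embed"),
    ("native-prompt",  "gpt", "no-embed"),
]

class EpsilonGreedySelector:
    """Epsilon-greedy bandit: explore a random arm with probability epsilon,
    otherwise exploit the arm with the highest running mean reward."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)                 # explore
        return max(self.arms, key=lambda a: self.values[a])   # exploit

    def update(self, arm, reward):
        # Incremental mean update after observing the reward (e.g. QA accuracy).
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In practice the reward would come from answer quality on each query; richer contextual bandits that condition on the query's language are a natural extension.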
Opportunistic networks provide delay-tolerant communication in the absence of network infrastructure. Video streaming is challenging for such networks since real-time feedback is not available and the network is highly distributed and disconnected. This paper proposes an algorithm to adaptively transfer video over such networks according to available network resources. Scalable Video Coding is used to ensure that video of minimal quality gets delivered; if resources allow, the algorithm transmits the video at higher quality. We present a simple implementation for Android devices as a proof of concept, along with simulation results evaluating the performance impact of variants of the algorithm.
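The core adaptation idea can be sketched as greedy layer selection (a simplified illustration, not the paper's exact algorithm): with Scalable Video Coding, the base layer is always sent so minimal quality is guaranteed, and enhancement layers are added only when the estimated capacity of a contact allows. The layer sizes below are hypothetical.

```python
# Hypothetical SVC layer sizes in KB: base layer, then enhancement layers.
LAYER_SIZES_KB = [400, 250, 350]

def layers_to_send(estimated_capacity_kb):
    """Greedily select SVC layers for one contact: always include the base
    layer (index 0), then add enhancement layers in order while they fit
    within the estimated transfer budget."""
    chosen = [0]                       # base layer is mandatory
    used = LAYER_SIZES_KB[0]
    for i, size in enumerate(LAYER_SIZES_KB[1:], start=1):
        if used + size <= estimated_capacity_kb:
            chosen.append(i)
            used += size
        else:
            break                      # layers must be stacked in order
    return chosen
```

Because enhancement layers only decode on top of lower layers, selection stops at the first layer that does not fit; a fuller design would also estimate capacity from observed contact durations.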