Georgia Tech & Google Propose a Novel Discrete Variational Autoencoder for Automatically Improving Code Efficiency


Improved engine and body designs have significantly boosted the fuel efficiency of cars sold in the USA over the last 20 years, and an Environmental Protection Agency (EPA) study shows that a driver's habits now play a critical role in preserving this advantage: sudden stops and starts alone can reduce fuel efficiency by up to 33 percent. A similar phenomenon occurs in software development, where recent hardware-based performance gains can be undermined by inefficient code.

Optimizing the computational efficiency of code is challenging — even for experts — as it requires a deep understanding of computational complexity and the underlying hardware.

In the new paper Learning to Improve Code Efficiency, a research team from the Georgia Institute of Technology and Google Research presents a novel discrete generative latent-variable model designed to help programmers identify more computationally efficient code variants, taking a step toward automating the process of code performance optimization.

The team summarizes their main contributions as follows:

  1. We frame code efficiency optimization as a generative problem.
  2. Using the Google Code Jam competitive programming dataset (Code-Jam), we analyze the distribution and characteristics of high-performance solutions.
  3. We propose a novel discrete generative latent-variable model of program edits to model a distribution of fast programs, conditioned on their slower counterparts.
  4. We qualitatively demonstrate that the learned discrete latent variables represent different edits, and that the edits assigned to a given latent variable are generally consistent with one another.

The team frames the code efficiency optimization task as a sequence-to-sequence problem: given an input code sequence, the model outputs a more efficient version of that code. The goal of this work is thus to develop a model that can condition on a given program and predict more computationally efficient variants of it. These variants are produced via code transformations such as changing data structures, wrapping or unwrapping code in functions, and adjusting loop bounds, as illustrated in the sketch below.
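To make the transformation idea concrete, here is a hypothetical Python example (not taken from the paper) of a "change data structures" edit of the kind such a model might propose: replacing a list with a set turns a linear-time membership test into a constant-time lookup.

```python
# Hypothetical "change data structures" edit (illustrative, not from the
# paper): replacing a list membership test with a set lookup turns an
# O(n) scan into an O(1) average-case hash lookup.

# Slower variant: `seen` is a list, so `x in seen` scans every element.
def count_duplicates_slow(values):
    seen, duplicates = [], 0
    for x in values:
        if x in seen:          # O(n) membership test
            duplicates += 1
        else:
            seen.append(x)
    return duplicates

# Faster variant the model might propose: identical logic, set-based lookup.
def count_duplicates_fast(values):
    seen, duplicates = set(), 0
    for x in values:
        if x in seen:          # O(1) average-case membership test
            duplicates += 1
        else:
            seen.add(x)
    return duplicates
```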

The team proposes an explicit and interpretable discrete variational autoencoder model for generating efficiency-improving code transformations. They demonstrate that this model can be trained on a supervised dataset and provide prescriptive feedback in the form of hints learned from that dataset, which can then help programmers write more efficient code. A simplified sketch of such a model appears below.
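The paper's exact architecture is not reproduced here; the following PyTorch sketch only illustrates the general recipe of a conditional discrete-latent model: an encoder infers a categorical edit-type variable from a (slow, fast) program pair, a Gumbel-softmax relaxation keeps the discrete sample differentiable during training, and a decoder reconstructs the fast program from the slow program and the sampled edit. The class name, layer sizes, and the simplified bag-of-tokens decoder are all illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (assumed architecture, not the paper's exact model) of a
# conditional discrete-latent generative model of program edits. The real
# model would use sequence encoders/decoders; here both programs are
# reduced to mean-pooled embeddings for brevity.
class DiscreteEditVAE(nn.Module):
    def __init__(self, vocab_size, hidden=256, num_edit_types=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Encoder q(z | slow, fast): logits over K discrete edit types.
        self.encoder = nn.Linear(2 * hidden, num_edit_types)
        self.edit_embed = nn.Embedding(num_edit_types, hidden)
        # Decoder p(fast | slow, z): simplified per-position token predictor.
        self.decoder = nn.Linear(2 * hidden, vocab_size)

    def forward(self, slow_tokens, fast_tokens, tau=1.0):
        slow = self.embed(slow_tokens).mean(dim=1)   # (B, H) pooled slow program
        fast = self.embed(fast_tokens).mean(dim=1)   # (B, H) pooled fast program
        logits = self.encoder(torch.cat([slow, fast], dim=-1))  # (B, K)
        # Gumbel-softmax yields a differentiable, approximately one-hot
        # sample of the discrete edit-type latent variable.
        z = F.gumbel_softmax(logits, tau=tau, hard=True)         # (B, K)
        z_emb = z @ self.edit_embed.weight                       # (B, H)
        # Predict fast-program tokens from the slow program plus the edit.
        token_logits = self.decoder(torch.cat([slow, z_emb], dim=-1))  # (B, V)
        recon = F.cross_entropy(
            token_logits.unsqueeze(1)
                        .expand(-1, fast_tokens.size(1), -1)
                        .reshape(-1, token_logits.size(-1)),
            fast_tokens.reshape(-1),
        )
        # KL(q(z | slow, fast) || uniform prior) regularizes the edit code.
        log_q = F.log_softmax(logits, dim=-1)
        kl = (log_q.exp() * (log_q + math.log(log_q.size(-1)))).sum(-1).mean()
        return recon + kl
```

At inference time, one would condition only on the slow program and enumerate or sample values of the discrete latent, with each value corresponding to a different learned edit type.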

In their empirical study, the team compared the proposed approach with a vanilla sequence-to-sequence transformer along three axes: correctness, efficiency, and diversity. The results show the novel discrete generative latent-variable model outperforming the baseline on all three axes.
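The paper's exact evaluation protocol is not reproduced here, but a sketch of how those three axes might be computed for sampled rewrites looks roughly as follows; `run_program` is a hypothetical helper that executes a candidate program on a test input and returns its output.

```python
import time

# Hedged sketch (assumptions, not the paper's protocol) of scoring sampled
# rewrites on correctness, efficiency, and diversity. `candidates` are
# source strings sampled from the model; `tests` are (input, expected) pairs.
def evaluate(candidates, tests, run_program):
    correct, runtimes = [], []
    for src in candidates:
        try:
            ok = all(run_program(src, inp) == out for inp, out in tests)
        except Exception:
            ok = False
        if ok:
            start = time.perf_counter()
            for inp, _ in tests:
                run_program(src, inp)
            runtimes.append(time.perf_counter() - start)
            correct.append(src)
    return {
        "correctness": len(correct) / max(len(candidates), 1),
        "efficiency": min(runtimes) if runtimes else None,      # best runtime
        "diversity": len(set(correct)) / max(len(correct), 1),  # distinct fixes
    }
```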

The researchers believe their work takes a promising step toward automating the process of identifying and applying higher-level performance optimizations, enabling developers to write more efficient code that will reduce computational burdens and their associated carbon footprint.

The paper Learning to Improve Code Efficiency is on arXiv.


Author: Hecate He | Editor: Michael Sarazen





