
Claudio Spiess
PhD Candidate, Computer Science
Bio
Hey there! I’m Claudio 👋🏼 I’m a PhD candidate in Computer Science at the University of California, Davis, affiliated with the DECAL Lab. My research interests lie at the intersection of Natural Language Processing and Software Engineering. I study how machine learning techniques, mostly from the NLP world, can understand source code and, more broadly, help software engineering. In particular, most of my work focuses on applying Large Language Models (LLMs) to source code. I’m fortunate to be advised by Prof. Prem Devanbu, and to have collaborated with exceptional colleagues around the world. My work has been published at flagship venues such as ICSE and FSE, and I have served as a reviewer for premier journals such as TOSEM.
From model “understanding”, we can do useful things like program generation from natural language, automated bug fixing, automated documentation, anomaly detection (bugs!), reverse engineering, and naming, among many others. Not only do I seek to build systems, but also to investigate metaphysical questions: do LLMs understand code? What do they learn? What biases and problems do these approaches have? And most importantly, how can we fix them? On the practical side, I’m interested in how smart tools for programmers can alleviate cognitive load while writing software. Between 2023 and 2025, my main focus was on the calibration of LLMs for code, or rather the lack thereof. In 2025, I also worked on prompt programming languages and automated prompt optimization for LLM agents. Recently, I have been working on how LLMs understand and reason about code.
Previously, I helped build a data-driven lending platform at the Dutch FinTech startup Floryn as a full-stack software engineer, making machine learning work for loans. Most processes were backed by some form of machine learning, so I built interpretation and explanation tools for non-technical colleagues.
I received my bachelor’s degree in Computer Science & Engineering from the Free University of Bolzano, where I wrote a research thesis on NLP for software engineering under the supervision of Dr. Romain Robbes and Dr. Andrea Janes. During this time, I was affiliated with the Software and Systems Engineering (SwSE) research group.
When I’m not hacking around on code or models, I like to travel the world with a backpack (44 countries/territories and counting), scuba dive (113 dives and counting), and hike volcanoes. I also speak six languages: English, German, French, Dutch, Italian, and some Spanish.
Projects

On LLMs' Internal Representation of Code Correctness
Does In-IDE Calibration of Large Language Models work at Scale?
Calibration and Correctness of Language Models for Code
AutoPDL: Automatic Prompt Optimization for LLM Agents
PDL: A Declarative Prompt Programming Language
STraceBERT: Source Code Retrieval using Semantic Application Traces
Method Name Suggestions: An Open Vocabulary Approach
Universal Transformer: Towards Learned Positional Encodings
Impact of War on Food Security in the Middle East & East Africa
Simulating crop yields in El Oro, Ecuador
SPM: Social Package Manager
pandas-dp: Differential Privacy in pandas
CUDA Docker Stack
Kickstarter Project Analysis
Subtitle Keyword Extractor
PhotoStack
BBQ-Planner
DNA analysis
Jtrak to Macdive
ToodleMoodle
Address Book
ApiCollider
Natürliche Rechenmaschine
Network Dropbox
Cite On LLMs' Internal Representation of Code Correctness
@misc{ribeiro2026llmsinternalrepresentationcode,
title={On LLMs' Internal Representation of Code Correctness},
author={Francisco Ribeiro and Claudio Spiess and Prem Devanbu and Sarah Nadi},
year={2026},
eprint={2512.07404},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2512.07404},
}
Cite Does In-IDE Calibration of Large Language Models work at Scale?
@misc{koohestani2025doesinidecalibrationlarge,
title={Does In-IDE Calibration of Large Language Models work at Scale?},
author={Roham Koohestani and Agnia Sergeyuk and David Gros and Claudio Spiess and Sergey Titov and Prem Devanbu and Maliheh Izadi},
year={2025},
eprint={2510.22614},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2510.22614},
}
Cite Calibration and Correctness of Language Models for Code
@inproceedings{10.1109/ICSE55347.2025.00040,
author = {Spiess, Claudio and Gros, David and Pai, Kunal Suresh and Pradel, Michael and Rabin, Md Rafiqul Islam and Alipour, Amin and Jha, Susmit and Devanbu, Prem and Ahmed, Toufique},
title = {Calibration and Correctness of Language Models for Code},
year = {2025},
isbn = {9798331505691},
publisher = {IEEE Press},
url = {https://doi.org/10.1109/ICSE55347.2025.00040},
doi = {10.1109/ICSE55347.2025.00040},
abstract = {Machine learning models are widely used, but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so a rational decision can be made whether to use the output or not. For example, outputs can be associated with a confidence measure; if this confidence measure is strongly associated with likelihood of correctness, then the model is said to be well-calibrated. A well-calibrated confidence measure can serve as a basis for rational, graduated decision-making on how much review and care is needed when using generated code. Calibration has so far been studied in mostly non-generative (e.g., classification) settings, especially in software engineering. However, generated code can quite often be wrong: Given generated code, developers must decide whether to use directly, use after varying intensity of careful review, or discard model-generated code. Thus, calibration is vital in generative settings. We make several contributions. We develop a framework for evaluating the calibration of code-generating models. We consider several tasks, correctness criteria, datasets, and approaches, and find that, by and large, generative code models we test are not well-calibrated out of the box. We then show how calibration can be improved using standard methods, such as Platt scaling. Since Platt scaling relies on the prior availability of correctness data, we evaluate the applicability and generalizability of Platt scaling in software engineering, discuss settings where it has good potential for practical use, and settings where it does not. Our contributions will lead to better-calibrated decision-making in the current use of code generated by language models, and offers a framework for future research to further improve calibration methods for generative models in software engineering.},
booktitle = {Proceedings of the IEEE/ACM 47th International Conference on Software Engineering},
pages = {540–552},
numpages = {13},
keywords = {LLM, calibration, confidence measure},
location = {Ottawa, Ontario, Canada},
series = {ICSE '25}
}
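
The abstract above mentions Platt scaling as one way to rescale a model's raw confidence so that it better tracks the likelihood of correctness. As a rough illustration only (not the paper's actual code; the confidence values, labels, and use of scikit-learn's LogisticRegression are placeholder assumptions), a minimal Python sketch of that rescaling step:

# Illustrative Platt scaling: fit a logistic regression mapping a model's raw
# confidence score to a calibrated probability of correctness.
# The scores and labels below are made-up demonstration data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# raw confidences for generated code samples (e.g., average token probability)
raw_confidence = np.array([0.92, 0.35, 0.80, 0.61, 0.10, 0.99, 0.45, 0.70])
# 1 if the generated code was judged correct (e.g., passed tests), else 0
is_correct = np.array([1, 0, 1, 0, 0, 1, 1, 0])

# Platt scaling: logistic regression on the single confidence feature
scaler = LogisticRegression()
scaler.fit(raw_confidence.reshape(-1, 1), is_correct)

# calibrated probability that a new generation with confidence 0.75 is correct
calibrated = scaler.predict_proba([[0.75]])[0, 1]
print(f"calibrated probability of correctness: {calibrated:.2f}")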
Cite AutoPDL: Automatic Prompt Optimization for LLM Agents
@InProceedings{pmlr-v293-spiess25a,
title = {AutoPDL: Automatic Prompt Optimization for LLM Agents},
author = {Spiess, Claudio and Vaziri, Mandana and Mandel, Louis and Hirzel, Martin},
booktitle = {Proceedings of the Fourth International Conference on Automated Machine Learning},
pages = {13/1--20},
year = {2025},
editor = {Akoglu, Leman and Doerr, Carola and van Rijn, Jan N. and Garnett, Roman and Gardner, Jacob R.},
volume = {293},
series = {Proceedings of Machine Learning Research},
month = {08--11 Sep},
publisher = {PMLR},
pdf = {https://raw.githubusercontent.com/mlresearch/v293/main/assets/spiess25a/spiess25a.pdf},
url = {https://proceedings.mlr.press/v293/spiess25a.html},
abstract = {The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and non-transferable across LLMs or tasks. Therefore, this paper proposes AutoPDL, an automated approach to discover good LLM agent configurations. Our method frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and six LLMs (ranging from 3B to 70B parameters) show consistent accuracy gains ($9.06 \pm 15.3$ percentage points), up to 68.9pp, and reveal that selected prompting strategies vary across models and tasks.}
}
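
AutoPDL navigates the combinatorial space of prompting patterns and demonstrations with successive halving. The following is a minimal, generic Python sketch of plain successive halving (the candidate pattern names, budgets, and evaluate function are illustrative placeholders, not the AutoPDL implementation):

# Illustrative successive halving: repeatedly score candidate prompt
# configurations, keep the better half, and double the budget for survivors.
import random

def successive_halving(candidates, evaluate, initial_budget=8, rounds=3):
    survivors = list(candidates)
    budget = initial_budget
    for _ in range(rounds):
        if len(survivors) <= 1:
            break
        # score each surviving configuration on `budget` held-out examples
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]  # keep the top half
        budget *= 2  # spend more evaluation effort on the remaining candidates
    return survivors[0]

# toy usage: scores are random placeholders for held-out accuracy
patterns = ["zero-shot", "cot", "react", "rewoo"]
best = successive_halving(patterns, lambda pattern, n_examples: random.random())
print("selected pattern:", best)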
Cite PDL: A Declarative Prompt Programming Language
@misc{vaziri2024pdldeclarativepromptprogramming,
title={PDL: A Declarative Prompt Programming Language},
author={Mandana Vaziri and Louis Mandel and Claudio Spiess and Martin Hirzel},
year={2024},
eprint={2410.19135},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.19135},
}
Cite STraceBERT: Source Code Retrieval using Semantic Application Traces
@inproceedings{10.1145/3611643.3617852,
author = {Spiess, Claudio},
title = {STraceBERT: Source Code Retrieval using Semantic Application Traces},
year = {2023},
isbn = {9798400703270},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3611643.3617852},
doi = {10.1145/3611643.3617852},
abstract = {Software reverse engineering is an essential task in software engineering and security, but it can be a challenging process, especially for adversarial artifacts. To address this challenge, we present STraceBERT, a novel approach that utilizes a Java dynamic analysis tool to record calls to core Java libraries, and pretrain a BERT-style model on the recorded application traces for effective method source code retrieval from a candidate set. Our experiments demonstrate the effectiveness of STraceBERT in retrieving the source code compared to existing approaches. Our proposed approach offers a promising solution to the problem of code retrieval in software reverse engineering and opens up new avenues for further research in this area.},
booktitle = {Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
pages = {2207–2209},
numpages = {3},
keywords = {neural information retrieval, reverse engineering, tracing},
location = {San Francisco, CA, USA},
series = {ESEC/FSE 2023}
}
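
STraceBERT pretrains a BERT-style encoder on recorded application traces and then retrieves method source code from a candidate set. As a simplified illustration of the retrieval step only (the embed function and random vectors are placeholders standing in for a trained encoder, not the paper's model), a short Python sketch of ranking candidates by cosine similarity:

# Illustrative retrieval-by-embedding: given an embedding of a recorded trace,
# rank candidate method implementations by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

def embed(text):
    # placeholder: a real system would encode `text` with a trained model
    return rng.standard_normal(768)

def rank_candidates(trace, candidate_sources):
    query = embed(trace)
    scored = []
    for source in candidate_sources:
        vec = embed(source)
        cosine = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        scored.append((cosine, source))
    # highest-similarity candidate first
    return [src for _, src in sorted(scored, key=lambda pair: pair[0], reverse=True)]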
