Claudio Spiess
PhD Student, Computer Science
cvspiess at ucdavis dot edu
DECAL Lab
University of California, Davis
Davis, CA
Bio
Hey there! I’m Claudio 👋🏼 I’m a PhD student in Computer Science at the University of California, Davis. My research interests lie at the intersection of Natural Language Processing and Software Engineering. I’m advised by Prof. Prem Devanbu and am part of the DECAL Lab. I study how machine learning techniques, mostly from the NLP world, can understand source code and, more broadly, help software engineering. In particular, most of my work focuses on applying Large Language Models (LLMs) to source code.
Model “understanding” enables useful applications such as program generation from natural language, automated bug fixing, automated documentation, anomaly detection (bugs!), reverse engineering, and naming, among many others. I seek not only to build systems but also to investigate metaphysical questions: Do LLMs understand code? What do they learn? What biases and problems do these approaches have? And most importantly, how can we fix them? On the practical side, I’m interested in how smart tools can reduce programmers’ cognitive load while they write software. As of summer 2023, my main focus has been the calibration of LLMs for code, or rather the lack thereof.
Previously, I helped build a data-driven lending platform at Dutch FinTech startup Floryn as a full-stack software engineer, making machine learning work for loans. Most processes were backed by some form of machine learning, so I built some interesting tools around this, such as interpretation and explanation tools for non-technical users.
I received my bachelor’s degree in Computer Science & Engineering from the Free University of Bolzano. I wrote a research thesis on NLP for software engineering under the supervision of Dr. Romain Robbes and Dr. Andrea Janes. During this time, I was affiliated with the Software and Systems Engineering (SwSE) research group.
When I’m not hacking around on code or models, I like to travel the world with a backpack (44 countries/territories and counting), scuba dive (113 dives and counting), and hike volcanoes. I also speak six languages: English, German, French, Dutch, Italian, and some Spanish.
Projects
Claudio Spiess,
David Gros,
Kunal Suresh Pai,
Michael Pradel,
Md Rafiqul Islam Rabin,
Amin Alipour,
Susmit Jha,
Prem Devanbu,
Toufique Ahmed
Calibration and Correctness of Language Models for Code
arXiv,
2023-2024
Machine learning models are widely used but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so a rational decision can be made whether to use the output or not. In this work, we studied whether current LLMs for code are well calibrated.
Claudio Spiess
STraceBERT: Source Code Retrieval using Semantic Application Traces
FSE 2023,
2021-2023
A novel approach that uses a Java dynamic analysis tool to record calls to core Java libraries and applies a BERT-style model to the recorded application traces for effective method source code retrieval from a candidate set. Experiments demonstrate its effectiveness at retrieving source code compared to existing approaches. The proposed approach offers a promising solution to the problem of code retrieval in software reverse engineering and opens up new avenues for further research in this area.
Claudio Spiess,
Romain Robbes,
Andrea Janes
Method Name Suggestions: An Open Vocabulary Approach
Bachelor Thesis,
2019
Performed a series of experiments evaluating how architectural and data preprocessing changes affect the accuracy of the names produced by the state-of-the-art code2seq model. We found that BPE can successfully be applied to method naming, resulting in model performance up to 4.2 F1 points higher than the baseline.
Claudio Spiess
Simulating crop yields in El Oro, Ecuador
Class Project,
2023
Created interactive visualizations of GeoTIFF files representing actual yields, potential yields, mean temperature, etc. for various climate change scenarios using ipyleaflet, geopandas, rasterio, and more.
Claudio Spiess,
Stefan Broeger
SPM: Social Package Manager
Class Project,
2022
A pip wrapper that uses Distributed Hash Tables (DHT) to store which packages you and your friends have.
Claudio Spiess
CUDA Docker Stack
Personal Project,
2023
Docker image providing a CUDA-accelerated Jupyter Lab environment, with PyTorch and HuggingFace.
Claudio Spiess
Kickstarter Project Analysis
Personal Project,
2021
Performed basic exploratory data analysis on a public Kickstarter dataset to produce recommendations for a successful Kickstarter project.
Claudio Spiess,
Riccardo Felluga
PhotoStack
Class Project,
2018
Photo storage system using ML (YOLOv3) to order and categorize photos without user input. Infrastructure consists of Dockerized MongoDB, Redis, and NGINX instances with Node.js and Python services. Frameworks and languages used include TypeScript, Flask, React, Bulma CSS, Apollo GraphQL, and Express.js.
Claudio Spiess
BBQ-Planner
Personal Project,
2019
A BBQ planning application built with Python and Django.
Claudio Spiess
DNA analysis
Personal Project,
2022
An analysis of my genome using data from 23andMe and SNPedia. Based on lorarjohns' work.
Claudio Spiess
Jtrak to Macdive
Personal Project,
2020
A small script to convert JTrak dive logs to Macdive format.
Claudio Spiess,
Riccardo Felluga
ToodleMoodle
Class Project,
2018
A CLI tool for pentesters to analyze and attack Moodle installations using various publicly available exploits. Developed using Crystal, the fast, natively compiled Ruby lookalike.
Claudio Spiess
Perry
Personal Project,
2018
Perry is a GPS asset tracker designed to run on an Arduino, with a React-powered web interface and a Node.js backend.
Claudio Spiess,
Alex Skiff
Fitbit Simulator
Class Project,
2017
A Fitbit simulator.
Claudio Spiess
Vasser
Class Project,
2017
A high-fidelity application mockup for finding water sources.
Claudio Spiess
Bookhub
Class Project,
2017
A book-sharing database.
Claudio Spiess
Address Book
Class Project,
2017
A simple address book written in Java using JavaFX and JUnit for unit testing.
Claudio Spiess
Learning English
Class Project,
2017
A quiz application for learning English.
Claudio Spiess
ApiCollider
Personal Project,
2018
An ideation tool that combines two random public APIs.
Claudio Spiess,
François Tronche-Macaire,
Riccardo Felluga
Natürliche Rechenmaschine
Class Project,
2019
A simple Yacc/Lex calculator, in German.
Claudio Spiess,
Scott Stolarski
Network Dropbox
Class Project,
2017
A small C web server with a drag-and-drop HTML5 web interface for LAN file sharing.