PhD Student, Computer Science

cvspiess at ucdavis dot edu
DECAL Lab

University of California, Davis
Davis, CA

Bio

Hey there! I’m Claudio 👋🏼 I’m a PhD student in Computer Science at the University of California, Davis. My research interests lie at the intersection of Natural Language Processing and Software Engineering. I’m advised by Prof. Prem Devanbu and am part of the DECAL Lab. I study how machine learning techniques, mostly from the NLP world, can understand source code and, more broadly, help software engineering. In particular, most of my work focuses on applying Large Language Models (LLMs) to source code.

From model “understanding”, we can do useful things like program generation from natural language, automated bug fixing, automated documentation, anomaly detection (bugs!), reverse engineering, and naming, among many others. I seek not only to build systems but also to investigate metaphysical questions: Do LLMs understand code? What do they learn? What biases and problems do these approaches have? And most importantly, how can we fix them? On the practical side, I’m interested in how smart tools can reduce programmers’ cognitive load while they write software. Since summer 2023, my main focus has been the calibration of LLMs for code, or rather the lack thereof.

Previously, I helped build a data-driven lending platform at the Dutch FinTech startup Floryn as a full-stack software engineer, making machine learning work for loans. Most processes were backed by some form of machine learning, so I built some interesting tools around this, such as interpretation and explanation tools for non-technical users.

I received my bachelor’s degree in Computer Science & Engineering from the Free University of Bolzano. I wrote a research thesis on NLP for software engineering under the supervision of Dr. Romain Robbes and Dr. Andrea Janes. During this time, I was affiliated with the Software and Systems Engineering (SwSE) research group.

When I’m not hacking around on code or models, I like to travel the world with a backpack (44 countries/territories and counting), scuba dive (113 dives and counting), and hike volcanoes. I also speak six languages: English, German, French, Dutch, Italian, and some Spanish.

Projects

Calibration and Correctness of Language Models for Code
arXiv, 2023-2024
Machine learning models are widely used but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so a rational decision can be made whether to use the output or not. In this work, we studied whether current LLMs for code are well calibrated.

STraceBERT: Source Code Retrieval using Semantic Application Traces
FSE 2023, 2021-2023
A novel approach that uses a Java dynamic analysis tool to record calls to core Java libraries, then applies a BERT-style model to the recorded application traces to retrieve method source code from a candidate set. Experiments demonstrate its effectiveness at retrieving source code compared to existing approaches. The approach offers a promising solution to the problem of code retrieval in software reverse engineering and opens up new avenues for further research in this area.

Method Name Suggestions: An Open Vocabulary Approach
Bachelor Thesis, 2019
Performed a series of experiments evaluating how architectural and data preprocessing changes affect the accuracy of the names produced by the state-of-the-art code2seq model. We found that BPE can be applied successfully to method naming, improving model performance by up to 4.2 F1 points over the baseline.

Universal Transformer: Towards Learned Positional Encodings
Class Project, 2022
A novel pretraining approach for architecture- and modality-agnostic Transformer positional embeddings.

Impact of War on Food Security in the Middle East & East Africa
Class Project, 2023
Visualized food imports, exports, and balances in the Middle East and East Africa using Tableau.

Simulating crop yields in El Oro, Ecuador
Class Project, 2023
Created interactive visualizations of GeoTIFF files representing actual yields, potential yields, mean temperature, etc. for various climate change scenarios using ipyleaflet, geopandas, rasterio, and more.

SPM: Social Package Manager
Class Project, 2022
A pip wrapper that uses Distributed Hash Tables (DHT) to store which packages you and your friends have.

pandas-dp: Differential Privacy in pandas
Personal Project, 2023
A Python package for differential privacy in pandas.

CUDA Docker Stack
Personal Project, 2023
Docker image providing a CUDA-accelerated JupyterLab environment with PyTorch and Hugging Face.

Kickstarter Project Analysis
Personal Project, 2021
Performed basic exploratory data analysis on a public Kickstarter dataset to produce recommendations for a successful Kickstarter project.

Subtitle Keyword Extractor
Personal Project, 2022
A tool for extracting clips from videos based on whether a keyword occurs in the subtitles.

PhotoStack
Class Project, 2018
A photo storage system that uses ML (YOLOv3) to order and categorize photos without user input. The infrastructure consists of Dockerized MongoDB, Redis, and NGINX instances with Node.js and Python services. Frameworks and languages used include TypeScript, Flask, React, Bulma CSS, Apollo GraphQL, and Express.js.

BBQ-Planner
Personal Project, 2019
A BBQ planning application built with Python and Django.

DNA analysis
Personal Project, 2022
An analysis of my genome using data from 23andMe and SNPedia. Based on lorarjohns' work.

Jtrak to Macdive
Personal Project, 2020
A small script to convert JTrak dive logs to Macdive format.

ToodleMoodle
Class Project, 2018
A CLI tool for analyzing and attacking Moodle installations for pentesters using various publicly available exploits. Developed using Crystal, the fast, natively compiled Ruby lookalike.

Perry
Personal Project, 2018
Perry is a GPS asset tracker designed to run on an Arduino, with a React-powered web interface and a Node backend.

Fitbit Simulator
Class Project, 2017
A Fitbit simulator.

Vasser
Class Project, 2017
A high-fidelity application mockup for finding water sources.

Bookhub
Class Project, 2017
A book-sharing database.

Address Book
Class Project, 2017
A simple address book written in Java using JavaFX and JUnit for unit testing.

Learning English
Class Project, 2017
A quiz application for learning English.

ApiCollider
Personal Project, 2018
An ideation tool that combines two random public APIs.

Natürliche Rechenmaschine
Class Project, 2019
A simple Yacc/Lex calculator, in German.

Network Dropbox
Class Project, 2017
A small C web server with drag-and-drop HTML5 web interface for LAN file sharing.