Claudio Spiess
PhD Student, Computer Science
cvspiess at ucdavis dot edu
DECAL Lab
University of California, Davis
Davis, CA
Bio
Hey there! I’m Claudio 👋🏼 I’m a graduate student researcher at the University of California, Davis. I am a data scientist 🔮 with an emphasis on neural Natural Language Processing. I’m advised by Prof. Prem Devanbu, and am part of the DECAL Lab, which ranks as the 7th strongest software engineering research group in the US (as of Feb. 2023). My research interests revolve around how machine learning techniques, mostly from the NLP world, can understand source code and, more broadly, software engineering. In particular, most of my work focuses on applying LLMs (Large Language Models) to source code.
From model “understanding”, we can do useful things like program generation from natural language, automated bug fixing, automated documentation, anomaly detection (bugs!), reverse engineering, and naming, among many others. I seek not only to build systems, but also to investigate metaphysical questions: do LLMs understand code? What do they learn? What biases and problems do these approaches have? And most importantly, how can we fix them? On the practical side, I’m interested in how smart tools can reduce programmers’ cognitive load while they write software.
Previously, I helped build a data-driven lending platform at Dutch FinTech startup Floryn as a full-stack software engineer, making machine learning work for loans. Most processes were backed by some form of machine learning algorithm, so I built some interesting tools around this, such as interpretation & explanation tools for non-technical users.
I received my bachelor's degree in Computer Science & Engineering from the Free University of Bolzano. I wrote a research thesis, concentrating on NLP for software engineering, under the supervision of Dr. Romain Robbes and Dr. Andrea Janes. During this time, I was affiliated with the Software and Systems Engineering (SwSE) research group.
When I’m not hacking around on code or models, I like to travel the world with a backpack (44 countries/territories and counting), scuba dive (113 dives and counting), and hike volcanoes. I also speak six languages: English, German, French, Dutch, Italian, and some Spanish.
Projects
Claudio Spiess
STraceBERT: Source Code Retrieval using Semantic Application Traces
FSE 2023, 2021-2023
A novel approach that uses a Java dynamic analysis tool to record calls to core Java libraries, and applies a BERT-style model to the recorded application traces for effective method source code retrieval from a candidate set. Experiments demonstrate its effectiveness at retrieving source code compared to existing approaches. The proposed approach offers a promising solution to the problem of code retrieval in software reverse engineering and opens up new avenues for further research in this area.
Claudio Spiess
Romain Robbes
Andrea Janes
Method Name Suggestions: An Open Vocabulary Approach
Bachelor Thesis, 2019
Performed a series of experiments evaluating how architectural and data preprocessing changes affect the accuracy of the names produced by the SOTA code2seq model. We found that byte-pair encoding (BPE) can successfully be applied to method naming, resulting in model performance up to 4.2 F1 points higher than the baseline.
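To illustrate the open-vocabulary idea, here is a minimal sketch of byte-pair encoding in Python. It is not the thesis code: the function names (`learn_bpe`, `merge_pair`, `encode`) and the toy corpus of method names are assumptions made purely for illustration. BPE learns frequent character-sequence merges from a corpus, so an unseen method name can still be split into known subword units rather than mapped to an unknown token.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts out as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subword units by applying the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus of lowercased method names.
names = ["getname", "setname", "getvalue", "setvalue"]
merges = learn_bpe(names, 10)
# An unseen name is still covered by the learned subword units.
print(encode("getnamevalue", merges))
```

Whatever merges are learned, joining the encoded pieces always reproduces the original name, which is what keeps the vocabulary open.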
Claudio Spiess
Simulating crop yields in El Oro, Ecuador
Class Project, 2023
Created interactive visualizations of GeoTIFF files representing actual yields, potential yields, mean temperature, etc. for various climate change scenarios using ipyleaflet, geopandas, rasterio, and more.
Claudio Spiess
Stefan Broeger
SPM: Social Package Manager
Class Project, 2022
A pip wrapper that uses Distributed Hash Tables (DHT) to store which packages you and your friends have.
Claudio Spiess
CUDA Docker Stack
Personal Project, 2023
A Docker image providing a CUDA-accelerated JupyterLab environment with PyTorch and Hugging Face libraries.
Claudio Spiess
Kickstarter Project Analysis
Personal Project, 2021
Performed basic exploratory data analysis on a public Kickstarter dataset to produce recommendations for a successful Kickstarter project.
Claudio Spiess
Riccardo Felluga
PhotoStack
Class Project, 2018
Photo storage system using ML (YOLOv3) to order and categorize photos without user input. Infrastructure consists of Dockerized MongoDB, Redis, and NGINX instances with Node.js and Python services. Frameworks and languages used include TypeScript, Flask, React, Bulma CSS, Apollo GraphQL, and Express.js.
Claudio Spiess
BBQ-Planner
Personal Project, 2019
A BBQ planning application built with Python and Django.
Claudio Spiess
DNA analysis
Personal Project, 2022
An analysis of my genome using data from 23andMe and SNPedia. Based on lorarjohns' work.
Claudio Spiess
Jtrak to Macdive
Personal Project, 2020
A small script to convert JTrak dive logs to Macdive format.
Claudio Spiess
Riccardo Felluga
ToodleMoodle
Class Project, 2018
A CLI tool for pentesters that analyzes and attacks Moodle installations using various publicly available exploits. Developed using Crystal, the fast, natively compiled Ruby lookalike.
Claudio Spiess
Perry
Personal Project, 2018
Perry is a GPS asset tracker designed to run on an Arduino, with a React-powered web interface and a Node.js backend.
Claudio Spiess
Vasser
Class Project, 2017
A high-fidelity application mockup for finding water sources.
Claudio Spiess
Bookhub
Class Project, 2017
A book-sharing database.
Claudio Spiess
Address Book
Class Project, 2017
A simple address book written in Java using JavaFX and JUnit for unit testing.
Claudio Spiess
Learning English
Class Project, 2017
A quiz application for learning English.
Claudio Spiess
ApiCollider
Personal Project, 2018
An ideation tool that combines two random public APIs.
Claudio Spiess
François Tronche-Macaire
Riccardo Felluga
Natürliche Rechenmaschine
Class Project, 2019
A simple Yacc/Lex calculator, in German.
Claudio Spiess
Scott Stolarski
Network Dropbox
Class Project, 2017
A small C web server with a drag-and-drop HTML5 web interface for LAN file sharing.