Systems engineer focusing on the design and performance analysis
of complex computing systems. My experience includes developing infrastructure for
large-scale experiments, workload analysis of computing systems, and hardware design and
verification. I have a background in computer architecture and systems.
Education
Parallel Computer Architecture
Computer Architecture
Digital Design and Synthesis
Parallel Programming
Operating Systems
Algorithms
Artificial Intelligence
OOP and Data Structures
Advanced Computer Architecture
Computer Microarchitecture
System-on-Chip Design
Memory and Storage Systems
Programming Languages and Compilers
Experience
Created and supported tools and infrastructure for large-scale sweep
experiments in interconnect system analysis.
Developed a Python wrapper library around IBM Spectrum LSF for easier
and more flexible job offloading to HPC clusters.
Formally verified compression, AMU, and domain bridge RTL blocks.
Added ASIL-B compliant RTL parity and interface checkers in AXI-Stream
blocks.
Conducted performance analysis of the Data Streaming Accelerator (DSA), an on-chip
accelerator found on Sapphire Rapids and later Xeon processors
Explored DSA use cases that benefit from cache pollution
mitigation and higher throughput for memory operations
Submitted patents for improving memory deduplication
techniques using DSA
Prepared and presented a tutorial on Intel's on-chip accelerators at ISCA 2023; earned Intel's internal Department Recognition Award
Investigated the effectiveness of an L1 cache in a CXL
device for DLRM offloading
Built a functional cache simulator to evaluate both hit
rates and cache occupancy rates of real DLRM memory traces
Analyzed the characteristics of real DLRM data for locality
patterns to aid in the design of the memory system
Simulated DRAM and cache designs via Ramulator to obtain
bandwidth, hit rate, and other metrics used in analysis
Debugged failing signatures for the CPU memory system testbench
Implemented new statistical coverage (SCOV) workflow:
Used scoreboard listener functions to implement SCOV
events for assessing stimulus coverage
Wrote a macro to simplify adding SCOV events
within the UVM testbench
Developed a Python script to parse simulation log files generated
during regression tests and send the results to a MySQL database
Developed a Python script to analyze the use of all plusargs
within the UVM testbenches
Scripted two Git hooks for file update notifications
Fixed UVM register definition autogeneration for more flexible
RAL models
Programmed a SystemC module for modeling transactions between a master
device and interconnect return nodes
Formally verified round-robin and LSB-priority arbiters using
SystemVerilog assertions
Improved kernel ION memory allocation speeds by ~10%
Analyzed the efficiency of IOVA’s use of caching and compared
it with MMAP’s gap-searching RBTree structure
Created an internal Python file tracing tool for parsing Linux
RAM dump binaries
Worked towards shifting mmap allocations to use the mempool API
Publications and Patents
A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern
Intel Xeon Scalable Processors
Reese Kuper, Ipoom Jeong, Yifan Yuan,
Ren Wang, Narayan Ranganathan, Nikhil Rao, Jiayu Hu, Sanjay Kumar, Philip Lantz, Nam Sung Kim
[PAPER] International Conference on Architectural Support for
Programming Languages and Operating Systems
(ASPLOS). April, 2024
[PAPER] Open-access ePrint Archive
(arXiv). April, 2023
Efficiently Merging Non-Identical Pages in Kernel Same-Page Merging (KSM) for
Efficient and Improved Memory Deduplication and Security