Uppsala Architecture Research Team
The Uppsala Architecture Research Team is a multi-disciplinary research group that works on a broad range of challenges in computer architecture, including microarchitecture, memory systems, compilers, security, power efficiency, simulation and modeling, runtime optimizations, co-design, and distributed systems.
Professor Stefanos Kaxiras
(PhD Wisconsin) worked at Bell Labs before coming to Uppsala. His research interests include memory consistency models, coherence, and microarchitecture with an emphasis on security and (reducing) speculation.
Professor David Black-Schaffer
(PhD Stanford) worked at Apple before coming to Uppsala. His research interests include runtime scheduling and memory system design.
Assistant Professor Yuan Yao
(PhD Royal Institute of Technology, Stockholm) has research interests in Network on Chip (NoC) and Non-Von-Neumann architectures.
Assistant Professor Chang Hyun Park
(PhD KAIST) conducts research on the virtual memory system on both the architecture and systems side.
Professor (Emeritus) Erik Hagersten
(PhD Royal Institute of Technology, Stockholm) was the chief server architect at Sun Microsystems before coming to Uppsala. His research interests include efficient memory system designs and modeling.
Postdocs and Visiting Researchers
Associate Professor Magnus Själander
Postdoc Researcher Peter Munch
Graduate Students
Per Ekemark
Johan Janzén
Hassan Muhammad
Marina Shimchenko
Pavlos Aimoniotis
Alireza Haddadi
Rashid Aligholipour
Ahmed Nematallah
Mehmetali Semi Yenimol
Xiaoyue Chen
Shiming Li
George Stoian
Hannah Atmer
Projects
Body Operating System
Challenge: Power efficient and secure computation in an in-body processing system. Link to project page: BOS
Efficient Processors
Challenge: Making general purpose processors more efficient.
Results: Offloading instructions to simpler schedulers to reduce scheduling cost (ICCD2018, HPCA2019, DATE2019, HPCA2020); caching in the pipeline (ISCA2019).
Security and Speculation
Challenge: Building processors that are secure by design; reducing our reliance on speculation without losing its performance advantages.
Results: Understanding speculative shadows to reduce the impact of reduced speculation (ISCA2019); hiding speculative effects (CF2019); non-speculative techniques to reorder memory accesses (ISCA2017, IEEE Micro Top Picks 2018, ISCA2018, MICRO2018); compiler-orchestrated software out-of-order execution on in-order cores (PACT2016 SRC Bronze medal, CGO2017, PLDI2018, Best of CAL 2017, TransOnComputers2018 - Featured article of the month); limited speculation cores (ISCA2015).
Compiling for Power Efficiency
Challenge: Co-designing the hardware and compiler to maximize efficiency.
Results: Decoupling access and execute to improve DVFS (ICS2013, CGO2014, CC2016 Best Paper, HIP3ES2016, HIP3ES2017).
Smart Memory Systems
Challenge: Understanding where and when data is needed to reduce the energy consumed in moving it and the time wasted waiting for it.
Results: Direct-to-data cache designs that avoid searches (MICRO2013, ISCA2014, MICRO2015, HPCA2018); intelligent policies for placing data based on reuse for CPUs (ICCD2016, SBAC-PAD2017, ICS2019) and GPUs (IISWC2017).
Scheduling
Challenge: Matching the heterogeneous behavior of tasks and applications to heterogeneous hardware for performance.
Results: CPU and GPU task analysis and modeling (JParallelComputing2018, ISPASS2018); GPU co-execution (SBAC-PAD).
Complexity-Effective Coherence
Challenge: Creating novel coherence protocols to enable highly efficient multi/many-core systems and software shared memory implementations.
Results: Application driven, highly-efficient, VIPS family of protocols (PACT2012, ISCA2013, ISCA2015, HPCA2015); ArgoDSM distributed shared memory system (HPDC2015); Racer TSO: data-race-detection coherence, transparent to software (MICRO2016, IEEE Micro Top Picks 2017 honorable mention); compiler-assisted cache coherence (IPDPS2015, TPDS2016, CGO2017, CCPR2017, TPDS2018).
Previous Projects
Modeling
Challenge: Using low-overhead profile information to quickly model memory system behavior and performance.
Results: Architecturally independent performance models for memory systems (CGO2012, IISWC2012) and performance (ISPASS2015); resource-sharing performance profiling (CGO2013, PACT2012).
Software Optimization for Memory Systems
Challenge: Automatic software-based cache bypassing and prefetching without hurting co-execution on multicores.
Results: Adaptive software bypassing (HPCA2013) and prefetching (PACT2015).
Startups
Eta Scale AB works to commercialize memory coherence technology for both efficient scalable hardware implementations and software distributed shared memory. (Active)
Green Cache AB took the Direct-to-Data memory system technology and worked with clients to investigate the energy-savings potential in their future mobile SoCs. (IP purchased)
Acumem AB developed the StatCache statistical memory modeling technology into the ThreadSpotter turn-key tool to help developers identify and fix memory system related issues in their software. (Sold to Rogue Wave)
Alumni (and first job)
PhD Alumni
Christos Sakalis (PhD 2021, IAR, Sweden)
Mehdi Alipour (PhD 2020, Ericsson, Sweden)
Kim-Anh Tran (PhD 2020, Google, Germany)
Ricardo Alves (PhD 2019, Intel, USA)
Nikos Nikoleris (PhD 2019, ARM, UK)
Germán Ceballos (PhD 2018, Ericsson, Sweden)
Magnus Norgren (Swedish Patent Office)
Andreas Sembrant (PhD 2017, Nvidia, USA)
Mahdad Davari (PhD 2017, Ericsson, Sweden)
Muneeb Khan (PhD 2016, Ericsson, Sweden)
Moncef Mechri (IMC, Netherlands)
Vasileios Spiliopoulos (ZeroPoint, Sweden)
Konstantinos Koukos (PhD 2016, KTH, Sweden)
Andreas Sandberg (PhD 2014, ARM, UK)
David Eklöv (PhD 2011, Samsung, USA)
Håkan Zeffer (PhD 2006, Sun Microsystems, USA)
Henrik Löf (PhD 2006, Stanford University, USA)
Erik Berg (PhD 2005, Xelerated, Sweden)
Martin Karlsson (PhD 2006, Sun Microsystems, USA)
Dan Wallin (PhD 2006, Virtutech, Sweden)
Zoran Radovic (PhD 2005, Sun Microsystems, USA)
Licentiate Alumni
Gustaf Borgström (Lic 2022, IAR, Sweden)
Postdoc Alumni
Dr. Anirban Nag (Huawei, Switzerland)
Dr. Mihail Popov (Huawei, UK)
Professor Rakesh Kumar (NTNU, Norway)
Dr. Gregory Vaumourin (Atos, France)
Dr. Andra Hugo (DDN Storage, France)
Professor Trevor Carlson (NUS, Singapore)
Professor Magnus Själander (NTNU, Norway)
Professor Alberto Ros (University of Murcia, Spain)
Dr. Nina Shariati (Uppsala University, Sweden)
Recent Publications
- Mutator-Driven Object Placement using Load Barriers. In MPLR 2024: Proceedings of the 21st ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, Association for Computing Machinery (ACM), 2024. (DOI, Fulltext).
- Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes. In ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Conference Proceedings Annual International Symposium on Computer Architecture, Association for Computing Machinery (ACM), New York, NY, 2023. (DOI, fulltext:print).
- Exploring the Latency Sensitivity of Cache Replacement Policies. In IEEE Computer Architecture Letters, volume 22, number 2, pp 93-96, Institute of Electrical and Electronics Engineers (IEEE), 2023. (DOI, fulltext:postprint).
- Faster Functional Warming with Cache Merging. In Proceedings of System Engineering for Constrained Embedded Systems, DroneSE and RAPIDO 2023, pp 39-47, Association for Computing Machinery (ACM), 2023. (DOI).
- Game-of-Life Temperature-Aware DVFS Strategy for Tile-Based Chip Many-Core Processors. In IEEE Journal on Emerging and Selected Topics in Circuits and Systems, volume 13, number 1, pp 58-72, Institute of Electrical and Electronics Engineers (IEEE), 2023. (DOI).
- How addresses are made. In 2023 IEEE International Symposium on Workload Characterization, IISWC, International Symposium on Workload Characterization Proceedings, pp 223-225, IEEE, 2023. (DOI).
- Large-scale Graph Processing on Commodity Systems: Understanding and Mitigating the Impact of Swapping. In The International Symposium on Memory Systems (MEMSYS '23), pp 1-11, Association for Computing Machinery (ACM), 2023. (DOI, Fulltext, fulltext:print).
- Protean: Resource-efficient Instruction Prefetching. In The International Symposium on Memory Systems (MEMSYS '23), pp 1-13, Association for Computing Machinery (ACM), 2023. (DOI, Fulltext, fulltext:print).
- ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage. In 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023, pp 828-842, Association for Computing Machinery (ACM), 2023. (DOI, Fulltext, fulltext:print).
- SE-CNN: Convolution Neural Network Acceleration via Symbolic Value Prediction. In IEEE Journal on Emerging and Selected Topics in Circuits and Systems, volume 13, number 1, pp 73-85, Institute of Electrical and Electronics Engineers (IEEE), 2023. (DOI).
- Silent Stores in the Battery-less Internet of Things: A Good Idea? 2023.
- Speculative inter-thread store-to-load forwarding in SMT architectures. In Journal of Parallel and Distributed Computing, volume 173, pp 94-106, Elsevier, 2023. (DOI, Fulltext).
- Analysing software prefetching opportunities in hardware transactional memory. In Journal of Supercomputing, volume 78, number 1, pp 919-944, Springer Nature, 2022. (DOI).
- Clueless: A Tool Characterising Values Leaking as Addresses. In Proceedings of the 11th International Workshop on Hardware and Architectural Support for Security And Privacy, HASP 2022, pp 27-34, Association for Computing Machinery (ACM), 2022. (DOI, Fulltext, fulltext:print).
- Data-Out Instruction-In (DOIN!): Leveraging Inclusive Caches to Attack Speculative Delay Schemes. In 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED 2022), pp 49-60, Institute of Electrical and Electronics Engineers (IEEE), 2022. (DOI).
- Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks. In ACM Transactions on Architecture and Code Optimization (TACO), volume 20, number 1, Association for Computing Machinery (ACM), 2022. (DOI, Fulltext, fulltext:print).
- Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores. In ACM Transactions on Architecture and Code Optimization (TACO), volume 19, number 2, Association for Computing Machinery (ACM), 2022. (DOI).
- Every Walk's a Hit: Making Page Walks Single-Access Cache Hits. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 28 – March 4, 2022, Lausanne, Switzerland, Association for Computing Machinery (ACM), 2022. (DOI, Fulltext, fulltext:postprint, fulltext:print).
- Faster Functional Warming with Cache Merging. 2022. (fulltext).
- Free Atomics: Hardware Atomic Operations without Fences. In Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), Conference Proceedings Annual International Symposium on Computer Architecture, pp 14-26, Association for Computing Machinery (ACM), 2022. (DOI).
- Splash-4: A Modern Benchmark Suite with Lock-Free Constructs. In 2022 IEEE International Symposium on Workload Characterization (IISWC), Proceedings of the IEEE International Symposium on Workload Characterization, pp 51-64, Institute of Electrical and Electronics Engineers (IEEE), 2022. (DOI).
- Supporting Dynamic Translation Granularity for Hybrid Memory Systems. In 2022 IEEE 40th International Conference on Computer Design (ICCD), Proceedings IEEE International Conference on Computer Design, pp 25-32, Institute of Electrical and Electronics Engineers (IEEE), 2022. (DOI).
- A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006. In ACM Transactions on Architecture and Code Optimization (TACO), volume 18, number 2, Association for Computing Machinery (ACM), 2021. (DOI).
- Do Not Predict – Recompute!: How Value Recomputation Can Truly Boost the Performance of Invisible Speculation. In 2021 International Symposium on Secure and Private Execution Environment Design (SEED), pp 89-100, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Early Address Prediction: Efficient Pipeline Prefetch and Reuse. In ACM Transactions on Architecture and Code Optimization (TACO), volume 18, number 3, Association for Computing Machinery (ACM), 2021. (DOI, Fulltext, fulltext:print).
- Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations. In Proceedings of 54th Annual IEEE/ACM International Symposium on Microarchitecture, Micro 2021, International Symposium on Microarchitecture Proceedings, pp 337-349, Association for Computing Machinery (ACM), 2021. (DOI).
- ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading. In Proceedings of 54th Annual IEEE/ACM International Symposium on Microarchitecture, Micro 2021, International Symposium on Microarchitecture Proceedings, pp 1296-1308, Association for Computing Machinery (ACM), 2021. (DOI).
- Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions. In IEEE Computer Architecture Letters, volume 20, number 2, pp 162-165, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Seeds of SEED: Preventing Priority Inversion in Instruction Scheduling to Disrupt Speculative Interference. In 2021 International Symposium on Secure and Private Execution Environment Design (SEED), pp 101-107, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- Splash-4: Improving Scalability with Lock-Free Constructs. In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2021), pp 235-236, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
- TSOPER: Efficient Coherence-Based Strict Persistency. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), International Symposium on High-Performance Computer Architecture : Proceedings, pp 125-138, Institute of Electrical and Electronics Engineers (IEEE), 2021. (DOI).
Full UART publications list.
Teaching and Recruiting
Please see Teaching and Applying for a PhD and MS Thesis Projects for more information.
Group History
The Uppsala Architecture Research Team (UART) was founded in 1999 when Professor Erik Hagersten (PhD from the Royal Institute of Technology) moved back to Sweden from his position as chief server architect at Sun Microsystems. For the first 10 years, UART did pioneering work in statistical cache modeling, leading to a successful commercialization of the technology. Professor Stefanos Kaxiras (PhD from Wisconsin) joined the group in 2010, moving from the University of Patras in Greece and bringing extensive experience in power efficiency and coherence. Professor David Black-Schaffer (PhD from Stanford) also joined in 2010, bringing heterogeneous runtime experience from his work on OpenCL at Apple. Professors Hagersten, Black-Schaffer, and Kaxiras, together with PhD student Andreas Sembrant, successfully commercialized their work in direct-to-data memory systems in the company Green Cache AB, whose IP was purchased in 2018. Associate Professor Alexandra Jimborean (PhD from University of Strasbourg) joined in 2012, bringing experience in compile-time and run-time code analysis and optimization. Since then, the group has grown to include multiple PhD students and postdocs.