This paper describes the design and implementation of HERD, a keyvalue system designed to make the best use of an RDMA network. Unlike prior RDMA-based key-value systems, HERD focuses its design on reducing network round trips while using efﬁcient RDMA primitives; the result is substantially lower latency, and throughput that saturates modern, commodity RDMA hardware. HERD has two unconventional decisions: First, it does not use RDMA reads, despite the allure of operations that bypass the remote CPU entirely. Second, it uses a mix of RDMA and messaging verbs, despite the conventional wisdom that the messaging primitives are slow. A HERD client writes its request into the server’s memory; the server computes the reply. This design uses a single round trip for all requests and supports up to 26 million key-value operations per second with 5 µs average latency. Notably, for small key-value items, our full system throughput is similar to native RDMA read throughput and is over 2X higher than recent RDMA-based keyvalue systems. We believe that HERD further serves as an effective template for the construction of RDMA-based datacenter services.