April 16, 2021

[Translation] Improving Redis Performance Through Multi-Thread Processing

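A couple of days ago, while organizing my notes on NIO, I came across this article. It examines Redis performance from the perspective of the thread model and is short and to the point, so I translated it to share.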

Original article by Leona Zhang: https://dzone.com/articles/improving-redis-performance-through-multi-thread-p

Learn how to optimize the performance of a Redis database with multi-thread processing.

Redis is generally described as a single-process, single-thread model. That is not quite true: Redis also runs several background threads for cleanup work, such as cleaning up dirty data and closing file descriptors. (Translator's note: the file descriptors here presumably mean the fds held by socket connections.) The main thread handles the major tasks, including but not limited to receiving client connections, processing read/write events on those connections, parsing requests, executing commands, processing timer events, and synchronizing data. All of this work runs in a single process on a single thread, and therefore on a single CPU core.

For small packets, a single Redis server can process 80,000 to 100,000 queries per second (QPS). Higher loads are beyond the capacity of one server. The common solution is to partition the data across multiple servers in a distributed architecture.

However, this solution has drawbacks of its own: there are many Redis servers to manage; some commands that work on a single Redis server do not work across data partitions; partitioning cannot solve the hot-spot read/write problem; and data skew, redistribution, and scaling up or down become more complex. (Translator's note: data skew presumably means that data is not spread evenly across the cluster nodes, leaving some nodes isolated.) Because of the limits of the single-process, single-thread model, we would like to restructure Redis to use multiple threads, so that it takes full advantage of the SMP multi-core architecture and a single Redis server delivers higher throughput.
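
To illustrate the hot-spot drawback, the sketch below maps keys to shards with a simple hash-modulo scheme. The shard count, the key names, and the use of CRC32 are assumptions made for illustration, not a description of how any particular Redis deployment partitions data. Because a given key always hashes to the same shard, adding shards does not spread the traffic of one hot key.

```python
import zlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical number of Redis servers


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index with hash-modulo partitioning (illustrative)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_load(requests, num_shards=NUM_SHARDS):
    """Count how many requests each shard would receive."""
    return Counter(shard_for(key, num_shards) for key in requests)


# 1,000 requests, each for a different key.
uniform = [f"user:{i}" for i in range(1000)]
# 1,000 requests for one hot key: every one maps to the same shard.
hot = ["product:42"] * 1000

uniform_load = shard_load(uniform)
hot_load = shard_load(hot)
print("distinct keys:", dict(uniform_load))
print("one hot key:  ", dict(hot_load))
```

However many shards are configured, `hot_load` contains a single entry, so the server that owns the hot key still absorbs all of its traffic.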

The simplest way to make Redis multi-threaded is to have every thread perform both I/O and command processing. However, because Redis operates on complex data structures, such a design needs locks to ensure thread safety, and poorly chosen lock granularity can make performance worse. (Translator's note: chasing performance is always a double-edged sword.)

We suggest adding I/O threads instead: dedicated I/O threads read and write data on the connections, parse commands, and send reply packets, while a single thread still processes the commands and executes the timer events. In this way, the throughput of a single Redis server can be increased.

Single Process and Single Thread Model

Advantages

  • Because of the single-process, single-thread model, the Redis implementation breaks time-consuming operations (such as dict rehashing and expired-key deletion) into multiple steps and executes them one at a time. No single operation runs for long, so no single operation blocks the system for long.
  • Single-threaded code is simple to write, and it avoids the context switching and lock contention that come with multiple processes and threads.
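
The idea of breaking a long operation into small steps can be sketched as follows. This is a simplified model of incremental rehashing, not Redis's actual dict implementation; the class name `IncrementalDict`, the initial table size, and the one-bucket-per-operation step size are assumptions made for illustration.

```python
class IncrementalDict:
    """Toy hash table that migrates to a larger table one bucket per operation."""

    def __init__(self, size=4):
        self.old = [[] for _ in range(size)]  # table currently in use
        self.new = None                        # larger table, only during a rehash
        self.rehash_idx = 0                    # next bucket of `old` to migrate
        self.count = 0

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _rehash_step(self):
        """Move a single bucket from `old` to `new`: bounded work per call."""
        if self.new is None:
            return
        for k, v in self.old[self.rehash_idx]:
            self._bucket(self.new, k).append((k, v))
        self.old[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.old):   # migration finished
            self.old, self.new, self.rehash_idx = self.new, None, 0

    def set(self, key, value):
        self._rehash_step()
        if self.new is None and self.count >= len(self.old):
            self.new = [[] for _ in range(len(self.old) * 2)]  # start a rehash
        for table in (self.old, self.new):
            if table is None:
                continue
            bucket = self._bucket(table, key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)   # existing key: update in place
                    return
        target = self.new if self.new is not None else self.old
        self._bucket(target, key).append((key, value))
        self.count += 1

    def get(self, key, default=None):
        self._rehash_step()
        for table in (self.old, self.new):     # a key may live in either table
            if table is None:
                continue
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return default


d = IncrementalDict()
for i in range(100):
    d.set(f"key:{i}", i)
print(d.get("key:7"), d.get("missing"))
```

Each `set` or `get` pays for at most one bucket migration, so growing the table never stalls a single call for the whole rehash.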

Disadvantages

  • Only one CPU core can be used, so the advantages of multiple cores go unused.
  • For I/O-heavy applications, network I/O consumes a large share of CPU capacity. Applications that use Redis as a cache are often I/O-heavy: they typically have a high QPS and use relatively simple commands (such as get, set, and incr), but are sensitive to response time (RT). They also tend to use a lot of bandwidth, sometimes hundreds of megabits per second. With 10-Gbit and 25-Gbit network adapters now widespread, network bandwidth is no longer the bottleneck. The question is therefore how to exploit multiple cores and the full performance of the network adapter.

Multi-Thread Model and Implementation

Thread Model

There are three thread types:

  • Main thread
  • I/O thread
  • Worker thread

  • Main thread: Receives connections, creates clients, and forwards connections to the I/O thread.
  • I/O thread: Processes connection read/write events, parses commands, forwards each completely parsed command to the worker thread for processing, sends response packets, and closes connections.
  • Worker thread: Processes commands, generates client response packets, and executes timer events.
  • The main thread, I/O thread, and worker thread are each event-driven.
  • Threads exchange data through lock-free queues and send notifications through pipes.
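
The division of labor above can be sketched with Python's standard library. This is an illustrative model only, not Redis code: `queue.Queue` (which uses a lock internally) stands in for the lock-free queue, an `os.pipe` carries the wake-up notification, and the request format, function names, and in-memory `store` are all hypothetical.

```python
import os
import queue
import threading

commands = queue.Queue()        # parsed commands: I/O thread -> worker thread
replies = queue.Queue()         # response packets: worker thread -> I/O thread
notify_r, notify_w = os.pipe()  # wake-up channel for the worker thread
store = {}                      # data set, touched only by the worker thread


def worker_thread():
    """Execute commands one at a time; `store` needs no locking."""
    while True:
        os.read(notify_r, 1)    # block until the I/O thread notifies us
        cmd = commands.get()
        if cmd is None:         # shutdown sentinel
            break
        name, *args = cmd
        if name == "SET":
            store[args[0]] = args[1]
            replies.put("OK")
        elif name == "GET":
            replies.put(store.get(args[0]))
        else:
            replies.put(f"ERR unknown command {name}")


def io_thread(raw_requests, sent):
    """Parse raw requests, hand them to the worker, and send the replies."""
    for raw in raw_requests:
        parsed = raw.split()        # "parsing" here is whitespace splitting
        parsed[0] = parsed[0].upper()
        commands.put(parsed)
        os.write(notify_w, b"x")    # notify the worker through the pipe
        sent.append(replies.get())  # stand-in for writing to the socket
    commands.put(None)              # ask the worker to shut down
    os.write(notify_w, b"x")


# The main thread "accepts" one connection and hands it to an I/O thread.
sent = []
requests = ["set greeting hello", "get greeting", "get missing", "ping"]
worker = threading.Thread(target=worker_thread)
io = threading.Thread(target=io_thread, args=(requests, sent))
worker.start()
io.start()
io.join()
worker.join()
os.close(notify_r)
os.close(notify_w)
print(sent)
```

Because only the worker thread touches `store`, command execution stays single-threaded even though parsing and replying happen elsewhere.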

Benefits of the Multi-Thread Model

Increased Read/Write Performance

Stress test results indicate that read/write performance improves about threefold in the small-packet scenario.

[Figure: stress test results]

Increased Master/Slave Synchronization Speed

When the master sends synchronization data to a slave, the data is sent from the I/O thread. On the slave side, the full data set is read in the worker thread and the incremental data in the I/O thread. This speeds up synchronization.

Subsequent Tasks

The first task is to increase the number of I/O threads and optimize I/O read/write capability. Next, the worker thread can be broken up so that each thread performs both I/O reads and the work of the worker thread. (Translator's note: the work of the worker thread presumably means the business processing.)

Setting the Number of I/O Threads

  • Test results indicate that the number of I/O threads should not exceed 6. Beyond that, the worker thread becomes the bottleneck for simple operations.
  • The number of I/O threads must be set when the process starts and cannot be modified while the process is running. Under the current connection allocation policy, changing the number of I/O threads would require re-allocating connections, which is quite complex.
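
The source does not describe the connection allocation policy, so the sketch below assumes a simple modulo assignment purely for illustration. The function name `assign`, the connection ids, and the thread counts are hypothetical. Under that assumption, changing the thread count at runtime would move most existing connections to a different I/O thread.

```python
def assign(conn_ids, num_io_threads):
    """Map each connection to an I/O thread by id modulo the thread count."""
    return {conn: conn % num_io_threads for conn in conn_ids}


conn_ids = range(1000)        # hypothetical connection ids
before = assign(conn_ids, 4)  # process started with 4 I/O threads
after = assign(conn_ids, 6)   # mapping that 6 I/O threads would require

# Connections whose owning thread differs between the two mappings.
moved = [c for c in conn_ids if before[c] != after[c]]
print(f"{len(moved)} of {len(conn_ids)} connections would change threads")
```

Each moved connection would have to be handed over safely between event loops while requests are in flight, which is one way to read the complexity the article mentions.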

Considerations

With 10-Gbit and 25-Gbit network adapters now common, how to make full use of the hardware deserves careful consideration. Candidate technologies include multi-threaded network I/O and a kernel-bypass, user-mode protocol stack. (Translator's note: I am not entirely sure what this last sentence means.)

The I/O thread can be used to implement non-blocking data migration: the I/O thread encodes the data or forwards commands, while the target node decodes the data or executes the commands. (Translator's note: data migration here means transferring data between nodes.)

Link to this post: https://check321.net/post/improve_redis_by_muti_thread.html
