
Memory Configuration for Intel Xeon 5500/5600 Series CPU Servers


http://www.xasun.com/article/2a/1991$3.html

1. With the Xeon 5500 series processors, Intel has diverged from its traditional Symmetric
Multiprocessing (SMP) architecture to a Non-Uniform Memory Access (NUMA) architecture. In a
two-processor scenario, the Xeon 5500 series processors are connected through a serial
coherency link called QuickPath Interconnect (QPI). The QPI is capable of 6.4, 5.6 or 4.8 GT/s
(gigatransfers per second), depending on the processor model. The Xeon 5500 series integrates
the memory controller within the processor, resulting in two memory controllers in a two-socket
system. Each memory controller has three memory channels and supports DDR-3 memory.
Depending on the processor model, the type of memory used, and the population of memory,
memory may be clocked at 1333MHz, 1066MHz or 800MHz. Each memory channel supports up to
3 DIMMs (DIMMs per channel, DPC), for a theoretical maximum of 9 DIMMs per processor or 18 per
2-socket server. (See Figure 1 for illustration.) However, the actual maximum number of DIMMs
per system depends on the system design.
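In the new 55xx-series Xeon CPUs, the original SMP architecture has been replaced by NUMA: the two CPUs no longer jointly manage a shared pool of memory. Instead, the memory controller is integrated into each CPU, each CPU manages 3 channels holding up to 9 DIMMs in total, and the CPUs are linked by QPI (think of it as an internal bus). The highest frequency the memory can reach depends on both the CPU and the DIMMs.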
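To make the topology concrete, the short Python sketch below models the two-socket layout described above (one integrated memory controller per socket, 3 DDR3 channels per controller, up to 3 DPC) and derives the theoretical DIMM limits. The class name `XeonNumaTopology` is hypothetical and used only for illustration; a real board may expose fewer slots than these limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeonNumaTopology:
    """Two-socket Xeon 5500 layout as described in the text: one integrated
    memory controller per socket, 3 DDR3 channels per controller, and up to
    3 DIMMs per channel (DPC)."""
    sockets: int = 2
    channels_per_socket: int = 3
    max_dpc: int = 3

    def memory_controllers(self) -> int:
        # One memory controller is integrated into each processor.
        return self.sockets

    def max_dimms_per_socket(self) -> int:
        return self.channels_per_socket * self.max_dpc

    def max_dimms_per_system(self) -> int:
        # Theoretical limit only; the system design may provide fewer slots.
        return self.sockets * self.max_dimms_per_socket()


topo = XeonNumaTopology()
print(topo.memory_controllers(), topo.max_dimms_per_socket(), topo.max_dimms_per_system())
```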

2. Memory Performance


With the varied number of configurations possible in the Xeon 5500 series processor-based
systems, a number of variables emerge that influence processor/memory performance. The main
variables are memory speed, memory interleaving, memory ranks and memory population across
various memory channels and processors. Depending on the processor model and the number of
DIMMs, memory performance on the Xeon 5500 platform can vary widely. We will look at each of
these factors more closely in the next sections.
The factors most closely tied to memory performance include the CPU model, the number of DIMMs installed per channel, the performance of the DIMMs themselves, how memory is interleaved, the number of memory ranks, and so on.
2.1 Memory Speed
As mentioned earlier, the memory speed is determined by the combination of the processor
model, DIMM speed, and DIMMs per channel.
2.1.1 Processor model
The initial Xeon 5500 series processor-based offerings will be categorized into 3 bins called
Performance, Volume and Value. The 3 bins have the ability to clock memory at different
maximum speeds:
• 1333MHz (X55xx processor models)
• 1066MHz (E552x or L552x and up)
• 800MHz (E550x)
So, the processor model limits the maximum memory frequency. Note: Because of the
integrated memory controllers, the former front-side bus (FSB) no longer exists.
With the memory controller integrated into the CPU, the FSB is gone (there is no longer any concept of a front-side bus, which is consistent with AMD processors).
2.1.2 DDR3 DIMM Speed
DDR-3 memory will be available in various sizes at speeds of 1333MHz and 1066MHz. 1333MHz
represents the maximum capability at which memory can be clocked. However, the memory will
not be clocked faster than the capability of the processor model and will be clocked appropriately
by the BIOS.
2.1.3 DIMMs per Channel (DPC)
The number and type of DIMMs and the channels in which they reside will also determine the
speed at which memory will be clocked. Table 1 describes the behavior of the platform. The table
below assumes a 1333MHz-capable processor model (X55xx). If a slower processor model is
used, the memory will run at the lower of the speed shown in the table and the processor
model's maximum memory speed. If the DPC is not uniform across all the channels, the system will
clock to the frequency of the slowest channel.
The memory clock depends on the number of DIMMs used in each channel; see the table below for details.

Table 1
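The speed-selection rules above (processor bin, DIMM rating, DPC, slowest channel wins) can be sketched as a small Python function. This is an illustration, not the BIOS algorithm: Table 1 is not reproduced here, so the DPC-to-speed mapping (1 DPC at 1333MHz, 2 DPC at 1066MHz, 3 DPC at 800MHz for single- or dual-rank DIMMs) is an assumption inferred from the 6/12/18-DIMM examples in section 2.2, and quad-rank DIMMs are not modeled. The function name `effective_memory_mhz` is hypothetical.

```python
# Maximum memory clock per processor bin (section 2.1.1), in MHz.
PROCESSOR_CAP_MHZ = {"X55xx": 1333, "E552x/L552x": 1066, "E550x": 800}

# ASSUMPTION: DPC -> maximum clock for single/dual-rank DIMMs, inferred from the
# 6/12/18-DIMM population examples; quad-rank DIMMs are not modeled.
DPC_CAP_MHZ = {1: 1333, 2: 1066, 3: 800}


def effective_memory_mhz(processor_bin: str, dimm_speed_mhz: int,
                         dimms_per_channel: list[int]) -> int:
    """Estimate the single clock the whole system runs its memory at."""
    populated = [dpc for dpc in dimms_per_channel if dpc > 0]
    if not populated:
        raise ValueError("no DIMMs populated")
    # If DPC differs between channels, the slowest channel sets the clock.
    channel_cap = min(DPC_CAP_MHZ[dpc] for dpc in populated)
    # Never faster than the processor bin or the DIMMs' rated speed.
    return min(PROCESSOR_CAP_MHZ[processor_bin], dimm_speed_mhz, channel_cap)


# Six channels (2 sockets x 3 channels), each given as its DIMM count.
print(effective_memory_mhz("X55xx", 1333, [1] * 6))             # 6 DIMMs: 1333
print(effective_memory_mhz("X55xx", 1333, [2] * 6))             # 12 DIMMs: 1066
print(effective_memory_mhz("X55xx", 1333, [1, 1, 1, 1, 1, 2]))  # uneven DPC: 1066
print(effective_memory_mhz("E550x", 1333, [1] * 6))             # slow bin: 800
```

The third call shows the "slowest channel" rule: a single channel with 2 DIMMs pulls every channel down to the 2-DPC speed.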

2.1.4 Low-level Performance Specifics
It is important to understand how memory speed affects the performance of the Xeon 5500 series
platform. We will use both low-level memory tools and application
benchmarks to quantify the impact of memory speed.
Key metrics for memory performance: latency and throughput
Two of the key low-level metrics that are used to measure memory performance are memory
latency and memory throughput. We use a base Xeon 5500 2.93GHz, 1333MHz-capable 2-
socket system for this analysis. The memory configurations for the three memory speeds in the
following benchmarks are as follows:
• 1333MHz – 6 x 4GB dual-rank 1333MHz DIMMs
• 1066MHz – 12 x 2GB dual-rank DIMMs for 1066MHz
• 800MHz – 12 x 2GB dual-rank DIMMs clocked down to 800MHz in BIOS
Note: Memory ranks are explained in detail in section 2.3.
Table 2 below shows the unloaded latency to local memory. The unloaded
latency is measured at the application level and is designed to defeat processor prefetch
mechanisms. As Table 2 shows, the difference between the fastest and slowest speeds is
about 10%. This represents the high watermark for latency-sensitive workloads. Another
important thing to note is that this is almost a 50% decrease in memory latency compared
with the previous-generation Xeon 5400 series processors on 5000P chipset platforms.
Memory latency: running at 1333MHz instead of 800MHz improves latency by only about 10%, but the 55xx series cuts latency by almost 50% compared with the 54xx series.

Table 2

A better indicator of application performance is memory throughput. We use the triad component
of the STREAM benchmark to compare the performance at different memory speeds. The
throughput measurement assumes that all memory is allocated locally and that all 8 cores use
main memory. As shown in Table 3, the performance gain from running memory at 1066MHz versus
800MHz is 28%, and the performance gain from running at 1333MHz versus 1066MHz is 9%. So, the
performance penalty of clocking memory down to 800MHz is far greater than clocking it down to 1066MHz.
This new processor design comes with some trade-offs in memory capacity, performance, and
cost. For example, more lower-cost/lower-capacity DIMMs mean lower memory speed.
Alternatively, fewer higher-capacity DIMMs cost more but offer higher performance.

Note that lowering the memory clock from 1333MHz to 1066MHz costs far less than lowering it from 1066MHz to 800MHz.
Table 3
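As a quick arithmetic check, the two measured STREAM triad steps can be compounded and compared with the raw clock increase of each step. The sketch below uses only the 28% and 9% figures quoted above; the roughly 40% total it yields is consistent with the "almost 40%" bandwidth improvement cited for 800MHz to 1333MHz in section 2.1.5.

```python
# Measured STREAM triad gains quoted in the paragraph above.
gain_800_to_1066 = 0.28
gain_1066_to_1333 = 0.09

# Successive relative gains multiply rather than add.
total_gain = (1 + gain_800_to_1066) * (1 + gain_1066_to_1333) - 1
print(f"800 -> 1333MHz bandwidth gain: {total_gain:.1%}")

# Compare each measured step with the raw increase in memory clock.
steps = [(800, 1066, gain_800_to_1066), (1066, 1333, gain_1066_to_1333)]
for low, high, measured in steps:
    clock_gain = high / low - 1
    print(f"{low} -> {high}MHz: clock +{clock_gain:.0%}, bandwidth +{measured:.0%}")
```

The first step turns most of its roughly 33% clock increase into bandwidth (28%), while the second turns only 9% of a roughly 25% clock increase into bandwidth, which is why the 800MHz penalty is described as far greater.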


Regardless of memory speed, the Xeon 5500 platform represents a significant improvement in
memory bandwidth over the previous Xeon 5400 platform. At 1333MHz, the improvement is
almost 500% over the previous generation. This huge improvement is mainly due to the dual
integrated memory controllers and faster DDR-3 1333MHz memory. This improvement translates
into improved application performance and scalability.
The Xeon 55xx series improves memory bandwidth by nearly 500% over the previous 54xx series.
2.1.5 Application Performance

In this section, we will discuss the impact of memory speed on the performance of three
commonly used benchmarks: SPECint®2006_rate, SPECfp®2006_rate and SPECjbb®2005. In
each case, the benchmark scores are relative to the score at 800MHz, as shown in Table 4.
SPECint2006_rate is typically used as an indicator of performance for commercial applications. It
tends to be more sensitive to processor frequency and less to memory bandwidth. There are very
few components in SPECint2006_rate that are memory bandwidth intensive and so the
performance gain with memory speed improvements is the least for this workload. In fact, most of
the difference observed is due to one of the sub-benchmarks that shows a high sensitivity to
memory frequency. There is an 8% improvement going from 800MHz to 1333MHz while the
improvement in memory bandwidth is almost 40%.
SPECfp_rate is used as an indicator for HPC (high-performance computing) workloads. It tends
to be memory bandwidth intensive and should reveal significant improvements for this workload
as memory frequency increases. As expected, a number of sub-benchmarks demonstrate
improvements as high as the difference in memory bandwidth. As shown in Table 4, there is a
13% gain going from 800MHz to 1066MHz and another 6% improvement with 1333MHz.
SPECfp_rate captures almost 50% of the memory bandwidth improvement.
SPECjbb2005 is a workload that does not stress memory but keeps the data bus moderately
utilized. This workload provides a middle ground and the performance gains reflect that trend. As
shown in Table 4, there is an 8% gain from 800MHz to 1066MHz and another 2% upside with
1333MHz.
Table 4
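The "share of the bandwidth gain" idea in this section can be made explicit by compounding each benchmark's per-step gains and dividing by the compounded STREAM triad gain. The sketch below uses only the percentages quoted in sections 2.1.4 and 2.1.5; `STEP_GAINS` and `compound` are hypothetical names. It agrees with the text's reading that SPECfp_rate captures roughly half of the bandwidth improvement, while SPECint2006_rate and SPECjbb2005 capture much less.

```python
# Per-step gains quoted in sections 2.1.4 and 2.1.5: (800->1066MHz, 1066->1333MHz).
STEP_GAINS = {
    "STREAM triad": (0.28, 0.09),
    "SPECfp_rate": (0.13, 0.06),
    "SPECjbb2005": (0.08, 0.02),
}
# SPECint2006_rate is quoted only end to end (800->1333MHz).
SPECINT_TOTAL_GAIN = 0.08


def compound(steps):
    """Combine successive relative gains into a single overall gain."""
    total = 1.0
    for gain in steps:
        total *= 1 + gain
    return total - 1


bandwidth_gain = compound(STEP_GAINS["STREAM triad"])

totals = {name: compound(steps) for name, steps in STEP_GAINS.items()}
totals["SPECint2006_rate"] = SPECINT_TOTAL_GAIN

for name, gain in totals.items():
    share = gain / bandwidth_gain  # fraction of the memory-bandwidth gain captured
    print(f"{name:18s} +{gain:6.1%}  ({share:.0%} of the bandwidth gain)")
```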


2.2 Memory Interleaving
Memory interleaving refers to how physical memory is interleaved across the physical DIMMs. A
balanced system provides the best interleaving. A Xeon 5500 series processor-based system is
balanced when all memory channels on a socket have the same amount of memory. The
simplest way to enforce optimal interleaving is by populating 6 identical DIMMs at 1333MHz, 12
identical DIMMs at 1066MHz and 18 identical DIMMs (where supported by platform) at 800MHz.
An unbalanced system reduces interleaving, which leads to lessened performance. Table 6 shows
the impact of reduced interleaving. The first configuration is a balanced baseline configuration
where the memory is down-clocked to 800MHz in BIOS. The second configuration populates four
channels with 50% more memory than the two other channels, causing an unbalanced configuration.
The third configuration balances the memory on all channels by populating the channels with fewer
DIMM slots with a DIMM that is double the capacity of the others. (For example, two channels with
3 x 4GB DIMMs and one channel with 1 x 4GB and 1 x 8GB DIMMs.) This ensures that all channels
have the same capacity. As Table 6 shows, the first and third balanced configurations significantly
outperform the unbalanced configuration. Depending on the memory footprint of the application and
memory access pattern, the impact could be higher or lower than for the two applications cited in the figure.
Note: the more DIMMs are installed, the lower the memory clock; 12 DIMMs run at 1066MHz and 18 DIMMs at 800MHz. See Table 7 for details.
Table 6, Table 7
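The balance rule can be checked mechanically. The Python sketch below (function names hypothetical) represents each socket as three channels, each channel as a list of DIMM sizes in GB, and tests whether every channel on a socket holds the same capacity and both sockets hold the same total. The two example systems mirror the second and third configurations described above, assuming a board where the third channel of each socket has only two DIMM slots.

```python
def channel_capacities_gb(socket):
    """Total GB in each channel of one socket (a channel is a list of DIMM sizes)."""
    return [sum(channel) for channel in socket]


def is_balanced(system):
    """True if every channel on each socket holds the same capacity and
    both sockets hold the same total capacity."""
    per_socket = [channel_capacities_gb(socket) for socket in system]
    channels_equal = all(len(set(caps)) == 1 for caps in per_socket)
    sockets_equal = len({sum(caps) for caps in per_socket}) == 1
    return channels_equal and sockets_equal


# Both sockets populated identically; each channel listed as DIMM sizes in GB.
unbalanced = [[[4, 4, 4], [4, 4, 4], [4, 4]]] * 2  # 12, 12, 8 GB: four channels 50% larger
rebalanced = [[[4, 4, 4], [4, 4, 4], [4, 8]]] * 2  # 12, 12, 12 GB: all channels equal

print(channel_capacities_gb(unbalanced[0]), is_balanced(unbalanced))
print(channel_capacities_gb(rebalanced[0]), is_balanced(rebalanced))
```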



2.3 Memory Ranks

A memory rank is simply a segment of memory that is addressed by a specific address bit.
DIMMs typically have 1, 2 or 4 memory ranks, as indicated by their size designation.
• A typical memory DIMM description: 2GB 4R x8 DIMM
• The 4R designator is the rank count for this particular DIMM (R for rank = 4)
• The x8 designator is the data width of each DRAM device that makes up the rank
It is important to ensure that DIMMs with the appropriate number of ranks are populated in each
channel for optimal performance. Whenever possible, it is recommended to use dual-rank DIMMs
in the system. Dual-rank DIMMs offer better interleaving and hence better performance than
single-rank DIMMs. For instance, a system populated with 6 x 2GB dual-rank DIMMs outperforms
a system populated with 6 x 2GB single-rank DIMMs by 7% for SPECjbb2005. Dual-rank DIMMs
are also better than quad-rank DIMMs because quad-rank DIMMs will cause the memory speed
to be down-clocked.
Another important guideline is to populate equivalent ranks per channel. For instance, mixing
single-rank and dual-rank DIMMs in a channel should be avoided.
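The rank count is a property of how a DIMM is built, and each channel supports only a limited total number of ranks, so in practice you should balance memory capacity against memory frequency. Dual-rank DIMMs are usually recommended.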
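The DIMM designation format and the per-channel rank guidelines can be sketched as follows. The parser only accepts strings shaped like the "2GB 4R x8" example above, and the warning texts paraphrase this section; `parse_dimm` and `channel_rank_warnings` are hypothetical helpers, not part of any vendor tool.

```python
import re
from typing import NamedTuple


class DimmSpec(NamedTuple):
    size_gb: int
    ranks: int
    width: int  # data width of each DRAM device on the DIMM (4 for x4, 8 for x8)


# Accepts designations shaped like "2GB 4R x8" (size, rank count, device width).
DIMM_PATTERN = re.compile(r"^\s*(\d+)\s*GB\s+(\d)R\s+x(\d+)\s*$", re.IGNORECASE)


def parse_dimm(designation: str) -> DimmSpec:
    match = DIMM_PATTERN.match(designation)
    if match is None:
        raise ValueError(f"unrecognized DIMM designation: {designation!r}")
    size_gb, ranks, width = (int(group) for group in match.groups())
    return DimmSpec(size_gb, ranks, width)


def channel_rank_warnings(channel: list[str]) -> list[str]:
    """Check the DIMMs in one channel against the rank guidelines above."""
    dimms = [parse_dimm(d) for d in channel]
    warnings = []
    if len({d.ranks for d in dimms}) > 1:
        warnings.append("mixed rank counts in one channel")
    if any(d.ranks == 1 for d in dimms):
        warnings.append("single-rank DIMMs interleave less well than dual-rank")
    if any(d.ranks == 4 for d in dimms):
        warnings.append("quad-rank DIMMs cause memory to be down-clocked")
    return warnings


print(parse_dimm("2GB 4R x8"))
print(channel_rank_warnings(["2GB 2R x8", "2GB 2R x8"]))
print(channel_rank_warnings(["2GB 1R x8", "2GB 2R x8"]))
```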
2.4 Memory Population across Memory Channels
It is important to ensure that all three memory channels in each processor are populated. The
relative memory bandwidth is shown in Table 8, which illustrates the loss of memory bandwidth
as the number of channels populated decreases. This is because the bandwidth of all the
memory channels is utilized to support the capability of the processor. So, as the channels are
decreased, the burden to support the requisite bandwidth is increased on the remaining channels,
causing them to become a bottleneck.
Table 8
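To see why leaving channels empty hurts, the sketch below computes the theoretical peak bandwidth of a two-socket system with 1, 2 or 3 channels populated per socket. It assumes the usual 64-bit (8-byte) DDR3 data path per channel and treats "1333MHz" as 1333 million transfers per second; these peaks are upper bounds, and measured bandwidth will be lower. `peak_bandwidth_gbs` is a hypothetical helper.

```python
BYTES_PER_TRANSFER = 8  # 64-bit DDR3 data path per channel (ECC bits excluded)


def peak_bandwidth_gbs(mt_per_s: int, channels_per_socket: int, sockets: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_s = mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels_per_socket * sockets
    return bytes_per_s / 1e9


full = peak_bandwidth_gbs(1333, channels_per_socket=3)
for channels in (3, 2, 1):
    bw = peak_bandwidth_gbs(1333, channels_per_socket=channels)
    print(f"{channels} channel(s) per socket: {bw:5.1f} GB/s peak ({bw / full:.0%} of full)")
```

Peak bandwidth scales linearly with the number of populated channels, so each empty channel removes a third of a socket's ceiling and shifts the load onto the remaining channels.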

2.5 Memory Population Across Processor Sockets
Because the Xeon 5500 series uses NUMA architecture, it is important to ensure that both
memory controllers in the system are utilized, by providing both processors with memory. If only
one processor is installed, only the associated DIMM slots can be used. Adding a second
processor not only doubles the amount of memory available for use, but also doubles the number
of memory controllers, thus doubling the system memory bandwidth. It is also optimal to populate
memory for both processors in an identical fashion to provide a balanced system. Using Table 9
as an example, Processor 0 has DIMMs populated but no DIMMs are populated for Processor 1.
In this case, Processor 0 will have access to low-latency local memory and high memory
bandwidth. However, Processor 1 has access only to remote or “far” memory. So, threads
executing on Processor 1 will have a long latency to access memory as compared to threads on
Processor 0.
This is due to the latency penalty incurred to traverse the QPI links to access the data on the
remote memory controller. The latency to access remote memory is almost 75% higher than local
memory access. The bandwidth to remote memory is also limited by the capability of the QPI
links. So, the goal should be to always populate both processors with memory.
Table 9
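The effect of remote access on average latency can be estimated with a simple weighted model built on the "almost 75% higher" figure above. This is a simplification that ignores QPI bandwidth limits, queuing and caching; `relative_latency` is a hypothetical helper, and the 10% remote share in the last line is an assumed value for illustration only.

```python
REMOTE_LATENCY_FACTOR = 1.75  # remote access is "almost 75% higher" than local


def relative_latency(remote_fraction: float) -> float:
    """Average memory latency relative to all-local access, for a thread whose
    memory accesses go to remote memory with probability remote_fraction.
    Simplified linear model: ignores QPI bandwidth limits, queuing and caching."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1 - remote_fraction) * 1.0 + remote_fraction * REMOTE_LATENCY_FACTOR


# Only Processor 0 has DIMMs: its threads are all local, Processor 1's all remote.
print("thread on Processor 0:", relative_latency(0.0))
print("thread on Processor 1:", relative_latency(1.0))
# Both sockets populated; ASSUMED 10% of accesses still go remote.
print("both sockets populated:", round(relative_latency(0.10), 3))
```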

3.0 Best Practices

(Optimal configuration methods)
In this section, we recap the rules to follow for optimal memory configuration on
Xeon 5500-based platforms.
3.1 Maximum Performance
Follow these rules for peak performance:
• Always populate both processors with equal amounts of memory to ensure a balanced
NUMA system. (Use the same memory capacity on both CPUs.)
• Always populate all 3 memory channels on each processor with equal memory capacity.
(Use the same memory capacity on all 3 channels of each CPU.)
• Ensure that an even number of ranks is populated per channel.
(Populate an appropriate number of ranks on each channel.)
• Use dual-rank DIMMs whenever appropriate.
(Use dual-rank DIMMs where possible.)
• For optimal 1333MHz performance, populate 6 dual-rank DIMMs (3 per processor).
• For optimal 1066MHz performance, populate 12 dual-rank DIMMs (6 per processor).
• For optimal 800MHz performance with high DIMM counts:
– On 12 DIMM platforms, populate 12 dual-rank or quad-rank DIMMs (6 per processor).
– On 16 DIMM platforms, either populate 12 dual-rank or quad-rank DIMMs (6 per processor),
or populate 14 dual-rank DIMMs of one size and 2 dual-rank DIMMs of double the size,
as described in the interleaving section.
• With the above rules, it is not possible to have a performance-optimized system with 4GB,
8GB, 16GB, or 128GB. With 3 memory channels and interleaving rules, customers need to
configure systems with 6GB, 12GB, 18GB, 24GB, 48GB, 72GB, 96GB, etc., for optimized
performance.
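The capacity rule in the last bullet follows from requiring equal capacity on all 6 channels (2 processors x 3 channels). The sketch below checks a total capacity against that rule and, when it divides evenly, searches for up to 3 DIMMs per channel that reach the per-channel target. The set of DIMM sizes (1, 2, 4, 8 and 16GB) is an assumption, and the check ignores the speed and rank rules above, so an "ok" result may still require mixing DIMM sizes within a channel.

```python
from itertools import combinations_with_replacement

CHANNELS = 6                      # 2 processors x 3 channels
MAX_DPC = 3                       # up to 3 DIMMs per channel
DIMM_SIZES_GB = (1, 2, 4, 8, 16)  # ASSUMPTION: DIMM sizes available


def balanced_channel_options(total_gb: int) -> list[tuple[int, ...]]:
    """DIMM combinations (per channel) that give all 6 channels the same
    capacity for total_gb, or an empty list if no such split exists."""
    if total_gb % CHANNELS:
        return []  # cannot be divided evenly across the 6 channels
    per_channel = total_gb // CHANNELS
    options = []
    for count in range(1, MAX_DPC + 1):
        for combo in combinations_with_replacement(DIMM_SIZES_GB, count):
            if sum(combo) == per_channel:
                options.append(combo)
    return options


for total in (4, 8, 16, 128, 6, 12, 18, 24, 48, 72, 96):
    options = balanced_channel_options(total)
    verdict = f"ok, e.g. {options[0]} per channel" if options else "not balanced"
    print(f"{total:4d}GB: {verdict}")
```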
3.2 Other Considerations
3.2.1 Plugging Order
Take care to populate empty DIMM sockets in the specific order for each platform when adding
DIMMs to Xeon 5500 series platforms. The DIMM socket farthest away from its associated
processor, per memory channel, is always plugged first. Consult the documentation for your
specific system for details.
3.2.2 Power Guidelines
This document is focused on maximum performance configuration for Xeon 5500 series
processor-based systems. Here are a few power guidelines for consideration:
• Fewer, larger DIMMs (for example, 6 x 4GB DIMMs vs. 12 x 2GB DIMMs) will generally have
lower power requirements
• x8 DIMMs (built from x8-wide DRAM devices; see section 2.3) will generally draw less power than
equivalently sized x4 DIMMs
• Consider BIOS configuration settings (see section 3.2.4)
3.2.3 Reliability
Here are two reliability guidelines for consideration:
• Using fewer, larger DIMMs (for example, 6 x 4GB DIMMs vs. 12 x 2GB DIMMs) is generally
more reliable
• Xeon 5500 series memory controllers support IBM Chipkill™ memory protection technology
with x4 DIMMs (built from x4-wide DRAM devices; see section 2.3), but not with x8 DIMMs

3.2.4 BIOS Configuration Settings
There are a number of BIOS configuration settings on servers using the Xeon 5500 series
processors that can also affect memory performance or benchmark results. For example, most
platforms allow the option of decreasing the memory clock speed below the supported maximum.
This may be useful for power savings but, obviously, decreases memory performance.
Meanwhile, options like Hyper-Threading Technology (Intel's implementation of simultaneous
multithreading) and Turbo Boost Technology can also significantly affect benchmark results. Specific
memory configuration settings important to performance include:
Table 10

Original authors:
Ganesh Balakrishnan
IBM System x and BladeCenter Performance
Ralph M. Begun
IBM System x Development
