The Intel EM64T direct competitors

Higino Augusto Marinho Vieira da Cunha e Costa
Departamento de Informática, Universidade do Minho
[email protected]
Abstract. When 64 bit computing arrived on the x86 architecture, there were
two processor technologies compatible with IA32: AMD64 and EM64T. Both
have almost identical instruction sets, but they differ internally. This paper
shows the main differences between AMD64 and EM64T.
1 Introduction
When the first 32 bit processors of the x86 family were released, the 4 GB memory
limit appeared to be more than enough for any requirement.
As time passed, the constant growth of databases and other memory hungry
applications made the 4 GB per process memory limit tight, requiring an
alternative to the 32 bit architecture.
The first approach was made by Intel, which introduced the IA64 architecture with the
Itanium. It had a completely new instruction set and broke compatibility with
IA32. AMD took a different approach with the Opteron, making the new processor
capable of executing 32 bit code natively by extending the existing IA32 instruction
set to 64 bits. AMD called it x86-64 and later renamed it AMD64. It was the
first time a company other than Intel made successful changes to the x86 architecture.
Due to the success of AMD64, Intel developed EM64T and integrated this
technology into the Xeon family of processors.
The advantages of the 64 bit architecture are not limited to the amount of memory
a process can address. Applications that use integers of 64 or more bits require fewer
clock cycles to complete the same operation, because a 64 bit operation no longer
has to be split into several 32 bit operations. Typical examples are scientific
calculation and cryptographic algorithms.
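The saving in clock cycles can be illustrated with a minimal sketch. It assumes a machine that can only add 32 bit values, so a 64 bit addition needs two additions plus carry handling, while a 64 bit machine needs one. The function names are hypothetical and the code models the arithmetic only, not any real instruction timing.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_with_32bit_ops(a, b):
    """Add two unsigned 64 bit values using only 32 bit wide pieces."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First 32 bit addition: the low halves, which may produce a carry.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Second 32 bit addition: the high halves plus the carry.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


def add64_native(a, b):
    """The same addition expressed as a single 64 bit wide operation."""
    return (a + b) & MASK64


if __name__ == "__main__":
    x, y = 0x00000001FFFFFFFF, 0x0000000000000001
    print(hex(add64_with_32bit_ops(x, y)))
    print(hex(add64_native(x, y)))
```

Both functions return the same value. The difference is the number of narrow operations needed, which grows further for multiplication and for integers wider than 64 bits.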
Even running 32 bit applications on 64 bit operating systems has benefits. In
March 2004, Microsoft moved all its web servers to Opterons running a prerelease
of Windows 2003 x64 Edition, initially keeping the applications unmodified [5].
With 32 bits, the operating system and the applications had to share 4 GB of
memory. With 64 bits, this operating system can address up to 8 terabytes of memory
for the kernel and 8 terabytes of memory for user processes. This way, each 32 bit
application can use 4 GB of memory without interfering with the kernel memory.
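The figures above follow from simple powers of two. The sketch below assumes that GB and TB are binary units (2^30 and 2^40 bytes); the helper names are hypothetical.

```python
GIB = 2**30  # bytes in one gigabyte, assuming binary units
TIB = 2**40  # bytes in one terabyte, assuming binary units


def address_space_bytes(address_bits):
    """Bytes addressable with the given number of address bits."""
    return 2**address_bits


def address_bits_for(size_bytes):
    """Address bits needed for a region whose size is a power of two."""
    return size_bytes.bit_length() - 1


# A 32 bit pointer reaches 4 GB.
print(address_space_bytes(32) // GIB)   # 4

# An 8 TB region corresponds to 43 address bits.
print(address_bits_for(8 * TIB))        # 43

# Number of full 4 GB address spaces that fit in 8 TB of user space.
print((8 * TIB) // (4 * GIB))           # 2048
```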
2 64 Bit Processors Compatible with the IA32 Instruction Set
Currently there are two major 64 bit processor technologies able to run
32 bit code natively.
The first to be released was AMD64, as an answer to Intel's initial approach
to 64 bit processors. The main concern of AMD was to make the processor
compatible with the existing 32 bit architecture [2].
After the initial success of AMD64, Intel decided to follow the AMD approach,
implementing the AMD64 instruction set and calling it EM64T (Extended Memory 64
Technology), later renamed Intel 64.
The instruction sets of both processors are almost identical, and in most cases each
is able to run the other's code. Compilers avoid the few instructions that differ, so
the same binary code can run on both.
The name x86-64 was given by AMD, but is currently used as a vendor-neutral
way to describe this family of processors.
2.1 AMD64
As explained before, because the new instruction set of the Itanium was not compatible
with previous processors of the x86 family, AMD decided to build its own 64 bit
processor compatible with IA32.
The AMD64 is a 64 bit processor: it has 64 bit general purpose registers and
the corresponding logical and arithmetic operations. Memory pointers are also 64 bits
wide.
In addition, the number of general purpose registers was extended from eight to
sixteen, all 64 bits wide. There are also eight new SSE registers (128 bits wide)
(pages 2 to 7 of [7]).
The No-Execute bit, already available in systems that use PAE (Physical Address
Extension), was implemented. This mechanism defines whether a block of memory can
contain code or only data (page 143 of [8]).
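As an illustration of how such a bit can be tested, the sketch below treats a page table entry as a 64 bit integer and assumes the layout in which bit 0 is the present bit and bit 63 is the No-Execute bit. The function names are hypothetical, and real operating system code manipulates these entries in kernel structures rather than in Python integers.

```python
PRESENT_BIT = 1 << 0      # assumed: entry maps a page when this bit is set
NO_EXECUTE_BIT = 1 << 63  # assumed: instruction fetches are refused when set


def is_executable(page_table_entry):
    """True if the page is mapped and code may be fetched from it."""
    mapped = (page_table_entry & PRESENT_BIT) != 0
    no_execute = (page_table_entry & NO_EXECUTE_BIT) != 0
    return mapped and not no_execute


def mark_data_only(page_table_entry):
    """Return a copy of the entry with the No-Execute bit set."""
    return page_table_entry | NO_EXECUTE_BIT


if __name__ == "__main__":
    code_page = 0x0000000000200001           # present, executable
    data_page = mark_data_only(code_page)    # present, data only
    print(is_executable(code_page), is_executable(data_page))
```

A stack or heap page marked this way cannot be used to run injected code, which is the protection the mechanism is meant to provide.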
The processor knows whether it is working in 64 bit or 32 bit by its operating mode
[Table 1].
In legacy mode, it acts like any other 32 bit processor of the x86 family. There
is no significant performance gain in this mode compared to an
equivalent 32 bit processor.
In compatibility mode, the processor requires a 64 bit operating system, but is
able to execute 32 bit code. All the registers are seen by the application as 32 bit
registers [Fig. 1]. The major improvement in this mode is that each application can
have 4 GB of memory.
In long mode the processor works as a true 64 bit processor. Both the operating
system and the application have to be 64 bit. Each application has access to 1 TB of
memory and to the registers and instructions added by this architecture.
Table 1. AMD64 Operating Modes (source: page 3 of [7])
Fig. 1. AMD64 Register Set (source: page 2 of [7])
3 Differences between AMD64 and EM64T processors
Comparing the instructions of AMD64 and EM64T, it can be seen that Intel
reverse engineered the AMD64 instruction set [6]. EM64T was based on
AMD's prerelease documentation, as two instructions that EM64T did not initially
implement were also absent from early AMD documentation.
As the Intel processor was based on the AMD64 instruction set and not on the
hardware itself, the way the two processors decode and execute those instructions is
different.
3.1 Pipelines
The pipelines (chapter 3 of [1]) in a processor allow it to split instructions into
smaller micro operations and run them concurrently, so the processor can execute
more than one micro operation at the same time (pages 9-10 of [3]).
The way the pipeline is designed directly affects the processor clock frequency.
The longer the pipeline, the less work each stage performs and the higher the clock
frequency can be, but each instruction requires more clock cycles to complete.
The problems with long pipelines are cache misses and branch
mispredictions. A cache miss occurs when the data a micro operation needs is not in
the cache, so the processor has to wait until it arrives from memory. A similar stall
occurs when a micro operation needs the result of another micro operation that has
not finished yet. A misprediction occurs when the processor guesses the outcome of a
branch incorrectly: the micro operations already in the pipeline belong to the wrong
path, so the pipeline has to be flushed and restarted.
The EM64T pipeline is longer than the AMD64 one (31 stages against 12) (pages 11-12
of [3]). Because the AMD pipeline has fewer stages, it is less affected by
cache misses and branch mispredictions. A shorter pipeline also allows a less
complex prediction algorithm. Additionally, the AMD64 has more execution and
decode units than the EM64T. This allows the Opteron to start more micro operations
per clock cycle.
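The effect of pipeline length on misprediction cost can be estimated with a simple textbook style model, in which every mispredicted branch adds a fixed flush penalty to the average cycles per instruction. The sketch below is only an illustration: it assumes the penalty equals the pipeline depth, and the branch fraction (20%) and misprediction rate (5%) are hypothetical values, not measurements from [3]. It also ignores the higher clock frequency that a longer pipeline allows.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate,
                           flush_penalty_cycles):
    """Average cycles per instruction including branch misprediction stalls."""
    stall_cycles = branch_fraction * mispredict_rate * flush_penalty_cycles
    return base_cpi + stall_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted.
    workload = dict(base_cpi=1.0, branch_fraction=0.20, mispredict_rate=0.05)

    # Assumption: the flush penalty is roughly the number of pipeline stages.
    long_pipeline = cycles_per_instruction(flush_penalty_cycles=31, **workload)
    short_pipeline = cycles_per_instruction(flush_penalty_cycles=12, **workload)

    print(f"31 stage pipeline: {long_pipeline:.2f} cycles per instruction")
    print(f"12 stage pipeline: {short_pipeline:.2f} cycles per instruction")
```

Under these assumptions the longer pipeline loses more cycles per instruction to flushes, so it has to make up the difference through clock frequency.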
These differences make the Opteron pipeline more efficient for structured
programs, whose control flow branches often, and the Xeon pipeline more efficient
for linear code (page 13 of [3]).
3.2 Memory controller
AMD and Intel made different choices for the memory controller (pages 15-20 of [3]).
In AMD systems the memory controller is in the processor, while in Intel systems it
is in the chipset.
This gives AMD lower memory latencies, as the memory is directly attached to
the processor. The Intel processor has to send the request outside the processor and
through the external memory controller. Typically the latency of the Opteron is 10 to
40 percent lower.
In terms of bandwidth between memory controller and processor, the Opteron has
a maximum transfer rate of 8 GB/s while the Xeon has 6.4 GB/s.
3.3 Power consumption
The operating cost of a server depends on its power consumption.
In datacenters, cooling is an important issue to consider. As the power
consumption rises, so does the need for better cooling solutions. So the operating
cost of a server is not just the consumption of the server itself, but also the
consumption needed to keep the room temperature low.
The power consumption of the Xeon is around 130 W, while that of the Opteron is
about 95 W (pages 20-21 of [3]).
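To give an idea of what this difference means over a year, the sketch below converts the two power figures into annual energy. It assumes a processor running at the quoted power 24 hours a day, and the cooling overhead of 50% is a hypothetical value chosen only for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(power_watts, cooling_overhead=0.0):
    """Energy used in one year, in kWh, including a cooling overhead factor."""
    total_watts = power_watts * (1.0 + cooling_overhead)
    return total_watts * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    xeon_watts, opteron_watts = 130, 95  # figures quoted from [3]

    for overhead in (0.0, 0.5):  # 0.5 is an assumed cooling overhead
        xeon = annual_energy_kwh(xeon_watts, overhead)
        opteron = annual_energy_kwh(opteron_watts, overhead)
        print(f"cooling overhead {overhead:.0%}: "
              f"Xeon {xeon:.1f} kWh, Opteron {opteron:.1f} kWh, "
              f"difference {xeon - opteron:.1f} kWh")
```

The difference per processor is multiplied by the number of processors in the datacenter, which is why power consumption weighs on the operating cost.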
4 Benchmarks
4.1 Databases
The benchmarks in Tables 2 and 3 count the number of queries per second that a
MySQL database can handle on the Xeon and the Opteron.
As databases rely on structured programming, the Opteron performed better
than the Xeon.
Table 2. MySQL 4.0.18 using MyISAM, queries per second
(source: http://www.anandtech.com/IT/showdoc.aspx?i=2447&p=4).

Concurrency   Single Xeon (Nocona)   Single Opteron 250   Single Opteron 252
              with HT                2.4 GHz              2.6 GHz
1             277                    298                  319
2             338                    370                  399
5             358                    435                  470
10            375                    465                  502
20            371                    455                  498
35            371                    470                  507
50            368                    472                  508
AVG           368                    460                  497
MAX           375                    472                  508
Table 3. MySQL 4.0.18 using InnoDB, queries per second (source: footnote 4).

Concurrency   Single Xeon (Irwindale)   Single Opteron 248
              3.6 GHz with HT           Dual Channel
1             191                       192
2             201                       223
5             219                       259
10            204                       242
20            199                       236
35            193                       221
50            181                       209
AVG           199                       233
MAX           219                       259
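The relative advantage of the Opteron can be read from the two tables with a short calculation. The sketch below copies the queries per second values from Tables 2 and 3 and computes the peak throughput of each processor and the ratio between them; the variable and function names are hypothetical.

```python
CONCURRENCY = [1, 2, 5, 10, 20, 35, 50]

# Queries per second, copied from Table 2 (MyISAM).
MYISAM = {
    "Xeon (Nocona) with HT": [277, 338, 358, 375, 371, 371, 368],
    "Opteron 250 2.4 GHz": [298, 370, 435, 465, 455, 470, 472],
    "Opteron 252 2.6 GHz": [319, 399, 470, 502, 498, 507, 508],
}

# Queries per second, copied from Table 3 (InnoDB).
INNODB = {
    "Xeon (Irwindale) 3.6 GHz with HT": [191, 201, 219, 204, 199, 193, 181],
    "Opteron 248 Dual Channel": [192, 223, 259, 242, 236, 221, 209],
}


def peak_throughput(results):
    """Highest queries per second reached at any concurrency level."""
    return max(results)


def relative_throughput(results, baseline):
    """Ratio of results to baseline at each concurrency level."""
    return [value / base for value, base in zip(results, baseline)]


if __name__ == "__main__":
    for name, table in (("MyISAM", MYISAM), ("InnoDB", INNODB)):
        systems = list(table)
        baseline = table[systems[0]]  # the Xeon is listed first
        for system in systems:
            ratios = relative_throughput(table[system], baseline)
            print(f"{name}: {system}: peak {peak_throughput(table[system])} "
                  f"queries/s, {ratios[-1]:.2f}x the Xeon at "
                  f"{CONCURRENCY[-1]} connections")
```

The peak values computed this way match the MAX rows of both tables.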
5 Conclusion
In the early stages of 64 bit computing on the x86 architecture there is no absolute
winner. While the Opteron seems to be better in most situations, the longer pipeline
of the Xeon makes it a good candidate for scientific calculation.
References
1. D. Patterson, J. Hennessy, “Computer Architecture: A Quantitative Approach”,
3rd Ed., Morgan Kaufmann Publishers, 2002
2. The AMD64 Computing Platform: Your Link to the Future of Computing.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30172C.pdf
3. Characterizing x86 processors for industry-standard servers: AMD Opteron and Intel Xeon.
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
4. Intel® Extended Memory 64 Technology (EM64T).
http://www.dell.com/downloads/global/vectors/2004_em64t.pdf
5. Microsoft.com Moves to x64 Version of Windows.
http://www.microsoft.com/technet/itshowcase/content/mscom64bitarchi.mspx
6. AMD and Intel Harmonize on 64.
http://www.mdronline.com/watch/watch_abstract.asp?Volname=Issue%20%23118&SID=1137&on=T&SourceID=00000377000000000000
7. AMD64 Architecture Programmer’s Manual - Volume 1: Application Programming.
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_24592.pdf
8. AMD64 Architecture Programmer’s Manual - Volume 2: System Programming.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
9. Tuning IBM eServer xSeries Servers for Performance.
http://www.redbooks.ibm.com/redbooks/pdfs/sg245287.pdf
Footnote 4. Source: http://www.anandtech.com/IT/showdoc.aspx?i=2447&p=5
