Tyler Hawes, Creative Cow's resident guru for all things silicon, explores the importance of AMD's entry into the multi-processor marketspace. In this article, Tyler examines the issues and the stats and gives his own opinions as to their meaning -- and with a background as a former Intel engineer who has a real fondness for AMD as well -- his opinions just might surprise you! In this in-depth exploration we know you'll find many answers to many of your questions -- not to mention that Tyler will give you answers to questions you didn't even know you had yet!
We are left with a release that holds little in the way of surprise. Indeed, the details AMD provided me with lined up almost exactly with what I expected. But that doesn't necessarily diminish the impact or excitement of this product launch, as we shall see...
Almost two years ago AMD unleashed the Athlon processor upon the world. To the amazement of most, they quickly began gobbling up share in the desktop computing market segment. For the first time, AMD created a processor superior to the competition from Santa Clara in terms of design, performance, and price all at once, and held onto that position. At first, many pundits were skeptical that AMD would truly challenge Intel's status as the processor King. It didn't take long, though, before the pundits not only changed their minds, but began heralding this change as one whose time had come.
Today, AMD arguably still has the best overall performance in a desktop computer, and they certainly have the best pricing. Their tremendous progress of late has been almost entirely in desktop computing, and most of that in the consumer market. It took a while before their gains in the consumer space started to catch on in the corporate desktop market, which AMD is just now starting to make serious inroads into. Recently they've made important launches that should help them attack the mobile market as well, such as the Duron Mobile Processor and the Athlon 4 Processor.
Despite these victories for AMD, Intel remains relatively unchallenged as the x86 King of the workstation and server market. With the launch of the Athlon MP and 760 MP chipset, AMD aims to change that. This pair of releases is aimed squarely at the high-end workstation and low to mid-end server market space.
The Athlon MP Processor is based on the new Palomino core. The design is an evolutionary step up from the Thunderbird, offering some significant but not earth-shattering performance improvements. More important is the supporting 760 MP chipset, which essentially is the standard 760 chipset with dual-processor support and 64-bit PCI support. There's still plenty to talk about, though. First, let's summarize the major changes of the Athlon MP relative to the Athlon Thunderbird and the 760 MP relative to the 760 chipset:
At launch time, AMD says that the processor is expected in systems from more than 50 manufacturers worldwide, with no fewer than 20 offering systems for sale immediately. That list is growing fast, so those numbers could already be out of date. Pricing in 1,000-unit quantities is set at $215 for the 1GHz part and $265 for the 1.2GHz part.
The Athlon MP is compatible with dual-processor configurations on the AMD 760 MP chipset, using AMD's Smart MP Technology, which we'll talk about later. Of course, this is the most relevant feature, but it was entirely expected. There are other features, however, that make the Athlon MP outperform the Athlon Thunderbird.
When there is some extra bandwidth available on the bus, the Athlon MP processor will take advantage of it by pre-fetching data it expects to need, thereby avoiding the need to fetch that data later, when it is requested and the bus may be more saturated. This type of speculative data prefetching has already been shown to help performance in Intel's Pentium® 4, and is one of three reasons why an Athlon MP will outperform an equivalently clocked Athlon Thunderbird.
AMD has added 52 new instructions to the AMD Athlon MP, which they call 3DNow! Professional. Since the Pentium III, Intel has touted what they call SSE instructions, which can potentially provide a nice boost to applications that are optimized for them. It's long been a feather in Intel's cap that AMD did not have. 3DNow! Professional is SSE-compatible, so the Athlon MP can now take advantage of those very same optimizations that have become common in workstation and server software.
3DNow! Professional instructions improve performance by simplifying, and therefore optimizing, the processing path for floating-point operations. With regular instructions, floating-point calculations are performed to a high degree of accuracy. Oftentimes, that level of accuracy is unnecessary. So 3DNow! / SSE instructions provide a way of performing that same calculation to the level of detail that is needed, thereby reducing the computing requirements and optimizing performance. This is a drastic over-simplification of just one aspect of these enhancements, but it should give you an idea of how it can help performance.
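To make the precision trade-off a little more concrete, here is a minimal Python sketch. It only illustrates the gap between single and double precision; it does not issue SSE or 3DNow! Professional instructions, and the helper name `to_single` is hypothetical. The relevant fact is that SSE's packed floating-point instructions operate on 32-bit single-precision values, four at a time in a 128-bit register.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest
    32-bit single-precision value and return it as a float."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1                 # stored by Python as a 64-bit double
single = to_single(value)   # the same number squeezed into 32 bits
error = abs(single - value)

print(f"double precision: {value!r}")
print(f"single precision: {single!r}")
print(f"absolute error:   {error:.3e}")
```

For display coordinates, colors, and similar data, an error of this size is usually invisible, which is why trading precision for throughput can pay off in graphics-heavy work.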
As you will see in the SPECfp benchmarks later on, these new instructions have opened up the performance on the Athlon MP considerably. Running the SPECfp (SPEC CPU2000) Base and Peak benchmarks, which measure a computer's floating-point performance, we see that the Athlon MP processor gains between 11% and 15% in performance versus an Athlon Thunderbird running at the same clock speed. Since floating-point operations are of prime importance in 3D applications, this is a good indicator of the kind of performance gains 3D modelers and animators might expect (we'll see some real-world benchmark data to that end a little later on).
TLBs (Translation Look-Aside Buffers) are a cache for translated memory addresses. When a request is made to system memory by the processor(s), the addresses requested are then translated into the actual physical addresses at which the data is stored in memory. By caching these addresses in the TLBs, performance is improved upon subsequent requests to memory if the requested address is already cached.
AMD has increased the size of the TLB from 32 to 40 entries. By being able to store more addresses, the TLB is more likely to cache a given address request, and therefore more likely to improve performance for that request.
The TLB design now supports speculative reload. If an instruction requests an address that is not stored in the TLB, this feature will go ahead and load that address into the TLB before the instruction completes. In this situation, the TLB won't improve performance as much as if the address had been stored in the TLB in the first place, but it's better than nothing at all.
Also, the TLB design is now exclusive. With a non-exclusive TLB design, the addresses that are cached in the L1 TLB are duplicated in the L2 TLB. By only storing this information in the L1 TLB, space is freed on the L2 TLB, which could increase performance. However, it can also increase latency since the addresses are now only cached in one place. In most situations this will likely be an improvement overall.
The improved TLB is a rather technical detail and is not likely to make a huge difference in ordinary desktop use. However, in high-end applications with very heavy data i/o, every little bit counts. The cumulative effect of these types of refinements can add up to an appreciable difference.
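The effect of a larger TLB can be sketched with a toy model. Everything here is an assumption made for illustration: the class name `ToyTLB`, the least-recently-used replacement policy, the made-up page table, and the deliberately contrived access pattern (a loop over 36 pages, which fits in 40 entries but not in 32). It is not a model of the Athlon MP's actual TLB hierarchy.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # 4 KiB pages, the common x86 page size


class ToyTLB:
    """A tiny fully associative TLB with LRU replacement (an assumption)."""

    def __init__(self, entries):
        self.entries = entries
        self.cache = OrderedDict()  # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)          # mark as most recently used
        else:
            self.misses += 1
            self.cache[vpn] = page_table[vpn]    # slow path: look up the page table
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)   # evict the least recently used entry
        return (self.cache[vpn] << PAGE_SHIFT) | offset


def hit_rate(entries, accesses, page_table):
    tlb = ToyTLB(entries)
    for vaddr in accesses:
        tlb.translate(vaddr, page_table)
    return tlb.hits / len(accesses)


# Hypothetical workload: loop ten times over 36 distinct pages.
page_table = {vpn: vpn + 1000 for vpn in range(36)}
accesses = [vpn << PAGE_SHIFT for vpn in range(36)] * 10

rate_32 = hit_rate(32, accesses, page_table)
rate_40 = hit_rate(40, accesses, page_table)
print(f"32 entries: {rate_32:.0%} hits")
print(f"40 entries: {rate_40:.0%} hits")
```

This workload is a best case chosen to exaggerate the difference: the working set thrashes a 32-entry LRU buffer but fits in 40 entries. Real code rarely behaves so neatly, which is consistent with the point above that the benefit shows up mainly as a cumulative effect.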
Perhaps I should put "only" in quotes above, as 1.2GHz is certainly still very fast. However, the fact remains that AMD has already released the Athlon Thunderbird clocked at 1.4GHz. To this question, AMD's Bret Kirby told me:
"The validation time it takes on a 2 [processor system] takes longer than on 1 [processor]. There's no issues at all with the process we're using, we just want to be sure that everything is validated to the millionth-degree before we release it. Probably, as our validation team gets bigger toward the end of the year, you'll start to see parity between the desktop and workstation processors. So far, we haven't had any major release mistakes, and we want to keep it that way."
Bret was gracious enough not to point out any major release mistakes of a certain competitor of theirs, but I probably wasn't the only one in the conversation who immediately called to mind Intel's problematic launch of the i820 chipset or their embarrassing botched launch of the Pentium III processor running at 1.13GHz. That AMD is placing a premium on validation and quality control over launching the fastest chip they can possibly create is a good thing; that's exactly the attitude they need to be successful in this new market space.
The AMD 760 MP chipset is very similar to the AMD 760 chipset, with the notable exception of dual-processor support. I think this is good because the AMD 760 chipset has already had about six months to mature in the desktop market space and forms a solid foundation for AMD to build the MP version on. Already Tyan, Asus, Abit, Gigabyte and MSI have committed to supporting 760 MP in motherboard products. Tyan's Thunder K7 motherboard is available immediately (I found it on PriceWatch.com for less than $600) and has a kitchen-sink feature set. True to form, Tyan will release a Tiger version of the motherboard later that is without all the extra features, which are not needed for many workstation configurations. Other motherboard manufacturers will be shipping their products by the third quarter.
Of course, the exact feature list will vary from one motherboard to another, but the following will be supported at a minimum:
Doubtless many will be disappointed to learn that AMD does not support the Athlon Thunderbird in dual-processor systems. AMD told me that the 760 MP chipset was designed from the beginning for the Palomino core and that Thunderbird was never tested or validated on it. They also said that 760 MP motherboards will auto-detect if you have a Thunderbird installed and will disable dual-processor support. You will still be able to run a Thunderbird chip in single-processor mode.
However, I'm hearing numerous reports from other testers that they are having success running AMD Athlon Thunderbird CPUs and even Durons in dual-processor mode on the 760 MP. It is possible that the hardware auto-disable feature AMD described to me was not included on the pre-release version of the Tyan Thunder K7 test motherboard, and will be added in the shipping version (although that seems unlikely to me).
Even if such a hardware lock were implemented, a third-party workaround may present itself, although none has been announced. Intel never sanctioned the use of Celeron processors in dual-processor configurations, but that didn't stop many from doing just that with great success (nor did it stop Abit from making the BP6, a board made for exactly that application). Then again, Intel didn't auto-disable Celeron processors like the AMD 760 MP chipset purportedly will. However, we've seen third-party devices trick chipset auto-detection before, such as the "Gold Fingers" devices that unlock the clock multiplier on Slot A Athlon processors, thereby allowing more flexibility for overclocking.
Whether dual Thunderbirds and Durons require a hardware workaround or work without any extra ingredients, it doesn't seem likely that AMD will support them, and running them may void the warranty on your processors and/or motherboard. There's not much reason for AMD to encourage those setups, since they are targeting this release at the professional workstation and server markets.
Besides, as you will see below, there may be good reason to use Athlon MP processors, as they don't cost all that much more and they offer a performance edge over standard Athlons.
The vast majority of PCI busses in the world are of the 33MHz/32-bit variety. However, since Intel released the i840 chipset over a year ago, which supports 33MHz and 66MHz 64-bit PCI busses, several manufacturers have released devices to take advantage of the new specification. Typically these are high-bandwidth devices, such as SCSI and RAID controllers, and video capture interfaces such as Pinnacle's Targa 3000. Most of these devices run at 33MHz and support both 32-bit and 64-bit configurations. That way they are still compatible with the majority of systems, which only support 32-bit PCI devices, but they can also take advantage of the increased bandwidth offered by newer platforms with 64-bit support.
I didn't get a specific technical reason from AMD on why they didn't incorporate 66MHz/64-bit PCI bus support in this chipset release. They did say that for the market they're going after with this release, 33MHz/64-bit support is more prevalent. In the third quarter AMD plans to release the 760 MPX chipset, which will add 66MHz PCI bus support. This chipset will probably be hitting right around the time 760 MP is getting into its stride, so 760 MP may have a short life. However, other than the 66MHz/64-bit PCI support, there's no difference between the two and therefore no reason to wait if you are ready to buy now and Tyan's Thunder K7 motherboard suits your needs.
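For a sense of what is at stake between these PCI variants, the theoretical peak bandwidth of a parallel bus is simply clock rate times bus width. The short Python sketch below works that out for the three configurations discussed; the function name is hypothetical, the clock values are the nominal 33.33MHz and 66.66MHz PCI rates, and real-world throughput will be lower than these peaks.

```python
def pci_peak_mb_per_s(clock_mhz, width_bits):
    """Theoretical peak bandwidth in MB/s: one transfer per clock,
    width_bits / 8 bytes per transfer."""
    return clock_mhz * width_bits / 8


configs = [
    ("33MHz/32-bit (typical desktop PCI)", 33.33, 32),
    ("33MHz/64-bit (supported by 760 MP)", 33.33, 64),
    ("66MHz/64-bit (planned for 760 MPX)", 66.66, 64),
]

for label, clock_mhz, width_bits in configs:
    peak = pci_peak_mb_per_s(clock_mhz, width_bits)
    print(f"{label}: about {peak:.0f} MB/s peak")
```

In other words, 64-bit support alone doubles the ceiling over ordinary desktop PCI, and the later 66MHz step would double it again.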
By now most of you are familiar with DDR-SDRAM memory. This memory is able to deliver twice the bandwidth of regular SDRAM memory while not costing much more. That stands in contrast with RAMBUS RDRAM, which also offers increased bandwidth but at a very considerable expense. RDRAM is also very controversial, with several articles purporting that there is very little real-world benefit from using this ultra-expensive technology. Currently, Intel is the only x86 chipset manufacturer to support RDRAM, and even Intel has been straddling the fence a bit by promising an SDRAM platform for the Pentium 4 at some time in the near future.
The only motherboard currently available with the 760 MP chipset, the Tyan Thunder K7, requires the use of registered DDR-SDRAM memory. AMD indicated to me that this is a chipset requirement, although I'm sure they could enable support of unbuffered memory as well. Currently, registered memory is only available in the form of ECC DDR memory modules. This means that if you were planning to upgrade to this platform and already have DDR memory, you'll likely need to buy new memory (unless you already purchased ECC DDR-SDRAM). AMD says the feedback they got from their customers was that registered DIMMs are what they wanted; so registered DIMMs are what they got.
Typically when new multiprocessor products come out, the first motherboards released are targeted at the high-end market where integrated-everything is desired. A few months from now, we will see workstation-specific boards that are more stripped-down, and perhaps then they'll support unbuffered DDR memory.
Did I say power supply? I'm sorry, I meant nuclear reactor! It seems that desktop users have just gotten used to the idea of 300-watt power supplies. Before we get too dismayed at the power requirements, though, let's remember this is a workstation/server product. Not only will there be an extra processor, but typical configurations will include perhaps a RAID array consisting of several high-speed SCSI hard drives, power-hungry professional OpenGL graphics accelerators, and other high-speed I/O devices that suck a lot of energy.
AMD said they are developing a power supply specification and will release details about it shortly. For the time being, you can get a qualified unit made by NMB or Delta. I did a quick search on PriceWatch.com and found NMB's SD025A460WSW, an approved power supply, for under $200. Relative to the overall costs of setting up a server/workstation, an extra $150 for a beefier power supply is not that much, especially when you consider the damage that could result to your expensive graphics card, SCSI drives or data from under-powering your system.
Smart MP Technology is really a function of the chipset, but I thought it deserved its own topic. Later on, when you get to the benchmark data, you'll see that AMD's MP configurations are scaling rather nicely compared to the competition. Smart MP is a key to this scalability, and it is a major difference between AMD's multiprocessor implementation and the competition's.
To understand the advantages of Smart MP, it's useful to first discuss the traditional ways of doing things.
Intel's multiprocessor implementation for the Pentium III processor relies on a shared bus architecture (left). Both processors share a single bus of communication with the Northbridge, which is their connection to all other system resources such as main memory. This means that the two processors must share the bandwidth afforded by the particular bus spec. In the case of a Pentium III system running at 133MHz FSB, that equals about 1GB/s. So if both processors are fully utilized, you can see that they will each get half as much bandwidth as they would in a single-processor system. This presents a performance bottleneck, as supplying both processors with a steady workload is of major importance to take full advantage of a multiprocessor configuration.
Now let's see how AMD's Smart MP Technology improves upon the situation:
AMD's Smart MP design dedicates a full bus from the Northbridge to each processor. In this way, each AMD Athlon MP processor in a multiprocessor system gets its own dedicated bandwidth to the system logic and the bottleneck is removed. Since the AMD 760 MP chipset is operating at 266MHz DDR FSB, that yields a dedicated 2.1GB/s bandwidth to EACH processor. This is actually the same EV6 bus, based on Alpha technology, which is used for single-processor Athlon systems, except there are now two of them.
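The bandwidth figures quoted for the two designs follow from simple arithmetic, sketched below in Python. The function name is hypothetical, and the calculation assumes the worst case in which both processors saturate a shared bus at the same time. Both buses carry 64 bits (8 bytes) of data per transfer; the "266MHz" EV6 figure is a 133MHz clock with two transfers per cycle.

```python
BUS_WIDTH_BYTES = 8  # both front-side buses move 64 bits of data per transfer


def per_cpu_bandwidth_mb(clock_mhz, transfers_per_clock, cpus_sharing):
    """Peak bandwidth per processor in MB/s when `cpus_sharing`
    processors contend for the same bus at full load."""
    total = clock_mhz * transfers_per_clock * BUS_WIDTH_BYTES
    return total / cpus_sharing


# Dual Pentium III: one 133MHz bus shared by two processors.
shared = per_cpu_bandwidth_mb(133, 1, 2)

# Dual Athlon MP: each processor has its own 133MHz double-pumped bus.
dedicated = per_cpu_bandwidth_mb(133, 2, 1)

print(f"Shared bus:    {shared:.0f} MB/s per processor")
print(f"Dedicated bus: {dedicated:.0f} MB/s per processor")
print(f"Ratio:         {dedicated / shared:.1f}x")
```

These are theoretical peaks for the processor-to-Northbridge link only. Both processors still share the same main memory behind the Northbridge, so real-world gains will be smaller than the ratio suggests.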
Once again, let's look at a diagram of the traditional shared bus that is employed in conventional systems:
In a shared bus (left), any communication between the processors must go over the bus (which, remember, is more congested than AMD's dual EV6 busses), through the Northbridge controller, and be written to main memory. Then, the other processor must make that same journey over the bus, through the Northbridge, to memory to read the data.
Now let's see how AMD once again refines that process:
As you can see, with the point-to-point bus, Athlon MP processors are able to communicate with each other through just the Northbridge controller. This has many advantages. For one, it reduces latency because the trip is shorter. It also will be faster because the processors are reading directly from each other's cache, as opposed to the much slower system memory. Lastly, since system memory is not being used for the transaction, it frees up system memory (although this is a very small amount relative to the massive RAM likely to be installed in a workstation/server). The net effect is an increase in performance compared to the traditional shared-bus architecture.
You might wonder in what type of situation this type of transaction would be employed. You've probably at least heard reference to the cache on a processor. Basically, this is a small amount of very high-speed memory, many times faster than SDRAM, which is embedded on the processor. The processor will store data that it has already retrieved in its cache so that if it needs that data again it can retrieve it from the fast cache rather than from the relatively slow system memory.
This is similar to how your Internet browser uses a cache to store web pages you've downloaded before. The next time you visit a site you've been to before, if you've already looked at a page, the browser can pull it from a cache on your local hard disk rather than going over the slow Internet to get it. To continue the illustration, suppose a web page did not exist in your own computer's browser cache, but it did exist in the browser cache of someone else's computer on your local network. While not as fast as getting it off your local system, it would still be much faster to read that page over your local network than to retrieve it over the Internet.
Remembering that illustration, suppose CPU0 in the diagram above wants to retrieve data that is not in its own cache but exists in the cache of CPU1; it would be much faster to retrieve the data from the cache of CPU1 than from the slow system memory. This is possible with both the traditional shared-bus design and the dual-bus design of the 760 MP, but remember that the shared-bus design requires the transaction to be routed through system memory, thereby nullifying the potential benefits.
To make this inter-processor communication possible, AMD uses something they call a Modified Owner Exclusive Shared Invalid Cache Coherency Protocol, or MOESI (pronounced "mo-ess-ee"). MOESI keeps track of what data is stored in which processor's cache. When either CPU requests data from system memory, MOESI checks first to see if that data is stored in the other CPU's cache. If so, then it retrieves it from the high-speed cache instead of memory and supplies it back to the requestor. This communication path where the chipset listens to data requests from the processors is called a Snoop bus.
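The cache-to-cache lookup just described can be sketched as a toy two-processor model in Python. This is a heavily simplified, textbook-style illustration and not AMD's implementation: the class `ToySystem`, its methods, and the exact state transitions are assumptions, and real hardware tracks cache lines rather than individual addresses and handles many more cases.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class ToySystem:
    """Two CPUs with private caches and a snooping chipset (toy model)."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.caches = [{}, {}]  # per CPU: address -> (state, value)

    def read(self, cpu, addr):
        """Return (value, source), where source says who supplied the data."""
        line = self.caches[cpu].get(addr)
        if line and line[0] is not State.INVALID:
            return line[1], "own cache"
        other = 1 - cpu
        peer = self.caches[other].get(addr)
        if peer and peer[0] is not State.INVALID:
            state, value = peer
            # A dirty line stays with its supplier as Owned; a clean one becomes Shared.
            dirty = state in (State.MODIFIED, State.OWNED)
            self.caches[other][addr] = (State.OWNED if dirty else State.SHARED, value)
            self.caches[cpu][addr] = (State.SHARED, value)
            return value, "other CPU's cache"
        value = self.memory[addr]
        self.caches[cpu][addr] = (State.EXCLUSIVE, value)
        return value, "system memory"

    def write(self, cpu, addr, value):
        """Write into the local cache and invalidate the other CPU's copy."""
        self.caches[1 - cpu][addr] = (State.INVALID, None)
        self.caches[cpu][addr] = (State.MODIFIED, value)


system = ToySystem({0x100: 42})
first = system.read(1, 0x100)    # CPU1 misses everywhere and goes to memory
system.write(1, 0x100, 99)       # CPU1 now holds the only up-to-date copy
second = system.read(0, 0x100)   # CPU0's request is served from CPU1's cache

print(first)
print(second)
print(system.caches[1][0x100][0].name, "| memory still holds", system.memory[0x100])
```

Note that in this model system memory never sees the new value during the exchange: CPU0 gets the fresh data straight from CPU1's cache, which is the benefit the Owned state is meant to provide.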
I think AMD's strategy is smart. They began by attacking the market in which they had the most experience and where the customers were most willing to adopt an unproven product, the consumer desktop space. Having established their product there, they began moving into the corporate desktop space with their newly acquired respect. So, having put their best foot forward already, how are they going to proceed?
Well, in recent weeks they've launched the AMD Duron Mobile Processor and Athlon 4 processor, both aimed at the notebook market. The new "Palomino" core is important for their success there because of its lower power requirements. With the Athlon MP and 760 MP they are initiating "AMD's attack into the 1- and 2-way server market" (quotation from AMD's press materials). The idea is that they will get some penetration with Athlon MP solutions in the dual-processor market, which is the low end of the server market. This is also the server segment that is the largest and currently experiencing the most growth, and its customers are more willing to adopt new solutions than the upper echelons of the market.
Has anyone else gotten tired of the constantly changing form factors for processors? Well, it appears AMD has tired of this as well. AMD's representative was able to confirm what the roadmap says, which is that Socket A is the planned interface for all AMD Athlon processors through 2002 and beyond. For those of you who are codename-happy, at the least this includes Morgan, Thoroughbred, Appaloosa, and Barton. Not only will the Socket A remain unchanged, but AMD told me that these future versions of the Athlon/Duron processor family should work on current motherboards (of course a BIOS update will be in order at that time).
Some of you may know first-hand of the cost savings this could bring about as you upgrade. I know I wasn't too happy about having to replace my Slot A Asus K7V motherboard to get an Athlon "Thunderbird" 1.33GHz processor, but I bit the bullet. If they'd been the same form factor, I could've saved over $100. Now, if you're in an enterprise environment, that savings could potentially be multiplied many times over.
However, most businesses use systems for their full life and then replace them entirely, rather than upgrading them incrementally. Having a common Socket A platform is still important, though, as it simplifies administration by allowing a single platform to be standardized across an entire enterprise.
I asked AMD if there were any third-party chipsets in the works for dual-processor operation. They of course couldn't comment on unreleased products from other companies, but they did say that they would work to enable any such efforts by third parties. "We're in the business of selling processors," Bret Kirby said. This is consistent with AMD's pattern of cooperating with third-party chipset vendors. In fact, in the past you could even say relying on third-party manufacturers, as AMD has sometimes looked a bit too eager to hand off the chipset business to other companies such as VIA and ALi.
However, it seems clear that AMD recognizes that they can't count on someone else to get the job done right when it comes to the very demanding workstation/server market space, and so they are in the chipset business for the duration (at least as far as multiprocessing goes).
I've heard VIA may be working on an MP chipset for Athlon, but I wouldn't expect that until next year. What I really wish for is a solution from the likes of nVIDIA, who this week released their "nForce" chipset for the Athlon. While some may correctly point out that the nForce is targeted at consumer desktops, it also features some innovations that could find a home on the high end, such as the use of AMD's HyperTransport technology as a link to the Southbridge.
To be clear, I haven't even heard rumors about nVIDIA having any such plans, so this is pure wild speculation. However, AMD and nVIDIA are great friends, and a man can dream, can't he?
On AMD's roadmap we see a product code-named Morgan slated for release in the second half of this year. This will be a multiprocessor version of the Duron processor based on the Palomino core and targeted at low-end servers and workstations, such as appliance servers and network-attached storage. Judging from the excellent value of the single-processor Duron, this may be a great solution for workstation users on a budget, though its small cache will severely hamper it in heavier I/O applications.
Next year AMD will launch their Hammer family of products with x86-64 technology, supporting 4- and 8-processor configurations. I am told by AMD that this will be their first platform to support more than two processors, and will also be the first AMD product to incorporate their HyperTransport Technology (a.k.a. Lightning Data Transport), which is a revolutionary high-speed interface that has already been incorporated in third-party products (such as nVIDIA's nForce, a.k.a. Crush, chipset).
Intel has already launched their 64-bit computing platform, Itanium. One major difference between Intel's and AMD's strategies for next-generation 64-bit processing is that AMD's Hammer line will natively support 32-bit processing as well as 64-bit, while Intel's will resort to emulation to run any 32-bit code. This means that AMD's Hammer line will allow you to take advantage of 64-bit programs while not sacrificing 32-bit performance, while Intel requires you to choose one or the other (Pentium 4 for 32-bit, Itanium for 64-bit; you can run 32-bit on Itanium but it is very slow because it is in emulation mode). While AMD's approach sounds very pragmatic at this time, only time will tell which approach is better or more successful.
Looking back to the launch of the original Athlon processor, and looking forward to next year's launch of the Hammer line of products, we see that AMD's strategy consists of a series of careful steps, each one taking them a little further into the pool. Perhaps because of this gradual and deliberate strategy, AMD has executed with confidence and poise.
We have every reason to believe that AMD will continue their successful track and build new inroads into the workstation/server market. As an engineer, I admire their success and hope they continue with their innovation and competitiveness. As a consumer, I also hope that Intel is able to respond and keep things interesting, and that this battle for dominance between the chip kings is long and drawn-out.
Okay, enough talk, let's get to the important part: benchmarks or, more specifically, real-world performance. Since this is not a hardware site, but an interest site for Digital Content Creators, we're going to focus on benchmarks that enlighten us on AMD's MP performance in that regard. These benchmarks were provided to us by AMD. We intend to perform our own benchmarking in the near future, and will update you with the results at that time.
SPEC's FP test is useful for comparing floating-point, compute-intensive operations. Floating-point operations are especially important in 3D applications for real-time display, rendering, and solving operations (such as particle systems, kinematics, etc.). The Base score uses the default compiling options for the benchmark, as defined by SPEC, while the Peak score reflects performance after the program has been tweaked to optimize performance.
The new Athlon MP processor bests the Athlon Thunderbird at the same clock speed by between 11% and 15%. To see this kind of performance difference between two core revisions is pretty impressive, enough to justify the MP version of the Athlon as more than just a marketing label. Since this is a single-processor test, the performance gain is coming from the optimizations made in the Palomino core of the Athlon MP processor. The existing Athlon is already well known for having much better floating-point performance than Intel's Pentium III or Pentium 4 processor, so AMD only has itself to compete against in this aspect.
BAPCo's Internet Content Creation benchmark consists of a suite of real-world benchmarks using Adobe Premiere 5.1, Adobe Photoshop 5.5, Avid Elastic Reality 3.1, MetaCreations Bryce 4, and Microsoft Windows Media Encoder 4.
Athlon MP 1.2 GHz beats the Pentium 4 1.7 GHz by almost 11% in this benchmark. In the past, Pentium 4 fared a little bit better in this benchmark against the Athlon Thunderbird, but Athlon MP can now take advantage of the SSE optimizations present in some of these benchmarks that perform heavy image processing. The pre-fetch improvements help a little bit, too. We also see a 16% increase between the 1 GHz and 1.2 GHz versions of the Athlon MP, compared to a 20% increase in clock speed, so the performance is scaling fairly well as the clock speed increases in this test.
I doubt Photoshop needs introduction to our audience. It is probably one of the most widely used applications across all types of content creation.
The Athlon MP 1.2 GHz trounces the Pentium 4 1.7 GHz, besting it by no less than 25%. Even the Athlon MP 1 GHz is beating it by over 15%. Performance increases only 8.33% from the 1 GHz Athlon MP to the 1.2 GHz.
We already know that the Athlon has much stronger floating-point performance than the Pentium, so we can expect it to easily win this test.
Indeed, no surprises here. At the same clock speed, the dual Athlon MP 1 GHz processors beat dual Pentium III 1 GHz processors by almost 12%, while the 1.2GHz Athlon MPs beat the 1GHz Pentium IIIs by 25%. Performance scaling is good again at almost 15% when stepping up to the 1.2 GHz part.
Athlon enthusiasts will be happy with the recent announcement by Avid that they have certified their high-end 3D modeling and animation software for use on AMD's Athlon processor. You might say it's about time, as we've known for a while how excellent the Athlon is in this type of application. Let's hope this sets a precedent for others to follow, as well as for other divisions of Avid...
Softimage|XSI results are very similar to 3ds max, showing that Athlon MP makes an excellent choice for 3D modeling and animation professionals.
Here the Athlon MP is besting the equivalent Pentium III by just over 9%. Performance scaling is again at approximately 16%.
There are many more benchmarks that we'd like to see, such as more specific application benchmarks and a complete cross-platform comparison. However, I think that by culling out these early benchmarks that are especially relevant to digital content creation, an accurate story begins to emerge.
First, the Athlon MP has improved significantly upon the performance of the Athlon Thunderbird. We're seeing potential double-digit percentage improvements with the new design. Secondly, the performance is scaling well as clock speeds increase, with between roughly 8% and 16% increase in speeds in content creation with a 20% increase in clock speeds.
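One way to compare those scaling figures is to divide each performance gain by the 20% clock-speed gain, which gives a rough "scaling efficiency" (1.0 would mean performance rises in lockstep with clock speed). The Python sketch below does this with the percentages quoted in the benchmark sections; the function name and the short benchmark labels are shorthand of mine, and the inputs are the rounded figures from the text rather than raw scores.

```python
def scaling_efficiency(perf_gain_pct, clock_from_ghz=1.0, clock_to_ghz=1.2):
    """Fraction of the clock-speed increase that shows up as performance."""
    clock_gain_pct = (clock_to_ghz - clock_from_ghz) / clock_from_ghz * 100
    return perf_gain_pct / clock_gain_pct


# Performance gains going from the 1GHz to the 1.2GHz Athlon MP, as quoted above.
quoted_gains = {
    "Internet Content Creation": 16.0,
    "Photoshop": 8.33,
    "Dual-CPU 3D test": 15.0,
    "Softimage|XSI": 16.0,
}

efficiencies = {
    name: scaling_efficiency(gain) for name, gain in quoted_gains.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.0%} of the clock-speed gain")
```

By this measure, Photoshop is the outlier: it picks up well under half of the clock increase, which suggests that something other than raw processor speed (memory or disk, perhaps) limits that particular test.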
There are a host of other benchmarks that AMD has provided, and you can view them at your leisure from their web site at http://www.amd.com/products/cpg/server/Athlon/benchmarks.html. I chose only to include content-creation, 3D application, and FPU benchmarks since that is what is most relevant to our readers. What the other workstation and server benchmarks show is consistent with what we've seen here: the AMD 760 MP chipset with Athlon MP Processor generally outperforms Pentium III and Pentium 4 processor-based solutions. Also, AMD MP solutions tend to benefit slightly more from a second processor, which makes sense after examining their Smart MP technology.
Perhaps the easiest way to understand the performance of AMDs multiprocessor solutions is by remembering what we already knew about the Athlon Thunderbird. Athlon is already the fastest all-around x86 processor available, and costs significantly less than competing solutions. Now AMD has made the Athlon processor better and available in dual-CPU configurations. Add to that a supporting chipset with a better-optimized high-bandwidth multiprocessor bus, and you have a very scalable platform that emerges as the price/performance leader in the multiprocessor market. In other words, everything that Athlon is to price/performance in the single-processor space, it now does in the dual-processor space as well.
However, itll likely be a couple of months before AMDs MP solution really hits its stride for a few reasons. First, Tyans Thunder K7 motherboard is currently the only one available at a price of about $600 with integrated-everything. Also, you have to use a proprietary 460-watt power supply that is only available from Delta or NMB at almost $200. So immediately, those two things reduce the pricing advantage of the Athlon MP relative to the Pentium III. However, the total cost is still considerably less than a dual Pentium 4 Xeon system.
Another potential cost issue for upgraders is the requirement for registered DDR memory, which does cost about 10% more. However, this is less of an issue since most people will be buying new platforms with 760 MP as opposed to upgrading incrementally, and workstation/server systems typically come with ECC memory anyway.
Price and performance are two very important issues, but none of that matters in the workstation/server market without excellent stability. That isn't something that benchmarks can show, with the obvious exception of a system not being able to complete benchmarks in the first place. Only time in the market can adequately establish the stability of Athlon MP and 760 MP, and that is one reason it will take a while before most of the major OEMs begin adopting them in their product lines.
However, AMD knows this too, and has subjected their multiprocessor platform to extra-vigorous validation and testing. The 760 MP chipset has been two years in the making because AMD didn't want to release it until they knew they had it right. Early reports of stability are very encouraging. There's every reason to believe that Athlon MP will have similar success in the server/workstation market as Athlon did in the desktop market. While the multiprocessor market is more conservative and less willing to adopt new solutions, AMD also has a newly gained respect based on their success with Athlon.
In the third quarter, we'll begin seeing more motherboard options from several other leading manufacturers. Already-announced products from Asus, MSI, Gigabyte, Abit, and Tyan will be sans all the integrated features such as SCSI, video, and LAN, which will reduce motherboard cost to under $200. Some of these will likely work with standard ATX power supplies and support unbuffered DDR memory. Once these additional choices have arrived, the Athlon MP and 760 MP will fully realize their potential for leadership in multiprocessor price/performance.
The third quarter will also see the follow-up release of the AMD 760 MPX chipset, which will add 66MHz/64-bit PCI bus support and a better connection between the Northbridge and Southbridge chips. Also, in the second half of this year we'll see the debut of Morgan, the multiprocessor-capable version of the Duron that will be based on the Palomino core architecture.
As for the competition, I've heard it said recently that laurels are uncomfortable to sit on. Intel's Pentium 4 and Pentium 4 Xeon continue to be interesting products with a lot of potential, but that potential is as yet unrealized. In order for them to become more competitive on a performance basis, they will need to ramp up clock speeds well above 2 GHz to compensate for their slower per-clock performance.
Intel has already improved pricing with drastic cuts in chip prices, but it's going to take more than that. It's no secret that the high cost of RDRAM is the single biggest obstacle to pricing parity. Despite the gradual fall in RDRAM prices, it's not enough. Deliverance is going to have to come in the form of an SDRAM-based platform for Pentium 4. Early reports are not overly enthusiastic about the performance of these solutions, but they are still in development and so could improve before release.
Barring some unforeseen fiasco, such as a major foul-up by AMD in their manufacturing process or quality assurance, Intel has a small window of opportunity to strike back: perhaps six months, or a year at most. After that, and probably much sooner, even conservative large OEMs will be forced to acknowledge the merits of AMD's solution.
This is not to say that Intel doesn't have a compelling product. The Pentium 4 and Pentium 4 Xeon products hold onto their lead in bandwidth-limited applications such as media encoding, although the lead is diminished by AMD's 3DNow! Professional technology and other MP optimizations. The Pentium 4 core technology has also proven it has legs, with the ability to scale to very high clock speeds.
From a consumer standpoint, we should all hope for success by both Intel and AMD. During the past two years, we've seen great advancements in performance and value, largely attributable to the competition between these two market leaders.
The AMD Athlon MP Processor on the 760 MP chipset performs as expected, which is to say very well. It is not perfect, but the imminent release of cheaper motherboards and faster-clocked Athlon MP processors will enhance it and extend its position as the overall price/performance leader. If you have ignored or disparaged AMD, it's time you took another look, now that they have the leading price/performance solution in both single- and dual-processor systems. If you already liked the Athlon for single-processor systems, you'll also like it for multiprocessor systems.
As for myself, I have no doubt that my next dual-processor system will be equipped with the AMD Athlon MP processor.
Tyler A. Hawes
Tyler A. Hawes has a background in the technology industry, having worked for Microsoft, 3Com and Intel in roles ranging from support to software engineering. A year ago he founded Audio Intervisual Design, providing Web Development, Computer Animation, Video Production and Post-Production, as well as integration services for Canopus-based real-time non-linear editing systems. He is an active Creative Cow member and a Cowmunity leader on the AMD, Canopus, and 3D Studio MAX forums.