An interview with Mike Rockwell, Avid chief technology officer.

Bob Turner interviewed Avid CTO Mike Rockwell about what Apple Computer and AMD call the “megahertz myth.” Avid offers Xpress and Media Composers on both Apple and Windows operating system-based platforms, so his view on this subject, while subjective, carries no inherent commercial prejudice.

Turner: Mike, I am trying to write an article explaining what is important when evaluating and selecting a postproduction platform. I would like to start with public positions from the major players. First, in the PC World article (“AMD: Megahertz Isn't Everything; As Intel pushes to 2GHz, AMD argues performance, not speed, is what matters” by Tom Mainelli, August 24, 2001) you can find the following segment: "On the eve of Intel's 2GHz Pentium 4 launch Monday, the folks at AMD are eager to make one point: Megahertz isn't everything. 'Our combination of Athlon and DDR technology outperforms their combination of P4 and Rambus technology,' says Tim Wright, director of desktop marketing at AMD. 'Megahertz is only part of the equation.'" Please comment on Tim Wright's statement.

Rockwell: As with most things, this depends largely on your application. Some applications depend on throughput, some depend on integer computation, some depend on floating point calculation, some depend on data manipulation. Each system is good at some things and not as good at others. In general, though, if all other factors are equal, a processor with a higher megahertz will perform faster than one with a lower megahertz.

Turner: Fair enough. At the Apple Web URL , there is a QT video where Steve Jobs and Apple Senior VP of Hardware Jon Rubinstein detail why the clock speed of a computer isn't an accurate way to compare system performance. Overall system design and processor-architecture differences affect real-world application performance; otherwise you might be fooled by what he terms ‘the megahertz myth.’ This appears to reiterate Wright's position. What is your reaction to the QuickTime video?

Rockwell: I think that there is some truth to what they are saying, but there is some marketing FUD in there as well. One can see some corroborating evidence of what they are saying with the drop in performance of a 1.41GHz P4 vs. 1GHz P3 on the same operations.

Turner: In this presentation, they demonstrate why an 867MHz G4 performed 87% faster than a 1.8GHz P4. On their website (http://www.apple.com/powermac/specs.html) and on their PowerMac G4 handout (http://a1712.g.akamai.net/7/1712/51/fae8dc87c5abc2/www.apple.com/powermac/pdf/PowerMacG4_DS-e.pdf) they show this in chart form. Is this true? If not, why did it appear so? In your opinion, was this demonstration fair?

Rockwell: I don't know if the Photoshop demo used filters that were SSE optimized or not. If they weren't, then it was not a fair demo. SSE is Intel's equivalent to the Altivec (Velocity Engine) instructions, and they are pretty comparable -- especially SSE2 which is in the P4. The other thing is that for normal programs that don't do image processing, I think that the Pentium 4 can be faster in a number of cases.

Turner: Do you agree with Apple that there are four factors to performance: frequency, pipeline stages, number of functional units (number of instructions per cycle), and cache design (level 1, level 2, level 3 cache)?

Rockwell: There is also RAM architecture. For image processing, this is less important than for video processing, where you are pumping large amounts of data to and from memory. The latest PC-based Rambus memory has 3.2GB per second of memory bandwidth. DDR memory also performs well and is even cheaper. The Mac memory architecture, to my knowledge, currently has around 1GB per second of total memory bandwidth.

Turner: Can you explain in layman's terms the important aspects of cache design as it relates to the type of performance benefits a postproduction workstation will offer?

Rockwell: Today's processors spend a good chunk of their time waiting for data. The bandwidth of RAM has not kept up with the increase in processor speed. To give you a feel for this, look at the ratio of the clock speed of the processor to the clock speed of the memory. For example, on a Mac G4 the memory bus clock speed is 133MHz, while its processor is running at 867MHz. This means that if the processor needs some data that is in main memory, it has to wait around six (867/133) clock cycles before it can do anything. To help with this, system designers put caches in the system. A cache is basically a small amount of very high-speed memory that sits between the processor and the main memory. Level 1 cache is usually directly on the CPU and generally runs at the same speed as the processor. If the data is in the level 1 cache, the processor can access it immediately. Level 2 cache can be either on-processor or off-processor. Level 3 cache is usually off the processor. By trying to anticipate the CPU's requirement for data, a cache can greatly improve the performance of a system. With a fast main memory bus, the requirements for caches are decreased. So you need to look at both main memory performance and cache performance together to get an overall idea of how the system will perform. On a Pentium 4, the memory bus can run as fast as 400MHz, which gives it around three times the throughput and one-third the latency of the G4. It also has a 256K on-chip level 2 cache. The G4 also has a 256K on-chip cache, but since its memory bus is at 133MHz there is also a 2MB level 3 cache that is running at 216MHz to help with the latency issues. The pattern of data access in some applications results in most of their information being in the cache. In this case, caches dramatically improve performance. Unfortunately, when processing video in realtime, you often get cache misses and you can end up degrading to the performance of the RAM system. In this case, the Pentium 4 has a decided advantage because of its faster main memory architecture. If you are processing video one frame at a time in non-realtime, then the performance differences become smaller because you are more likely to hit the cache.
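
The clock-ratio arithmetic in this answer can be sketched in a few lines of Python. This is only a rough model, not anything Avid or Apple published: the 867MHz and 133MHz figures come from the interview, while the function names, the single-level cache, the one-cycle hit cost, and the hit rates are assumptions chosen for illustration.

```python
def cpu_cycles_per_bus_cycle(cpu_mhz: float, bus_mhz: float) -> float:
    """Processor clock cycles that pass during one memory-bus clock cycle."""
    return cpu_mhz / bus_mhz


def average_access_cycles(hit_rate: float, hit_cycles: float, miss_cycles: float) -> float:
    """Expected processor cycles per access, assuming a single cache level."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Figures quoted in the interview: an 867MHz G4 on a 133MHz memory bus.
miss_cost = cpu_cycles_per_bus_cycle(867, 133)
print(f"G4 wait per main-memory access: roughly {miss_cost:.1f} cycles")

# Hypothetical hit rates, picked only to show the trend as misses increase.
for hit_rate in (0.99, 0.90, 0.50):
    cycles = average_access_cycles(hit_rate, hit_cycles=1.0, miss_cycles=miss_cost)
    print(f"hit rate {hit_rate:.0%}: {cycles:.2f} cycles per access on average")
```

As the assumed hit rate falls, the average cost drifts toward the main-memory figure, which is the realtime-video case described above.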

Turner: In layman's terms, can you comment on the accuracy and importance that Apple places on "pipeline stages," and if its claims are true, why does Intel use a 20-stage pipeline and Apple only a seven-stage pipeline?

Rockwell: A pipeline is like a bucket brigade of firemen. The last person in line won’t get any water until it has passed through the hands of each person in line. A bubble in the pipeline is like having each fireman pour out their bucket. They then have to wait for the bucket from the end of the line. In this analogy, actual processor performance is how much water reaches the end of the line. In real programs, many things can cause a bubble. If the program can take two different paths based on whether some condition is true or false, it can only prepare the pipeline for one of those paths. If it needs to take the other path, you have a bubble and the pipeline has to be flushed and start from the beginning. This is an interesting trade-off. Smaller pipeline stages let you run the processor at a higher clock speed. If you code an algorithm carefully and hand-tune it, you will not see nearly the number of pipeline bubbles that Apple is describing. It's debatable which is better: a processor that runs most code fairly well or a processor that runs tweaked code like a banshee. The consumer probably wants the former while the professionals want the latter (and the software to go with it). Ironically, the consumer is more apt to just look at the megahertz number on the box when he buys. He never really appreciates that he's not getting the performance he expects.
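
The trade-off between pipeline depth and clock speed can be illustrated with a deliberately crude model. Everything in it is an assumption rather than a measurement: one instruction completes per cycle when the pipeline is full, each bubble costs a number of cycles equal to the pipeline depth, and the clock speeds and bubble rates are hypothetical values. The function name `effective_mips` is likewise invented for this sketch.

```python
def effective_mips(clock_mhz: float, stages: int, bubble_rate: float) -> float:
    """Millions of instructions per second in a toy pipeline model.

    Assumes one instruction per cycle when the pipeline is full, and that
    every bubble (a fraction `bubble_rate` of instructions) wastes `stages`
    cycles while the pipeline refills.
    """
    cycles_per_instruction = 1.0 + bubble_rate * stages
    return clock_mhz / cycles_per_instruction


# Hypothetical comparison: a 7-stage pipeline at 867MHz versus a
# 20-stage pipeline at 1800MHz, at three assumed bubble rates.
for bubble_rate in (0.0, 0.05, 0.30):
    short = effective_mips(867, 7, bubble_rate)
    deep = effective_mips(1800, 20, bubble_rate)
    print(f"bubble rate {bubble_rate:.0%}: 7-stage {short:.0f}, 20-stage {deep:.0f}")
```

In this toy model the deeper, faster-clocked pipeline stays ahead when bubbles are rare, as with hand-tuned code, and loses its lead only when bubbles become frequent.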

Turner: At Apple’s web site URL http://www.apple.com/powermac/processor.html and the supplementary URL http://www.apple.com/g4/, there is a graphic whose caption reads, "The PowerPC G4 Velocity Engine processes information at 128-bit chunks, compared to 32- or 64-bit chunks in traditional chips." The graphic shows data blocks passing easily through a wide gate on the G4 but bottlenecked in the Pentium. The purpose of the graphic is to compare the design of the P4 and the G4 Velocity Engine. Video Systems readers will look at this graphic and believe that the Intel system has a severe bottleneck. Is this fair? What would you like them to know when seeing this graphic?

Rockwell: This is actually inaccurate now with the Pentium 4's SSE2. The statement may have been made prior to the introduction of SSE2. Or maybe the P4 is not considered a “traditional chip.” Still, Pentium 4’s SSE2 processes data in 128-bit chunks just like the G4. The G4 does have an advantage in that it has a separate set of dedicated memory for the Velocity Engine. On the P4, its memory is shared with the normal floating point operations.

Turner: Why should Video Systems readers not care about John E. Warnock's statement (featured on the Apple website): "Currently, the G4 is significantly faster than any platform we’ve seen running Photoshop"? What is the missing information that they should consider when reading that statement?

Rockwell: Still-image processing is different from video. Also, there are other items to consider when looking at performance. Take PCI, for example. The Mac has 33MHz/64-bit PCI, which allows for a total throughput of 256MB per second. Workstation-class PCs have 66MHz/64-bit PCI, which allows for a total throughput of 512MB per second. Some PCs are coming out with 133MHz/64-bit PCI, which supports 1024MB per second. This becomes very important when looking at working with HD data rates.
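
The PCI figures follow from multiplying bus width by clock speed, which a short Python sketch makes explicit. The raw products (264, 528, and 1064MB per second) land slightly above the rounded numbers quoted in the answer, and both are theoretical peaks rather than sustained rates. The video format used for comparison (1920x1080 pixels, 2 bytes per pixel, 30 frames per second) is a hypothetical example and not a figure from the interview.

```python
def pci_peak_mb_per_s(clock_mhz: float, bus_bits: int) -> float:
    """Theoretical peak PCI throughput in MB/s: one bus-wide transfer per clock."""
    return clock_mhz * bus_bits / 8


def video_mb_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Data rate of an uncompressed video stream in MB/s (decimal megabytes)."""
    return width * height * bytes_per_pixel * fps / 1_000_000


# Hypothetical uncompressed HD stream used only as a yardstick.
hd_stream = video_mb_per_s(1920, 1080, 2, 30)
print(f"example HD stream: {hd_stream:.1f}MB/s")

# The three bus speeds mentioned in the interview, all 64 bits wide.
for clock_mhz in (33, 66, 133):
    peak = pci_peak_mb_per_s(clock_mhz, 64)
    print(f"{clock_mhz}MHz/64-bit PCI: {peak:.0f}MB/s peak, "
          f"about {peak / hd_stream:.1f} such streams at best")
```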

Turner: Avid has selected Intel-based platforms for their Avid Xpress DV. The reason they give (according to Charles Russell, product marketing manager), is "megahertz matters." He goes on to say that with the faster processing available on Intel platforms, realtime transitions are available without the hardware acceleration boards that people with Macintosh computers need. Please comment on this. What is missing from the simplicity of the statement? Surely there is more to this technology than that? Please offer your personal view.

Rockwell: As I stated previously, memory throughput is as much of a concern as processing speed when it comes to video. Also, with properly tuned algorithms using SSE2, you can do better on a Pentium 4 than on a G4. The reasons for choosing Intel for Xpress DV include more than just performance, but I'll leave that to the marketing folks to answer.

Turner: Does this imply that we will not see a Mac version of Xpress DV at NAB?

Rockwell: You know that Avid has a policy that we will not comment on any possible product before it is officially announced, and I am not going to say whether or not such a product is being developed. I will add that we have been advocates of cross-platform support for our editing software products.

Turner: Thank you, Mike Rockwell, for your candid replies, and for making technical concepts understandable for non-engineers.
