“If it sounds good, it is good,” said the late musician Duke Ellington, in what might well be the cardinal rule of multimedia technology development. Today, the Duke’s dictum must be generalized to deal with sight as well as sound. As PC Week Labs examined this year’s candidates for Best New Technology at Comdex, we found many entries that tried to make multimedia look right.
The trade-offs are clear. Network bandwidth and hard disk storage requirements can be reduced by compressing multimedia content, but complex compression algorithms use many processor cycles. Economies in the use of processing power during multimedia compression and restoration will generally reduce the quality of the results. Using an add-on processor, with a standard interface such as a PCI bus, avoids burdening the central processor–but introduces bus overheads that may hamper the pursuit of peak performance.
The face of computing changes when engineers find exceptions, or at any rate evasions, that let them escape at least some of the rigors of these rules.
Facing the bandwidth problem
Speaking of faces, PC Week’s Best New Technology winner at Comdex was Visionics Corp.’s FaceIt PC, whose core technology includes the ability to pick human faces out of a scene and to recognize them with startling accuracy.
This doesn’t sound, at first, like a multimedia technology. Among other applications, however, Visionics’ algorithms could be used in videoconferencing systems. In the bits that encode the video stream, a system could update the faces at full-motion video rates while devoting much less effort to the background. (The name of each person in the scene could also appear next to his or her image.)
Pumping fewer bits across the network is one way to gain performance. Another way is to get those bits in and out of memory more quickly. Buried under the hood of such a system might be high-bandwidth memory hardware, based on NEC Electronics Inc.’s Virtual Channel Memory specification (one of the finalists in the Best New Technology competition).
NEC’s technology, which the company will license royalty-free after releasing its specification at Comdex, lets a RAM chip service several data streams while separately optimizing each stream’s memory access. Simulations suggest that multitasking situations might see almost double the memory throughput, without any changes to the process technology involved in building the core of the memory chip.
In addition, buffering of core memory access by NEC’s Virtual Channel logic can make the difference between marginal and satisfactory performance from the worst-case cells in the memory array. This boosts the yield of the memory fabrication process, actually reducing costs.
‘Right’ is in the eye of the beholder
Innovations like those from Visionics and NEC reduce the amount of work required on the front end of a multimedia engine, and streamline processing of data on the back end. With Ellington’s rule in mind, though, multimedia engineers should optimize the middle of the process in the areas that viewers will actually notice.
Research shows, for example, that the eye-brain system is far pickier about point-by-point brightness than it is about the color of each point. This insight is applied in MPEG image compression, a scheme that takes its name from the Moving Picture Experts Group, which initiated the effort to make audio-video content a practical form of data.
Once requiring dedicated hardware, the complex MPEG algorithms are now within the realm of feasible processing on the high-speed chips in the latest PCs. This opens the door to products such as Video Clip MPEG-2, a software-only MPEG editing application from Vitec Multimedia Inc. that was another candidate for a Best New Technology award at Comdex. The product enables frame-by-frame editing of MPEG video, reassembling a sequence that matches the frame rate and quality of the original.
This task is more complex than it sounds, because MPEG achieves its valuable data compression ratios by storing information on streams of images rather than individual image pixels.
For example, when MPEG encodes the brightness and color of a scene, it doesn’t try to represent that information with equal precision for every point of the image. Instead, it stores a brightness value for every point, satisfying the demands of human vision, but it retains color information only on an average basis for larger blocks of pixels.
Overall, the MPEG approach halves the number of values required to represent an image, compared to the separate values of red, green and blue stored for each pixel by most display hardware.
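The halving can be checked with simple arithmetic. The sketch below assumes color is averaged over 2-by-2 pixel blocks, with two color values kept per block (a common arrangement, though not the only one MPEG permits); the function name and the 352-by-240 frame size are illustrative.

```python
def values_per_frame(width, height, block=2):
    """Return (rgb_values, subsampled_values) for one frame.

    Assumes one brightness value per pixel plus two color values
    for each block x block group of pixels.
    """
    rgb_values = width * height * 3           # red, green, blue per pixel
    brightness_values = width * height        # one per pixel
    color_blocks = (width // block) * (height // block)
    color_values = 2 * color_blocks           # two color values per block
    return rgb_values, brightness_values + color_values


rgb, subsampled = values_per_frame(352, 240)
print(rgb, subsampled, subsampled / rgb)  # 253440 126720 0.5
```

Each pixel costs one brightness value plus a quarter share of two color values, or 1.5 values in place of 3.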
Further compression is achieved by DSP (digital signal processing) techniques, with forbidding names like DCT (Discrete Cosine Transform), that take real-world complex signals and identify their most important components.
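To make the idea concrete, here is a minimal sketch of a one-dimensional DCT in plain Python. It is not an MPEG encoder: it transforms eight illustrative sample values, keeps only the three largest components, and rebuilds an approximation. The function names and sample data are assumptions chosen for the example.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: express samples as a sum of cosine components."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuild samples from cosine components."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        samples.append(total)
    return samples


def keep_largest(coeffs, count):
    """Zero every component except the `count` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:count])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


ramp = [10, 12, 14, 16, 18, 20, 22, 24]       # a smooth brightness ramp
approx = idct(keep_largest(dct(ramp), 3))     # 3 of 8 components survive
print([round(v, 2) for v in approx])
```

For a smooth signal like this ramp, a few components carry nearly all the information, so discarding the rest changes each sample only slightly.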
When a signal component’s value stays the same (or nearly so) for some time, the stored data stream can simply record the number of identical values and their common magnitude. Although not entirely faithful to the original, the smoothed-out result might actually seem a more perfect (or at any rate, more consistent) performance–at least, to the untrained ear.
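This count-and-magnitude scheme is commonly called run-length encoding. A minimal sketch follows; the tolerance parameter is an assumption added here to illustrate the “nearly so” case, and the function names and data are illustrative.

```python
def run_length_encode(values, tolerance=0):
    """Collapse runs of equal (or nearly equal) values into (count, magnitude) pairs.

    A value joins the current run when it is within `tolerance` of the
    run's stored magnitude; a nonzero tolerance makes the encoding lossy.
    """
    runs = []
    for value in values:
        if runs and abs(value - runs[-1][1]) <= tolerance:
            runs[-1][0] += 1
        else:
            runs.append([1, value])
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (count, magnitude) pairs back into a flat list of values."""
    decoded = []
    for count, magnitude in runs:
        decoded.extend([magnitude] * count)
    return decoded


signal = [5, 5, 6, 5, 9, 9, 9, 2]
runs = run_length_encode(signal, tolerance=1)
print(runs)                     # [(4, 5), (3, 9), (1, 2)]
print(run_length_decode(runs))  # [5, 5, 5, 5, 9, 9, 9, 2]
```

With the tolerance set to 1, the stray 6 is absorbed into the run of 5s: eight values shrink to three pairs, and the decoded signal comes out smoother than the original.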
To the eye, for that matter, a video signal that has been processed in this manner might look cleaner than the original. This is because low-energy, high-frequency components (such as the subtle but annoying patterns caused by many kinds of interference) are eliminated in the process.
Although challenged of late by the new mathematics of wavelets, the well-established DSP algorithms such as DCT are broadly supported by thoroughly tested technology. (For more about wavelets, see PC Week, Nov. 17, Page A1.)
By the numbers
Another challenge for multimedia technology is that of matching the broad “dynamic range” of human senses: for example, the ability of the human ear to process sounds that range from a whisper to a roar. Such a range can be achieved by representing signal intensities using floating-point rather than integer numbers.
For example, the noise of a nearby jet engine is a signal roughly 10 billion times as intense as a barely audible whisper. To represent such a range using integer values for signal intensity, without unacceptable distortion, would require 34-bit values. This is inconvenient in a world of 32-bit microprocessors. Furthermore, wide data paths increase hardware costs throughout the entire data chain.
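The 34-bit figure can be checked by counting binary digits. The sketch below assumes that one integer step equals the intensity of the quietest sound, and that the loudest sound is 10 to the 10th power times as intense (a span of 100 decibels); the function name is illustrative.

```python
def bits_for_ratio(ratio):
    """Smallest unsigned integer width that can count from 0 up to `ratio`."""
    return int(ratio).bit_length()


loudest_to_quietest = 10 ** 10             # assumed 100-decibel span
print(bits_for_ratio(loudest_to_quietest)) # 34
print(bits_for_ratio(2 ** 32 - 1))         # 32, the most a 32-bit word can hold
```

Since 2 to the 33rd power is about 8.6 billion and 2 to the 34th is about 17.2 billion, a ratio of 10 billion just spills past what 33 bits, let alone 32, can hold.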
The difference between floating-point numbers and integers is like the difference between saying “1.1 million” (a floating-point description) and saying “one million, one hundred thousand, three hundred and five” (an integer). The former is not as precise, but it’s easier to represent and manipulate. By Duke Ellington’s rule, this seems as if it could be an attractive option.
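The same trade-off shows up in a 32-bit floating-point value. The sketch below assumes the common IEEE 754 single-precision format and an illustrative intensity of about 10 billion; it uses Python’s standard struct module to squeeze a number into 32 bits and read it back. The helper name is an assumption for this example.

```python
import struct


def as_float32(value):
    """Round a number to the nearest 32-bit floating-point value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


whisper = 1.0
jet_engine = 10_000_000_001.0   # too wide for a 32-bit integer

print(as_float32(whisper))      # 1.0
print(as_float32(jet_engine))   # 10000000000.0
```

Both ends of the range fit in 32 bits, but the final digit of the larger number is rounded away: less precise, yet easier to store and manipulate.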
When DSP techniques, such as DCT (which involves calculating trigonometric functions), are combined with the potential benefits of floating-point representation, one comes up with a job description that sounds a lot like the areas in which Intel Corp.’s Pentium family of processors has made major improvements over its x86 predecessors. Competing modern RISC-technology chips likewise possess these strengths.
Intel and other microprocessor makers seek to maintain their revenues by encouraging the development of multimedia technologies based on central-processor cycles, rather than external hardware. Competing alternatives to Intel’s MMX instruction set are all pursuing this goal.
System vendors, on the other hand, like the flexibility that comes from being able to offer their customers a wide range of multimedia hardware in external forms that can be installed conveniently in the field, whether by a technician or by the person who buys the machine. For example, an outboard MPEG encoder from AVerMedia Technologies Inc. (see photo, above) was another interesting entry in the Best New Technology field at Comdex.
The duel between monolithic CPUs and flexible add-ons guarantees that PC buyers will continue to enjoy a wide range of multimedia acceleration solutions, whether the processing takes place on dedicated hardware or on the same processor that handles the user’s other tasks. The buyer should merely remember that what sounds (or looks) good is good by definition–especially when the price tag is the subject.