By Jim Tung
I have been incredibly lucky over the last 25 years to be involved in several trends that have revolutionized how scientists and engineers work: PCs replacing minicomputers for data acquisition and analysis, high-level environments replacing Fortran for technical computing, and modeling and simulation playing a key role in embedded system development. Those trends reflected the end-user’s continuous thirst for more speed, the need to handle ever-larger datasets, and the desire to do their work better and faster. It was never simply a question of CPU clock speed, number of CPUs in the system, or addressable memory, but rather of the user’s ability to harness that compute power to do useful work.
Nowadays, as I talk with life science customers – in the Boston area and around the world, from large pharmas to renowned universities such as MIT – about their challenges and their computing needs, I have a sense of déjà vu, as I see some familiar patterns mixed with some new twists.
Back in the centralized-computing days of VAX, Convex, Alliant, and Cray, the computer’s OS and system administrators bore the administrative burdens while end-users struggled at their terminals to set up their work. As standalone PCs took over, the end-users flourished: they controlled what their computer did, high-level environments such as MATLAB® enabled them to develop algorithms and analyze data without the need for low-level programming, and ever-faster processors gave them power boosts without additional effort. However, system administrators struggled with how to manage and support those dispersed computing activities.
At that time, it was groundbreaking to provide a technical computing environment that let end-users work interactively while insulating them from computer variations, running on a PC, scientific workstation, or supercomputer without change. There was no need to recompile, figure out whether the CPU was little-endian or big-endian, and so on.
Today’s landscape has aspects that look familiar and other attributes that are new. End-users still use their personal computers, except these PCs now have dual- and quad-core processors (and more coming). Today’s multiprocessor clusters feel a lot like the old mainframe/supercomputer paradigm, except less expensive and based on “standard” processors and operating systems. And in this more diverse computing landscape, end-users still thirst for more speed, the ability to handle larger datasets, and ways to do their work better.
One twist is the mixed composition of users in biotech: biologists and chemists who rely on data and want turnkey, easy-to-use software; statisticians and mathematicians who need to create, refine, and deploy new algorithmic approaches; and computer scientists and programmers who want tools for rapidly generating production-computing applications that work on huge volumes of data.
And, at some point along the way from mainframes and minis to personal computers to today, the end-users and IT groups seem to have lost track of each other. End-users continue to crave speed and power, but don’t talk to their IT groups (except to ask again for a faster PC with more memory). Meanwhile, IT groups are buying more multi-core PCs (they don’t really have any other option nowadays) that users can’t take full advantage of. And they set up server farms and have to search in-house for projects and end-users interested in using them. It is ironic.
Hidden in this situation is a very interesting opportunity. The multicore personal computer enables end-users to do parallel and distributed computing without impacting IT groups. The improved OS, scheduling, and administrative tools of compute servers (along with the fact they’re based on the same processors and operating systems as the PCs) mean that more compute power is available, more affordably, than ever before.
Recently, enhancements in technical computing environments have started to provide consistency in how users can take advantage of today’s variety of computing systems. High-level technical computing tools that can distribute work on a server farm, without the need for low-level MPI programming, have made it easier for users craving speed to harness multiprocessor clusters for applications such as Monte Carlo simulations and sequence analysis. Those end-users can also work with larger datasets, since the number of processors and the memory space can scale up. And these same environments can also take advantage of multi-core PCs, enabling end-users to make full use of the computing systems at their disposal, with minimal changes to what they do on their own PC.
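To make the idea concrete, here is a minimal sketch of a Monte Carlo job that runs unchanged either serially or spread across the cores of a multi-core PC, with no explicit message passing. It is an illustration only, not the tools discussed in this article: it uses Python's standard library as a stand-in, and the names `count_hits`, `estimate_pi`, and `estimate_pi_parallel` are hypothetical.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(task):
    """Run one independent batch of random trials; return hits inside the unit circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # per-task seed keeps each batch reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, samples_per_task, map_fn=map):
    """Estimate pi from n_tasks independent batches.

    map_fn decides where the batches run: the built-in map runs them
    serially, while an executor's map spreads them across worker processes.
    """
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    total_hits = sum(map_fn(count_hits, tasks))
    return 4.0 * total_hits / (n_tasks * samples_per_task)


def estimate_pi_parallel(n_tasks, samples_per_task, max_workers=None):
    """Same computation, with the batches distributed over local CPU cores."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return estimate_pi(n_tasks, samples_per_task, map_fn=pool.map)
```

Because each batch is independent and seeded, the serial and parallel calls return the same estimate; only the place where the work runs changes. When calling `estimate_pi_parallel` from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.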
While speeding up existing applications is unquestionably good news, I think the most interesting and exciting opportunity is coming next: providing tools so that algorithm and application developers can more easily create techniques that make explicit and optimal use of parallel-computing systems, whether the system is a dual-core PC or a server farm with hundreds of processors. The programming and sys admin tools are getting to the point where end-users can really focus on their problems and applications, taking advantage of available hardware without the need to deal with it explicitly.
But so what? Are you doing better drug discovery, or advancing your understanding of systems biology faster, than the next guy? Now you and your users have the opportunity to create that return on your investments in computing resources.