CACM Article

From Education

Jump to: navigation, search

Note: This document was pulled from EAPF's shared folder on June 2, 2010, and may not accurately reflect its current state.

Contents

Introduction and Problem Statement

The transition to parallel computing as a model presents not only a significant challenge, but an enormous opportunity.

If you are a computer science educator, you might be pessimistic these days, and with good reason.

The "boom years" of the early 90's fostered strong computer science enrollments, but those days seem much further away since the massive decline when the bubble burst. The effect of this enrollment decline has ranged from stagnation to catastrophe for many computer science departments. As enrollments in undergraduate computer science programs continue to lag, we wonder how much our dated curriculum may be affecting this.

As we have been struggling with re-engaging young people in our discipline, another major problem has leaped upon us as well. For decades, computer scientists have had the luxury of a single programming model, which formed a largely unchanging foundation to our knowledge base. This luxury was enabled by engineers who constantly re-designed the ground on which this disciplinary foundation stood, all without allowing those changes to disrupt the Von Neumann model. Unfortunately, power and heat constraints have led these same engineers to re-architect the single continent CPU into many variants of interconnected islands.

Cores are everywhere, and more of them are headed our way. This chart illustrates Intel's recent history and near-term projections for where process technology and CPU core counts are going. From this it is clear that students (and teachers) of computer science cannot continue to think of parallelism as a topical elective for a small sub-set of the population.

Year	Process Technology	Core Count
2005	65nm	4
2007	45nm	8
2009	32nm	16
2011	22nm	32
2013	16nm	64

Chart data from Intel Corporation \cite{Intel}

The programming foundation of computer science now rests on parallel hardware architectures. Taking advantage of these new architectures now lays on the shoulders of software developers who have been abruptly (from their perspective) been forced to face this change. As a result, most have almost no experience in harnessing that power to extract gains in computational performance.

A “sea-change” like this is not a new occurrence in the field of computer science. There have been several inflection points that have required a major retooling of our discipline. For example, the von Neuman architecture fostered a set of common practices and principles that allowed for a faster and more structured expansion of the computing discipline. The twin inventions of the silicon transistor and the integrated circuit were the historical events that spawned a fast-paced semiconductor industry. The object-oriented programming model pushed our developers to re-think how they design large systems. The transition to multi-/many-core architectures, with 10s to 100s of cores (or more), now calls us to a new change.

We are certainly not alone in this conclusion. Charles Thacker, developer of the first personal computer and winner of ACM's 2009 Turing Award, was recently interviewed on National Public Radio's All Things Considered [20]. On the show, he agreed that we're at a point of inflection. We can't make single-core computers any faster now, so we are making more of them. His view is that the big near-term task is figuring-out how to make them cooperate through parallel programming.

Parallelism in the architecture, the programming models, and the whole application environment offers an opportunity for enormous innovation, and the promise of future growth of our discipline. This new adventure could also offer us a hand in sparking interest in CS again - If students don't see any relationship between the coursework and what's happening in the 'real world', then where is the meaning in studying computer science?

To seize this opportunity, however, our Computer Science curriculum needs to change and grow. In that context, it is worth considering the question in Dr. Thacker’s NPR interview that led him to talk about parallelism: “If we were to look for something today that's still a matter of research and that stands a good chance of going as universal as the ideas of the Alto did, what is it?”

Motivating the Transition

If parallelism is to become universal, in the same way that personal computing has become, then what helps to motivate this ecosystem-wide change? The dot-com bust and outsourcing panic created an enrollment plunge, and most institutions still have not recovered to those earlier levels. What do we need to do to motivate students to study computer science? And, as with any topic, when trying to motivate the use of these new parallel platforms, we must answer the most basic of questions, why bother?

For some answers to those questions, we only need to look at the three hottest CS educational topics of the past few years: Media Computing, Game Development, and Computational Science (along with its cousin, Computational Thinking). All three of these topic areas, besides being effective motivators for general computer science, have parallel computation as a requirement in much of their professional environment.

Media Computing

Our information-rich lives have created a burgeoning need for systems that do timely processing of the sound and images we digest every day. The goal of media processing is to offer real-time image processing, advanced 3D rendering (e.g. WebGL [16]), and video encoding/decoding, along with many other useful manipulations to media data. These applications share a common goal of producing results in a reasonable time and providing the user with a better media experience. Examples of these improvements include video jitter removal, face recognition, advanced rendering techniques (e.g. order independent transparency), and automated image classification.

To accomplish these operations in a timely manner, media processing constantly pushes the performance of high-end many-core and heterogeneous hardware. This generates the volume that has driven these powerful chips into the commodity consumer market. As a result, industry has discovered a desperate need for programmers with knowledge and experience in programming these parallel platforms. The train-as-you-go model is no longer sufficient to keep pace.

Most media processing is inherently parallel, which provides an excellent opportunity for accessing the advantages offered by emerging heterogeneous parallel hardware and their corresponding programming models (such as OpenCL [17]). As a direct consequence, industry increasingly expects that new hires will show excellent programming skills, application domain understanding, and knowledge of many-core programming. (media compute @ SIGCSE?)

Gaming

In the search for a solution to declining enrollments, many programs have placed their hopes on Gaming to generate enthusiasm in the next generation. By piggybacking the exploding market success of computer games they can not only excite their students, but the field of game development has also helped many schools to form bonds across campus with departments that are not traditional computer science partners, such as English, Theater, and Art. In his December 2009 CACM article [Zyda], Michael Zyda describes one such successful program - USC’s GamePipe.

One of the driving design constraints in game development has always been performance. Due to the drive to produce the most realistic (or fantastic) scenes, effects, and artificial intelligence, games have been a strong leader in maximizing the performance of computer hardware, which in turn has often driven the markets for high-end processors and graphics hardware. Game developers have also been good students of code optimization - for example, id is widely credited with popularizing the fast inverse square root method.

With the release of multi-core processors, this trend has continued - with modern games beginning to take significant advantage of parallel hardware in recent years. It naturally follows that, when looking for new hires, these companies want to see students with strong low-level programming skills, knowledge of computer performance, and specific knowledge of multi-threaded programming. Studies by McGill [McGill] and others in the game development community illustrate that these skill requirements aren't often being met, with knowledge of multi-threading having the largest "expectations gap" between education and industry.

Computational Science

(Note: There are a bunch of recent SIGCSE stuff on intro CPSC /CT stuff – cite?) (e.g., Qualls, Sherrell or Wing)

Practitioners of computational science build models of real-world entities and processes and then implement numerical solutions to those models, using them to study their real-world analogs. Computational techniques are at the heart of modern science, with climate analysis and genome sequencing being prominent examples. Computational science depends on computers and software to simulate laboratory experiments, to perform experiments for which there is no laboratory analog yet, and to mine and visualize large data sets. The methods employed are usually resource intensive, requiring massively parallel high performance computing (HPC) systems to create and use these large data sets.

A number of pioneering scientific disciplines have been attracted to the comparatively fast turnaround time and relative simplicity of computational methods, which has led to its steadily increasing use for basic and applied research in the natural science disciplines, as well as expanding into social sciences, economics, the humanities, and the arts. The need for computational scientists was clearly stated in the President’s Information Technology Advisory Committee (PITAC) June 2005 report, Computational Science: Ensuring America’s Competitiveness [cite PITAC]. To help meet the needs in this arena, the next generation of computer scientists needs to be facile with parallel programming and distributed computing techniques that are pervasive within computational science. Anecdotally, we have heard at various conferences that this industry often hires electrical engineers and then trains them to do parallel programming. While we have no gripe with electrical engineers being employed, we do take issue with having CS grads being unprepared to be hired as programmers.

Support for the importance of the parallel programming and distributed computing concepts used to harness the power of high performance computing gear for computational science can be found in Peter Denning's July, 2007 column from this journal Computing is a Natural Science [cite Denning]. In that column he traces the history of computational science and notes the boost it was given by the U.S. Congress in 1989 when they passed the High Performance Computing and Communication Initiative. This legislation made it clear that continued development of these resources, and the tools and techniques used to harness their power, are now critical to the broader mission of scientific discovery across a wide range of disciplines.

Necessary Skills

There is a lot of overlap/synergy here with our Tech Pack group – what do people need to know? That is at the heart of the Tech Pack, and should be reflected here as well.

These fields all have two very distinct things in common. First, they are all being used at the undergraduate level as motivating frameworks for computer science. Second, to do any of them well in a professional environment, some parallel computing expertise is essential. So what do new graduates need to know to enter these fields?

  1. What is parallelism - basic decompositions
  2. Memory Hierarchy - shared vs. distributed
  3. Interplay of the application, the parallel algorithm that solves it, and the parallel architecture it is run on
    1. Characterizing how these mappings work out.
  4. Pragmatic skills of collaborative debugging and performance tuning large code bases in the more complex arena of concurrent processes and threads


Rationale for a New Framework

Two decades ago, during the heyday of big-iron supercomputing, groups of educators contemplated how to infuse the undergraduate CS curriculum with parallelism. If you look at the working papers produced by the Forum on Parallel Computing Curricula in 1995[cite], for example, you see many of the same issues that still vex us today: how do we integrate parallelism into the knowledge base? Do we use a separate course, or integrate modules into existing materials? How do we train up the faculty?

It isn't surprising that, years later, we still don't have definitive guidance from the curriculum standards group on how to proceed, since, though we could see the handwriting on the wall, regularly increasing single CPU performance co-dependently supported the status quo, resulting in parallel topics being relegated to the sidelines.

Since the advent of the von Neumann architecture, computer programmers have operated under its sequential assumptions and implications. Architects and developers alike have been happy to work within its constraints, making use of a standard programming interface that provides "free" performance improvements every time a user purchases a new machine, at least until recently. While this standard interface has greatly boosted our collective software capabilities (Larus), it has also limited the exploration of innovative and high-performance architectures that break out of the sequential mode.[Hill Figure?] Indeed, parallel computing has been the "next big thing" for well over 30 years (Feynman), but widespread adoption has been elusive.

Besides creating an environment averse to creative architectures, the single CPU status quo has led modern programmers to spend very little time on performance optimization and evaluation, which has led to a widespread shortage of the related skill set. The soliloquy we've all heard goes something like this: "Why spend time optimizing code when I can just purchase a new machine in short order for far less money than the cost of a programmer's time, with the code running significantly faster?" Over time, this attitude has left theoretical "Big O" calculations as the only performance skill students typically take from college, which is helpful when choosing a new algorithm, but is of little help tuning an actual code to hardware.

Unfortunately (at least for non-architects), physical constraints have ended software's "free lunch" [18]. Multi- and many-core architectures are only the first step. The explosion of accelerators (such as GPUs) as a general computing facility are leading future machines down a more heterogeneous path. Core density and memory bandwidth characteristics are also motivating new macro architectures which have deeper and more varied storage and communication hierarchies than we have seen in the past.

If programmers want their code to take advantage of any of these new architectures, they will have to do so explicitly, at least under current models. And while it is probably infeasible to bring full treatment of numerical analysis and advanced architecture back into the CS curriculum, it certainly isn't too much to ask that students be able to understand the concept of speedup, the ramifications of Amdahl and Gustafson/Barsis's laws, and enough macro-architecture to support the basic understanding that there's now more than one "brain" in our computer, that the memory system exists to keep them working efficiently, and that how one's code interacts with that process can have startling effects on performance.

Curriculum Ideas and Directions

The natural reaction within the circle of educators is to ask what has to be taken out of a the current curriculum to accommodate the brave new world. We contend that this is the wrong question to ask, as it assumes that parallelism is just an independent knowledge unit that has no interaction with the rest of CS course material. Instead, look for where content can be adjusted and re-framed. When teaching sorting, teach a parallel version. When discussing loops, demonstrate how they often create naturally exploitable independence.

Packaging material in smaller chunks also makes it possible to incorporate material into existing courses with little overhead, mitigating the need for increased faculty resources, decreasing the possibility that key CS concepts are missed, and giving faculty a shallower learning curve for bringing parallel methods into their existing curriculum. Perhaps most practical of all, smaller chunks facilitate faculty fostering new expertise at a manageable pace.


New programming languages will undoubtedly surface, replacing the language du jour, that will necessitate future curricular changes. For now, we will have to press our current crop of languages - C, C++, Java, C#, etc. - into service as our pedagogical parallel programming languages. Thankfully, there exist frameworks within most of the widely-used teaching languages to study these topics.

If C++ underpins your programming courses, then OpenMP, a part of most widely-used compiler chains, is an easy first tool for learning basic decomposition topics. At any point after looping is covered, a couple of lines of code can elegantly parallelize a for loop, allowing instructors to spend their time discussing the decomposition, instead of the syntax. When Java or C# is the language of choice, then threads can still be covered, there will just be a bit more code rewriting. (C# task parallel libs) (Do we really want a laundry list here?)

Pedagogical ideas for parallel concepts are beginning to emerge, with some

It’s often overlooked that new students come to us with experience reasoning about parallel and concurrent systems; any driver must process and prioritize thousands of concurrent inputs to safely navigate the roads. At SIGCSE 2010, Kim Bruce reported his experience with introducing concurrent event-driven designs in the first programming course for the past decade [21]. His conclusion was that students naturally think in very concurrent terms.

These concepts can often transfer very clearly in media-rich programming languages like Alice or Scratch, two recent and popular introductory frameworks to teach computer science concepts and capture student interest. These languages are easily picked up, with projects that demonstrate concurrent operation being easy to create, often self correcting, and open ended - projects which lead students to a concurrent intuition that will serve them well in later classes.



Data Structures provides several opportunities for introduction of parallel concepts. "Big OO" is sometimes added to the discussion of Big O as a theoretical idea. Parallelism can successfully reduce the size of the "Big O" constant. A GPU example or two can highlight both the importance of considering the "Big O" value and the "Big O" constant. Parallel algorithms are easily and gradually introduced along with scalar ones. Writing and running parallel algorithms is fully of opportunities for learning, with data case dependant race conditions, problems of deadlock, and mutually independent memory access and other problems presenting themselves in frustratingly intermittent ways. Parallelism provides a vehicle for expanding on problem decomposition, with task and data decomposition being viable methods of "seeing" parallelism in a problem, which is a natural segue into parallel programming patterns, a la the Berkeley dwarves. Actual working code, illustrating each pattern, allows students to both absorb the pattern, as well as the opportunity to torture the code to further understand the resiliency of a particular pattern. Data structures can be expanded to show the affect of inter-process communication dominating CPU as the performance bottleneck as the number of cores/CPUS/processors increases.

Operating systems has always been the concurrent poster child with including semaphores and deadlocks and mutual exclusion, oh my! Students will benefit from an overview of the hardware architectures (re)emerging as current computational platforms and their implications on algorithms and data structures.

Challenges

The current aggregation of existing curricula, teaching experience, programming languages, and legacy code has a significant amount of momentum, but it is not unlike many of the periods of paradigm transitions in programming. With the use of machine language in the 1940s, the tedium and opportunity for error inevitably led to the creation of assembly language. In the early 50s, need for improved programmer productivity led to Fortran and LISP, both languages that are still used, and a short time later to the progenitor of languages: Algol. In the 60s, impenetrable goto laden code led to the structured program theorem, the back-fitting of older languages, and the development of newer languages like Pascal and C. In the 70's, need for good abstractions in large projects led to the creation of languages, like Simula and Smalltalk, to support modular and object oriented programming, which eventually reached the mainstream with C++.

While these sea-changes in programming all happened, they didn't do so overnight. Similarly, in the transition to parallel programming models, there is a lot of groundwork to be done - convincing faculty, finding the best teaching methods, and developing tools and abstractions which make concepts more graspable to students.

Tools, in particular, present a challenge. The most popular current parallel programming software platforms - OpenMP, MPI, and the GPGPU languages like CUDA - are all low-level parallel programming paradigms, and likely expose more issues than the typical student will need to see in the long term. However, as in the past, we can know the current issues and accommodate them with the tools at hand. (After all, it was possible to write structured and modular code in Fortran 66)

Even with a perfect crystal ball predicting the future parallel roadmap, how do we ready faculty for this unknown tools gap and unknown languages gap? More importantly, how do we ready our students to be able to adjust to this changing framework once they venture forth from our academic pipeline?

1. Language/tools gap o Turn--key solutions for specific problems exist, but no real general model --> need to know more about systems issues to capture parallelism in other cases. o How to achieve both performance/scalability and productivity in a general-purpose parallel language or environment? 2. faculty readiness gap o Even if we knew what we were doing, no one else is ready to teach it. 3. Teaching tools gap o How can we make the “easy parallelism” easy? 4. Teaching to the unknown

Models?

(examples of working curricula) 1. CS0/1 It is essential this not be taught solely via a single course, rather spread throughout the curriculum. A CS0 course can be used to attract students and give a solid survey of a Computer Science degree program. A graphically oriented language such as Alice or Scratch allow for an immediate self-correcting experience of concurrency, which may be the most important foundational experience for a student, fueling computer science intuition and consequent problem solving skills. A beginning course in programming can introduce thread based shared memory parallelism as a part of the normal introduction of the fundamental tools and concepts used to design programs. Although Java provides native support for threads and synchronization, the student needs to be able directly control the thread of execution on each core. Alan Kaminsky has done an outstanding job providing a Parallel Java Library, which more than suffices. MPI, OpenMP, CUDA, and OpenCL all easily work with C, which luckily bears more than just one letter in common with C++. Shared memory parallelism via OpenMP is probably the gentlest intro, since students can start with a working scalar code and see the affect of spreading a loop across several cores via a single compiler pragma, and thus begin their tumble down the rabbit hole into wonderland. 1. Alice up front 2. Need more here!

Conclusions (and Resources/Teaser?)

References

1. M. McGill - academia/industry expectation gap for game dev programs (FDG '09) 2. Mine CACM and IEEE Computer for the past 2 years (charliep) 3. Consider game design for education articles for connections into the curriculum (charliep) 4. David Joiner, Charles Peck, Thomas Murphy, Paul Gray, What we've learned about high performance computing education, outreach, and training, Computing in Science and Engineering, Volume 10, Number 5, September/October, 2008 5. Tom Murphy, High-Performance Computing in High Schools? IEEE Distributed Systems Online, vol. 8, no. 8, 2007 6. Tom Murphy, High-Performance Computing in Community Colleges?, IEEE Distributed Systems Online, vol 4, no 4, 2006 7. David A. Joiner, Paul Gray, Thomas Murphy, Charles Peck Teaching parallel computing to science faculty: best practices and common pitfalls, ACM SIGPLAN 2006 Symposium on Principles and Practice of Parallel Programming, 2006 8. J. Larus - Spending Moore's Dividend (CACM May '09) 9. Peter Denning, Computing is a Natural Science, CACM July, 2007 (v50, n7) 10. Gordon Moore, Cramming more components onto integrated circuits, Electronics, 1965 (v38, n8) 11. Jeanette Wing, Computational Thinking, CACM March, 2006 (v49, n3) 12. President’s Information Technology Advisory Committee (PITAC) Computational Science: Ensuring America’s Competitiveness, June 2005 13. Intel Corporation, personal communications with Wilf Pinfold, Stephen Wheat and Michael Haedrich. 14. National Public Radio, All Things Considered. Turing Award Winner on Future of Tech. (March '10) 15. oo reference Guttag and Liskov 16. WebGL - OpenGL ES 2.0 for the Web. Khronos WebGL Working Group, March 2010. 17. The OpenCL Specification, Version 1.0 rev 47. Khronos OpenCL Working Group, August 2009. 18. Herb Sutter, The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software, Dr Dobb's Journal, Vol 30, No. 3, 2005. 19. John V. Guttag and James J. Horning, The Algebraic Specification of Abstract Data Types, Acta Inf., 1978. 20. Charles Thacker, Turing Award Winner On Future of Tech, All Things Considered, NPR, March 15th 2010. 21. Kim Bruce, Andrea Danyluk, and Tom Murtagh, “Introducing Concurrency in CS 1”, SIGCSE 2010.


EXTRAS

We are in a time of a swing back towards performance concerns. Persistent single core speed-up fostered modularity and objects. OpenCL [17], OpenMP, and MPI are all low-level parallel programming paradigms. A parallel language will need to accommodate performance and scalability of algorithms to increasing numbers of cores. OpenMP does a fair job with low level parallelism, a parallel language will need to handle course grain parallelism, particularly the embarrassingly parallel. It seems to take about a decade for a stable new language to emerge. We can expect to say in the 10's of the languages accommodating many-core hardware and higher-level expressions of parallelism.

Personal tools
SC Education sites