
We Are Bending the Heck Out of the Von Neumann Bottleneck December 1, 2012

Posted by Peter Varhol in Architectures, Software platforms.

When I was taking graduate computer science classes, back in the late 1980s, we spent some time talking about SIMD and MIMD (Single Instruction Multiple Data and Multiple Instruction Multiple Data) computer architectures, with the inevitable caveat that all of this was theoretical, because of the Von Neumann Bottleneck.  John Von Neumann, as students of computer science know, was a renowned mathematician who made contributions across a wide range of fields.  In a nutshell, the Von Neumann Bottleneck is a consequence of the standard processor architecture: the bandwidth between the CPU and memory is very small in comparison with the amount of memory and storage available and ready for CPU use.

I’ve recently returned from Supercomputing 2012, and I’m pleased to say that while we are not breaking the Von Neumann Bottleneck, new computing architectures are bending the heck out of it.  You can argue that the principle of parallel processing addresses the bottleneck, and parallel processing is so mainstream in the world of supercomputing that it barely rates a mention.

Programmers are well aware that writing parallel code is difficult and error-prone; we simply don’t naturally think in ways that lead to parallel solutions to problems.  But with multiple processors and cores, we end up with more buses between memory and processor (although it’s certainly not a one-to-one relationship).
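
To make that concrete, here is a minimal sketch in Python (my own illustration, not taken from any of the tools mentioned in this post): even summing a range of numbers has to be restructured into independent chunks, farmed out to worker processes, and recombined before it can spread across cores.

from multiprocessing import Pool

def partial_sum(bounds):
    # Each worker sums its own independent slice; no state is shared between workers.
    start, stop = bounds
    return sum(range(start, stop))

def parallel_sum(n, workers=4):
    # Partition the problem up front; the worker count of 4 is arbitrary.
    step = n // workers
    chunks = [(i * step, n if i == workers - 1 else (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    assert parallel_sum(1_000_000) == sum(range(1_000_000))

The sequential version is one line; the parallel version forces you to think about partitioning, distribution, and recombination, which is exactly the mental shift most of us find unnatural.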

Because writing parallel code is so difficult, there are a growing number of tools that claim to provide an easy(ier) path to building parallelism into applications.  One of the most interesting is Advanced Cluster Systems.  It provides a software solution called SET that enables vendors and engineering groups with proprietary source code to easily parallelize that code.  In some cases, if the application is constructed appropriately, source code may not even be required.

In addition to parallel processing, we can look to other places for moving more data, and more quickly, into the processors.  One place is flash storage, which becomes virtual memory for an application, with only the working set loaded into main memory.  FusionIO offered a partial solution to that bottleneck with a flash memory storage device that was software-configured to act as either storage or an extension of main memory, with separate buses into the processor space.  The advantage here is that program instructions and data can be stored on these flash memory devices, which then have direct access to main memory and processor space.  The single bus isn’t such a bottleneck any more.
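
As a rough analogy (a sketch in Python, not FusionIO’s actual interface, which I haven’t programmed against), memory-mapping a large file that lives on a flash device gives an application a similar effect: only the pages it actually touches are pulled into main memory.

import mmap

def sample_bytes(path, offsets):
    # Map the whole file without reading it; the OS faults in only the pages we touch.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [mm[off] for off in offsets]

# Hypothetical usage against a large data file stored on a flash device:
# sample_bytes("/mnt/flash/dataset.bin", [0, 4096, 1 << 30])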

All of this doesn’t mean that we’ve moved beyond the Von Neumann architecture and corresponding bottleneck.  But it does mean that we’ve found some interesting workarounds that help get the most out of today’s processors.  And as fast as we think computers are today, they will be far faster in the future.  We can only imagine how that will change our lives.

Thanks to HP, the Itanium Still Lives March 27, 2012

Posted by Peter Varhol in Architectures, Software development, Software platforms.

When PC software development meant working in MS-DOS, programmers had to choose between using the small, medium, or large (and later huge) memory models.  These models required the use of different structures and pointers, and added an additional layer into architecting an application.  They existed because Intel 80XXX processors segmented memory addressing into 64KB chunks.  Crossing a segment exacted a performance penalty, so you wanted to minimize the number of times you had to do that.  In time, it made software design more convoluted than it needed to be.

Fast-forward about ten years or so.  Intel’s 32-bit processors were able to abstract over this legacy structure, but the fundamental segmented architecture still existed under the covers.  Intel recognized that it was messy, and could possibly limit processor designs in the future.

So Intel started over again, with a 64-bit processor called the Itanium (and later, Itanium 2).  I had a dog in this hunt; around 2000 Intel contracted with Compuware’s NuMega Lab, where I was a product manager, to make Itanium versions of its acclaimed SoftICE and BoundsChecker developer tools.  The Itanium used a completely different instruction set, but Intel implemented the x86 instruction set in microcode on top of it for the sake of backward compatibility.

It should also be noted that HP, shedding itself of its PA-RISC and DEC Alpha processor families, needed a high-performance replacement architecture, and invested heavily in Itanium systems and software.

As it turned out, there was no real business reason for reengineering the industry-standard processor architecture.  It was an engineering decision, made to produce a better design rather than to meet a market need.  Intel spent billions of dollars over several years because its chip designers wanted something better than they had.

The Itanium was actually slower running x86 code than the best x86 processors at the time.  Other than HP, few system manufacturers adopted it.  And when AMD showed the way to 64-bit extensions to the x86, Intel made the same leap with the legacy processor architecture.

I’m reminded of this story by an ongoing lawsuit between HP and Oracle surrounding the availability of the latter’s flagship database on HP’s Itanium systems.  Oracle would like to cease supporting the Itanium, and cites HP’s substantial financial support to Intel to keep the Itanium alive.

I have no opinion on the lawsuit, except that it seems like a waste of good time and money.  But the story of the Itanium is of a project in which engineering triumphed over market requirements.  And it turned out badly for Intel, as well as HP and others.  If you build it, they won’t necessarily come.

But It’s Not a Commodore 64 April 11, 2011

Posted by Peter Varhol in Architectures, Software platforms.

As a young adult, I was intrigued by computers, but didn’t have the focus to actually learn more about them.  Part of the problem was that computers were largely inaccessible to individuals at the time, and I lacked the means to purchase one early in life.

So I largely missed out on the Commodore 64 revolution.  Sure, I had friends with them, but we mostly played games; I first encountered Zork on a Commodore 64.  I used timesharing systems in undergraduate and graduate school, but my first computer was an original 128KB Apple Mac (which I still have, and which still boots).  Commodore remained in business until the 1990s with the popular though niche Amiga, eventually folding for good.

Now it seems that the Commodore 64 is rising from the dead.  It looks like a Commodore 64, with only a keyboard in a small form-fitting console.  The original Commodore 64 had an eight-bit processor, 64 kilobytes of memory, and required external units for display and storage.

This unit, manufactured by a licensee of the name called Commodore USA, is basically a low-end Intel machine in the Commodore 64 form factor.  It includes an Intel Atom processor, Nvidia graphics, an HDMI port, and optional Wi-Fi and Blu-ray drive.  A new and potentially interesting distro of Linux is promised, but not yet available, so the company may initially ship Ubuntu Linux.  Alternatively, once you get one, you can load Windows on it, but it doesn’t come with a Windows license.

The announced price is $595, the same as the original Commodore 64.  The linked article above describes how the most difficult part of the process was replicating the exact color of the case, and the enormous cost of doing so.

It would be a potentially interesting concept if it had a functional niche.  As it is, it’s a PC; a Linux PC, it’s true, but a PC nonetheless.  The niche seems to be simply nostalgia for my generation, to remind us that we were young once, when the world was simple and we played computer games.  Commodore USA thinks it can sell a lot of them, simply on the strength of the name and an exact replica of the system case.

I’m not nostalgic.  I know there are people who will buy into this, but it simply doesn’t make any sense to me.  A computer is a tool, not an icon (well, you know what I mean).  It doesn’t get style points (unless it’s from Apple).  I imagine that some will be sold, but the attraction will wear off as the technology ages still further.  Um, just like its buyers.

GPU Technology Conference Approaches July 8, 2010

Posted by Peter Varhol in Architectures, Software platforms.

NVIDIA reminds me that its GPU Technology Conference is 11 weeks away (I submitted a proposal to speak; I’ll certainly mention it if it’s accepted).  It’s September 20-23 at the San Jose Convention Center in San Jose, CA.  In the meantime, the company is hosting a Live Chat series, featuring high-profile GPU technology personalities.  Each Live Chat is 30 minutes long and open to the public.  This is a great opportunity for anyone interested in GPU computing to get some virtual one-on-one time with the industry’s top GPU scientists, researchers and developers.

The first Live Chat guest is Ian Buck, inventor of the Brook project and CUDA.  He’s currently Software Director of GPU Computing at NVIDIA.  Ian’s talk at GTC last year, From Brook to CUDA, offered a perspective on how GPU computing has evolved and on the current innovations in GPU technology.  During his Live Chat, Ian will give a preview of his GTC talk and will take questions about the future of CUDA and GPU computing.

I attended this conference last year, and found it to be one of the most energetic and informative conferences I have attended in many years.  You can tell the state of a particular technology by the enthusiasm of the attendees, and this conference has all the earmarks of a celebration for a significant new technology.

This year, keynote speakers include scientific computing authority Hanspeter Pfister from Harvard University and computer graphics pioneer Pat Hanrahan from Stanford University.

Anyone interested in high performance computing or GPU computing, whether specifically for graphics or for general-purpose computation, should check this out.

Are There Legal Ramifications to Cloud Computing? April 17, 2010

Posted by Peter Varhol in Architectures, Strategy.

So says this article on MSNBC.com.  Actually, it refers to them as constitutional issues, which I suppose they are, in a broad way.  But while they can likely be overcome without changing the Constitution, they should be in the back of the mind of anyone using or advocating cloud computing.

Here’s the deal (disclaimer: I’m not a lawyer, so take what I say with appropriate skepticism).  Let’s say you work for a company that does some computing in its own data center, but also leases server time in the cloud.  It turns out that under current US legal practice, any data inside your data center requires a search warrant to access; data in the cloud generally requires only a subpoena to the cloud provider.

The difference is a big one.  As we learned years ago when various Web sites propagated information that was proprietary or incorrect (or sometimes merely annoying to its target), it took relatively little effort to shut them down.  A court order to the ISP was typically sufficient, the result of a petition to a court with no trial or even representation for the accused party.

We make a legal distinction between things kept at a physical location we can call our own, and things placed outside of that location.  Law enforcement authorities can’t come into our homes or offices, or look at our records, without a warrant (whether authorities occasionally deviate from that standard is a different story, and not mine to tell).  That also holds for a remote data center where we own or even rent the servers; we have ownership of the physical container.

That’s different in cloud computing.  We rent server time, not the server itself.  The box remains firmly in control of the cloud provider; our code just so happens to run on it.  Do we know how our cloud provider will respond if someone shows up at the door with a subpoena or search warrant?

This is what virtualization has wrought.

Surely this is no different than the mainframe timesharing systems of three decades ago.  After all, that’s where virtualization was invented (I was a VM/MVS programmer for a brief time in my career).  That’s true, but what we are doing on computers today is much more interesting and important in the grand scheme of things, and more people have a better understanding of it.

And it’s not just a corporate issue.  Many of us use Gmail or one of the other Web-based email services; the same principles, or lack of principles, apply.  Same with social network sites.

It is important enough that a wide variety of corporations, public interest groups, and individuals have created Digital Due Process.  As the name implies, its goal is to apply the same expectations we have concerning searches of physical locations to virtual locations.

This may or may not have a lot of impact on our lives right now, but new laws, or new interpretations of existing laws, often take a long time to come to fruition.  And we would like to know that the data we store under our names on far-away servers has protections similar to those on the computer right in front of us.

How Many Processor Cores are Enough? March 31, 2010

Posted by Peter Varhol in Architectures, Software platforms.

I’m prompted to ask this question because of the announcement yesterday by AMD of the availability of its new 12-core Opteron processor.  But some context is in order first.  In 1965, Gordon Moore observed that the number of transistors on an integrated circuit was doubling roughly every year, a rate he later revised to about every two years (often quoted as every eighteen months).  Through a combination of miniaturization (ever-smaller manufacturing processes), faster clock speeds, and overall better engineering, the semiconductor industry has managed to keep up this incredible pace.

Moore’s Law has since been expanded to mean other related but perhaps not entirely accurate things, such as the doubling of computing power, or the complexity of the processors.  However, there is no question that processors have been getting much faster over a period of decades.

But in computing there are no pure wins.  If you optimize one factor, you are doing so at the expense of one or more others.  I usually refer to this as TANSTAAFL, or There Ain’t No Such Thing As A Free Lunch (Wikipedia says the phrase has multiple origins, but I’ve always associated it with Robert Heinlein’s 1966 novel “The Moon Is a Harsh Mistress”).

In the case of processor design, the tradeoff is nothing more than the laws of physics.  There are physical limits to how far you can shrink electronic components without undesirable side effects, such as excess heat or radio frequency interference.

Around ten years ago, we pretty much reached the limits of what shrinking components and raising clock speeds alone could deliver.  So Intel and others turned to putting multiple processor cores on a single processor die.  Intel has also added hyperthreading to its processors, duplicating a small amount of per-thread state on each core so that a single core can work on more than one thread at a time.

Now we have a race to add more processor cores to a single die, culminating in this twelve-core processor announcement.

But most of our software can’t take advantage of it.  Granted, if you have a server running multiple applications, or perhaps a single Web application with reentrant code and multiple users, you may be able to dispatch enough threads to keep the processor fully busy, but in most instances that processor will have a lot of idle cycles.

It may come in time.  Software such as NetKernel can sit between the application and the OS, and take care of dispatching threads efficiently.  These processors may actually be a bet by Intel and AMD that software and applications are moving in that direction.  I have my doubts, especially on the desktop (where I currently have two cores, one of which isn’t used very often), but it will be interesting to see whether the future of software makes these processors worth the investment.
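
As a sketch of the general idea of such a dispatching layer (my own illustration in Python, not NetKernel’s design or API), a thin intermediary can accept independent tasks from otherwise sequential application code and fan them out across however many cores the machine reports.

import os
from concurrent.futures import ProcessPoolExecutor

def dispatch(tasks):
    # tasks is a list of (callable, args) pairs that are independent of one another.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tasks]
        return [f.result() for f in futures]

# Hypothetical usage (run under an if __name__ == "__main__" guard, since processes are spawned):
# results = dispatch([(pow, (2, 20)), (pow, (3, 10))])

The application still expresses its work as a simple list of tasks; the layer underneath decides how to spread them across cores.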

Can We Abstract Away from Multiple Cores in Coding? March 14, 2010

Posted by Peter Varhol in Architectures, Software development, Software platforms.

Readers know that I’m very interested in how software developers are adapting to programming for multi-core systems.  Traditional programs are written to be single-threaded, in that they execute code sequentially.  Taking advantage of multiple cores, and multiple processors in individual systems, is a technically difficult endeavor that most application programmers don’t even attempt.

You might think that it is easy; Web applications with multiple simultaneous users do it all the time, right?  Um, not necessarily.  In some cases, the underlying application server can dispatch multiple threads which may be scheduled on different cores, but those threads clearly have to be independent so that there is no possibility of a race condition or other multiprocessing error.
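
Here is a minimal sketch in Python (my own illustration, not drawn from any particular application server) of the kind of race condition in question: two threads performing an unsynchronized read-modify-write on shared state will usually lose updates.

import threading

counter = 0  # shared state, deliberately left unprotected

def handle_requests(n):
    global counter
    for _ in range(n):
        current = counter   # read
        current += 1        # modify
        counter = current   # write; another thread may have written in the meantime

threads = [threading.Thread(target=handle_requests, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # should be 200000, but typically comes up short because updates were lost

Wrapping the read-modify-write in a threading.Lock fixes it, but spotting every place that needs one is exactly the difficulty described above.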

My friends at 1060 Research are releasing some interesting benchmark results on the scalability of the company’s NetKernel framework.  NetKernel practices what 1060 Research calls Resource-Oriented Computing, a type of Representational State Transfer, or REST.  In a nutshell, everything that constitutes an application is considered and treated as a resource, accessible through a URI.

The benchmark results, which can be found here, are fascinating from the standpoint of multi-core programming.  They demonstrate that NetKernel response time scales linearly across processors and cores as workload increases.  This immediately indicates that NetKernel can make effective use of multi-processor and multi-core systems, without the need for developers to do anything different.  That, in my mind, is a very big thing.

The other interesting point that the 1060 Research folks make is about the linear scaling itself.  Performance degrades very predictably once the system is fully utilized (at close to 100 percent CPU utilization and throughput).

I asked Randy Kahle of 1060 Research about how response time can scale linearly with a fully loaded system.  His response: “This is actually a key finding. The response time is constant as long as there is a core waiting to process the request. Once cores are saturated then their availability is smoothly allocated to requests. The fact that this is linear shows that there is a smooth allocation of work.”
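
As a purely hypothetical sketch (in Python, and not 1060 Research’s benchmark harness, which I haven’t seen), this is roughly how one could measure that behavior for any service: issue batches of identical requests at increasing concurrency levels and watch how the mean response time changes as the cores saturate.

import time
from concurrent.futures import ThreadPoolExecutor

def timed_request(send_request):
    # send_request is a placeholder for whatever call exercises the service under test.
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start

def sweep(send_request, levels=(1, 2, 4, 8, 16, 32), requests_per_level=200):
    for concurrency in levels:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            times = list(pool.map(lambda _: timed_request(send_request), range(requests_per_level)))
        print(f"concurrency {concurrency:3d}: mean response {sum(times) / len(times):.4f}s")

If response time stays flat until the cores are saturated and then rises smoothly, you are seeing the kind of even allocation of work Randy describes.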

What does this mean for users of large-scale multi-processor and multi-core servers?  NetKernel takes care of what appears to be an efficient allocation of workload between CPUs and cores.  This isn’t quite the same as writing multi-core code within the context of a single application, but it’s the next-best thing.

Why Does Microsoft Ignore Its Own Research? February 16, 2010

Posted by Peter Varhol in Architectures, Software development, Software tools.

I find some of the most fascinating concepts on the Microsoft Research site.  I’ve mentioned in the past that I’m pretty impressed with MapCruncher, and there are a number of other projects in Microsoft Research that have the potential to be invaluable in practice.  Unfortunately, such projects never seem to get the exposure they deserve.

Another such project is Doloto, a tool for speeding up the performance of Ajax applications.  Ajax, if you’re not familiar with it, is a technique that enables Web pages to accept input and return results without a page reload.  It uses JavaScript, in conjunction with an asynchronous connection to the server, to send inputs and post the responses back to the page.  Where possible, it may also do processing on the client, saving the round-trip time to the server.

The JavaScript that processes the data can be thousands or even tens of thousands of lines of code.  This can by itself be a burden to download, especially upon application startup.  Doloto tries to improve the startup time by reducing the initial application payload.

Doloto might be considered analogous to an operating system, in that it sets up a working set for an application on the client.  It analyzes the application during a “training” period to determine what code is needed to start the application and let users begin using it, and downloads only that code.  Training consists of instrumenting the functions, recording the time stamp of each function’s first execution, and then looking for gaps between those timestamps to identify clusters of functions that can be grouped together.
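
As a hypothetical illustration of that training step (written in Python for brevity; Doloto itself instruments and rewrites JavaScript), each function can be wrapped so its first execution is timestamped, and functions whose first executions fall close together are grouped into the same cluster.

import time

first_seen = {}  # function name -> timestamp of its first execution

def instrument(fn):
    # Wrap a function so the training run records when it first executes.
    def wrapper(*args, **kwargs):
        first_seen.setdefault(fn.__name__, time.perf_counter())
        return fn(*args, **kwargs)
    return wrapper

def clusters(gap=1.0):
    # Group functions whose first executions are separated by less than `gap` seconds.
    ordered = sorted(first_seen.items(), key=lambda item: item[1])
    groups, previous = [], None
    for name, stamp in ordered:
        if previous is None or stamp - previous > gap:
            groups.append([])
        groups[-1].append(name)
        previous = stamp
    return groups

Functions in the first cluster would go into the initial download; everything else would be stubbed out and fetched later.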

In this manner it identifies the code needed to start up the application and begin use.  The rest of the code is stubbed out of the initial download, then delivered to the client later, either on demand or in the background when the client is otherwise idle (a lazy loading approach).  To be able to do this, Doloto also refactors the JavaScript code to group functions together if they are to be downloaded together.

Doloto doesn’t necessarily improve performance during execution, but by making the application download and start up more quickly, it improves the perceived performance of the application overall.

I have to ask why Microsoft keeps such technology in the lab, where it gets almost no attention.  While the company makes its research projects available for free download, there is no publicity or push to turn them into products.  Give Doloto a try.

I would be remiss talking about Web page performance if I didn’t mention Strangeloop Networks, a Vancouver company that sells an appliance for speeding up ASP.NET Web applications.  The interesting thing about Strangeloop is that the hardware itself, essentially a computer with lots of memory, isn’t really the heart of the solution.  The company has really smart people who understand in great depth how ASP.NET applications execute, and their expertise is codified in the appliance software, where it drives decisions about how to process the application for optimum performance.  I’ll write more on how Strangeloop does this in the future.
