Wednesday, September 03, 2008

Big Numbers

In my algorithms class, at some point, I teach RSA and primality testing. For the corresponding assignment, I usually have students do a "numerical exercise" which involves, naturally, computing with some big numbers. They also have a later programming assignment where they have to use numbers that may not fit into a standard-programming-language-sized integer. Since this is not a programming class, and I do not prescribe a language for them to use, I essentially tell them it's part of the homework to figure this out.

Surprisingly, it seems to give a number of people significant grief every year. I know that for C and Java it's a pain -- the natural thing to do is to download and install a BigNumber library, which should be easy, but often some trouble arises. (And most students, even though they've had the intro programming courses, do not seem to have learned how to do useful things like download and run useful libraries.) There are other programming languages which handle big numbers essentially transparently -- ah, the days of programming in Lisp -- which are fine for the numerical exercise, but may not be as nice for the larger programming assignment.

My only real point here is that, in those lists of skills I would expect CS majors to have, I'd have to put "being able to do computations with very large numbers" somewhere on the list. Yes, you're unlikely to need it for your standard checkbook program, but it seems to me this issue arises in plenty of places. (It has for me -- and my graduate students -- several times.) And it's not so important that they know how to do this specific task off the top of their head, but more that when they're given a task like this, they are able to find the information they need and complete it in a timely fashion. For better or worse, that's something some of them get out of my class.

13 comments:

Anonymous said...

> I know that for C and Java it's a pain -- the
> natural thing to do is to download and install a
> BigNumber library, which should be easy, but
> often some trouble arises.

You wouldn't need to download or install anything in Java. BigInteger includes functions for modular arithmetic, gcd, and (randomized) primality testing on arbitrary-precision integers. What's awkward in Java is the lack of operator overloading so that you end up with expressions such as a.times(a).mod(p).

Mihai said...

Downloading and using libraries is always a big pain. People in the industry who have been doing this for years tell me that they only use a library if it would take them more than 2-3 hours to write the code (!)

As for big numbers, any old time UNIX hacker knows you should use bc :)

Michael Mitzenmacher said...

Anon -- yes, I remember my experience with BigInteger the problem wasn't downloading it, but just finding it, and figuring out how to write Java math instead of things like a*a.

mip -- bc, I'm impressed. :)

A related story -- coding the other day, I was getting seg faults for no reason -- until I realized that my large arrays were causing issues, and I called upon my ancient grad student knowledge to make the call "limit stacksize unlimited", as all was well... (why the stacksize default was so low on the machine I was working on, I don't know...)

Anonymous said...

Python is one of those languages that handles it transparently. And it's pretty good for a lot of other kinds of programming tasks as well.

Anonymous said...

I struggled with a lot of things in the class this spring, but I thought the programming assignments were really fun. I taught myself Python for the last two, and I'm glad I finally had an excuse to do it.

Michael Mitzenmacher said...

Anon #5 -- With a can-do attitude like "I taught myself Python...", I'm sure you're going to do great things -- regardless of what else you got out of CS 124. :)

Thanks for the positive words about the programming assignments!

Anonymous said...

One thing that's frustrated me in the past about Java's BigInteger library (other than the lack of operator overloading as a general Java issue, as previously mentioned) is that the implemented multiply function takes quadratic time. An unoptimized FFT which beats n^2 for a reasonable number of digits really isn't that hard to implement and include. At the very least, why is there no Karatsuba?

Anonymous said...

I took one of the Extension versions of CS124 maybe 5 years ago. I recall having to chase down and install the Java bigint libraries, thinking that this little hurdle was more than you'd get from the usual class's problem set. Moreover, the implementation became difficult in ways (arithmetic overflow, stack depth, etc.) that were not apparent to me in my first, newbie readings of the assignment. Then again, that was true of basically every problem set in the class. It and "Algorithms Over the Wire" remain two of my favorites.

I guess what I'm trying to say is that my own learning algorithm was iterative and certainly not O(1). I had to keep putting things down when stuck, reflecting, and coming back, which was a second-order lesson in itself. Too much of my time as an undergraduate (before the Extension School) was budgeted via procrastination.

Shaneal Manek said...

When I took your course, I completed all the programming assignments in Lisp with no trouble (and got pretty good grades on them all too, if memory serves ;-))

Anonymous said...

> Yes, you're unlikely to need it for your standard
> checkbook program...

Depends on where you write your checks. According to news reports, one reason that Zimbabwe devalued its currency by a factor of 10^10 was that banking software couldn't handle the bignums.

> -- ah, the days of programming in Lisp

Those days needn't be over. Lisp lives.

Anonymous said...

I took a course in Finite Groups and Number Theory (for computer engineers) where some project assignments from "Concrete Abstrac Algebra" (N. Lauritzen) were drawn to challenge students (winners entering the Hall of Fame of the Course!). Some of them required arbitrary precision libraries to be carried out.

I used MIRACL libraries (http://www.shamus.ie/) and those were a charm: easy to install (even without linking libraries, just compiling your source code with the needed classes) and use in C/C++.

The same Pollard-Rho-like algorithm implemented with Java (BigInt's) took about 10 minutes, while C++ with MIRACL just few milliseconds on the same instance.

Anonymous said...

I took and AI course via a separate extension. The course recommended a certain lisp compiler that maxed out at an inordinately low number of list cells. I crashed the compiler for a week in some recursion.

I learned to program in C, and so while lisp is garbage collected, I spent forever trying to find the pointer I was sure I had hosed. When it dawned on me [and after much stack tracing], I chopped down my hdd partition, installed a *ix, and used SBCL. Voila. Annoying, though.

Anonymous said...

This reminds me of programming multi-precision arithmetic back in high school (in Fortran). Division was the hard part - I tried to implement one of the uglier versions from Knuth Volume 2. I tried to use it to calculate pi to some large number of places using a known convergent series. Unfortunately, after getting roughly the first 30 digits right it started to diverge; the results were consistent, too. I couldn't debug it but when I ran the same code on a different machine (an IBM with 32-bit integers vs a CDC machine which had 48 or 60 bit integers) and it started to diverge after only 20 digits or so. Hello integer overflow!

I still couldn't find the bug after many days of search. I never came back to it and multi-precision arithmetic is a programming task that I never got right myself. Since then I've only ever needed the desk calculator version of multi-precision arithmetic so 'dc' (the predecessor of 'bc' that Mihai mentioned) has served well.