Tuesday, December 30, 2008

New Year's Affirmations

As the New Year beckons, I figured it was time for a post on affirmations, which I remember being interested in when reading about them in one of Scott Adams' (of Dilbert fame) books (though I don't think he called them that). If you haven't heard of affirmations, the idea is basically the power of positive thinking. There are various forms; in one, you take a list of goals you want to accomplish, write them down or repeat them to yourself every day, and you'll find they start happening.

Now, while I don't actually believe that positive thinking alone will allow me to prove P = NP (or the other way) in the coming year -- or, for that matter, win me a lottery! -- I do believe that the act of thinking clearly about the goals you want to achieve, and keeping them firmly in your mind, increases the probability that you will actually accomplish these goals. I personally find that when I set myself goals over multiple time scales -- a task list for the day, for the month, and for the year -- I'm surprisingly much better about getting things done. When I get distracted from setting goals, less happens. The conscious effort of writing tasks down and reminding myself of them makes them easier to accomplish.

So I encourage all my readers to take some time around the New Year and set some tangible, if difficult, work-related goals for the coming year. Maybe it's time to learn a new area, or to work with that person you've always wanted to work with. Or there's that result you know is just out of reach -- but you should keep reaching for it. Post your goals somewhere, remind yourself of them, and work to make them come true. I'd bet more come true than you'd first think.

Whatever your goals are, best of luck with them, and for the New Year.

Tuesday, December 23, 2008

INFOCOM Miniconference

A number of commenters on my last post have mentioned the INFOCOM Miniconference. I hadn't actually known about the miniconference -- although I found out more details about it soon after the comments, as my second INFOCOM submission, rejected from the conference, was accepted to the miniconference. (I did have a paper in INFOCOM 2008, but my student Adam Kirsch went to the conference to deliver the paper, and I can't recall ever hearing anything about the miniconference format. Someone should have blogged about it before. :) )

100 additional papers -- a bit over 7% of the original submissions -- were apparently accepted to the miniconference, covering most of that "top 20-30%" range. The main difference appears to be the labelling (INFOCOM miniconference, not INFOCOM) and that the paper will be limited to 5 pages in the proceedings.

While I'm happy the paper got in, I must admit, I don't understand the reasoning behind the INFOCOM Miniconference, and I hope some readers in the know will explain and elaborate. If the purpose is to have a 2-tier conference, it seems an odd structure -- why not just accept 25-30% of the papers? (A few hundred pages in the proceedings wouldn't seem to matter much since it's on a CD?)

I wonder if theory conferences like STOC, FOCS, or SODA should adopt some sort of 2-tier structure in order to accept more papers. Certainly most people who have papers rejected from these conferences (myself included) believe they should have gotten in, and some fraction of them are probably right. On the other hand, such a structure would seem to lessen the prestige associated with these conferences. Any opinions?

Monday, December 22, 2008

INFOCOM paper, network coding

A paper I co-authored on network coding -- Network Coding Meets TCP (arxiv version) -- was accepted to INFOCOM. Full credit for the success goes to the graduate student Jay Kumar Sundararajan, who led the project (and is graduating and looking for jobs this year...). Our goal (as the title suggests) is to make a TCP-compatible network coding congestion control scheme, and our approach uses an interesting variation on acknowledgments Jay Kumar had utilized previously: instead of acknowledging packets, you acknowledge "degrees of freedom" (that is, coded packets that will eventually decode to message packets).
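The degrees-of-freedom idea can be sketched in a few lines. This is a toy GF(2) illustration of the general concept only, not the actual scheme from the paper: the receiver acknowledges each received combination that is "innovative" (that raises the rank of what it holds), and it can decode once the rank reaches the number of message packets.

```python
import random

def gf2_rank(vectors):
    """Rank over GF(2); each vector is an int whose bits are coefficients."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # reduce v against the basis found so far
        if v:
            basis.append(v)
    return len(basis)

def simulate(n_packets=4, seed=1):
    """Sender streams random GF(2) combinations of n_packets message packets;
    receiver ACKs degrees of freedom rather than individual packets."""
    rng = random.Random(seed)
    received, acked_dofs = [], 0
    while gf2_rank(received) < n_packets:
        combo = rng.randrange(1, 1 << n_packets)  # random nonzero combination
        if gf2_rank(received + [combo]) > gf2_rank(received):
            acked_dofs += 1   # innovative packet: ACK a new degree of freedom
        received.append(combo)
    return acked_dofs         # equals n_packets once decoding is possible
```

The point the toy model makes: it doesn't matter *which* coded packets arrive, only that each acknowledged one is innovative; any n_packets innovative combinations suffice to decode.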

The INFOCOM mail said 282 papers were accepted from 1435 submissions (post-withdrawals). A quick check shows that INFOCOM has been below a 20% acceptance rate regularly in recent years, and even assuming a completely unverified estimate that 10-25% of the submissions are things that really shouldn't have been submitted in the first place, in my opinion that's still a pretty low acceptance rate for what's supposed to be the big, open-tent networking conference of the year. (In the 1990s, the acceptance rate was more commonly around 30%.) I'm sure there were a lot of good papers that got rejected this time around.
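The arithmetic behind that estimate is quick to check (the junk-submission fractions below are the completely unverified guesses from above):

```python
accepted, submissions = 282, 1435

print(f"raw acceptance rate: {accepted / submissions:.1%}")
# discount some fraction of submissions as things that shouldn't have been sent
for junk in (0.10, 0.25):
    serious = submissions * (1 - junk)
    print(f"if {junk:.0%} were junk, effective rate: {accepted / serious:.1%}")
```

This gives roughly 19.7%, 21.8%, and 26.2% -- so even under the most generous discount, the effective rate stays below the roughly 30% of the 1990s.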

Most networking conferences have acceptance rates around 20%. Is this a good thing? Conference competitiveness has been blogged about before, but there doesn't seem to be much high-level discussion about the issue -- I recently saw that Ken Birman and Fred Schneider wrote an article about it for Communications of the ACM. Any ideas out there?

Thursday, December 18, 2008

What Else Should Grad Students Be Learning?

Apropos of application season...

Graduate school is a long stretch of time -- 5 years (or more) for most people. There are few clear goals during that time, although the obvious one is to learn how to do good research, with the hope of getting a tenure-track faculty job. With this somewhat singular -- and difficult -- goal, it's easy to fall into the extreme of focusing only on your research to the exclusion of most everything else, or to waste a lot of time not really working. Both of these things may be OK for various individuals. (As some will undoubtedly respond, if you're doing great research, it can excuse a lack of many other skills. And many people don't mind spending an extra year at graduate school with a more relaxed lifestyle than life after graduate school.) But with the new year approaching, I thought it worthwhile to suggest some of the additional skills one should try to develop in graduate school over those stretches where you need to break from research -- skills which, unfortunately and understandably, are often given short shrift by the university. (Please add to the list in comments.)

1) Time management: How much are you working each day? (And how much time do you waste reading -- or worse yet, writing -- blogs?) Even if you don't set yourself to a regular 9-5 or 10-6 schedule, it's a good time to learn to manage your working and non-working patterns. My suspicion is that people who manage a regular work schedule graduate on average a semester or year earlier.
2) Writing/speaking: If ideas are our business, idea presentation is a big contributor to the bottom line. And if you want a faculty position, the ability to give a good talk to a general audience goes a long way. If your institution doesn't have a program for improving writing and speaking, start your own (like a student seminar series, no faculty invited).
3) Leadership: Find a way to lead a research project -- maybe advising/mentoring some undergraduates. Or organize a club or student group to make your department a better place to be. Eventually, the ability to organize people to follow your goals will make you more productive.
4) Entrepreneurship: Have you looked at the economy? And professor's salaries? Graduate school is where you're supposed to learn to be creative, and to develop specialized skills. It's quite reasonable to spend some of those creative efforts or utilize those specialized skills on money-making endeavors. While it's not for everyone, for some the tangible reward of money helps unleash creativity; for others, you may learn the satisfying lesson that your intellectual achievements bring you higher rewards than a big paycheck could (a lesson worth learning early on).
5) The skill to learn additional skills: If you're a theorist, learn to program a little. If you're a systems person, learn some probability or other theory. Maybe set aside a few days to learn time-saving LaTeX tricks, or some other piece of useful software. There are plenty of skills that will make you a better researcher/teacher/writer in the future.

Monday, December 15, 2008

NSDI Program Committee, Part II

Some lessons from a 1-day PC meeting (nothing really new, but I thought I'd write it down):

1) Face-to-face PC meetings involve far too much sitting. Especially if you fly in and out on the same day. (I know there's a time tradeoff in scheduling an "exercise break", but seriously...)
2) Conferences with 20% acceptance rates are, by their nature, a bit depressing -- it's hard to reject so many papers, some of which simply MUST be pretty good.
3) It's easier to argue how a paper is flawed than to argue about how it's making an important contribution.
4) Taking reviewer expertise into account is important, and one outlier can cause problems; sometimes papers live on longer than they should if one reviewer gives too high a score, and sometimes papers are put way lower in the ordered list than they should be because one reviewer gives too low a score. (Of course, one expert reviewer can also bring an otherwise ignored paper back to life.)
5) There are more interesting papers than paper slots.
6) There's generally plenty of down time, when papers you didn't read are being discussed; bring something to work on quietly (but pay attention to what's going on).
7) You really do learn a lot reading 20-30 papers for a conference PC.
8) As you go down the paper list by score rank, eventually (and sooner than you think) you hit a paper that you start to question -- are these scores too high? And as you go up the list from the bottom, you'll hit a paper where you question -- are these scores too low? The rules of randomness tell us that some papers will get comparatively mis-scored in the first round of reviews, so it is good to stop and talk about the papers (and not just take the first X).
9) Hot trends come and go.
10) A steady supply of drinks (mostly caffeinated) and a good lunch can help the PC move happily along.

Thanks to the PC chairs and other members -- I had a good time. But now I'm very tired...

Sunday, December 14, 2008

NSDI Program Committee, Part I

I'm spending tomorrow at the NSDI (Networked Systems Design and Implementation) Program Committee meeting. It's been a few years since I've been on the PC for a networking conference, but I've found it so far to be a lot of fun.

First, it's very "civilized" -- I had only 20 papers to read in the first round, and then had 5 more for the second round. (The first round was designed to get each paper 3 reviews; the second round was for papers missing reviews, or where the scores suggested another review would be helpful.) That's not too much, compared to most theory conferences.

Second, there are a good number of "algorithmically" oriented papers for me to read. Overall, networking has become a lot more theoretical, which is good. It still seems to me, though, for a network-oriented conference, you have to be careful not to go overboard with the theory. They want results -- backed by theory, preferably -- but at the end of the day, it's the results that matter. As usual when serving on a PC, seeing how it works gives insight into how to frame my own papers.

Third, one thing that's impressed me is how long and detailed the reviews are for this conference. I tend to write shorter reviews, covering what I think the high order points are. (And one thing that has been interesting -- there's generally a lot of agreement on these high order points.) But most reviewers go into a lot more detail -- the average review is at least a good page plus of text. Very different than what I usually find in theory conferences -- although I know there's a push to improve that.

I'm not sure why the culture of networking conferences has led to more detailed reviews. Fewer papers per PC member probably helps; maybe because few papers go on to journal papers (but that's equally true in theory, I think). But it's a marked and interesting change.

Anyhow, now I'm looking forward to SIGCOMM... except that I'll need to be ready to write longer reviews.

Friday, December 12, 2008

Summer internships?

The bad economy already has people thinking about jobs -- it is, I am sure, going to be a challenging year (years?) for people graduating.

I was wondering if there would also be an effect on summer internship programs. Summer interns are often a different budget line item, but it's hard to believe that the Microsoft/Google/Yahoo/everywhere else programs, for both undergraduates and graduates, won't be curtailed in this environment.

I haven't heard anything about summer internships yet, and though it's a bit early, late December/early January is usually when I start getting reminders from people to have good students apply for the summer. Can anyone comment (anonymously if needed) if they have any actual information?

Wednesday, December 03, 2008

Online Advertising and How People Cheat

Ben Edelman (who actually does some theory when he's not doing law and business) gave a great talk today at the Harvard Center for Research on Computation and Society lunch lecture series. It was about Web advertising scams -- how people cheat pay per impression, pay per click, and even pay per conversion schemes, for big bucks. The talk was based on this book chapter (here available as a working paper). (If he gives me a link to the slides, I'll update the post with it.)

One of the scams he described for pay per conversion schemes is related to this blog. I often encourage people to buy my book on randomized algorithms -- to remind you all, the Amazon link is here. Now, when I put up that Amazon link, I embedded a link that includes my Amazon Associates information, so if you buy the book, I get a small cut. Very small.** In fact, if you buy anything else after clicking that link, I get a cut -- I believe this link alone puts a cookie on your system, so if you happen to buy anything on Amazon for the next week or so, I get a cut. I think I got about $100 in store credit from Amazon last year, from people buying the book off my home page or buying things after hitting an Amazon link on the blog.
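A toy model makes the mechanics concrete. To be clear, the class, the 5% cut, and the seven-day window below are illustrative assumptions based on the description above, not Amazon's actual implementation: whichever affiliate link was clicked most recently sets a cookie, and any purchase within the window credits that associate.

```python
from datetime import datetime, timedelta

COOKIE_WINDOW = timedelta(days=7)   # assumed attribution window ("a week or so")
COMMISSION = 0.05                   # assumed referral cut

class Browser:
    """Toy model: the most recent affiliate click sets the cookie."""
    def __init__(self):
        self.cookie = None  # (associate_id, click_time)

    def click_affiliate_link(self, associate_id, when):
        self.cookie = (associate_id, when)

    def purchase(self, amount, when):
        """Returns (associate_id, commission) if a live cookie gets credit."""
        if self.cookie:
            associate, clicked = self.cookie
            if when - clicked <= COOKIE_WINDOW:
                return associate, amount * COMMISSION
        return None, 0.0
```

The scam described in the talk amounts to calling click_affiliate_link without the user knowingly clicking anything -- the cookie is planted silently, and later purchases credit the scammer.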

So again, I'll encourage you to click that link. (Heck, really, while you're there, just buy the book!)

For me naturally the gain of doing this is small, but apparently, scammers have found a way to make big money off this sort of thing. They have sneaky ways of getting these cookies onto your system -- not just for Amazon, but for other vendors also, multiple vendors at a time -- so if you click one of their links, and then buy things on the Web, they get a cut for "recommending" the purchase to you, even though they've really done nothing of the sort. When you think about it, a 5% cut from the purchases of a million Amazon customers can really add up...

If Ben Edelman happens to be coming to present at a venue near you, I highly recommend you go. And that's not false advertising.

**Actually, not so small. I get almost as much from Amazon as I do from the publisher if you buy the book through the link. Of course, I don't get much from the publisher either.

Wednesday, November 26, 2008

The Funding Cycle

Most of the last week was spent working on an NSF grant proposal. (By the way, does anyone know why they kept the pdf and txt formats of the call, but removed the html? That was annoying as I was trying to figure out exactly how many colons my title needed to meet government standards.) I don't have any really entertaining stories about it. That's just how I was spending too much of my time. November 28 is a funny day to have a deadline, though. For us, that translated into a submission deadline of yesterday. (Is your Sponsored Research Office really open on the 28th?) The next deadline (for small grants) is December 17, and I plan to get a proposal in for that as well.

I try not to continuously send in NSF proposals, but it's seemed like that's the way it's been going recently. I'd prefer to metaphorically let the tank get close to empty as far as funds go, then apply and refill the tank, so I could spend less effort on the grant process. But there's ever-growing competition, and smaller amounts available. And given the long lag between calls, and the high risk factor each year, I feel the need to be more proactive than even a few years back -- especially with the economy tanking, since it's not clear corporate money will be available in the coming year.

Anyhow, since a proposal did go in yesterday, I can relax over the holiday weekend. I hope all of you will be relaxing as well.

Complexity Theory Humor

Ah, the joys of outsourcing at http://www.getacoder.com/projects/bug_finder_92913.html. Read down to the bottom to find responses by Godel and Cantor...

Update -- someone at the site noticed and took down the original page (although it's still in the Google cache here.)

Monday, November 24, 2008

Center for Computational Intractability

This popped into my mailbox, and I can't see any reason not to pass on the announcement here. Especially notice the bit at the end about postdocs (get your paperwork in NOW!!!).


Dear Colleague:

I am writing to tell you about a new Center for Computational Intractability that is funded by the National Science Foundation and is a collaborative effort between Princeton University, Institute for Advanced Study, Rutgers University, and New York University. The PIs are Allender, Arora, Barak, Charikar, Chazelle, Impagliazzo, Khot, Naor, Saks, Szegedy, Wigderson, and Tarjan.

The Center's mission views "intractability" fairly broadly, and covers both algorithms and complexity theory as well as other topics in theoretical CS. I would like to draw your attention specifically to the following: (a) Visiting possibilities for short-to-medium term (up to 1 year). We can pay travel costs and salary. (b) Several workshops a year, with possibilities to visit the center at the same time. (We can pay expenses.) (c) Postdoctoral positions both at the center and at the affiliated institutions. (d) Streaming videos of all center activities and seminars and other web-based resources.

I invite you to visit our website http://intractability.princeton.edu/ and to especially click on "Opportunities" from the sidebar.

I was wondering if you could forward this email to others in your department (including students and postdocs) who may be interested in the center's activities. Please note that the deadline for postdoc applications is December 15, and the deadline for postdoc applications to IAS is December 1.

With best regards,

Sanjeev Arora

Tuesday, November 18, 2008

STOC notes

It looks like we ended up with about 329 submissions. (Hopefully that won't change much at this point.) That's 40-50 papers per PC member at 3 reviews per paper.

Thanks to everyone who withdrew the papers they weren't submitting. For the rest, I had to go through and withdraw them manually myself.

Thanks to Shai Halevi for continuing to help me with the reviewing system.

Yes, I did get mail from about 10 people who hadn't known they'd need to file an abstract the week before, and I accommodated them. I think this should become a standard, and everyone should get used to it. Again, keep in mind there are 40-50 papers per PC member; anything that makes their work easier is a good thing. (And Shai's interface for choosing preferences lets you see a paper's abstract pop up with a mouse rollover, so it's really nice to have abstracts early!)

Assuming things continue to go well, expect e-mail from PC members asking for you to handle a subreview before you head off for Thanksgiving....

Saturday, November 15, 2008

Technical Depth vs. Novelty vs. Potential Impact

Let's say you're the PC chair for a major (theory) conference, about to give instructions to the committee. The standard way to judge a paper in theory is primarily based on its technical depth, but there's certainly a push in the community (and, arguably, from our funders at the NSF) to consider other aspects, including novelty (could this start a new research direction, or give us a new way to look at things?) and potential impact (might people actually use these ideas?). How, exactly, should you instruct the PC to weight these various factors?

Conceivably, we could set up the reviews to have a score for each factor. For example, I'm on the PC for NSDI, a systems conference, and we have to give scores for Overall Merit, Technical Merit, Longevity (= how important will this work be over time), Novelty, and Writing (as if that score matters :) ). Personally, I don't like this, and I'm not intending to do it for STOC. It's more pain for me as a reviewer without, I think, giving meaningful information to the authors (instead of spending time trying to decide if a paper is a 2 or 3 in terms of novelty, let's give another comment in the review text!), and when it comes time to make the decisions, I'm not really sure what I'm supposed to be (Pareto-)optimizing in this multidimensional space.

I'm a big believer, for conferences, in the "simple" method, as I've said before -- papers just get a score from 1-5, under the following scheme:
1: Bottom 1/2 of submissions.
2: Top 1/2 but not top 1/3 of submissions.
3: Top 1/3 but not top 1/5 of submissions.
4: Top 1/5 but not top 1/10 of submissions.
5: Top 1/10 of submissions.
but that doesn't mean that reviewers shouldn't be using factors such as Longevity and Novelty, and even Writing, in deciding their overall score. So, as you're all finishing your submissions, now is your chance to make a suggestion -- how do you think the PC should weight these various factors?
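The bucketing above is easy to state as code. Here is a sketch of my reading of the scheme (assuming ties among submissions are broken somehow, so each paper has a rank):

```python
def review_score(rank, n_submissions):
    """Map a paper's rank (1 = best) among n submissions to a 1-5 score."""
    frac = rank / n_submissions          # fraction of papers at or above this one
    if frac <= 1 / 10:
        return 5                         # top 1/10
    if frac <= 1 / 5:
        return 4                         # top 1/5 but not top 1/10
    if frac <= 1 / 3:
        return 3                         # top 1/3 but not top 1/5
    if frac <= 1 / 2:
        return 2                         # top 1/2 but not top 1/3
    return 1                             # bottom 1/2
```

One virtue of this scheme: since the buckets are defined relative to the pool of submissions, reviewers are forced to calibrate against each other rather than against private notions of what a "3" means.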

Monday, November 10, 2008

"I write today about the global economic crisis and its implications for us at Harvard."

In what I'm taking as a sign of continuing Impending Doom on the economic front, I received mail from the 3 levels in the hierarchy above me regarding the financial state of Harvard in these tough times. (First from Harvard President Drew Faust, which is where the title quote comes from, then the Dean of the Faculty of Arts and Sciences, and then the Dean of the School of Engineering and Applied Sciences.)

Little was given in way of specifics, but the general theme was clear. About 1/2 of Harvard's budget comes from the endowment payout each year. (The curse of a large endowment -- this number is probably higher than most institutions.) While nobody is giving a number for Harvard, the widely quoted statement from Moody's financial research is that endowments will lose about 30% this year. Given that Harvard has been outperforming the market and most other endowments over an extended time period, you can take your guess as to whether our losses will be above or below average. No matter how you do the math, it's not good.

So that means there will be some belt-tightening, and some delays in various plans. Again, very little in the way of specifics, but I'm sure (and Faust's letter suggested) that Harvard's new progressive financial aid program would not be touched. I imagine most everyone is getting similar messages at other institutions, but feel free to share your stories in the comments.

STOC deadline, reminder again

Remind your friends (and, if you're feeling generous, your enemies) that indeed the short abstract for your future STOC submission is due today (full conference submission deadline - Nov 17). Thanks to Shai Halevi's help, the server is even synched with the deadlines on the conference Web page.

I've seen over 150 submissions so far, and it's not even close to midnight...

Friday, November 07, 2008

Reminder, Abstract deadline for STOC

Remember that November 10 is the deadline for the "short abstract" for STOC.

More information at the web site, or head to the submissions server.

Tuesday, November 04, 2008

IPAM Networks of Networks (Day 2)

Day 2 of IPAM also had a bunch of interesting talks. My favorite for the day was by Ramesh Johari, who was considering the following problem: suppose that you only allow bilateral transactions -- corresponding essentially to a barter system, or in his view roughly how BitTorrent currently works. (People upload to you if you upload to them -- not necessarily the same amount, but there nominally should be some equivalence in terms of utility.) How much do you lose relative to a system where you allow multilateral transactions -- which corresponds to the system we all know and love, where there's money behind exchanges, so you can always get money and use it to buy something else later? He set up an interesting model to look at these questions and say something meaningful about the differences between the equilibrium states in these two settings.

David Alderson also gave a very interesting talk on modeling the Internet topology using optimization methods, and using it to study the scale of the damage an adversary could do to the Internet. His point of view is very different (and I think much more compelling in the Internet setting) than the scale-free analysis popularized by for example Barabasi. I recommend checking out his papers (most of them co-authored with Doyle and Willinger, among others) on the theme.

Finally, I ended the day nicely by getting to talk with UCLA faculty Rafail Ostrovsky and Amit Sahai. Rafail expressed his usual enthusiasm insisting that we find a problem to all start working on -- so now I even have homework to keep me busy on the plane ride home!

IPAM Networks of Networks (Day 1)

I spent Monday at an IPAM meeting on Networks. (I'll be here today, too.) I gave a very "high-level" talk, covering my taxonomy of networking papers (just replace "power-law" papers with the more general "networking papers" in this editorial of mine) and discussing my thoughts on how we get closer to validation/control in the Internet with a universal hashing architecture. Slides are here; it's sort of a mix of some other longer talks here, here, and here.

The highlight for me for Monday was listening to Tim Roughgarden (who always gives excellent, crisp, clear talks) on comparing FIFO vs. FairShare queueing policies using the worst-case price of anarchy over all possible utility functions (and, unfortunately, only quadratic cost functions). This is work in progress, so I don't think Tim will have slides up, but it's a very nice analysis. [Spoiler: FairShare wins!]

David Clark gave a public lecture in the afternoon on the Internet -- what's wrong with it, and his ideas for fixing it. It got a big audience, and he was certainly interesting, amusing, and provocative. I've got to admit I have a lot of skepticism so far about how we get to a better Internet, one with security and designed with the economics of the creature in mind from the start instead of developing by accident. Which reminds me, what's the state of the NSF GENI project? I've been hearing rumors that it's headed to an early rest, but I'd be curious if anyone more in the know can comment anonymously.

Monday, November 03, 2008


In a recent review of a journal paper, I got the following comment:
In explaining why you are presenting simulation results, you say, "First we wish to check our theoretical analysis..." I don't understand this motivation. Your theoretical analysis is substantiated by mathematical proofs. What more evidence do you need of its validity?
Please keep this statement in mind.

I've stated frequently that theorists should actually implement their algorithms. I have some further recent anecdotal evidence to back up this claim.

I have students working on a variety of projects, and recently, I've had a strange convergence: in several of the projects, the students have found what appear to be non-trivial errors in recent theory papers (2 in CS theory, 1 in EE theory). It's not clear that any of the results actually break -- indeed, I don't think any of them do. I don't want to exaggerate. In one of them, a constant factor seems to be messed up -- not disastrous, but it is an important constant factor in context. And perhaps in one of the papers we could chalk it up to really just a sequence of confusing typos rather than an actual error.

Now I'm not naive enough to expect conference papers (or even journal papers) without errors. But the key here is that these errors were either easily found and/or proven to me by the students by having them implement the algorithms described in the paper. Then they sort of jumped out.

Implementing your own algorithm is a good way of checking your work. If you aren't implementing your algorithm, arguably you're skipping a key step in checking your results. Why should a reviewer go to the trouble of checking your work carefully if you're not?

Moreover, imagine not some student but an actual real systems-builder who decides your algorithm is worth implementing for their system. Imagine how they feel when they find things don't quite work as you've stated. That doesn't exactly inspire confidence, and is the sort of thing that discourages someone from using your algorithm. More globally, it gives systems people the feeling that theory (and theorists) aren't important. Why should they be debugging your proofs? You should give them evidence your algorithm can be implemented and works as claimed by actually implementing it if possible or reasonable to do so.

I'm not stating that you personally have to implement your algorithm. Hire a student to do it! It's good experience for them. You can apply for an REU, or other student research funding.

So, to answer the anonymous reviewer: I think there's always plenty of good reason to check our mathematical proofs by experiment and implementation, and it's suitable to put those results in the journal paper as an aid to the reader (and reviewer!).
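As a toy illustration of the kind of check I have in mind (the coupon collector problem is my example here, not a result from any of the papers above): theory says collecting all n coupon types takes n times H_n draws in expectation, and a few lines of simulation will either confirm that or expose a bug in your reasoning.

```python
import random

def coupons_needed(n, rng):
    """Draws until all n coupon types have been seen at least once."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

def check(n=20, trials=2000, seed=0):
    """Compare the simulated mean against the theoretical n * H_n."""
    rng = random.Random(seed)
    empirical = sum(coupons_needed(n, rng) for _ in range(trials)) / trials
    theoretical = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
    return empirical, theoretical
```

For n = 20 the theoretical mean is about 72 draws, and the simulated mean should land within a few percent of it. If it doesn't, either the code or the analysis has a problem -- and in my students' experience, it's worth finding out which.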

Tuesday, October 28, 2008

Computer Science AP Test

Harry Lewis went to a meeting about the future of the Computer Science AP Test. He wrote a short report for our faculty, and gave permission for me to put it on the blog. There are a few Harvard localisms in the report, but I think the information is important for everyone to see.

Guest post by Harry Lewis:

Over the weekend I attended a meeting in Chicago about the future of the AP Computer Science test, along with about 70 other faculty and a few high school teachers.

The College Board announced last year that it was dropping the CS AB test after the spring 2009 administration (the more advanced version, with more on data structures and algorithms) for business reasons (only 4000 test takers). The A test isn't that popular either -- 18,000 I think. By way of comparison, 180,000 take Math, 80,000 take Physics B, and 75,000 take the Statistics exam, if I wrote down the numbers accurately.

The CS AB test is the only one for which Harvard gives credit for advanced standing purposes (and then only if you get a 5). I don't think it would be wise for a student who has taken only the CS A course to skip CS50. So as far as I can see, Harvard won't be offering credit for AP CS after this year.

The numbers of students studying AP CS are pretty bad. Only 12% of American high schools even offer a CS AP course (even the A).

There were rumors that some effort was going on to line up industrial support to keep the exam alive. (The Italian-American community got their act together when the College Board cancelled the AP Italian exam, and managed to keep it going by pledging to pay the College Board, even though even fewer students take it than the CS AB test.)

The economics of this are interesting. Most of the costs of administering the exams are in grading the free response (non-multiple-choice) questions. But a psychometrician gave an interesting presentation -- it seems that the FR questions contribute nothing to the validity or reliability of the test scores that was not already there in the answers to the multiple choice questions. (Which is not to say that teacher and student behavior might not change if they knew the exam were nothing but multiple choice questions. And the psychometrician cautioned that he hadn't calculated the nontrivial and perhaps different costs of developing the MC and FR questions themselves.) On the other hand, the College Board is probably basing its cancellation judgment on trend lines. CS seems not to be growing in high schools, AP or not. (Whereas Chinese and Japanese, also currently small tests, are.) (The College Board made the decision to drop the AB test and has no plan to reconsider it. It announced its decision without any consultation with the faculty groups involved in developing the exam, by the way.)

A great many people at this meeting don't like the A test, and after taking and grading a few questions, I can see why. It's a Java programming exam. Graders are trained not to take off for trivial syntax errors (confusing commas and semicolons in written answers). They do an incredibly good job developing grading rubrics and keeping the grading consistent between graders. But there are more questions about OO programming concepts than anything else, which doesn't seem right (on the exam we looked at, there were 18 questions on inheritance and whatnot, and 2 on "logic"). Several people said, "if you wonder why kids are turned off by CS in high school, just look at the questions on the AP exam to see what image of the field is projected." Some proposed, half-seriously, that we ask the College Board to rename the existing course and exam "AP Computer Programming" (no dice; it has to be the name of an academic department in colleges and universities).

There is a movement afoot not to tweak the A exam, but to develop an entirely different kind of intro course to be the basis of the CS AP, something broader, emphasizing both the principles and the importance of the field. A couple of prior working groups drafted proposals. They are right on target IMHO, though maybe better goals for what our majors would understand at the end of 4 years than at the end of the first course. We spent Sunday discussing these. (The College Board is emphatic that there will be only one exam in CS, so this would substitute for the A exam, and we are not discussing a substitute for the AB exam in addition to keeping the A exam.)

These 2 proposals have no operational content at this point. The discussions we had were not premised on there being any programming, or on what programming language might be used (though there certainly seems to be a consensus that the first programming course should use a light-syntax language). And the discussants yesterday were divided on that question. So we are still in early stage development.

We didn't even talk about testing. The two boundary conditions are that the course has to be testable (though not necessarily multiple-choice testable), and has to draw people into the field (the NSF was represented by Jan Cuny, who was evangelical on the subject of drawing more people into the field, especially groups now underrepresented in the field, and has curricular development money to put where her mouth is). Obviously the easiest things to test are programming and math, so this is a nontrivial feasibility issue.

A critical question was whether colleges would give the imagined new AP course credit. I found that hard to answer. With no programming, or very little, it certainly couldn't place people into CS 51 or 61. It seems to be in the CS 1-QR48 space (though maybe not, if there is significant programming after all). Harvard doesn't give Advanced Standing credit for courses that no department would count toward their major, and you can't place out of your Gen Ed requirements on the basis of high school work anyway. In the end I decided there were too many ill-defined variables for me to claim that Harvard would give credit for this course, though I think it has huge potential.

This would take a minimum of 5 years to mature into textbooks, teacher training protocols, etc. Probably more.

There is, by the way, some worry that the College Board's decision will (a) lead to a general dumbing-down of high school CS, marginal as it now is, and (b) deprive us of a small but important part of the flow of new majors (though the number of students taking the AB test nationally is tiny, 40% of them become CS majors, and I'll bet a significant number of our CS majors are in that number).

It will be interesting to see how this develops, and there will be opportunities to get involved as it does.


Harry Lewis Tapes an Ad for Darcy Burner

Without trying to be overtly political, I thought I'd mention some "public service" performed by Harry Lewis. Apparently, Darcy Burner, a Harvard alum running for Congress, said in a debate, "I loved economics so much that I got a degree in it from Harvard." Her Republican opponents are questioning her truthfulness, since her actual degree is in computer science. Back in the day (as I well know, since the requirements were the same when I was an undergrad), the computer science major required several classes in a related field of your choice. Mine was math. Economics was a popular choice, and was apparently Darcy's. This is (by Harvard policy) not noted on the transcript. There is no sign that she has ever attempted to mislead people about her degree; her resume says she has a computer science degree with a specialization in economics, which in my opinion is a completely accurate description. You can search online for more info.

Harry Lewis publicly backed Darcy Burner's statement, and in fact is now featured in an ad explaining Darcy's academic qualifications. You can read the Crimson article here, and see Harry set the record straight here at Youtube, or here, and probably many other places. Thank you, Harry! No matter what party you favor, certainly everyone is entitled to the facts without the spin, and that is what Harry has provided.

Harry is naturally polite and reserved in his statements, but I'll be more clearly biased: any district should be thrilled to have a Congressperson with a computer science degree, and especially one with a degree from Harvard!

Friday, October 24, 2008

On the Ad Board (A Harvard-centric Post)

The Harvard Crimson had an editorial earlier this week on the Ad Board, the administrative board that implements the rules of the college for undergraduates, and which I served on for a year-and-a-half, ending last June. (I happen to read the Crimson semi-regularly as copies are left in the lounge areas of our building.)

The piece was just so over-the-top negative, and blatantly factually wrong (it's hard to find a stated fact in the text that is actually correct), that it makes this season's political ads look good by comparison. So I took it upon myself to respond.

I suppose only my Harvard readers might care about this, but here's the editorial, and the text of my response is below. We'll see next week if the Crimson publishes it.

UPDATE: The Crimson did print my letter here. (It changed a few things, making it shorter and a bit more generic, but the spirit is there.) Also, since it seems from the comments that many people are simply uninformed, let me point to the Student Guide for the Ad Board; I'd encourage people with questions (or complaints) to read that first, as it gives a lot of detail about how the Ad Board works.


Bad Board, No; Bad Editorial, Yes

To the Crimson editorial staff:

Having finished a year-and-a-half of service as a faculty member of the Ad Board last June, I was shocked by the opinion “Bad Board” that appeared in the Crimson on October 22. First, it was simply riddled with factual errors. For example, contrary to your statement that resident deans are “outranked” by the faculty, in fact there are only two or three faculty members on the Ad Board at any time, and over a dozen resident deans; we all get equal votes. Even if we didn’t take the resident deans seriously as you suggest (and we do), they could simply outvote us. As another example, contrary to your statement, in disciplinary cases students are always allowed to present their side of the story, both in written form and by attending an Ad Board meeting, where students can make a statement and, if they choose, respond to questions. I could go on, but there are so many additional factual errors that it would take a letter much longer than the original editorial to go through them all.

Second, your editorial fundamentally misunderstands the Ad Board’s setup and purpose. You complain that students cannot hire an attorney for an Ad Board hearing. That is because the Ad Board is not a legal institution, like a court or the police, but an academic institution that administers the rules of the College. The Ad Board’s purpose is fundamentally educational, not punitive. As you quote, the Ad Board is “primarily concerned for the educational and personal growth of undergraduates, both as individuals and as members of the Harvard community.” Sometimes, when a rule is broken, a punishment must be given, but we view that as a learning process for the student. Attorneys should not be a part of that learning process, much in the same way you can’t hire a lawyer to complain about or negotiate for a better grade in my class. (Please don’t try.)

Finally, your article includes what I would call errors not of fact but of spirit. You say “rulings are clear before it [the Ad Board] convenes”. That’s a surprise to me, as I regularly spent multiple hours in these meetings each week listening to and deliberating cases. Punishments are not, as you say, “one-size-fits-all”; we discuss the appropriate response for each case, based on the rules of the Faculty and the needs of the individual student. We take seriously both these rules and these needs, with the goal of best serving both the students that come before us and the larger Harvard community.

I understand that, as you say, going before the Ad Board is intimidating and terrifying for a student. They are generally there because there is an accusation that they have broken a rule of the College, and there may be consequences. I know of no system we could possibly set up where that wouldn’t be intimidating and terrifying. But students should know going in that the Ad Board will listen to them, fairly, and that no punishment is given lightly.

Michael Mitzenmacher '91

Professor of Computer Science

Harvard School of Engineering and Applied Sciences

Thursday, October 23, 2008

PC Members Submitting Papers

I'm currently going through papers for NSDI, and the PC chairs had to send out a post explaining how papers on which they are co-authors would be specially handled. NSDI not only allows PC members to submit papers, but the chairs as well!

In the theory community, we generally don't do this. PC members can't submit papers, apparently to avoid any conflict of interest. This is, from what I've seen, unusual, even for CS, which is already unusual in the competitiveness and importance of conferences. Most networking conferences, for instance, allow PC members to submit.

Does conflict of interest really cause problems in these situations? I don't think so; from what I've seen, it generally gets handled, and handled appropriately. People on the PC realize it's a competitive process, and understand when their papers aren't accepted. Often when PC members can contribute papers the process is double-blind, so nobody "knows" which papers are PC-authored. While I understand that allowing PC-authored papers is a risk, I don't think it's much of one. As a comparison point, theory conferences almost never use double-blind reviewing, and I think just the name of a "prestige author" on a paper has a significant effect on its odds of acceptance.

What's the upside? The biggest is that it makes it easier to have larger PCs. I have 20 papers to read for NSDI. I can't remember the last time I had fewer than 40 papers to review as a PC member for a significant theory conference.

Tuesday, October 21, 2008

Less Busy October

After a few days of trial, the two sides in the case I'm acting as an expert witness on agreed to settle, before any of the experts reached the witness stand. So all of a sudden I'm less busy than I had expected. Maybe I'll have time to whittle down that never-empty to-do list. Or blog. Or maybe a new case will serendipitously come along.

Undoubtedly the people happiest with the settlement are the jury. They were going to have to sit through several weeks of trial, and while I'm sure they were all eager and thrilled to fulfill their civic duty, I'm sure they all have their own to-do lists to deal with.

Friday, October 17, 2008

Busy October (and November, and...)

I'm expecting a "blog slowdown" over the next few weeks, as life is busy.

I'm on the NSDI program committee, and we recently got our review assignments. There's an unfortunate overlap between this reviewing cycle and the STOC deadline/paper assignments, but I'm hoping that won't be too problematic. (I was asked to be on NSDI before being asked to do STOC. And Jennifer Rexford asked, and as I'm a fan of her work, that pushed me to say yes.) In other PC news, I was asked to be on the SIGCOMM 2009 committee, and again just couldn't find a way to say no. I'll be finding it easier to turn down PCs for the rest of the year...

A case I'm serving as an expert witness on is now in trial. That will take up significant time over the next few weeks. On the positive side, the trial is here in Boston, so I'm not traveling somewhere for it.

I'm scheduled to be at an IPAM workshop in a few weeks, and still have to prepare the talk.

Those NSF deadlines are looming large.

A few papers are in various stages in the journal process (either getting ready for submission, or revising based on reviewer comments). I'd like to get those off the stack.

Overall, it's all good and fun. Well, maybe not the NSF proposal(s), those are always hard to describe as fun. But it is busy. So I think blog posting will be more sporadic than usual.

Guest posters are always welcome....

Monday, October 13, 2008

STOC 2009, Web site + prereg of papers

The STOC 2009 submission page is now up, at https://secure.iacr.org/websubrev/stoc2009/submit/.

While papers are due November 17, you must submit an abstract by November 10. Please pass the word.

For those of you with an interest in such things, we're using Shai Halevi's conference software. I was going to go with easychair, but the one major complaint (that I'm very sympathetic to) was that they didn't allow as a default the use of scorecards for offline reviewing. (They said they could do something for us, but we'd have to pay for it.) Shai also very generously offered to help me out with everything, so we all owe him thanks. (But especially me.)

If you notice anything not working right with the site, please let me know right away.

Saturday, October 11, 2008

Protecting Privacy

Just a note that the National Academies Press has released its report (well, at 376 pages, it's really a book) Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment, which is free to read online. You may have seen a news article on it somewhere or another. It's theory-relevant since Cynthia Dwork was on the committee that wrote it -- I'm guessing she played a very significant role in Appendix H: Data Mining and Information Fusion and/or Appendix L: The Science and Technology of Privacy Protection in particular. It's now on my to-read stack.

Wednesday, October 08, 2008


I went to a talk about a type of industrial software. At some point the speaker put up a list of what typical corporate research/programming staff in this area do all day. The list was clearly meant to be a little humorous (inasmuch as people discussing such things can be). The last bullet was:
Find time to find a girlfriend.
Now, I'm not claiming to be the most sensitive Neanderthal in the cave, but I am the father of three daughters, and this raised my hackles. The small number of women in the audience didn't seem to notice or mind, but maybe they did, or maybe one would the next time he gave the talk. As a field, this is exactly the type of throwaway comment that we have to, repeat, have to avoid.

I mentioned it to the speaker afterward, and to his credit, he was not defensive, and said he'd remove it or change "girlfriend" to the gender-neutral "significant other". (I'd vote for removal. Why even go with the stereotype computer scientists can't find love?)

Friday, October 03, 2008

Latest PhD Comics

I occasionally read PhD comics. This last one explains why our enrollments will be going up again; the one previous just made me laugh. (Excuse me now, I have to go sit back and enjoy as the rest of the world crumbles.)

Thursday, October 02, 2008

The Stupidest NSF Review I've Had, Ever

It's come out before on this blog that I have a love/hate relationship with the NSF. I love that they give out money for research -- especially for mine! However, I am not always the biggest fan of their structure, processes, priorities, etc. (And I surely acknowledge that some of this is not "their" fault -- as a government agency, they've got government rules to follow.)

Still, the last reviews for a grant we didn't get contained just the stupidest comment; I really have to share it, because it frightens me. I'm used to reviews I don't agree with -- the typical excuse not to fund a theory grant being, "The proposal doesn't explain how it will solve the proposed problems," when if I already knew how to solve the proposed problems, I'd have written the paper already -- but this goes beyond that. If it was just an excuse not to fund the proposal -- because the NSF for some reason never says, "We only have money for the top 10% this year, and I'm afraid there are some better proposals," but instead has to have a reason -- that's fine, but I hope it's not a real reason.

This was a CDI proposal (so apologies to my co-authors, who do not necessarily share my opinions). The primary theme was mechanism design, but we focused on network formation and ad auctions as examples. One reviewer wrote:
[ad placement] is a very active research area for corporate research labs at places such as Yahoo and Google. Given the vast resources that are being invested at these corporate labs (that have hired top economists and computer scientists) and that have direct access to logs documenting advertiser and user behavior, it is unclear how much of a contribution an academic team can make here.
One review might be forgivable. The panel summary listed the following as a weakness:
- There were also questions regarding how this will
compete with much larger-scale multidisciplinary
efforts (CS-economics) of similar kind in
industry (Google, Yahoo!, MS, etc.).
Let's ignore that the PIs all have relationships with industry, that ad auctions was just an example (of pretty wide interest right now), and that (I had thought) working on problems of interest to industry is, generally, a good thing.

With this kind of thinking, there's no way a couple of graduate students from Stanford (or, more academically, a future well-known Cornell professor) should have been working on a silly thing like "search engine algorithms", since Altavista was already out there leading the way. (That's my #1 big example, fill in your own.)

Is "industry will do it better than you could" really a good reason not to pursue (or fund) a research topic? How many research areas would that really preclude? I'd assume we should also stop funding research in operating systems, compilers, and even computer security based on that comment, but oddly, I don't see a rush to cancel programs in those areas. Seriously, anonymous reviewer, if you actually meant that, congratulations, you've truly scared me about the future of NSF-sponsored research.

As an addendum, some comments from Harvard colleagues:

1. Where does the reviewer think the people who are going to go work for Google/Yahoo/Microsoft will be coming from?
2. This was the kind of thinking that led Harvard (a CS powerhouse post-WW II) to decide to drop computer science decades ago. IBM was doing it, no need to have a department doing research in the area. It took a while to recover from that decision....
3. That kind of comment is common for chip/architecture research. "Isn't Intel doing that already? How can you possibly contribute?" [I have increased empathy for architecture people.]

Wednesday, October 01, 2008

Conference heads-ups

Following the trend, a reminder that the FOCS 2008 hotel/pre-registration deadline is October 3.

On the STOC 2009 side, some announcements:
1) Submission date is Nov 17, BUT abstracts will be due Nov 10 -- following the same sort of procedure used for SODA. Spread the word, more info on the Web site shortly.
2) Accept/reject decisions should be out February 6, in response to those who have asked me. (It seems like every other conference in the rough vicinity will have paper deadlines the following week!)
3) The day before the conference is shaping up to be an event, about which I'll hopefully have more news shortly.
4) It's looking like EasyChair will be the submission/review system. The risks of self-hosting just seem too severe at this point...

Tuesday, September 30, 2008

Enrollments Up

The last two years, we've seen a huge increase in enrollments in our introductory computer science course. Last year it was about double the size it was the previous year; this year, it looks like it's gone up another 50%. Yes, the intro course has about tripled in the last two years.

Part of that might be that we have a new instructor, who is apparently young, dynamic, bringing a more modern perspective, and all those things that inspire students to take introductory courses. Another explanation is that Harvard last year introduced minors (called secondary fields), so now by taking 4 classes you can get a minor in computer science. A more timely explanation is that the financial meltdown is encouraging students to look at computer science and applied math this year (instead of the perennially popular economics major).

The gain is starting to translate into other courses. The intro theory course (automata/complexity) is up over 50% from last year.

Needless to say we're excited by this change, although still processing what it means and how to deal with it longer term. Obviously, we're hoping this will help us make the case for more hires...

Friday, September 26, 2008

Allerton : Moves on Deletions and Insertions

To wrap up, the other paper I presented at Allerton (pdf, talk slides), joint with Adam Kirsch, continues our work in analyzing the power of moving objects around in multi-level hash tables. The gold standard here, at least in theory, is cuckoo hashing, which achieves very high loads but can require a large number of moves when inserting an item. That might not be practical for many applications (especially in network router hardware). In previous work, we had considered the gains obtainable from moving at most 1 item on an insertion. In this work, we add the power to move items on a deletion as well. (Lookups we keep fast and don't do any extra work.)
[The discussion below is, admittedly, probably more for those at least familiar with our previous work...]

Moving an item after another item is deleted is difficult -- generally, you'd want to move some other item from deeper in your multi-level hash table into the now-empty space higher up, but how can you find which item to move? Actually, we'd had ideas on how to do this back when we were working on the insertion paper. But we didn't include them then, for several reasons. First, our ideas required additional space (as explained below). Second, we couldn't analyze them, and one of the points we were aiming for in that paper was to analyze schemes. Third, we didn't have space -- not in the hardware, but in the actual paper. Blame page limits (yes, even for the journal version).

Another recent INFOCOM paper that offered an idea of how to handle moves on deletions inspired us to finally write down our approach -- which I feel is clearly better than the one proposed in that paper, as we discuss in ours. Even if we can't prove what we'd like, we can still show the scheme works well through simulation.

Our (obvious) idea is just to keep a hint at each hash table cell, if there was a collision there, of where the collided item was placed, so you can find it later on a delete. Hints take space, but not too much. If multiple collisions occur, you have to choose which hint to keep; we found keeping the most recent collision worked best. Moves on deletes alone work reasonably well, but the real gains naturally come from combining moves on insertions and deletions. This gives pretty good performance -- you can get space utilizations more in line with (though of course still not as good as) cuckoo hashing, albeit with more hash functions -- but again, using just 1 move per insert/delete. More details in the paper.
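Here's a stripped-down Python sketch of the hint idea, just for this post -- the parameters and details are simplified well beyond what we actually do (and the paper's schemes differ), so treat it purely as illustration:

```python
import hashlib

LEVELS = 4   # number of sub-tables (illustrative; real parameters differ)
SLOTS = 16   # slots per sub-table (tiny, for illustration only)

def h(level, key):
    """A per-level hash function (an arbitrary choice for this sketch)."""
    return hashlib.sha256(f"{level}:{key}".encode()).digest()[0] % SLOTS

class HintedMLHT:
    def __init__(self):
        # Each occupied cell holds [key, hint]; the hint is the (level, slot)
        # where the most recent colliding item was placed, or None.
        self.table = [[None] * SLOTS for _ in range(LEVELS)]

    def insert(self, key):
        collided = []
        for lvl in range(LEVELS):
            slot = h(lvl, key)
            if self.table[lvl][slot] is None:
                self.table[lvl][slot] = [key, None]
                # Update hints at the cells we collided with on the way down
                # ("most recent collision" policy).
                for cl, cs in collided:
                    self.table[cl][cs][1] = (lvl, slot)
                return True
            collided.append((lvl, slot))
        return False  # overflow; a real scheme handles this case

    def lookup(self, key):
        for lvl in range(LEVELS):
            cell = self.table[lvl][h(lvl, key)]
            if cell is not None and cell[0] == key:
                return True
        return False

    def delete(self, key):
        for lvl in range(LEVELS):
            slot = h(lvl, key)
            cell = self.table[lvl][slot]
            if cell is not None and cell[0] == key:
                hint = cell[1]
                self.table[lvl][slot] = None
                if hint is not None:
                    hl, hs = hint
                    deeper = self.table[hl][hs]
                    # Move on delete: if the hinted item is still there and
                    # really hashes to the freed cell, pull it up a level.
                    if deeper is not None and h(lvl, deeper[0]) == slot:
                        self.table[lvl][slot] = [deeper[0], None]
                        self.table[hl][hs] = None
                return True
        return False
```

Note that a stale hint is harmless here: the delete verifies both that the hinted cell is still occupied and that the item there genuinely hashes to the freed cell before moving it.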

Thursday, September 25, 2008

Allerton : The Day Here

I like the Allerton conference.

Part of it is I come most every year. So by now, I know my way around, which is a nice aspect of a conference being held at the same place every year. Got the early flight in, rented my car in about five minutes at the tiny but functional Willard airport in Urbana-Champaign, and had my badge about a half hour later. After a talk and catching up with some people I went to lunch at the Brown Bag, the crowded spot for people who don't like the conference lunch. Back to listen to more talks, and give my talks. After talks, more catching up with people, while they set up the open bar and snacks. (JeffE, forget beer at the business meetings, take a day -- or just an afternoon -- off and join us!) Then dinner with a very old friend (Steve Lumetta) and his family.

Saw interesting talks by Ramesh Johari, Balaji Prabhakar, Jean Walrand, Adam Wierman, and others. I talked with people about hashing, Bloom filters, deletion channels, sticky channels, floating codes, all manner of things. It's fun being at a conference where the usually disparate-seeming parts of my research don't seem quite so disparate.

Sadly, a quick trip this year. I fly back early tomorrow.

Wednesday, September 24, 2008

Allerton : Floating Codes

A couple of decades or so ago, Ron Rivest and Adi Shamir wrote a fun paper on write-once memory. Think of a punch card -- you can turn a 0 into a 1, but you can't turn it back again.
Optical disks were a potential application. Suppose you have a k-bit value that is going to be written t times. How many write-once bits (or "wits") will you need to store it through the t changes?
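Their canonical small example -- which I'm reconstructing from memory here, so check the original paper for the authoritative version -- stores a 2-bit value through two writes using only 3 wits, rather than the obvious 4:

```python
# First-generation codewords: weight <= 1, one per 2-bit value.
FIRST = {"00": "000", "01": "001", "10": "010", "11": "100"}
DECODE_FIRST = {v: k for k, v in FIRST.items()}

def complement(bits):
    return "".join("1" if b == "0" else "0" for b in bits)

def decode(wits):
    # Weight <= 1: a first-generation codeword; weight >= 2: the
    # complement of one, used for the second write.
    if wits.count("1") <= 1:
        return DECODE_FIRST[wits]
    return DECODE_FIRST[complement(wits)]

def first_write(value):
    return FIRST[value]

def second_write(wits, value):
    if decode(wits) == value:
        return wits  # value unchanged: nothing to write
    target = complement(FIRST[value])
    # Write-once check: we may only turn 0s into 1s.
    assert all(t >= w for w, t in zip(wits, target)), "would need a 1 -> 0"
    return target
```

The trick is that the second-generation codewords are the complements of the first, so any second value is reachable from any first-generation state by turning on 0s alone.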

Fast-forward a few decades, and similar models are coming back into vogue, because of flash memory. My understanding is flash memory uses floating cells; roughly, cells are organized into larger blocks of n cells, and each cell can hold a number of electrons from 0 to q-1. It's easy to add electrons to cells, but not to remove them; to do that, you have to reset the whole block back to 0's first, which is both expensive and wears out the memory, which has a limited lifetime of resets. Suppose you want to store k ell-ary variable values in a block. How many rewrites before you need to reset? Note that a "code" in this setting consists of two things: a mapping from cell states (the state of the entire block) to variable states, and a transition function giving how cell states change when variables change. For more of an introduction, see this paper by Jiang, Bohossian, and Bruck.

In our paper (pdf, talk slides) for Allerton, we introduce (as far as I know) the question of considering floating codes in the "average case" -- the previous work considered the worst-case number of rewrites before a reset. Given the lifetimes available to flash memory, and the potential to model system behavior before production, we think analysis of average-case performance makes sense here. For our notion of "average case", we assume that the variable values follow a Markov chain, and the goal is to maximize the long-term average time between resets. Given a code, the Markov chain on the variable states induces a Markov chain on the cell states. So the question becomes how to design a code to maximize the time between resets on this Markov chain.

Following in the footsteps of the JBB paper, we find some initial interesting results, and leave a lot of open questions -- including the complexity of computing optimal codes for general versions of the problem. I think this is yet another potentially interesting coding area where CS ideas should be able to play a strong role. And while I'm not (yet) an expert on the practical implications of these theoretical coding models, the connection to flash memory seems promising.
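To make the model concrete, here's a deliberately naive code for a single ell-ary variable -- a baseline I made up for this post, not a construction from our paper or from JBB: decode the variable as the sum of the cell levels mod ell, and on each variable change raise cells just enough to reach the new residue.

```python
def decode(cells, ell):
    """The stored variable value: sum of cell levels, mod ell."""
    return sum(cells) % ell

def rewrite(cells, ell, q, new_value):
    """Raise cell levels (never lower them, per the flash model) so that
    decode(cells, ell) == new_value. Returns False when the block has no
    headroom left -- time for an expensive reset."""
    delta = (new_value - decode(cells, ell)) % ell
    for i in range(len(cells)):
        step = min(delta, (q - 1) - cells[i])
        cells[i] += step
        delta -= step
        if delta == 0:
            return True
    return False
```

Each rewrite consumes at most ell-1 units of the block's n(q-1) total headroom, so even this crude code supports at least n(q-1)/(ell-1) changes before a reset; the interesting question is how much better a cleverly designed code can do.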

Monday, September 22, 2008


I spent most of the day over at the Microsoft Research New England Opening Symposium over at MIT. It was a very nice event. Part of the fun was that it was well attended -- I got to catch up with some people I haven't seen for a while. But they also had a very nice collection of speakers.

There was a panel discussion on interdisciplinary research (a theme of the new lab), and a natural question was how such research should be encouraged. Nobody on the panel seemed to make what I thought was an obvious point: one way to encourage such research is to make sure people across the sciences are well trained in mathematics and (theoretical) computer science. Interdisciplinary research depends on finding a common language, which historically has been mathematics, but these days more and more involves algorithms, complexity, and programming.

Luckily, to make the point for me, Erik Demaine next gave a talk entitled "(Theoretical) Computer Science is Everywhere", whipping through examples in arts, business, society, games, other sciences, and so on. It was a really inspiring talk, one that should be given to incoming freshmen in the field, or better yet, to high school students who don't know what computer science is about. The talk was taped and hopefully I'll have a link to it soon. Afterwards, as several of us were talking about it, some minor issues were raised. I myself brought up my common concern that perhaps more theorists should take a more direct role in promoting the practice of CS theory, including in other fields. My colleague Greg Morrisett questioned whether Erik couldn't have replaced "Theoretical Computer Science" with "Applied Math" and given pretty much the same talk. I think it's an interesting point, although I do think TCS's emphasis on complexity and algorithmic thinking gives it a distinct perspective.

Another talk I enjoyed (more than I thought I would from the title) was "Understanding Socio-Technical Phenomena in a Web2.0 Era" by danah boyd, who will be starting at MSR NE this January. She is an ethnographer who studies how adolescents interact with technology. So, to summarize, the talk was about why kids (at least in the US) spend all their time on MySpace and Facebook, and what it all means. It was very well presented (similar in style to Lawrence Lessig, who gives fantastic talks), and perhaps my interest stems from now having three kids; getting a perspective on how adolescents use online spaces to interact and communicate (as well as rebel and establish their own identities) was quite enlightening.

So while they've been around for most of the summer, it's nice to have an official welcome for the new neighbors!

UPDATE: Link for talks.

Saturday, September 20, 2008

This Week, Allerton

The 46th Annual Allerton Conference is this week. The program (not as easily accessible as it should be) is here. I'll be going, but sadly, it will just be a 1-day drop-in to give my talks. Travel is a bit of a luxury, with the new baby and all.

I wasn't sure if I was going to go to Allerton this year. While I enjoy the conference, I have had growing concerns that, because many talks/papers are invited and the proceedings aren't widely distributed, the papers I present there aren't as noticed as they would be otherwise. They've alleviated that concern somewhat by making the papers available on the IEEE online system. At least I know that if anyone tries to find a paper, the official copy will be available somewhere (besides my home page). Hopefully the old Allerton proceedings will go online at some point too.

The other motivation for going to Allerton this year was that it forced me to get some things done. Between the faculty search last spring and the offspring this summer, it's been a bit difficult to complete some of my research tasks. There's nothing like promising to have a paper in on a certain date -- a non-artificial deadline -- to get things to some sort of stable state.

I'll hopefully have posts up later this week about the new papers.

Wednesday, September 17, 2008

Small Steps...

Every once in a while, I see a blog entry or comment with the lament that the CS publishing model is just wrong. The common statement is that the conference-oriented publishing approach leads to incremental, trivial papers, solving problems that are easy instead of the "real" problem.

I'd like to offer a contrasting opinion, one I've stated before, but may be new to some. (And since the criticisms above are repeated cyclically, I don't see why I can't repeat my response.)

Having worked some in the area of heuristic algorithms, I've gained a healthy respect for the fundamental approaches of repeated refinement, parallelization of efforts to explore a search space, and the occasional bit of random wandering. Greedy-style heuristics don't get to good answers by big jumps, but by a multitude of small steps. Parallelization, leveraging the power of crowds, greatly speeds up such heuristics, and frequent communication between the agents working in parallel helps move everything along. And most good heuristic algorithms require a bit of a random (or explicit!) push to keep moving through the space of possibilities, finding new parts of the search space that yield better solutions than the areas already plumbed.
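To make the analogy concrete, here's a minimal sketch of greedy local search with random restarts: many small improving steps, plus a random push into unexplored regions. The objective function is a toy of my own invention, chosen only because plain hill climbing gets stuck in its local maxima.

```python
import random

def hill_climb(f, start, steps=1000):
    """Greedy local search: repeatedly propose a small step, keep it
    only if it improves the objective."""
    x = start
    for _ in range(steps):
        candidate = x + random.choice([-1, 1])  # one small step
        if f(candidate) > f(x):
            x = candidate
    return x

def restart_search(f, restarts=20):
    """Random restarts push the search into new parts of the space;
    the best result over all restarts is returned."""
    best = None
    for _ in range(restarts):
        x = hill_climb(f, start=random.randint(-30, 30))
        if best is None or f(x) > f(best):
            best = x
    return best

# A multimodal toy objective: local maxima at multiples of 7, with
# the global maximum (value 10) at x = 0. A single greedy run often
# stalls at a local bump; restarts usually escape.
f = lambda x: -abs(x) + 10 * (x % 7 == 0)
print(f(restart_search(f)))
```

The same skeleton underlies far more elaborate heuristics; the "communication between agents" part corresponds to restarts sharing the best solution found so far.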

The CS conference publication model shares these features. Yes, there are many more small steps than big jumps. Yes, there are times when less fruitful and interesting directions are explored. But the community as a whole moves rapidly, churning through ideas and, I think, rapidly focusing on promising ones. Also, new ideas and directions arise with, I think, remarkable frequency. Looking at any single conference, you might think over 90% of it is junk -- or, at least, not so important in the grand scheme of things. And you'd probably be right.** But that doesn't a priori mean the conference publishing system is broken, and any argument to that effect based on such reasoning doesn't even start to convince me.

This doesn't mean that we are excused from having taste and opinions as to what constitutes good work. Indeed, without this, we'd lack an evaluation function to lead us in the right directions. And I'm not saying that the contrasting "mathematics" style of publishing only fuller, complete papers is necessarily worse -- I don't think I've seen evidence one way or another. (I might admit to having a personal preference.) But if you want to argue the point, you need to do more than just look at individual papers in individual conferences, and focus on the whole.

[** A favorite quip from grad school that I still remember was when a systems friend told me "95% of theory papers are crap!", and I simply responded, "So that means we're 4% better than systems." Good times.]

Thursday, September 11, 2008

Communications of the ACM, Theory Bias?

Anyone else noticing a pleasantly refreshing pro-theory bias at the "new" Communications of the ACM? One news article on spectral graph theory extensively quotes Fan Chung Graham, Milena Mihail, and Jon Kleinberg. Another on privacy extensively quotes Cynthia Dwork and Frank McSherry. One of the presented research papers is on Distributed Selection by Fabian Kuhn, Thomas Locher, and Roger Wattenhofer (from SPAA 2007), with a perspective given by Hagit Attiya. Finally, there's a puzzle column by Peter Winkler.

I admit, I'm no longer just automatically throwing it away. The entire magazine seems more interesting since the editorial refresh. But deserving nods to the work of the theory community will certainly help hold my attention.

Tuesday, September 09, 2008

SODA, and Parallel Sessions

The list of SODA accepted papers is up.

For both my own curiosity, and probably Claire Mathieu's, does anyone have any ideas or suggestions on how to schedule parallel sessions? It seems like a nasty problem, particularly since you have limited information on what conflicts are, other than very general categories for papers.
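Lacking real conflict data, one toy formulation: treat shared topic categories as the conflict signal, and greedily place each paper in the time slot where it clashes least with papers already scheduled in parallel. This is purely an illustrative sketch (the papers and categories below are made up), not a serious proposal for SODA:

```python
from itertools import combinations

# Toy instance: papers tagged with general topic categories -- the
# only conflict information available, as noted above. Two papers
# "conflict" if they share a category: an attendee interested in both
# can't see both if they run at the same time in parallel sessions.
papers = {
    "P1": {"hashing"}, "P2": {"hashing", "streams"}, "P3": {"streams"},
    "P4": {"game theory"}, "P5": {"game theory"}, "P6": {"coding"},
}

def conflicts(a, b):
    return bool(papers[a] & papers[b])

def schedule(paper_ids, num_slots, talks_per_slot):
    """Greedy: put each paper in the slot (with room left) where it
    adds the fewest conflicts. Assumes capacity covers all papers."""
    slots = [[] for _ in range(num_slots)]
    for p in paper_ids:
        best = min(
            (s for s in slots if len(s) < talks_per_slot),
            key=lambda s: sum(conflicts(p, q) for q in s),
        )
        best.append(p)
    return slots

def total_conflicts(slots):
    return sum(conflicts(a, b) for s in slots for a, b in combinations(s, 2))

slots = schedule(sorted(papers), num_slots=3, talks_per_slot=2)
print(slots, total_conflicts(slots))
```

On this tiny instance the greedy pass finds a conflict-free schedule; in general the underlying problem is a graph-coloring-flavored optimization, which is presumably part of what makes it nasty.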

Wednesday, September 03, 2008

Big Numbers

In my algorithms class, at some point, I teach RSA and primality testing. For the corresponding assignment, I usually have students do a "numerical exercise" which involves, naturally, computing with some big numbers. They also have a later programming assignment where they have to use numbers that may not fit into a standard-programming-language-sized integer. Since this is not a programming class, and I do not prescribe a language for them to use, I essentially tell them it's part of the homework to figure this out.

Surprisingly, it seems to give a number of people significant grief every year. I know that for C and Java it's a pain -- the natural thing to do is to download and install a BigNumber library, which should be easy, but often some trouble arises. (And most students, even though they've had the intro programming courses, do not seem to have learned how to do useful things like download and run useful libraries.) There are other programming languages which handle big numbers essentially transparently -- ah, the days of programming in Lisp -- which are fine for the numerical exercise, but may not be as nice for the larger programming assignment.

My only real point here is that, in those lists of skills I would expect CS majors to have, I'd have to put "being able to do computations with very large numbers" somewhere on the list. Yes, you're unlikely to need it for your standard checkbook program, but it seems to me this issue arises in plenty of places. (It has for me -- and my graduate students -- several times.) And it's not so important that they know how to do this specific task off the top of their head, but more that when they're given a task like this, they are able to find the information they need and complete it in a timely fashion. For better or worse, that's something some of them get out of my class.
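For what it's worth, this is one reason I tell students the language choice matters: in Python, for instance, integers are arbitrary precision out of the box, so the whole "numerical exercise" amounts to a few lines. (In Java you'd reach for java.math.BigInteger; in C, a library such as GMP.) A toy RSA-style computation, with illustrative primes far smaller than anything you'd actually use:

```python
# Python integers grow as needed, so big-number RSA arithmetic "just
# works". The primes here are small, illustrative choices -- real RSA
# moduli are hundreds of digits long.
p, q = 1000000007, 998244353     # two well-known primes
n = p * q
phi = (p - 1) * (q - 1)
e = 65537                        # the standard public exponent
d = pow(e, -1, phi)              # modular inverse (Python 3.8+)

message = 123456789
cipher = pow(message, e, n)      # fast modular exponentiation, even
decoded = pow(cipher, d, n)      # when the numbers are huge
print(decoded == message)        # True
```

The three-argument `pow` is the workhorse: it does modular exponentiation without ever materializing the astronomically large intermediate power, which is exactly the trick the primality-testing material relies on too.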

Friday, August 29, 2008

A Survey on Hash-Based Packet-Processing Algorithms

Sometime way back, Graham Cormode invited me to a DIMACS workshop on Algorithms for Next Generation Networks, where I gave the usual talk about hash tables, Bloom filters, and things. The organizers later had the wherewithal to put together a book of collected chapters on the topic, and asked us (us being, of course, me and my student Adam Kirsch) for a chapter related to the talk. Adam was just finishing his thesis, and was willing to modify it to form the basis for a survey chapter. I wrote up a section or two, and for good measure, we got George Varghese, who is the world expert -- and author of the book Network Algorithmics -- to join in as well. (In particular, we really wanted George's much more practical perspective on what works and what doesn't!) The result is a hopefully pleasant, light read on current goings-on in the area of hash-based network algorithms, attempting to blend and show the connections between theory and practice.

There's still time for feedback for the final version; please mail me if you have any constructive suggestions.

Monday, August 25, 2008

And Now, a Word from Our Sponsors (Part 2)

Continuing from last post, my trip to CA.

I visited Google and gave the CAM talk there as well, where it again seemed to find a receptive crowd. (Some challenging questions arose as to whether cuckoo hashing is the right approach for hashing in software, as opposed to hardware.) Visiting Google is now like visiting a college campus, or maybe a small city. I was greeted at the parking lot by someone who directed me to valet parking. (The valet didn't seem to be there, though, so I parked myself.) I passed the volleyball courts, cafes and other places to eat, and that dinosaur on the way in; saw the gym, laundry room, doctor's office, massage area, and many, many coffee/soft-drink/snack-bar areas; and ate at the amazing cafeteria. (The sushi line was too long, so I had to skip it. However, they did have an entire freezer full of It's It ice cream sandwiches, packaged specially with the Google logo. It's It alone is worth coming to the Bay Area for.)

I find myself without envy for the Google campus and its well-publicized perks. My limited impression was that it's too crowded and busy for me; it seems like it would be a hard place for me to concentrate and get work done. I'd surely balloon up in size surrounded by open larders of food, even with the gym. I suppose I'm now just too old to enjoy the place properly, though I imagine it's fantastic for recent graduates!

The last stop on my tour was Yahoo! Research. It's always great to catch up with my "mentor" Andrei Broder, who these days always seems to be running at 110% or more. Their research group (like Google's) seems focused on Web-related algorithmics, machine learning, and the new subfield of computational advertising (I believe Andrei coined the term; in any case, I like it). I talked with people about some compression-related problems, and perhaps something further will come of that.

As usual, I find myself wishing these trips could last longer. There's always too much to do and too many people to see on these visits, although that's what makes the trip interesting and fun.

Saturday, August 23, 2008

And Now, a Word from Our Sponsors (Part 1)

I've just returned from a trip to Silicon Valley, where I visited Cisco, Google, and Yahoo -- all of whom have generously given me research money this year, and hence the CA visit. Besides thanking them in this blog, I thought I'd say a few things about the trip and what I saw on my brief stops at the various places. The purpose of these visits is mostly to see if I can get collaborations going, but I admit, some part of it is just giving face-time to the people and companies who have given me research money. They deserve some of my time, and I'd like to encourage them to keep providing funding!

The first stop on my trip was actually a visit to Microsoft Silicon Valley Research Lab (MSRSV). I still haven't figured out how to get research money from Microsoft, but MSRSV "started" when a lot of my colleagues at what had been DEC/Compaq/HP Systems Research Center moved en masse to Microsoft, so I have historical ties and recent collaborations with people there as well. Since my visit last year, MSRSV has moved into a very nice new building. Lots of open spaces and whiteboards everywhere. It seems wonderfully set up for group collaborations. (One very nice space for group-work, though, is a bit too close to a loud and frequently used coffee machine for my taste...) Besides catching up with everybody, Udi Wieder and others indulged me by talking about some of the many variations of trace reconstruction problems that are still open. Hopefully we'll get somewhere on some of them.

Cisco is a huge sprawling collection of buildings, and the visit there itself felt similarly chaotic. They asked me to give two talks, which caused me a bit of stress the week before the trip as I reworked some slides. I ended up talking about my work with Salil Vadhan on Why Simple Hash Functions Work (talk, paper, blog post), and gave a mini-survey covering my work with Adam Kirsch (and in part with Udi Wieder) on how to use CAMs to improve cuckoo hashing (talk, various papers on my papers page). [Actually, I have a new survey article covering this material that I'll put up shortly.] Cisco still seems very, very interested in hashing generally, and in applications of hashing to network measurement and monitoring in particular. About 40-50 people showed up for the first talk, and the second mini-survey talk was broadcast and recorded for Cisco -- about 50 people attended, and apparently more than that were listening remotely. (Just like when I teach, and my class is taped...) They have a pretty elaborate setup for these recorded technical talks, with a room set up for guests like Steve Wozniak (who was there a couple of weeks ago) rather than me. Besides the talks, there were a lot of high-level discussions about things going on at Cisco where I might be able to collaborate usefully with them.

One thing I noticed at Cisco was a much larger number of women than usual at my talks. Perhaps EE is turning out more female graduates than CS recently, or it's somehow reflective of Cisco's hiring practices.

Visiting Cisco is always very exciting. They're a lot more short-term focused than research labs, but there is this wonderful sense that what you're talking about could become a part of the fundamental network architecture. They keep me away from details, but multiple-choice hash tables and Bloom filters seem to be standard tools in their arsenal now. I'm hoping some form of cuckoo hashing might be as well someday.

Wednesday, August 20, 2008

NSF Expeditions, Complexity

I'm glad to hear of the news that Sanjeev Arora's team at Princeton was one of the winners for the NSF Expeditions grants, working on the general theme of complexity. I think it shows that some of the public relations work our community has been doing, especially with the NSF, is paying off in concrete ways. I also think that more money for theory generally just has to be a good thing -- it's $10 million more for theory than there was before.

That being said, I'll express two concerns:

1) It's odd to see so much money for theory concentrated in such a small geographic area. I realize that was the nature of the Expeditions program, and I don't fault the proposal for it. It just strikes me as strange, when the general budget for CS theory is so small, to earmark such a large sum of money for one project. It feels like an over-concentration of resources in what's already a small community.

The solution to this, of course, is to get more money for the general CS theory program. And I'm sure a significant chunk of the Expeditions money will go to open DIMACS-style collaborations like workshops and other events, minimizing this concern.

2) I know it's just the nature of theory, but reading over the blurbs about the various funded Expeditions proposals, I can't help but notice that while the others seem to have some sort of statement of clear goals to take things in new directions ("hope to create a new field of computational sustainability", "It aims to create an "open" alternative to mobile ubiquitous computing and communication that can spur innovations, which will have a dramatic impact on the choices users will have in the way their data and information is computed, stored and communicated", "The project aims to develop tools and theories for molecular programming--such as programming languages and compilers--that will enable systematic design and implementation of technological and biotechnological applications that require information processing and decision-making to be embedded within and carried out by chemical processes."), the complexity grant will "hope to better understand the boundary between the tractable and the intractable" and "attack some of the deepest and hardest problems in computer science". Doesn't that sound, I don't know, just like business as usual? My concern is that it's probably important to the theory community long-term for this Expedition to have some major concrete success attributed to it at the end of the day. I have no doubt that good things will come out of this, just based on the people, who already do good work -- but will the output be the sort of things that in retrospect justify this investment?

Tuesday, August 19, 2008

Book by FemaleScienceProfessor

I'm an occasional reader of the blog FemaleScienceProfessor. Often the blog is just about being a science professor, which is interesting, and I can relate to. And sometimes the blog is specifically about being a female science professor, which is also interesting, even if I relate to it less.

Well, FSP has re-worked past blog entries into an on-line book available at lulu.com. I haven't bought and downloaded it yet, but from the Table of Contents, it appears to be a particularly worthwhile book for graduate students thinking about a life in academia, and for new faculty. The bulk of the book seems gender-neutral, if that's a concern. I thought I'd give it a free plug.

Sunday, August 17, 2008

SIGCOMM 2008, Part 3

Here are a few more papers from SIGCOMM which should be of particular interest to a more theoretical audience. (Generally, SIGCOMM papers are interesting -- but again, I'm focusing here on papers that I think might be of special interest to theory people. It strikes me that I should, at some point, similarly summarize papers from a major theory conference -- like STOC 2009 -- that would be of special interest to networking people. Of course, SIGCOMM makes that easier, posting abstracts and all the papers online...)

There's a paper analyzing BitTorrent in the game-theoretic, incentive-style sense. It will require a more careful reading from me, as I'm not a full-fledged game-theory/CS researcher, but it sure looks interesting on first perusal. I'm naturally biased toward the idea that if all the current effort on game theory in computer science (and particularly in theory) is to pay off, real-world protocols must be considered and analyzed. So in that sense, this should be a really interesting paper.

While it doesn't appear particularly theoretical (it looks like what I like to joke is a standard networking paper -- lots of pictures and tables, no equations...) this paper on spamming botnets from Microsoft includes Rina Panigrahy (well known for his work in both theory and practice) as one of the co-authors. (I figure Rina had something to do with where I saw the words "entropy reduction", but that's just a guess...)

Saturday, August 16, 2008

SIGCOMM 2008, Part 2

The Accountable Internet Protocol (AIP) paper asks the question: what if we re-architected the Internet to start with self-certifying addresses, so that there was a layer of accountability -- you'd know where packets are coming from? This paper fits squarely in the mold of the NSF FIND program. They suggest what a self-certifying architecture would look like and how routing would work with it, consider potential attacks on the proposed architecture, and discuss whether technology trends would make such an architecture feasible. Certainly interesting, although I admit to high-level, unsubstantiated concerns about the specific address architecture they propose. (I suppose as a "kid" I saw too many key-exchange-style protocol papers where a subtle flaw was exposed by a subsequent paper...)

I notice they used a Bloom filter in the paper without even giving a citation. Have Bloom filters become so widespread in the networking community that no citation is needed? What a nice thought! (Or maybe the authors just ran out of space for the citation.)
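For readers outside networking who are wondering what needs no citation: a Bloom filter is a compact randomized set representation with one-sided error -- lookups can give false positives but never false negatives. A minimal sketch (my own toy code, not anything from the AIP paper):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions set/check positions in
    an m-bit array. Membership tests may give false positives but
    never false negatives -- the property applications rely on."""
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        # Derive k "hash functions" by salting one cryptographic hash;
        # real implementations use cheaper hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("10.0.0.1")
print("10.0.0.1" in bf)  # True -- inserted items are always found
print("10.0.0.2" in bf)  # almost certainly False at this load
```

The space savings come from storing only the bit array, never the items themselves, at the price of a tunable false-positive rate.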

Another SIGCOMM paper continues on the path set out by, for example, Feigenbaum, Papadimitriou, Sami, and Shenker, using game theory to study the behavior of BGP. They propose a more realistic model (where, for example, Autonomous Systems can be paid for attracting traffic) which, naturally, leads to more negative results in terms of the truth-telling behavior of ASes. (Why is reality so often disappointing this way?)

Friday, August 15, 2008

SIGCOMM 2008, Part 1

About a year ago, I took a look at some SIGCOMM 2007 papers. I won't be attending SIGCOMM this week, unfortunately, so in the interest of self-education, I thought I'd look at some of the papers this year. (The site currently has the papers up. Wow, what a neat idea...)

Before getting into papers, I thought I'd mention that Don Towsley is being given the ACM SIGCOMM award. This is a great choice, and well deserved. And relevant to this site's audience, Don is, in my mind, primarily a theorist. Not a FOCS/STOC theorist to be sure, but a theorist nonetheless. As the award announcement states:
Towsley, who is Distinguished Professor of Computer Science, has made innovative and pioneering contributions in developing foundational modeling and analysis techniques that have enabled a better understanding of some of the most important aspects of today's computer networks, network protocols and networked applications.
Modeling, analysis, understanding... that's what theory is all about. It's people like Don that made networking an open and understanding place for people like me. Thanks! And hooray for Don!

Now for papers. As before, I'll give brief synopses (at the level of the posted abstracts :) ), as I'm just looking at these papers on the fly. The network coding crowd has attacked again with the MIXIT system, which seems to throw together a bunch of ideas in a clever fashion to improve performance on wireless mesh networks. Recall that the basic working definition of network coding is that intermediate nodes do more than store and forward; they can process the packets as they come through (creating encoded packet variations). Here, the basic unit is not taken to be a packet, but a symbol (a small collection of bits), with symbols being packed into a packet. This lets nodes "take apart" packets: if a whole packet doesn't come in error-free, the node can take the symbols that appear correct with high enough probability (based on information from the physical layer), re-package them (via linear combinations, a la "standard" network coding), and send on only those symbols. Because erroneous symbols might still get through, an end-to-end error-correcting rateless code is also used. All of this appears to improve throughput.
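The simplest instance of the "combine rather than just forward" idea works over GF(2), where a linear combination is just an XOR. A toy sketch (not MIXIT itself, which combines symbols with more general coefficients):

```python
# An intermediate node holding packets a and b can forward the XOR
# a^b instead of (or alongside) the originals; a receiver that has
# any one original plus the combination recovers the other, since
# XOR is its own inverse over GF(2).
a = bytes([0x12, 0x34, 0x56])
b = bytes([0xAB, 0xCD, 0xEF])
combo = bytes(x ^ y for x, y in zip(a, b))   # the encoded packet

# Receiver obtained b and the combination; it recovers a:
recovered_a = bytes(x ^ y for x, y in zip(b, combo))
print(recovered_a == a)  # True
```

The classic payoff is the butterfly-network example: one coded transmission serves two receivers that each want a different packet, which store-and-forward alone can't match.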

The paper seems interesting -- another proof-of-concept paper for network coding in wireless systems, which is where I suspect network coding will be able to make the most inroads over the next few years. I can't tell yet how practical this really seems (without a more detailed reading), but the idea of taking apart packets and sending only the good pieces in combination with multiple coding techniques seems quite nice.

As an aside, the pdf for this paper seems to contain a picture or something that crashes my poor Mac around the 8th or 9th page. Help!

Sunday, August 10, 2008

Security Issues in Cambridge

Harvard is getting new ID cards next year, thanks to an ambitious student who apparently figured out how to forge IDs (including a duplicate ID for University President Drew Faust). Because, really, how could using unencrypted ID numbers on the card, and giving undergraduate computer user assistants access to all ID numbers, ever lead to a problem? (The student also apparently made fake state driver's licenses. Who says Harvard students don't learn useful real-world talents?)

Of course, Harvard isn't the only institution in Cambridge where students can obtain skills in the security area. Some MIT students, working under the famous Ron Rivest (the R of RSA!), figured out several flaws with the new ticket system for the Boston subway system, including ways to rewrite tickets so that they have lots of money available on them. So, naturally, the subway system sued to keep them from talking about the flaws at a security conference.

In both cases, the systems seem easily breakable (well, at least the Harvard IDs were easy; I'm not sure about the subway) with a card writer that can be obtained for a couple hundred bucks.

Of course, I'm not surprised, based on previous experience.

I wonder when organizations that want secure cards will realize that perhaps they ought to ask the students to try to break the system before they deploy it, rather than wait for them to break it after.

Wednesday, August 06, 2008

On Simulations

I've been coding up some simulations for some Allerton papers that are due all too soon. Of late I've depended far too much on my (now former) student Adam Kirsch to take care of doing our simulations, but he's graduated, and they're needed, so off I go. (Adam's graduating is all to the good, but clearly, I'll be missing him, especially when it comes time to write simulations.)

I'm always amazed at how averse theory people seem to be to doing simulations. I find them useful for generating ideas and thinking about problems in the early stages -- cutting off wrong directions and giving insight into the right ones. If you don't like doing simulations for such purposes, because it doesn't work for you, or you're clever enough to not need data, I have no issue with that -- people work differently.

But I also use simulations as a way of checking my work. If I have a theorem that says that a random process will behave a certain way, and it's possible to code a simulation of the process, I'll check my theorem with code. If the theory and the code don't match up, my assumption is that something is wrong somewhere, and the result is not ready until the two match or I know why they don't. Surprisingly, I think it's about 50-50 as to which I end up finding is wrong, the code or the theorem. (By the same token, if I don't have a theorem, and there's more than one way to simulate a process, I'll code multiple simulations, and make sure they match!)
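As a concrete instance of this kind of check (a classical textbook example, not a result from any particular paper): the coupon collector's theorem says that collecting all n coupon types takes about n times the nth harmonic number draws in expectation. A quick simulation should match; if it didn't, I'd suspect the code or the "theorem" before trusting either.

```python
import random

def collect(n):
    """One run of the coupon collector's process: draw uniformly at
    random until all n coupon types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 50, 2000
simulated = sum(collect(n) for _ in range(trials)) / trials
predicted = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(simulated, predicted)  # should agree to within a few percent
```

The same template -- simulate the process, compute the theorem's prediction, compare -- applies to most of the random processes I work with; when the two disagree, it's about 50-50 which one is wrong.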

Of course not all results can be checked by coding something up -- but many can, particularly in the study of random processes, which is my area. And it's clear to me that many researchers don't check by coding -- because I (or students working with me) have several times found mistakes by doing a simple implementation and getting different numbers out than the paper gives. Generally the mistakes aren't "fatal" -- usually a constant is off somewhere, and often the O notation will eventually take care of it -- but of course it is grating as a reader when something in a paper is plain wrong and you're left to figure out why. When someone doesn't do a check-by-code, I must admit, it lowers my trust in, and my overall opinion of, that person's work. Sure, people make mistakes (myself included) -- but ignoring a straightforward way of checking your work doesn't inspire confidence.

I imagine some theory people are so out of practice coding they "can't" do a simulation. (But hey, that's not really an excuse; that's what grad students are for...) And others probably just consider it a waste of time. If you really are good enough not to need to check your work this way, more power to you. Me, I'll get back to the (admitted) drudgery of coding my things up and seeing if they work the way I think they should....