Friday, July 28, 2006

Constructive Finger Pointing

Ryan Lortie made a very good post on the waste of power due to timer use. In the past, I have made many similar posts about how memory is wasted. I think these types of blogs provide good insight to the developer community. It's a way to say "wow, this is really a problem". I'd like to see the process automated.

On the Google intranet, there is a status dashboard. The dashboard is sorted by latency, slowest first. So, if your service is slow, you get highlighted on the home page, with a big red (!) next to your project. I think this is a good incentive to make services fast.

I wonder if the same thing can be done for GNOME. We could gather data about which components are sucking and put it in a high profile place (the planet would be a good one). Some metrics are easy to get (modules with the most unreviewed bugs). In time, I think GNOME and distributions should build tools to get other types of data: "what applications are taking up memory", "which apps are segfaulting", "which apps are abusing timers".

Luis von Ahn Talks at Google

Luis von Ahn, inventor of CAPTCHAs and the ESP Game, gave a talk at Google yesterday. This talk is worth listening to (would I blog it otherwise?). Luis describes how, by using people's spare time, all images on the web could be labeled in months (or even weeks). This talk is non-theory-person friendly (no math). It's also quite funny (Luis' lectures are even better). (direct link if you can't see the embedded version below)

Thursday, July 27, 2006

Google Code Hosting

Google Code Hosting is very interesting. While, at first glance, the site really doesn't compare to Sourceforge, I think it's a really interesting offering:

  • The Google Code issue tracker is very elegant. The tagging system is much, much smarter than what Bugzilla does.
  • It will scale. Period. It's interesting to note that some of the biggest investment was on making subversion very scalable (see this interview). Anyone who has ever used Sourceforge knows that the performance of their version control is...less than stellar :-).
  • Possibly the most interesting thing is the fact that very little infrastructure will be needed to make this service usable; it can just connect to what Google already has. Google Pages, GMail, Google Groups, Blogger and Writely all provide services that in other services would just be one-off hacks. The future of online interaction seems to be combining tools that each do one thing and do it right into a powerful system. For example, I'm currently trying to convince the people running 15-211 at CMU (the Data Structures course which I TA) to use the combination of Blogger + Google Groups + Google Calendar rather than Blackboard, which is a content management / community system gone wrong. In order to create an integrated experience, I wrote a "portal" (read: 200 line ASP.NET hack) using the Atom feeds provided by all three services. It's going to be really interesting to see what a company with so many offerings can do for open source.

Wednesday, July 19, 2006

Yahoo Portal

I saw that Yahoo released an ajaxish portal today. I tried it, to see if they've improved at all. Let me say, I'm shocked these guys are still around. The site is horrible. First, the home page is filled with ads. I mean the moving, flashing, distracting ads that are so 1999. In addition, the home page has some text ads offering me a range of services I really don't need (Vonage -- no thanks, I don't use the phone that much, domain name registration -- maybe, but does everyone need to see this, "Degrees in as fast as 1 year" -- thanks, I go to a school with a reputation, "What’s your credit score 560? 678? 720? - See it free." -- this might as well be in my spam folder).

The featured items on the page are completely irrelevant to most people. In the prime location on the page, I'm offered a contest to "Design Janet Jackson's new album cover". The Yahoo Pulse tells me that the number one "Top Guilty Pleasures Ringtones" is "PYT Pretty Young..." by Michael Jackson.

Well, at least the page has a search box. The focus is on the text box by default (good!), so I can get going right away. Let's give Yahoo a hard search: "linq". This is the C# 3.0 Object/Relational mapping. A typical search? Not really. But I want a search engine that finds things that are hard to find.

It turns out that Yahoo and Google have about the same search results. However, the difference is in the ads.

Each page has more ads, but for advertisers the top three are the most important spots, so I'm just going to look at those. The Yahoo ads in positions 1 and 3 offer me some obscure products that happen to be named "linq". Yahoo ad 2 links to www.restaurant.com, with no connection whatsoever to linq. Compare this to the Google ads, all of which might be relevant to somebody looking at O/R mapping in .NET. Let's just say Google is getting lots more revenue from its ads.

Ok, let's give Yahoo a break. I'll try an easy query, "restaurant". Yahoo highlights restaurant results in Pittsburgh (right now, I'm in CA). Now, I know I've used Yahoo's farechase to find flights from PIT to SFO, but my IP address should very clearly tell Yahoo where I am. All of Yahoo's ads are for restaurant supplies. Now, Google doesn't try to highlight local restaurants (to do that I have to say "restaurant near mountain view ca"); however, the ads are geo-targeted, giving local restaurants (not restaurant suppliers).

Well, I don't think I'll be changing my home page any time soon. There are some things I did like about Yahoo's design (I really like that you can change from Web to Image search without the page being refreshed. Google should totally copy that idea). However, it's pretty clear why Google has so many users.

Tuesday, July 04, 2006

Today in performance

I started off today by using massif and the traditional memcheck to see where memory was allocated in the libgnomeui stack:

  • We use libgnutls to handle ssl in gnome-vfs. This library mallocs 65 kb of memory in its initializer (which is called from gnome-vfs's initializer). I sent off an email to the address their website told me to use for bugs (no bugzilla!). If you are interested in fixing this on the gnutls side, the code to look at is gnutls_global_init, specifically the two calls to asn1_array2tree. In the mean time, I think we should fix this in GNOME by lazily initializing the tls library, since I think ssl is a pretty rare use case. On my (basically empty) desktop, 18 processes are using gnome-vfs. That makes 1.1 MB (probably more, counting malloc overhead).
  • Noticed lots of allocations from inside glibc when calling setlocale (which the gnome option parser does). Turns out there is a glibc cache for this stuff, but Ubuntu wasn't packaging it. Filed a bug. I have 40 processes using the cache right now (even bash uses it!). Not using the cache costs about 70kb. That's 2.7 mb.
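
The lazy initialization suggested for the tls library can be sketched roughly as follows. This is an illustration in Python, not the gnome-vfs code (which is C); the class name, `open_secure`, and the initializer passed in are all hypothetical stand-ins. The point is only that the expensive setup runs on first use rather than in every process that links the library.

```python
import threading


class LazyTls:
    """Defers a costly one-time setup until something actually needs it."""

    def __init__(self, init_func):
        # init_func is a hypothetical stand-in for an expensive initializer
        # (the post's example is gnutls_global_init).
        self._init_func = init_func
        self._lock = threading.Lock()
        self._initialized = False

    def ensure_initialized(self):
        # Fast path: already set up, nothing to pay.
        if self._initialized:
            return
        with self._lock:
            # Re-check under the lock so two threads cannot both initialize.
            if not self._initialized:
                self._init_func()
                self._initialized = True

    def open_secure(self, url):
        # Only callers that really use ssl trigger the initializer.
        self.ensure_initialized()
        return f"secure connection to {url}"
```

A process that never calls `open_secure` never pays for the setup, which is the saving described above. In C, a once-only primitive such as `pthread_once` could play the role of the lock and flag.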

I then went on to look a bit at gedit, which I noticed was taking up a pretty high amount of memory:

  • First sign of trouble: gedit loads python. Why? By default, a plugin called "modelines" is enabled, which describes itself as "Emacs, Kate and Vim-style modelines support for gedit". I'm not sure exactly what that does, or how I'd use it. Disabling just that plugin gives me back 3.2 mb of ram. It also made gedit feel faster to start up. Filed a bug suggesting that it either be disabled or written in C.

    It looks like python+gtk could use some memory optimization. At startup, a hello world in Python has 4 mb of private dirty rss. Compare this to Mono and Gtk#, which take only 2.7 mb. Granted, even Mono is large compared to 608 kb for a C based GTK app. I'll see what I can do about that some weekend :-).

  • Ubuntu's launchpad-integration library was taking up quite a bit of memory by allocating pixbufs. strace -eopen gave the issue away:

    open("/usr/share/pixmaps/lpi-help.png", O_RDONLY|O_LARGEFILE) = 17
    open("/usr/share/pixmaps/lpi-translate.png", O_RDONLY|O_LARGEFILE) = 17
    open("/usr/share/pixmaps/lpi-help.png", O_RDONLY|O_LARGEFILE) = 17
    open("/usr/share/pixmaps/lpi-translate.png", O_RDONLY|O_LARGEFILE) = 17
    ... (78 lines of this)
    

    Whoops :-). Filed a bug. Not sure how much this saves, as it's not easy to count how many times this code is used.
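
Spotting this kind of repetition by eye works for 78 lines, but it can also be counted. The following is a small sketch, assuming the trace is plain `strace -eopen` output like the lines above (no pid prefixes, `open` rather than `openat`); the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the path argument of an open("...") call in an strace line.
OPEN_RE = re.compile(r'\bopen\("([^"]+)"')


def count_opens(strace_lines):
    """Return a Counter mapping each opened path to how often it was opened."""
    counts = Counter()
    for line in strace_lines:
        match = OPEN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_opens(strace_lines, threshold=2):
    """List (path, count) pairs for paths opened at least `threshold` times."""
    return [
        (path, n)
        for path, n in count_opens(strace_lines).most_common()
        if n >= threshold
    ]
```

Feeding it the saved trace (for example `repeated_opens(open("trace.txt"))`, where `trace.txt` is a hypothetical file name) lists the files that are being loaded over and over.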

So, that's about 7 mb of memory from all these issues. My estimates are fairly conservative: I'm rounding things down, not counting malloc overhead, and looking at my desktop with just xchat, gaim, firefox, gedit, and a few terminals. I'd actually expect the total effect of these to be 8-10 mb. Even with 1 gb of ram, that's pretty large.
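
For anyone who wants to check the arithmetic, here is a quick sketch that adds up the three issues with a concrete number attached (the launchpad-integration one is left out, since its cost wasn't measured). The labels and the function name are just for illustration, and 1 MB is taken as 1024 kb.

```python
KB_PER_MB = 1024


def total_mb(per_process_kb, process_count):
    """Memory cost in MB of a fixed per-process overhead across many processes."""
    return per_process_kb * process_count / KB_PER_MB


# Figures taken from the items above.
estimates = {
    "gnutls init via gnome-vfs (65 kb x 18 processes)": total_mb(65, 18),
    "missing glibc locale cache (70 kb x 40 processes)": total_mb(70, 40),
    "python loaded by the gedit modelines plugin": 3.2,
}

total = sum(estimates.values())

for label, mb in estimates.items():
    print(f"{label}: {mb:.1f} MB")
print(f"total: {total:.1f} MB")
```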

Monday, July 03, 2006

gnome cups icon leak

The printer status icon (gnome-cups-icon) leaks quite a bit. There is a Debian and a GNOME bug. It looks like there has been no maintainer attention to the bug since it was filed. Now that there's a patch, it'd be great to see some action. It looks like this is a fairly visible leak.

Sunday, July 02, 2006

Why does libgnomeui cost so much

So, after measuring the benefit of removing libgnomeui from a program, I thought I might dig deeper into the cause of the bloat. I first made three hello world style applications: one used plain GTK, another used GTK but also initialized GNOME VFS, the last used libgnomeui (which also uses vfs). Each of these three programs loads a superset of the libraries of those before it. I used smaps to gather data about the heap space allocated (used by malloc) and the writable mappings due to shared libraries.
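
The smaps data comes from /proc/<pid>/smaps, which lists every mapping of a process followed by fields such as Private_Dirty. A rough sketch of the kind of summing involved is below; it is not the script used for these measurements, and the function names are invented here. It assumes the usual layout of a mapping header line followed by "Name: value kB" lines.

```python
def private_dirty_by_mapping(smaps_text):
    """Sum Private_Dirty (in kB) per mapping name from smaps-formatted text."""
    totals = {}
    current = None
    for line in smaps_text.splitlines():
        parts = line.split(None, 5)
        if not parts:
            continue
        first = parts[0]
        if "-" in first and not first.endswith(":"):
            # Mapping header: range, perms, offset, dev, inode, optional path.
            current = parts[5].strip() if len(parts) > 5 else "[anon]"
        elif first == "Private_Dirty:" and current is not None:
            totals[current] = totals.get(current, 0) + int(parts[1])
    return totals


def read_smaps(pid="self"):
    """Read the smaps file for a process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()
```

Calling `private_dirty_by_mapping(read_smaps(pid))` gives one number per mapping: the `[heap]` entry is the malloced memory, and the entries named after .so files are the writable mappings due to shared libraries.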

Some observations to make here:

  • Malloced memory causes the most trouble at the gtk level. However, gnome-vfs and libgnomeui are still responsible for quite a bit of mallocing.
  • libgnomevfs is the worst offender with respect to loading libraries.
  • libgnomevfs is a much larger jump in memory than libgnomeui.

I then dug further into what libraries were being loaded by vfs and gnomeui. To get useful data here, I excluded the libraries loaded by gtk from consideration when looking at vfs and similarly excluded libraries loaded by vfs when looking at gnomeui.

  • Bonobo, ORBit...ugh
  • libgnutls, libxml2, and libgcrypt have quite a bit of writable memory. If they could be cut to 4 kb each, we'd save 50kb for each process with gnomevfs.
  • The "Other" category has all the 4 kb libs. A few worth special mention: avahi loads three .so files. First, having avahi here seems a bit silly; second, three libraries. Also, libpopt is used; isn't there something in glib for this now?
  • Maybe all those sound related libs should be dynamically loaded. Not many apps use sound!
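
The exclusion step used for this breakdown is just set subtraction over the library lists of the three test programs. A minimal sketch follows, with an invented function name; the inputs would be the sets of .so paths each hello world program maps.

```python
def attribute_libraries(gtk_libs, vfs_libs, gnomeui_libs):
    """Assign each library to the first layer of the stack that loads it.

    Each program loads a superset of the previous one's libraries, so a
    library is charged to vfs only if plain gtk did not already load it,
    and to gnomeui only if neither gtk nor vfs did.
    """
    gtk = set(gtk_libs)
    vfs = set(vfs_libs)
    gnomeui = set(gnomeui_libs)
    return {
        "gtk": gtk,
        "gnome-vfs": vfs - gtk,
        "libgnomeui": gnomeui - vfs - gtk,
    }
```

Each library then shows up exactly once, under the layer that first pulls it in, which is what keeps gtk's own dependencies from being counted against vfs or gnomeui.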

Investigation to do

  • Look at the malloced memory. Valgrind is a good tool here
  • Look at the size of the writable memory in libraries mentioned here

Saturday, July 01, 2006

Kill libgnomeui

libgnomeui would be better named "libkitchensink". It brings in all kinds of libraries from avahi to zlib. How much effect would removing this dependency have on memory? I decided to try it out on gnome-volume-manager (which handles volumes as in mounts, not as in sound). Hackishly, I commented out the session management stuff that requires libgnomeui. The results were pretty good.

That's 800kb of memory (I'm using the private dirty rss number, aka "the number that matters"). There are 17 processes on my desktop using libgnomeui right now. If we can remove the dependency from all of those, it would get us 13 MB of savings. In addition to the memory savings, this would likely speed up startup quite a bit. FYI, the list of processes using libgnomeui right now is:

/usr/bin/gnome-session
/usr/lib/control-center/gnome-settings-daemon
/usr/bin/gnome-panel
/usr/bin/nautilus
/usr/bin/update-notifier
/usr/bin/gnome-volume-manager
/usr/bin/gnome-cups-icon
/usr/bin/gnome-power-manager
/usr/lib/gnome-applets/trashapplet
/usr/lib/gnome-panel/clock-applet
/usr/lib/gnome-applets/mixer_applet2
/usr/bin/gnome-terminal
/usr/lib/firefox/firefox-bin
/usr/bin/eog
/usr/bin/evolution-2.6
/usr/lib/evolution/2.6/evolution-exchange-storage
/usr/lib/evolution/2.6/evolution-alarm-notify

I'm quite sure that we can kill the library from many of those. If one looks for GNOME VFS (which is responsible for quite a bit of the bloat), there are even more processes that could use dependency pruning.
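
One way to produce a list like the one above is to scan /proc for processes whose memory maps mention the library. The sketch below is a guess at such a script, not the method actually used; the function names are invented, and it only sees processes the current user is allowed to inspect.

```python
import glob


def maps_mentions(maps_text, needle):
    """True if any mapping line in a /proc/<pid>/maps dump mentions `needle`."""
    return any(needle in line for line in maps_text.splitlines())


def processes_using(needle="libgnomeui"):
    """Return pids (as strings) of readable processes that map `needle`."""
    pids = []
    for maps_path in glob.glob("/proc/[0-9]*/maps"):
        try:
            with open(maps_path) as f:
                if maps_mentions(f.read(), needle):
                    pids.append(maps_path.split("/")[2])
        except OSError:
            # The process exited, or we lack permission to read its maps.
            continue
    return pids


def estimated_savings_mb(process_count, per_process_kb=800):
    """Total saving if every process sheds `per_process_kb` of memory."""
    return process_count * per_process_kb / 1024
```

With 17 processes at 800 kb each, `estimated_savings_mb(17)` comes out at a bit over 13 MB. Passing `"libgnomevfs"` instead would give the longer list of GNOME VFS users.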