Tuesday, September 04, 2007

Firefox Caching

Federico posted about some work he was doing to make Firefox keep fewer uncompressed bitmaps cached in memory. I was playing around with the cache stuff and noticed something: my Firefox cache is full of YouTube videos. YouTube videos aren't exactly the best thing for Firefox to cache. My internet connection is fast enough that streaming the videos works just fine. I suspect that most people who watch online video frequently do so on a connection that can support streaming (otherwise, YouTube would be painfully slow, and they'd go do something else).

It turns out that Firefox's cache evicts purely by least-recently-used (LRU) order. So, let's say you have a 50 MB cache. Right now, all 50 MB of it is filled with cached JavaScript, CSS, images, etc. You go to YouTube and start watching a 10 MB video. 20% of your cache gets blown away, and in all likelihood you'll never view that video again.
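The arithmetic above can be checked with a toy model. This is a minimal Python sketch of a byte-budgeted LRU cache, not Firefox's actual cache code; the class name and URLs are hypothetical, the 100 KB resource size is an assumption, and 1 MB is taken as 1,000,000 bytes to keep the numbers round.

```python
from collections import OrderedDict


class ByteLRUCache:
    """Byte-budgeted cache that evicts the least-recently-used entries first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # url -> size, least recently used first

    def get(self, url):
        """Return True on a hit and mark the entry as most recently used."""
        if url not in self.entries:
            return False
        self.entries.move_to_end(url)
        return True

    def put(self, url, size):
        """Store an entry, returning the URLs evicted to make room for it."""
        if size > self.capacity:
            return []  # larger than the whole cache: not stored
        if url in self.entries:
            self.used -= self.entries.pop(url)
        evicted = []
        while self.used + size > self.capacity:
            old_url, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_url)
        self.entries[url] = size
        self.used += size
        return evicted


MB = 1_000_000  # decimal megabytes keep the arithmetic round
cache = ByteLRUCache(50 * MB)

# Fill the cache with 500 small 100 KB resources (scripts, stylesheets, images).
for i in range(500):
    cache.put(f"http://example.com/static/{i}", 100_000)

# A single 10 MB video pushes out the 100 oldest small entries.
evicted = cache.put("http://example.com/video.flv", 10 * MB)
print(len(evicted), len(evicted) / 500)  # prints: 100 0.2
```

Nothing about the video's size or type is considered: it simply displaces whatever was touched longest ago.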

Even worse is if you use a Flash-based media player. The MP3s it downloads are cached just like anything else, so if you listen to 50 MB worth of music, your entire disk cache gets blown away.

LRU probably isn't the best technique to use here. I'm not sure how one would evaluate the various alternatives, though (what is a representative test set of browsing sessions?).


Anonymous said...

Well, maybe you could create a "large stuff" cache on the side. Anything that goes over, say, a meg or two gets moved into the large stuff cache.

There are times when, for example, I need to reload a page because I need to enable Javascript, or something crashes and FF needs to be restarted. It'd suck to have downloaded 50 MB worth of video, and then have to start all over again.

Just a random thought.
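A minimal Python sketch of this "large stuff" idea, assuming a fixed size threshold (2 MB here, in the spirit of "a meg or two") and separate byte budgets for the two pools. `SplitCache`, `LRUPool` and all the numbers are hypothetical, not anything Firefox ships, and the sketch assumes an object's size is known when it is stored.

```python
from collections import OrderedDict


class LRUPool:
    """A byte-budgeted pool that evicts its least-recently-used entries first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # url -> size, least recently used first

    def touch(self, url):
        """Return True on a hit and mark the entry as most recently used."""
        if url in self.entries:
            self.entries.move_to_end(url)
            return True
        return False

    def add(self, url, size):
        if size > self.capacity:
            return  # too big for this pool: not cached
        if url in self.entries:
            self.used -= self.entries.pop(url)
        while self.used + size > self.capacity:
            _, old_size = self.entries.popitem(last=False)
            self.used -= old_size
        self.entries[url] = size
        self.used += size


class SplitCache:
    """Objects at or above `threshold` bytes go to their own pool, so large
    media can only evict other large media, never small page resources."""

    def __init__(self, small_capacity, large_capacity, threshold):
        self.small = LRUPool(small_capacity)
        self.large = LRUPool(large_capacity)
        self.threshold = threshold

    def lookup(self, url):
        return self.small.touch(url) or self.large.touch(url)

    def store(self, url, size):
        pool = self.large if size >= self.threshold else self.small
        pool.add(url, size)


MB = 1_000_000
cache = SplitCache(small_capacity=40 * MB, large_capacity=10 * MB, threshold=2 * MB)
for i in range(400):
    cache.store(f"http://example.com/static/{i}", 100_000)  # 40 MB of page resources
for i in range(10):
    cache.store(f"http://example.com/video{i}.flv", 5 * MB)  # 50 MB of video
print(len(cache.small.entries), len(cache.large.entries))  # prints: 400 2
```

Ten videos churn through the large pool, but all 400 small resources survive.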

Ben Maurer said...

Yeah, having a large stuff cache (or, to the same effect, limiting how much of the cache large objects can use) was a solution I thought of.

When making eviction decisions I think frequency should be taken into account somehow.

What's really needed is a good framework for evaluating eviction policies.
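One shape such a framework could take is trace-driven simulation: replay a sequence of requests against each candidate policy and compare the hit ratio (fraction of requests served from cache) and byte hit ratio (fraction of bytes served from cache). Below is a minimal Python sketch. The trace is synthetic and made up purely for illustration (a site's 40 small resources revisited five times, with a different 3 MB video watched after each visit), the 5 MB cache size and 1 MB object cap are arbitrary assumptions, and real conclusions would need real browsing traces.

```python
from collections import OrderedDict


class LRU:
    """Byte-budgeted LRU; optionally refuses to cache objects over max_object."""

    def __init__(self, capacity, max_object=None):
        self.capacity = capacity
        self.max_object = max_object
        self.used = 0
        self.entries = OrderedDict()  # url -> size, least recently used first

    def access(self, url, size):
        """Return True on a hit; on a miss, cache the object if it is admitted."""
        if url in self.entries:
            self.entries.move_to_end(url)
            return True
        if size > self.capacity or (self.max_object is not None and size > self.max_object):
            return False
        while self.used + size > self.capacity:
            _, old_size = self.entries.popitem(last=False)
            self.used -= old_size
        self.entries[url] = size
        self.used += size
        return False


def simulate(policy, trace):
    """Replay (url, size) requests and return (hit ratio, byte hit ratio)."""
    hits = hit_bytes = total_bytes = 0
    for url, size in trace:
        total_bytes += size
        if policy.access(url, size):
            hits += 1
            hit_bytes += size
    return hits / len(trace), hit_bytes / total_bytes


MB = 1_000_000
trace = []
for visit in range(5):
    trace += [(f"static/{i}", 100_000) for i in range(40)]  # a site's JS/CSS/images
    trace.append((f"video/{visit}", 3 * MB))                 # a different video each time

for name, policy in [("plain LRU", LRU(5 * MB)),
                     ("LRU, no objects > 1 MB", LRU(5 * MB, max_object=1 * MB))]:
    hit_ratio, byte_hit_ratio = simulate(policy, trace)
    print(f"{name}: hit ratio {hit_ratio:.2f}, byte hit ratio {byte_hit_ratio:.2f}")
```

Even this toy trace reproduces the problem from the post: the small resources plus one video don't fit, so plain LRU evicts the small resources just before they are needed again and gets no hits at all, while refusing to cache objects over 1 MB keeps them resident. Any other policy (frequency-based, size-aware, type-aware) can be compared by giving it the same `access(url, size)` method.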

Anonymous said...

That explains why Firefox is faster if I run Squid on the same computer. Seriously.

Anonymous said...

YouTube doesn't stream; it uses progressive download.

quark said...

Yeah, Firefox should indeed have a new caching algorithm. The one in use now isn't nearly intelligent enough. I think a good strategy would be to weight content using a Bayesian algorithm, where certain properties make a file last longer in the cache while other properties make it last shorter.

Small files should be weighted higher than large files, for example. And certain file types (like JavaScript, CSS and images with a seemingly static URI) should be weighted higher than Flash, video and audio files.

Simply splitting the cache in two is, in my opinion, not enough. The two caches would still not be intelligent enough about what they preserve and what they purge, at least not when the default size of the "small" cache is 50 MB: my guess is that it gets purged rather often.
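A concrete, well-known way to make small files and favored types survive longer is GreedyDual-Size (Cao and Irani). Note that this is a swapped-in, deterministic technique, not the Bayesian scheme described above: each entry's priority is an inflation value L plus a per-type cost divided by its size, the lowest-priority entry is evicted, and L is raised to that priority so idle entries age out. The Python sketch below assumes a hypothetical cost table keyed by Content-Type; the weights are arbitrary.

```python
import heapq


class GreedyDualSize:
    """GreedyDual-Size eviction. Each entry gets priority H = L + cost / size,
    where L is an inflation value. The entry with the lowest H is evicted and
    L is raised to its H, so entries that are not re-accessed age out."""

    def __init__(self, capacity, cost_for_type, default_cost=1.0):
        self.capacity = capacity
        self.cost_for_type = cost_for_type  # content type -> cost weight
        self.default_cost = default_cost
        self.inflation = 0.0  # L
        self.used = 0
        self.entries = {}  # url -> (priority, size, content_type)
        self.heap = []     # (priority, url); may hold stale items

    def _set_priority(self, url, size, content_type):
        cost = self.cost_for_type.get(content_type, self.default_cost)
        priority = self.inflation + cost / size
        self.entries[url] = (priority, size, content_type)
        heapq.heappush(self.heap, (priority, url))

    def _evict_one(self):
        while True:
            priority, url = heapq.heappop(self.heap)
            entry = self.entries.get(url)
            if entry is None or entry[0] != priority:
                continue  # stale heap item left behind by a priority refresh
            del self.entries[url]
            self.used -= entry[1]
            self.inflation = priority
            return url

    def access(self, url, size, content_type):
        """Return True on a hit (refreshing the entry's priority). On a miss,
        evict the lowest-priority entries until the object fits, then store it."""
        if url in self.entries:
            _, old_size, old_type = self.entries[url]
            self._set_priority(url, old_size, old_type)
            return True
        if size > self.capacity:
            return False
        while self.used + size > self.capacity:
            self._evict_one()
        self._set_priority(url, size, content_type)
        self.used += size
        return False


KB = 1_000
costs = {"text/css": 4.0, "application/x-javascript": 4.0, "video/x-flv": 1.0}
cache = GreedyDualSize(1_000 * KB, costs)
cache.access("site.css", 20 * KB, "text/css")
cache.access("site.js", 30 * KB, "application/x-javascript")
cache.access("clip1.flv", 900 * KB, "video/x-flv")
cache.access("clip2.flv", 900 * KB, "video/x-flv")  # needs room: clip1.flv is evicted
print(sorted(cache.entries))  # prints: ['clip2.flv', 'site.css', 'site.js']
```

With plain LRU, the same four requests would also push out site.css and site.js, since they are older than clip1.flv.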

Anonymous said...

How about using the MIME types reported by the web server? AFAIK, Flash videos, downloaded MP3s and similar files can be distinguished that way from the small and actually useful ones (HTML, images, CSS).

All sane servers should report them correctly. If one doesn't, browsing still works, just with slightly degraded performance in a few minor situations.

I wouldn't use size alone, as some pages have huge images (which I might really want cached), sometimes even huge HTML, ...
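The MIME-type idea needs nothing but the Content-Type header. A minimal Python sketch, assuming three hypothetical classes that an eviction policy could then give different budgets or weights; the class names and type lists are illustrative, not exhaustive. Because it ignores size, a huge image still counts as page content.

```python
def cache_class(content_type):
    """Map a Content-Type header value to a coarse cache class.

    Parameters such as "; charset=utf-8" are dropped and matching is
    case-insensitive. A missing or unrecognized type falls back to "other",
    so a misconfigured server only costs some caching efficiency.
    """
    if not content_type:
        return "other"
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in ("text/css", "text/javascript", "application/javascript",
                "application/x-javascript"):
        return "page-critical"  # stylesheets and scripts
    if mime.startswith("image/") or mime in ("text/html", "application/xhtml+xml"):
        return "page-content"
    if mime.startswith(("video/", "audio/")):
        return "bulk-media"  # e.g. FLV video, MP3s fetched by a Flash player
    return "other"


for header in ["text/CSS; charset=UTF-8", "image/png", "video/x-flv", "audio/mpeg", None]:
    print(header, "->", cache_class(header))
```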

Anonymous said...

Do you think it's Firefox or the Flash plugin that puts it in the tmp dir?

From my point of view, it's the current implementation of the Flash plugin...

mariuz said...

On my system I set the cache to more than 1 GB, so I get fast Flash loading and caching for all the JS and images too.
Next would be to install a local proxy with more than 20 GB of cache.
My HDD bandwidth is greater than my ISP bandwidth.

Michael Wolf said...

I don't think your assumption about how long people are willing to wait for videos holds.

Dumb YouTube users (there's no lack of them -- read a few videos' comments if you can stomach it) don't realize the value of their time.

The smart ones open a video, hit play, hit pause immediately after, do something else in another tab or program entirely while the video loads, and come back to it later.

Ben Maurer said...

Michael, I don't mind the video being cached while it's being played. Maybe there should be a different accounting category for "items that are being watched right now". But if you have 10 YouTube videos open, that shouldn't completely blow away your cache.

Ben Maurer said...

I like the ideas about using the mime type.

On websites, JavaScript and CSS are loaded synchronously, while images are loaded asynchronously. In a sense, JavaScript and CSS are much more important to cache, because fetching them causes latency problems.

Further, HTML documents are much more likely to change than static resources.

Again, what we really need is a way to analyze various eviction policies. There are lots of good ideas, but in order to get a patch in (and to make improvements), it's important to measure the effect of each change.

Stuart said...

FWIW, image data is stored in a separate cache from the file cache.

Ben Maurer said...

Well, it seems like image data is stored both in an in-memory cache *and* in the on-disk cache. The in-memory cache is for putting the image on the screen, and stores the image uncompressed.
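To give a sense of scale for the uncompressed in-memory copy: a decoded bitmap costs roughly width × height × bytes per pixel, regardless of how small the compressed file was. This Python sketch assumes 4 bytes per pixel (32-bit pixels, e.g. RGBA), which is an illustrative assumption rather than Firefox's exact internal format; the function name is hypothetical.

```python
def decoded_image_bytes(width, height, bytes_per_pixel=4):
    """Approximate memory for a decoded (uncompressed) bitmap.

    bytes_per_pixel=4 assumes 32-bit pixels; this is an illustrative
    assumption, not Firefox's exact internal format.
    """
    return width * height * bytes_per_pixel


print(decoded_image_bytes(1024, 768))  # prints: 3145728 (i.e. 3 MiB)
```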

Anonymous said...

