Making Offline Sites/Applications

Last week Amazon launched a new web-based Kindle book service that got the geekiest parts of the web all abuzz. Why? Because the site can run in complete offline mode and act just like it was online (well, almost; more on that later). This might seem like a difficult thing to implement, but in reality it’s quite simple. They built their site using two of the new HTML5 features (hence it only working in WebKit browsers, although their restriction against Firefox is a little confusing): Local Storage and Offline Applications. These are completely independent features that can easily be used apart; in their case they just happen to be used together, and I did the same in my example. First let’s talk about the more time-consuming one to implement, Offline Applications.

Offline Applications
To get your application working offline you need to give it a local cache. This is done using a cache manifest, usually called something like cache.appcache (originally it was cache.manifest, hence the name, but it seems they settled on a name back in June). Before we can do that, however, we need to make sure the server will handle that new file correctly, so we need to add a little something to our .htaccess:

Snipplr: http://snipplr.com/view/57820/offline-sites-in-html5/

The first two lines help with debugging and are not strictly required (but while you’re working on it you’ll want them in there); the last one is required. That last one tells Apache to give any file with the extension .appcache the type text/cache-manifest. This is very important because the cache file must be sent with that type or the browser will not recognize or use it, leaving you without an offline copy. Those first two lines simply prevent the short-term caching that your browser will do. All browsers do it and it’s actually quite useful (without it the web would slow down, if it could stay up at all), but for testing purposes it should be turned off temporarily.

Then we have to tell the browser to look for the cache manifest. To do this we need to add a manifest attribute to our opening html tag. The result should look something like this when it’s all done:

Snipplr: http://snipplr.com/view/57820/offline-sites-in-html5/

Now that we’ve got that out of the way we need to actually create the manifest file. Here is the one from the example:

Snipplr: http://snipplr.com/view/57820/offline-sites-in-html5/

Starting at the top: the first line always needs to be the same and simply reiterates what the file is. The next line with anything on it is the network header. This tells the browser which items are whitelisted to grab from the web, in this case everything when possible. Firefox doesn’t support using just the * wildcard like the rest of the browsers do, so adding http:// (or https:// for a secure page) before it will make sure you’re not leaving them out.

Then there is a cache header. Below this is every file that you want to cache. Things to note about this section: you don’t technically have to add the file you’re placing this cache manifest on, since it will be included by default, and you can’t link to SSL items on other servers (except in Chrome, where they’re not following the guidelines). Unfortunately you can’t use wildcards either, so keep that in mind while creating your offline sites and applications. It’s a good idea to place every file you’re going to have viewable offline in the manifest; that way if a user comes in at an unexpected place they can still see the whole site when offline. If you have a really large site you might want to consider caching just the included files (CSS/scripts/images) and leaving the rest to get cached as it’s loaded. That way, while they don’t have your entire site when offline, they do have the pages they’ve been to. This will also prevent the initial load from being abnormally long and keep the cache size at a more reasonable level.

Another section not in my example is a fallback. It’s given a header just like the others, but it contains a page to load when the browser can’t find the requested page in the cache and can’t load it from the server. Something like “/ offline.html” would take users to a simple offline page when all else fails. Note the space between the “/” and the “offline.html”: the part before the space is the page it will replace if it can’t find it, and the part after is the page it will replace it with. In this specific case it will match anything under “/” (aka everything on your site) and, if it can’t find it, take you to offline.html instead.
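To make the layout of the file easier to picture, here is a minimal sketch that puts the three sections together as text. The helper name buildManifest, its option names and the file names are all made up for illustration; only the header names and the “/ offline.html” fallback line follow what is described above.

```javascript
// Hypothetical helper: returns the text of a cache manifest.
// buildManifest, its options and the file names are illustrative only.
function buildManifest(options) {
  // The first line is always the same.
  var lines = ['CACHE MANIFEST'];

  // Items the browser is allowed to fetch from the web.
  lines.push('', 'NETWORK:');
  lines = lines.concat(options.network);

  // Every file that should be available offline (no wildcards here).
  lines.push('', 'CACHE:');
  lines = lines.concat(options.cache);

  // Optional fallback entries: "<what to replace> <what to show instead>".
  if (options.fallback) {
    lines.push('', 'FALLBACK:');
    Object.keys(options.fallback).forEach(function (prefix) {
      lines.push(prefix + ' ' + options.fallback[prefix]);
    });
  }
  return lines.join('\n') + '\n';
}

var manifestText = buildManifest({
  network: ['http://*'],
  cache: ['css/style.css', 'js/script.js', 'page2.html'],
  fallback: { '/': 'offline.html' }
});
console.log(manifestText);
```

The output is plain text you could save as cache.appcache; writing the file by hand works just as well.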

Before we go into debugging this when you have a problem, I think it’s important to understand how it works. When you load the page the browser checks for the cache manifest; if it finds it, it’ll download everything listed in it, and the next time you load that page, or any page covered by it, it will use the cache instead. That is, of course, assuming it actually downloaded everything without a hitch; one wrong file and the whole thing will stop. The user won’t notice this while online, but the site won’t be available offline and will act just like every other site out there. Google Chrome is a great browser for this because its inspector console tells you every file it’s loading and the status, so you can easily see which one 404s on you and correct the problem. If all is well the browser will also tell you, and you’ll know you’re ready to go offline.

When the browser comes back to the page after it’s cached it will display the cache as mentioned above, but if you have an internet connection it will also check the cache manifest again. If it changed, the browser will re-download everything and (assuming no errors in the process) feed you the new cached version on the next page load. That means if you changed your stylesheet and updated the cache (more on this in a bit) you won’t see the changes until the next time you reload. This means double reloading to check every change, which is quite a pain. I found it’s better to just not use your cache file until you have everything ready; that way you’re not double reloading every time you make a change.
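The post doesn’t use it, but browsers that support offline applications also expose this update cycle to scripts through window.applicationCache. As a rough sketch, assuming that object is available, you can listen for its updateready event and swap in the new cache. The function name watchForUpdates and the callback are hypothetical.

```javascript
// appCache is expected to behave like window.applicationCache.
// watchForUpdates and onUpdate are hypothetical names for this sketch.
function watchForUpdates(appCache, onUpdate) {
  appCache.addEventListener('updateready', function () {
    // Fired once a changed manifest has been fully re-downloaded.
    if (appCache.status === appCache.UPDATEREADY) {
      appCache.swapCache(); // switch to the freshly downloaded cache
      onUpdate();
    }
  }, false);
}

// In a browser this might be wired up like so:
// watchForUpdates(window.applicationCache, function () {
//   window.location.reload(); // pick up the new files right away
// });
```

Reloading inside the callback is one way around the double reload; whether that is acceptable depends on what the user is doing on the page at the time.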

Something important to note about this whole process: the browser is only going to check the cache manifest, not the files within it, so if you do make a change to something like a stylesheet or JavaScript file the browser will never bother updating for you. There are a few solutions to this. One would be to change the cache manifest file name dynamically: add an unused GET variable to it, like the current timestamp, and delete that bit of code before you launch. The other is to update the cache manifest every time you make a change to a file listed in it. It can be any change; simply adding a comment with the current revision number will work. I still believe that whenever possible it’s easier to just not use the cache file at all until you’re as done as possible with the site. I do this by changing its filename so that when the browser goes looking for it, it 404s and reverts to using the code on the server. Hopefully by this time you’ve squashed all those layout and scripting bugs and have little to no changes needed. Debugging the offline application will be so much simpler that way.
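The revision-comment trick is easy to automate. Below is a small sketch that bumps a “# rev N” comment in the manifest text; the comment format and the function name bumpRevision are my own choices here, and any changed byte in the file would do the same job.

```javascript
// Hypothetical helper: increments a "# rev N" comment in manifest text,
// or adds "# rev 1" after the first line if there is none yet.
function bumpRevision(manifestText) {
  var pattern = /^# rev (\d+)$/m;
  var match = pattern.exec(manifestText);
  if (match) {
    var next = parseInt(match[1], 10) + 1;
    return manifestText.replace(pattern, '# rev ' + next);
  }
  var lines = manifestText.split('\n');
  lines.splice(1, 0, '# rev 1'); // right after the first line
  return lines.join('\n');
}

console.log(bumpRevision('CACHE MANIFEST\n# rev 3\nCACHE:\ncss/style.css\n'));
```

You would run something like this as part of whatever step you use to publish the site, then upload the changed manifest along with the changed files.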

There is one large but obvious limitation to this that I don’t feel I need to mention, but I’m going to: you have to have the information cached before it will work in offline mode. Amazon, for example, doesn’t cache the help sections of the site, so if you go there when offline you get the traditional “no network connection” error. Make sure to cache absolutely everything that you will need, and for good measure you’ll want a fallback if anything might be used that isn’t cached (I left it out in my example because there are only 3 pages in it, and no other links).

I have an example of this on my Local Storage / Offline Applications Example that you can see in action. I suggest loading the page, killing your internet, then checking the links at the bottom of the page. You should be able to load those up without ever having visited them and without a connection. The first page of the example is for local storage, which I’ll go over next.

Local Storage
Local storage is the second part of making an offline application. Having no server connection means that you can’t save the data normally, so you have to put it somewhere. Cookies work but are limited to roughly 4KB each, and you’re probably already using a few for authentication and such. Local storage, however, is a bit more flexible, only limiting you to about 5MB. Local storage uses JavaScript for its interactions, and as such I’ve used jQuery in my examples to save time and energy.

My local storage example comes in three parts: saving the data, retrieving the data, and deleting the data. I think the logical starting point is saving the data:

Yup, that’s it. localStorage is the reserved name, and you can treat it like any other object and easily use the square bracket notation. In this case I’m storing the data from my textarea under the localStorage key called “localstorageexample”. Couldn’t be simpler. That information will stay there until the user clears it from their browser or until you do. One thing to note: the user may get a message asking for permission to store the information first, but that’s relatively minor and not something you have control over anyway.
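For reference, the save step described above can be sketched as follows. This is an approximation rather than the exact snippet from the example page: the function name saveText and the selectors in the commented jQuery usage are assumptions, and the storage object is passed in so the function can be tried outside a browser.

```javascript
// storage stands in for window.localStorage.
// The key name comes from the article; saveText is a hypothetical name.
function saveText(storage, text) {
  // Square bracket notation, just like any other object.
  storage['localstorageexample'] = text;
}

// In the browser with jQuery (selectors are assumptions):
// $('#save').click(function () {
//   saveText(window.localStorage, $('textarea').val());
// });
```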

The next section is retrieving the data, and as you may have guessed it’s just as simple as saving it.

Snipplr: http://snipplr.com/view/57821/local-storage-in-html5/

In this case I’m doing the opposite, taking the data from the “localstorageexample” key and placing it into the textarea. When they load the page it’s like nothing ever happened. Finally, we should know how to clear it. You can easily just set it to '', but if you truly want it gone you have to use the removeItem() function. Again, simple.
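Retrieving and clearing can be sketched the same way. Again this is an illustration, not the exact code from the example page: loadText and clearText are hypothetical names, and getItem() and removeItem() are the standard methods on the storage object.

```javascript
// storage stands in for window.localStorage.
function loadText(storage) {
  var saved = storage.getItem('localstorageexample');
  // getItem returns null when nothing has been saved yet.
  return saved === null ? '' : saved;
}

function clearText(storage) {
  // Removes the key entirely instead of leaving an empty string behind.
  storage.removeItem('localstorageexample');
}

// In the browser with jQuery (selectors are assumptions):
// $('textarea').val(loadText(window.localStorage));
// $('#clear').click(function () { clearText(window.localStorage); });
```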

Snipplr: http://snipplr.com/view/57821/local-storage-in-html5/

That’s all there is to it. I have a Local Storage example running on the Local Storage / Offline Applications Example page; they both play nice (as you’d expect, since they are at their core very different) and you can play with it. If you’re familiar with JavaScript then using local storage should be nothing different for you; in fact it’s quite simple. For a full-fledged application you’re going to want to do a little planning. I think the best route is to take the data you get and store it in localStorage until you get an internet connection again, and then use Ajax to store it on the server. That is, of course, if you have a server-side part to your application; if not, just let it sit on the local user’s computer until they decide to get rid of it.
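Here is one way that store-now, send-later idea could look. It is only a sketch under a few assumptions: the key name “pendingsaves”, the function names and the /save URL in the commented usage are all made up, and a real application would want more careful error handling.

```javascript
var QUEUE_KEY = 'pendingsaves'; // hypothetical key name

function readQueue(storage) {
  return JSON.parse(storage.getItem(QUEUE_KEY) || '[]');
}

// Remember a piece of data until it can be sent to the server.
function queueSave(storage, data) {
  var pending = readQueue(storage);
  pending.push(data);
  storage.setItem(QUEUE_KEY, JSON.stringify(pending));
}

// Send queued items one at a time. send(item, done) should call
// done(true) on success and done(false) on failure.
function flushQueue(storage, send) {
  var pending = readQueue(storage);
  if (pending.length === 0) { return; }
  send(pending[0], function (ok) {
    if (!ok) { return; } // keep the queue as it is and try again later
    storage.setItem(QUEUE_KEY, JSON.stringify(readQueue(storage).slice(1)));
    flushQueue(storage, send); // move on to the next item
  });
}

// In the browser with jQuery ('/save' is a made-up URL):
// window.addEventListener('online', function () {
//   flushQueue(window.localStorage, function (item, done) {
//     $.post('/save', { text: item })
//       .done(function () { done(true); })
//       .fail(function () { done(false); });
//   });
// });
```

An item is only dropped from the queue after the server has accepted it, so a connection that dies halfway through doesn’t lose anything.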

This is all still relatively unused tech in the HTML5 spec. Amazon did an amazing job implementing it so well, but it’s not outside the reach of any do-it-yourself programmer. I have a few ideas for larger, more complete sites to use this, but they’re all back-burner stuff; I’m currently too busy with other projects to spend any real time on my own. If you have any questions about this let me know with a comment below, and I’ll help you out as best I can. Also, if you use this information to create something great I’d love to see it, so post a comment about it.
