Wednesday, January 11, 2012

Simplicity and JavaScript modules

All of us are looking for simplicity, but there are different levels to simplification. This is a story of what could be considered simple for modules in JavaScript. This post was prompted by the removal of the optional AMD define() call in underscore.

For a post on simplicity, it is a bit long, but I'm not a great writer, and I find I normally edit myself so much as to lose interest in posting, so then I end up not communicating. Better to start communicating even if imperfect.

I want to lay out why AMD modules are the simplest overall module solution for JavaScript at the moment, and where other approaches are not as simple as they may appear.

Script Tags

JavaScript does not have syntax for modules, but most programming languages do. "import", "require", "include" seem like popular choices.

JavaScript on the web has meant using script tags and manually ordering those script tags so that dependencies on global objects are worked out correctly. It is also important that those dependencies execute in order.

That sucks for the following reasons:
  • You use a separate language, HTML, to specify your JS dependencies and their order.
  • A script's dependencies can be unclear.
  • If you want to later load some code on demand, it is very difficult to work it out so that the scripts execute in order. This has gotten better over time -- newer browsers that support the async="false" attribute help with this. But older browsers still suck.
So what happens?
  • You constrain yourself to only creating simpler applications that can get by with concatenating all the scripts in a certain order, and depend on server tools to help you write out your HTML with the script tags and then rewrite the page after a build to not have those script tags. Think Ruby on Rails or Django.
  • Toolkits start building APIs to do this work for you. Dojo and YUI were some of the very first to do so. The Closure library does too, although it was more of a cousin to Dojo.
Option 1 is clearly not appropriate for the full spectrum of web development. There are HTML game engines, webmail/office suites, and then browser-based "no server" development like Phonegap-backed mobile apps.

So it is best to have some API or syntax in JavaScript to get units of code. For it to work well, we should all be using the same API. Otherwise things get messy real fast.
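
To make the ordering problem concrete, here is a minimal sketch that simulates script tags as plain functions run against a shared global object. The names libScript, appScript and runInOrder are hypothetical and only serve the illustration; real script tags behave the same way with respect to implicit global dependencies.

```javascript
// "Script" that defines a library on the shared global object.
function libScript(root) {
    root.lib = {
        greet: function (name) { return 'hello ' + name; }
    };
}

// "Script" that silently depends on root.lib already being defined.
// Nothing in this file says so; the dependency is implicit.
function appScript(root) {
    root.app = { message: root.lib.greet('world') };
}

// Runs the scripts in the given order against a fresh global object,
// the way sequential script tags would.
function runInOrder(scripts) {
    const root = {};
    scripts.forEach(function (script) { script(root); });
    return root;
}

// Correct order works.
const ok = runInOrder([libScript, appScript]);

// Swapped order fails at runtime because root.lib is not there yet.
let orderError = null;
try {
    runInOrder([appScript, libScript]);
} catch (e) {
    orderError = e;
}
```

The failure only shows up when the code runs, and the fix lives in whatever lists the scripts (the HTML), not in the script that has the dependency.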

CommonJS

The CommonJS group tried to work out a system for doing this. It is pretty simple too:
  • require('some/id') to reference a bit of code. Since a string literal is used, it allows for easy static analysis of dependencies -- no more manually ordering script tags! The ID also has a convention of mapping to a partial path. So you can get by without having to specify full URLs, and it opens up ways to map an ID to a path that does not fit the ID-to-path convention. This is really helpful most immediately for mock testing, but it has many other uses.
  • Modules do not give themselves a name, they are anonymous.  This is great -- if you load require('jquery') then you do not have to have special knowledge to access some global with different spelling, like jQuery. Same for 'backbone.js' loading an object called Backbone. The end user is in control of the name.
  • exports is handy for circular dependencies. Yes you should avoid circular dependencies. If you have one, there is a good chance you are doing it wrong. But there are valid circular dependencies, and a script referencing system needs to support them.
  • module is important because modules are not named. Sometimes though you need to know the name they ended up with, and sometimes from what path, because you may have some non-JS assets you need to reference. module.id and module.uri give you that ability.
There were some wrinkles though:
  • They did not formalize a way to export a module value that was not just a plain object with properties. It is extremely common and useful to want to export a function as the module value. jQuery is probably the most known example. Constructor functions are another set of useful export values.
  • It was not so simple to get it to work in the browser.

AMD

The original participants on the CommonJS list thought it was better to blue-sky the development of a module syntax, not be held back by what might work in the browser, just what might work with existing JS syntax. The group also started off as ServerJS, so they were also in the mindset of what would work best on the server, where file I/O is cheaper/easier.

The hope was that if they worked out a module system that worked well with the existing JS syntax but had problems in the browser, hopefully they could convince browser makers to plug the holes to make it possible. In the meantime, since they were developing servers that could run JS -- they could just bundle up/transform the JS using server tools, or offer a compile step, while browsers caught up.

However, for those of us who came from Dojo, requiring a server tool or compile step to just develop in JS was a complication. I'm going to mangle Alex Russell's quote on this, but "the web already has a compile step. It's called Reload".

Why force the use of a tool to just start developing? It should be simple: just a browser and a text editor.

This was accomplished by taking the CommonJS format and allowing a function wrapper around the code:

define(function (require){
    var dep = require('dep');
    return value;
});

Or, the shorter form:

define(['dep'], function (dep) {
    return value;
});

Since a function wrapper was used, it meant:
  • the scripts could load in any order they want, easy to parallelize even on old browsers.
  • "return" could be used to return the module value, even functions.
  • No special tooling was needed to convert source to an acceptable browser format.
  • It worked in browsers, today, and would in the future.
So the story around modules gets very simple, no special tooling to start, no worrying about "I need to run a converter to share this module", no special sets of instructions for your server of choice to do conversions.
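
To show why the function wrapper removes the ordering constraint, here is a toy loader sketch. It is an assumption-laden simplification: amdDefine and amdRequire are hypothetical names, modules are named explicitly (real AMD loaders usually infer the ID from the file), nothing is fetched over the network, and circular dependencies are not handled.

```javascript
const definitions = {}; // id -> { deps, factory }
const instances = {};   // id -> module value, created on first use

// Registering a module only stores its factory. Nothing runs yet,
// so definitions can arrive in any order.
function amdDefine(id, deps, factory) {
    definitions[id] = { deps: deps, factory: factory };
}

// Resolves dependencies first, then calls the factory with them.
function amdRequire(id) {
    if (!(id in instances)) {
        const def = definitions[id];
        if (!def) {
            throw new Error('Module not defined: ' + id);
        }
        const args = def.deps.map(function (depId) {
            return amdRequire(depId);
        });
        // Whatever the factory returns is the module value,
        // including a plain function.
        instances[id] = def.factory.apply(null, args);
    }
    return instances[id];
}

// 'app' is registered BEFORE the module it depends on.
amdDefine('app', ['greeter'], function (greeter) {
    return { run: function () { return greeter('AMD'); } };
});
amdDefine('greeter', [], function () {
    return function (name) { return 'hello ' + name; };
});

const app = amdRequire('app');
```

Since the factory body is deferred, the loader is free to fetch the files in parallel and execute them once everything is available.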

Yes, a loader library is needed, but one is needed in any case, there is no native JS syntax. That is the basic ante for any module system that scales up beyond a simple JS concatenation for a Rails application.

For some people, a function wrapper with a level of indent was not seen as simple enough. However, designing a system without it meant a bunch of complexity once you stopped looking at an individual file. A miscalculation of the overall complexity cost.

Loader Plugins

Another simplicity in AMD systems: loader plugins. Loader plugins would not be considered if you had copious, synchronous IO capabilities. However, by fully embracing the network/async loading on the web, you start to see how creating a loader plugin that treats a dependency as a simple string is useful.

Some dependencies are not static scripts, but could have more complex loading (Google Maps code) or simple HTML templates that need to be loaded for the module to be useful.

In AMD, this can look like so:

define(function (require) {
    var maps = require('googleapi!maps?sensor=false'),
        template = require('text!form.html');

    return function (data) {
        //You can synchronously return a value
        //based on the template.
        return template.replace(...);
    };
});

Compare that with a non-plugin approach. Do you want to load those two resources in parallel? That would be ideal, but then to do that, you probably need something like a promise library to help out. And now you have two problems. I mean that partly in jest -- I use a promise library for some types of code -- but it is definitely a complexity hurdle to jump.

The async networking also means your module is not completely ready until those resources are available, so your public module API now must be a callback API. For "simplicity" assume you just load the dependencies serially, to avoid some of the promise-isms.


define(function (require) {
    var googleapi = require('googleapi'),
        text = require('text'),
        template;

    googleapi.fetch('maps?sensor=false', function (map) {
        text.fetch('form.html', function (formText) {
            //dependencies are now loaded.
            template = formText;
        });
    });

    return function (data, callback) {
        waitForDependenciesToLoad(function () {
            callback(template.replace(...));
        });
    };
});

Loader plugins simplify async development, and some of their resources, like the text templates, can be inlined in a build file with other JS modules. Loader plugins give you simplicity and speed.
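
The idea of a plugin resolving a 'plugin!resource' string can be sketched as below. This is a toy under stated assumptions, not the RequireJS plugin API: the plugins table, the two-argument load(name, onload) signature, the fakeFiles map and loadWithPlugins are all hypothetical, and the fake plugins call back synchronously so the example stays short.

```javascript
// Stand-in for the network: resource name -> content.
const fakeFiles = { 'form.html': '<form>{{title}}</form>' };

// Hypothetical plugins. Each one knows how to turn a resource name
// into a value and hands that value to onload when it is ready.
const plugins = {
    text: {
        load: function (name, onload) { onload(fakeFiles[name]); }
    },
    upper: {
        load: function (name, onload) { onload(name.toUpperCase()); }
    }
};

// Starts loading every 'plugin!resource' ID, and runs the factory only
// after all of them have reported back. The loads are independent, so a
// real loader could run them in parallel.
function loadWithPlugins(ids, factory, done) {
    const values = new Array(ids.length);
    let remaining = ids.length;
    ids.forEach(function (id, index) {
        const bang = id.indexOf('!');
        const pluginName = id.slice(0, bang);
        const resourceName = id.slice(bang + 1);
        plugins[pluginName].load(resourceName, function (value) {
            values[index] = value;
            remaining -= 1;
            if (remaining === 0) {
                done(factory.apply(null, values));
            }
        });
    });
}

let render = null;
loadWithPlugins(['text!form.html', 'upper!maps'], function (template, maps) {
    // By the time this runs, both resources are plain values, so the
    // module can expose a synchronous API.
    return function (data) {
        return template.replace('{{title}}', data.title) + ' ' + maps;
    };
}, function (moduleValue) {
    render = moduleValue;
});
```

The waiting happens once, inside the loader, instead of leaking into every function the module exports.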

If you like transpiling other languages into JS, they are great for that purpose too.

ECMAScript

The ECMAScript group wants to get modules in for the harmony effort, the next ECMAScript release. They chose to not go with the CommonJS syntax, but what they did choose looks fairly similar on the surface. It favors the introduction of new syntax to get some advantages with compile time checks. You can check out the following for more information:

Those links have changed over time, so the rest of this feedback may not be valid in the future. As I read it today, I do not believe harmony modules give much benefit over what AMD can do now, but harmony modules do introduce more complexity, and have some unanswered questions:
  • It uses module Foo {} syntax for declaring an inline module, where the module ID, Foo, is a JS identifier. However, dependencies can be string names/URLs. How do you optimize this code for delivery in a web browser? In AMD, string names are used both for references and for the module names. This is better because it allows for loader plugin IDs that can have their output inlined in build output, and it ties module IDs more directly to the string names used in dependency names.
  • It does not support a loader plugin API out of the box. You can construct one by using the module_loaders API, but at that point you need to ship a .js file for the "loader", so now the developer has the same complexity cost as AMD today.
  • New syntax means the module support cannot be shimmed in older browsers via a runtime library. It requires a compile step/server transform to work in older browsers. This is one of the same problems the CommonJS format had.
  • It is unclear how a JS library that works in browsers without ES module support "opts in" to register as an ES module since ES modules use new syntax. Maybe the suggestion is to use the module_loader's loader.createModule syntax? I cannot see how that fits with the compile-time export checking though. If a runtime capability check cannot be used to opt in to ES module registration, it makes it very hard to upgrade web libraries to ES module syntax. We're back to the problems that made it hard to use CommonJS syntax on the web.
The new syntax in harmony modules is used to enable some compile time checking of export names and if they are referenced correctly.

However, compile time checking to see if an export name is used correctly is a very small benefit for the end developer given the other costs above. Furthermore, I want more than just an export name/type check. I want intellisense on the arguments that can be passed to functions, general data type information and comments on usage.

Working out a comment-based system that can reflect this info into text editors will provide much more value. Since it is comment based it fits in with old browsers. I know it is harder to agree on that comment syntax (mostly because it is easy to bikeshed), but it will simplify developers' lives more, and lead to faster turn-around time on development.
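
For a sense of what comment-based type information can look like, here is a small example using JSDoc-style tags. The post does not name a specific comment syntax, so treating JSDoc as the convention is an assumption made for illustration; the greet function is hypothetical.

```javascript
/**
 * Formats a greeting for a user.
 *
 * @param {string} name - Who to greet.
 * @param {Object} [options] - Optional settings.
 * @param {boolean} [options.shout=false] - Upper-case the result.
 * @returns {string} The formatted greeting.
 */
function greet(name, options) {
    const text = 'hello ' + name;
    return options && options.shout ? text.toUpperCase() : text;
}

const plain = greet('amd');
const loud = greet('amd', { shout: true });
```

The comment carries argument names, types and usage notes that an editor can surface, and since it is only a comment, it runs unchanged in any browser and is removed by a minifier.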

If something like loader plugins is not natively supported, and the optimization story is not sorted out, it does not have an advantage over AMD today, and AMD is simpler.

However, the harmony module_loaders API is really useful. I'm not sure it needs to be as big as it currently is. I would be fine with something like Node's vm module API as a start. Basically, some container or vat I can load scripts into without interfering with other scripts. This is one area that is very hard to do on the web today.
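
For reference, this is roughly what the "container" idea looks like with Node's vm module. It is a minimal sketch that runs only in Node; the sandbox objects and the script string are made up for the example.

```javascript
const vm = require('vm');

// Each sandbox object acts as the global object for code run against it.
const sandboxA = { counter: 1 };
const sandboxB = { counter: 100 };

// The same script text is run twice. Its top-level var lands on the
// sandbox it ran in, not on this file's globals.
const script = 'counter += 1; var leaked = "only in this sandbox"; counter;';

const resultA = vm.runInNewContext(script, sandboxA);
const resultB = vm.runInNewContext(script, sandboxB);

// The outer scope never sees the script's variable.
const leakedOutside = typeof leaked !== 'undefined';
```

Each run sees its own counter and leaves the other sandbox, and the surrounding program, untouched.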

So for me, the "simple" solution for any ES6 module-related work:
  • Make it a module API, not new syntax. Then we can shim it more easily for older browsers, and existing libraries can capability check it and opt in. If you need a suggested API, I hear AMD is quite nice. It has a few implementations and has been used in the real world.
  • Support loader plugins natively. It really helps with async programming, and optimizations. If I need to ship a library to deliver the benefits that loader plugins give for async programming, then there is very little motivation for developers to switch away from AMD.
  • Provide something like Node's VM API, maybe with a little bit of the intrinsics stuff in the current module_loaders API.
  • Do not bother with compile time checking of export names. Instead put effort into a comment based system that can reflect more information for use in editors. That will save developers more time. Since it is comment based, it will work in older browsers and optimizes out cleanly.
I will post this feedback to the es-discuss group. I wanted to hone my feedback before giving it to the es-discuss group, but this will have to do, otherwise I may never give it.

Underscore and AMD

This brings us to the recent code change to remove the AMD block from underscore. The arguments for removing it are listed here, but I think the main argument is really about what is simple for Jeremy: He does not need it for the kinds of sites he builds, and by adding it, he has gotten reports of problems.

Those problem reports go away if he also exports underscore as a global, but he does not think he should have to do that if AMD is available. I do not think that is a fair bar to hold for AMD, since he would have to do the same for a harmony syntax, so his lib could be used in cases where ES loading and old style global loading is still in play.
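
The "register with AMD and also export a global" approach can be sketched like this. It is not Underscore's actual code; the library name tinyLib and its double function are hypothetical. The typeof define === 'function' && define.amd check is the usual way a script detects an AMD loader.

```javascript
(function (root, factory) {
    const lib = factory();

    // Register as an anonymous AMD module when an AMD loader is present...
    if (typeof define === 'function' && define.amd) {
        define(function () { return lib; });
    }

    // ...and still export the global, so scripts that have not been
    // converted to AMD keep working during the transition.
    root.tinyLib = lib;
}(typeof globalThis !== 'undefined' ? globalThis : this, function () {
    return {
        double: function (n) { return n * 2; }
    };
}));
```

Code that expects the global finds it either way, and AMD-aware code can ask the loader for the module instead.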

Some feedback on his specific reasons:

Folks requesting other module formats for other loaders.

No other module format comes close to the level of support AMD has: AMD has multiple implementations, better support in other libraries (Dojo, jQuery, libraries friendly to Ender-bundling, MooTools), it is used in real sites, and has a thriving amd-implement list and thriving implementations. What else can claim all of those things? What else is being asked for inclusion?

I can appreciate it is easy for someone to come up with a loader syntax and want people to use it. I think AMD has gone the extra steps though to be considered more of a standard. But it depends on what you want to use as a standard of legitimacy.

If any library that depends on Underscore (and there are many of them: http://search.npmjs.org/) does not yet 'support' AMD, but Underscore does, things get royally screwed up.

npm is for node modules, not browser modules, so they would not have the error that was being reported for Underscore.

That said, I do believe Underscore is a common dependency for browser-based code. But for the browser, as mentioned, registering a global is perfectly acceptable for this transition period, something I believe will need to happen anyway no matter what modular format is chosen. There will always be a transition period.

In an ideal situation, libraries do not have to be modified to support a particular script loader (or group of loaders).

The point is to make developers' lives simpler. A script loader like that does nothing to help order the dependency tree correctly, or use a naming convention that can be parsed easily by tools like optimizers.

The browser globals with implicit dependencies just do not scale well past maybe 15 dependencies. Not everything fits as a "let's concat all the scripts" Rails app. But I'm open to seeing a design that might scale up.

JavaScript's upcoming native module support is entirely incompatible with AMD.

They are very compatible as far as base semantics. As mentioned above, ES harmony modules are still baking, not ready for prime time. But even with that, it would be very easy to convert AMD code to harmony module code -- since the dependencies are all string names it fits in. Loader plugins would be a problem for sure, but basic modules are easy.

It is much more compatible than the current "browser globals and implicit dependencies" approach.

Loading individual modules piecemeal is a terrifically inefficient way to built a website. Because of this, there's the great RequireJS optimizer, which will turn your modules into ordinary packages.

The browser globals approach already depends on build tools, so I'm not sure why this is a knock against using AMD. A bunch of manually typed HTML script tags perform just as poorly.

Fortunately, since dependencies can be easily statically determined with AMD calls, the developer no longer has to worry about manually figuring out the load order, and there can be (and are!) many different build tools built on top of the standardized AMD API.
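
Here is a rough sketch of how a build tool might statically work out load order from AMD dependency arrays. It makes loud simplifications: the module sources are inline strings, the IDs are hypothetical, and a regular expression stands in for the real JavaScript parser an actual optimizer would use.

```javascript
// Hypothetical module sources keyed by module ID.
const moduleSources = {
    'app': "define(['view', 'model'], function (view, model) { return {}; });",
    'view': "define(['model'], function (model) { return {}; });",
    'model': "define([], function () { return {}; });"
};

// Pulls the string literals out of the define([...]) dependency array.
// No code is executed; the source text alone is enough.
function scanDependencies(source) {
    const match = /define\s*\(\s*\[([^\]]*)\]/.exec(source);
    if (!match) {
        return [];
    }
    const ids = [];
    const literal = /['"]([^'"]+)['"]/g;
    let found;
    while ((found = literal.exec(match[1])) !== null) {
        ids.push(found[1]);
    }
    return ids;
}

// Depth-first walk that lists each dependency before the modules
// that need it, which is a valid concatenation order for a build.
function buildOrder(entryId, sources) {
    const order = [];
    const seen = {};
    function visit(id) {
        if (seen[id]) {
            return;
        }
        seen[id] = true;
        scanDependencies(sources[id]).forEach(function (depId) {
            visit(depId);
        });
        order.push(id);
    }
    visit(entryId);
    return order;
}

const order = buildOrder('app', moduleSources);
```

Nobody had to write the order down by hand; it falls out of the string IDs each module already declares.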

Summary

In the end, though, I just think it boils down to Jeremy not needing this personally based on the scope of his work, and he has other things he would rather work on. It is hard to get a standard of legitimacy in this area, so it is easier to just wait it out. For the particular issue in Underscore, the simple export of a global even in the AMD case would have solved the issue, but such is life.

I'm a bit sad because Backbone and RequireJS have been a very popular combination. They fit very well together. The thought of maintaining a fork/branch is distasteful, particularly since the AMD patch for Backbone was smaller than the code it replaced.

Auto-wrapping tools are difficult to do generically given how scripts want to grab globals. The dependency name can have weirder names that do not match the file name, so it loses the nice, simple dependency parsing of AMD module IDs. It means creating a centralized list of global names that map to IDs/paths. Not very webby/distributed. Not very simple.

Oh well. It probably means AMD just needs a bit more time out in the wild, even more adoption, and we'll see how people feel in 6 months.

Another Approach?

I have heard complaints from folks on the internet about AMD from time to time, but they have not offered anything better, particularly given the simplicity tradeoffs mentioned above. I think it is just an NIH thing most of the time, or getting it mixed up with a generic browser script loader.

Here is a survey of things I know about:

Ender is fine if you want to just build a file that you will not change often and use it in place of jQuery, but it really does not help with the larger site structure and loading issues, being able to dynamically load. It is not a general module system for the web. It basically is just like the CommonJS "do a build before starting development" approach. Same as SproutCore/Ember. Same complexity problems as mentioned above, and complicates individual module debugging.

Dan Webb started something with loadrunner. It supports part of AMD, and the "native format" it supports does not seem better than AMD, maybe just another function nesting and different API names.

Ext has something similar to AMD, but mixes in a particular class declaration syntax into the format, so that will not fly as a general solution.

YUI is close, but obscuring what a dependency name loads on a Y instance makes it hard to associate what came from what module, and the API to add a module relies on naming the module and putting named fields on the Y instance. This will lead to name collisions.

What else am I missing?

One thing these approaches all have in common: they get away from the browser globals and implicit dependency approach used by traditional browser scripts. If you think something using that traditional pattern is the future, please describe how it might work. Remember, requiring a build step to just start developing is a complication. That does not scale well across all the types of JS development mentioned above.

While Jeremy and I were talking in the documentcloud room, he mentioned that Ryan Dahl, if he could do it over again, would prefer not to use the CommonJS module system for Node, but do something closer to how browsers load scripts in script tags.

I think I have heard that comment too, but in the context I heard it, it seemed like an off-hand comment. I'm not sure how much that was just about having to wade through CommonJS discussions or actual problems with the module API. I would love to hear more about it though. Ryan or someone who has more info on this, if you happen to see this post, please clue me in. I'm on github, the amd-implement list, or on gmail as jrburke.

I want to get to a workable, simple solution that works well in web browsers too. I only do AMD because I need it and I think it is the simplest overall path for end developers. But I want to solve other higher level problems. I want something good to win. It does not have to be AMD, but I do think it hits the right simplicity goals particularly given browser use. It will not be brain-dead simple because upgrading the web is not simple. But it does a pretty good job considering.

I feel like the folks doing AMD have done their due diligence on the matter. ES harmony may be able to do something a bit different, but the basic mechanisms will be the same: get a handle on a piece of code via a string ID and export a value. The rest starts to look like arguing paint colors.

Anyway, enough of that. Back to making things to make things better.

12 comments:

Collin said...

Amen!

I personally prefer module = require("module")

But this is the biggest issue I personally see with JS today.

Part of why I'm on board is for the "run it in the client and the browser." Which is QUITE a tall order today if you want to keep your sanity.

Too many libraries are wrapped in so much code attempting to figure out if this or that needs to go here or there.

I've found temporary happiness with packing everything together serverside ala stitch. With a combination of setting script tag innerHTML and replaceState you can convince some browsers to give you individual files for debugging. And if your module/module.exports wrapper doesn't introduce new lines your line numbers match up just fine.

Bart said...

I admire the effort and energy pumped into the issue, but this aspect is haunting me:

I dislike the fact that in CommonJS the module names/paths in require() have to be string literals. This is so some loader script can scan/parse and try to extract them and preload/bundle the resources so it can fake synchronous require()'s.

Why? It's like attaching magic behaviour to something as elementary as calling a function with an argument: we should be able to use variables for the actual value (it's a plain method call ffs!), or use it with Function.apply(), or pass references to it or whatever funky stuff you do with functions.

A javascript function should behave like one, and be properly evaluated (executed) so its actual runtime behaviour is what counts, not some arbitrary, 'slapped-on', co-parsing syntax rule like requiring literal strings.

Anonymous said...

"npm is for node modules, not browser modules, so they would not have the error that was being reported for Underscore."

I believed this as well and thought @isaacs would back me up on it when I commented on Dustin Diaz's ender page. Unfortunately @isaacs was in full support of NPM being used for browser modules.

You can see the full discussion here: http://dustindiaz.com/ender-cli

Anonymous said...

James - sincere thanks for pressing the issue. Your efforts are very much worth it and appreciated. While transitioning into the world of AMD has had its bumps (mostly due to situations as described above), I'd never imagine going back to the pre-module dev experience. It's a real shame we can't gain some kind of consensus on modules in the JS community... but it does feel like it's only a matter of time.

James Burke said...

Collin: glad to hear stitch is working well for you. Server side tools are great, I just do not think they should be mandatory to start development.

Bart: it is not just about dependency scanning, it allows another awesome feature: you can swap out the implementation for a module with just a loader config or file name change. You do not need to mess with the source of either the consuming module or the dependency. Real world example:

If backbone uses AMD to load dependencies, it could just require('jquery'). However, if I want to use Zepto or some other jQuery-like lib, and it called define() as an anonymous module, then I can just name zepto.js to jquery.js (or map jquery: 'zepto' in loader config) and I am all set.

Backbone does not need to then be notified of every jQuery-like lib and do the awkward $ = root.jQuery || root.Zepto || root.Ender thing it does now.

It also makes it possible to build code that does not dump globals in the page.

tbranyen: that is good to know. I still do not think the list of npm usages corresponds to the number of browser libs that use underscore. I definitely agree though that there are libs that do depend on it. Exporting a global in the amd branch fixes the problem for those libs.

Anonymous said...

I wrote a tool for building node modules into one source file that can be run on the browser or any non-CommonJS environment.

https://github.com/rolandpoulter/node_modulator

James Burke said...

Anonymous/Roland: it would be good to put in your README how your project differs from browserify, which I have also seen used to bundle up node modules.

If you need a wrapping format for the modules, you might look at using AMD as that wrapping format instead of inventing your own.

The almond AMD shim has been used to wrap up node modules for delivery to the browser.

Andi said...

AMD should be supported, at least by basic libraries, so that less advanced users can just start to use a loader. Some thoughts about my own solution and problems that I see in the context of platform independent JavaScript:

Solution

I am sure that a great solution (that works both in node.js and the browser) is already existing. Write your node modules as you are used to, wrap them with AMD like that:

define('moduleId', [ 'require', 'exports', 'module' ], function (require, exports, module) {
// Here lives the original code
})

RequireJS is analyzing all internal require() calls and is resolving dependencies properly.

Problems

1) What is missing is a server+client API to load modules with variable ids. RequireJS supports require, but the syntax is not the same as in node. This is why I add a new function on the server AND the client: require.async, with the same syntax as RequireJS' require and a separate implementation on the server-side.

2) I think this is very important: DRY
This leads to another problem: Tag parts of your code as server/client only. I will present my solution soon. It seems to be overly complicated to tag some parts of your code server or client only. But I am using this solution in a current project - with great success. This opens the way to true platform independency = code focussed on pragmatic problem solving with flexible platform contexts.

James Burke said...

Andi: You could also use amdefine to enable define() to work in Node, and the RequireJS optimizer will remove that if() block for the amdefine shim, so there is not an impact on the browser.

AMD supports anonymous modules and a simplified CommonJS wrapping, so I think it is possible to support the code in multiple environments with reasonably compact code.

The other option is to npm install requirejs and use requirejs as a node module to bootstrap into loading AMD modules.

numan said...

I don't really understand why your patch was not accepted.

If it exports to globals and registers as an AMD module, then what is the main reason why the patch was not accepted?

Eric said...

I came on this post after searching for information on AMD usage. I must say, several of your statements are misleading.

First, you made a general comment about CommonJS being unable to export a function. True, but Node-style modules (based on CommonJS) can export a function, and that feature is used all the time (and encouraged) in the Node universe.

Second, you stated that npm is for Node modules, NOT for browser modules as if it is some sort of indisputable fact. While it's true that npm modules should be able to run as Node modules, a whole lot of npm modules run equally well in browsers, and quite a few even have comprehensive cross-browser compatibility test suites.

Third, you can use underscore in node with require('underscore'). Surprise! That also works in the browser, assuming you're using a browser packaging solution that's compatible with Node-style modules, which brings me to my next point...

Yes, you're missing something: There are browser packaging solutions for Node-style modules. Perhaps the most popular is Browserify, written by Substack, one of the top npm package contributors.

There is a new trend growing in the JavaScript community -- using Node-style modules as an authoring format, and (optionally) AMD as a transport format. As it turns out, it's pretty trivial to automatically wrap Node-style modules for delivery as AMD modules.

And if you want to, you can skip the AMD step and deliver them with Browserify, instead. (My preferred solution).

It's good to have choices.

James Burke said...

Eric: Thanks for the comment. Some context for this post:

It was written over a year ago, and definitely some things have changed since then.

The main point was to express how AMD properly addresses the full breadth of module loading for the browser case.

It is true that node has come a long way, and they do support, even encourage the single anonymous export style.

However, the module system in Node does not support things like loader plugins, or a callback-style require for loading some modules on demand later.

Not all use cases need those capabilities, but for first class modules in the browser, they are important, particularly for some performance-minded cases, where delayed loading of functionality can really help.

browserify is great for taking some npm-installed modules and bundling them for use in the browser. For me, though, it is more analogous to the asset pipeline in Rails than something that provides a full module solution for the browser.

It is still very useful though, and if the developer is using Node as their server already and they just need to pack up some JavaScript for the browser, it is a great way to go.

On using npm for browser-based code: yes, it can be done, but only so far as the code fits in with how Node and npm do things. npm, and node, are not interested in considering concerns for browser usage.

I prefer to not rely on that type of contract, and prefer to have the best tool for the job, something that is willing to consider front-end needs. But for a developer already using Node for their server, I can see where using npm even for their front end code is nice.

I wish browserify would have used AMD for the wrapping of modules instead of making up its own thing, but I can see where it would have been hard given that it is restricted to what Node's module system provides.

Hopefully ECMAScript modules will bridge some of the gaps though, and may improve that story over time. Once we have a common foundation for modules, it will be much easier to mix and match bundling approaches, and even package manager approaches.

Thanks for providing an update on what is available now for Node-based developers!