Multi-touch on the desktop

Now that the iPhone has given us all a taste of a multi-touch user interface, I have been hearing many people say how cool it would be to have touch-based input on a new line of desktop displays from Apple.

If you’re one of the people who think that a multi-touch monitor is a good idea, try this little experiment: touch the top and bottom of your display repeatedly for five minutes. Unless you’re able to beat the governor of California in an arm wrestling match, you’ll give up well before that time limit. Now can you imagine using an interface like this for an eight-hour work day?

One of the things that people don’t realize about the iPhone is that it works at a low angle (as opposed to the high angle of your desktop or laptop display.) Our bodies are more comfortable and adept at handling repetitive physical tasks when they are performed at these low angles. What works well for the eyes does not work well for the hands.

If you’re old enough to remember a time before CAD systems, you’ll likely remember drafting tables. These tables were adjustable from completely flat (a low angle) to somewhere around 40° (a medium angle.) A drafting table is an environment where it is easy to work with your hands and associated tools for hours on end.

The iPhone’s multi-touch UI works similarly: if you watch people use it, I think you’ll see a lot more people working at waist level than at chest level. The only time you need the interface close to your head is when you’re enjoying those 3 pt fonts in MobileSafari :-)

Of course, Apple could come up with some kind of ergonomic multi-touch desk. Or we could all go out and buy a Microsoft Surface real soon now. However, I’m pretty happy with the recent demise of the glass-based CRT and not looking forward to the weight that a touch-based interface would add to a 30″ LCD monitor.

But even if there was a solution to the ergonomic issues, there would be problems mixing mouse-based applications (with small hit areas) with touch-based inputs (and large hit areas). Touch-based UI is not something you just bolt onto existing applications—it’s something that has to be designed in from the start.

You can already see this mismatch between the mouse-based and touch-based environments. All you need to do is view a web application that is targeted at the iPhone browser. In a desktop environment the controls seem large, but on the phone they are comfortably sized.

Resolution independent interfaces may solve some of the problems with control sizes, but the fact remains that a desktop interface has a much higher information density than a mobile one. A desktop is a multi-tasking environment while a mobile device is typically oriented towards a single task (making a call, finding a restaurant, getting directions, etc.) Don’t assume that the multi-touch you are using in a single task environment, with its lower information density and more focused interface, will be equally successful in a high density, multi-tasking desktop. Take another look at Jeff Han’s amazing demo and realize that he’s only working in one application at a time—what happens when you add a browser, an e-mail client, and some of your other favorite applications to that desktop?

I also find it difficult to believe that any kind of touch-based UI will replace the keyboard anytime soon. For people who are touch typists, you can’t beat the feedback of a key press for common applications like word processing and e-mail. Eventually, haptic interfaces with simulated feedback may obviate the need for a separate keyboard, but that day is a long way off.

The bottom line is that we’ve only just begun a journey that will fundamentally change the way we interact with machines. A major part of this change will be evaluating new and better ways to use computers—what has worked well in the past may not work so well in the future. And because of the magnitude of this change, I think there will be an extended period where touch-based, mouse-based and keyboard-based interfaces will need to coexist. If we’re not careful about developing these new interfaces, we’ll end up with something like Victor Frankenstein’s creation: pieced together and frightening.

Update: Tog agrees with me. Make sure to check out the Starfire video for ideas on how horizontal and vertical work surfaces can be integrated. Even though the cultural and technological elements are a bit dated, the human-centered design is still relevant.

Quartz and Javascript, sitting in a tree…

Even if everything isn’t copacetic in the land of “sweet”, at least Javascript and Quartz are getting along.

Thanks to Apple’s contribution to the WHATWG’s HTML 5 specification, it’s pretty easy to use Quartz graphics technology in an iPhone application. Together with MobileSafari’s event handling, you can start to do some fairly sophisticated drawing using a <canvas> element on the iPhone.

Here’s a sample application that draws and updates a graphic based on user input: canvas_test.html

(Make sure to resize your browser to have a 360 pixel height if you’re running on a desktop instead of the iPhone. And don’t be a fool: view the source.)
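
Here’s a rough sketch of the technique (this is not the actual source of canvas_test.html, just the shape of it; the element ids and drawing details are mine): a <canvas> that redraws itself on a timer and re-centers the graphic wherever you tap.

    <!DOCTYPE html>
    <html>
    <head>
    <meta name="viewport" content="width=320; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;"/>
    <script type="text/javascript">
    var x = 100, y = 100;  // current center of the graphic

    function draw() {
      var canvas = document.getElementById("surface");
      var context = canvas.getContext("2d");
      context.clearRect(0, 0, canvas.width, canvas.height);
      context.fillStyle = "rgb(128, 0, 0)";
      context.beginPath();
      context.arc(x, y, 20, 0, Math.PI * 2, true);
      context.fill();
      // The timer only fires while the page is frontmost, and the minimum
      // interval is much higher on the iPhone than on the desktop.
      setTimeout(draw, 1000);
    }

    function recenter(event) {
      // A tap arrives as a mousedown with page coordinates.
      x = event.pageX;
      y = event.pageY;
    }
    </script>
    </head>
    <body onload="draw()">
    <canvas id="surface" width="320" height="360" onmousedown="recenter(event)"></canvas>
    </body>
    </html>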

A few things to note:

  • The Javascript timer events only fire if the page is frontmost in Safari. Don’t make assumptions about when stuff will happen based upon your previous AJAX experience.
  • The minimum interval for the timer is much higher on the iPhone than a typical desktop browser. Change the setTimeout() parameter from 1000 to 10 milliseconds, and you’ll see that it’s not fast enough for serious gaming. You might also want to look at CPU usage on the desktop as a clue to why the iPhone developers chose to limit the timer interval.
  • It appears that MobileSafari isn’t very fast at recognizing mouse (multi-touch) events. Try pressing the screen quickly in different locations: you’ll see that many of the events are not captured.
  • I’ve said it before, and I’ll say it again. The finger is a very imprecise pointing instrument. Try to get the graphic centered at 100, 100 and you’ll see what I mean.

And if you had any doubts about the WHATWG being a good idea: this example works just fine in Firefox, Camino and Opera. I’ll let you guess how well it works in Internet Explorer…

Update: I built a graphing calculator using the concepts presented in this essay. Try launching the application on your desktop and iPhone: there’s quite a performance difference between the two environments. To understand the differences, I ran some benchmarks.

Bittersweet

Take a look at every application on the iPhone: what do they have in common?

The answer is a navigation bar at the top and a toolbar at the bottom. The navigation bar gives the user a well-known location for “backing up”, starting an editing session, and canceling operations. The toolbar provides a way to switch modes, change views, or perform other operations on the dataset at hand.

Both of these key interface elements, navigation and tools, are in fixed locations on the iPhone screen. If you’re trying to develop a “sweet” web application for this device, you’ll quickly find that you can’t follow these standard conventions. That’s because there is no fixed positioning in Safari for the iPhone. Bittersweet, indeed.

The position of the navigation and toolbar items is very important since it allows the interface designer to avoid the “finger shadow”. When you are navigating, it’s acceptable for your finger to obscure content since that content is going to change anyway. Locating a toolbar at the bottom of the screen allows you to work with data without the risk of hiding it.

Some might argue that there’s no way to have fixed positioning on the iPhone since it’s based on view ports rather than scrolling. But this argument is quickly dismissed when you consider the <meta name="viewport"> configuration supported by Safari on the iPhone:

<meta name="viewport" content="width=320; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;"/>

This allows you to specify the exact size of the viewport your web application is using. It can also prevent or limit the scaling of the viewport, so you’re guaranteed to be working in a consistent coordinate system. Most iPhone web applications I have looked at are optimized using this feature.

The lack of fixed positioning in the browser also makes it clear that none of the other applications on the iPhone are web apps: their elements at the top and bottom of the screen don’t move. So much for Apple eating their own dog food. Instead, we end up with a dog shit sandwich.

(Note: These applications may well be using WebKit within Cocoa views to facilitate the rendering of content, but the layout of the views on screen is not done with HTML.)

So let’s look at how this affects usability. As an example, I’ll take a look at PocketTweets—a recently released web client for Twitter. It’s a beautiful application that’s marred by the limitation mentioned above.

When viewing a page of tweets (posts) you will quickly find yourself scrolling like crazy on your iPhone. Switching a view requires that you scroll to the bottom of a long page, often causing Safari to render portions of the page just so you can get to the buttons. Obviously, these important buttons should be located in a fixed <div>.
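
The markup involved is trivial, which makes the limitation all the more frustrating. Here’s a sketch of what an application like PocketTweets presumably wants to do (the id and dimensions are hypothetical):

    <style type="text/css">
      #toolbar {
        position: fixed;  /* MobileSafari treats this like absolute positioning, */
        bottom: 0;        /* so the toolbar scrolls away with the page */
        left: 0;
        width: 100%;
        height: 44px;
      }
    </style>

    <div id="toolbar">
      <!-- view and mode switching buttons go here -->
    </div>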

This limitation with fixed elements also affects the amazing work done by Joe Hewitt. Take a look at the example code on the iPhone and you’ll notice that the navigation bar scrolls off the screen when there are more than nine items in the list (e.g. the song titles.) A fixed element would solve this problem.

For those readers who don’t yet have an iPhone, here’s an example that demonstrates how the fixed element should work, and how it’s broken on the iPhone: not_fixed.html. When you are testing in Safari 3, you’ll need to resize the window so that the content area is 360 pixels high. When you switch to “absolute mode”, you’ll have the same experience as on the iPhone.

Finally, it’s interesting to note that this bug implies that Safari on the iPhone uses a different rendering engine than Safari 3. Nothing else behaves this way, not even Safari 2. Obviously, this is a disappointment for Mac and Windows developers who were hoping to use a desktop browser as a proxy for iPhone development.

If you have an ADC account, I’d suggest that you submit a bug report so that the iPhone team realizes the importance of this bug. Please reference Bug ID# 5325294.

Postscript: While writing this essay, it became quite cumbersome to use “Safari on the iPhone”. Next time, I’ll use “iSafari”, even though it’s not as descriptive as “MobileSafari”.

Update: Apple reports that this is a known issue, please refer to Bug ID# 5327029.

The HIG still matters, even with special effects

Summary

Changes to the Dock in Leopard do not follow the Human Interface Guidelines

Steps to Reproduce

  1. Set the desktop background to a light, solid color to make the shadows appear more clearly. In the examples, I used Solid Mint.
  2. Make sure Preview and TextEdit are displayed in the Dock. Additionally, you can download and launch Transmit as another example.
  3. Look at the Dock.

Expected Results

The Human Interface Guidelines contain three salient points:

  • Application icons look like they are sitting on a desk in front of you.
  • Utility icons are depicted as if they were on a shelf in front of you. Flat objects appear as if there were a wall behind them with an appropriate shadow behind the object.
  • Perspective and shadows are the most important components of making good Aqua icons. Use a single light source with the light coming from above the icon.

Reference: “Icon Perspectives and Materials” and “Tips for Designing Aqua Icons”

Based on this information, you’d expect the Dock to use either the desk or shelf perspectives. You would also expect a single light source to be used.

Actual Results

The floor displayed on the Dock does not use the perspective of the desk in front of you, nor does it appear as a shelf. Because there’s a difference between the floor angles and the traditional desktop icon angles, many icons look wrong.

An example is the Trash, which has a slight tilt forward. The Transmit truck also looks like it’s pirouetting on the front-left tire.

Figure 1. Angle defined by Human Interface Guidelines

Figure 2. Angle defined by Leopard Dock

Figure 3. Whoops!

Also, the shadows displayed in the Dock are coming from three separate light sources:

  • The traditional icon shadow, where the light source is above the icon (traditionally from the viewer’s left-hand side.)
  • A new dynamically generated shadow which uses a light source in the top-middle of the screen.
  • Another dynamically generated shadow which uses a light source in the lower-middle part of the screen.

The dynamically generated shadows often conflict with the shadow added by the icon artist.

As an example, look at the loupe in the Preview icon or the halo on the Trash icon. The shadow underneath the Transmit truck is another example.

Figure 4. How many light sources do you need?

Another inconsistency is that the “built-in” shadow is shown in the reflection—the dynamic shadows are not.

Finally, the shadows make no sense at all when the Dock is placed on the left or right side of the screen. The shadows below the icon end up floating out in space because they have no surface to be cast upon.

Regression

Previous versions of the Dock used a shelf perspective and did not have dynamically generated shadows, so this was not an issue.

Hundreds of designers have been producing icons for tens of thousands of applications by following the Human Interface Guidelines. Changes to the Dock should respect these guidelines since changing existing artwork is not an option on such a large scale.

If you’re a developer or designer with an ADC account, you might want to let Apple know if this will be a problem for your applications. Reference Radar Bug ID# 5301211.

Update: It’s good to know that there is at least one other person who thinks this is a problem. This bug report is a duplicate of Bug ID# 5176881.

Update: More proof that perspective in the new three-dimensional Dock is not well thought out: Physics still matter, even with special effects.

Update: I wouldn’t have written this bug report if the Dock in Leopard looked like this.

Update: Bet you didn’t know that the Dock also defies the laws of gravity.

Beyond sweet

Now that we all have our iPhones and are discovering what they can do, attention will turn to what they cannot do. And that, in turn, will lead to the realization that there is no third-party development of native applications for the device.

Of course, you can use HTML and AJAX to do web-based applications. But as Steve Jobs said regarding their Google Maps implementation:

And, you know, that client is the result of a lot of technology on the client, that client application. So when we show it to [Google], they’re just blown away by how good it is. And you can’t do that stuff in a browser.

Personally, I do not begrudge Apple for the lack of a Cocoa-based iPhone SDK. The amount of time and effort required to develop this new and compelling device is not insignificant. Discovering new metaphors and mechanisms for a touch-based user interface is much harder than it appears. Documenting best practices and exposing APIs for the internal frameworks are equally difficult.

There’s also the issue of maintaining a small and efficient footprint on the iPhone. Its smooth and natural interface could easily be destroyed by a rogue application.

(I’m convinced that this is also the reason why Flash and Java are currently off-limits, and why they will remain off-limits in the future. If you doubt this, try monitoring your CPU usage with iPulse while watching a YouTube video. As your fans kick in, think about how much battery the application is chewing up.)

Of course, the hallmark of a great user interface is that it looks easy. So easy that it hides the underlying difficulty that went into its production.

I can guarantee you that the past couple of years have not been easy for the developers at Apple. I’m sure that countless approaches have been brainstormed, prototyped and evaluated. Version 1.0 of the iPhone is not the first attempt.

So even though we don’t have an iPhone SDK, we can begin this difficult process of rethinking our designs. And in many cases, HTML and Javascript can be used to prototype these redesigns.

The following observations are a result of my thinking about porting both our Frenzic and Twitterrific applications. Both are well suited to a mobile environment, and would be outstanding products for the iPhone. It’s also clear that the Iconfactory will be working on new versions of these applications, not just ports of existing user interfaces. It’s a brave new world.

Finger size limits the UI

An interface on the iPhone has much larger areas for hit testing. Try this experiment: press your fingertip against a ruler. You’ll see a contact patch somewhere between 1/4″ and 1/2″ in diameter. That corresponds to anywhere between 40 and 80 pixels of screen real estate on the phone’s 160 dpi display.

Contrast that with the typical 20 pixel hit area on a mouse-based UI and you’ll see your standard assumptions change quite a bit.

As an example, let’s look at a popular phone game: Bejeweled. Typically, this game consists of 8 columns of graphics which the user manipulates. With the iPhone’s 320 pixel width, that’s 40 pixels or 1/4″ for each column of graphics. Too small to manipulate with your finger, especially considering that this game involves working with adjacent pieces.
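
The arithmetic is worth keeping in front of you whenever you lay out a touch-based screen. A quick sanity check in Javascript:

    // Back-of-the-envelope hit area math for the iPhone's 160 dpi display.
    var dpi = 160;
    var fingerInches = 0.25;              // low end of the fingertip contact patch
    var minHitArea = dpi * fingerInches;  // 40 pixels per target
    var columns = 320 / minHitArea;       // 8 columns fit, with no margin for error
    alert(minHitArea + " px targets, " + columns + " columns across");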

Another issue is that your finger has a “shadow.” Unlike a mouse, you can’t see what’s directly below the object you are pointing at. That’s why primary control mechanisms reside at the bottom of the screen, unlike a desktop application, whose control layout normally works from top to bottom. It’s also why tabs are at the bottom of the panels they control: your finger does not obscure their contents.

Throughout the iPhone UI, you see examples of how Apple has had to work around these limitations. Things like adaptive hit testing on the keyboard (using predictive word completion) and the magnifier for the insertion point are all due to the fact that our fingers are big and bulky when compared to a mouse pointer.

One hand or two? Portrait or Landscape?

A subtle, but critical, decision is whether you plan on having an interface that requires one or two hands to operate. Some iPhone applications, like iPod, benefit from being one-handed in portrait mode and two-handed in landscape mode.

Of course, a one-handed interface gives the user more mobility and the flexibility to do other things while controlling the device. It also limits what you can do in your UI.

It also appears that landscape mode is not conducive to a one-handed interface since your thumb has limited reach. Portrait mode gives you a lot more freedom: two thumbs, one thumb, or your palm and index finger.

You also need to remember that not everyone is right handed.

Simplicity is good, but it means stuff is missing

The iPhone’s UI is decidedly simple to use. But this simplicity comes at a cost. Features such as cut/copy/paste are notably absent. The same is true with drag and drop.

It’s conceivable (and likely) that clipboard operations will be added as gestures in a future version of the iPhone UI. Swiping your index finger to the left as a shortcut to delete is an example of how these common operations could be implemented.

On the other hand, I don’t expect drag and drop to ever become a widespread metaphor on the iPhone. That’s because dragging your finger on the display is reserved for scrolling. It’s possible that a tap to pick up, drag to scroll, and tap to drop interface could be developed, but that would be highly modal and hard for people to discover. More importantly, it’s a difficult mode to escape from when it happens accidentally.

Another thing that is missing is loads of memory. Clearly, the iPhone team has been focused on the memory footprint of applications. Examples are the limited number of fonts available and the number of items each application can work with—8 windows in Safari, 3 panels in Weather, etc. Any application written for the iPhone will need to take these bounds into consideration. Swapping can’t save your ass.

Filesystem? What filesystem?

Of course there’s a filesystem on the iPhone. Crash logs show standard paths to files and folders (/System and /usr/lib, for example.)

But that doesn’t mean you get to show it to the user. Do you see any open and save dialogs on the iPhone? How is your application going to manage documents?

Much of the current document management is handled externally by syncing with iTunes. Getting a document onto the iPhone is a source of frustration for many early adopters, with arcane techniques like mailing items to yourself serving as a workaround.

In my mind, the iPhone is really a satellite device that’s dependent on another machine for managing documents. As the mobile environment evolves and becomes a more central player, we may see local management of information. Until that happens, it’s best to think of document management being either a simple set of +/- buttons or something that is out of your control (e.g. syncing.)

Another option is to rely on web-based storage mechanisms. It’s probably more viable in the short term and fits into the “iPhone as satellite” metaphor.
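
As a sketch of that approach, saving a document could be as simple as a POST from Javascript. The URL and payload format here are hypothetical; the point is that the “filesystem” lives on a server, not on the phone:

    // Push a document to web-based storage with XMLHttpRequest.
    function saveDocument(name, contents) {
      var request = new XMLHttpRequest();
      request.open("POST", "http://example.com/documents/" + encodeURIComponent(name), true);
      request.setRequestHeader("Content-Type", "text/plain");
      request.onreadystatechange = function () {
        if (request.readyState == 4 && request.status != 200) {
          alert("Save failed: " + request.status);
        }
      };
      request.send(contents);
    }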

Tap vs. Gesture

Tapping is the action we are most familiar with as desktop application developers and users. A mouse click and a tap are analogous—button clicks are going to work the same way.

But things get interesting when you begin to consider gestures. Swiping to the right is a natural gesture for deleting: similar to drawing a line through a completed list item in your Moleskine.

Would a circular gesture be a natural way to represent undo? Or would a back-and-forth motion (like an eraser) be more appropriate?

Would pinching (squeezing) a text selection mean delete? Or would flicking your fingers apart combined with a poof be a better way to represent removal?
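
Web applications can at least start experimenting with the simplest of these today. Here’s a sketch that classifies a horizontal swipe from mouse events; the thresholds are guesses, and as noted earlier, MobileSafari drops many of these events (and may deliver none at all during a pan), so treat it as a desktop prototype:

    var startX, startY;

    function swipeStart(event) {
      startX = event.pageX;
      startY = event.pageY;
    }

    function swipeEnd(event) {
      var dx = event.pageX - startX;
      var dy = event.pageY - startY;
      // Mostly-horizontal movement over 40 pixels reads as a swipe.
      if (Math.abs(dx) > 40 && Math.abs(dy) < 20) {
        alert(dx > 0 ? "swiped right" : "swiped left");
      }
    }

    document.onmousedown = swipeStart;
    document.onmouseup = swipeEnd;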

Think about the original Mac user interface—the basic operations defined in 1984 are still with us. But there have also been a lot of things added as our knowledge of the desktop metaphor has improved. Chording with modifiers like the Control and Option keys is a good example. These additions allow more advanced users to use the interface more efficiently without adversely affecting novice users.

I expect the same to be true of the iPhone UI. Gestures and other advanced features will be added as the new environment becomes more familiar, and the interface will evolve as its developers and users mature.

Putting these concepts into practice

A lot of geeks, including your humble author, are dreaming of an ssh client for the iPhone. Obviously, it would be a great application for an Internet-enabled device. But development of such an application is challenging in light of the considerations above. Let’s take a look at some of the issues and pitfalls:

  • The keyboard is going to take up at least 1/2 of the screen, possibly more if you include common shell characters like !, $, `, ", ', etc. That’s not going to leave you a lot of room to view output unless you implement some sort of show/hide mechanism.
  • Once you get the keyboard input sorted out, you’re going to need shortcuts for a command-line environment that has traditionally been bound to multiple key sequences: the Control and C keys pressed simultaneously, or the Escape key followed by the D key, are good examples.
  • Similarly, managing the insertion point on the iPhone involves dragging a point on screen. How does this integrate with curses and its move left/right/up/down paradigm?
  • Predictive completion is addictive—it quickly becomes a necessity. Some kind of mechanism to complete shell commands and paths as you type would make the interface much more efficient.
  • You’re going to want a shell on the iPhone to be able to work in both portrait and landscape modes. Apple has raised the bar very high in this regard: the switch between modes needs to be seamless within the shell environment.
  • Courier. It’s a crappy font for a shell, but it’s the only fixed pitch font available. Deal with it.
  • No copy and paste. Get ready to retype commands from web pages, email and notes. Or come up with something more elegant…
  • The filesystem is hidden. Where is your .ssh directory going to go? How are your private and public keys protected? How do you manage your known_hosts and authorized_keys files? Try typing in a public key without copy and paste and then you’ll see the importance of these questions.
  • There are no scrollbars or arrow keys. How do you view output history (scrollbars) and command history (arrow keys)? Since there is only one scrolling gesture, do you go modal? If so, how?

All of these things can, and will, be overcome. The point I’m trying to make is that it will take a lot of thought, experimentation and feedback during the implementation. So let the thought and experimentation begin…

Update: Help us get Frenzic on the iPhone.

Update: Apple guidelines for developing web applications on the iPhone.