Direct Manipulation and Drag & Drop

Apple’s Human Interface Guidelines have long embraced the principle of direct manipulation: when the user handles the icons, text, and other elements on the screen, she should feel she’s “really” handling the objects they represent. This concept dovetails with that of perceived stability: the onscreen environment should behave as though it conforms to a set of predictable, understandable rules; i.e., the user should generally be able to explain how the screen got to be in its current state just by looking at it. Interfaces which implement these principles are the most transparent; this is one of the reasons why users of the original Finder often didn’t realize they were working with an abstraction.

A lot of little usability bits have been lost in the transition to and evolution of Mac OS X (though not as many as some people think, and more than some other people think); some are difficult to find or to judge, but one that’s quite apparent is the drag-and-drop process. Just implementing drag-and-drop isn’t enough to fully conform to principles of direct manipulation and perceived stability: in order to make the interface truly transparent, what’s going on should be explicitly clear throughout the entire drag operation.

But a number of apps introduce unnecessary abstractions when they generate dragging images (the graphic, often a translucent image or outline, that follows your cursor around as you drag). For example, can you tell which message I’m dragging in this screenshot?

Bad Dragging Feedback in Mail

No, it’s not the one about “wonderful russian brides”: you can drag any message, not just the one(s) selected. Mail’s drag image is generic: it doesn’t say anything about what you’re dragging, or even how many items you’re dragging.

Consider also these examples from Safari:

Safari Single Drag

Safari Multiple Drag

If you drag one bookmark, you get a drag image which clearly shows exactly what you’re dragging; but if you drag multiple bookmarks you get an image completely unlike the first, telling you nothing more than “you’re dragging N bookmarks”. (Several other applications work similarly, including iPhoto and Address Book.)

By contrast, the Finder (like numerous other applications) shows you exactly what you’re dragging by making the drag image nearly identical to the static representation(s) of the object(s) being dragged:

Finder Dragging in List View

Finder Dragging on the Desktop

You can tell exactly what’s going on in the above screenshots; no matter what happens during the drag operation, the dragging image clearly shows the objects being dragged. This helps to reinforce the notion that the onscreen environment behaves according to principles similar to those which govern the real world: When you pick up an object, you see it being picked up. This aspect of existence is so consistent that we can take it for granted; when objects on the computer screen work the same way, we can apply the same assumptions to them without even thinking about it.

Lame excuses

It could be argued that behavior such as Safari’s is acceptable because in order to drag multiple items you must select multiple items, and this selection is likely to remain visible throughout the drag operation. But that’s not a safe assumption to make in all cases: window ordering can change during a drag, and various types of “spring-loaded” drop targeting may cause the source of a drag to become obscured or hidden by the time you’re finally ready to drop. It could also be argued that dragging is typically a transient state, and thus the user isn’t likely to forget what they’re dragging. But that’s not always the case either: users of trackpads and various other input devices don’t have to keep a button held down in order to maintain a drag, so there’s nothing saying they can’t get up and answer the phone (or whatever) in the middle of a drag.

But even if you brush off the above as edge cases, is there appreciable value to adding further abstraction to your interface? Abstractions of abstractions and metaphors within metaphors can get hard to keep track of. Imagine a library where you’re free to browse the stacks, but when you pull a book from the shelf it’s magically transformed into a card which says, “You’re holding one book”… until you set it down on a desk or the checkout counter, at which point it magically changes back into the book. The real world doesn’t work like that — neither should the computer.

(Don’t read this as a general endorsement of the “everything should look and act just like real-world objects” school of thought — we’ve all seen how spectacularly Microsoft Bob and its ilk have failed. Rather, the idea is that it’s best to keep the number of abstractions in an interface small: “this represents this” is okay, but “this is a placeholder for this which represents this which is a shorthand for this” is to be avoided.)

Developer Considerations

Sometimes bad human interface decisions are made for the sake of ease of implementation. Here are some pointers to help you stay on the right track when writing your own drag-and-drop code:

  • Design your drawing code so you can turn parts on and off and run through it multiple times. In a normal situation you draw everything: background, selection highlight, supplemental text, etc. Then, when you need to draw into a drag image, set a flag and run through the draw function/method again so it draws only the relevant parts of the items being dragged (e.g. in something like a Finder list view, you want to draw just the icons and titles, not the date modified, size, etc.).
  • Cocoa developers using the OmniAppKit framework will find additional methods for drag image control in table and outline views.
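
As a rough illustration of the first pointer, here is a minimal, toolkit-agnostic sketch in Python. It is not Cocoa code and it is not how the Finder is implemented. Every name in it (Parts, Item, draw_row, draw_items, NORMAL_PASS, DRAG_IMAGE_PASS) is hypothetical, and the "drawing" is simulated by returning a list of strings rather than painting into a real graphics context. The point is only the shape of the design: one drawing routine, run more than once, with a set of flags deciding which parts appear in each pass.

```python
from enum import Flag, auto
from typing import NamedTuple


class Parts(Flag):
    """Pieces of a row that the drawing code can turn on or off (hypothetical)."""
    BACKGROUND = auto()
    HIGHLIGHT = auto()
    ICON = auto()
    TITLE = auto()
    DETAILS = auto()  # supplemental text such as date modified or size


# Normal on-screen pass: draw everything.
NORMAL_PASS = (Parts.BACKGROUND | Parts.HIGHLIGHT | Parts.ICON
               | Parts.TITLE | Parts.DETAILS)
# Drag-image pass: only what identifies the dragged items.
DRAG_IMAGE_PASS = Parts.ICON | Parts.TITLE


class Item(NamedTuple):
    title: str
    modified: str
    selected: bool = False


def draw_row(item, parts):
    """Return the simulated drawing commands for one row, limited to `parts`."""
    ops = []
    if Parts.BACKGROUND in parts:
        ops.append("background")
    if Parts.HIGHLIGHT in parts and item.selected:
        ops.append("highlight")
    if Parts.ICON in parts:
        ops.append(f"icon:{item.title}")
    if Parts.TITLE in parts:
        ops.append(f"title:{item.title}")
    if Parts.DETAILS in parts:
        ops.append(f"details:{item.modified}")
    return ops


def draw_items(items, parts):
    """Run the same row-drawing routine over every item for a given pass."""
    return [draw_row(item, parts) for item in items]


if __name__ == "__main__":
    dragged = [Item("Notes.txt", "Today", selected=True),
               Item("Photo.jpg", "Yesterday")]
    print(draw_items(dragged, NORMAL_PASS))      # what the window shows
    print(draw_items(dragged, DRAG_IMAGE_PASS))  # what follows the cursor
```

Because both passes go through the same routine, the drag image cannot drift out of sync with the static representation: anything that changes how an icon or title is drawn in the window changes the drag image too.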