discoveryd Clusterfuck

I usually keep things fairly clean on this site. I have a simple metric: would I be embarrassed if my Mom read this post? As you’ve probably guessed from the title, this post is going to be different.

So, Mom, it’s time to stop reading. I’m pissed off and you know how I get when that happens.

In case you’re wondering what I’m talking about, look at this shit. A network process using 100% of the CPU, WiFi disconnecting at random times, and names, names (1), names (2), names (4). All caused by a crappy piece of software called discoveryd.

I started reporting these issues early in the Yosemite beta release and provided tons of documentation to Apple engineering. It was frustrating to have a Mac that lost its network connection every few days because the network interfaces were disabled while waking from sleep (and there was no way to disable this new “feature”.)

Regardless of the many issues people were reporting with discoveryd, Apple went ahead and released it anyway. As a result, this piece of software is responsible for a large portion of the thousand cuts. Personally, I’ve wasted many hours just trying to keep my devices talking to each other. Macs that used to go months between restarts were being rebooted weekly. The situation is so bad that I actually feel good when I can just kill discoveryd and toggle the network interface to get back to work.

Only good thing that’s come of this whole situation is that we now have more empathy for the bullshit that folks using Windows have suffered with for years. It’s too bad that Apple only uses place names from California, because OS X Redmond would be a nice homage.

It’s no secret in the tech community that discoveryd is the root cause of so many problems. There are even crazy workarounds. With so many issues, you’d expect some information from Apple explaining ways to mitigate the problems.

Nope.

The only explanation I can come up with for this astounding lack of information is that there’s some mid-level product manager at Apple who’s covering their ass. I hope this person who’s responsible for withholding advice feels good about themselves, because the rest of us hate them with the burning passion of a thousand suns. Being stingy with knowledge in an engineering organization is a fucking stupid career move.

To give you an idea of how helpful a tiny piece of information is towards people’s productivity, let me give you a simple example that’s already saved me hours of frustration.

For months, I’ve seen bullshit like this in Bonjour:

Phantom

That shows the xScope service on the Mac that provides data for the Mirror on iOS. In that screenshot, the service is being shown as available on three devices: one with just an IPV6 address, one with no IP addresses, and one with a duplicate IPv6 address and a valid IPv4 address. The name “CedarX” was the only way I could find to prevent names from incrementing (and breaking things that use the host name of that device.)

The “funny” thing is that this Mac is running the latest version of 10.10 with fixes for “WiFi issues”. And after tweeting about it in frustration, I got this response:

I followed Hendrik’s advice and guess what? No more network issues.

Bonjour keeps a cache that’s shared amongst devices on the network. This is so that if the device is asleep, another one that’s awake can provide the necessary information. I suspect that a device running an older version of discoveryd poisoned this cache. For some reason, the invalid cache information couldn’t be corrected by a newer version of the software which screwed things up in the first place.

But this is all just conjecture because Apple hasn’t written that fucking tech note.

This situation also shows another important aspect of the discoveryd clusterfuck: this code is all over the place. It’s in use by iOS, OS X and presumably whatever is running on the Apple Watch. As such, any one of those devices can poison Bonjour for everything else on your network.

This workaround is fairly simple if you’re on a home network where you have direct physical access to the all the devices. But as we all know, wireless networking is essential in places like an office, an airport or a coffee shop. Good luck rebooting everything in that kind of environment. And what happens when someone running an older version of OS X connects to that network and poisons it? Time to reboot!

You also can’t rely on software updates to fix everything: I have both an Airport Express and Apple TV that are no longer receiving fixes. Having to buy new hardware because of crappy software adds insult to injury.

Ironically, these issues are most likely to affect Apple’s best customers. The more devices you have, and the longer you have them, the more likely you are to get an unstable network. The only advice I can offer is to restart your entire network.

C:\ONGRTLNS.OSX

A New Way to Display

To date, Apple’s Retina displays have relied on LCD technologies. Since the iPhone 4, our mobile devices have used in-plane switching (IPS) technology to achieve high resolution with accurate color from a wide range of viewing angles. The IPS LCD panels are also present in many of the desktop monitors we use on our Macs.

Chances are good that’s about to change with an OLED display in the Apple Watch.

Unless you’ve done work on Android, you’re probably unaware how different AMOLED displays are from LCD. Let’s take a look at what lies ahead.

Physical Differences

An LCD display relies on a backlight that’s projected through three layers: one for each of the primary colors. Red, green and blue appear on screen because crystals in those layers can be aligned electrically to allow the backlight to pass through.

OLED has no backlight and has only one layer that produces light. That layer is an organic compound that emits light when subjected to an electric current. When you put them in a two dimensional matrix of transistors, you end up with an AMOLED display.

Based on these short descriptions, it’s easy to see why an OLED is thinner than its LCD counterpart: there are simply fewer layers of electronics. And it gets even better when you discover that the compounds and electronics can be fabricated on flexible plastic substrates.

LCDs have always been problematic in direct sunlight because the backlight must pass through filters which can also reflect ambient light. This has also been an issue for OLED displays where the light is emitted below a reflective metal cathode. The good news is that recent advances with the technology are producing brighter results than an LCD.

The biggest challenge for OLED is with the organic material where light is emitted. Basically, byproducts of the chemical reaction that produces light accumulate over time and reduce the efficiency of the output. Again, this is an area where manufacturers are focusing their efforts. In just a few years the lifespan of these devices have increased by several orders of magnitude, but is still limited to tens of thousands of hours. Don’t expect your Apple Watch to become a family heirloom.

How Not to Do It

OLED displays have gotten a bad rap on mobile devices primarily because of a thing called PenTile.

PenTile mimics how our eye works: 72% of the luminance we perceive is determined by the green wavelengths of the electromagnetic spectrum. The RGBG arrangement of sub-pixels lets a display get brighter without increasing the overall number of transistors needed. This, of course, keeps manufacturing costs down.

Unfortunately this physical layout of the light emitters also makes colors grainy and text hard to read. Color accuracy also suffers. PenTile is also a trademark of Samsung. I can’t see Apple using this approach in their Retina displays.

It’s much more likely that Apple’s industrial designers have been working hard to find a new and better way to use OLED technology without losing fidelity. I can’t wait for someone to look at the Apple Watch’s display under a microscope.

Black is Best

From Apple’s point-of-view, one of the most important things about OLED is how it consumes power. A transistor on the display only uses energy when it’s producing light. Compare this with an LCD backlight which must be lit in order to see any pixel.

Folks with OLED displays on their Android devices have figured out that a lot of black pixels makes their battery last longer. So too Apple.

It’s also important to remember that each pixel on the display has a limited lifetime. The less time the OLED spends producing light, the longer it lasts.

One of my first impressions of the Apple Watch user interface was that it used a lot of black. This makes the face of the device feel more expansive because you can’t see the edges. But more importantly, those black pixels are saving power and extending the life of the display. It’s rare that engineering and design goals can align so perfectly.

And from what we’ve seen so far of the watch, that black is really really black. We’ve become accustomed to blacks on LCD displays that aren’t really dark: that’s because the crystals that are blocking light let a small amount pass through. Total darkness lets the edgeless illusion work.

Flat Black

I’ve always felt that the flattening of Apple’s user interface that began in iOS 7 was as much a strategic move as an aesthetic one. Our first reaction was to realize that an unadorned interface makes it easier to focus on content.

But with this new display technology, it’s clear that interfaces with fewer pixels have another advantage. A richly detailed button from iOS 6 would need more of that precious juice strapped to our wrists. Never underestimate the long-term benefits of simplification.

The Future

Apple is a company that likes to leverage its technologies across a wide range of products. Look at how many of their devices are using IPS LCD displays now, then imagine a move to an OLED display pioneered by the Apple Watch.

When Jony Ive taps the home button on his iPhone and says, “The whole of the display comes on. That, to me, feels very, very old.” it’s a sign that individually addressable sources of light are the wave of the future.

Of course it will take time for this to happen, but you know it’s going to look awesome when they’re done. And along the way, we’ll learn to think about pixels differently.

Xcode Compromised

There’s a very good chance that Xcode has been compromised.

The article refers to “Xcode” generically, but as we all know, there are a lot of pieces to this puzzle: I’m going to examine a few of them below. It’s your job to think about how these things might affect your own products.

Code without a source

Any libwhatever.dylib in your project contains code you’re not compiling from source. You don’t really know what that code is doing, do you?

To get an idea of the extent of this problem, try this:

$ find /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib/ \
  -name "*.dylib" -or -name "*.a"

I count 190 libraries available to iOS in the current version of Xcode.

Of course, open source code is easy to review. But even there, if you miss a simple goto, it can have serious consequences.

Third party build phases

Do you use something like Crashlytics? Every time you do a build the Crashlytics.framework/run binary is executed on your behalf. Does this code do anything beside process your .dSYM files?

Who knows? I sure don’t.

Chain of trust

As we’ve recently seen, breaking the chain of trust is remarkably easy given the right amount of privilege and malice.

We have all seen Xcode verify before launching. There are even simple ways to disable that check.

Unfortunately, these checks only verify the delivery of the binary code. There’s no check of the package’s origin or the toolchain used to create it.

Trusting trust

You can’t even trust your compiler. If LLVM is somehow compromised, even clean source code can become an attack vector.

Now think about the other problems above and how a compromised compiler can make them much more likely and pernicious.

Fuck.

Updated March 10th, 2015: More details on the specifics of the attack can be seen here.

“Must Fix for Next Release”

In the current version of xScope, there is a memory leak caused by a change in OS X 10.10.2. While the Loupe is in the background grabbing the screen, something in the frameworks is leaving images in the autorelease pool. The fix is literally two lines of code that forces the pool to empty.

But that’s not why I’m writing now.

This fix was submitted two weeks ago on February 2nd. A week later it went into review and was quickly rejected. The problem was that a buy button was accessible from our Help window.

The bulk of the help is static and built into the app, but there is a part online that we can update easily. This makes it really easy easy for us to add tips and other useful information for our customers. But since it’s just a web browser, it’s possible to wander into a part of our site that shows a header with mentions the word buy which is not allowed per rule 7.15. (Yes, the buy button is for something the customer has already purchased and is actively in the process of using, but technically it’s still a violation.)

My issue is the way that we must fix these problems. In this particular case, the issue was resolved by editing some HTML on our server, not by changing anything in the app itself. But we still must submit a “new” binary and go through the lengthy review process again. This is a huge waste of time for both developers and app reviewers (who are clearly lagging behind these days.)

I think there’s an easy way to fix these minor transgressions that would benefit both parties: add a new kind of approval with strings attached. A “Must Fix for Next Release” state where the app can go into “Ready for Sale” but the issue remains in the Resolution Center. At that point, both the app reviewers and developer know that an issue has to be dealt with before it’s approved the next time.

It would be like getting pulled over for a broken taillight on your car. You don’t need to visit your mechanic immediately to get the problem fixed. But you’ll certainly have to get things in order the next time you register the vehicle.

Please be sure to dupe Radar #19921616 if you agree that this would be a good change for iTunes Connect.