

Showing posts from 2020

HashFile: A disk-based hash structure

Previously, I introduced a problem I was trying to solve where a large data structure was being pinned into memory for occasional lookups. This post delves into the implemented solution, which pushes it onto disk but retains (relatively) fast lookups. I think using a database or a B-Tree is a good solution to this kind of problem in general, but it was fun and inexpensive to implement this utility, and it turned out to be generally useful. Bear with me if you already understand HashMaps pretty well, because I'm basically describing a HashMap here, but it's a disk-based HashMap. Logically, the data consists of a series of key-value pairs. The keys and values are of variable size, because they contain strings. If we were to write only the values to disk in a binary format, we might have something like this for the JSON example in the previous post: There are two records, at offsets 0x00000000 and 0x000000A4. If we had some way to map a key to one of these offsets, we could seek
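To make the idea concrete, here's a minimal Java sketch of a disk-based hash structure. Everything about it is an assumption of mine rather than the actual HashFile format: the layout (a bucket count, a table of 8-byte record offsets, then chained key/value records), the names HashFileSketch, write, and lookup, and the choice of String.hashCode() for bucketing (its algorithm is specified in the String Javadoc, so it stays stable across JVM runs, which matters for a file written by one process and read by another).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a disk-based hash map. File layout (big-endian):
 *   [bucketCount:int][bucketCount x head offset:long]
 *   then records: [keyLen:int][key][valueLen:int][value][next offset:long]
 * An offset of -1 marks an empty bucket or the end of a chain.
 */
final class HashFileSketch {
    private static final long HEADER_SIZE = 4; // the bucketCount int

    static int bucketOf(String key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    static void write(File file, Map<String, String> entries, int bucketCount) throws IOException {
        // Group entries by bucket so each chain is written contiguously.
        List<List<Map.Entry<String, String>>> chains = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) chains.add(new ArrayList<>());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            chains.get(bucketOf(e.getKey(), bucketCount)).add(e);
        }
        long[] heads = new long[bucketCount];
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.setLength(0);
            out.writeInt(bucketCount);
            out.seek(HEADER_SIZE + 8L * bucketCount); // records start after the table
            for (int b = 0; b < bucketCount; b++) {
                List<Map.Entry<String, String>> chain = chains.get(b);
                heads[b] = chain.isEmpty() ? -1 : out.getFilePointer();
                for (int i = 0; i < chain.size(); i++) {
                    byte[] k = chain.get(i).getKey().getBytes(StandardCharsets.UTF_8);
                    byte[] v = chain.get(i).getValue().getBytes(StandardCharsets.UTF_8);
                    // The next record in this chain starts right after this one.
                    long recordSize = 4L + k.length + 4 + v.length + 8;
                    long next = (i + 1 < chain.size()) ? out.getFilePointer() + recordSize : -1;
                    out.writeInt(k.length);
                    out.write(k);
                    out.writeInt(v.length);
                    out.write(v);
                    out.writeLong(next);
                }
            }
            // Go back and fill in the bucket table now that the offsets are known.
            out.seek(HEADER_SIZE);
            for (long head : heads) out.writeLong(head);
        }
    }

    /** Returns the value for key, or null if absent. Only the key's chain is read. */
    static String lookup(RandomAccessFile in, String key) throws IOException {
        in.seek(0);
        int bucketCount = in.readInt();
        in.seek(HEADER_SIZE + 8L * bucketOf(key, bucketCount));
        long pos = in.readLong();
        byte[] wanted = key.getBytes(StandardCharsets.UTF_8);
        while (pos != -1) {
            in.seek(pos);
            byte[] k = new byte[in.readInt()];
            in.readFully(k);
            byte[] v = new byte[in.readInt()];
            in.readFully(v);
            long next = in.readLong();
            if (Arrays.equals(k, wanted)) return new String(v, StandardCharsets.UTF_8);
            pos = next;
        }
        return null;
    }
}
```

Each lookup reads one bucket-table slot plus one chain, so heap use doesn't grow with the file. A production version would probably size the table from the entry count and avoid a seek per field (for example by memory-mapping the file), but the sketch keeps the mechanics visible.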

The case of the unwieldy HashMap

Some data structure was pinning over 70MB of heap space in Android Studio. Our developers have limited memory on their laptops, and are often upset about memory consumption in general. This behemoth (retained by our internal plugins) was the second largest allocated and pinned single object in the whole of AS's heap. buck project generates IDE project definitions from buck targets. It can be configured to emit a target-info.json file, which contains simple mappings that look something like this: { "//src/com/foo/bar/baz:baz" : { "buck.type": "java_library", "intellij.file_path" : ".idea/modules/src_com_foo_bar_baz_baz.iml", "" : "src_com_foo_bar_baz_baz", "intellij.type" : "module", "module.lang" : "KOTLIN", "generated_sources" : [ "buck-out/fe3a3a3/src/com/foo/bar/baz/__gen__", "buck-out/fe3a

In the lunch line with Larry Page

The Whisper project (which became Nearby) got started as part of the Google+ org. The Google+ team sat in the same building as Larry Page, and we'd often see him in his office or walking around the building. One day, a few of my teammates and I were standing in line at Cloud Cafe, in the restricted part of building 1900 where the Google+ team sat. Right in front of us in the line was none other than Larry Page himself. One of the engineers on the team struck up a conversation with him, and Larry asked us how Whisper was going. We hadn't launched anything yet, and were deep in the midst of doing crazy cool things with ultrasound. My teammate answered quite honestly that it was hard, and it was taking much longer than we expected to get it to a dogfoodable and eventually launchable state. Larry asked him why: what made it hard? Probably a fairly innocuous, polite question, but coming from the CEO and founder of Google, it sort of takes on an

Shoes and secret projects with Vic

Google used to have a tradition called TGIF. It still seems sad that I'm talking about this in the past tense, but hey ho. At TGIF, the founders and other executives got up on stage, welcomed nooglers (new Google people), introduced a bunch of prepared speakers who talked about interesting things that were going on, then opened themselves and other executives up to pre-voted and audience questions. There are many infamous things that happened at TGIF during my time there that I can't talk about, but I was physically present in Charlie's for these: the time they let off fireworks inside Charlie's to celebrate a Nexus device launch and gave everyone in the company the new device; the time they announced that everyone at Google was getting a significant pay raise and bonus (people went wild, there was screaming and yipping, and it was like a music concert in the 80s); and the several times Patrick Pichette came by with a huge backpack of cash and everyone got an envelope with ten $100 bills as

The very scientific microkitchen testing event

One of the things I miss most about the office is the microkitchens. Facebook and Google both have fantastic selections of yummy snacks to fuel folks through the day. They're actually quite great for ad hoc conversations and just getting away from your desk for a bit. I've tried to replicate this by purchasing a box of Funyuns from Amazon to keep at home, but it's just not the same, as I'm... you know... pathetically eating my Funyuns at home on my own with my shorts on. At Google, the microkitchens were legendary when I started in 2008. But by the time I left in 2019, they had changed a lot. The stock was intentionally kept low, and the snacks were healthier. This wasn't always a popular thing. For a period of about 3 years or so, the snack selection, which used to rotate fairly regularly, was frozen in time with the same set of things. I think this was due to Google trying to plan the future of the microkitchen program, and it just took a while. Or something like tha

It's like a startup inside a startup, Topaz

At Google, I knew a great engineer (let's call them Topaz) who, after working on a bunch of different teams and being pretty successful at what they did, decided to join the hot new team that was hiring like crazy. It was pretty exciting for them; they told me so many times. I've been in some of those situations myself, where you're anticipating something new in your life so much that you have the most vivid dreams about it actually happening, as if it were happening now. It was like that for this person. Now, this part of Google was hot at the time; it was in the realm of all things social, back when Google was trying to do that. It was breathtaking how all-in Google went on the social stuff so quickly, and there was a buzzy aura around the teams who worked on it. The building they were in had restricted access (because reasons), still a relatively rare thing at that time, and its own not-so-secret restaurant. So, the big day came, and Topaz joined the new team (let's

Intercepting behavior with java agents

A previous post showed using JVMTI to log method calls in a non-intrusive way, without having to make modifications to upstream libraries. JVMTI is much more powerful than that post showed: for example, it can replace and modify code in a running JVM altogether, which can be useful for things like logging or performance measurements, but also for intercepting or changing behavior at runtime. It is, however, quite cumbersome to write code for that sort of thing in C or C++ using the JNI interfaces. It turns out Java provides a higher-level interface to instrument or redefine classes using the Java programming language itself. This post will demonstrate a ridiculously simple example of such an agent. You can find the example code on GitHub. A simple program Let's start off with the really simple program that we want to instrument. The Greeter class does the time-honored thing of saying Hello World. We've for some reason awkwardly and weirdly moved the Worl
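For a feel of the higher-level interface, here's a minimal sketch using java.lang.instrument: the JVM calls a premain(String, Instrumentation) method on the class named by the agent jar's Premain-Class manifest attribute, the agent registers a ClassFileTransformer via Instrumentation.addTransformer, and the transformer sees each class's bytes as it loads. The GreeterAgent and LoggingTransformer names are hypothetical, and this sketch only observes Greeter loading; actually rewriting its bytecode would need a bytecode library on top, which isn't shown here.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical minimal agent. For real use, make this class public in its own file,
 * package it in a jar whose manifest contains "Premain-Class: GreeterAgent", and
 * start the JVM with -javaagent:path/to/agent.jar.
 */
final class GreeterAgent {
    // Classes the transformer has matched, kept so the behavior is easy to check.
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    // Called by the JVM before main() when the agent is loaded via -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer("Greeter"));
    }

    static final class LoggingTransformer implements ClassFileTransformer {
        private final String simpleName;

        LoggingTransformer(String simpleName) {
            this.simpleName = simpleName;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // className is in internal form ("com/example/Greeter") and may be null.
            if (className != null
                    && (className.equals(simpleName) || className.endsWith("/" + simpleName))) {
                SEEN.add(className);
                System.err.println("Loading " + className + " (" + classfileBuffer.length + " bytes)");
            }
            return null; // null tells the JVM to keep the original class bytes
        }
    }
}
```

Returning null rather than a copy of classfileBuffer is the conventional way to say "no change", which keeps the agent cheap for the thousands of classes it doesn't care about.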

Aapt2: Tower of Babel

OK, we'd fixed a bug that brought us back to our baseline build speed with aapt2. But could it be made even faster? This is the final part of a three-part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the intro bit and get more context here. The second post explained a small fix that resulted in a nice performance win. Big Android apps like the hypothetical ones from that hypothetical company that I hypothetically work for tend to contain a lot of strings that are translated into hypothetical languages (er, maybe I didn't need that last hypothetical). However, during the compile-run cycle, most developers are usually working with a single language. This is not to understate the importance of testing with a variety of languages, but it's reasonable and normal to restrict things somewhat in dev builds for developer efficiency reasons. There are really a lot of strings in a lot of languages in some of these hypothetica
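The post's actual change is cut off here, so purely as an illustration of "restrict things in dev builds", here's a hypothetical Java helper that decides whether a res/ subdirectory should survive a single-language dev build. The DevLocaleFilter name and its rules are my assumptions: it recognizes only two-letter language qualifiers (values-fr, values-en-rGB) and BCP 47 style ones (values-b+sr+Latn), and relies on Android's qualifier ordering, where MCC/MNC qualifiers come before the language.

```java
/**
 * Hypothetical helper: should a res/ subdirectory be kept in a dev build that only
 * targets one language? Assumes Android's qualifier order (MCC/MNC, then language).
 */
final class DevLocaleFilter {
    static boolean keep(String dirName, String language) {
        String[] parts = dirName.split("-");
        int i = 1; // parts[0] is the resource type, e.g. "values" or "drawable"
        // Skip mobile country/network code qualifiers, which precede the language.
        while (i < parts.length && (parts[i].startsWith("mcc") || parts[i].startsWith("mnc"))) {
            i++;
        }
        if (i >= parts.length) {
            return true; // no further qualifiers: not language-specific
        }
        String qualifier = parts[i];
        String qualifierLanguage;
        if (qualifier.matches("[a-z]{2}")) {
            qualifierLanguage = qualifier; // e.g. "fr" in values-fr-rCA
        } else if (qualifier.startsWith("b+")) {
            qualifierLanguage = qualifier.substring(2).split("\\+")[0]; // e.g. "sr" in b+sr+Latn
        } else {
            return true; // e.g. values-night, drawable-hdpi: not a language qualifier
        }
        return qualifierLanguage.equals(language);
    }
}
```

Unqualified directories are always kept because they're the fallback when no locale-specific resource matches; dropping them would break lookups rather than just trimming translations.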

Aapt2: Please don't delete me!

By making one weird change in aapt2, we sped up our build by 45 seconds. Developers love that stuff. This is the second in a three-part series about adventures with aapt2, Android's resource compiler / optimizer. You can read the first bit and get more context here. Proguard is an optimizer that many Android apps use. It can do nifty things like removing unused code and resources, inlining things that have no real reason to be in separate methods, and even obfuscating symbols so you can pretend like nobody will ever be able to figure out what your clever code is doing. In modern Android, the r8 shrinker has a similar function, and is driven by Proguard configuration files. However, r8 / Proguard can't always figure out if something is used, and sometimes optimizes more aggressively than you'd like. Configuration directives can be used to tell it to keep things that would otherwise be removed. aapt2 has options that let it emit configuration files for resource
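As a rough illustration of why resource-derived keep rules exist: a custom view named in a layout XML file is instantiated reflectively at inflation time, so the shrinker can't see its constructor being called. Here's a hypothetical Java generator that emits one Proguard -keep rule per referenced class. The KeepRules name, the one-file-per-class input map, and the comment format are my assumptions and not aapt2's actual output; <init>(...) is standard Proguard syntax for "any constructor".

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical generator of keep rules for classes referenced from resource files. */
final class KeepRules {
    /** classToReferencingFile maps a fully qualified class name to one file that mentions it. */
    static String emit(Map<String, String> classToReferencingFile) {
        StringBuilder sb = new StringBuilder();
        // Sort by class name so the output is stable from run to run.
        for (Map.Entry<String, String> e : new TreeMap<>(classToReferencingFile).entrySet()) {
            sb.append("# Referenced at ").append(e.getValue()).append('\n');
            sb.append("-keep class ").append(e.getKey()).append(" { <init>(...); }\n");
        }
        return sb.toString();
    }
}
```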

How I learned to love aapt2

The Android Asset Packaging Tool (aapt2) takes all those lovely resources (images, strings, cat pictures, and whatnot) in your Android app, and compiles them into a binary format that the runtime understands. It's also the thing that generates numerical identifiers and constants for them in , which is the class you use to refer to resources in code. You should rarely have reason to interact with aapt2 directly, since for most Android developers, it's something that happens automatically during a build with Gradle, or your build system of choice (e.g. Bazel, or in our case Buck). Suffice it to say, if you're interacting directly with this tool, you're either doing seriously hardcore interesting things, or you're maybe working on build / developer infra or something like that (we're hiring!). aapt2 operates in two phases; in the compile phase, it converts individual resources into a binary representation (either a .flat file or a .flata, which is j
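The numerical identifiers mentioned above are 32-bit values laid out as 0xPPTTEEEE: an 8-bit package ID (0x7f for the app itself, 0x01 for the Android framework), an 8-bit type ID, and a 16-bit entry index. This small Java helper splits an ID into those parts; the ResourceIds name and the example IDs are mine, and which resource type a given type ID denotes is decided per build, so don't read meaning into 0x0a.

```java
/** Hypothetical helper that decodes a resource ID of the form 0xPPTTEEEE. */
final class ResourceIds {
    static int packageId(int resId) { return (resId >>> 24) & 0xFF; } // PP
    static int typeId(int resId) { return (resId >>> 16) & 0xFF; }    // TT
    static int entryId(int resId) { return resId & 0xFFFF; }          // EEEE

    static String describe(int resId) {
        return String.format("package=0x%02x type=0x%02x entry=0x%04x",
                packageId(resId), typeId(resId), entryId(resId));
    }
}
```

For example, ResourceIds.describe(0x7f0a0001) yields "package=0x7f type=0x0a entry=0x0001".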

Dynamic Method Tracing in Java: The Implementation

In the last blog entry, I talked about the need for a tool in Java that can be configured easily to log method calls without redeploying a binary, attaching a debugger, or obtaining root in order to trace system calls. In this blog, I'll dive into some of the details of how the tool was implemented. A caveat: I knocked this tool together in my spare time one afternoon while working on a bunch of other things, so it's a bit rough around the edges. I'm also not a particularly experienced C programmer, so apologies if my code isn't idiomatic. However, it's already proven useful to me, and hopefully either the tool itself or the approach will be useful to others. When doing research, I found a general lack of detailed information about using JVMTI. OnLoad: Config, capabilities, events, and callbacks The main entrypoint to a native JVMTI agent is the Agent_OnLoad function. I want to do three main things when my agent is loaded: Load the configuration file so we know which breakpoints to s
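The agent itself does its config loading in C, but to illustrate the step, here's a Java sketch assuming a made-up format of one "fully.qualified.ClassName methodName" pair per line, with # comment lines. The TraceConfig and Breakpoint names and the format are hypothetical. Class names are converted to the JNI-style signatures (like Lcom/example/Foo;) that JVMTI's GetClassSignature reports, so a ClassPrepare callback could compare a freshly prepared class against the list directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java rendering of "load the config so we know which breakpoints to set". */
final class TraceConfig {
    static final class Breakpoint {
        final String classSignature; // JNI-style, e.g. "Lcom/example/Foo;"
        final String methodName;

        Breakpoint(String classSignature, String methodName) {
            this.classSignature = classSignature;
            this.methodName = methodName;
        }
    }

    /** Parses lines of the assumed form "com.example.Foo methodName"; '#' starts a comment line. */
    static List<Breakpoint> parse(List<String> lines) {
        List<Breakpoint> result = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'ClassName methodName': " + raw);
            }
            result.add(new Breakpoint(toSignature(parts[0]), parts[1]));
        }
        return result;
    }

    /** "com.example.Outer$Inner" becomes "Lcom/example/Outer$Inner;". */
    static String toSignature(String binaryName) {
        return "L" + binaryName.replace('.', '/') + ";";
    }
}
```

Failing loudly on a malformed line seems preferable here, since a silently skipped entry would look like "the method was never called".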

Dynamic method tracing in Java

Are you a printf debugger? I certainly am. IDE-based debuggers (or tools like jdb / gdb) have their place, and I use them a lot too. But I often want to know when and why some function is called in contexts where attaching a debugger isn't an option. As one example, a lot of people at my current company use IntelliJ / Android Studio. Like other large companies, we have a gigantic monorepo. Understanding how and why the IDE accesses the filesystem can help us understand performance problems. Everyone experiences slightly different behavior with the filesystem depending on what has been changed, and what's being cached. I'd like to turn on verbose logging of filesystem operations remotely for particular users so we can diagnose issues they're having, and I want to do this in a way that doesn't degrade performance. Just add logging (or println!) This is often a simple and good choice, although it often leads to this kind of thing: public void doSo
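The snippet is cut off, but for a sense of what hand-added tracing tends to look like, here's a hypothetical Java example using java.util.logging. The VirtualFileCache class is made up and its "I/O" is a stand-in. The isLoggable guard keeps the disabled case cheap, but every traced method needs its own copy of the boilerplate, and you can only ever trace the places someone thought to instrument ahead of time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical class illustrating hand-added, guarded trace logging. */
final class VirtualFileCache {
    private static final Logger LOG = Logger.getLogger(VirtualFileCache.class.getName());
    private final Map<String, byte[]> cache = new HashMap<>();

    byte[] load(String path) {
        // The guard avoids building the message when FINE logging is off,
        // but each traced method needs a copy of this boilerplate.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("load(" + path + "), cached=" + cache.containsKey(path));
        }
        return cache.computeIfAbsent(path, p -> new byte[0]); // stand-in for real file I/O
    }
}
```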

Hello again :)

After a long absence, I'm planning to blog a bit more actively. This is partly motivated by the fact that I've been working on some quite fun, gnarly stuff recently, and I think it'd be useful to me to get back into the habit of writing more about the things I'm futzing around with. Quite a lot has changed since I last wrote in this blog eleven and a half years ago (oh, wow), in the world, and in my own life. The world seems exceptionally messed up in a variety of ways, and I wish it were better, but I sort of try to be eternally optimistic about things and hope that we're going through a tough patch and we'll all pull together and make things better (and I'm not just talking about Coronavirus, although that's pretty terrible in its own right). In my own life, since I last wrote, I got married to the most amazing person in the world, grew two cheeky and rambunctious kids who are lovely and keep me terribly busy, and manage to surprise and fascinate m