Little 418


The resulting entity body MAY be short and stout

Moving out of Amazon Drive

tl;dr:

I’m moving my data out of Amazon Drive (formerly known as Amazon Cloud Drive). I have 4TB across 1,000,000 files. I’ve struggled to download my data, but I found some tricks to make it easier. Here they are in a handy listicle.

More detail

Every couple of years, someone announces an unlimited capacity cloud storage product targeted at consumers. Then, inevitably, a few jerks with multiple terabytes of data swoop in and ruin the deal for everyone.

I’m one of those jerks. I’ve migrated 4TB of data from one consumer cloud storage provider to another over the course of several years.

With Amazon raising its prices significantly, I set myself a challenge: download all of my data using only the official sync client. This blog entry describes the lessons I learned from that process.

Tip 1: Sync to a removable disk with a symlink

When you install Amazon Drive, you can select a folder to use for synchronization. On macOS the default is ~/Amazon Drive, and the setup configurator prevents you from pointing it to a removable disk. This is a bummer because modern computers have smaller, faster boot disks. None of my computers have a boot disk bigger than 1TB.

Changing the configuration file in ~/Library/Application Support/Amazon Cloud Drive seems to make the sync client angry, but there is another way: symbolic links.

# Dangerously stop the sync client with this shell-fu, or just quit from the menu
$ pkill -f 'Amazon Drive'
# Delete the old target (keep the tilde outside the quotes so the shell expands it)
$ rm -rf ~/'Amazon Drive'
# Swap in your removable storage
$ ln -s /Volumes/4tb/ ~/'Amazon Drive'

The client is happy to sync down to a removable disk behind a symlink, but I have no idea what will happen if the disk is removed while the client is working (I found out… it purges all client metadata, and you have to start over). So, don’t do that.
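The client only behaves while the disk is attached, so a small pre-flight check before launching it might save you a full resync. This is a minimal sketch and nothing the official client provides. `check_sync_target` is a hypothetical helper. It assumes that an unplugged external disk leaves the symlink dangling, because macOS normally removes the `/Volumes/<name>` mount point on eject. It won't help if the disk disappears mid-sync.

```shell
#!/bin/sh
# check_sync_target LINK (hypothetical helper): succeed only if LINK is a
# symlink whose target directory currently exists. On macOS an ejected
# volume's /Volumes/<name> mount point normally disappears, so a dangling
# link usually means the disk is not attached.
check_sync_target() {
  link=$1
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  # test -d follows the symlink, so this fails when the target is missing.
  if [ ! -d "$link" ]; then
    echo "sync target missing, is the disk mounted? ($link)" >&2
    return 1
  fi
  return 0
}
```

Usage might look like `check_sync_target "$HOME/Amazon Drive" && open -a 'Amazon Drive'`. This assumes the app is installed under the name "Amazon Drive".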

Tip 2: Use an SSD boot disk

tl;dr:

Your sync host computer must have an SSD boot disk. The sync client has very poor performance when managing metadata on spinning disks.

Details

My primary computer is a laptop that I often carry with me, so it’s frequently disconnected from the Internet and not a great sync host.

I had this brilliant idea of dusting off an old Mac mini from 2011, putting the Amazon Drive sync client on it, and letting it churn away for a few days to recover all of my data.

This did not work. First, Amazon Drive spent two days Preparing. After that, file synchronization proceeded at about 10 files per minute, regardless of their size. There were a few spikes of CPU and network usage, but nothing that explained the glacial pace. At this pace, it would not finish until early October.
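Here is a rough check of that estimate. It assumes roughly 1,000,000 files (the figure from the tl;dr) and a steady 10 files per minute:

```latex
\frac{1{,}000{,}000\ \text{files}}{10\ \text{files/min}}
  = 100{,}000\ \text{min}
  \approx 1{,}667\ \text{h}
  \approx 69\ \text{days}
```

That's roughly ten weeks of nonstop syncing, on top of the two days spent Preparing.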

I did what any engineer would do, and whipped out dtrace. A little probing found the problem. The sync client was doing a staggering number of tiny, scattered I/O operations. This probably has something to do with their heavy use of SQLite. Check this out:

~/Library/Application Support/Amazon Cloud Drive$ ls -l
-rw-r--r--   1 mim  eng  758280192 Jul 31 00:58 amzn1.account.MSSM74Z-cloud.db
-rw-r--r--   1 mim  eng      32768 Jul 31 12:00 amzn1.account.MSSM74Z-cloud.db-shm
-rw-r--r--   1 mim  eng  212966952 Jul 31 14:55 amzn1.account.MSSM74Z-cloud.db-wal
-rw-r--r--   1 mim  eng       4096 May 28 14:24 amzn1.account.MSSM74Z-download.db
-rw-r--r--   1 mim  eng      32768 Jul 31 12:00 amzn1.account.MSSM74Z-download.db-shm
-rw-r--r--   1 mim  eng    2171272 Jul 31 14:00 amzn1.account.MSSM74Z-download.db-wal
-rw-r--r--   1 mim  eng        129 May 28 14:25 amzn1.account.MSSM74Z-settings.json
-rw-r--r--   1 mim  eng   81358848 Jul 31 14:56 amzn1.account.MSSM74Z-sync.db
-rw-r--r--   1 mim  eng      65536 Jul 31 14:31 amzn1.account.MSSM74Z-sync.db-shm
-rw-r--r--   1 mim  eng   44982192 Jul 31 14:56 amzn1.account.MSSM74Z-sync.db-wal
-rw-r--r--   1 mim  eng       4096 May 28 14:24 amzn1.account.MSSM74Z-uploads.db
-rw-r--r--   1 mim  eng      32768 Jul 31 12:00 amzn1.account.MSSM74Z-uploads.db-shm
-rw-r--r--   1 mim  eng    2171272 Jul 31 14:00 amzn1.account.MSSM74Z-uploads.db-wal
-rw-r--r--   1 mim  eng        352 Jul 31 13:01 app-settings.json
-rw-r--r--   1 mim  eng        368 May 28 14:24 refresh-token
-rw-r--r--   1 mim  eng         32 May 28 14:23 serial-number
~/Library/Application Support/Amazon Cloud Drive$ sqlite3 amzn1.account.MSSM74Z-cloud.db 'select count(*) from nodes;'
1077668
~/Library/Application Support/Amazon Cloud Drive$

Yeah, that’s over a gigabyte of SQLite databases! Some tables have more than a million records. Count queries take a few seconds, and toggling an option in the client can sometimes trigger millions of SQLite queries across multiple databases. This had the read head of my spinning disk thrashing back and forth. Fortunately, random access penalties are much lower on SSDs.
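To reproduce the size check on your own machine, a sketch like the following sums the bytes of every SQLite database along with its `-wal` and `-shm` companion files. `sqlite_footprint` is a hypothetical helper, and its default directory is the one from the listing above. For reference, the twelve database files in that listing add up to 1,102,102,760 bytes, or about 1.1 GB.

```shell
#!/bin/sh
# sqlite_footprint [DIR] (hypothetical helper): print the combined size in
# bytes of the SQLite databases (*.db) and their write-ahead log (*.db-wal)
# and shared-memory (*.db-shm) files in DIR.
sqlite_footprint() {
  dir=${1:-"$HOME/Library/Application Support/Amazon Cloud Drive"}
  # cat streams every matching file; errors from unmatched globs are
  # discarded, and tr strips the padding that BSD wc adds.
  cat "$dir"/*.db "$dir"/*.db-wal "$dir"/*.db-shm 2>/dev/null |
    wc -c | tr -d ' '
}
```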

Tip 3: Take smaller bites

The client is more stable when attempting to sync fewer files in one batch. Sync at most 100,000 files at a time, allow it to finish, and then sync another batch.

If you try to sync too many files at once, the client gets CPU and memory hungry, slows down, and becomes unstable. If the sync request is over 1,000,000 files, the client may start crashing on launch. Once this happens, you must delete the SQLite databases, and start over.
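One way to plan those batches is a greedy pass that starts a new batch whenever the next folder would push the running total past the limit. This sketch assumes you already have a per-folder file count, for example noted down by hand from the web interface. `plan_batches` and its `count folder` input format are hypothetical. A single folder larger than the limit still lands in a batch of its own and would need to be split by hand.

```shell
#!/bin/sh
# plan_batches [LIMIT] (hypothetical helper): read "file_count folder" lines
# on stdin and greedily assign folders to batches of at most LIMIT files
# (default 100000). Prints "batch N<TAB>count<TAB>folder" per folder.
plan_batches() {
  awk -v limit="${1:-100000}" '
    NF == 0 { next }                      # skip blank lines
    {
      count = $1
      # Folder names may contain spaces: keep everything after the count.
      sub(/^[ \t]*[0-9]+[ \t]+/, "")
      # Start a new batch if this folder would overflow the current one.
      if (total > 0 && total + count > limit) { batch++; total = 0 }
      total += count
      printf "batch %d\t%d\t%s\n", batch + 1, count, $0
    }'
}
```

For example, `printf '60000 Photos\n50000 Music\n30000 Documents\n' | plan_batches` puts Photos in batch 1 and Music and Documents together in batch 2.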

Tip 4: Don’t copy files into the sync path

Don’t copy files into the sync client’s target path. That means no trying to help it along by copying in previous partial download attempts. Let the client sync every file down itself.

Copying files into the sync path confused my sync client, and it deleted a bunch of stuff from Amazon Drive. If you suspect this happened, don’t panic. You have a few days to restore files from the web interface: sign in, navigate to the trash, and restore deleted files from there.

Conclusion

At this pace, I’ll be able to download all of my data before the new rates kick in for me. Yay!

In retrospect, I should have written my own sync client against the API, or tried to get the possibly banned rclone client working. Still, I did enjoy the adventure of exploring how the sync client works.

With this migration wrapping up, I’ve given up on consumer cloud storage products. They’re too painful to use for large volumes of data. It’s time to switch to an enterprise storage product so I can use real APIs to move data around, and benefit from SLAs and deprecation policies.

Update

I shared this post around, and got some great feedback on r/DataHoarder, the subreddit for people who laugh at my meager 4TB of accumulated data.

Here are their proposed solutions:

  • The Syncovery client supports Amazon Drive. The interface is a bit complicated, but it actually works! I was able to slurp my data down using the trial, and plan to purchase a real license next time I need to cart my data around.
  • The Amazon Cloud client runs on Windows Server. So, if the final home of my data is Google Cloud Storage, I could run it on a Windows virtual machine.

Thanks for the advice Redditors! :)


Windows XP at DEFCON: Preparation

I’m on my way to my first ever DEFCON. It’s a very popular hacker / cybersecurity conference in Las Vegas that some people compare to Burning Man for hackers. I’m super excited.

Over the years I’ve received advice about how to survive the event without getting pwned. The advice ranges from “don’t use the open wifi with your work laptop” to “don’t bring any electronics at all.”

But, I’m a fool. So instead, I’m bringing the least secure computer possible to see how deep pwnage can go. Then, after the event, I’ll try to figure out what happened.

Laptop & OS selection

You’d think this would be as easy as finding a Lappy 486 at a thrift store and slapping Windows ME on it, but you’d be wrong. Laptops that old can’t connect to modern wifi, and might not even be able to mount malicious USB devices. I’d probably go the whole conference without getting hacked, and that’s unacceptable.

I need vulnerabilities that are still in the wild, and a computer that is capable of connecting to modern wifi networks. Specifically, I need a computer that meets these requirements:

  • Vulnerable to attacks that people still remember
  • Capable of connecting to modern wifi and USB drives
  • Swappable boot disk, in case it becomes unusable in the middle of a session
  • Cheap enough that I won’t feel bad if someone manages to kill it entirely

Laptop Fail: Ideapad U110

I discussed this with my helpful coworkers. One of them had recently found a laptop in a recycling heap: a Lenovo Ideapad U110. It seemed to meet most of my requirements, and had a bonus feminine red case.

I started adjusting it to fit my needs, but quickly encountered some issues. The boot disk was connected with a delicate, proprietary ZIF40 cable, and required partial disassembly to access. I tried to repurpose some widely available iPod-hacking components to switch it to CompactFlash, but I only ended up learning way too much about the challenges of booting Windows XP from CompactFlash.

After hours of hacking, I coerced Windows XP into installing and booting off a comically long chain of adapters: ZIF40 to CompactFlash to SD to MicroSD. Unsurprisingly, it was slow and flaky. It booted about 30% of the time.

I decided my time was better spent coming up with easter eggs, so I cleaned up the U110, and moved on.

Laptop Win: ThinkPad X220

A couple of hours of research turned up the ThinkPad X220. It met all of my minimum requirements, and was a respectable laptop in its day. I snagged one in great shape for under $100 on eBay.

I picked up some 16GB SATA SSDs that fit great once I removed some of the drive casing. Installing Windows XP on this laptop went smoothly, and drivers were pretty easy to find from official sources.

The only challenge I encountered was around programmable chips on the main board. I know that the BIOS, WiFi module, and several other components could conceivably be reprogrammed, but I was unable to figure out how to dump them, or even get a checksum. All I could find was a mysterious CD-ROM bootable BIOS flasher. Theoretically, this should help me recover from a compromised BIOS, but reverse engineering it to get a checksum was one too many yaks to shave.

All the ThinkPad needed was some Windows XP era tech stickers, and it was ready to go!

Easter eggs

With the laptop usable, I had to make it worth exploring to my hacker friends. I registered 5 new email accounts, and hid them in various places about the laptop. Some are pretty easy to find, others are hidden behind riddles that I’m not sure anyone will solve.

Hopefully, someone will find some of them and drop me an email to say hi :)

The DEFCON plan

Here’s the full project kit. It includes:

  • An ethernet cable for plugging into suspicious ports
  • A big USB disk, for saving images of compromised boot disks
  • The ThinkPad X220, dubbed “Not a honeypot” by my friends
  • An LTE wifi hotspot, for slightly less corrupt Internet access
  • A Windows XP install CD and USB CD-ROM, just in case all of my boot disks get trashed
  • 3 extra boot disks for easy swapping on the go
  • Kali Linux on a bootable USB stick, for imaging compromised boot disks
  • A drive duplicator, for restoring clean Windows XP installations
  • A USB SATA reader with write blocker, for capturing boot disk images

Once I make it to DEFCON, I plan to engage the conference community through a variety of channels:

  • If I can connect to a wifi SSID, I’ll connect and try to surf the web
  • If I see a USB thing, I’ll plug it in and see what happens
  • If there’s an ethernet port, I’ll connect to that too

If the laptop becomes unusable due to excessive pwnage, I’ll swap the boot drive and repeat. Whenever I get a break I’ll save an image of the compromised disk and restore it to a clean Windows XP installation.
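The save-an-image step can be sketched as a `dd` copy plus a SHA-256 check, run from the Kali Linux stick with the disk behind the write blocker. `image_disk` is a hypothetical helper. Double-check the device name with `lsblk` before running anything, because `dd` will happily overwrite the wrong disk if the arguments are swapped.

```shell
#!/bin/sh
# image_disk SOURCE IMAGE (hypothetical helper): copy SOURCE (a write-blocked
# block device, or any file) to IMAGE, verify the copy by SHA-256, and store
# the checksum next to the image as IMAGE.sha256 for later verification.
image_disk() {
  src=$1
  img=$2
  # Copy in 1 MiB blocks; a numeric size avoids GNU/BSD suffix differences.
  dd if="$src" of="$img" bs=1048576 2>/dev/null || return 1
  src_sum=$(sha256sum < "$src" | cut -d' ' -f1)
  img_sum=$(sha256sum < "$img" | cut -d' ' -f1)
  if [ "$src_sum" != "$img_sum" ]; then
    echo "checksum mismatch: $src vs $img" >&2
    return 1
  fi
  # Same "hash  filename" format that sha256sum -c expects.
  printf '%s  %s\n' "$img_sum" "$img" > "$img.sha256"
}
```

Usage would be something like `image_disk /dev/sdX defcon-day1.img` as root, where `/dev/sdX` is a placeholder for the write-blocked disk. Later, `sha256sum -c defcon-day1.img.sha256` confirms that the image hasn't changed.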

Hopes and Fears

I hope that I’ll encounter a variety of interesting attacks that will be educational and enjoyable to investigate for weeks to come. I hope I make some new friends who reach out to me on the planted email addresses. But projects like this one rarely go as planned.

It’s possible that every time I boot up, my laptop immediately stops working, and every time it’s for the same boring reason. That’d be a bummer, but I can deal with that by actually patching some of the vulnerabilities or running a different operating system.

It’s also possible that nothing happens. If I make it a day without any visible intrusion, I’m going to step it up a notch. I’ll fire up a bunch of known insecure services like obsolete versions of IIS, MySQL, and WordPress.

In any case, I’m going to find out soon. Wish me luck, and check back later for an update on the aftermath.


Personal Information Spring Cleaning: Connected Apps & Sessions

This is the second entry in a series about Spring Cleaning for your Internet-connected 21st century lifestyle. Confused? Check out the first entry about authentication and passwords.

This entry is all about connected apps and sessions.

Connected apps / OAuth tokens

You know when you log in somewhere with Facebook, or click on a button that gives a website access to your profile photo on Google? That’s called a connected app, or in techie jargon OAuth.

OAuth is a wonderful protocol. It gives you the power to selectively share your data between websites and mobile apps.
But that convenience comes with a wrinkle of complexity: it’s easy to forget about all the apps you’ve connected. To make matters worse, some of these app connections will keep working after you reset your password.

So, take some time and review all of your connected apps. Don’t recognize something? Disconnect / revoke / remove it.

Here’s a list of places you might have connected apps:

If you have more than one account at any of these providers, don’t forget to check each one.

Browser sessions & Devices

Some online services keep a memory of previously authenticated web browsers and mobile apps. This phenomenon goes by many names, including trusted computers, sessions, devices, and recognized browsers.

Trusted devices and browsers have easier access to your account. Sometimes they bypass multi-factor auth, sometimes they are completely trusted. In any case, they’re another gateway to your account that you should keep tidy.

  • Apple iCloud - Listed as ‘My Devices’ and ‘Sign Out Of All Browsers’
  • Apple ID - Listed as ‘My Devices’ and may list different devices than iCloud
  • Dropbox - Listed as ‘Session’ and ‘Devices’
  • Facebook - Listed as ‘Where you’re logged in’
  • GitHub - Listed under ‘Sessions’
  • Google - Listed as ‘Recently used devices’
  • Twitter - Listed as ‘Your devices’

App Passwords

App passwords are a weird consequence of multi-factor authentication. They are passwords that allow you to sign in from applications that do not support multi-factor auth, even if your account is set to require it for all access. This may seem silly, but you might have used one for an old desktop email client. They typically give full, unrestricted access to your account, so it’s important to keep them tidy too.

  • Apple ID - Listed under ‘App-specific passwords’
  • Google
  • GitHub - Listed under ‘Personal Access Tokens’

SSH Keys

While you’re in there, go ahead and clean out any old or unused SSH keys too. These are less common, but if one gets out, bad things can happen. You may have accidentally leaked one when you sold an old laptop on Craigslist.

As a general practice, never reuse SSH keys. Generate a new one for each device / service connection.

  • GitHub
  • Any servers you used SSH to log into

Conclusion

There are more ways into your accounts than just your password. Keep them tidy too!

Did I miss any services that you use? If so, please tell me on Twitter.

In the next blog entry, I’ll move away from authentication (as exciting as it is), and into a more meaty part of the series: almost forgotten social media posts and other uploaded data. I promise an exciting journey down memory lane.