As of January 2014, it’s not worth bothering with the “s3fs” software for mounting Amazon S3 buckets on your local filesystem.
The idea of s3fs is simple and great: use FUSE (Filesystem in Userspace) to “mount” the S3 bucket the same way you’d mount, say, an NFS drive or a partition of a disk. Manipulate the files, let s3fs sync it in the background. Sure, you lose some reliability, but we’ve had NFS and SMB and all kinds of somewhat-latent-over-an-unreliable-link-but-mostly-with-filesystem-semantics software for decades now, right?
Well, forget it. s3fs as of January 2014, used on Mac OS X and against an existing set of buckets, is so utterly unreliable as to be useless.
First, s3fs cannot “see” existing folders. This is because folders are a bit of a hack on S3 and weren’t done in a standardized, documented way when s3fs was first written. Since then, at least two other ways of creating folders on S3 have gained currency: an older, deprecated way with the Firefox plugin S3Fox, and a newer de facto standard way with Amazon’s own management dashboard/browser for S3. Whatever the historical reason, you can’t see the existing folders.
Second, mailing list posts claim that you can *create* an s3fs folder with the same name as your existing folder, and then its contents will magically become visible. Empirically, something rather different happens. A mkdir on the s3fs mount leads to the creation of a mangled *regular* file on the S3 dashboard. Now you have two “folders,” each of which is unusable as a folder on the other (S3 dashboard, or s3fs) system. Argh.
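To make the “folders are a hack” point concrete, here is a minimal sketch of how a tool can infer folders from nothing but a flat list of key names, which is roughly what the S3 list API's prefix and delimiter parameters let a client do. The function name list_dir, the sample keys, and the zero-byte "photos/" marker object are all illustrative assumptions on my part; this is not how s3fs itself is implemented.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Return (subfolders, files) directly under `prefix`, inferred from key names alone."""
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself (e.g. a hypothetical "photos/" marker object).
            continue
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # Anything up to the next delimiter counts as a "folder".
            folders.add(head + delimiter)
        else:
            files.append(rest)
    return sorted(folders), sorted(files)


# Hypothetical bucket contents: S3 stores a flat set of keys, not a tree.
keys = ["readme.txt", "photos/", "photos/2013/a.jpg", "photos/b.jpg", "logs/jan.log"]
print(list_dir(keys))
print(list_dir(keys, "photos/"))
```

Note that "logs/" shows up as a folder even though no marker object exists for it. A tool that only recognizes its own marker convention, instead of inferring folders from prefixes like this, will miss folders created by anything else.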
Finally, you might say, ok, fine, this will just make me use flat-level, non-folder-nested choices about my S3 architecture. (Leave aside for the moment that the very reason you want to use S3 is probably exactly so that you can have lots and lots and lots and lots (like 10^8+) files in a way that will cripple any reasonable filesystem tools that see them all in one “directory.”) However, even that doesn’t work reliably, as s3fs demonstrated today when it went into “write-only mode” such that I could create files locally that would show up on S3 but that subsequently would disappear from my local filesystem. WTF?!?
The unfortunate answer is: S3 is not a filesystem, and it was created by people who are smarter than you, and who have very craftily calculated that if you are forced to weave the S3 API and its limitations into your application code, you will have a damn hard time ripping it out of your infrastructure, and so they are going to have you do just that. They do not want it to be used as a filesystem, and so guess what: you are *not* going to use it as a filesystem. Not gonna happen.
Say what you will, but our hometown heroes here in Seattle are no dummies. Embrace, extend, extinguish, baby. Not just for OS companies anymore…
(Yes, I know that s3fs is not an Amazon project. But it appears to be the community’s best attempt to put filesystem semantics around S3, and that attempt has been rejected by AWS’s immune system.)
This is untrue. It sounds to me as though you are having a problem with Unix file/folder ownership and permissions on your S3FS mountpoint and folders recursive to that mountpoint.
Rather than claiming S3FS is garbage, why not correct your mountpoint and folder permissions so it works?
S3FS has worked, and continues to work (April 2014) outstandingly well for communicating with S3 from our web servers to allow our web applications to store files directly to S3 buckets, and we deal with terabytes of data.
Robbie, I appreciate the comment. I have edited the title of the post to add “… on Mac OS X,” because that is the environment that was giving me trouble.
There are a host of issues, however, including both OS X specific as well as platform general issues:
– s3fs is no longer s3fs, but rather s3fs-fuse. The Google SERPs for this are sort of confusing, as is the web presence of the project. There’s the old homepage, the wiki page, and then github, all of which interlink, and use subtly different terminology. Not something that adds a lot of confidence when it takes 5-10 minutes to unambiguously sort out which is the canonical, current version and what it is called.
– Currently (20 May 2014) the latest release of s3fs-fuse (1.77? hard to say because it looks like it’s now a “just pull from github” kind of project) requires FUSE >= 2.8.4. However, OSXFuse (formerly Fuse4x, formerly MacFUSE) is at version 2.6.4. Fun!
– My platform-agnostic point about how AWS strategists are smarter than you/me still holds.
That said, I’m glad it works for you. It seems like a pretty Linux-native approach and I bet it can be made to work very well on Linux. If someone can figger me out how to get ‘er done on OS X (stably, repeatably, scriptably, you know, *right*) I’d be happy, but my confidence in that is just a weeeee bit shy of 0% right now.
Hmm, so here we are in June 2014. Have you found a workaround?
Well, it’s only been a month and a half!
I have used “Transmit,” a Mac OS X proprietary application, with some success. It still fails pretty hard occasionally, especially when using an s3 bucket mounted at /Volumes/* from the command-line using standard unix tools (cp, or even rsync) over a sometimes-flaky wireless connection; it seems like more than two or three dropped packets throws it into a tizzy.
Transmit was, however, able reliably to get large (~ gigabyte) files transferred to s3 over that same flaky connection when using the Transmit GUI. Whether this is a sampling anomaly or whether Transmit’s GUI uses a different mechanism with fallback capability, I do not know.
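For what it's worth, the kind of “fallback capability” I have in mind is something like retrying with exponential backoff. The sketch below is purely illustrative: I have no idea whether the GUI does anything like this, and with_retries, flaky_upload, and the delay values are hypothetical names and numbers invented for the example.

```python
import time


def with_retries(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on OSError, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 0.5s, 1s, 2s, ... before trying again.
            sleep(base_delay * (2 ** attempt))


# Hypothetical flaky upload: fails twice, then succeeds.
calls = {"n": 0}


def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        # ConnectionError is a subclass of OSError, so it gets retried.
        raise ConnectionError("simulated dropped connection")
    return "uploaded"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_upload, sleep=delays.append)
print(result, delays)
```

A plain cp onto a mounted volume has no such layer of its own, which would be one possible explanation for the difference I saw, though only a guess.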
Good luck!
Don’t understand the ferocity of this rant; s3-fuse has been working great for us.
Tim, thank you. Under what circumstances has it worked? OS X? Versions? Clean-room to start with s3fs or integrating with legacy? In “lab” or “data center” or “real world?”
For me, with OS X, and in circumstances that range from 0.00% to 0.50% packet loss, interop with legacy setup folders and schema in general, results are as described. (I took another college try at it in late May 2014, honest.)
So… It is not working on a platform where FUSE is more or less hacked on, and then you blame Amazon for issues that have nothing to do with their service?
This works great for most. The issue is on your end, either because of bad configuration, or compatibility issues with OSX.
This has nothing to do with any kind of “strategy”.
Dennis, you’re spot on with one point and off in the weeds with your other.
Yes, of course, the problem is on “my end” (rather than S3’s end) because of the combination of using OS X etc. and the desire to mount S3 as a mostly-filesystem-like device (including with legacy folders, multiple buckets, real-world TCP/IP drama, etc.). That’s why I wrote the post, to let those similarly situated to me know.
But you’re out to lunch on the Amazon “strategy” comment. I can’t help you if you really think that architectural, API, and delivery/pricing decisions of AWS aren’t “strategic.” Cross my heart, I promise it’s true.
Regardless, though, I’m not blaming Amazon. If anything, I’m lamenting my own shortsightedness in trying to make water run uphill. What I really wanted was some ultra-cheap, no-work version of a SMB mount or maybe a Dropbox folder; obviously, s3fs is not that, and one reason is that in turn S3 is not meant to be that.
Randall, if you are still looking for a POSIX filesystem backed by S3 for OS X (and don’t mind some always on strong encryption) I would be interested in your opinion on our ObjectiveFS.
Looks interesting. Does it work with the AWS S3 interface (or others like S3Fox) to download individual objects? (I rather doubt it could, given encryption etc., and that’s not meant to be a “gotcha” question. But if it does, that would be *really* interesting.)
Randall, you are right about the individual files/metadata not being visible in your S3 bucket due to encryption and also to allow partial updates of files. We also bundle updates to keep the number of S3 puts low even if you have lots of small files.
With our newest update you can use your own dedicated S3 bucket and we also added a 30-day free trial.
I would say that this solution does exactly what this article says is “impossible”
http://www.expandrive.com/docs/map-network-drive/connecting-to-amazon-s3
I haven’t tried it yet, but I will be exploring any solutions that exist for using S3 as a remote filesystem over the Internet. I would say that Randall Lucas has a lot of SysEng and SysAdmin background, enough to get by and look at Mac OS, Linux, FreeBSD, etc., internals.
But to say “They do not want it to be used as a filesystem, and so guess what: you are *not* going to use it as a filesystem. Not gonna happen.” is complete and utter conjecture. Do you know any engineers on the S3 team or Product Managers? How did you come to this conclusion?
Is it not possible that the AWS and S3 teams don’t care whether you use it as a filesystem or not? As long as the S3 service stays reliable, performant and durable, what you do on top of the S3 service at the OS and Application level is your business.
Obviously you would have to wrap the S3 API and Services with appropriate layers and manage Mac OS, Linux, etc. application code as a separate abstraction, but that is exactly what some third parties are doing.
If they didn’t want S3 to be used as a distributed filesystem, they would prohibit it explicitly in their S3 API SLAs.
Keep at it, Randall. I think S3 as the backend to a mounted, FUSE-like network drive is just around the corner. I really just want to use it as a longer term archive and backup.
Google Drive suits me very well for day to day storage, multi machine sync, etc. When I’m ready to throw things into archive or go retrieve something rare every month or so, that’s when I want to mount something like ExpanDrive (w/ Eleven 9’s durability), do my business and then unmount it when done. I believe that day is not too far off, 🙂
I do know Amazon product managers and engineers, though I cannot claim to have specific insider knowledge of S3 management and strategy.
However, my “complete and utter conjecture” about how very big and successful technology platform companies in Seattle work is not unfounded. Come back after you read the Findings of Fact in the Microsoft antitrust case and the Halloween memos.
There are people whose entire career is based on figuring out the specific strategic implications of technology product features and planning how best to push their or their clients’ advantage based on those implications. I guarantee you many of them work at Amazon. This is not an accusation or anything bad; it’s how technology companies work at two or three levels removed from where developers tend to think daily.
Nice lead on ExpanDrive; I have not looked at it.
I use ExpanDrive every day for accessing S3 as a file system and it works great and has for years.
Steve
Steve, thanks for the comment and the pointer to ExpanDrive. It looks a lot like Transmit from Panic, which I’ve been using happily for almost a year now (but happily, ExpanDrive looks cross platform).
I’m having some success with yas3fs on OSX El Capitan.
See https://github.com/danilop/yas3fs.git
I also had to install osxfuse via “brew install Caskroom/cask/osxfuse”
And yas3fs is written in python, but I already had that installed.