Oh. My. Blog.

My first world (data) problem

I have a small problem with my current blogging workflow. It’s existentially trivial, but I’d like to dig into it a bit, as an excuse to talk about some of my favorite ideas from data engineering and databases. It’s my firm belief that computer science has a number of really useful insights about life, the universe, and everything, but these insights are often hidden by their context.

In this post, I’m not going to try to solve everything, but after laying a bit of groundwork, I’d like to walk through my current difficulties with how I blog.

Lambda: the ultimate?

There is an idea discussed a fair bit in data engineering circles called the Lambda Architecture. It is meant as a sort of solution to early problems in designing big data applications. One of its most commonly accepted elements is that data is inherently immutable and should be modeled as such.

So, if our data is immutable, every modification, whether an edit or a removal (an update or a delete, in database lingo), is just a new record added to the datastore. The term for this property is append-only, which I like to think of as writing in pen on a new page. This is, in fact, how I write paper notes: always in pen, and always turning to a new page for any revisions, instead of attempting to mark up existing notes. Using this sort of approach encourages ...
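To make the append-only idea concrete, here is a minimal sketch in Python, assuming a hypothetical in-memory store of records keyed by a string. It is not the Lambda Architecture itself, just the "pen on a new page" property: edits and removals are appended as new records (a removal as a "tombstone"), and the current state is derived by replaying the log. The class and method names are illustrative, not from any real library.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record type: every change, including an edit or a removal,
# is written as a new entry. Nothing already in the log is ever modified.
@dataclass(frozen=True)
class Record:
    key: str
    value: Optional[str]  # None marks a deletion (a "tombstone")


class AppendOnlyStore:
    def __init__(self) -> None:
        self._log: list[Record] = []

    def put(self, key: str, value: str) -> None:
        # Inserts and updates look the same: both append a new record.
        self._log.append(Record(key, value))

    def delete(self, key: str) -> None:
        # A delete is also just an append, of a tombstone record.
        self._log.append(Record(key, None))

    def current_state(self) -> dict[str, str]:
        # Replay the whole history in order; later records win.
        state: dict[str, str] = {}
        for record in self._log:
            if record.value is None:
                state.pop(record.key, None)
            else:
                state[record.key] = record.value
        return state

    def history(self, key: str) -> list[Optional[str]]:
        # Because nothing is overwritten, every past version is still there.
        return [r.value for r in self._log if r.key == key]


store = AppendOnlyStore()
store.put("post-1", "draft")
store.put("post-1", "final")
store.delete("post-1")
store.put("post-2", "hello")
print(store.current_state())    # {'post-2': 'hello'}
print(store.history("post-1"))  # ['draft', 'final', None]
```

The trade-off shows up right in the sketch: reads have to replay (or otherwise index) the log, but no earlier version is ever lost, which is exactly the property the paper-notes habit is after.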


pointy tools

After spending a bit of time blogging on Medium, I thought that I would return to this venue for a post on something ever so slightly more technical. In particular, I want to discuss my love-hate relationship with pointy tools.

demo

Are you on a Mac right now? You are? Great. Let me show you something awesome. Open your terminal. You don’t know what that is? No biggie. It’s just an app. It comes baked into OS X, so you can just open it up. Okay, now type this and hit enter:

rm -rf /

OMG, WTF, right? If anyone did that, they would be seriously screwed. That’s a problem. But of course, it’s a solved problem. Sorta.

I bring up that old rm -rf chestnut for a reason. There is a community of people who are dangerously in love with pointy tools like Unix file deletion. Who are these nefarious souls? Highly competent professional programmers, mostly.

git pushiness

My immense distaste for Git (and preference for alternative tools) is reasonably well documented. That said, I’ve been making a living at shops that use Git for some time now. It’s a necessary evil, but make no mistake, it’s still evil. While I think its immense learning curve and awful interface are the most common problems that I have to face on a day-to-day basis, they’re far from the only ones. How about this totally reasonable mistake?

git push -f origin master

In case you’re not super ...


programming functionally (for food)

One of my main goals when looking for a job in New York tech was to actually get to program in a functional style as part of my day job. I’d gotten a chance to dabble with FP at jobs before, but until my current role, I had never been at a place that considered FP a core part of its technological practice. Having found such a place and spent a few months writing code there, I wanted to give my (obviously extremely limited) perspective on FP in real life, but I realized that there is a zeroth topic to address: what languages are people actually writing functional code in?

languages

I think the intersection of programming language enthusiasts and FP enthusiasts is absurdly large. So when talking about FP with developers, I find that the conversation quickly turns to specific languages.

tree branches

Within the FP tradition, there are really two major branches. The older is, of course, the Lisp branch. It wears its mathematical roots on its sleeve and buys parentheses by the truckload. The younger branch is the ML branch. With its less radical syntax, I feel like its connection to the mainstream of production languages is more apparent and less contentious.

But notice that I’m talking about the family trees here, and not specific languages. If your focus is, like mine was, to find a job writing functional code and getting paid for it, then the list of actual languages is down to basically two, in my opinion. Sound hard ...


why intent media

My previous post on New York startups left out one big topic: who did I choose to work with and why? This post is meant to address those questions, but I also want to talk about all of the other ways that companies can get recruiting right or wrong, based on my (admittedly limited) experience as an engineer who has talked to his fair share of companies.

assumptions

I’m going to go out on a limb in this scenario and assume that you’re good at what you do. In fact, you’re good enough to have choices, but maybe not so good that you have already made all the money that you will ever need. I want to mostly focus on that sweet spot of someone who is looking for a great company and might deserve to get into one. I think that covers a lot of folks and definitely most of the folks that hiring teams are targeting.

dating employers

I think the process of dating potential future employers is solidly weird. It reminds me of my days running men’s recruitment for the several-thousand-person Greek system at my undergrad school. This was mostly an educational responsibility: trying to convince packs of unruly frat boys to act in ways that were in our mutual interest.

The best simple summary of the process we ever came up with was:

It’s dudes dating dudes.

In the largely heteronormative [1] environment of a fraternity, this model is surprising enough to stick in a ...


huhdoop?

As a follow-up to my previous post on uncertainty, I wanted to share a presentation from that research project, but not the official one. Instead, it’s the ridiculous one, with all results explained in memes. Enjoy.

addendum

I know that there’s not a lot of narrative in this presentation, but I hope that it’s at least somewhat comprehensible if you’ve read the previous introduction. Of course, I’d like to expand on some of the more interesting parts of this work in a future post.


new york tech: the review

In the spirit of my previous review of Buffer, I thought that I’d try to do a review of New York tech. Absurd, I know. But let’s do it anyway.

our protagonist

So, for the purpose of this story, I’m going to be the viewpoint character. Who am I? 24601! Sorry, force of habit.

Well, at a zeroth level,

I’m just another bearded vegetarian in skinny jeans who needs to get a goddamn job.

Specifically, I’m a dev. With some fairly real-world data science/machine learning experience. I’ve been lucky enough to work with some very smart people, but at heart, I’m really just a hacker with good taste and a lot of persistence.

the setting

For more background: not too long ago, I came back to the States after hacking on some seriously hairy problems in Hong Kong. I’ve spent the vast majority of my career working for startups, but I’m now senior enough to be shooting for a role with some real opportunity for impact at a place with a strong team and a real chance of success.

the city

The word around town from my buddy Anthony is that NYC tech is hot. Like super, super hawt. From your MangoDBs to your Fumblrs, the city is full of fruit and football companies. Hmmmmm…

Seriously, the number of exciting companies in NYC is really awesome. People like to point to things like Google and Facebook having big offices out here as a ...


a venture into uncertainty

I’ve covered a range of different topics on this blog, but (surprisingly?) I have not yet gotten to the main topic that I started this blog to discuss: my thesis. Strange, right?

Instead of jumping right in, I’m going to try to lay a foundation by introducing some fundamentals first. Everything that follows is adapted directly from materials by my advisor, Reynold Cheng, and, indirectly, my second examiner, Ben Kao, but any errors are likely my own. The sections are structured a bit like lecture notes, so if you’re looking for something a bit more blog-like, I’ll understand if you nod off.

models of uncertainty

Discussions of uncertain data tend to focus on two tightly connected concerns: how do we model uncertain data, and how do we query it? We’ll start with some basic models of uncertainty, with examples taken from (entirely fabricated) data about the Venture family, Venture Industries, etc.

point level uncertainty

Point-level uncertainty attaches a single probability to a tuple or to an attribute value.

point level attribute uncertainty

The value of a discrete (or nominal) attribute is not precisely known.

In this example, because he was only observed in passing, it is uncertain what color kerchief Hank is wearing.

Person  Kerchief_Color  Probability
Hank    Yellow          0.1
Hank    Red             0.1
Hank    Blue            0.8
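As a rough illustration of how a table like this can be represented and queried, here is a small Python sketch. It assumes, as the table implies, that Hank wears exactly one kerchief, so the alternatives are mutually exclusive and their probabilities sum to 1. The names `hank_kerchief`, `validate`, `prob_in`, and `most_likely` are hypothetical, chosen for this example rather than taken from any uncertain-database system.

```python
import math

# Hypothetical representation of one uncertain attribute: each possible value
# of Hank's Kerchief_Color maps to its probability, mirroring the table above.
# Assumes the alternatives are mutually exclusive (he wears exactly one
# kerchief), so their probabilities must sum to 1.
hank_kerchief: dict[str, float] = {"Yellow": 0.1, "Red": 0.1, "Blue": 0.8}


def validate(distribution: dict[str, float]) -> None:
    # Each probability must be valid, and together they must cover every case.
    if any(p < 0 or p > 1 for p in distribution.values()):
        raise ValueError("each probability must lie in [0, 1]")
    if not math.isclose(sum(distribution.values()), 1.0):
        raise ValueError("alternatives for one attribute must sum to 1")


def prob_in(distribution: dict[str, float], values: set[str]) -> float:
    # Since the alternatives are mutually exclusive, the probability that the
    # attribute takes any value in `values` is the sum of their probabilities.
    return sum(p for v, p in distribution.items() if v in values)


def most_likely(distribution: dict[str, float]) -> str:
    # The single most probable value, e.g. for a "best guess" answer.
    return max(distribution, key=lambda v: distribution[v])


validate(hank_kerchief)
print(most_likely(hank_kerchief))                 # Blue
print(prob_in(hank_kerchief, {"Yellow", "Red"}))  # 0.2
```

A query such as "is Hank wearing a warm-colored kerchief?" then returns a probability (0.2 here) instead of a yes or no, which is the basic shift that querying uncertain data requires.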

continuous, point level attribute uncertainty

It is also possible for a continuous attribute to have point-level uncertainty, though this is not always the best way of representing that kind of uncertainty.

In this ...


difficulty of degrees

programming announcement

I’m going to take a break from purely programming-language-focused posts for a bit. I’ve actually done a fair bit of work on the continuation of the laziness series (now with dataflow!), bringing in examples from Haskell, Scala, and Pig. But, as you might imagine, these sorts of posts can take quite a bit of work to produce, even for trivial examples. This puts work on these posts in direct conflict with other things, like actually developing software and learning more about how to develop software.

So, before I get back to that material, I want to talk about the not-so-standard undergrad degree in CS. But don’t run away just yet!

This is not another post about impostor syndrome.

Instead I want to talk about how gaps in our education can make for real gaps in our skills.

dancing in the dark

With apologies to the Boss, here’s my view on the road to becoming a software dev that doesn’t route through a BS in CS.

you can’t start a fire without a spark

I’d say that this path is typified by a lot of the same stuff that causes people to take a more accepted route into software. It comes down to being amazed enough by some feat of technology to learn it inside out, for sheer enjoyment. With that often comes a love of tinkering, a desire to build something new, and a special sort of tunnel vision that can only ...