Automated Instagram

When I first started using Instagram, I decided to code a little project to help improve my account’s visibility. The idea was to have a program that automated what I’d otherwise be doing manually – in this case, to follow people who were interacting with other, more popular gals on the platform, give them some amount of grace time to follow me back, unfollow the ones who didn’t, then try a new batch. For something so dumb, it was pretty sophisticated:

Information (for example, my current set of followers) was persisted to a database so it wouldn’t have to be re-fetched on every run.
Complex operations like finding good follow candidates were pushed entirely into the database using tested, elaborate SQL queries. This handed complex optimization off to the planner, and vastly reduced the amount of data that needed to move between program and database.
Persisted data had a TTL before it was considered stale and automatically refetched.
Following campaigns were tracked so that the same user was never included twice. Batch size was calculated according to current follower count to keep a reasonable ratio.
Automatic exponential backoff (with global coordination across goroutines) when encountering rate limiting.
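The original code isn't reproduced here, but the last item in that list can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `Backoff` type, its fields, and its method names are all hypothetical. The idea is that every goroutine consults one shared gate before making a request, so a rate limit seen by any one of them pauses all of them, and the pause doubles with each consecutive failure up to a cap.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Backoff is a gate shared by every goroutine that makes requests.
// All names here are hypothetical and exist only for illustration.
type Backoff struct {
	mu       sync.Mutex
	base     time.Duration // pause after the first rate limited response
	max      time.Duration // upper bound on the pause
	failures int           // consecutive rate limited responses seen
	until    time.Time     // no request should start before this instant
}

// delay returns base doubled once per failure after the first, capped at max.
// The caller must hold b.mu.
func (b *Backoff) delay() time.Duration {
	d := b.base
	for i := 1; i < b.failures; i++ {
		d *= 2
		if d >= b.max {
			return b.max
		}
	}
	if d > b.max {
		return b.max
	}
	return d
}

// Wait blocks the calling goroutine until the shared pause, if any, is over.
func (b *Backoff) Wait() {
	b.mu.Lock()
	until := b.until
	b.mu.Unlock()
	if d := time.Until(until); d > 0 {
		time.Sleep(d)
	}
}

// RateLimited records a rate limited response and extends the shared pause.
// It returns the pause that was applied.
func (b *Backoff) RateLimited() time.Duration {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures++
	d := b.delay()
	b.until = time.Now().Add(d)
	return d
}

// Succeeded resets the failure count after a request goes through.
func (b *Backoff) Succeeded() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures = 0
}

func main() {
	b := &Backoff{base: 10 * time.Millisecond, max: 80 * time.Millisecond}

	// Simulate five rate limited responses in a row.
	for i := 0; i < 5; i++ {
		fmt.Println("pausing all workers for", b.RateLimited())
	}

	// Any worker calling Wait now blocks until the shared pause expires.
	var wg sync.WaitGroup
	for w := 0; w < 3; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			b.Wait()
			fmt.Println("worker", id, "resumed")
		}(w)
	}
	wg.Wait()
	b.Succeeded()
}
```

Each worker would call `Wait` before a request, then `RateLimited` or `Succeeded` depending on the response. Keeping the state behind one mutex is the simplest way to get the global coordination; a channel-based design would work too.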

At the time, I thought writing it was an excellent idea. Now, I realize it was a stupid one. Either way, it didn’t matter as it ran successfully about one time before falling over (more on this below). But despite that, it was a fun project.

Defective by design

It turns out that Instagram’s API is a pain to work with, by design. They have an official API which lets you do basically nothing, so you end up falling back to something like goinsta for even the most menial operations like fetching your own follower list. Libraries like this one reverse engineer Instagram’s private GraphQL API and repackage it in a usable form. They function by simulating a normal user session – logging in with username and password, getting a usable session/cookie, then making HTTP requests just like the app would. They even send specific benign-looking user agents to make it all look more legitimate.

And despite all that community effort, it’s still not easy. Here are a few problems you can expect to encounter on a daily basis:

Instagram has some of the least generous rate limits imaginable, and you’ll run into them after just a handful of requests. I made my program wait randomly up to a few seconds between every request to help avoid them as much as possible.
That’s compounded by the fact that getting a list of any non-trivial size involves paginating, and a lot of it. Instagram sends back pages with seemingly random numbers of items on them – sometimes 2, sometimes 50, but in small enough batches that paginating thousands of items takes a long time.
There are regularly “holes” in data that make operations non-deterministic. Paginating my entire follower list twice wouldn’t always produce the same result. And not only because I might have gained or lost followers – some users would be listed, then not listed, then listed again.
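To make the first two points concrete, here is a rough sketch of a pagination loop that sleeps a random interval between requests. It assumes a cursor-style API; `Page`, `FetchPage`, `jitter`, and `fetchAll` are hypothetical names, not goinsta's API, and the in-memory pages in `main` stand in for real responses. Collapsing IDs that repeat across pages is an extra assumption on my part, and it does nothing about users who go missing between runs.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Page is one batch of results plus the cursor for the next request.
// An empty NextCursor means there are no more pages. (Hypothetical type.)
type Page struct {
	UserIDs    []int64
	NextCursor string
}

// FetchPage is a stand-in for whatever call returns a single page.
type FetchPage func(cursor string) (Page, error)

// jitter returns a random duration in [0, max), or 0 if max is not positive.
func jitter(max time.Duration) time.Duration {
	if max <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(max)))
}

// fetchAll follows cursors until the last page, sleeping a random interval
// of up to maxWait between requests. IDs seen on more than one page are
// kept only once, in first-seen order.
func fetchAll(fetch FetchPage, maxWait time.Duration) ([]int64, error) {
	seen := make(map[int64]bool)
	var ids []int64
	cursor := ""
	for {
		page, err := fetch(cursor)
		if err != nil {
			return nil, err
		}
		for _, id := range page.UserIDs {
			if !seen[id] {
				seen[id] = true
				ids = append(ids, id)
			}
		}
		if page.NextCursor == "" {
			return ids, nil
		}
		cursor = page.NextCursor
		time.Sleep(jitter(maxWait))
	}
}

func main() {
	// Fake pages of uneven size, keyed by the cursor that requests them.
	pages := map[string]Page{
		"":  {UserIDs: []int64{1, 2}, NextCursor: "a"},
		"a": {UserIDs: []int64{2, 3, 4, 5}, NextCursor: "b"},
		"b": {UserIDs: []int64{6}},
	}
	fetch := func(cursor string) (Page, error) { return pages[cursor], nil }

	ids, err := fetchAll(fetch, 5*time.Millisecond)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```

With pages this small, the sleep between requests dominates the total run time, which is why walking a list of thousands of users takes so long.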

That sort of odd behavior makes me want to see what their backend looks like. I imagine it’s explainable by an eventually consistent HA data store and caching, but it’d still be interesting to see under the covers. The quirks also exist when you’re interacting with it on the web or in the app (try hitting the + a few times to load more comments on a post with a lot of them and you’ll notice how inconsistent it is), but because photos and comments on photos aren’t a mission-critical domain, they can get away with it.

A flash in the pan

But those were all easy to deal with compared to the final nail that nixed the project. I’d gotten it to the point where it could generate follower and following data sets, come up with likely leads for campaigns, and had just finished implementing the part where it actually followed and unfollowed. I ran it about one time before Instagram started sending back something along the lines of, “That operation is not permitted, if you think you're seeing this message in error, contact us.” I was soft banned, and probably justifiably so.

I toyed with a human-interactive mode where the program would tell me who to follow and unfollow instead of doing it automatically, but luckily came to my senses in time and just stopped working on it instead. In the end, it was a waste of time, but not a complete one. Just the act of building a novel piece of software was good for my mind and skill set. I came up with some nice patterns around working with and testing databases and transactions in Go, which I’ll reuse in future projects. For example: using a combination of subtests, test transactions, and fixture helper functions to produce succinct tests ¹:

func TestUsersFollowRandomly(t *testing.T) {
	t.Run("SelectGoodCandidate", func(t *testing.T) {
		igtesting.TestTransaction(func(tx *pg.Tx) {
			dbUser := upsertInstagramUserDetailed(t, tx, &igapi.User{
				NumFollowers: 200,
				NumFollowing: 200,
			})

			dbUsers, err := query.UsersFollowRandomly(tx,
				[]igapi.InstagramUserID{dbUser.InstagramID}, maxFollows)
			assert.NoError(t, err)
			assert.Equal(t, []int64{dbUser.ID}, userIDs(dbUsers))
		})
	})
		
	// ...
}

Nowadays, I just use Instagram like everyone else does, and it’s fine(ish) (as good as Instagram ever is, which is okay).

¹ Go is a very verbose language, and it’s a major detriment in some places like testing. Achieving succinctness without hundreds of if err != nil { ... } littered throughout the code takes effort.

September 15, 2020 by Frey·ja