StoryDevs Compact Queries
When performing a search on StoryDevs a query is attached to the URL. Usually, queries are a series of key-value pairs. If we were searching for an online conference the query portion of the URL might look something like this:
storydevs.com/event?category=conference&setting=online
As StoryDevs searches offer a lot of options I felt this would quickly lead to long, obnoxious URLs. If it can be made shorter it's less intrusive when pasted into a group chat. While I'm sure the query probably factors into SEO I decided it'd be a fun challenge to try to make the most dense query I could. Taking the above example, I managed to get it down to:
storydevs.com/event?s=A4B0
Compression can be very simple if you know in advance what sort of information you're going to be sent. If I know that I'll only ever be sent the entire works of Shakespeare or Homer's Illiad I don't actually need to send those. Instead, we keep a copy of each on the server and send 0 for the bard and 1 the Greek. We could even send 2 when it's both. This compresses thousands of words into a single digit, provided you know what it's mapped to.
For StoryDevs search there's more than two options. And each option can be assigned one or more values. This makes it unfeasiable to use the trick where we not only assign fields a single character but also their combinations too — the possible combinations grows very quickly.
So let's get into how we transform this:
storydevs.com/event?category=workshop&category=con&setting=online&timezone=Australia%2FHobart&overlap=overlap&start=Present&day_start_0=mon&day_finish_0=fri&time_start_0=1700&time_end_0=2100&day_start_1=sat&day_finish_1=sun&time_start_1=900&time_end_1=1700
Into this:
storydevs.com/event?s=A3.4B0C252D0E0G0I0J4L5M9G1I5J6L66M4G2
First we assign all fields an alphabetical ID: A, B, C, ... This is simple to do because we know in advance what the names are and they don't change. Next, we follow the field ID with numerals representing the field's value. This is much more tricky as it's not always obvious how to translate a value into a number. We'll get to this later.
If you look at the query now you'll notice the letters generally ascend and the numbers tend to be low. And by separating names into letters and values into numbers we mostly remove the need to add delimiters, further shrinking the query. No semicolon or plus sign need separate things because we know a letter following a number always signals the end of the previous field and the beginning of another. This little regular expression separates them on the server:
[A-Za-z]+|-?\d+(\.-?\d+)*
Fields that allow multiple values, such as checkboxes, separate their values with a period:
storydevs.com/event?s=A3.4
Groupings
It's a little more difficult when there's groupings with duplicate fields. In StoryDevs' event search the client can add multiple ranges of days of the week and times of day therein. We can't number our fields because they'd be mistaken as values and we can't use an additional letter because it'd be interpreted as part of the name. Oops.
Ultimately I use delimiters for this. Here's a shorter version of the URL above with only grouped fields:
storydevs.com/event?s=G0I0J4L5M9G1I5J6L66M4G2
The group itself is assigned an ID, "G" (coincidence — it could just as easily be A, H, K, W, etc). Groups can have three values. G0 means we're opening a group and everything that follows is inside it. G2 (yes, I skipped 1) means we're closing a group. G1 means wearing closing the previous group and opening another, which saves us writing G2G0. Nested for clarity this portion of the query looks like:
G0
I0
J4
L5
M9
G1
I5
J6
L66
M4
G2
Dates
I mentioned earlier that it's difficult to map some values to numbers. For checkboxes, radio buttons, dropdown lists, simple ranges and the like we just loop through the items and use the index its at. If it's the first item we use 0, the fourth item we use 3, etc. The aforementioned tendency for numbers to be low comes from the fact that most fields average something like 3 – 4 options.
This gets tricky with dates. A unix timestamp works fine but it's very long — sort of the opposite of what we want. Luckily a date only needs day-level precision so we can divide it by 24 * 60 * 60 and lose no information meaning the present day's timestamp of 1632787200 becomes 18898.
Unfortunately date fields allow a special value, "Present". It's the default value for event searches because we assume this is the common use case: most people want to know about upcoming events, not ones that have already transpired. We could use the min or max int values to signal the present since they're unlikely to be used in practice. But they're too long so I went with 0. This means it's impossible to search for events specifically on the day of January 1st, 1970 — an acceptable tradeoff for this use-case.
Time Of Day
Time values are similarly painful. I could have gone with military time and a consistent 4 digits. Instead I did something incredibly stupid and unnecessary. Firstly, I realised that it's very unlikely for someone to search for, say, 4:34pm or 6:07am. We tend to make appointments on boundaries of 15 minutes. There are four such intervals in an hour and 24 hours in a day. 24 * 4 = 96. That is, we can represent the most common intervals in two digits or less. The values in between can be mapped to higher numbers.
function mapCompactTime(hh, mm, isAm) {
if (!isAm && hh >= 1 && hh <= 10 && mm === 0) {
if (hh === 10) {
return 0;
}
return hh;
}
/* ... */
}
The first thing we do is check if the hours are between 1pm and 10pm inclusive and there's no minutes component. If so, return the numbers as-is, besides 10 which we convert to 0. I made the arbitrary assumption that most event searches will be in the afternoon to evening ranges and thus prioritised them to have the most compact value of a single digit. Will most searches be in this range? I don't know. But it seemed reasonable. Continuing in the mapping function we have this:
/* ... */
let next = 10;
if (mm % 15 === 0) {
for (let i = 1; i < 13; i++) {
for (let j = 0; j < 4; j++) {
thisMM = j * 15;
if (isAm && hh === i && thisMM === mm) {
return next;
}
next++;
if (i > 10 || j > 0) {
if (!isAm && hh === i && thisMM === mm) {
return next;
}
next++;
}
}
}
}
/* ... */
It's not as bad as it looks. We first check if the number of minutes remaining after a modulo operation is zero. If so, we've got a value of 0, 15, 30, or 45 and we want to prioritise these for 2-digit value assignment. Then we're iterating through the possible hours and minute combinations the time widget might hold. The "next" variable we keep incrementing is the compact number we'll ultimately map the time to. It starts at 10 because we've already assigned 1pm – 10pm.
/* ... */
next = 96;
let zeroes = 0;
let seen100 = false;
let seen10 = false;
for (let i = 1; i < 13; i++) {
for (let j = 1; j < 60; j++) {
if (j % 15 === 0) {
continue;
}
for (const thisAm of [true, false]) {
if (!seen100 && next === 100) {
next = -9;
zeroes = 2;
seen100 = true;
}
if (!seen10 && next === 10) {
next = -99;
zeroes = 3;
seen10 = true;
}
if (i === hh && j === mm && thisAm === isAm) {
return addLeadingZeroes(next, zeroes);
}
next++;
}
}
}
/* ... */
This part is as bad as it looks. We start next at 96 because if we got here the other conditions failed and haven't incremented it up to this point. Basically we're exhausting the possible 2- and 3-digit combinations before finally using 4 digits.
Because the compact value for the date widget is a truncated unix timestamp, which can be negative, the query can handle numbers < 0. This means the time mapping can use, -99 through -1. Further, we can use leading zeroes to extract even more 2- and 3-digit numbers: 01, 02, 03, ... 097, 098, 099
Through this convoluted logic we're able to assign most of the remaining times a value mapping of 3 digits or less. Only the minutes between 11am/pm and 1am/pm that are not divisible by 15 are assigned 4 digit numbers.
Here's groupings of just the time fields from the URL. They represent the ranges of 5pm to 9pm, 9am to 4pm. First, this is what they'd be in military time:
storydevs.com/event?s=G0L0500M2100G1L0900M1400G2
And here's what they are in the compact format. Even with the constraint of the group fields around them the query is nearly half the length:
storydevs.com/event?s=G0L5M9G1L66M4G2
Conclusion
There's a couple of remaining loose ends that I'll address in closing.
First, user-defined input such as keyword searches. These use a separate query key, kw=[search_term]:
storydevs.com/event?kw=fundraiser
Second, what if the ordering of fields or values changes? Won't that change what existing URLs link to? Yes. I first noticed this problem with the timezone field. Timezones retain a consistent name but variable offset throughout the year if they observe daylight saving. The issue was that I sorted timezones by their offset! Now there's an optional property in the UI generation process that can override the numeric value a dropdown item corresponds to. But at some point this needs to be done for all fields and their values such that additions, removals, and reorderings don't make URLs link to things the user didn't intend.
So that's how the compact queries work. I've been meaning to write about this for a while — I hope it was interesting.