jake bowkett

< Back To Blog

StoryDevs Compact Queries

Sept. 28th, 2021

When per­form­ing a search on StoryDevs a query is at­tached to the URL. Usu­al­ly, queries are a se­ries of key-value pairs. If we were search­ing for an on­line con­fer­ence the query por­tion of the URL might look some­thing like this:

storydevs.com/event?category=conference&setting=online

As Story­Devs search­es offer a lot of op­tions I felt this would quick­ly lead to long, ob­nox­ious URLs. If it can be made short­er it's less in­tru­sive when past­ed into a group chat. While I'm sure the query prob­a­bly fac­tors into SEO I de­cid­ed it'd be a fun chal­lenge to try to make the most dense query I could. Tak­ing the above ex­am­ple, I man­aged to get it down to:

storydevs.com/event?s=A4B0

Com­pres­sion can be very sim­ple if you know in ad­vance what sort of in­for­ma­tion you're going to be sent. If I know that I'll only ever be sent the en­tire works of Shake­speare or Home­r's Il­li­ad I don't ac­tu­al­ly need to send those. In­stead, we keep a copy of each on the serv­er and send 0 for the bard and 1 the Greek. We could even send 2 when it's both. This com­press­es thou­sands of words into a sin­gle digit, pro­vid­ed you know what it's mapped to.

For Story­Devs search there's more than two op­tions. And each op­tion can be as­signed one or more val­ues. This makes it un­feasi­able to use the trick where we not only as­sign fields a sin­gle char­ac­ter but also their com­bi­na­tions too — the pos­si­ble com­bi­na­tions grows very quick­ly.

So let's get into how we trans­form this:

storydevs.com/event?category=workshop&category=con&setting=online&timezone=Australia%2FHobart&overlap=overlap&start=Present&day_start_0=mon&day_finish_0=fri&time_start_0=1700&time_end_0=2100&day_start_1=sat&day_finish_1=sun&time_start_1=900&time_end_1=1700

Into this:

storydevs.com/event?s=A3.4B0C252D0E0G0I0J4L5M9G1I5J6L66M4G2

First we as­sign all fields an al­pha­bet­i­cal ID: A, B, C, ... This is sim­ple to do be­cause we know in ad­vance what the names are and they don't change. Next, we fol­low the field ID with nu­mer­als rep­re­sent­ing the field­'s value. This is much more tricky as it's not al­ways ob­vi­ous how to trans­late a value into a num­ber. We'll get to this later.

If you look at the query now you'll no­tice the let­ters gen­er­al­ly as­cend and the num­bers tend to be low. And by sep­a­rat­ing names into let­ters and val­ues into num­bers we most­ly re­move the need to add de­lim­iters, fur­ther shrink­ing the query. No semi­colon or plus sign need sep­a­rate things be­cause we know a let­ter fol­low­ing a num­ber al­ways sig­nals the end of the pre­vi­ous field and the be­gin­ning of an­oth­er. This lit­tle reg­u­lar ex­pres­sion sep­a­rates them on the serv­er:

[A-Za-z]+|-?\d+(\.-?\d+)*

Fields that allow mul­ti­ple val­ues, such as check­box­es, sep­a­rate their val­ues with a pe­ri­od:

storydevs.com/event?s=A3.4

Groupings

It's a lit­tle more dif­fi­cult when there's group­ings with du­pli­cate fields. In Sto­ry­De­vs' event search the client can add mul­ti­ple ranges of days of the week and times of day there­in. We can't num­ber our fields be­cause they'd be mis­tak­en as val­ues and we can't use an ad­di­tion­al let­ter be­cause it'd be in­ter­pret­ed as part of the name. Oops.

Ul­ti­mate­ly I use de­lim­iters for this. Here's a short­er ver­sion of the URL above with only grouped fields:

storydevs.com/event?s=G0I0J4L5M9G1I5J6L66M4G2

The group it­self is as­signed an ID, "G" (co­in­ci­dence — it could just as eas­i­ly be A, H, K, W, etc). Groups can have three val­ues. G0 means we're open­ing a group and every­thing that fol­lows is in­side it. G2 (yes, I skipped 1) means we're clos­ing a group. G1 means wear­ing clos­ing the pre­vi­ous group and open­ing an­oth­er, which saves us writ­ing G2G0. Nest­ed for clar­i­ty this por­tion of the query looks like:

G0
  I0
  J4
  L5
  M9
G1
  I5
  J6
  L66
  M4
G2

Dates

I men­tioned ear­li­er that it's dif­fi­cult to map some val­ues to num­bers. For check­box­es, radio but­tons, drop­down lists, sim­ple ranges and the like we just loop through the items and use the index its at. If it's the first item we use 0, the fourth item we use 3, etc. The afore­men­tioned ten­den­cy for num­bers to be low comes from the fact that most fields av­er­age some­thing like 3 – 4 op­tions.

This gets tricky with dates. A unix time­stamp works fine but it's very long — sort of the op­po­site of what we want. Luck­i­ly a date only needs day-level pre­ci­sion so we can di­vide it by 24 * 60 * 60 and lose no in­for­ma­tion mean­ing the pre­sent day's time­stamp of 1632787200 be­comes 18898.

Un­for­tu­nate­ly date fields allow a spe­cial value, "P­re­sen­t". It's the de­fault value for event search­es be­cause we as­sume this is the com­mon use case: most peo­ple want to know about up­com­ing events, not ones that have al­ready tran­spired. We could use the min or max int val­ues to sig­nal the pre­sent since they're un­like­ly to be used in prac­tice. But they're too long so I went with 0. This means it's im­pos­si­ble to search for events specif­i­cal­ly on the day of Jan­u­ary 1st, 1970 — an ac­cept­able trade­off for this use-case.

Time Of Day

Time val­ues are sim­i­lar­ly painful. I could have gone with mil­i­tary time and a con­sis­tent 4 dig­its. In­stead I did some­thing in­cred­i­bly stu­pid and un­nec­es­sary. First­ly, I re­alised that it's very un­like­ly for some­one to search for, say, 4:34pm or 6:07am. We tend to make ap­point­ments on bound­aries of 15 min­utes. There are four such in­ter­vals in an hour and 24 hours in a day. 24 * 4 = 96. That is, we can rep­re­sent the most com­mon in­ter­vals in two dig­its or less. The val­ues in be­tween can be mapped to high­er num­bers.

function mapCompactTime(hh, mm, isAm) {

    if (!isAm && hh >= 1 && hh <= 10 && mm === 0) {
        if (hh === 10) {
            return 0;
        }
        return hh;
    }

    /* ... */
}

The first thing we do is check if the hours are be­tween 1pm and 10pm in­clu­sive and there's no min­utes com­po­nent. If so, re­turn the num­bers as-is, be­sides 10 which we con­vert to 0. I made the ar­bi­trary as­sump­tion that most event search­es will be in the af­ter­noon to evening ranges and thus pri­ori­tised them to have the most com­pact value of a sin­gle digit. Will most search­es be in this range? I don't know. But it seemed rea­son­able. Con­tin­u­ing in the map­ping func­tion we have this:

/* ... */

let next = 10;
if (mm % 15 === 0) {
    for (let i = 1; i < 13; i++) {
        for (let j = 0; j < 4; j++) {
            thisMM = j * 15;
            if (isAm && hh === i && thisMM === mm) {
                return next;
            }
            next++;
            if (i > 10 || j > 0) {
                if (!isAm && hh === i && thisMM === mm) {
                    return next;
                }
                next++;
            }
        }
    }
}

/* ... */

It's not as bad as it looks. We first check if the num­ber of min­utes re­main­ing after a mod­u­lo op­er­a­tion is zero. If so, we've got a value of 0, 15, 30, or 45 and we want to pri­ori­tise these for 2-digit value as­sign­ment. Then we're it­er­at­ing through the pos­si­ble hours and minute com­bi­na­tions the time wid­get might hold. The "nex­t" vari­able we keep in­cre­ment­ing is the com­pact num­ber we'll ul­ti­mate­ly map the time to. It starts at 10 be­cause we've al­ready as­signed 1pm – 10pm.

/* ... */

next = 96;
let zeroes = 0;
let seen100 = false;
let seen10 = false;
for (let i = 1; i < 13; i++) {
    for (let j = 1; j < 60; j++) {
        if (j % 15 === 0) {
            continue;
        }
        for (const thisAm of [true, false]) {
            if (!seen100 && next === 100) {
                next = -9;
                zeroes = 2;
                seen100 = true;
            }
            if (!seen10 && next === 10) {
                next = -99;
                zeroes = 3;
                seen10 = true;
            }
            if (i === hh && j === mm && thisAm === isAm) {
                return addLeadingZeroes(next, zeroes);
            }
            next++;
        }
    }
}

/* ... */

This part is as bad as it looks. We start next at 96 be­cause if we got here the other con­di­tions failed and haven't in­cre­ment­ed it up to this point. Ba­si­cal­ly we're ex­haust­ing the pos­si­ble 2- and 3-digit com­bi­na­tions be­fore fi­nal­ly using 4 dig­its.

Be­cause the com­pact value for the date wid­get is a trun­cat­ed unix time­stamp, which can be neg­a­tive, the query can han­dle num­bers < 0. This means the time map­ping can use, -99 through -1. Fur­ther, we can use lead­ing ze­roes to ex­tract even more 2- and 3-digit num­bers: 01, 02, 03, ... 097, 098, 099

Through this con­vo­lut­ed logic we're able to as­sign most of the re­main­ing times a value map­ping of 3 dig­its or less. Only the min­utes be­tween 11am/pm and 1am/pm that are not di­vis­i­ble by 15 are as­signed 4 digit num­bers.

Here's group­ings of just the time fields from the URL. They rep­re­sent the ranges of 5pm to 9pm, 9am to 4pm. First, this is what they'd be in mil­i­tary time:

storydevs.com/event?s=G0L0500M2100G1L0900M1400G2

And here's what they are in the com­pact for­mat. Even with the con­straint of the group fields around them the query is near­ly half the length:

storydevs.com/event?s=G0L5M9G1L66M4G2

Conclusion

There's a cou­ple of re­main­ing loose ends that I'll ad­dress in clos­ing.

First, user-de­fined input such as key­word search­es. These use a sep­a­rate query key, kw=[search_ter­m]:

storydevs.com/event?kw=fundraiser

Sec­ond, what if the or­der­ing of fields or val­ues changes? Won't that change what ex­ist­ing URLs link to? Yes. I first no­ticed this prob­lem with the time­zone field. Time­zones re­tain a con­sis­tent name but vari­able off­set through­out the year if they ob­serve day­light sav­ing. The issue was that I sort­ed time­zones by their off­set! Now there's an op­tion­al prop­er­ty in the UI gen­er­a­tion process that can over­ride the nu­mer­ic value a drop­down item cor­re­sponds to. But at some point this needs to be done for all fields and their val­ues such that ad­di­tions, re­movals, and re­order­ings don't make URLs link to things the user did­n't in­tend.

So that's how the com­pact queries work. I've been mean­ing to write about this for a while — I hope it was in­ter­est­ing.