Application Cockpit Model Outline

By Przemysław Walat

Przemysław is an Expert Software Engineer based in our Szczecin office

Architecture

I’m a model, now you know what I mean

    Welcome, fellow concerned developers. In the previous installment we promised to delve deeper into the model Application Cockpit uses underneath, to see how it synergizes with the high-level architecture to make the whole product simple yet powerful and extensible. Be advised that while we have always aimed for simplicity, our initial take on the model was a bit more complicated; after more-than-a-few heated debates we stripped it down even further, and so far we are pretty happy with it, despite the occasional "I wish we had the initial design in place" moment of weakness.

    In honor of the early version, we are also going to take a short trip down memory lane to see the trade-offs and compromises we decided to make along the way, as well as the features we decided to add. We will discuss the direction we took and our reasons for doing so (or, in other words, "what were they thinking?"). But for now...

Where we're at

    As we hinted in the previous blog post, Application Cockpit is actually binary, in the sense that its primary domain consists of only two data types: Resources and Readings. The (extremely) short story is that a Resource is an entity for which we are going to track Readings, while a Reading is a piece of information pertaining to a given Resource. The longer story follows below.

Resources

    As mentioned above, a Resource is mostly something that we are going to collect Readings for; it can be a physical entity or a purely logical one. Examples of currently tracked Resources include:

  • (semi)physical

    • LambdaDeployment

    • RdsInstance

    • ServiceDeployment

    • ServiceInstance

  • logical

    • Lambda

    • Service

    • SoftwareDependency

    • SoftwareDependencyVersion

The Resource has a very rigid model, comprising only a few well-defined properties, that is:

  • id - the unique UUID

  • type - the type of the resource, see above for examples

  • name - a textual identifier, unique across one type; needs to be deterministically derived from resource's properties available at runtime (further details in tradeoffs chapter)

  • parent_id - id of the parent resource (if any)

  • created_on - a timestamp and a late-comer to the party, but we find it useful in many ways, especially during troubleshooting

In order to publish (a.k.a. create) a Resource, the collector provides only the name, the type and, optionally, the parent identifier.

When it comes to type, it is a free-form String. We have no predefined enumeration to represent it, and the collectors (see the previous article to learn more about them) are free to push whatever types they want, akin to the front-end, which is allowed to ask for any type it wishes.

In addition, the inclusion of parent id allows us to build hierarchies, so that our data closely follows the domain that we are modeling.
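To make the shape concrete, here is a minimal sketch of the Resource model in Python. The field names follow the list above; the dataclass itself, and the example names, are illustrative rather than the actual implementation:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Resource:
    # type and name are free-form; a name must be unique within a type
    # and deterministically derivable from data available at runtime.
    type: str
    name: str
    parent_id: Optional[uuid.UUID] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_on: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A small hierarchy: a logical Service with a (semi)physical deployment
# as its child (names invented for the example).
service = Resource(type="Service", name="payment-service")
deployment = Resource(type="ServiceDeployment",
                      name="payment-service/eu-production",
                      parent_id=service.id)
```

Note how the parent_id link is all that is needed to express the hierarchy; there is no separate relationship table.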


Readings

    Given the above, you should now have a general idea of what a Resource is and how we use it, so we can move on to Readings. Since we have established that it is generally a piece of data pertaining to a Resource, let's have a look at its properties:

  • id - the serial identifier

  • type - type of the Reading; free form String, akin to Resource type

  • resource_id - uuid of the Resource this Reading describes

  • value - free-form json document to hold the data of the Reading

  • storage_mode - determines the behavior when multiple Readings are published with the same resource_id and type:

    • SNAPSHOT - only the most recent Reading is preserved

    • TIME_SERIES - all of the published Readings are stored

Since the values we store for a Reading are free-form JSON, developers get a lot of flexibility in how to handle their flows. The stored values range from trivial documents to humongous too-big-to-fit-in-the-picture guys.
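Since the original screenshots did not survive, here are two hypothetical Reading values to illustrate the range: a trivial point-in-time fact and a larger nested document. All keys and values are made up for illustration:

```python
import json

# A trivial Reading value: a couple of point-in-time facts.
tech_stack = {"language": "Java", "framework": "Spring Boot"}

# A larger, nested value, e.g. a dependency report (illustrative data).
dependency_report = {
    "dependencies": [
        {"name": "jackson-databind", "version": "2.15.2",
         "vulnerabilities": []},
        {"name": "log4j-core", "version": "2.17.1",
         "vulnerabilities": [{"id": "CVE-EXAMPLE", "severity": "low"}]},
    ]
}

# Both serialize to the same free-form JSON column; the backend does
# not care about the shape.
print(json.dumps(tech_stack))
```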

Again, akin to Resource type, the Reading's type and value are completely opaque to most of the Application Cockpit infrastructure, so individual contributors are free to ship anything they wish in those properties. The only components which need to be aware of the exact structure are the Collector that is originally publishing the data and the UI component which is querying the Data Provider in order to render the respective view. That way, many developers are free to add their own custom extensions, without stepping on each other's toes or needing to seek approval or assistance from the core Application Cockpit team, which in addition can remain blissfully unaware of the atrocities that the others may commit and attempt to hide within their custom Readings.


The tradeoffs

    Having dealt with the detailed model description, we can now move on to discuss what our journey with its design looked like. Of course we, being the pragmatic, more-than-once-bitten little members of the software engineering club that we are, were under no delusions about what was going to happen to our nifty design when it got into its first bar fight against the necessities of real life. The only questions were when we would face our first moral/design dilemma and how much we would be willing to compromise our collective conscience.


The Resource id conundrum

    Actually, when it comes to the little trade-offs, the more observant reader might have noticed by now that something is a bit amiss in our picture:

  1. In this article we have learned that in order to publish a Reading for a Resource, the Collector probably needs to provide the resource_id to bind it to the right entity

  2. In the previous article we mentioned that Collectors are generally stateless and have no access to the storage holding Resource data

  3. In consequence, how can a Collector find out the correct id to publish the Reading for?

Well, it would if it could, but it can't, so it shan't. It's quite a bind we find ourselves in, right? We were adamant about not allowing collectors access to our DB, so we had to find another way to make things happen… A short look at the ReadingPublishMessage model might provide some hints, though, so let's start with that:
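The original figure showing the message is missing, but based on the description, a sketch of ReadingPublishMessage might look like the following. The field names are assumptions, mirroring the Reading model with a resource name in place of the resource id:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ReadingPublishMessage:
    # Collectors identify the target Resource by its deterministic
    # textual name, not by its database id, which they cannot know.
    resource_name: str
    resource_type: str
    reading_type: str
    value: Any                       # free-form JSON payload
    storage_mode: str = "SNAPSHOT"   # or "TIME_SERIES"

# Example message a collector might publish (values invented).
msg = ReadingPublishMessage(
    resource_name="payment-service/eu-production",
    resource_type="ServiceDeployment",
    reading_type="TechStack",
    value={"language": "Java"},
)
```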

Does anything stand out to you? It is almost a reflection of a Reading, but with a slight twist, and you have probably got it right: enter the Resource's name. That tiny little property looks like it has been put there to provide a human-friendly name for a Resource, but nothing could be further from the truth. In fact, initially it wasn't there at all (we use Readings to store human-friendly names of Resources anyway); we added it specifically to solve the issue at hand.

If you recall, the description says that the name needs to be unique and possible to deterministically assemble from the data available at runtime. Leveraging that, the Collector doesn't need to know the right id; it just needs to provide the correct textual name. The good news is that we have a way of overcoming the initial obstacle, but there is also bad news... This forces every collector to be aware of the logic required to construct the names of the Resources it is interested in. Moreover, that logic needs to live in multiple places and be kept in sync. But let's leave it at that, since data ingestion is going to be the focal point of the next article in the series.
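As a toy illustration of the deterministic-name idea: the exact naming scheme is not spelled out in the article, so the convention below is invented, but both a collector and the ingestion side could share a helper like this:

```python
def service_deployment_name(service: str, environment: str, region: str) -> str:
    """Deterministically derive a ServiceDeployment Resource name from
    data a stateless collector has at runtime (invented scheme)."""
    return f"{service}/{environment}-{region}"

# Any component computing the name from the same inputs arrives at the
# same string, so the backend can resolve it to the stored Resource id
# without the collector ever touching the database.
name = service_deployment_name("payment-service", "production", "eu")
print(name)  # payment-service/production-eu
```

The flip side, as noted above, is that this helper (or an equivalent) has to exist and stay synchronized wherever the name is computed.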


Can I haz a Resource?

    At this point a meticulous reader with a penchant for domain modeling might begin to wonder:

"All right, a Resource can have a parent... That's nice, but is that really all that there is? You say you want to closely match the domain, but not everything is a child of something, sometimes you might want a Resource to 'own' Resources, so... how about that, smarty pants ?"

This was also something we brainstormed about intensively, and lots and lots of coffee cups later (and probably more than a few beer mugs, too) we arrived at something we were not very uncomfortable with. Our main priority was to avoid complicating the model (remember? we want people to integrate easily, so we want to keep things simple), so we didn't want to introduce any new concepts or tables. Instead, we elected to cheat a little, so please have a look at the example below, with a picture being worth a thousand words:

Basically, we elected to model a "has a" relationship between two Resources using an intermediary Reading to store the ids of the referenced (one or many) Resources. In our example, a Service Deployment has a Reading called "Service Deployment Dependencies". The main purpose of this Reading is to hold data which we can use to construct the ids of the "Software Dependency Version" Resources that our "Service Deployment" has. As you can see, if we wanted to obtain the vulnerabilities of all dependencies a specific Service Deployment Resource has, we could easily construct the id of each and every "Software Dependency Version" and then query for its "Dependency Version Vulnerability" Readings.
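The missing picture aside, the trick can be sketched like this. The Reading and Resource names come from the example in the text; the value layout and the naming scheme are invented for illustration:

```python
# A "Service Deployment Dependencies" Reading stores just enough data
# to reconstruct the deterministic names of the referenced
# SoftwareDependencyVersion Resources (layout invented for the sketch).
dependencies_reading = {
    "dependencies": [
        {"name": "jackson-databind", "version": "2.15.2"},
        {"name": "log4j-core", "version": "2.17.1"},
    ]
}

def dependency_version_names(reading_value: dict) -> list[str]:
    # One deterministic name per SoftwareDependencyVersion Resource.
    return [f"{d['name']}@{d['version']}"
            for d in reading_value["dependencies"]]

# With these names we can resolve each Resource and then query its
# "Dependency Version Vulnerability" Readings.
print(dependency_version_names(dependencies_reading))
```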

There aren't many places where we use this not-so-orthodox approach, but there are some, and the example above is the most prominent real use-case. We are closely monitoring the performance impact it has on our service and so far it has been negligible, or at least negligible enough that we are still able to look at ourselves in the mirror.


and the (mostly) harmless evolution

    Having confessed our most prominent sins, let's wrap things up with a less controversial story. There are several items that were not part of the initial design, which we have decided to add along the way, because we have found them useful.


To replace or... not to replace

    The storage mode for a Reading was a last-minute addition to the picture, but it serves its purpose well. Initially we thought of Readings solely as means of representing the current state of things, e.g. the single-point-in-time data like:

  • tech stack given service is using

  • AWS account id of the deployment

  • IP address of an instance, etc.

This meant that we were only interested in the most recent value of a specific Reading type for a Resource, always replacing it whenever a new value was collected. From a more technical standpoint, we could get away with performing upserts on Readings, with the key comprised of resource_id and reading_type. However, after a few initial releases, we discovered that there are certain pieces of data more linear in their nature, which Application Cockpit users could really appreciate on a daily basis.

A prominent example of such an item is Service Events Graph, which as you can probably guess represents a service's life cycle timeline, so a mixed bag of Service Deployments, Instance Startups and Shutdowns, Incidents etc. Here's what it currently looks like:

What we have also found out rather quickly was that it would be quite awkward to represent this kind of data using our initial point-in-time-based approach to Readings.

Enter storage mode - a property of a Reading which tells the underlying infrastructure how to behave when a new Reading for the same resource_id and reading_type is persisted.

To preserve all the already existing flows, we have assumed the default mode to be SNAPSHOT, which upholds the default behavior of upserting new values, keeping all the implemented features intact without any additional effort. 

As an extension of the functionality, we introduced the TIME_SERIES mode, which preserves all the values that were collected and allows us to construct, e.g., neat timelines like the one in the screenshot above.
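The two modes can be summed up in a minimal in-memory sketch. The real implementation is database-backed; this just mirrors the behavior described above:

```python
from collections import defaultdict

# (resource_id, reading_type) -> list of stored values
store: dict[tuple[str, str], list[dict]] = defaultdict(list)

def persist_reading(resource_id: str, reading_type: str,
                    value: dict, storage_mode: str = "SNAPSHOT") -> None:
    key = (resource_id, reading_type)
    if storage_mode == "SNAPSHOT":
        # Upsert: only the most recent value survives.
        store[key] = [value]
    else:  # TIME_SERIES
        # Append: every collected value is kept.
        store[key].append(value)

persist_reading("r1", "TechStack", {"language": "Java"})
persist_reading("r1", "TechStack", {"language": "Kotlin"})
persist_reading("r1", "ServiceEvent", {"event": "startup"}, "TIME_SERIES")
persist_reading("r1", "ServiceEvent", {"event": "shutdown"}, "TIME_SERIES")

print(store[("r1", "TechStack")])          # [{'language': 'Kotlin'}]
print(len(store[("r1", "ServiceEvent")]))  # 2
```

Defaulting the mode to SNAPSHOT, as described above, is what keeps every pre-existing flow working unchanged.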

As a bonus, here's what the actual data underlying the graph looks like:

A little housekeeping never killed nobody... I hope

    Whilst it seems quite natural for a Reading to have a timestamp, since it is collected (or, if you will, measured) at a specific point in time, it's not so straightforward in the case of a Resource, especially not from the Business Domain side of things, since a Resource mainly represents a certain Entity that simply is. Seems like we're not all about Business, I guess, but we'll get to that in a while.

With the above in mind, the first iterations of the Resource model had no good reason to include timestamps, so they didn't. However, as we kept adding new features and collectors, we started noticing that the system was accreting quite a lot of Resources and that a rather large portion of them was only relevant within a short timeframe. It wasn't a problem for us at that point, but even though we hate doing house chores as much as anyone, why not keep our room tidy and "let go of the things that don't matter anymore", as we've all certainly been told at least a few times in our lives?

The part where it stood out the most for us were Service Instance Resources, which by their nature are short-lived: as soon as the instance is dead, we don't care about it anymore (apart from audit reasons, perhaps). The point was: how do we know whether we can safely delete the Instance Resource?

As you have seen above, we have ServiceEvents telling us when instances were terminated, but can we rely on the event always being generated, and are we willing to trust it? Not really.

Alternatively, we do track instance health checks, but they are not ideal by themselves, particularly since a freshly minted Instance would have no checks at all in the beginning and then report as unhealthy for some time before it becomes stable, not to mention that in some cases it is all right for an instance to fail some of the inspections for a while and still survive. The list of things to consider goes on and on, which ran contrary to our goal of avoiding an overly sophisticated cleanup logic, for fear it would become too brittle to be effective.

The introduction of Resource timestamps allowed us to reach all the objectives in a relatively low-cost and simple manner. Once we started tracking when an Instance was created, it became a simple matter of scanning Instance Resources on a schedule, discovering those which had had no health reports for quite a while and were not created recently, and safely trimming them away from our storage.
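The cleanup rule described above boils down to two conditions. A sketch, with the thresholds invented for illustration (the article does not name the actual values):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=7)     # no health reports for this long (assumed)
GRACE_PERIOD = timedelta(hours=24)  # never delete freshly created instances (assumed)

def is_safe_to_delete(created_on: datetime,
                      last_health_report: Optional[datetime],
                      now: datetime) -> bool:
    # Keep anything created recently: new instances legitimately have
    # no (or unhealthy) checks at first.
    if now - created_on < GRACE_PERIOD:
        return False
    # Delete only if health reports stopped long ago, or never arrived
    # despite the instance being old enough.
    last_seen = last_health_report or created_on
    return now - last_seen > STALE_AFTER

now = datetime.now(timezone.utc)
old = now - timedelta(days=30)
print(is_safe_to_delete(old, old, now))   # True: old and long silent
print(is_safe_to_delete(now, None, now))  # False: freshly created
```

The created_on timestamp is what makes the grace-period check possible at all, which is exactly why the field earned its place in the model.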

The added bonus is that having a Resource creation timestamp comes in handy during the investigation of issues (not that there are too many), since it makes it easier to track the timeline and wrap your head around what happened, when, and why.


That's all, folks

    At this point I am going to conclude my musings, and I hope that you've found them interesting or perhaps even inspiring at some point. The topic is far from exhausted, especially given that our journey with Application Cockpit is still well under way and we keep adding new features to the application, so please look forward to the next article in the series.
