Data compression

Idea: Further, “condensed” data should be derived from historical data.
Examples:

  • From a temperature, a ¼-hour mean value, a daily mean value, or a monthly mean value is to be calculated.
  • From a meter reading, a “consumption profile” (¼-hour consumption, or average power), an hourly consumption, a daily consumption, etc., is to be calculated.
  • The same from a “tank level”, such as that of an oil tank.

Of course, a system like Grafana can be used for diagrams. But there are some limitations that are not easy to solve with Grafana. Another solution would be to use an SQL database and carry out the compression there using stored procedures (or the like). But this requires deep knowledge of databases.
I would like to have this data in openHAB and not depend on a third-party system. By calculating the data in advance (e.g. every 15 minutes), access is faster than if it were calculated on the fly. And since the results are also historised, you can also access them with Grafana (or whatever). This could also be useful for energy management (a later topic).

If you describe the problem in the openHAB community, the answers always refer to “Grafana, Rules, Servlet, third-party software, etc.”, or even claim that openHAB is not made for this.

My first approach was to write a node.js module that I could then use in a rule. But I gave up on that for various reasons.

It was also not clear to me what would be the right concept in openHAB.
I am now thinking of a binding (data compression). A compression would then be a Thing. This Thing has 2 channels, the input and the output, which are each linked to an item.
The settings of the summarisation are made in the Thing (calculation interval, calculation method, summarisation interval, etc.).
Since the output item should not save its data either on change or cyclically, its persistence would have to be ignored or switched off without any further configuration being necessary. Nevertheless, the user must be able to specify the target persistence. How often and when the data is saved, however, must be decided and managed by the compression itself. I have no idea whether this is possible.

How does compression work?
When the calculation is triggered (typically cyclically, or manually), the system first looks at the historical data of the output to see when the last entry was made. The calculation must continue from this point in time. If no output data is available yet, the largest possible range must be calculated.
Data is calculated and stored as far as possible. If input data is missing, it is interpolated. Extrapolation, however, is not necessary and is not provided for.
The calculation should run in a separate thread and must not block any other functions of openHAB. I have no idea how openHAB manages this, or whether it perhaps comes “for free”. Or is there a “main thread” that stays busy unless you do something special?
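To make the intended behaviour concrete, here is a minimal sketch of such a catch-up loop in plain Java. All helper methods and names are placeholders of mine, not openHAB API; a real add-on would use the framework’s shared scheduler instead of creating its own executor.

import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the catch-up calculation described above.
class CompressionJob implements Runnable {

  private final Duration slot = Duration.ofMinutes(15);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start() {
    // Cyclic trigger on a worker thread, so nothing else is blocked.
    scheduler.scheduleAtFixedRate(this, 0, 15, TimeUnit.MINUTES);
  }

  @Override
  public void run() {
    // Resume after the last stored output value; if the output is still
    // empty, start from the oldest available input sample.
    ZonedDateTime last = lastOutputTimestamp();
    ZonedDateTime from = (last != null) ? last : oldestInputTimestamp();
    ZonedDateTime now = ZonedDateTime.now();
    for (ZonedDateTime t = from; !t.plus(slot).isAfter(now); t = t.plus(slot)) {
      double value = aggregate(t, t.plus(slot)); // interpolates over gaps
      storeOutput(t.plus(slot), value);          // writes value with its timestamp
    }
  }

  // Placeholders standing in for persistence queries and writes.
  ZonedDateTime lastOutputTimestamp() { return null; }
  ZonedDateTime oldestInputTimestamp() { return ZonedDateTime.now().minusDays(30); }
  double aggregate(ZonedDateTime from, ZonedDateTime to) { return 0.0; }
  void storeOutput(ZonedDateTime timestamp, double value) { }
}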

I would like to develop this binding. Is this rough concept ok? Or are there other better solutions? Or have I overlooked any problems?

Hello Frank,
Welcome, and thank you for sharing your idea. What you describe is quite a common requirement; the very first thing you learn about energy meters is that tracking a “counter” is just the beginning. After all, we are billed for (more or less) monthly consumption of media / gas etc. I agree that offering such functionality out of the box would be quite helpful for pretty much everyone.

From your description I see two primary use cases:

  1. aggregation of counter values into “differences”; for a missing slot you can make some basic assumptions based on the values near that slot (or neighbouring slots).
  2. averaging of instant values (power, flow rate, speed, etc.); for a missing slot you may assume 0, since there is no basic way to recover that information, unless you have counter values from which you can take a derivative over time.

Another reason why these are two different cases is that, from an implementation point of view, the calculation “loop” is a bit different. For the first use case you only need the values closest to the slot edges (the last counter value from the previous slot and the last from the current one), while for the second you need all values.
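To illustrate the difference between the two loops, here is a small sketch in plain Java (names are mine and purely illustrative):

import java.util.List;

// The two aggregation styles described above.
class SlotAggregation {

  // Use case 1: counter -> consumption. Only the samples at the slot
  // edges matter, so querying the first/last value per slot is enough.
  static double consumption(double lastOfPreviousSlot, double lastOfCurrentSlot) {
    return lastOfCurrentSlot - lastOfPreviousSlot;
  }

  // Use case 2: instant values -> mean. Every sample in the slot has to
  // be visited; an empty slot yields 0, as assumed above.
  static double mean(List<Double> samplesInSlot) {
    return samplesInSlot.stream()
        .mapToDouble(Double::doubleValue)
        .average()
        .orElse(0.0);
  }
}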

Implementation approach

While I don’t consider myself the alpha and omega on how such a thing should be implemented, I think that in this particular case a binding may add extra complexity for no benefit. I will try to explain why.
In the case where you have multiple counter values (water, electricity, gas, heat), you will end up with multiple Things, one for each of them. The Thing you will be implementing will probably have to give the user the possibility to define dynamic (extensible) channels, or assume a fixed set of available slots (Y/M/D/h/30m/15m/5m/m/s). Then it will have to trigger an update for each channel when a slot closes. This part is clear; the difficult part, however, is passing the calculated value to the persistence layer. You could write directly into the persistence service of another item, but then you would have to create an item manually for each slot. The forecast/historical updates which are going to be introduced in 4.0 could help a bit here.
Since you will have an input item and an output item, plus the persistence layer of both, you are implementing more or less a rule, without calling that thing a rule.
Another open question is whether you want the item for each medium to have a different name, like Water_15m, Electricity_15m, Gas_Quarterly. If not, then going with channels will actually prevent you from establishing a stable naming pattern, which in the end will limit your ability to unify visualisations.
For the above reasons I think this is closest to a “miscellaneous extension” which bridges persistence with extra calculations. From a configuration point of view, all you need are item tags and metadata. You can use tags to tell your extension which items are interesting (i.e. everything with an “Energy” tag), while item metadata can carry additional configuration, such as the slots for which calculations are needed.
Final point: based on the slots declared in metadata, you can inject ad-hoc items which are managed by the extension and not by the end user. Effectively you can create some kind of “template”, where marking the “15m” slot in the item metadata results in the creation of “$ItemName_$algorithm_QuarterHour”.
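Purely to illustrate that naming template (the slot-to-suffix mapping here is my assumption), deriving the managed item name could look like this:

import java.util.Map;

// Hypothetical helper: derives the name of a managed "slot" item from the
// source item name, the algorithm, and the slot declared in item metadata.
class DerivedNames {
  static String derivedItemName(String itemName, String algorithm, String slot) {
    Map<String, String> suffixes = Map.of("15m", "QuarterHour", "1h", "Hour", "1d", "Day");
    // e.g. ("EnergyMeter", "mean", "15m") -> "EnergyMeter_mean_QuarterHour"
    return itemName + "_" + algorithm + "_" + suffixes.getOrDefault(slot, slot);
  }
}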

I think there was some discussion about making item metadata more usable with OH 4.0. It was somewhere in the webui repo, so I have to look up what the resolution was.

Hi Lukasz

Well, as far as the algorithm is concerned, that’s not such a problem, because I’ve already implemented something like that for the Niagara Framework and we have, so to speak, no problems with it. The biggest problems are wrong raw values. That’s why the possibility to correct raw data would also be a nice feature. But that’s another story.

Yes. It would mean one Thing per Compression.

Triggering can happen cyclically. Typically every 15 minutes. Whether and what is calculated is determined by the data that is available.

I had already imagined that it would be easy. With the API, you can (supposedly) simply write data. With:
PUT /persistence/items/{itemname}
I don’t remember why, but I was convinced that it wouldn’t work. But there was talk in the community that it would work.
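For what it’s worth, a sketch of what such a call might look like from Java (the time, state and serviceId query parameters match what is discussed later in this thread; host, token and item name are placeholders, so check the REST API docs of your openHAB version):

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: writing one historical value through the persistence REST endpoint.
public class PersistencePut {
  public static void main(String[] args) throws Exception {
    String item = "EnergyMeter_15m";
    String query = "time=" + URLEncoder.encode("2023-01-01T00:15:00.000Z", StandardCharsets.UTF_8)
        + "&state=" + URLEncoder.encode("42.5", StandardCharsets.UTF_8)
        + "&serviceId=jdbc";
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://openhab:8080/rest/persistence/items/" + item + "?" + query))
        .header("Authorization", "Bearer <api-token>")
        .PUT(HttpRequest.BodyPublishers.noBody())
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());
  }
}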

I don’t understand the statement

you would have to create an item manually for each slot

I have to write to persistence when I want and what I want. This means that “I” am responsible for what is written. Of course, I must also be able to write the timestamp together with the value, just like with the API. I need the serviceId, the timestamp, the state and the item name. Otherwise it won’t work.
But why an “item for each slot”? Or maybe I don’t understand “slot”. What do you mean by that?

Do I understand you correctly? You think the idea of a “Thing”, “Channels” and “Items” doesn’t make sense?
Maybe I have to go back to my very first consideration. How would that be implemented in openHAB? Does it have to be a “service”? A “Binding”? Anything else? I had also looked at the PID controller and thought I might be able to adopt some of it. But the way the PID controller is to be used (although I don’t have it yet), I think it’s a mess :slight_smile: . But maybe I don’t quite understand it either.
To be honest, I still don’t understand enough how openHAB works at its core. My first idea was to develop a “service”. That should be a component that runs in the background and does its job. I also had the idea of tagging items to simplify configuration.
Is that what you mean?
My problem was that I didn’t know where to start.

I have no idea about that either. Sorry, I just don’t follow the whole thing that closely.

That would be ideal, of course.

I have limited access to a computer, so for now just a quick dump of links:

The latter is a bit less obvious; however, the background of that request is similar to what I mentioned. The author proposes a way to modify item metadata via existing widgets.

I agree that from an algorithm point of view it’s not that relevant how you query data, but from a resource-usage perspective it has a lot of impact. If you have an energy counter which is sampled every five minutes and you wish to calculate the consumption between 2021-01-01 and 2021-12-31, you end up with 105120 samples to go over (365 × 24 × 60 / 5). If you have a sample every 15 minutes you get “just” 35040 rows, and in the extreme case of minute sampling 525600 rows to scan. Any of the above will probably require paging, otherwise the computing thread will get stuck for quite a long time.
I brought up the idea of two different loops for the above reason. You may otherwise have to pass a lot of data through memory just to get the first and last record.
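As a sketch, paging through a persistence query with openHAB’s FilterCriteria could look roughly like this (assuming the underlying service actually honours page size and number; treat the exact setter names as approximate):

import java.time.ZonedDateTime;

import org.openhab.core.persistence.FilterCriteria;
import org.openhab.core.persistence.HistoricItem;
import org.openhab.core.persistence.QueryablePersistenceService;

// Sketch: scanning a year of samples page by page instead of pulling
// hundreds of thousands of rows into memory at once.
class PagedScan {
  void scan(QueryablePersistenceService service, String itemName) {
    FilterCriteria criteria = new FilterCriteria();
    criteria.setItemName(itemName);
    criteria.setBeginDate(ZonedDateTime.parse("2021-01-01T00:00:00Z"));
    criteria.setEndDate(ZonedDateTime.parse("2021-12-31T23:59:59Z"));
    criteria.setPageSize(1000);
    for (int page = 0; ; page++) {
      criteria.setPageNumber(page);
      int count = 0;
      for (HistoricItem sample : service.query(criteria)) {
        count++;
        // feed sample.getTimestamp() / sample.getState() into the slot loop
      }
      if (count == 0) {
        break; // no more rows
      }
    }
  }
}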

I think with a proper access token it will fly. It’s just going to store data with the default persistence service. AFAIR there is a serviceId parameter which can be used to choose which persistence service to use (if other than the default).
I have had experience interacting with the persistence API within openHAB itself through direct calls to ModifiablePersistenceService, and it was quite awful with 3.0-3.2; starting from 3.3 I think it was cleaned up, and even the jdbc service works more or less fine. :wink: Previously, if you called store(item, state, date), the state would be ignored and the present state of the item would be used!
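For reference, writing a backdated value directly would look roughly like this (a sketch; check the exact store signature in your core version):

import java.time.ZonedDateTime;

import org.openhab.core.items.Item;
import org.openhab.core.library.types.DecimalType;
import org.openhab.core.persistence.ModifiablePersistenceService;

// Sketch: storing a computed value with an explicit (past) timestamp.
class BackdatedStore {
  void storeSlot(ModifiablePersistenceService service, Item outputItem,
      ZonedDateTime slotEnd, double value) {
    // Since 3.3 the passed state is stored as-is, rather than being
    // replaced by the item's current state (see remark above).
    service.store(outputItem, slotEnd, new DecimalType(value));
  }
}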

I do think that the item1 > thing > channel > item2 chain is fairly long just to store a derivative of item1 in item2. I am not sure how exactly you wish to model the thing and channel, but these are still elements which people, and you, have to manage. If you intend to have a high granularity of configuration, allowing every bit and piece of the computation to be manipulated for each and every item, then going via things or rules is the way to go.
WRT the PID controller - I think the difficulty there is trigger vs. action, which are theoretically two separate things, but for PID both are part of the algorithm loop. I was trying to get a “compute” trigger working with rules and gave up on it, because it was too verbose to maintain across multiple installations.
One of the “undocumented” features of openHAB are “configurable services”, which use configuration descriptors to render UI forms/views visible to the end user. These services are listed in the openHAB UI under the settings page. For example, “Regional settings” is built upon that. If you take a look at lines 70-76 in this file:

there is a service label and category, as well as a config.description.uri, which is used as a config descriptor to render a form in the user interface:

The descriptor itself is quite basic: connectorio-cloud/proxy.xml at connectorio-cloud-0.2.0 · ConnectorIO/connectorio-cloud · GitHub
:point_right: If you look closer, it is the same format you normally use to describe thing/channel config, :slight_smile: without having to define the thing/channel itself.

The whole point of using the above is getting the UI form and a way to feed configuration into the system. Moreover, this configuration can be stored in a .cfg file, which can be provisioned through userdata/etc if needed. The PR I linked above was about providing the same forms for item metadata. Your service could accept several parameters, such as the “time slots” to compute and the update interval at which to fire the computation.
Then items, through their metadata, could provide overrides (different slots or algorithms).
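A sketch of how such a configurable service could be declared (the PID, the descriptor URI and the parameter name are my assumptions; the service.config.* properties follow the convention described above):

import java.util.Map;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Modified;

// Sketch: the service.config.* properties make openHAB render a settings
// form from the referenced config descriptor.
@Component(configurationPid = "org.openhab.compression", property = {
    "service.config.label=Data Compression",
    "service.config.category=system",
    "service.config.description.uri=compression:config"
})
public class CompressionService {

  private int updateInterval = 15; // minutes, overridable via configuration

  @Activate
  public CompressionService(Map<String, Object> config) {
    modified(config);
  }

  @Modified
  void modified(Map<String, Object> config) {
    Object interval = config.get("updateInterval");
    if (interval != null) {
      updateInterval = Integer.parseInt(interval.toString());
    }
  }
}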

I am not sure if OH 4.0 will get extended support for metadata config descriptors (it would be logical to have them); however, I would not worry about that too much, as metadata can simply be appended to the item definition.

The items themselves you can create programmatically:

// SimpleProvider is to be found in connectorio-addons; the rest is openHAB core.
import java.util.List;

import org.openhab.core.items.Item;
import org.openhab.core.items.ItemProvider;

public class StaticItemProvider extends SimpleProvider<Item> implements ItemProvider {

  public StaticItemProvider(List<Item> items) {
    super(items);
  }

  // Lets the extension add further manufactured items at runtime.
  public void add(Item item) {
    super.add(item);
  }

}

If you register the StaticItemProvider as an OSGi service (bundleContext.registerService) with a given list of manufactured items, these items will be recognised by openHAB and appear as read-only in the UI. Items can be created programmatically through ItemFactory / ItemBuilderFactory (I don’t remember the exact name).
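Creating such a manufactured item could look roughly like this (the factory would be injected as an OSGi reference; treat the builder method names as approximate):

import org.openhab.core.items.Item;
import org.openhab.core.items.ItemBuilderFactory;

// Sketch: manufacturing a derived item to hand to the provider above.
class DerivedItems {
  Item create(ItemBuilderFactory factory, String sourceName) {
    // Naming pattern as proposed earlier in the thread.
    return factory.newItemBuilder("Number", sourceName + "_mean_QuarterHour")
        .withLabel(sourceName + " (15 min mean)")
        .build();
  }
}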
What you need is a process which (a rough skeleton follows the list):

  1. sources interesting items through the ItemRegistry and registry listeners,
  2. filters the items which qualify for computation/compression (by quantity or tags),
  3. uses MetadataRegistry.get(new MetadataKey("my-namespace", item.getName())) to get the computation config,
  4. pushes derivative items to a managed registry and registers a computing job,
  5. fires computations periodically,
  6. stores data through a ModifiablePersistenceService#store(Item, Date, State) call,
  7. updates the item status, if the computed slot happens to be the most recent one.
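Put together, a rough skeleton of that process could look like this (wiring of the OSGi references, the item provider registration and the actual math are omitted; the “compression” metadata namespace and the “Energy” tag are assumptions):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.openhab.core.items.Item;
import org.openhab.core.items.ItemRegistry;
import org.openhab.core.items.Metadata;
import org.openhab.core.items.MetadataKey;
import org.openhab.core.items.MetadataRegistry;
import org.openhab.core.persistence.ModifiablePersistenceService;

// Skeleton of the seven steps above.
class CompressionProcess {

  private final ItemRegistry itemRegistry;
  private final MetadataRegistry metadataRegistry;
  private final ModifiablePersistenceService persistence;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  CompressionProcess(ItemRegistry itemRegistry, MetadataRegistry metadataRegistry,
      ModifiablePersistenceService persistence) {
    this.itemRegistry = itemRegistry;
    this.metadataRegistry = metadataRegistry;
    this.persistence = persistence;
  }

  void start() {
    scheduler.scheduleAtFixedRate(this::compute, 0, 15, TimeUnit.MINUTES); // step 5
  }

  private void compute() {
    for (Item item : itemRegistry.getItems()) {                  // step 1
      if (!item.getTags().contains("Energy")) {                  // step 2
        continue;
      }
      Metadata config = metadataRegistry                         // step 3
          .get(new MetadataKey("compression", item.getName()));
      if (config == null) {
        continue;
      }
      // Steps 4, 6, 7: create/look up the derivative item, compute the
      // slot value, store it with its timestamp, update the item state.
    }
  }
}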

Hope that this helps and sheds some more light on how it could be approached.

Sorry, I can’t keep up with the speed of your answers :wink:
I almost feel guilty that you are taking so much time to explain all this to me.

There was a lot to read. But if I understand correctly and summarise briefly:

  1. The persistence service should not be accessed directly.
  2. The persistence service reacts to events, e.g. when an item changes its value (depending on the strategy). But for past data such a mechanism does not exist yet. It will, however, be implemented.

So that means either I wait for the implementation or I break this convention.

Ok. I think I’ve got it.

Yes. Absolutely. Thank you very much.

Boundaries and a defined separation of concerns exist to make it easier for everyone to understand what happens where; however, there are all kinds of bindings, from ones which use serial communication, through those which tap network traffic, up to ones which query data from the cloud. Why would querying data from a database, or making a network connection to one, be considered wrong?
There was a fairly long discussion about querying databases from openHAB - eventually the dbquery add-on was contributed and included in the official distribution: openhab-addons/README.md at 3.4.x · openhab/openhab-addons · GitHub (pull request).
The above proves that it is possible to “break” the convention, if you make a separate connection from a binding. :wink: There was even a PR with support for JDBC querying.

I think that by starting with the service concept you will get results sooner, with a chance to practice just a tiny fragment of the framework. Using config descriptors + OSGi Configuration Admin is more straightforward and less constrained than starting with a binding.

Best,
Łukasz

Hey @Fredo, there was recently a separate topic in the openHAB community which used the term “downsampling” in the context of InfluxDB: Downsampling data in InfluxDB v2.X - Persistence Services - openHAB Community.

It seems to be a generic term used elsewhere as well.