Hi, in the context of a Custom Server DataSource, I’d like to know if there is a best practice for performing post-processing actions at the transaction/queue level.
-
That's a little too vague to answer as it stands. Some clarifying questions:
1) what is the post-processing? is it SQL?
2) does it need to happen inside the transaction or after it?
3) what triggers the post-processing? A particular set of operations within the queue?
4) why is this necessarily post-processing rather than just something performed by the operations within the queue?
-
Hi, thanks for your feedback. Let me clarify my use case.
I’m trying to implement a Custom Server DataSource that behaves like a standard SQLDataSource with a binary field, but with the difference that the binary data is stored in an external storage system rather than directly in the database.
The main challenge is maintaining transactional integrity: I need to avoid situations where a file is stored without a corresponding database reference, or vice versa. For this reason, I was considering whether it would be possible to "defer" the actual storage operations until the end of the transaction (ideally until after the database transaction has successfully committed).
I’m also open to alternative suggestions or best practices you might recommend for this type of scenario.
-
Is this storage to S3 or a similar CDN?
Is the problem that the storage could take a while and you don't want to leave the transaction open? Because if you close the transaction without knowing whether file storage completed, that invites data corruption, whereas waiting for external storage to complete before committing the transaction would prevent that problem.
It also doesn't seem to help to move the storage outside of the transaction: if you perform the file storage in the same HttpRequest thread, just after the SQL operation completes, the HTTP request still won't complete until storage does. You would have to start a separate thread. And at that point it would be best to mark the SQL record as "pending" in some way, then remove that marker from the separate thread once file storage is complete.
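To make the "pending marker" idea above concrete, here is a minimal, self-contained sketch. All names (`Status`, `records`, `storeBlob`) are illustrative stand-ins, not SmartClient or Azure APIs: a `Map` plays the role of the database table and a `sleep` simulates the slow upload.

```java
import java.util.Map;
import java.util.concurrent.*;

public class PendingMarkerSketch {
    enum Status { PENDING, ACTIVE }

    // stands in for the database table
    static final Map<String, Status> records = new ConcurrentHashMap<>();

    // stands in for the (slow) external blob upload
    static void storeBlob(String id) throws InterruptedException {
        Thread.sleep(50); // simulate network latency
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // 1) inside the SQL transaction: write the record as PENDING
        records.put("file-1", Status.PENDING);
        // ... transaction commits here, HTTP request can return ...

        // 2) on a separate thread: upload, then clear the marker
        Future<?> upload = pool.submit(() -> {
            try {
                storeBlob("file-1");
                records.put("file-1", Status.ACTIVE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                // record stays PENDING; a reconciliation job can retry or clean up
            }
        });

        upload.get(); // in the sketch, wait so we can observe the result
        System.out.println(records.get("file-1"));
        pool.shutdown();
    }
}
```

The key property is that a record stuck in `PENDING` is self-describing: a periodic reconciliation job can find such rows and either retry the upload or delete the row.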
So, with our understanding right now, it seems like you may actually not want to do file storage as post-processing after the SQL operations.
-
I’m not sure I’m fully getting your suggestion, so I'll try to clarify my side.
I’m using Azure Blob Storage.
I understand that moving file storage to “post-processing” does not solve the lack of true atomicity between the DB and external storage. My main concern is simply to avoid triggering file operations on Blob until after the SQL transaction has successfully committed.
For example, in the case of a DELETE, I want to avoid deleting the file from Blob storage if the database transaction later rolls back.
So I was wondering if there's a way, in my Custom Server DataSource, to hook into the server-side lifecycle of a DSTransaction, after the commit is complete, in order to safely trigger external actions (synchronous or async).
That is what I was trying to get at when I mentioned “post-processing at the transaction/queue level.”
-
Well, it's one or the other:
1) if you commit the transaction and update storage afterwards, then for an "add", a storage failure leaves you with a DB record but no stored file
2) if you don't commit the transaction until the storage operation completes, then a DB error after storage completes leaves you with a stored file but no DB record
This is a classic distributed transactions problem (as you said, an atomicity issue); here's an overview of approaches for handling it.
https://chatgpt.com/share/68c1d7c3-0...a-dc07fb1fea76
It is generally better to have orphaned files that have to be cleaned up later, as opposed to DB records that point to files that don't exist. That suggests doing the blob storage mid-transaction for an "add" or "update" while doing blob storage after the transaction for a delete. This way, you never have a record pointing to a non-existent file, but you may need to clear blob storage periodically, to get rid of orphaned files.
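The ordering rule above (blob writes inside the transaction, blob deletes only after a successful commit) can be sketched as a small deferred-delete queue. This is an illustration of the pattern only; the class and method names are made up and the `List` stands in for both the SQL table and blob storage:

```java
import java.util.ArrayList;
import java.util.List;

public class BlobOrderingSketch {
    final List<String> blobStore = new ArrayList<>();       // stands in for Azure Blob Storage
    final List<String> deferredDeletes = new ArrayList<>(); // deletes queued until commit

    void add(String id) {
        blobStore.add(id); // upload BEFORE commit: worst case is an orphaned file
        // ... SQL INSERT of the reference goes here ...
    }

    void delete(String id) {
        // ... SQL DELETE of the reference goes here ...
        deferredDeletes.add(id); // do NOT touch blob storage yet
    }

    void commit() {
        // transaction committed successfully: now it is safe to delete blobs
        for (String id : deferredDeletes) blobStore.remove(id);
        deferredDeletes.clear();
    }

    void rollback() {
        // discard queued deletes; the files stay, and the rollback restores the DB rows
        deferredDeletes.clear();
    }
}
```

With this ordering, a rollback after a `delete` leaves the file intact (matching the restored DB row), and the only failure residue in any path is an orphaned blob for the periodic cleanup job.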
A simple way to do this is to just set a servletRequest attribute that triggers logic at the servlet level to go and delete the blob storage. You could optionally write a DB record if the blob operation fails, to make the later reconciliation process a bit easier.
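As a rough sketch of that attribute handoff, here a plain `Map` stands in for `HttpServletRequest` attributes, and the attribute name is invented for illustration; in a real deployment the second method would live in a servlet filter that runs after the request (and therefore the transaction) has finished:

```java
import java.util.HashMap;
import java.util.Map;

public class RequestAttributeSketch {
    static final String ATTR = "pendingBlobDeletes"; // hypothetical attribute name

    // DataSource side: runs inside the request, after the SQL DELETE
    static void markForDeletion(Map<String, Object> requestAttrs, String blobId) {
        requestAttrs.put(ATTR, blobId);
    }

    // Filter side: runs after the servlet has completed the request
    static String afterRequest(Map<String, Object> requestAttrs) {
        Object blobId = requestAttrs.get(ATTR);
        if (blobId != null) {
            // delete the blob here; on failure, optionally write a DB record
            // so the later reconciliation process can find it
            return "deleted " + blobId;
        }
        return "nothing to do";
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();
        markForDeletion(attrs, "blob-42");
        System.out.println(afterRequest(attrs));
    }
}
```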
-
Thanks for the suggestions. So if I understand correctly, the way to know that the transaction has finished and to trigger cleanup/reconciliation logic would be via a servlet filter that checks for some attribute on the servletRequest.
However, in that case I wouldn’t really have a way to know the actual outcome/status of the transaction/queue - or am I missing something?