    Prohibit resource loading for PDF export

    Isomorphic,

    The built-in RPC method getPdfObject will automatically download resources specified in HTML tags such as <img src="..." />. Is there a way to sanitize the incoming HTML content (as well as the externalStyleSheet URL) so that resources are loaded only from allowed URLs?

    Thanks

    #2
    You could attempt to sanitize it, because RPCManager.exportContent() allows you to pass a String of HTML, which could be a modification of HTML obtained via Canvas.getPrintHTML().

    However, the structure of the HTML returned from getPrintHTML() isn't documented and can't be, since we need room for workarounds and for representing more complex structures as features are added to components. So, you would need to be pretty careful in how you did the transform.

    It's probably worth taking a step back and thinking about other approaches and/or providing more information. Some questions / points:

    1. is the problem that the export process is going to be run in some kind of batch mode, where network connectivity is abnormal (perhaps some "secure" server)?

    2. is the problem that the export just fails due to lack of external resources, or just that those resources are missing in the export, and that looks bad?

    3. just to state the obvious: if a bunch of images and CSS are omitted, you may have an unusable export, even if you can succeed in getting it generated. If the images really aren't needed, then in at least some cases, they could be omitted from the export before you call getPrintHTML() (hide an <Img> that is just decorative, for example).

    4. if you do want the images and styles to be used in the export, use of "data:" URLs might eliminate the need to externally reference resources. However, note we have not tested whether this works with iText, and you might need to upgrade SmartClient to be able to use "data:" URLs directly as part of an SCImgURL (as opposed to using them in CSS, which has been possible for a very long time)

    5. there is an IResourceRetriever API in iText which can be used to control how lookup is done. It could be customized to be a no-op for any disallowed URLs. We could do this as a Feature Sponsorship, or for this particular export, you might build your own use of iText. However, see #3 above - you may not have a usable export if you omit these resources
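As an illustration of point 5, here is a minimal sketch of a fetch-time gate that turns disallowed URLs into a no-op. The class name GatedResourceLoader, its constructor arguments, and the delegate function are all hypothetical and use only the JDK; how such a check would be wired into iText's IResourceRetriever (or any other hook) is not shown and would need to be checked against the library's actual API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: decides whether a resource URL may be fetched at all.
public class GatedResourceLoader {
    private final Set<String> allowedSchemes; // expected in lower case, e.g. "https"
    private final Set<String> allowedHosts;   // expected in lower case
    private final Function<URI, byte[]> delegate; // the real fetch, supplied by the caller

    public GatedResourceLoader(Set<String> allowedSchemes,
                               Set<String> allowedHosts,
                               Function<URI, byte[]> delegate) {
        this.allowedSchemes = allowedSchemes;
        this.allowedHosts = allowedHosts;
        this.delegate = delegate;
    }

    // True only for absolute URLs whose scheme and host are both on the allow lists.
    public boolean isAllowed(String url) {
        if (url == null) {
            return false;
        }
        final URI uri;
        try {
            uri = new URI(url.trim());
        } catch (URISyntaxException e) {
            return false; // unparseable input is rejected rather than guessed at
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false; // relative URLs, file:/// and data: URLs have no host here
        }
        return allowedSchemes.contains(scheme.toLowerCase(Locale.ROOT))
                && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    // Returns an empty array (a no-op) for disallowed URLs instead of fetching them.
    public byte[] load(String url) {
        if (!isAllowed(url)) {
            return new byte[0];
        }
        return delegate.apply(URI.create(url.trim()));
    }
}
```

Note that this sketch rejects relative URLs outright; a real integration would first have to resolve them the same way the server-side renderer does.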



      #3
      Thanks for the detailed reply.

      Is the getPdfObject method intended for use in a production setting? The main issue is that the getPdfObject method will accept arbitrary HTML, not just the print HTML generated by the Canvas. For example, if HTML containing <img src="file:///..." /> is received by the method, then the image will be loaded from the local filesystem and included in the PDF. Because of this, we want to be able to restrict which URLs are allowed. This includes prohibiting protocols such as FILE and FTP as well.
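To make the concern concrete, here is a rough sketch of a pre-check that could run on the incoming HTML before it is handed to the export. The class HtmlUrlPrecheck and its method are hypothetical. It is a heuristic only: it looks at quoted src and href attributes, ignores unquoted attributes and CSS url(...) references, and uses a deny list (file: and ftp:), which is weaker than an allow list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check: reports src/href values that use a prohibited protocol.
public class HtmlUrlPrecheck {
    // Matches quoted src="..." or href='...' attribute values, case-insensitively.
    private static final Pattern URL_ATTR =
            Pattern.compile("(?i)\\b(?:src|href)\\s*=\\s*([\"'])(.*?)\\1");

    public static List<String> findDisallowed(String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = URL_ATTR.matcher(html);
        while (m.find()) {
            String value = m.group(2).trim();
            String lower = value.toLowerCase(Locale.ROOT);
            // Deny list for illustration only; an allow list is the safer design.
            if (lower.startsWith("file:") || lower.startsWith("ftp:")) {
                hits.add(value);
            }
        }
        return hits;
    }
}
```

A request could then be rejected whenever findDisallowed(html) is non-empty, though a check at the point where resources are actually fetched is harder to bypass than pattern matching on the markup.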

      The RPCManager.exportContent() functionality is being used to export ListGrid data as the downloadClientExport method does not support PDF format.

      Your fifth point comes closest to what I think would be required.
      Last edited by stonebranch3; 24 Apr 2024, 07:21.



        #4
        Yes, getPDFObject() is intended for production use. However, like most any server API, you can't necessarily allow public, unvalidated access. You can't do that with DSRequest either, obviously.

        To clarify, are file:/// URLs and the like your actual concern here, that is, is this a security concern? This wasn't clear, and we were off considering other reasons you might have raised this, like resources that are accessible to the browser but not from the server.



          #5
          Yes, this is a security concern since requests to getPDFObject may be able to load resources that are accessible to the server only and possibly leak information in the generated PDF.

          The ability to include images from the local file system is one example, however, there may be other potential vulnerabilities.



            #6
            FYI, we are working on this:

            1) we're going to default to not allowing resources from the filesystem to be loaded. This is technically a security fix, so we'll backport it. However, from various attempts, we haven't found a way to sneakily use HTML tags (<object>, <embed> etc) to get any other kind of content into the generated PDF. Have you tried this with any success?

            If not, it's hard to see the exploit here: you would need a library of highly confidential images sitting around on disk. So the scenario would be that someone installed SmartClient on a Predator drone and allowed public access to PDF generation. Perhaps you have a more plausible scenario?

            2) for future versions, we'll likely add a configuration property with an allowed list of URLs to fetch from. The trick here is that you'll need to understand the URLs coming from the client and how they might map to how the server would resolve them. And again the "exploit" seems a bit crazy: something like the server has privileged (non-authenticated) access to confidential images hosted on a firewalled server that is not publicly accessible. Do you happen to have a more plausible scenario?



              #7
              Thanks for the update.

              1) Other than local image file inclusion, I wasn't able to get content into the generated PDF. I've seen other exploits using the <iframe> tag but this appears to be ignored by the FlyingSaucer (XHTML renderer) library. I don't think the <script> tag is supported either.

              This came from a third-party assessment which made the following observations regarding the PDF export:
              • Images from the local filesystem can be included in the generated PDF.
              • HTTP, HTTPS, FILE, and FTP protocols are supported.
              • HTML content and externalStyleSheet URLs are not checked to confirm that they are allowed.
              This fix would address the first observation and, partially, the second, so we are interested in the fix once it's available.

              2) I don't have a more plausible scenario other than perhaps a denial of service.

              Having an allowed list of URLs would be part of the defense-in-depth strategy suggested here.

              We only need to support loading images from the same server hosting the application.
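For the same-server requirement, a same-origin test might look like the sketch below. SameOriginCheck and the appBase parameter are hypothetical names, and the sketch assumes the application's base URL is known on the server. Relative URLs are resolved against that base first, since the URLs coming from the client may not be absolute.

```java
import java.net.URI;

// Hypothetical check: is a resource URL served from the application's own origin?
public class SameOriginCheck {

    public static boolean isSameOrigin(URI appBase, String resourceUrl) {
        if (appBase == null || appBase.getHost() == null || resourceUrl == null) {
            return false;
        }
        final URI resolved;
        try {
            // Relative references resolve against the application base URL.
            resolved = appBase.resolve(resourceUrl.trim()).normalize();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        return equalsIgnoreCase(appBase.getScheme(), resolved.getScheme())
                && equalsIgnoreCase(appBase.getHost(), resolved.getHost())
                && effectivePort(appBase) == effectivePort(resolved);
    }

    private static boolean equalsIgnoreCase(String a, String b) {
        return a != null && a.equalsIgnoreCase(b);
    }

    // Treats a missing port as the scheme's default so that :443 and no port match.
    private static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        if ("https".equalsIgnoreCase(uri.getScheme())) {
            return 443;
        }
        if ("http".equalsIgnoreCase(uri.getScheme())) {
            return 80;
        }
        return -1;
    }
}
```

With a base of https://app.example.com/app/, a value such as images/logo.png passes, while file:///..., ftp:// URLs, other hosts, and protocol-relative URLs pointing elsewhere are rejected.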
              Last edited by stonebranch3; 30 Apr 2024, 05:56.
