XML-schema 'best practice' question

  • Frank Millman

    XML-schema 'best practice' question

    Hi all

    This is not strictly a Python question, but as I am writing in Python,
    and as I know there are some XML gurus on this list, I hope it is
    appropriate here.

    XML-schemas are used to define the structure of an xml document, and
    to validate that a particular document conforms to the schema. They
    can also be used to transform the document, by filling in missing
    attributes with default values.

    In my situation, both the creation and the processing of the xml
    document are under my control. I know that this begs the question 'why
    use xml in the first place', but let's not go there for the moment.

    Using minixsv, validating a document with a schema works, but is quite
    slow. I appreciate that lxml may be quicker, but I think that my
    question is still applicable.

    I am thinking of adding a check to see if a document has changed since
    it was last validated, and if not, skip the validation step. However,
    I then do not get the default values filled in.

    I can think of two possible solutions. I just wondered if this is a
    common design issue when it comes to xml and schemas, and if there is
    a 'best practice' to handle it.

    1. Don't use default values - create the document with all values
    filled in.

    2. Use python to check for missing values and fill in the defaults
    when processing the document.
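
For illustration, option 2 could look roughly like the sketch below. It uses the standard library's ElementTree; the `DEFAULTS` table, the `task` element and its attributes are hypothetical names invented for the example, and the table would have to be kept in step with the defaults declared in the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults table: element tag -> {attribute name: default value}.
# In a real system this would mirror the 'default' declarations in the schema.
DEFAULTS = {
    "task": {"priority": "normal", "optional": "false"},
}

def fill_defaults(root, defaults=DEFAULTS):
    """Add any missing attribute listed in `defaults`; never overwrite existing ones."""
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            if name not in elem.attrib:
                elem.set(name, value)
    return root

doc = ET.fromstring(
    '<process><task name="approve" priority="high"/><task name="file"/></process>'
)
fill_defaults(doc)
tasks = doc.findall("task")
```

The obvious cost of this approach is that the defaults then live in two places, the schema and the Python table.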

    Or maybe the best practice is to *always* validate a document before
    processing it.

    How do experienced practitioners handle this situation?

    Thanks for any hints.

    Frank Millman
  • Lorenzo Gatti

    #2
    Re: XML-schema 'best practice' question

    On 18 Sep, 08:28, Frank Millman <fr...@chagford.com> wrote:
    > I am thinking of adding a check to see if a document has changed since
    > it was last validated, and if not, skip the validation step. However,
    > I then do not get the default values filled in.
    >
    > I can think of two possible solutions. I just wondered if this is a
    > common design issue when it comes to xml and schemas, and if there is
    > a 'best practice' to handle it.
    >
    > 1. Don't use default values - create the document with all values
    > filled in.
    >
    > 2. Use python to check for missing values and fill in the defaults
    > when processing the document.
    >
    > Or maybe the best practice is to *always* validate a document before
    > processing it.

    The stated problem rings a lot of premature optimization bells;
    performing the validation and default-filling step every time,
    unconditionally, is certainly the least crooked approach.

    In case you really want to avoid unnecessary schema processing, if you
    are willing to use persistent data to check for changes (for example,
    by comparing a hash or the full text of the current document with the
    one from the last time you performed validation) you can also store
    the filled-in document that you computed, either as XML or as
    serialized Python data structures.
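
A minimal sketch of that idea, assuming the slow schema step can be wrapped in a single function: `process`, `_cache` and `validate_and_fill` are hypothetical names, and the in-memory dict stands in for whatever persistent store is actually used.

```python
import hashlib

# Hypothetical in-memory cache: document hash -> filled-in document text.
# A persistent store (file, database) would be needed to survive restarts.
_cache = {}

def process(xml_text, validate_and_fill):
    """Return the filled-in document, running the schema step only for new text.

    `validate_and_fill` is a placeholder for the slow validation step; it takes
    the raw XML text and returns the text with defaults filled in.
    """
    key = hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = validate_and_fill(xml_text)
    return _cache[key]

# Stand-in for the real schema processing, recording how often it runs.
calls = []
def slow_step(text):
    calls.append(text)
    return text.replace("<task/>", '<task priority="normal"/>')

first = process("<process><task/></process>", slow_step)
second = process("<process><task/></process>", slow_step)
```

Because the cached value is the filled-in document rather than a simple "already validated" flag, skipping validation no longer loses the default values.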

    Regards,
    Lorenzo Gatti

    • skip@pobox.com

      #3
      Re: XML-schema 'best practice' question

      Frank> 1. Don't use default values - create the document with all values
      Frank> filled in.

      Frank> 2. Use python to check for missing values and fill in the defaults
      Frank> when processing the document.

      Frank> Or maybe the best practice is to *always* validate a document
      Frank> before processing it.

      Frank> How do experienced practitioners handle this situation?

      3. Don't use XML.

      (sorry, couldn't resist)

      Skip

      • Frank Millman

        #4
        Re: XML-schema 'best practice' question

        On Sep 18, 8:28 am, Frank Millman <fr...@chagford.com> wrote:
        > Hi all
        >
        > This is not strictly a Python question, but as I am writing in Python,
        > and as I know there are some XML gurus on this list, I hope it is
        > appropriate here.
        >
        > XML-schemas are used to define the structure of an xml document, and
        > to validate that a particular document conforms to the schema. They
        > can also be used to transform the document, by filling in missing
        > attributes with default values.
        >
        > [..]
        >
        > Or maybe the best practice is to *always* validate a document before
        > processing it.

        I have realised that my question was irrelevant.

        xml's raison d'etre is to facilitate the exchange of information
        between separate entities. If I want to use xml as a method of
        serialisation within my own system, I can do what I like, but there
        can be no question of 'best practice' in this situation.

        When xml is used as intended, and you want to process a document
        received from a third party, there is no doubt that you should always
        validate it first before processing it. Thank you, Lorenzo, for
        pointing out the obvious. It may take me a while to catch up, but at
        least I can see things a little more clearly now.

        As to why I am using xml at all, I know that there is a serious side
        to Skip's light-hearted comment, so I will try to explain.

        I want to introduce an element of workflow management (aka Business
        Process Management) into the business/accounting system I am
        developing. I used google to try to find out what the current state of
        the art is. After several months of very confusing research, this is
        the present situation, as best as I can figure it out.

        There is an OMG spec called BPMN, for Business Process Modeling
        Notation. It provides a graphical notation, intended to be readily
        understandable by all business users, from business analysts, to
        technical developers, to those responsible for actually managing and
        monitoring the processes. Powerful though it is, it does not provide a
        standard method of serialising the diagram, so there is no standard way
        of exchanging a diagram between different vendors, or of using it as
        input to a workflow engine.

        There is an OASIS spec called WS-BPEL, for Web Services Business
        Process Execution Language. It defines a language for specifying
        business process behavior based on Web Services. This does have a
        formal xml-based specification. However, it only covers processes
        invoked via web services - it does not cover workflow-type processes
        within an organisation. To try to fill this gap, a few vendors got
        together and submitted a draft specification called BPEL4People. This
        proposes a series of extensions to the WS-BPEL spec. It is still at
        the evaluation stage.

        The BPMN spec includes a section which attempts to provide a mapping
        between BPMN and BPEL, but the authors state that there are areas of
        incompatibility, so it is not a perfect mapping.

        Eventually I would like to make sense of all this, but for now I want
        to focus on BPMN, and ignore BPEL. I can use wxPython to design a BPMN
        diagram, but I have to invent my own method of serialising it so that
        I can use it to drive the business process. For good or ill, I decided
        to use xml, as it seems to offer the best chance of keeping up with
        the various specifications as they evolve.

        I don't know if this is of any interest to anyone, but it was
        therapeutic for me to try to organise my thoughts and get them down on
        paper. I am not expecting any comments, but if anyone has any thoughts
        to toss in, I will read them with interest.

        Thanks

        Frank

        • Lorenzo Gatti

          #5
          Re: XML-schema 'best practice' question

          On 20 Sep, 07:59, Frank Millman <fr...@chagford.com> wrote:
          > I want to introduce an element of workflow management (aka Business
          > Process Management) into the business/accounting system I am
          > developing. I used google to try to find out what the current state of
          > the art is. After several months of very confusing research, this is
          > the present situation, as best as I can figure it out.

          What is the state of the art of existing, working software? Can you
          leverage it instead of starting from scratch? For example, the
          existing functionality of your accounting software can be reorganized
          as a suite of components, web services etc. that can be embedded in
          workflow definitions, and/or executing a workflow engine can become a
          command in your application.

          > There is an OMG spec called BPMN, for Business Process Modeling
          > Notation. It provides a graphical notation
          > [snip]
          > there is no standard way
          > of exchanging a diagram between different vendors, or of using it as
          > input to a workflow engine.

          So BPMN is mere theory. This "spec" might be a reference for
          evaluating actual systems, but not a standard itself.

          > There is an OASIS spec called WS-BPEL, for Web Services Business
          > Process Execution Language. It defines a language for specifying
          > business process behavior based on Web Services. This does have a
          > formal xml-based specification. However, it only covers processes
          > invoked via web services - it does not cover workflow-type processes
          > within an organisation. To try to fill this gap, a few vendors got
          > together and submitted a draft specification called BPEL4People. This
          > proposes a series of extensions to the WS-BPEL spec. It is still at
          > the evaluation stage.

          Some customers pay good money for buzzword compliance, but are you
          sure you want to be so bleeding edge that you care not only for
          WS-something specifications, but for "evaluation stage" ones?

          There is no need to wait for BPEL4People before designing workflow
          systems with human editing, approval, etc.
          Try looking into case studies of how BPEL is actually used in
          practice.

          > The BPMN spec includes a section which attempts to provide a mapping
          > between BPMN and BPEL, but the authors state that there are areas of
          > incompatibility, so it is not a perfect mapping.

          Don't worry, BPMN does not exist: there is no incompatibility.
          On the other hand, comparing and understanding BPMN and BPEL might
          reveal different purposes and weaknesses between the two systems and
          help you distinguish what you need, what would be cool and what is
          only a bad idea or a speculation.

          > Eventually I would like to make sense of all this, but for now I want
          > to focus on BPMN, and ignore BPEL. I can use wxPython to design a BPMN
          > diagram, but I have to invent my own method of serialising it so that
          > I can use it to drive the business process. For good or ill, I decided
          > to use xml, as it seems to offer the best chance of keeping up with
          > the various specifications as they evolve.

          If you mean to use workflow architectures to add value to your
          business and accounting software, your priority should be executing
          workflows, not editing workflow diagrams (which are a useful but
          unnecessary user interface layer over the actual workflow engine);
          making your diagrams and definitions compliant with volatile and
          unproven specifications should come a distant last.

          > I don't know if this is of any interest to anyone, but it was
          > therapeutic for me to try to organise my thoughts and get them down on
          > paper. I am not expecting any comments, but if anyone has any thoughts
          > to toss in, I will read them with interest.

          1) There are a number of open-source or affordable workflow engines,
          mostly BPEL-compliant and written in Java; they should be more useful
          than reinventing the wheel.

          2) With a good XML editor you can produce the workflow definitions,
          BPEL or otherwise, that your workflow engine needs, and leave the
          interactive diagram editor for a phase 2 that might not necessarily
          come; text editing might be convenient enough for your users, and for
          graphical output something simpler than an editor (e.g. a Graphviz
          exporter) might be enough.

          3) Maybe workflow processing can grow inside your existing accounting
          application without the sort of "big bang" redesign you seem to be
          planning; chances are that the needed objects are already in place and
          you only need to make workflow more explicit and add appropriate new
          features.
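
To give an idea of how small the Graphviz exporter mentioned in point 2 can be, here is a rough sketch. The `<workflow>`/`<step>` format with `id` and `next` attributes is invented purely for this example (it is not BPEL or any standard), and the function only produces DOT source text, which would then be rendered with the Graphviz `dot` tool.

```python
import xml.etree.ElementTree as ET

def workflow_to_dot(xml_text):
    """Turn hypothetical <step id=".." next=".."> elements into DOT source text."""
    root = ET.fromstring(xml_text)
    lines = ["digraph workflow {"]
    for step in root.iter("step"):
        # Declare every step as a node, even if nothing points to it.
        lines.append('  "%s";' % step.get("id"))
        # 'next' holds a space-separated list of successor step ids.
        for target in step.get("next", "").split():
            lines.append('  "%s" -> "%s";' % (step.get("id"), target))
    lines.append("}")
    return "\n".join(lines)

example = (
    '<workflow>'
    '<step id="submit" next="approve"/>'
    '<step id="approve" next="pay reject"/>'
    '<step id="pay"/>'
    '<step id="reject"/>'
    '</workflow>'
)
dot = workflow_to_dot(example)
```

An exporter like this is read-only by design; the workflow definition itself is still edited as text.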

          Regards,
          Lorenzo Gatti

          • Lorenzo Gatti

            #6
            Re: XML-schema 'best practice' question

            Sorry for pressing the send button too fast.
