Data Warehousing

This topic is closed.
  • David Harris

    Data Warehousing

    Ok, so I'm semi-new to .NET, having done everything manually with SQL code
    back in VB6. So before I program this up completely manually again, I thought
    I'd ask for better ways to think through this problem.

    We have several client machines, and a central data warehousing server. Each
    machine may contain hundreds of surveys, and they all are sent to the central
    server. Only they can never be networked together, forcing us to use files. I
    currently use an XML file to control what happens on each end (deletions,
    confirmations of actions taken, configurations, and survey transfers). Only
    the survey transfer part has me stumped.
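    For context, the control file is nothing fancy, just a list of actions,
    along these lines (element and attribute names illustrative, not our
    actual schema):

    ```xml
    <control machine="client-03" generated="2006-05-12T09:30:00">
      <!-- surveys the client wants the server to ingest -->
      <transfer survey="survey_0147.xml" />
      <!-- server telling the client a processed survey can go -->
      <delete survey="survey_0112.xml" />
      <!-- confirmations of actions taken on the other side -->
      <confirm action="delete" survey="survey_0098.xml" status="done" />
      <!-- configuration pushed down to the client -->
      <config key="UploadBatchSize" value="10" />
    </control>
    ```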

    Our database model uses GUIDs for all unique keys, since each survey can
    have several tens of thousands of individual data points. The problem with
    this is that if I merely serialize the dataset for each survey into files, we
    wind up with GUID collisions at the data warehouse, since each machine may
    have hundreds of thousands of GUIDs. This forces me to loop through the
    surveys being transferred, add each manually with a fresh GUID, and then add
    each subcomponent of the survey in turn, all with freshly generated GUIDs.
    This turns into a huge process that eats up CPU time and may take upwards of
    fifteen minutes for a single larger survey. Needless to say, most people want
    our old file-based DOS system back because it was "instant".

    Is there a better way to perform the task at hand? I didn't think
    replication or subscription models would work because we have to use files as
    an intermediary. I've only ever done basic SQL operations, so I'm not sure if
    there's something readily available that can basically automate the process.
    And I'm new to C# and the .NET Framework in general, so I'm not sure if there is
    something that can help me out there. Would appreciate any tips on how to
    best accomplish this.

    Thanks,
    David
  • Kevin Spencer

    #2
    Re: Data Warehousing

    If these machines aren't networked together, how do they send the files to
    the central data warehousing server? Unless the file is put onto a thumb
    drive or some other portable medium and physically carried to the server,
    they are networked together. In addition, if these machines are connected to
    the Internet, they are networked together (The Internet is a vast TCP/IP
    network).

    --
    HTH,

    Kevin Spencer
    Microsoft MVP
    Professional Numbskull

    Hard work is a medication for which
    there is no placebo.




    • David Harris

      #3
      Re: Data Warehousing

      Sorry for omitting that information, but the machines are normally located in
      secure facilities, so have no access to either the internet or a network of
      any kind. Many times they even must have any networking cards removed before
      they enter the facility. We will be using USB thumb drives to transfer the
      files from the machines to the server, and have a small download/upload app
      at each side.

      Thanks,
      David



      • AMDRIT

        #4
        Re: Data Warehousing

        I do not assume to have all the facts about your environment. The first
        question I have is how you get the survey from the client machine to the
        database without a network. Do you use SneakerNet? I assume that you do
        have a network in place, or you wouldn't be worried about GUIDs.
        Additionally, I assume you are using SQL Server, or you wouldn't be
        worried about GUIDs.

        A couple of ideas come to mind immediately as I read your post:

        After thinking on it, I would consider a smart client connecting to web
        services. The smart client lets the user work offline and selectively
        push/pull data to and from the host server. The web services let you
        cache incoming activity for further processing before pushing the data
        to the database.

        As for the data model, if GUIDs are working for you, stick with 'em. Your
        resolution is simpler than you might think.

        Example data model

        Surveys
        pk, SurveyID GUID not null
        SurveyName varchar not null
        SurveyDate datetime not null (getDate())

        SurveyedPeople
        pk, PersonID GUID not null
        Firstname varchar not null
        Lastname varchar not null

        SurveyElements
        pk, ElementID GUID not null
        fk, SurveyID GUID not null
        ElementName varchar not null
        ElementResponseType int not null

        SurveyResponses
        fk, ElementID GUID
        fk, PersonID GUID
        Response varchar

        1. create a dataset that represents this model, create the relationships.
        2. have your SQL adapter use stored procs to process updates, deletes,
        inserts.
        3. send your changes from the smart client to the webservice

        sample insert script

        declare @NewIdentity uniqueidentifier
        set @NewIdentity = newid()
        insert into surveys (SurveyID, SurveyName, SurveyDate)
        values (@NewIdentity, @NewSurveyName, @SurveyDate)
        select @NewIdentity

        4. As you perform inserts, the adapter will update the associated DataRow
        with the new GUID; this will cascade down to the child tables.
        5. Return the updated dataset from the web service.
        6. Merge changes.
        7. Accept changes.


        sample code

        client side

        Sub Save(ds As DataSet)
            Dim ws As New MyWebService
            Dim dsChanges As DataSet
            ' ... connection logic ...
            dsChanges = ds.GetChanges(DataRowState.Added Or _
                DataRowState.Modified Or DataRowState.Deleted)
            dsChanges = ws.Save(dsChanges) ' serialization is automatic
            ds.Merge(dsChanges)            ' merge server-assigned keys back in
            ds.AcceptChanges()
        End Sub

        webservice
        <WebMethod()> _
        Function Save(ds As DataSet) As DataSet
            surveys.Update(ds.Tables("Surveys"))          ' calls to SqlDataAdapter
            people.Update(ds.Tables("SurveyedPeople"))
            responses.Update(ds.Tables("SurveyResponses"))
            Return ds ' send the re-keyed rows back to the client
        End Function

        Take a look at Issue Vision from www.windowsforms.com





        • AMDRIT

          #5
          Re: Data Warehousing



          This solution still applies; only the transport is SneakerNet. You would
          simply serialize the datasets to the hard drive, pick them up with your
          thumb drive, process them, push the results back to the thumb drive, then
          copy them back to the client machine. GUID or autonumber contention is not
          an issue here, especially since this process is more serial and sequential
          than random updates.

          I could make an argument for parallel port, USB, or serial port
          communication, similar to the old LapLink days, and using it to push
          updates in batch once per day. Certainly, that would be better than
          manually touching every computer and inserting a "floppy disk".





          • David Harris

            #6
            Re: Data Warehousing

            Yeah, as said in my other reply above, we have no access to any networking
            between machines or to the server. No internet, and not even an ad hoc
            connection. Herein lies the problem. It'd be easy if the databases could
            speak directly. :)

            For future reference, these are radiation surveys, so you'll understand if I
            start using weird terms that don't make sense in a regular personal survey
            context.

            This led to the idea of serializing each survey (located on each machine's
            individual SQL server) into xml (each file would likely be huge though), and
            have an xml control file that tells the server which files it needs to
            upload, and the server sends a control file back that tells the client to
            delete a survey (after it is processed) or update configuration information.
            It also has a section for confirmations of all actions. Each side has a few
            database tables keeping track of what the other (client or server) should be
            doing, and the control file is generated from that.

            The problem with serializing a survey was the GUID collision issue.
            Currently I solve this by loading the serialized survey into a dataset on the
            server, and manually navigating the dataset, reinserting the data manually
            for each record, generating new GUIDs along the way. The problem with this is
            that we're dealing with up to 100 strips of data per survey, each with
            hundreds of data acquisitions (one per second of survey time), each
            acquisition with thousands of data points, so it becomes rather tedious to do
            new inserts for every record, taking a ton of time. With probably 20 surveys
            per day from each machine, and up to ten or so machines now, that winds up
            adding up big time. It would take the entire day to upload all those surveys
            like that.

            So, I don't know what the best route would be. It appears your solution is
            implemented similar to my current solution, just over a webservice, which
            unfortunately we don't have the ability to use. Unless I'm misinterpreting
            your comments...

            Appreciate the help and helping me think through this; it's a rather weird
            problem since we have to use files as an intermediary.

            David




            • Kevin Spencer

              #7
              Re: Data Warehousing

              You say that each machine has its own SQL Server? The solution is simple.
              Copy the database file to the thumb drive, and then bring the database file
              to the server. Attach the database file to the server, and then import the
              data using a query or Stored Procedure. Don't import the GUIDs, but have the
              server database use a column, such as an Identity column, that automatically
              assigns a unique ID to the record. As the data is in its original binary
              format, it should be very fast.
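              Roughly like this, in T-SQL. This sketch keeps GUID keys and re-keys
              through a mapping table rather than switching to an identity column,
              since the warehouse model already uses GUIDs; the table and column
              names are borrowed from the example model earlier in the thread and
              are otherwise hypothetical:

              ```sql
              -- Attach the client database file brought over on the thumb drive.
              CREATE DATABASE ClientSurveys
                  ON (FILENAME = 'C:\Import\ClientSurveys.mdf')
                  FOR ATTACH;

              -- Build an old-to-new key map so child rows still line up
              -- with their parents after the import.
              DECLARE @Map TABLE (OldID uniqueidentifier PRIMARY KEY,
                                  NewID uniqueidentifier);

              INSERT INTO @Map (OldID, NewID)
              SELECT SurveyID, NEWID() FROM ClientSurveys.dbo.Surveys;

              -- One set-based INSERT per table; no row-at-a-time loop.
              INSERT INTO Warehouse.dbo.Surveys (SurveyID, SurveyName, SurveyDate)
              SELECT m.NewID, s.SurveyName, s.SurveyDate
              FROM ClientSurveys.dbo.Surveys s
              JOIN @Map m ON m.OldID = s.SurveyID;

              -- Child tables join through the same map, e.g.:
              INSERT INTO Warehouse.dbo.SurveyElements (ElementID, SurveyID, ElementName)
              SELECT NEWID(), m.NewID, e.ElementName
              FROM ClientSurveys.dbo.SurveyElements e
              JOIN @Map m ON m.OldID = e.SurveyID;
              ```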

              See the SQL Server Books Online for details.

              --
              HTH,

              Kevin Spencer
              Microsoft MVP
              Professional Numbskull

              Complex things are made up of
              lots of simple things.

              > big
              > time. It would take the entire day to upload all those surveys like that.
              >
              > So, I don't know what the best route would be. It appears your solution is
              > implemented similar to my current solution, just over a webservice, which
              > unfortunately we don't have the ability to use. Unless I'm misinterpreting
              > your comments...
              >
              > Appreciate the help and helping me think through this; it's a rather weird
              > problem since we have to use files as an intermediary.
              >
              > David
              >
              >
              > "AMDRIT" wrote:
              >
              >> I do not assume to have all the facts in your environment. The first
              >> question I have is how do you get the survey from the client machine to
              >> the
              >> database without a network? Do you use SneakerNet? I assume that you do
              >> have a network inplace, or you wouldn't be worried about GUIDs.
              >> Additionally, I assume you are using SQL server or you wouldn't be
              >> worried
              >> about GUID's.
              >>
              >> A couple of ideas come to mind immediately as I read your post:
              >>
              >> After thinking on it, I would consider a smart client connecting to web
              >> services. The smart client allows the user to work offline and
              >> selectively
              >> push/pull data from the host server. The web services allow you to cache
              >> active activity for further processing prior to pushing the data to the
              >> database.
              >>
              >> As for the data model, if GUID's are working for you, stick with 'em.
              >> Your resolution is simpler than you might think.
              >>
              >> Example data model
              >>
              >> Surveys
              >> pk, SurveyID GUID not null
              >> SurveyName varchar not null
              >> SurveyDate datetime not null (getDate())
              >>
              >> SurveyedPeople
              >> pk, PersonID GUID not Null
              >> Firstname varchar not null
              >> Lastname varchar not null
              >>
              >> SurveyElements
              >> pk, ElementID GUID not null
              >> fk, SurveyID GUID not null
              >> ElementName varchar not null
              >> ElementResponseType int not null
              >>
              >> SurveyResponses
              >> fk, ElementID GUID
              >> fk, PersonID GUID
              >> Response varchar
              >>
              >> 1. create a dataset that represents this model, create the
              >> relationships.
              >> 2. have your SQL adapter use stored procs to process updates, deletes,
              >> inserts.
              >> 3. send your changes from the smart client to the webservice
              >>
              >> sample insert script
              >>
              >> declare @NewIdentity uniqueidentifier
              >> set @NewIdentity = newid()
              >> insert into surveys (SurveyID, SurveyName, SurveyDate)
              >> values (@NewIdentity, @NewSurveyName, getdate())
              >> select @NewIdentity
              >>
              >> 4. As you perform inserts, the adapter will update the associated datarow
              >> with the new guid; this will cascade down to the child tables
              >> 5. return the updated dataset from the web service
              >> 6. merge changes
              >> 7. accept changes
              >>
              >>
              >> sample code
              >>
              >> client side
              >>
              >> Sub Save(ds As DataSet)
              >>     Dim ws As MyWebService
              >>     Dim dsChanges As DataSet
              >>     ... connection logic ...
              >>     dsChanges = ds.GetChanges()
              >>     ws.Save(dsChanges) 'Serialization is automatic
              >>     ds.Merge(dsChanges)
              >>     ds.AcceptChanges()
              >> End Sub
              >>
              >> webservice
              >>
              >> <WebMethod()> _
              >> Sub Save(ds As DataSet)
              >>     surveys.Update(ds.Tables("Surveys")) 'call to SqlDataAdapter
              >>     people.Update(ds.Tables("People"))
              >>     responses.Update(ds.Tables("Responses"))
              >> End Sub
              >>
              >> Take a look at Issue Vision from www.windowsforms.com
              >>
              >>
              >> "David Harris" <DavidHarris@di scussions.micro soft.com> wrote in message
              >> news:B90D20D1-7437-481B-A045-229E7FD0410F@mi crosoft.com...[color=darkred]
              >> > Ok, so I'm semi-new to .NET, having done everything manually with SQL
              >> > code
              >> > back in VB6. So before I program this up completely manually again, I
              >> > thought
              >> > I'd ask for better ways to think through this problem.
              >> >
              >> > We have several client machines, and a central data warehousing server.
              >> > Each
              >> > machine may contain hundreds of surveys, and they all are sent to the
              >> > central
              >> > server. Only they can never be networked together, forcing us to use
              >> > files. I
              >> > currently use an XML file to control what happens on each end
              >> > (deletions,
              >> > confirmations of actions taken, configurations, and survey transfers).
              >> > Only
              >> > the survey transfer part has me.
              >> >
              >> > Our database model uses GUIDs for all unique keys, since each survey
              >> > can
              >> > have several tens of thousands of individual data points. The problem
              >> > with
              >> > this is that if I merely serialize the dataset for each survey into
              >> > files,
              >> > we
              >> > wind up with GUID collisions at the data warehouse, since each machine
              >> > may
              >> > have hundreds of thousands of GUIDs. This forces me to loop through the
              >> > surveys being transferred, add each manually with a fresh GUID, and
              >> > then
              >> > add
              >> > each subcomponent of the survey in turn, all with freshly generated
              >> > GUIDs.
              >> > This turns into a huge process that eats up CPU time and may take
              >> > upwards
              >> > of
              >> > fifteen minutes for a single larger survey. Needless to say, most
              >> > people
              >> > want
              >> > our old file-based DOS system back because it was "instant".
              >> >
              >> > Is there a better way to perform the task at hand? I didn't think
              >> > replication or subscription models would work because we have to use
              >> > files
              >> > as
              >> > an intermediary. I've only ever done basic SQL operations, so I'm not
              >> > sure
              >> > if
              >> > there's something readily available that can basically automate the
              >> > process.
              >> > And I'm new to C# and .NET framework in general, so I'm not sure if
              >> > there
              >> > is
              >> > something that can help me out there. Would appreciate any tips on how
              >> > to
              >> > best accomplish this.
              >> >
              >> > Thanks,
              >> > David[/color]
              >>
              >>
              >>[/color][/color]



              • David Harris

                #8
                Re: Data Warehousing

                Unfortunately, a secondary problem is that our server is not located
                physically near the machines, it must be in a separate building. Radioactive
                environments are usually dusty and noisy, so our server and processing
                station is located outside the buildings being surveyed.

                A tertiary problem is that this is sensitive information: it must 1) stay
                on the machine until it has been processed and finalized, 2) stay in the
                data warehouse until all cleanup efforts have completed, and 3) be
                restorable to the client machines even after deletion. The reason this is
                a problem is that if we only had to keep the survey up long enough to
                process it, there would be no GUID duplicates. And it must also remain on
                the machine until the survey is complete, processed, and marked as ok (no
                contamination).

                Not sure what you mean about there being no GUID contention. Currently, the
                lowest level identity is an Acquisition, a member of a Strip, which is a
                member of a Survey. There may be tens of thousands of Acquisitions in a
                survey, sometimes more. For a few sequential surveys on the same machine,
                this is a non-issue. However, when you take into account that there are
                ten machines, each with a hundred surveys, each with 10000 acquisitions,
                your chances of duplicating one of those Acquisition GUIDs increase by a
                large enough margin to matter. So just forcing the whole dataset over
                without redoing all the GUIDs is not an option. It doesn't collide on just
                a few surveys, but the average seems to be eight surveys from various
                machines before one Acquisition GUID is duplicated.

                Hence my current solution of serializing them as-is, and replacing all the
                GUIDs on the server side to new ones with no collision. I guess the only way
                to speed that up would be first checking if any of the GUIDs actually
                collide, and only go through the replacement if they do, otherwise just pop
                the serialized dataset into the server's database. Only when we get to about
                a thousand surveys in that data warehouse (not uncommon, our largest project
                had a few hundred thousand surveys), those collisions will happen more and
                more often.
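                That check-before-remap idea could be sketched in T-SQL along these lines
                (the staging table and column names are made up for illustration):

                ```sql
                -- Sketch: only regenerate GUIDs when an incoming key actually collides.
                -- #Incoming is assumed to be a staging table loaded from the
                -- serialized survey.
                IF EXISTS (SELECT 1
                           FROM dbo.Acquisitions a
                           JOIN #Incoming i ON i.AcquisitionID = a.AcquisitionID)
                BEGIN
                    -- Collision: fall back to the slow path and reassign GUIDs.
                    PRINT 'Collision detected; regenerating GUIDs.'
                END
                ELSE
                BEGIN
                    -- No collision: bulk insert the rows as-is.
                    INSERT INTO dbo.Acquisitions (AcquisitionID, StripID, DataPoint)
                    SELECT AcquisitionID, StripID, DataPoint
                    FROM #Incoming
                END
                ```

                The EXISTS probe is a single indexed join, so the common no-collision case
                costs one query instead of a full row-by-row rewrite.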

                Unless I'm missing the point somewhere?

                Thanks again for the help,
                David

                "AMDRIT" wrote:[color=blue]
                >
                > This solution still applies, only the transport is SneakerNet, you would
                > simply serialize the datasets to the hard drive. Pick them up with your
                > thumb drive, process them, push them back to the thumb drive, then copy them
                > back to the client machine. GUID or Autonumber contention is not an issue
                > here, especially since this process seems more serial and sequential than
                > random updates.
                >
                > I could make an argument for parallel port, USB, or serial port
                > communication similar to the old laplink days, and using it to push update
                > in batch once per day. Certainly, that would be better than manually
                > touching every computer and inserting a "floppy disk"


                • AMDRIT

                  #9
                  Re: Data Warehousing

                  Kevin Spencer's response is more eloquent than mine in explaining the guid
                  contention issue.

                  Essentially, the GUIDs used on the client side are just placeholders so
                  that work can be completed. GUIDs on the server side are assigned as the
                  stored procedures are processed. Since the server is assigning the
                  GUIDs, the chance of a duplicate GUID in the same table is something
                  like 2 billion to 1.

                  The paradigm that I proposed treats each survey as "unprocessed" work and it
                  remains unprocessed until the server side logic processes the data and the
                  data is then sent back to the client. Essentially, it is a round trip, just
                  a very slow round trip.

                  possible flow

                  1. create a dataset for a new survey
                  2. modify the data (insert, update, delete)
                  3. save the dataset to the disk
                  4. copy the file(s) to the thumb drive
                  5. load the file(s) from the thumb drive into the server control program
                  6. process the data
                  7. commit the data to the data warehouse
                  8a. save the processed data back to the thumb drive
                  8b. save lookup and reference data to disk
                  9. copy the data back onto the client drive from the thumb drive
                  10. load the file(s) back into the client application
                  11. commit the data to the local client

                  The client code will manage the initial guid assignment on the client side.
                  The dataset itself, along with the stored procedures will manage the guid
                  assignment on the server side, ignoring the initial guids assigned by the
                  client.

                  Since the data was not initially committed to a local data store (SQL or
                  Access), when you store the data in the dataset on the local server, the
                  guids will be assigned the values the server had given them. There is no
                  contention on guid assignment, since the data warehouse is assigning the
                  guid's.
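                  One way to sketch that server-side reassignment is a set-based remap
                  through a temporary mapping table, so the parent/child links survive the
                  new GUIDs. Names here are loosely borrowed from the example model earlier
                  in the thread and should be treated as hypothetical:

                  ```sql
                  -- Build an old-GUID -> new-GUID map for the parent rows.
                  SELECT SurveyID AS OldSurveyID, NEWID() AS NewSurveyID
                  INTO #SurveyMap
                  FROM #IncomingSurveys;

                  -- Insert parents under their new server-assigned GUIDs.
                  INSERT INTO dbo.Surveys (SurveyID, SurveyName, SurveyDate)
                  SELECT m.NewSurveyID, s.SurveyName, s.SurveyDate
                  FROM #IncomingSurveys s
                  JOIN #SurveyMap m ON m.OldSurveyID = s.SurveyID;

                  -- Children get new GUIDs of their own, but keep pointing at the
                  -- remapped parent via the map.
                  INSERT INTO dbo.SurveyElements (ElementID, SurveyID, ElementName)
                  SELECT NEWID(), m.NewSurveyID, e.ElementName
                  FROM #IncomingElements e
                  JOIN #SurveyMap m ON m.OldSurveyID = e.SurveyID;
                  ```

                  Deeper levels (acquisitions under strips, say) would need their own
                  mapping tables built the same way, but a handful of set-based statements
                  should still be far faster than inserting every record one at a time.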

                  I do not think that log shipping is a viable solution here.
                  I do not think that replicated databases are a viable solution here.
                  I do not think that hitting the local SQL server prior to committing
                  data to the warehouse is a feasible solution.

                  If the XML files become too large to handle via the thumb drive, there
                  are always CSV and binary formats to work with. With any of the XML,
                  CSV, or binary solutions, you are only moving affected data and not the
                  entire database.

                  I hope that makes it more intelligible.

                  "David Harris" <DavidHarris@di scussions.micro soft.com> wrote in message
                  news:94D6D7E7-5624-4390-8F1C-2E8310E4C01E@mi crosoft.com...[color=blue]
                  > Unfortunately, a secondary problem is that our server is not located
                  > physically near the machines, it must be in a separate building.
                  > Radioactive
                  > environments are usually dusty and noisy, so our server and processing
                  > station is located outside the buildings being surveyed.
                  >
                  > A tertiary problem is that this is sensitive information and must 1) stay
                  > on
                  > the machine until it has been processed and finalized, 2) must stay in the
                  > data warehouse until all cleanup efforts have completed, and 3) must have
                  > the
                  > ability to put back on the client machines even after deletion. The reason
                  > this is a problem is that if we only had to keep the survey up long enough
                  > to
                  > process it, there would be no GUID duplicates. And it must remain on the
                  > machine also, until the survey is complete, processed, and marked as ok
                  > (no
                  > contamination).
                  >
                  > Not sure what you mean about there being no GUID contention. Currently,
                  > the
                  > lowest level identity is an Acquisition, a member of Strip, which is a
                  > member
                  > of Survey. There may be tens of thousands Acquisitions in a survey,
                  > sometimes
                  > more. For a few seqential surveys on the same machine, this is a
                  > non-issue.
                  > However, when you take into account there are ten machines, each with a
                  > hundred surveys, each with 10000 acquisitions, your chances of duplicating
                  > one of those Acquisition GUIDs increase by a large enough margin to
                  > matter.
                  > So just forcing the whole dataset over without redoing all the GUIDs is
                  > not
                  > an option. It doesn't collide on just a few surveys, but the average seems
                  > tobe eight surveys from various machines, before one Acquisition GUID is
                  > duplicated.
                  >
                  > Hence my current solution of serializing them as-is, and replacing all the
                  > GUIDs on the server side to new ones with no collision. I guess the only
                  > way
                  > to speed that up would be first checking if any of the GUIDs actually
                  > collide, and only go through the replacement if they do, otherwise just
                  > pop
                  > the serialized dataset into the server's database. Only when we get to
                  > about
                  > a thousand surveys in that data warehouse (not uncommon, our largest
                  > project
                  > had a few hundred thousand surveys), those collisions will happen more and
                  > more often.
                  >
                  > Unless I'm missing the point somewhere?
                  >
                  > Thanks again for the help,
                  > David
                  >
                  > "AMDRIT" wrote:[color=green]
                  >>
                  >> This solution still applies, only the transport is SneakerNet, you would
                  >> simply serialize the datasets to the hard drive. Pick them up with your
                  >> thumb drive, process them, push them back to the thumb drive, then copy
                  >> them
                  >> back to the client machine. GUID or Autonumber contention is not an
                  >> issue
                  >> here, especially since this process seems more serial and sequential than
                  >> random updates.
                  >>
                  >> I could make an argument for parallel port, USB, or serial port
                  >> communication similar to the old laplink days, and using it to push
                  >> update
                  >> in batch once per day. Certainly, that would be better than manually
                  >> touching every computer and inserting a "floppy disk"[/color]
                  >[/color]



                  • David Harris

                    #10
                    Re: Data Warehousing

                    Actually, that may just work perfectly. I think I'll do a simple
                    backup/restore on each end and do what I want from there. Should be plenty
                    fast using stored procs. I can probably even drop that xml control file and
                    just store that control information in a table.

                    Either that, or my failsafe idea is to store it in SQL on the clients,
                    and then keep it in XML-serialized files on the server, opening one at
                    a time to process, transferring it to the DB, and then releasing it
                    when done. We'll never need to process files asynchronously, so that's
                    a final option, but a slightly uglier solution than we wanted. The
                    backup/restore approach should work perfectly though. Simple solution,
                    but it was hard to see given the constraints. :) Thanks much, guys.
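                    For reference, the backup/restore transport might look roughly like
                    this (database names, logical file names, and paths are all invented):

                    ```sql
                    -- On the client machine: back up to a file on the thumb drive.
                    BACKUP DATABASE SurveyDB TO DISK = N'E:\Transfer\SurveyDB.bak';

                    -- On the warehouse server: restore under a staging name, relocating
                    -- the files, then run the import procedures against the staging copy.
                    RESTORE DATABASE SurveyDB_Staging
                    FROM DISK = N'E:\Transfer\SurveyDB.bak'
                    WITH MOVE 'SurveyDB' TO N'C:\Staging\SurveyDB_Staging.mdf',
                         MOVE 'SurveyDB_log' TO N'C:\Staging\SurveyDB_Staging.ldf';
                    ```

                    Restoring under a staging name keeps the client copy isolated from the
                    warehouse until the import stored procedures have assigned fresh keys.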

                    David

                    "Kevin Spencer" wrote:
                    [color=blue]
                    > You say that each machine has its own SQL Server? The solution is simple.
                    > Copy the database file to the thumb drive, and then bring the database file
                    > to the server. Attach the database file to the server, and then import the
                    > data using a query or Stored Procedure. Don't import the GUIDs, but have the
                    > server database use a column, such as an Identity column, that automatically
                    > assigns a unique ID to the record. As the data is in its original binary
                    > format, it should be very fast.
                    >
                    > See the SQL Server Books Online for details.
                    >
                    > --
                    > HTH,
                    >
                    > Kevin Spencer
                    > Microsoft MVP
                    > Professional Numbskull
                    >
                    > Complex things are made up of
                    > lots of simple things.
                    >
                    > "David Harris" <DavidHarris@di scussions.micro soft.com> wrote in message
                    > news:07C3F16C-FEF2-4258-A609-DFA7B68517F7@mi crosoft.com...[color=green]
                    > > Yeah, as said in my other reply above, we have no access to any
                    > > networking,
                    > > between machines, or to the server. No internet, and not even an ad hoc
                    > > connection. Herein lies the problem. It'd be easy if the databases could
                    > > speak directly. :)
                    > >
                    > > For future reference, these are radiation surveys, so you'll understand if
                    > > I
                    > > start using weird terms that don't make sense in a regular personal survey
                    > > context.
                    > >
                    > > This led to the idea of serializing each survey (located on each machine's
                    > > individual SQL server) into xml (each file would likely be huge though),
                    > > and
                    > > have an xml control file that tells the server which files it needs to
                    > > upload, and the server sends a control file back that tells the client to
                    > > delete a survey (after it is processed) or update configuration
                    > > information.
                    > > It also has a section for confirmations of all actions. Each side has a
                    > > few
                    > > database tables keeping track of what the other (client or server) should
                    > > be
                    > > doing, and the control file is generated from that.
                    > >
                    > > The problem with serializing a survey was the GUID collision issue.
                    > > Currently I solve this by loading the serialized survey into a dataset on
                    > > the
                    > > server, and manually navigating the dataset, reinserting the data manually
                    > > for each record, generating new GUIDs along the way. The problem with this
                    > > is
                    > > that we're dealing with up to 100 strips of data per survey, each with
                    > > hundreds of data acquisitions (one per second of survey time), each
                    > > acquisition with thousands of data points, so it becomes rather tedious to
                    > > do
                    > > new inserts for every record, taking a ton of time. With probably 20
                    > > surveys
                    > > per day from each machine, up to ten or so now, that winds up adding up
                    > > big
                    > > time. It would take the entire day to upload all those surveys like that.
                    > >
                    > > So, I don't know what the best route would be. It appears your solution is
                    > > implemented similar to my current solution, just over a webservice, which
                    > > unfortunately we don't have the ability to use. Unless I'm misinterpreting
                    > > your comments...
                    > >
                    > > Appreciate the help and helping me think through this; it's a rather weird
                    > > problem since we have to use files as an intermediary.
                    > >
                    > > David
                    > >
                    > >
                    > > "AMDRIT" wrote:
                    > >[color=darkred]
                    > >> I do not assume to have all the facts in your environment. The first
                    > >> question I have is how do you get the survey from the client machine to
                    > >> the
                    > >> database without a network? Do you use SneakerNet? I assume that you do
                    > >> have a network inplace, or you wouldn't be worried about GUIDs.
                    > >> Additionally, I assume you are using SQL server or you wouldn't be
                    > >> worried
                    > >> about GUID's.
                    > >>
                    > >> A couple of ideas come to mind immediately as I read your post:
                    > >>
                    > >> After thinking on it, I would consider a smart client connecting to web
                    > >> services. The smart client allows the user to work offline and
                    > >> selectively
                    > >> push/pull data from the host server. The web services allow you to cache
                    > >> active activity for further processing prior to pushing the data to the
                    > >> database.
                    > >>
                    > >> As for the data model, if GUID's are working for you, stick with 'em.
                    > >> You
                    > >> resolution is simpler than you might think.
                    > >>
                    > >> Example data model
                    > >>
                    > >> Surveys
                    > >> pk, SurveyID GUID not null
                    > >> SurveyName varchar not null
                    > >> SurveyDate datetime not null (getDate())
                    > >>
                    > >> SurveyedPeople
                    > >> pk, PersonID GUID not Null
                    > >> Firstname varchar not null
                    > >> Lastname varchar not null
                    > >>
                    > >> SurveyElements
                    > >> pk, ElementID GUID not null
                    > >> fk, SurveyID GUID not null
                    > >> ElementName varchar not null
                    > >> ElementResponse Type int not null
                    > >>
                    > >> SurveyResponses
                    > >> fk, ElementID GUID
                    > >> fk, PersonID GUID
                    > >> Response varchar
                    > >>
                    > >> 1. create a dataset that represents this model, create the
                    > >> relationships.
                    > >> 2. have your SQL adapter use stored procs to process updates, deletes,
                    > >> inserts.
                    > >> 3. send your changes from the smart client to the webservice
                    > >>
                    > >> sample insert script
                    > >>
                    > >> declare @NewIdentity uniqueidentifie r
                    > >> insert into surveys (SurveyID, SurveyName, SurveyDate) Values(NewID(),
                    > >> @NewSurveyName, SurveyDate)
                    > >> select @newidentity
                    > >>
                    > >> 3. As you perform inserts, the adapter will update the associated
                    > >> datarow
                    > >> with the new guid, this will cascade down to the child tables
                    > >> 4. return the updated dataset from the web service
                    > >> 5. merge changes
                    > >> 6. accecpt changes
                    > >>
                    > >>
                    > >> sample code
                    > >>
                    > >> client side
                    > >>
                    > >> sub save(ds as dataset)
                    > >> dim ws as mywebservice
                    > >> dim dsChanges as dataset
                    > >> ... connection logic....
                    > >> dsData = ds.getchanges(I nserted, Updated, Deleted)
                    > >> ws.save(dsData) 'Serialzation is automatic
                    > >> ds.merge(dsdata )
                    > >> ds.acceptchange s
                    > >> end sub
                    > >>
                    > >> webservice
                    > >> <webmethod> _
                    > >> sub save(ds as dataset)
                    > >> surveys.update( ds.surveys) 'call to SQLDataAdapter
                    > >> people.update(d s.people)
                    > >> responses.updat e(ds.responses)
                    > >> end sub
                    > >>
                    > >> Take a look at Issue Vision from www.windowsforms.com
                    > >>
                    > >>
                    > >> "David Harris" <DavidHarris@di scussions.micro soft.com> wrote in message
                    > >> news:B90D20D1-7437-481B-A045-229E7FD0410F@mi crosoft.com...
