Design/implementation question

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Yves Dhondt

    Design/implementation question

    Hello,

    I've got the following UML design :

    C
    |
    A _____|______ B

    So 2 objects A and B are connected through a relation C. (For example an
    employment scheme : person A1 worked for company B1 at time C1, person
    A1 worked for company B2 at time C2, person A2 worked for company B1 at
    time C3, ...)

    My current implementation exists of :

    public class objectA { ... }
    public class objectB { ... }

    public class relationC {
    private objectA A;
    private objectB B;

    // other things about the relation
    }

    I use a list with all relationCs in it. This takes not much memory but
    if I want to check which objectB's are connected to an objectA or vice
    versa, I need to check every relationC.

    ArrayList result = new ArrayList();

    foreach (relationC c in list)
    {
    if (c.getA().equal s(A))
    {
    result.Add(c.ge tB())
    }
    }

    Considering the fact I have about 10000 objectA's and 25 objectB's I can
    have up to 250000 relationC's to check which makes my implementation
    rather slow.

    I considered adding direct links from A to B :

    public class objectA {
    private ArrayList objectBs

    // ...
    }

    and the same the other way around but then the amount of memory used
    goes up pretty fast.

    I wonder if someone has an idea on how to make this faster without
    needing a huge memory chunk.

    TIA

    Yves
  • Bob Grommes

    #2
    Re: Design/implementation question

    Cody,

    Your system clock is off by one year.

    --Bob

    "cody" <please_dont.sp am.deutronium@g mx.de> wrote in message
    news:OJw0nHkgEH A.704@TK2MSFTNG P12.phx.gbl...[color=blue]
    > It is generally not useful to load *all* data contained in a database and
    > then try to make *queries* against it.
    > First, you have to consider what you want to do with the data.
    >
    > If you want to display all companies where person A worked in, do a query[/color]
    to[color=blue]
    > return exactly that and display it. If you want to display all persons[/color]
    that[color=blue]
    > worked in a certain company query exactly that from the database and[/color]
    display[color=blue]
    > it.
    > Do not worry to much about performance: Databases are designed just for
    > that.
    >
    > --
    > cody[/color]


    Comment

    • David

      #3
      Re: Design/implementation question

      On 2004-08-14, Yves Dhondt <no@privacy.net > wrote:[color=blue]
      > Hello,
      >
      > I've got the following UML design :[/color]

      Sorry, but you ask design questions, you get long-winded answers...
      [color=blue]
      >
      > C
      > |
      > A _____|______ B
      >
      > So 2 objects A and B are connected through a relation C. (For example an
      > employment scheme : person A1 worked for company B1 at time C1, person
      > A1 worked for company B2 at time C2, person A2 worked for company B1 at
      > time C3, ...)[/color]

      From a design standpoint, that just seems wrong to me. In the real
      world, companies have employees and people have jobs. It's hard to be
      definitive here without knowing what app you're writing, but it doesn't
      seem natural to me to think in terms of time spans having
      company-employee relationships.

      That's how relational database think, of course, but relational
      databases are pretty much antithetical to OOP design, and it's a mistake
      to have the middle-tier mirror the relational design IMHO (I realize
      this area of OOP design isn't really settled, of course, but that's my
      opinion).
      [color=blue]
      >
      > My current implementation exists of :
      >
      > public class objectA { ... }
      > public class objectB { ... }
      >
      > public class relationC {
      > private objectA A;
      > private objectB B;
      >
      > // other things about the relation
      > }
      >
      > I use a list with all relationCs in it. This takes not much memory but
      > if I want to check which objectB's are connected to an objectA or vice
      > versa, I need to check every relationC.
      >
      > ArrayList result = new ArrayList();
      >
      > foreach (relationC c in list)
      > {
      > if (c.getA().equal s(A))
      > {
      > result.Add(c.ge tB())
      > }
      > }[/color]
      [color=blue]
      >
      > Considering the fact I have about 10000 objectA's and 25 objectB's I can
      > have up to 250000 relationC's to check which makes my implementation
      > rather slow.
      >[/color]

      OK, I just got flamed every which way in another thread for suggesting
      this, but I'm stubborn. If a relationC object really is a key object in
      your design, then groups of them shouldn't be gathering together in dumb
      collections. They should belong to typed collection classes that can
      answer reasonable domain-specific questions:

      ArrayList results = list.GetBs(A);

      Now you haven't solved the problem, but at least you've encapsulated the
      search algorithm into a relationCCollec tion class, rather than having
      the search logic spread out all over your code. At that point there's a
      lot of implementation options that should come quite easily. You could
      maintain an index of references to A's relations in a Hashtable, which
      would make lookup very fast. If memory's the issue, you could implement
      a load-on-demand scheme, where the relationCCollec tion pulls down the
      records from the database when they're requested. You could implement a
      lazy-load scheme, where records are pulled down when requested, but are
      then kept in the collection.

      What I'd probably try is lazy loading the records, then keeping them in
      collections of WeakReferences, and then indexing the keys that were
      important to me. But it depends on what exactly you're trying to
      achieve. There's always a natural trade-off between memory and
      performance, so where the balance should be depends on your application.

      Basically, though, the trade-off comes down to this. If you want fast
      lookup based on a key, you want a hash. If you want fast lookup based
      on different keys, you need multiple hashes. If you don't have the
      memory for multiple hashes, you need to offload the calculation (e.g.,
      to the database).


      Comment

      • Nick Malik

        #4
        Re: Design/implementation question

        Hi David,

        This time, I understood what you are trying to say. (better coffee, more
        sleep... :-)
        I agree with what you said. You've clearly been thinking about this kind of
        encapsulation for a while.

        To Yves: you have a historical concept that is reminiscent of a data
        warehouse: on a particular date, these two items were joined. Unfortunately
        for you, data warehouses are large animals and require specialized
        algorithms for searching and producing rolll-up calculations. You don't
        make it clear what you are actually doing with this data, or even if the
        employee-employer statement is just an example. However, David is right to
        suggest that you should not try to mirror the logic of a database engine in
        your middle layer. I hate to ask this, and I mean no disrespect, but are
        you aware of SQL?

        You are describing a simple many-to-many relation in a relational database.
        Table A: Employees
        Employee Id, Employee Name, etc

        Table B: Companies
        Company Id, Company Name, etc

        Table C: EmployeeCompany
        SomeUniqueKey -- GUID
        EmployeeId
        CompanyId
        StartDate
        EndDate

        To find which Companies are connected to an Employee: (in Transact SQL)

        Select Company.Company Id, Company.Company Name, EmployeeCompany .StartDate,
        EmployeeCompany .EndDate
        from Company inner join EmployeeCompany on Company.Company Id =
        EmployeeCompany .CompanyId
        where EmployeeCompany .EmployeeId = @paramEmployeeI d

        Why not just program your "relation" object to perform the above query in
        the constructor where you pass in an employee id (which is passed to TSQL in
        @paramEmployeeI d), place the results into your collection, and work from
        there...

        The only deviation I'd suggest from David's excellent post would be this:
        cache the data if it is not likely to change and you are likely to need it
        again. Otherwise, destroy the resultset the moment you've answered the
        original question.

        I hope this helps,
        --- Nick

        "David" <dfoster@woofix .local.dom> wrote in message
        news:slrnchtab4 .ubf.dfoster@wo ofix.local.dom. ..[color=blue]
        > On 2004-08-14, Yves Dhondt <no@privacy.net > wrote:[color=green]
        > > Hello,
        > >
        > > I've got the following UML design :[/color]
        >
        > Sorry, but you ask design questions, you get long-winded answers...
        >[color=green]
        > >
        > > C
        > > |
        > > A _____|______ B
        > >
        > > So 2 objects A and B are connected through a relation C. (For example an
        > > employment scheme : person A1 worked for company B1 at time C1, person
        > > A1 worked for company B2 at time C2, person A2 worked for company B1 at
        > > time C3, ...)[/color]
        >
        > From a design standpoint, that just seems wrong to me. In the real
        > world, companies have employees and people have jobs. It's hard to be
        > definitive here without knowing what app you're writing, but it doesn't
        > seem natural to me to think in terms of time spans having
        > company-employee relationships.
        >
        > That's how relational database think, of course, but relational
        > databases are pretty much antithetical to OOP design, and it's a mistake
        > to have the middle-tier mirror the relational design IMHO (I realize
        > this area of OOP design isn't really settled, of course, but that's my
        > opinion).
        >[color=green]
        > >
        > > My current implementation exists of :
        > >
        > > public class objectA { ... }
        > > public class objectB { ... }
        > >
        > > public class relationC {
        > > private objectA A;
        > > private objectB B;
        > >
        > > // other things about the relation
        > > }
        > >
        > > I use a list with all relationCs in it. This takes not much memory but
        > > if I want to check which objectB's are connected to an objectA or vice
        > > versa, I need to check every relationC.
        > >
        > > ArrayList result = new ArrayList();
        > >
        > > foreach (relationC c in list)
        > > {
        > > if (c.getA().equal s(A))
        > > {
        > > result.Add(c.ge tB())
        > > }
        > > }[/color]
        >[color=green]
        > >
        > > Considering the fact I have about 10000 objectA's and 25 objectB's I can
        > > have up to 250000 relationC's to check which makes my implementation
        > > rather slow.
        > >[/color]
        >
        > OK, I just got flamed every which way in another thread for suggesting
        > this, but I'm stubborn. If a relationC object really is a key object in
        > your design, then groups of them shouldn't be gathering together in dumb
        > collections. They should belong to typed collection classes that can
        > answer reasonable domain-specific questions:
        >
        > ArrayList results = list.GetBs(A);
        >
        > Now you haven't solved the problem, but at least you've encapsulated the
        > search algorithm into a relationCCollec tion class, rather than having
        > the search logic spread out all over your code. At that point there's a
        > lot of implementation options that should come quite easily. You could
        > maintain an index of references to A's relations in a Hashtable, which
        > would make lookup very fast. If memory's the issue, you could implement
        > a load-on-demand scheme, where the relationCCollec tion pulls down the
        > records from the database when they're requested. You could implement a
        > lazy-load scheme, where records are pulled down when requested, but are
        > then kept in the collection.
        >
        > What I'd probably try is lazy loading the records, then keeping them in
        > collections of WeakReferences, and then indexing the keys that were
        > important to me. But it depends on what exactly you're trying to
        > achieve. There's always a natural trade-off between memory and
        > performance, so where the balance should be depends on your application.
        >
        > Basically, though, the trade-off comes down to this. If you want fast
        > lookup based on a key, you want a hash. If you want fast lookup based
        > on different keys, you need multiple hashes. If you don't have the
        > memory for multiple hashes, you need to offload the calculation (e.g.,
        > to the database).
        >
        >[/color]


        Comment

        Working...