Multiple Tables or just one big one?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • TheServant
    Recognized Expert Top Contributor
    • Feb 2008
    • 1168

    Multiple Tables or just one big one?

    Hey guys,
    I am making an online game and one of the issues I have is this: Obviously I have a lot of variables per user, however probably half are needed on most pages, and then the rest are only needed on select pages. On the load of every page where any variables are needed I am connecting to the database and getting the latest results to display for the user.

    When I connect it is connecting to a big table and not using most variables there very often. So my question is, would it be better to have a few smaller tables and connect to several small ones per page, or leave it as one big one?

    Or to put it another way, are more resources used connecting to a bigger table, or are more resources used connecting to multiple tables?
  • amitpatel66
    Recognized Expert Top Contributor
    • Mar 2007
    • 2358

    #2
    The most important thing you need to take under consideration is the Normalization. Your tables needs to be in Normalised form so that there is no data redunduncy and better performance.

    Check Normalization and its Structure
    Last edited by Atli; May 8 '08, 04:00 AM. Reason: Link fixed

    Comment

    • chaarmann
      Recognized Expert Contributor
      • Nov 2007
      • 785

      #3
      Originally posted by amitpatel66
      The most important thing you need to take under consideration is the Normalization. Your tables needs to be in Normalised form so that there is no data redunduncy and better performance.

      Check Normalization and its Structure
      The link doesn't work.

      Comment

      • chaarmann
        Recognized Expert Contributor
        • Nov 2007
        • 785

        #4
        basically it's faster to fetch all data from one row of a single table than parts of data from many tables. I mean parts of data fetched either by join or by multiple queries. On the other side, updating would be much slower if you need to update data frequently and you need to update only some values.

        The best way is to keep these frequently changing values inside a small, memory-mapped table, whereas you store the more constant and less used data in a big table.

        Data that does not change during the game can be read only once during startup and kept into the cache of your program server.

        If you have performance problems with frequently changing data, then you can use breakpoints, that means all changes are done inside the memory of your program server, and every 5 minutes or so it writes the data to the database. So if the game crashes, you can restore from this point. I mean it's a game and not a financial application where loss of data needs to be avoided at all.

        Comment

        • r035198x
          MVP
          • Sep 2006
          • 13225

          #5
          Originally posted by chaarmann
          The link doesn't work.
          Perhaps Amit meant this one.

          Comment

          • amitpatel66
            Recognized Expert Top Contributor
            • Mar 2007
            • 2358

            #6
            Originally posted by chaarmann
            The link doesn't work.
            Oh Apologies. Thanks r035198x!!

            Comment

            • TheServant
              Recognized Expert Top Contributor
              • Feb 2008
              • 1168

              #7
              Thanks for your replies. I am still kinda confused though? I do have one table which is only called once and stored in a session, as it will never change (except for very special reasons). I also have data required every page, but this will change often, so I will be connecting to the database often.

              I have thought about having it so that it only connects once, and variables remain in sessions, but if the user closes the browser all the changes he has made will not be recorded. So I do need to connect everytime a variable is changed.

              Then I also have some data that is required on specific pages (let's say 3 tables). So my options are:
              1. Current Set Up:
                • Constant Data Table (called once => Sessions)
                • Variable Data Table (called every page)
              2. Proposed Set Up:
                • Constant Data Table (called once => Sessions)
                • Variable Data Table A (called every page)
                • Variable Data Table B (called on specific pages)
                • Variable Data Table C (called on specific pages)
                • Variable Data Table D (called on specific pages)


              In the second case, there will be times when all tables are called, but then others when only one is called. So again: is it more resource demanding to call one big table, or several smaller tables (which size-wise add up to one big table)?

              Comment

              • TheServant
                Recognized Expert Top Contributor
                • Feb 2008
                • 1168

                #8
                Nobody know? Just let me know if you don't so I can ask somewhere else.

                Comment

                • chaarmann
                  Recognized Expert Contributor
                  • Nov 2007
                  • 785

                  #9
                  Originally posted by TheServant
                  Nobody know? Just let me know if you don't so I can ask somewhere else.
                  Maybe you should ask a better question, or nobody can answer you.
                  I can't see the differences between using "variable data table" and "variable data table" version A, B, C, D described. To decide whether the first or second approach is better, we need more information. Especially how frequent the data is updated inside each table and what data you will store.

                  I don't feel that you really apply my answer, that I gave before, to your problem. You ask a general question and I gave a general answer. So if you want a better answer, it can only be done if you give more details.

                  Let me elaborate:

                  Originally posted by chaarmann
                  basically it's faster to fetch all data from one row of a single table than parts of data from many tables. I mean parts of data fetched either by join or by multiple queries. On the other side, updating would be much slower if you need to update data frequently and you need to update only some values.
                  So what's your setup here? Do you need to update only some data? How frequently do you need to update? Minutes or Seconds or milliseconds?
                  Do you need to normalize a lot which forces you to have many small tables? Do you have a lot of joins between tables?

                  Originally posted by chaarmann
                  The best way is to keep these frequently changing values inside a small, memory-mapped table, whereas you store the more constant and less used data in a big table.
                  You should group your data in "frequently used", "less frequently used" and "seldom used"
                  How big is your frequently changing data? less than 100 bytes? some Kilobyes?
                  And what about the others?
                  I mean a position of a spaceship for example, which can be steered by the user changes every second, is data you should keep into a small memory-mapped table and not put it together with other data (like the name of the spaceship that the user gave at game startup and which does not change during the game)

                  Originally posted by chaarmann
                  Data that does not change during the game can be read only once during startup and kept into the cache of your program server.
                  I am sure you have parameters which are game-relevant only. That means the same for each user. Like the name of the game for example. Don't store them inside a session which is user-related. Just store them inside the cache of your program server. That means: if you use Java, you should read them only once when you startup your game server (not client!) and then store them in a constant place.That means if you use Java, store them on class level and not on object level. So you load them only once and many servlets can access the same data without reloading the data. If you have multiple servers, you can store them inside a EJB (or CORBA object). So my question is:
                  What language do you use for programming the game?
                  How many server do you use for your game?
                  Do these severs need to exchange information?
                  Can the same data be accessed from different servers at the same time?

                  Originally posted by chaarmann

                  If you have performance problems with frequently changing data, then you can use breakpoints, that means all changes are done inside the memory of your program server, and every 5 minutes or so it writes the data to the database. So if the game crashes, you can restore from this point. I mean it's a game and not a financial application where loss of data needs to be avoided at all.
                  So what is your need here? store data changes every millisecond ? That would be not possible without creating huge traffic to your server. You probably have some actions (like turn-based systems have). And you submit these commands at a bunch, let's say every 5 seconds. So if something goes wrong (connection loss, or user closes the browser window), you have all the needed data still stored on your server inside the memory. So if the user reconnects, you can send him the latest view of the data, up to the last 5 seconds. No database access needed.
                  But my question is here: if the server itself crashes (not the client), do you really need to restore up to the last 5 seconds? I mean how often will the server crash? maybe once in a year? So if you store all these data let's say every 5 minutes, you are doing a good job in restoring. Other systems maybe would only once a day or so. If other game server crash, there is no restore at all. You just lost your game. And it's not important, because it wasn't a financial transaction.
                  So my question is here: do you have financial transactions? Like casino plays or games where you can win a huge amount of money? If yes, then that should be the highest priority of your design where you need to build all other things around that, and this will change your design drastically. Your question is not "speed" and "performanc e" anymore, but "security" and "legal issues of data storage" and "data loss".

                  So you see a lot of "if" and "when" statements and guessings are here. Without giving further details, you will never get the desired answer.

                  Comment

                  • TheServant
                    Recognized Expert Top Contributor
                    • Feb 2008
                    • 1168

                    #10
                    Thanks for your very thorough reply. I would like to apologise if I sounded ungrateful or rude, it was not at all my intention. I have only being doing this from January and I hold all Experts, Mods and contributers here in the highest regard.

                    First of all, the game is not going to be *that* server intensive. The database will only be updated when the user submits a form to the server. So there will be cron jobs running every 10 minutes, but only updating about 5 or so variables. The user(s) information is not updated periodically but just called on every page.

                    So I guess a further question is updating x variables in a small table basically (obviously not exactly, but basically) the same as updating x variables in a big table?

                    The tables are not joined, but all use the user_name or user_id as the primary key, so that they can be called in a similar way.

                    Each form will only update a certain number of variables, so say a maximum of 10 variables. For example, if a player trains some soldiers the following groups of data (currently all the same table, but I am thinking about separating them):
                    {user resources - money, ore, wood}
                    {user army - swordsmen, archers, cavalry}

                    So the current setup will include the following data groups (slightly simplified):
                    1. {user resources - money, ore, wood, food, stone}
                    2. {user army - swordsmen, archers, cavalry}
                    3. {user weapons - level 1 weapon, level 2 weapon, level 3 weapon}
                    4. {user alliance information}
                    5. {user attack log and history}
                    6. {user messages}

                    Can't think of anymore examples but that will do. Now if I purchase weapons, it will only change {user resources} and {user weapons} but the page will require {user resources}, {user army} and {user weapons}. At the moment, it calls the big table (only opens one table) will all the user information, but I am wondering if it more efficient to separate the data into 6 smaller tables and so for this case, only 3 tables will be called (half the original size).

                    So my concern is that opening several smaller tables is more resource intensive than opening one big one? I hope you understand.

                    I use PHP as my language.
                    I have one server and so no information exchanges occur.
                    All information I am referring to is user-specific, the game-specific ones are either in the page or called once during login so should not impact anything asked here.

                    Again, I sincerely thank you for your time, and I look forward to hearing back from you.

                    Comment

                    • chaarmann
                      Recognized Expert Contributor
                      • Nov 2007
                      • 785

                      #11
                      Originally posted by TheServant
                      First of all, the game is not going to be *that* server intensive. The database will only be updated when the user submits a form to the server. So there will be cron jobs running every 10 minutes, but only updating about 5 or so variables. The user(s) information is not updated periodically but just called on every page.
                      If you update variables with cron-jobs, then you must somehow notify all sessions that use these tables to clear their cache and reload these tables.
                      You can achieve this in different ways. I will give you here a maybe not very elegant, but simple way:
                      You make a table with one row that stores only a version number, let's say "3". To be fast, you should use prepared statements and use a mySQL table engine that is memory-mapped and has no rollback. like myISAM or MEMORY. Now your session reads this version number along with all the table data and caches it in internal session variables. Every time a user submits a new form and his data needs to be processed, you will make an SQL first to check for this version number and compare it with the one stored inside the session. If it's the same, do nothing, else reload the tables. The version number is changed only by the cron-jobs. They increase the number by one when they make a change to the tables.

                      Originally posted by TheServant
                      So I guess a further question is updating x variables in a small table basically (obviously not exactly, but basically) the same as updating x variables in a big table?
                      If you update a single table, you have only one SQL to send over the network (or internal port if it's on the same machine). Also the SQL needs to be parsed only once. The result data is grabbed from the harddisk in one chunk.
                      But if you have multiple small tables instead, it will be slower because you need to send over an SQL for every table (or make a complicated join-query instead), you need to parse all SQLs and your haddisk must move its head to different location to grab all the needed data, which can be thousand times slower.
                      So reading/updating in a big table is faster than from many small tables. On the other side, if all the table data can be hold inside the memory of the database, then it's much faster but still not as fast as a big table, because of the parsing/sending overhead of the queries.
                      So if performance matters, you should use a big table and regularly defragment it. But if perfromance does not matter, it's better to hold all data in small tables, so changes and extensions can be done easier. And you have a better overview in case you need to debug or search manually, to export partial data etc.

                      Originally posted by TheServant
                      The tables are not joined, but all use the user_name or user_id as the primary key, so that they can be called in a similar way.
                      if you use user_id as primary key in one table which matches the user_id in another table, that's exactly called joined data. So they are joined.
                      if you want to fetch the data, you must execute either:
                      Code:
                      select * from table1, table2 where table1.user_id = ? and table2.user_id=?
                      or call both statements:
                      [HTML]select * from table1 where user_id = ?;
                      select * from table2 where user_id = ?; [/HTML]
                      whereas a big table would let you get all dat with a single statement:
                      [HTML]select * from big_table where user_id = ? [/HTML]
                      that's a good example for the explanation in the chapter above. The assumption is that you need data from both tables.
                      But if you need only data from one table, then calling only
                      Code:
                      select soldier_name, rank from table1 where user_id = ?
                      is slightly (and maybe not noticable) faster than calling
                      Code:
                      select soldier_name, rank from big_table where user_id = ?
                      .

                      Originally posted by TheServant
                      Each form will only update a certain number of variables, so say a maximum of 10 variables. For example, if a player trains some soldiers the following groups of data (currently all the same table, but I am thinking about separating them):
                      {user resources - money, ore, wood}
                      {user army - swordsmen, archers, cavalry}
                      So the current setup will include the following data groups (slightly simplified):
                      1. {user resources - money, ore, wood, food, stone}
                      2. {user army - swordsmen, archers, cavalry}
                      3. {user weapons - level 1 weapon, level 2 weapon, level 3 weapon}
                      4. {user alliance information}
                      5. {user attack log and history}
                      6. {user messages}

                      Can't think of anymore examples but that will do. Now if I purchase weapons, it will only change {user resources} and {user weapons} but the page will require {user resources}, {user army} and {user weapons}. At the moment, it calls the big table (only opens one table) will all the user information, but I am wondering if it more efficient to separate the data into 6 smaller tables and so for this case, only 3 tables will be called (half the original size).
                      The proposed structure how you divide the data is very good.
                      I hope if you call the big table, you only return the needed fields (I mean not using "select *"). If you do otherwise, then you would return unneeded data which is costly.
                      Also do a proper normalization, which forces you to hold data in separate tables.
                      Never ever store the same data twice! Only store it in one place and reference to it everywhere. For example a user can send many chat messages and one name. Then avoid solutions a) and b), but do solution c):

                      Solution a)
                      one big table, stores user_name, chat_message:
                      "The Servant", "hello how are You?"
                      "The Servant", "I am fine, too !"
                      "The Servant", "I don't think so !"

                      Solution b)
                      one big table, stores user_name, chat_message1, chat_message2, ... chat_message100 :
                      "The Servant", "hello how are You?", "I am fine, too !", "I don't think so !", null, ...

                      Solution c)
                      two tables,
                      c1) user_id, name
                      1, "The Servant"
                      c2) user_id, chat_message
                      1, "hello how are You?"
                      1, "I am fine, too !"
                      1, "I don't think so !"

                      On the other side, avoid splitting up data that should be logically together. For example you need to store the map position of a soldier: Don't do solution a) but do solution b):

                      a) two tables:
                      a1) soldier_id, x_coordinate
                      1, 800
                      a2) soldier_id, y_coordinate
                      1, 600

                      b) one table: soldier_id, x_coordinate, y_coordinate
                      1, 800, 600

                      first, it's faster and second, it's more secure. You probably would define all table columns "not null".
                      But in case a), if a table is corrupt or only half the data got saved because of a server crash it's possible to have the x-coordinate, but not the y-coordinate. This may lead to null-pointer exceptions in your program when you try to access a position on the map with these coordinates.
                      In case of b), all data is lost or all data is there, so no crashes.

                      Originally posted by TheServant
                      So my concern is that opening several smaller tables is more resource intensive than opening one big one? I hope you understand.
                      yes it is. And retrieval of data in case of corruption is harder (checking all the broken joins). But you will lose less data and can retrieve more.
                      If you want an overview of all user's data,the SQL is more complicated, but the overview over partial data is easier.
                      But on the other side, extension of additional data is very easy, and the performace loss may not be that bad or relevant.

                      If you have an index on the user_id (or primary key), that means you don't need to make a full table scan, then calling the data from the big table is faster. But if it doesn't go over the network, only local, the performance gain shrinks.
                      All tables where you need to make a full table scan should be divided into smaller ones. For example you are storing the user chat messages, and you want to know which user uses a bad, dirty word, I mean which can occur everywhere inside the stored user input string, then searching for this word is only possible with a full table scan (assuming you don't use mySqls text search facilities and you just have it stored as a normal string and you seach with "like" or "instr"). So in this case it's better to store it inside a small table, because the database most likely then can hold all the table data inside the memory and doesn't need to access the slow disk, or at least can read it in quick, whole disk buffer chunks. Reading a whole small file is faster than reading a whole big file or jumping with the head around a big file.

                      Originally posted by TheServant
                      I use PHP as my language.
                      I have one server and so no information exchanges occur.
                      All information I am referring to is user-specific, the game-specific ones are either in the page or called once during login so should not impact anything asked here.
                      Again, I sincerely thank you for your time, and I look forward to hearing back from you.
                      sounds good, but what if your game is a big success and one server is not enough anymore? What if your data grows so big that retrieval becomes a performance issue, and you decide to split up the big table into 2 smaller ones, and you store one table on one server, the other table on the second server, and then you inquire both servers in parallel to retrieve the data in double speed as if you would have retrieved from one server only?
                      Then you have to scap your whole design and rewrite all your SQL's. A big work and coding nightmare. If you have many small tables from the beginning, you don't need to do that. You can handle the relocation on database level (remote access, views etc.)
                      My advice:
                      It's much better to split the tables from the beginning on (as what you have proposed, or even more), and if you want to tune the performance of your game, you can put data together later on and only in cases of bottlenecks.

                      Comment

                      • TheServant
                        Recognized Expert Top Contributor
                        • Feb 2008
                        • 1168

                        #12
                        Great! Thanks for you help. You have made things a lot clearer and brought my attention to some ideas which I have never thought of. So what I might do, for now, as this is only the Alpha, so I will probably re-code a lot of it for the Beta (I am always learning more efficient ways of doing things and it's easier to start from the beginning), but I wil have one big table but call data like:
                        Code:
                        SELECT variable_a, variable_b, variable_c, variable_d, variable_e, variable_f 
                        FROM big_table 
                        WHERE user_id = Player_Name
                        This is really what I wanted from the start but I presumed it was accessing all the information anyway (ie same as
                        Code:
                        SELECT * FROM big_table WHERE user_id = Player_Name
                        ) but obviously not.

                        Thanks again for your very thorough help!

                        Comment

                        Working...