How to find anomalous usage

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • rdemyan via AccessMonster.com

    How to find anomalous usage

    My app contains utility meter usage. One of the things we have to deal with
    is when a usage is clearly incorrect. Perhaps someone wrote the meter
    reading down incorrectly or made a factor of 10 error when entering the
    reading, etc. At other times the usage is zero or somehow was entered as a
    negative number.

    So I'm thinking about adding functionality to search for such anomalies. For
    instance, show months where the meter reading is 25% higher than the average
    for the prior 12 months. Or show months for a particular meter where there
    is a difference of 20% between adjacent monthly usage. Here's a data example

    Meter 5678

    Jan-06 100
    Feb-06 105
    Mar-06 75
    Apr-06 90
    May-06 101
    Jun-06 900
    Jul-06 89


    So you can see from this data that 900 is clearly incorrect and probably
    should be 90. The 75 usage in Mar-06 would show up on a search where there
    is a difference between adjacent months of 25% or more. We'll probably also
    code the functionality to search for zero usage and negative usage.

    Bear in mind that we have several thousand meters and around a 100,000
    monthly meter usages spanning several years.

    I'm looking for an approach to implement this functionality. Searching row
    by row through the tables would probably take a very long time. Is there a
    clever way to handle this through SQL alone or mostly through SQL? Or does
    anyone have any other suggestions? It would seem that this could be a very
    slow process.


    Thanks.

    --
    Message posted via AccessMonster.c om


  • Tom van Stiphout

    #2
    Re: How to find anomalous usage

    On Wed, 04 Oct 2006 14:23:13 GMT, "rdemyan via AccessMonster.c om"
    <u6836@uwewrote :

    I would compare the readings against a scaled version of the common
    trend. The trend would be an average over all meters, showing for
    example that the usage in winter months is higher than in summer
    months. The scaling is to account for a larger home putting up higher
    numbers than a smaller one.

    I would not worry about speed until it's proven to be an issue.

    -Tom.

    >My app contains utility meter usage. One of the things we have to deal with
    >is when a usage is clearly incorrect. Perhaps someone wrote the meter
    >reading down incorrectly or made a factor of 10 error when entering the
    >reading, etc. At other times the usage is zero or somehow was entered as a
    >negative number.
    >
    >So I'm thinking about adding functionality to search for such anomalies. For
    >instance, show months where the meter reading is 25% higher than the average
    >for the prior 12 months. Or show months for a particular meter where there
    >is a difference of 20% between adjacent monthly usage. Here's a data example
    >
    >Meter 5678
    >
    >Jan-06 100
    >Feb-06 105
    >Mar-06 75
    >Apr-06 90
    >May-06 101
    >Jun-06 900
    >Jul-06 89
    >
    >
    >So you can see from this data that 900 is clearly incorrect and probably
    >should be 90. The 75 usage in Mar-06 would show up on a search where there
    >is a difference between adjacent months of 25% or more. We'll probably also
    >code the functionality to search for zero usage and negative usage.
    >
    >Bear in mind that we have several thousand meters and around a 100,000
    >monthly meter usages spanning several years.
    >
    >I'm looking for an approach to implement this functionality. Searching row
    >by row through the tables would probably take a very long time. Is there a
    >clever way to handle this through SQL alone or mostly through SQL? Or does
    >anyone have any other suggestions? It would seem that this could be a very
    >slow process.
    >
    >
    >Thanks.

    Comment

    • Lyle Fairfield

      #3
      Re: How to find anomalous usage

      "rdemyan via AccessMonster.c om" <u6836@uwewro te in
      news:6743ad47a3 f7b@uwe:
      My app contains utility meter usage. One of the things we have to
      deal with is when a usage is clearly incorrect. Perhaps someone wrote
      the meter reading down incorrectly or made a factor of 10 error when
      entering the reading, etc. At other times the usage is zero or
      somehow was entered as a negative number.
      >
      So I'm thinking about adding functionality to search for such
      anomalies. For instance, show months where the meter reading is 25%
      higher than the average for the prior 12 months. Or show months for
      a particular meter where there is a difference of 20% between adjacent
      monthly usage. Here's a data example
      >
      Meter 5678
      >
      Jan-06 100
      Feb-06 105
      Mar-06 75
      Apr-06 90
      May-06 101
      Jun-06 900
      Jul-06 89
      >
      >
      So you can see from this data that 900 is clearly incorrect and
      probably should be 90. The 75 usage in Mar-06 would show up on a
      search where there is a difference between adjacent months of 25% or
      more. We'll probably also code the functionality to search for zero
      usage and negative usage.
      >
      Bear in mind that we have several thousand meters and around a 100,000
      monthly meter usages spanning several years.
      >
      I'm looking for an approach to implement this functionality.
      Searching row by row through the tables would probably take a very
      long time. Is there a clever way to handle this through SQL alone or
      mostly through SQL? Or does anyone have any other suggestions? It
      would seem that this could be a very slow process.
      >
      >
      Thanks.
      OTTOMH

      SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
      LEFT JOIN
      [SELECT Avg(Meter.Readi ng) AS Average, StDev(Meter.Rea ding) AS StDev
      FROM Meter]. sq
      ON m.Reading*1000 <sq.Average
      WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
      ORDER BY (m.Reading-sq.Average)/sq.StDev

      You, of course, would have to modify this for your own situation. I have
      suggested that a Score >= 2 would be suspect but your own experience
      would be the best guide here.

      No, I don't really expect that you will be able to use this, but hope
      springs eternal.

      --
      Lyle Fairfield

      Comment

      • rdemyan via AccessMonster.com

        #4
        Re: How to find anomalous usage

        Interesting, Lyle. I'll see what I can do with this and report back. You
        show 2 but that can be easily changed by the user on the form (however, I'll
        have to think about what that really means in terms us mere mortals can
        understand).

        One definate thing I will want to add is the ability to select a specific
        time frame.

        Lyle Fairfield wrote:
        >My app contains utility meter usage. One of the things we have to
        >deal with is when a usage is clearly incorrect. Perhaps someone wrote
        >[quoted text clipped - 34 lines]
        >>
        >Thanks.
        >
        >OTTOMH
        >
        >SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
        >LEFT JOIN
        >[SELECT Avg(Meter.Readi ng) AS Average, StDev(Meter.Rea ding) AS StDev
        >FROM Meter]. sq
        >ON m.Reading*1000 <sq.Average
        >WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
        >ORDER BY (m.Reading-sq.Average)/sq.StDev
        >
        >You, of course, would have to modify this for your own situation. I have
        >suggested that a Score >= 2 would be suspect but your own experience
        >would be the best guide here.
        >
        >No, I don't really expect that you will be able to use this, but hope
        >springs eternal.
        >
        --
        Message posted via AccessMonster.c om


        Comment

        • rdemyan via AccessMonster.com

          #5
          Re: How to find anomalous usage

          Tom:

          It's a good point about summer and winter months. This is also a function of
          geographic area. In Seattle, electricity usage is fairly constant throughout
          the year (no summer air conditioning). Water is also fairly constant (not
          much irrigation needed in the Pacific Northwest). Heating, though will vary
          substnatially.

          In Hawaii, cooling occurs year round but will vary with the cooling degree
          days. No heating degree days there so heating is not an issue.

          In writing this, I realize that I may want to incorporate weather data in
          determining what an "anomaly" is for those utilities that show variance due
          to weather in the particular geographical area. My data tables contain all
          the necessary weather data so this should be doable.

          Tom van Stiphout wrote:
          >I would compare the readings against a scaled version of the common
          >trend. The trend would be an average over all meters, showing for
          >example that the usage in winter months is higher than in summer
          >months. The scaling is to account for a larger home putting up higher
          >numbers than a smaller one.
          >
          >I would not worry about speed until it's proven to be an issue.
          >
          >-Tom.
          >
          >>My app contains utility meter usage. One of the things we have to deal with
          >>is when a usage is clearly incorrect. Perhaps someone wrote the meter
          >[quoted text clipped - 32 lines]
          >>
          >>Thanks.
          --
          Message posted via AccessMonster.c om


          Comment

          • Tom van Stiphout

            #6
            Re: How to find anomalous usage

            On Wed, 04 Oct 2006 15:43:31 GMT, Lyle Fairfield
            <lylefairfield@ aim.comwrote:

            I'll have to study this some more. The way I'm calculating z-scores
            for a project is quite a bit more involved.

            I think Abs(Score) >= 2 is worth another look.

            -Tom.

            <clip>
            >
            >OTTOMH
            >
            >SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
            >LEFT JOIN
            >[SELECT Avg(Meter.Readi ng) AS Average, StDev(Meter.Rea ding) AS StDev
            >FROM Meter]. sq
            >ON m.Reading*1000 <sq.Average
            >WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
            >ORDER BY (m.Reading-sq.Average)/sq.StDev
            >
            >You, of course, would have to modify this for your own situation. I have
            >suggested that a Score >= 2 would be suspect but your own experience
            >would be the best guide here.
            >
            >No, I don't really expect that you will be able to use this, but hope
            >springs eternal.

            Comment

            • Lyle Fairfield

              #7
              Re: How to find anomalous usage

              Tom van Stiphout <no.spam.tom774 4@cox.netwrote in
              news:vfo8i2576u 69ilv65og04c01q n4f906bfl@4ax.c om:
              I think Abs(Score) >= 2 is worth another look.
              >
              -Tom.
              I think you are right. Abs() is a good idea.

              --
              Lyle Fairfield

              Comment

              • rdemyan via AccessMonster.com

                #8
                Re: How to find anomalous usage

                Lyle:

                I got the following to produce results, but I need to test it further.

                SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
                FROM [MONTHLY_METER_U SAGE] AS m LEFT JOIN (SELECT Avg(USAGE) AS Average,
                StDev(USAGE) AS StDev
                FROM [MONTHLY_METER_U SAGE]
                WHERE METER_ID = '000001'
                AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#) AS sq
                ON m.USAGE <sq.Average
                WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
                AND m.METER_ID = '000001'
                AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
                ORDER BY (m.USAGE-sq.Average)/sq.StDev;




                Lyle Fairfield wrote:
                >I think Abs(Score) >= 2 is worth another look.
                >>
                >-Tom.
                >
                >I think you are right. Abs() is a good idea.
                >
                --
                Message posted via AccessMonster.c om


                Comment

                • Lyle Fairfield

                  #9
                  Re: How to find anomalous usage

                  rdemyan via AccessMonster.c om wrote:
                  Lyle:
                  >
                  I got the following to produce results, but I need to test it further.
                  >
                  SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
                  FROM [MONTHLY_METER_U SAGE] AS m LEFT JOIN (SELECT Avg(USAGE) AS Average,
                  StDev(USAGE) AS StDev
                  FROM [MONTHLY_METER_U SAGE]
                  WHERE METER_ID = '000001'
                  AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#) AS sq
                  ON m.USAGE <sq.Average
                  WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
                  AND m.METER_ID = '000001'
                  AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
                  ORDER BY (m.USAGE-sq.Average)/sq.StDev;
                  If you use Tom's revision:
                  WHERE Abs(((m.USAGE-sq.Average)/sq.StDev))>=2
                  you will identify scores that are unusually low as well as scores that
                  are unusually high,

                  Comment

                  • rdemyan via AccessMonster.com

                    #10
                    Re: How to find anomalous usage

                    Got this to work nicely but had to add a Having clause because there is the
                    possibility that StDev can be zero and dividing by zero, of course, leads to
                    an error.

                    Lyle Fairfield wrote:
                    >Lyle:
                    >>
                    >[quoted text clipped - 11 lines]
                    >AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
                    >ORDER BY (m.USAGE-sq.Average)/sq.StDev;
                    >
                    >If you use Tom's revision:
                    >WHERE Abs(((m.USAGE-sq.Average)/sq.StDev))>=2
                    >you will identify scores that are unusually low as well as scores that
                    >are unusually high,
                    --
                    Message posted via http://www.accessmonster.com

                    Comment

                    • Lyle Fairfield

                      #11
                      Re: How to find anomalous usage


                      rdemyan via AccessMonster.c om wrote:
                      Got this to work nicely but had to add a Having clause because there is the
                      possibility that StDev can be zero and dividing by zero, of course, leads to
                      an error.
                      Good point!

                      Comment

                      Working...