Closed Bug 619814 Opened 14 years ago Closed 14 years ago

Set up seamicro nodes for smoke test

Categories

(Socorro :: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: laura, Assigned: rhelmer)

References

Details

Attachments

(2 files, 9 obsolete files)

3.26 KB, patch (rhelmer: review+, dre: review+)
2.98 KB, text/plain
We need ~20 VMs to run Grinder against PHX.
Blocks: 619815
Blocks: 619811
jabba, can you locate and install 20 VMs for this? 

RHEL 5.5 or 6 should be fine, we can start by getting one set up and then figure out how we want to install to the others.
Assignee: robert → jdow
(In reply to comment #1)
> jabba, can you locate and install 20 VMs for this? 
> 
> RHEL 5.5 or 6 should be fine, we can start by getting one set up and then
> figure out how we want to install to the others.

You can toss the bug back to me once they are up and running.
What is the life cycle for these?  I presume they will be used just to test the PHX build-out, is it safe to assume that they will be unnecessary and deleted by, say, February?
(In reply to comment #3)
> What is the life cycle for these?  I presume they will be used just to test the
> PHX build-out, is it safe to assume that they will be unnecessary and deleted
> by, say, February?

Yes, this should be plenty of time.
Would it be okay to do these in EC2 then?  It would be easier (and probably quicker) on our end to do this.
Assignee: jdow → cshields
(In reply to comment #5)
> Would it be okay to do these in EC2 then?  It would be easier (and probably
> quicker) on our end to do this.

I think this would be fine; do we have an account already? 

Pretty sure we have a rackspace/slicehost account at least, I can look into that if not.
Another option we have would be to do this with part of our new seamicro cluster (which is in the same data center).  We could turn up 40+ nodes there pretty easily. The problem with it is that next week we are swapping hardware out on that chassis, so the earliest we could get it would be the end of next week at best.  Would this be acceptable?  I would rather go this route than spin up some new vms  (would be cheaper too)
(In reply to comment #7)
> Another option we have would be to do this with part of our new seamicro
> cluster (which is in the same data center).  We could turn up 40+ nodes there
> pretty easily. The problem with it is that next week we are swapping hardware
> out on that chassis, so the earliest we could get it would be the end of next
> week at best.  Would this be acceptable?  I would rather go this route than
> spin up some new vms  (would be cheaper too)

Our migration checklist currently has the smoke/load test happening 12/20-12/29.

So we may take you up on this, in the meantime I'll investigate the rackspace option.
For the seamicro option, I've put the remote hands request in to swap out the parts (they have them now).  I am not sure if they are able to do this overnight there or if it has to wait until business hours.  Either way the chassis will need to be reconfigured for the smaller storage space we will be left with, assuming all goes well with the remote hands work.

So, I would not count out the possibility of having 20-40 nodes ready by the end of Thursday, but it will be a stretch.  You want these setup with root access for yourself to get in and run the tests?
(In reply to comment #9)
> For the seamicro option, I've put the remote hands request in to swap out the
> parts (they have them now).  I am not sure if they are able to do this
> overnight there or if it has to wait until business hours.  Either way the
> chassis will need to be reconfigured for the smaller storage space we will be
> left with, assuming all goes well with the remote hands work.
> 
> So, I would not count out the possibility of having 20-40 nodes ready by the
> end of Thursday, but it will be a stretch.  You want these setup with root
> access for yourself to get in and run the tests?

WFM. Hopefully will not really need to use root for much though :)

Can you give me any details on exactly what we're getting? Are these VMs or real machines? I'd like to make a note of the specs too, but I can probably figure it out once I have access if that's not handy.
Real machines, just low-power ones.  They are all 32-bit Atom procs, each with 2GB RAM.  We will have RHEL running on them.  So in a sense they can be considered equivalent to a VM in shared processing power, but you get dedicated hardware instead.

The bad news is that we have run into a snag and the chassis is offline after last night's remote work.  Working to get it back online, but this is a miss for the holiday weekend.
This is looking to be less of a possibility as the Seamicro cluster is in a broken state, and we are working with the vendor to bring it back to life.

Let's move forward with getting you some cloud VMs
(per irc) You are set with pp-sea-dist[01-40].phx.mozilla.com.  These are Atom 2GB nodes, RHEL6 32-bit.

Let me know as soon as the testing is done so I can repurpose these to another goal we have.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Summary: Set up VMs for smoke test → Set up seamicro nodes for smoke test
Reusing this bug to track setup of the nodes too.
Assignee: cshields → rhelmer
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file simple load-test setup/run script (obsolete) —
This is a simple loadtest setup/run script, which has two modes:

./socorro-loadtest.sh setup

1) creates socorro user on remote machine
2) downloads a specific version of Socorro to the node (from Hudson)
3) downloads a specific crash set (from wherever the script is run)

./socorro-loadtest.sh runtest

1) runs submitter.py pointing to the crashes directory (each node in the background, with its own log)

The script will catch ctrl-C and abort any running jobs.

It catches errors in setup, but this is harder to do for actual test runs. For now, we'll need to check node${n}.log for errors (which we need to look at for analysis anyway).

One outstanding item is that we need to build a list of crashes, and ideally upload it to some central place. Right now I have a crashes.tar.gz with a single crash in it.
Attachment #502019 - Flags: review?(lars)
Attachment #502019 - Flags: feedback?(jdow)
Attachment #502019 - Flags: review?(laura)
Attachment #502019 - Attachment is patch: false
Comment on attachment 502019 [details]
simple load-test setup/run script

seems reasonable to me
Attachment #502019 - Flags: review?(laura) → review+
Rob: Daniel says the best way to get a set of crashes is as follows:
deinspanjer: Basically, the quickest thing would be to take the hbaseClient.py method export_jsonz_tarball_for_date and build a utility that does something similar but exports both dump and json
deinspanjer: That is straight python hacking.  Once you have a version that works against staging, I can run it against prod.
Attached patch export_dump_for_date (obsolete) — Splinter Review
Allows one to dump both jsonz and dump like so:

python ./socorro/storage/hbaseClient.py export_jsonz_for_date 110101 /tmp/test
python ./socorro/storage/hbaseClient.py export_dump_for_date 110101 /tmp/test

Not sure if we want to keep this code, I just hacked it up quickly based on the export_jsonz_for_date method. Would be nice to have a single method that does both, but this seems to do the right thing.
Attachment #502064 - Flags: review?(deinspanjer)
Attachment #502064 - Flags: feedback?(lars)
Comment on attachment 502064 [details] [diff] [review]
export_dump_for_date

Below are a couple of quick points as an inline review.

I'm also going to see if I can attach my own attempt at pseudo-code for what I'm thinking of.  I don't have a sandbox handy at the moment to compile and test the code, but it should at least be useful for reference.


>diff --git a/socorro/storage/hbaseClient.py b/socorro/storage/hbaseClient.py
>index f858a9a..7d65d90 100644
>--- a/socorro/storage/hbaseClient.py
>+++ b/socorro/storage/hbaseClient.py
>@@ -453,6 +453,27 @@ class HBaseConnectionForCrashReports(HBaseConnection):
>         finally:
>           file_handle.close()
> 

I believe that a single method that exports both json and dump in one pass is much better than two separate ones.  The biggest reason for this is that a two-pass method doesn't allow for random sampling.

>+  def export_dump_for_date(self,date,path):
>+    """
>+    Iterates through all rows for a given date and dumps the raw_data:dump out as a .dump file.
>+    The implementation opens up 16 scanners (one for each leading hex character of the salt)
>+    one at a time and returns all of the rows matching
>+    """
>+

The "10" argument to limited_iteration means that this method would never export more than 10 items.

>+    for row in self.limited_iteration(self.union_scan_with_prefix('crash_reports', date, ['raw_data:dump']),10):
>+      ooid = row_id_to_ooid(row['_rowkey'])
>+      if row['raw_data:dump']:
>+        file_name = os.path.join(path,ooid+'.dump')
>+        file_handle = None
>+        try:
>+          file_handle = open(file_name,'w')
>+        except IOError,x:
>+          raise
>+        try:
>+          file_handle.write(row['raw_data:dump'])
>+        finally:
>+          file_handle.close()
>+
>   def export_jsonz_tarball_for_date(self,date,path,tarball_name):
>     """
>     Iterates through all rows for a given date and dumps the processed_data:json out as a jsonz file.
Attachment #502064 - Flags: review?(deinspanjer) → review-
Attachment #502064 - Flags: feedback?(lars)
Comment on attachment 502086 [details] [diff] [review]
Added method for sampled crash export (pseudocode)

the open for tf should happen outside the try..finally.  If something were to raise an exception before that tarfile.open line, or if that tarfile.open line itself were to fail, a new exception would be raised in the 'finally' statement.  This would mask the original exception.

the variable 'index' is referenced without ever being initialized.

you might be able to reduce the amount of code here with the random.sample method call.  see http://docs.python.org/library/random.html for details
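For reference, a minimal sketch of the pattern being described here (the tarball name is purely illustrative, not the attachment's own code):

import tarfile

tarball_name = 'example.tar.gz'  # illustrative path only

# If tarfile.open() sat inside the try block and failed, 'tf' would never be
# bound, so tf.close() in the finally clause would raise NameError and mask
# the original exception.  Opening before the try avoids that.
tf = tarfile.open(tarball_name, 'w:gz')
try:
    pass  # add members to the tarball here
finally:
    tf.close()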
Attachment #502086 - Flags: review?(lars) → review-
Comment on attachment 502019 [details]
simple load-test setup/run script

rhelmer, in the "simple load-test setup/run script", you're invoking the submitter without telling it what URL to submit to.  It'll default to staging unless you're changing the .../config/submitterconfig.py file's default.  Is this intentional?
(In reply to comment #21)
> Comment on attachment 502086 [details] [diff] [review]
> Added method for sampled crash export (pseudocode)
> 
> the open for tf should happen outside the try..finally.  If something were to
> raise an exception before that tarfile.open line or if that tarfile.open line
> itself were to fail, an new exception would be raised in the 'finally'
> statement.  This would mask the original exception.

Makes sense, I'm not actually sure how to cleanly handle having a try for opening the tf and a separate try for the temp file handles.. hopefully :rhelmer can figure it out. :)

> the variable 'index' is referenced without ever being initialized.

Oops.. that should be records_evaluated counter.
More importantly, I forgot to *increment* the records_evaluated counter.  That should happen at the end of the sampling loop, regardless of whether the record was selected for inclusion or not.  I didn't use a for x in enumerate() because we need the total count of records for all dates in the list rather than a separate counter for each list.  There might be a fancier way to use enumerate or a completely different way I didn't think of.

> you might be able to reduce the amount of code here with the random.sample
> method call.  see http://docs.python.org/library/random.html for details

The description of random.sample() sounds like it *might* be suitable, but I'm not sure if it could be passed an iterable and if it was passed one, whether it would try to keep the entire structure in memory.  I implemented a naive Reservoir sample (based on Wikipedia code example) specifically to avoid requiring all candidates to be in memory.  Didn't see anything in the pydocs that mentioned such an algo.
I just found this URL with the following code snippet:
http://data-analytics-tools.blogspot.com/2009/09/reservoir-sampling-algorithm-in-perl.html

for i,line in enumerate(input):
    if i < N:
        sample.append(line)
    elif i >= N and random.random() < N/float(i+1):
        replace = random.randint(0,len(sample)-1)
        sample[replace] = line

I wonder whether the condition "random.random() < N/float(i+1)" is equivalent to the built-in "random.randrange(records_evaluated)" that I used..
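For comparison, here is a self-contained sketch of the randrange-based reservoir loop (the helper name is illustrative, not the patch itself); assuming i is the 0-based index of the record being considered, both rules give it an N/(i+1) chance of entering the sample:

import random

def reservoir_sample(iterable, n):
    """Keep a uniform random sample of n items from an iterable of unknown length."""
    sample = []
    for i, item in enumerate(iterable):
        if i < n:
            sample.append(item)
        else:
            # randrange(i + 1) returns 0..i; the item is kept with
            # probability n/(i + 1) and replaces a uniformly chosen slot.
            j = random.randrange(i + 1)
            if j < n:
                sample[j] = item
    return sample

print reservoir_sample(xrange(1000), 5)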
Attached patch get pseudocode running (obsolete) — Splinter Review
This just gets everything running, still testing. I have not taken the other comments in here into account yet.
Attachment #502086 - Attachment is obsolete: true
Attachment #502099 - Flags: review?(deinspanjer)
Attachment #502086 - Flags: review?(rhelmer)
re: random.sample, a quote from the doc:

"To choose a sample from a range of integers, use an xrange() object as an argument. This is especially fast and space efficient for sampling from a large population: sample(xrange(10000000), 60)."

xrange is a generator that does not have its whole population in memory. In order to be "space efficient" it would have to implement an algorithm similar to the one in wikipedia.
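A tiny illustration of that doc note (values are illustrative). The caveat is that random.sample() needs a population with a known length, so it can't be pointed directly at the scanner generator; that is what makes the reservoir pass attractive here:

import random

# xrange supports len() and indexing without materializing its elements,
# so this picks 60 distinct indices from a 10M-element range cheaply.
print random.sample(xrange(10000000), 60)[:5]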
Comment on attachment 502099 [details] [diff] [review]
get pseudocode running

These edits look like they fix the glaring flaws Lars pointed out.
At this point, the index = 0 is superfluous.  I believe that tarfile.open can throw an exception, which it might be cleaner to catch, although it wouldn't cause any harm since this method is designed to be run from the command line anyway.
Do the unlinks need to be more robust to clean up in the event of failure? It looks like no more than two files could ever be left behind in the case of an error, so it doesn't seem critical.
(In reply to comment #26)
> re: random.sample, a quote from the doc:
> 
> "To choose a sample from a range of integers, use an xrange() object as an
> argument. This is especially fast and space efficient for sampling from a large
> population: sample(xrange(10000000), 60)."
> 
> xrange is a generator that does not have its whole population in memory. In
> order to be "space efficient" it would have to implement an algorithm similar
> to the one in wikipedia.

excellent, sounds like that is a naive implementation of reservoir that would work as substitute then.
(In reply to comment #27)
> Comment on attachment 502099 [details] [diff] [review]
> get pseudocode running
> 
> These edits look like they fix the glaring flaws Lars pointed out.
> At this point, the index = 0 is superfluous.  I believe that tarfile.open can
> throw an exception which it might be clean to catch although it wouldn't cause
> any harm since this method is designed to be run from the commandline anyway.
> Do the unlinks need to be more robust to clean up in the event of failure? It
> looks like never more than two files could be left behind in the case of an
> error, so doesn't seem critical.

I would not mind having them left around for investigation in the event of a failure. I have fixed the other two issues, will attach after re-testing.
Attached patch get pseudocode running (obsolete) — Splinter Review
This passes simple testing against staging, and addresses the two issues dre brought up (open tar in the loop, remove vestigial index).
Attachment #502099 - Attachment is obsolete: true
Attachment #502099 - Flags: review?(deinspanjer)
Attachment #502064 - Attachment is obsolete: true
(In reply to comment #30)
> brought up (open tar in the loop, remove vestigial index).

Meant "inside the try", not in the loop (it's only opened once, before the loop starts).
(In reply to comment #22)
> Comment on attachment 502019 [details]
> simple load-test setup/run script
> 
> rhelmer, in the "simple load-test setup/run script", you're invoking the
> submitter without telling what URL to submit to.  It'll default to staging
> unless you're changing the .../config/submitterconfig.py file's default.  Is
> this intentional?

Good point. We probably want to change this, either hardcoded in this script or as a variable at the top to make it more discoverable.
Attachment #502106 - Flags: review?(lars)
Attachment #502106 - Flags: feedback?(deinspanjer)
Comment on attachment 502106 [details] [diff] [review]
get pseudocode running

Looks good to me from a logic point.
Attachment #502106 - Flags: feedback?(deinspanjer) → feedback+
okay, I've been looking at this code and decided to implement it myself.  I think I made a cleaner implementation of the reservoir algorithm.

Shocked to discover that the tarfile object doesn't support the context manager protocol and the with statement.
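(For what it's worth, on Pythons that do have the with statement, one workaround is contextlib.closing; this is just a sketch with an illustrative filename, not what the patch does:)

import contextlib
import tarfile

# contextlib.closing() supplies the close-on-exit behaviour that older
# tarfile objects lack, so the with style still works.
with contextlib.closing(tarfile.open('example.tar.gz', 'w:gz')) as tf:
    pass  # add members here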
Attachment #502148 - Flags: review?(rhelmer)
Attachment #502148 - Flags: review?(deinspanjer)
Comment on attachment 502148 [details] [diff] [review]
a patch that adds the random raw crash tarball fetching to hbaseClient.py

LGTM, much more readable. Found 3 small issues (one of which you did not introduce, and which I was just looking into when you posted this); I have tested this code with hacks in place for these and it works as intended:

>Index: socorro/storage/hbaseClient.py
>===================================================================
>--- socorro/storage/hbaseClient.py	(revision 2858)
>+++ socorro/storage/hbaseClient.py	(working copy)


Need "import random"


>@@ -736,6 +736,49 @@
>     self.client.mutateRow('crash_reports_index_signature_ooid', sig_ooid_idx_row_key,
>                           [self.mutationClass(column="ids:ooid",value=ooid)])
> 
>+  def export_sampled_crashes_tarball_for_dates(self,sample_size,dates,path,tarball_name):
>+    """
>+    Iterates through all rows for given dates and dumps json and dump for N random crashes.
>+    The implementation opens up 16 scanners (one for each leading hex character of the salt)
>+    one at a time and returns all of the rows randomly selected using a reservoir sampling algorithm
>+    """
>+    sample_size = int(sample_size)
>+    dates = str.split(dates, ',')
>+    def gen():
>+      """generate all rows for given dates"""
>+      for date in dates:
>+        for id_row in self.union_scan_with_prefix('crash_reports', date, ['ids:ooid']):
>+          yield id_row
>+    row_gen = gen() #start the generator
>+    # get initial sample
>+    ooids_to_export = [x for i, x in itertools.izip(range(sample_size), row_gen)]
>+    # cycle through remaining rows
>+    for records_evaluated, id_row in enumerate(row_gen):
>+      # Randomly replace elements with decreasing probability
>+      rand = random.randrange(records_evaluated)


records_evaluated will be 0 first time through the loop, which will throw a ValueError if passed to random.randrange()


>+      if rand < sample_size:
>+        ooids_to_export[rand] = id_row['ids:ooid']
>+
>+    # open output tar file
>+    tf = tarfile.open(tarball_name, 'w:gz')
>+    try:
>+      for ooid in ooids_to_export:
>+        json_file_name = os.path.join(path, ooid+'.json')
>+        json_file_handle = None
>+        dump_file_name = os.path.join(path, ooid+'.dump')
>+        dump_file_handle = None
>+        with open(json_file_name,'w') as json_file_handle:
>+          with open(dump_file_name,'w') as dump_file_handle:
>+            row = self.get_raw_report(ooid)
>+            json.dump(row['meta_data:json'],json_file_handle)


I think we just want to write the string as-is here, otherwise we end up with the whole string quoted, and quotes escaped (because this is a string not a json object).

Or, you could do json.load(row['meta_data:json']) and then json.dump that... this would add some time but would ensure that we're dumping valid JSON. I don't really have an opinion on which we do, dumping the string is what I did in my local copy and I am ok with that.
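To make the two options concrete (the value here is illustrative; meta_data:json comes back as a plain string):

import json

raw = '{"ProductName": "Firefox"}'  # illustrative meta_data:json value

with open('meta.json', 'w') as fh:
    json.dump(raw, fh)              # writes a quoted, escaped JSON *string*

with open('meta.json', 'w') as fh:
    fh.write(raw)                   # writes the string as-is

with open('meta.json', 'w') as fh:
    json.dump(json.loads(raw), fh)  # slower, but guarantees valid JSON output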
Attachment #502148 - Flags: review?(rhelmer) → review-
Sorry, the code certainly looks a lot more pythonic now, but it is also not quite as easy for me to understand so my questions and concerns might be way off track.

It looks like the initial assignment to ooids_to_export is using izip to create a list of sample_size tuples, with the first part being an incrementing integer that matches the index of the list and the second part being the result of the generator (i.e. id_row).  I have three questions about this:
1. Why do you want the first part of the tuple? I don't see how it is used.
2. Wouldn't you want the second half of the tuple to be id_row['ids:ooid'] rather than the row dict itself?
3. Is the counter variable records_evaluated ever initialized with the size of this list or does it start at 0 in the loop below? If it starts at 0 below then it is giving the first N records a slightly lower chance of appearing in the sampled result set.
(In reply to comment #36)
> 2. Wouldn't you want the second half of the tuple to be id_row['ids:ooid']
> rather than the row dict itself?

For larger sample sizes (500 seems to reliably reproduce it), I get:

Traceback (most recent call last):
  File "./socorro/storage/hbaseClient.py", line 941, in <module>
    connection.export_sampled_crashes_tarball_for_dates(*args)
  File "./socorro/storage/hbaseClient.py", line 767, in export_sampled_crashes_tarball_for_dates
    json_file_name = os.path.join(path, ooid+'.json')
TypeError: unsupported operand type(s) for +: 'dict' and 'str'

Still debugging, but I think it's due to Daniel's point #2 above.
I think this resolves the problems I found, as well as the points Daniel brought up, and seems to test ok at a sample size of 500. Let me know if I missed anything.
Attachment #502148 - Attachment is obsolete: true
Attachment #502164 - Flags: review?(lars)
Attachment #502164 - Flags: feedback?(deinspanjer)
Attachment #502148 - Flags: review?(deinspanjer)
I changed three things on this round:

the initial list comprehension now uses xrange instead of range.  For larger values of 'sample_size', this will be more efficient.

refactored the symbol 'records_evaluated' to be simply 'i' because it no longer stands for the total number of records evaluated.  It is now the index of the record being evaluated beyond the initial 'sample_size' number of records.

I changed the random number generation to be 'i + sample_size' in an attempt to restore the statistical integrity and assure an even distribution.  Unfortunately, as I write this, I realize that it should likely have been 'i + sample_size + 1', but I'm too lazy to correct it on a Friday night when I really should be lounging in front of the fire in the yurt.
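A quick, purely illustrative way to see the off-by-one in question (k standing in for sample_size):

import random

k = 5  # stand-in for sample_size
# The first record after the initial reservoir sits at overall 0-based index k.
print all(random.randrange(k) < k for _ in xrange(10000))      # always True: that record always gets kept
print sum(random.randrange(k + 1) < k for _ in xrange(10000))  # roughly 10000 * k / (k + 1)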
Attachment #502164 - Attachment is obsolete: true
Attachment #502175 - Flags: review?(rhelmer)
Attachment #502175 - Flags: review?(deinspanjer)
Attachment #502164 - Flags: review?(lars)
Attachment #502164 - Flags: feedback?(deinspanjer)
Comment on attachment 502175 [details] [diff] [review]
a patch that adds the random raw crash tarball fetching to hbaseClient.py

Actually, I believe i+sample_size is correct because the wikipedia article explicitly says it is a range between 0 and index - 1.

This looks good to me now.
Attachment #502175 - Flags: review?(deinspanjer) → review+
Attachment #502175 - Flags: review?(rhelmer) → review+
Attachment #502106 - Attachment is obsolete: true
Attachment #502106 - Flags: review?(lars)
Comment on attachment 502175 [details] [diff] [review]
a patch that adds the random raw crash tarball fetching to hbaseClient.py

Looks good, seems to work with sample size of 1000. Thanks guys!

I'm going to fire up a test run against staging (just 1 node) and make sure collector accepts them.
something that I didn't notice on that last iteration of the code: what is the purpose of setting the two file handles to None just before the with statements?  To my eye, it seems unnecessary.  The with statements themselves will introduce the names into the local scope.
* make COLLECTOR_URL configurable
* add cleanup function
* do less as root
* log stdout/stderr separately
* handle crash.tar.gz as generated by hbaseClient.export_sampled_crashes_tarball_for_dates
Attachment #502019 - Attachment is obsolete: true
Attachment #502200 - Flags: review?(lars)
Attachment #502019 - Flags: review?(lars)
Attachment #502019 - Flags: feedback?(jdow)
Attachment #502200 - Attachment mime type: application/x-sh → text/plain
(In reply to comment #42)
> something that I didn't notice on that last iteration of the code: what is the
> purpose of the setting the two file handles to None just before the with
> statements?  To my eye, it seems unnecessary.  The with statements themselves
> will introduce the names into the local scope.

I agree, these should be removed. Looks left over from when there was an inner "try" (such as attachment 502106 [details] [diff] [review]); I also tested and seems to work fine with this change.
No longer blocks: 619815
Comment on attachment 502175 [details] [diff] [review]
a patch that adds the random raw crash tarball fetching to hbaseClient.py

Lars would you mind landing this? I have been testing with crashes pulled from staging, but we may as well get this run from production at this point.
landed as r2859.
(In reply to comment #46)
> landed as r2859.

Hrm, this just auto-installed in staging (RHEL 5.5, py2.4) and it fails on import due to use of the "with" statement:

"""
Traceback (most recent call last):
  File "/data/socorro/application/scripts/startProcessor.py", line 12, in ?
    import socorro.processor.externalProcessor as processor
  File "/data/socorro/application/socorro/processor/externalProcessor.py", line 15, in ?
    import processor
  File "/data/socorro/application/socorro/processor/processor.py", line 26, in ?
    import socorro.storage.crashstorage as cstore
  File "/data/socorro/application/socorro/storage/crashstorage.py", line 21, in ?
    import socorro.storage.hbaseClient as hbc
  File "/data/socorro/application/socorro/storage/hbaseClient.py", line 770
    with open(json_file_name,'w') as json_file_handle:
            ^
SyntaxError: invalid syntax
"""

Sorry, I should have realized this would happen on import, even though we'll never actually execute that function; since we're running 1.7.6 on current production, we should probably support Python 2.4 here instead (rewriting the with blocks as try/finally is probably the most straightforward way).
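A hedged sketch of what that try/finally rewrite could look like for the nested handles (the helper name is hypothetical; the real fix may differ):

import os

def write_crash_py24(path, ooid, row):
    """Python 2.4-compatible replacement for the nested 'with' blocks."""
    json_file_handle = open(os.path.join(path, ooid + '.json'), 'w')
    try:
        dump_file_handle = open(os.path.join(path, ooid + '.dump'), 'w')
        try:
            # follows the "write the string as-is" option discussed above
            json_file_handle.write(row['meta_data:json'])
            dump_file_handle.write(row['raw_data:dump'])
        finally:
            dump_file_handle.close()
    finally:
        json_file_handle.close()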
After consulting with dre and xstevens, I launched 10 instances of hbaseClient and pulled roughly 240k crashes from the period of January 1st-10th (roughly 10% of the total):

for date in 110101 110102 110103 110104 110105 110106 110107 110108 110109 110110
do 
  mkdir -p /tmp/test/${date}
  python ./application/socorro/storage/hbaseClient.py -h ${prod} export_sampled_crashes_tarball_for_dates 24000 ${date} /tmp/test/${date} crashes-${date}.tar.gz > ${date}.log 2>&1 & 
done

This was partitioned into 10 processes for performance reasons, but I just realized that the seamicro nodes only have ~8GB of disk space free, so it turned out to be advantageous to have this split into smaller files (they are about 500MB each, ~3GB uncompressed). I went ahead and modified the loadtest setup script to distribute the crashes.tar.gz files amongst the machines, so we'll have enough disk space to store these uncompressed.

We could instead have the submitter extract the files on-demand, but I think optimizing for performance over disk space is a better trade-off.
distribute crash files amongst nodes (e.g. crashes-11010${n:1}.tar.gz where n is 01 through 40)
Attachment #502200 - Attachment is obsolete: true
Attachment #502200 - Flags: review?(lars)
One thing I am missing here - we now have ~5GB of crashes in MPT that we need to get over to PHX. I can't seem to connect to the seamicro nodes from anywhere except my machine (via VPN) which is on a very slow DSL.

Can pm-app-collector01 be temporarily allowed to connect to the seamicro nodes?
* copy and unpack crashes in parallel
* note that the simple way crashes are distributed assumes the crash files are numbered 0-9 not 1-10... worked around by symlinking the Jan 10th crash:
ln -s crashes-110110.tar.gz crashes-110100.tar.gz
Attachment #503073 - Attachment is obsolete: true
(In reply to comment #51)
> Created attachment 503222 [details]
> simple load-test setup/run script (take 4)
> 
> * copy and unpack crashes in parallel
> * note that the simple way crashes are distributed assumes the crash files are
> numbered 0-9 not 1-10... worked around by symlinking the Jan 10th crash:
> ln -s crashes-110110.tar.gz crashes-110100.tar.gz

BTW currently the script is only using 39 nodes, since I am using 01 as the controller. We can safely use 01, but I need to remove the crashes-*.tar.gz files to free up enough disk space, and I want to test the other nodes before I do this.
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro