
That “Oh shit” moment

3 July 2013 at 10:11 PM, Category: Uncategorized

A previous encounter with a failed upgrade that required recovery (another story for another time) left a sour taste in my mouth regarding the state of the backups that were in place. As a brief overview: a nightly script attempts to iterate over every database that exists on the read-only slave and, assuming the database is not set to replicate-ignore-db, it tries to run mysqldump against it. That’s it. No frills, bells, or whistles. Seems fairly straightforward and simple, right? Well, this is nowhere near good enough for the environment that I work on.

First off, if the database is indeed in replicate-ignore-db, the script locks and dumps the data from the live master… every night… That means the database is essentially down, without notice, every night for as long as the backup takes. For most databases on our servers this goes unnoticed, as they are < 5G and only see maybe a minute of impact. For others though… well, 150G data sets don’t dump instantly. Second, if you read closely, I mentioned that it “tries” to dump the database; if it runs into any issues it fails silently, leaving an unusable backup. Third, notice that I didn’t mention a thing about bin-logs. That’s because they don’t get backed up. If this means nothing to you yet, it means that point-in-time recovery is impossible. And lastly, I mentioned in a previous post that the environment is currently undergoing an upgrade from 5.0 to 5.1. Well, if you are not in the know, 5.0 only supports statement-based replication. This setting carried over to 5.1 for the purposes of upgrading (I am hoping to change this in the near future), and thus we have a fair amount of data drift that virtually never gets corrected. That’s right! No data consistency checks. This whole setup reminds me of a common FPS phrase: “spray ‘n pray”.

So over the past weeks, I have been devising a replacement solution. The solution has to accomplish the following:

  • reliable recovery
  • quick recovery
  • consistent recovery
  • no customer impact
  • ability to give customers text based dumps
  • back up nightly
  • restore from up to 72 hrs ago to any point in time

So with all this information, I got to work…

Problem #1: no customer impact

This was fortunately mostly already solved for me. We already have slaves replicating from the master that we can take backups from at our leisure.


Problem #2: consistent recovery

This definitely needs some attention, as we already have some pretty serious data drift. Couple that with customers who are at liberty to write any non-deterministic query they so choose… and you have a recipe for a shit time. If you don’t already know what a non-deterministic query is, it’s a query that returns different results depending on when it is run. A basic example would be “INSERT INTO `table` (`var1`) VALUES (UUID());”. As you can see, this would be different every time it runs, and that includes on the replicating slave if it is set to replicate in statement mode. The answer? Well, the one I am looking at is, firstly, to switch to mixed-mode replication. This should replicate most non-deterministic queries as the affected row data rather than as statements that “attempt” to recreate said data. The second part is to use pt-table-checksum to verify the consistency of each database at a minimum of once every 3 days and generate alerts whenever inconsistencies are found. As for automatically rectifying those inconsistencies, that part is up in the air for now, as pt-table-sync’s default method of repair has the potential to destroy your known good copy.
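To make the checking half of that concrete, here is a minimal sketch of what the consistency job could look like. The host names and credentials are made up, and it assumes pt-table-checksum is writing to its default percona.checksums table; it is a shape, not the final implementation.

#!/bin/bash
# hedged sketch: run the checksums from the master, then look for drift on the slave
pt-table-checksum --replicate=percona.checksums h=master01,u=checksum_user,p=secret

# on the slave, any chunk whose count/CRC differs from the master's is drift
DIFFS=$(mysql -h slave01 -NBe "
    SELECT COUNT(*) FROM percona.checksums
    WHERE master_cnt <> this_cnt OR master_crc <> this_crc
       OR ISNULL(master_crc) <> ISNULL(this_crc)")

if [ "$DIFFS" -gt 0 ]; then
    echo "data drift detected on slave01: $DIFFS chunk(s) differ" \
        | mail -s "pt-table-checksum alert" dba@example.com
fi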


Problem #3: reliable recovery

As I mentioned earlier, the currently implemented solution only “tries” to create a backup; there is no validation and no reporting if one fails, and therefore no reliable recovery. So, I say, let’s strip this part out. Let’s instead look towards LVM for binary backups: take the snapshot, move it off the server, delete the snapshot… profit! There are 2 routes this can be done, and I have ultimately not made a decision which is better yet, but here they are. Option 1) shut down MySQL, take the snapshot, start MySQL, etc. Option 2) stop the slave thread, flush tables, take the snapshot, start the slave thread (optionally start a second MySQL instance on the snapshot to go through InnoDB recovery).

Let me touch a little on the pros and cons. Option 1 has the advantage of not having to worry about crash recovery at all and being more simplistic, as well as sharing the same fate as the backup: if MySQL comes up cleanly, then the backup should as well. On the down side, option 1 also flushes all the caches, so in the event of a fail over it may take longer to get up to speed. Also, logic would have to be built in to ensure that MySQL is not in a failed-over state before shutting it down; you definitely don’t want to bring down your DR box when you are already in a DR situation. Lastly, this would flood our monitoring agents with false alarms every night that MySQL was restarted, so those alerts would need to be ignored, but then what if there were a legit restart that got missed?

On to option 2… Option 2 is slightly more complex, though not by much. The caches stay warm, so the box is ready for fail over at any time. There is no need for logic to ensure that the box is not live; even if it is, a snapshot will degrade performance but not bring it down (in this environment, that is acceptable). Cons include either going through InnoDB crash recovery or assuming that the backup can be restored from easily (I prefer the former). There is also the possibility that the slave may get stopped midway through an operation loading data from a tmp table/location that did not get completed, and that tmp data would not get backed up. All in all, I am leaning towards option 2.
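For what it’s worth, here is a rough sketch of what option 2 could look like in practice. The volume group, snapshot name, and backup host are all placeholders, and a real script would want error handling and size checks around every step:

#!/bin/bash
# hedged sketch of "option 2": quiesce replication, snapshot, resume, ship it off-box
set -e
SNAP=mysql-backup-$(date +%F)

mysql -e "STOP SLAVE SQL_THREAD; FLUSH TABLES;"
lvcreate --snapshot --size 10G --name "$SNAP" /dev/vg00/mysql-data
mysql -e "START SLAVE SQL_THREAD;"            # replication only pauses for the snapshot itself

mkdir -p /mnt/"$SNAP"
mount -o ro /dev/vg00/"$SNAP" /mnt/"$SNAP"
rsync -a /mnt/"$SNAP"/ backuphost:/backups/$(hostname)/$(date +%F)/
umount /mnt/"$SNAP"
lvremove -f /dev/vg00/"$SNAP"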


Problem #4: quick recovery

Having binary backups from LVM accomplishes this nicely. It doesn’t get much quicker than copying the data files into place and turning on MySQL.
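As a point of reference, a restore under this scheme is basically just the following (host names and paths hypothetical):

# hedged restore sketch: put last night's files back and let MySQL sort itself out
service mysql stop
rsync -a backuphost:/backups/db01/2013-07-02/ /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql
service mysql start    # InnoDB crash recovery runs here if the snapshot was taken hot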


Problem #5: ability to give customers text-based dumps

Now this is where some trickery is needed. Since the system backups are binary copies of the data files, we can’t easily just hand them over to customers, especially if they are using InnoDB (which is our default storage engine). Customers have also come to expect that we will provide them a historical dump upon request, even though we clearly state that backups are THEIR responsibility. But… let’s see what we can do.

This piece is still very much a work in progress. As I have mentioned before, one server may have hundreds of databases on it. Keeping that in mind, my options have to preserve the backup in its original backup-y form. My current thought is to export an iSCSI target to each MySQL instance that is used to offload the nightly snapshots. These iSCSI targets would use LVM, and whenever a dump is requested, a snapshot of the backup would be taken and mounted R/W to a recovery location, let’s say /mnt/recover-<db_name> (gotta account for multiple concurrent recoveries). With the snapshot mounted R/W, a “recovery” instance of MySQL is started on the recovery location and bin-logs are replayed to the point in time requested. After the bin-logs are replayed, in comes mysqldump to spit out all of the customer data to a SQL file, which gets handed off to the customer. Then, like good lil’ children, we clean up our mess by unmounting and deleting the snapshot that was used for recovery, leaving the original backup in its pristine state. Like I said though, this is very much a work in progress. A few of the downsides include: hundreds of iSCSI targets, wasted space between each target needed for snapshots, and expense (but that part just comes with binary backups).
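To make that flow a bit more concrete, here is a rough, hedged sketch of the per-request recovery dance. The volume names, directory layout (datadir and backed-up bin-logs living on the backup volume), port, and the sleep are all placeholders; the real thing would need proper readiness checks and cleanup on failure:

#!/bin/bash
# hypothetical point-in-time text dump from last night's binary backup
set -e
DB=customerdb
STOP_AT='2013-07-03 14:30:00'
SNAP=recover-$DB

lvcreate --snapshot --size 20G --name "$SNAP" /dev/backupvg/nightly
mkdir -p /mnt/"$SNAP"
mount /dev/backupvg/"$SNAP" /mnt/"$SNAP"

# throwaway "recovery" instance on its own socket/port
mysqld_safe --datadir=/mnt/"$SNAP"/data --socket=/tmp/"$SNAP".sock \
            --port=33307 --skip-slave-start &
sleep 60    # crude: give InnoDB recovery time to finish

# roll forward to the requested point in time from the backed-up bin-logs
mysqlbinlog --stop-datetime="$STOP_AT" /mnt/"$SNAP"/binlogs/mysql-bin.[0-9]* \
    | mysql --socket=/tmp/"$SNAP".sock

# hand over a text dump, then put everything back the way we found it
mysqldump --socket=/tmp/"$SNAP".sock --single-transaction "$DB" > /tmp/"$DB"-recovery.sql
mysqladmin --socket=/tmp/"$SNAP".sock shutdown
umount /mnt/"$SNAP"
lvremove -f /dev/backupvg/"$SNAP"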


Problem #6: nightly backups with 72hr retention

Now this is simply a policy; there is not much to flesh out here, except that now that we are talking about binary backups rather than gzip’d text dumps, we are looking at a 300%+ increase in storage needs (assuming no compression of the binary files). I don’t have a solution for this except… buy storage… I know, it sucks, but that’s what I’ve got.

So that’s a quick overview of how I plan to revamp the backup plan for my current environment. Got any ideas or suggestions? Let me know!

Scott

Upgrades

26 June 2013 at 9:12 AM, Category: Uncategorized

Today I thought I would start to catch everyone up on what’s going on now rather than staying in the past. So I thought I would write a bit about how I am doing the upgrades of hundreds of MySQL 5.0 boxes. Well, let me tell you, it’s not all sunshine and rainbow farts.

Let’s start with what didn’t work… binary upgrades. Let me start off by saying that this may be a feasible option for you if you only have a handful of servers and databases and are running a stock build of MySQL, but for me this would not work. The 5.0 binary that I was working with was in fact a custom build with a subset of Percona patches pulled in. Our 5.1 binary is a full-blown Percona 5.1 build. With the hundreds of servers that we had to upgrade, it was not reasonable to accept an unknown number of failures that would require manual intervention deviating from a standard process. In fact, the DBA before me had actually tried this, and on the first production box we attempted it on, we hit a bug with temp tables not getting cleaned up on shutdown, offsetting every InnoDB tablespace ID and causing mass corruption. That was a 3 day recovery…

So, armed with this, and coming into this “new role”, I searched and searched for a reliable way to make binary upgrades happen safely. Using intermediate versions… bringing the new box up in InnoDB recovery… and other things that are escaping me at the moment. (Oh, and did I mention that the existing backup solution was unreliable at best, the slaves have no verification of data drift, and there is currently no mechanism for file system snapshots in place?) But what I learned was that this is just flat-out unsafe. In fact the docs outright recommend against it! So, it was decided that the upgrade process would be a dump and import, but how can we do this with hundreds of servers, each containing hundreds of databases, while ensuring up-time and consistency?

Fortunately for me, around the same time, we had also decided to convert our slaves to VMs and consolidate them 3 to a box. Doing this freed up ~1/3 of the hardware, which came in very handy when it was decided that our process would be to stand up a new 5.1 box, dump all of the 5.0 data into it, and promote it to the new master. I had all the hardware I needed to play with.

Let me give you a bit of an idea of the process that was used:

  • Kick a new HW box with MySQL 5.1 (with LVM)
  • set up the new 5.1 box (the future master) as a slave to the 5.0 master, ignoring all DBs
  • lock a user out of their DB on the 5.0 master
  • stop the slave thread on 5.1
  • dump the given DB
  • unlock the user
  • import the DB into 5.1
  • remove the ignore-db line for that DB
  • restart MySQL 5.1
  • rinse and repeat through all the databases on the server (this was all scripted; a rough sketch of the loop follows below)
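For the curious, the scripted loop was conceptually something like the sketch below. The host names are made up, and lock_customer_out, unlock_customer, and remove_ignore_db_line are hypothetical stand-ins for however you block application writes and edit the replicate-ignore-db entries out of my.cnf; this shows the shape of the process, not our actual script.

#!/bin/bash
# hypothetical per-database migration loop from the 5.0 master to the new 5.1 box
set -e
OLD=old-master-5.0
NEW=new-master-5.1

for DB in $(mysql -h "$OLD" -NBe 'SHOW DATABASES' | grep -vEx 'mysql|information_schema'); do
    lock_customer_out "$DB"                    # hypothetical: stop app writes to this DB on 5.0
    mysql -h "$NEW" -e 'STOP SLAVE SQL_THREAD'
    mysqldump -h "$OLD" --single-transaction "$DB" > /tmp/"$DB".sql
    unlock_customer "$DB"                      # hypothetical: writes may resume on the 5.0 master
    mysql -h "$NEW" -e "CREATE DATABASE IF NOT EXISTS \`$DB\`"
    mysql -h "$NEW" "$DB" < /tmp/"$DB".sql
    remove_ignore_db_line "$NEW" "$DB"         # hypothetical: drop it from replicate-ignore-db
    ssh "$NEW" 'service mysql restart'         # pick up the new ignore list, replication resumes
done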

Once all the data was copied, pt-table-checksum was used to verify the consistency of the new master. Reconciling any issues was a different story though. This was very open ended and ranged from manual corrections, to re-dumping/importing the database, to pt-table-sync (which, by the way, I dislike, as this tool by default overwrites your *known good* master’s data in order to replicate the fix through to the slave, so you potentially destroy your good data; if you use it, use --print and throw the output to a file to run directly on the slave). Then we just wait till maintenance time…
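In other words, something along these lines (host and table names hypothetical), rather than letting pt-table-sync execute against the master:

# print the statements pt-table-sync would run instead of executing them on the master
pt-table-sync --print --sync-to-master h=slave01,D=customerdb,t=orders > /tmp/orders-fixup.sql
# eyeball the file first, then apply it by hand on the slave only
mysql -h slave01 customerdb < /tmp/orders-fixup.sql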

Before the maintenance work let me explain one more thing about how we run MySQL. A MySQL pair is set up with three IPs: the master primary, the slave primary, and a floating “virtual” IP (VIP) as a secondary. All of the traffic is routed to the VIP and the primary IPs are used only for replication and management. ONWARD!

  • flip the VIP to the 5.0 slave in read-only
  • verify the slave thread on 5.1 is caught up
  • reset slave on 5.1
  • run mysql_upgrade (needed to fix permissions and UDFs, as the mysql DB was imported from the 5.0 schema)
  • reset master
  • shut down MySQL 5.1
  • take an LVM snapshot
  • start MySQL
  • flip the VIP to 5.1 (the 5.1-side commands are sketched below)
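On the 5.1 box, that boils down to roughly the following. The volume names are placeholders and your init script names may differ; this is a sketch of the steps above, not a copy of our runbook:

# hedged sketch of the cutover commands on the new 5.1 master
mysql -e 'STOP SLAVE; RESET SLAVE'              # only after confirming it is caught up
mysql_upgrade                                   # fix up the mysql schema imported from 5.0
mysql -e 'RESET MASTER'                         # start clean bin-logs for the soon-to-be slave
service mysql stop
lvcreate --snapshot --size 10G --name cutover /dev/vg00/mysql-data
service mysql start
# ...then flip the VIP over to this box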

At this point, the impacting portion of the maintenance was over, all that was left was to set up a new 5.1 slave!

  • re-kick the old 5.0 slave as 5.1 (with LVM)
  • rsync the data from the 5.1 snapshot created during the maintenance
  • change master to ….
  • slave start
  • go back and delete the LVM snapshot from the 5.1 master (roughly as sketched below)
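Again hedged: every hostname, path, and replication coordinate below is a placeholder, but the slave rebuild amounts to something like this:

# hypothetical rebuild of the new 5.1 slave from the master's cutover snapshot
ssh master01 'mkdir -p /mnt/cutover && mount -o ro /dev/vg00/cutover /mnt/cutover'
rsync -a master01:/mnt/cutover/ /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql
service mysql start

# log file/position come from the master's state at snapshot time; values here are placeholders
mysql -e "CHANGE MASTER TO MASTER_HOST='master01', MASTER_USER='repl',
          MASTER_PASSWORD='...', MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=4;
          START SLAVE"

ssh master01 'umount /mnt/cutover && lvremove -f /dev/vg00/cutover'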

DONE!

Now this may seem like a fair amount of work, and it is, but it’s much less work than trying to go through recovery with a bunch of unknowns, and it leaves our old 5.0 master completely untouched should the need for a rollback or any sort of recovery arise. This also gave us the chance to resolve some technical debt, such as servers that were set up without innodb_file_per_table, where the ibdata1 file was unnecessarily taking up large amounts of disk space.

I would also like to note that, with a slight modification to the process, this could be a nearly zero downtime maintenance, losing only current sessions. The differences are that you would run mysql_upgrade and take your LVM snapshot prior to the maintenance and not reset the slave. Then, when MySQL comes back up, you would need to re-import all MEMORY tables and flip the VIP straight to 5.1. It is completely reasonable to do a no-downtime upgrade, though we decided that the extra effort of finding all of the MEMORY tables and ensuring their consistency at the last minute wasn’t worth the work; 5 minutes of read-only time was acceptable.
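If you do go that route, finding the MEMORY tables is at least easy; a one-liner along these lines would list what you’d have to re-populate after the restart:

# list every MEMORY table so it can be re-imported after the restart
mysql -NBe "SELECT CONCAT(table_schema, '.', table_name)
            FROM information_schema.tables WHERE engine = 'MEMORY'"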

So, now that you know the process, and you know the volume of servers to upgrade, how many do you think we did a week? 1? 3? 5? 10? 20?… All wrong! At the end of our first DC we had done 42 servers in 1 day… 2 of us. Now, in all fairness, the lesson we learned from this is that 42 a day is unsustainable. That many irons in the fire is too many for 2 people to track. What is sustainable is closer to 5-10 per person. And that brings us to where we are today. A third of the infrastructure is upgraded with two-thirds to go, and as soon as the slaves have been consolidated for the remaining boxes, we will be back to 2-3 of us doing 5-10 boxes per person, per day, Monday through Thursday.

So there you have it, how we do upgrades on a large MySQL environment.

Let me know if you have any thoughts, questions, ideas!

Scott

A taste of MySQL wat…

25 June 2013 at 8:36 AM, Category: Uncategorized

Here are just a few things that I have seen so far that make me …. wat… out loud.

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| 123e45_table   |
+----------------+
1 row in set (0.00 sec)

mysql> select * from 123e45_table;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use near
'123e45_table' at line 1

mysql> wat...
mysql> select * from `123e45_table`;
Empty set (0.00 sec)

mysql> select 123e45;
+----------+
| 123e45   |
+----------+
| 1.23e+47 |
+----------+
1 row in set (0.00 sec)

It turns out that MySQL interprets [0-9]+e[0-9]+ as scientific notation, so make sure that you quote (backtick) your identifiers.
Now for round 2….

mysql> create table test (`test               
    `> column` int);
Query OK, 0 rows affected (0.02 sec)

mysql> show create table test;
+-------+---------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                      |
+-------+---------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `test
column` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

wat…
You saw that right: MySQL does indeed allow newlines in identifiers, so they end up embedded right in the table schema. Enjoy that one.

Off we go again for some more joy and fun!

mysql> create table test (`id` float);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into test values(-0),('-0');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test;
+------+
| id   |
+------+
|    0 | 
|   -0 | 
+------+
2 rows in set (0.00 sec)

wat…
For me this one was particularly annoying, as mysqldump does not quote numerical values, magically turning a negative zero into a positive zero on reimport… TADA!!

Next up, replication…

slave1> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 28376
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 6775
               Relay_Log_File: mysql_sandbox28377-relay-bin.000487
                Relay_Log_Pos: 6920
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: test
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6775
              Relay_Log_Space: 7131
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

slave1> drop database test;
Query OK, 4 rows affected (0.04 sec)

slave1> Bye

master> create table test.new (id int);
Query OK, 0 rows affected (0.02 sec)

master> Bye
slave1> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 28376
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 6864
               Relay_Log_File: mysql_sandbox28377-relay-bin.000489
                Relay_Log_Pos: 251
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: test
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1049
                   Last_Error: Error 'Unknown database 'test'' on query. Default database: ''. Query: 'create table test.new (id int)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6775
              Relay_Log_Space: 508
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1049
               Last_SQL_Error: Error 'Unknown database 'test'' on query. Default database: ''. Query: 'create table test.new (id int)'
1 row in set (0.00 sec)

wat…
So, it turns out that MySQL will only actually ignore queries against a database in replicate-ignore-db if that database has been set as the default database (i.e. the client ran USE first)… very annoying when you don’t control the queries coming into the server.

Well I think that is enough wat for one day. Do you have more wat to share? Let me know!

Scott

Inheritance…..

24 June 2013 at 8:19 AM, Category: Uncategorized

Since this is my first post I thought that I might share some beginnings of where this blog comes from and the intentions for it.

Around April 2013 I inherited a MySQL infrastructure containing several hundred MySQL 5.0 master-slave pairs. Yup, you read that right: 2013, and these boxes were all running 5.0. So, before I get too far ahead of myself, let’s back up a few years.

The year was 2010 and I was hired into the Rackspace family as a Linux Systems Operations Administrator II, on 3rd shift no less, and to all of the night walkers out there, you are loved. 3rd shift breeds an interesting bunch, and I had worked with several: a very intelligent group of people that inspired me to push myself to new heights and dig my feet in deeper. Well, ~1.5 years later my feet are nice and wet cutting my teeth on a fairly large and complex beast of a network, and I find myself moving to 1st shift, then shortly thereafter, up the ranks to Ops III. All is going well, but I’m quickly finding that I am getting rather bored with the day in, day out work of Ops. Lo and behold, an Engineering requisition opens up for the very product I have been working on for the past ~2 years! So I apply, get hired, and just like that, I find myself back with a group of people inspiring me to do more!

Little did I know that almost 3 months later, I would be inheriting the entire MySQL infrastructure. Around January 2013, our DBA leaves the team. “Why, whatever are we to do!!!” That’s how everyone else remembers this going down. Me? I remember it more like “Fuck it, let the FNG do it!”

So there I am, 3 months green on the engineering team, and I get all the technical debt that comes with a MySQL infrastructure that is hundreds of servers deep and has not been tuned in probably 3+ years (even though hardware has been replaced and loads change near daily). My response to all this, you ask? FML! So there I am, faced with a decision… I can 1) run screaming like a little bitch (probably the most sane response), 2) bend over and let the environment rape the will to live out of me, or 3) make these boxes my bitch and own the shit out of ‘em. I (perhaps stupidly) chose the 3rd option.

So here we are at the end of June, and thus far I have successfully upgraded 1/3 of the infrastructure to the at least slightly more respectable version of… 5.1. FML again. Turns out there was a decision made before I inherited this mess that we would be going to 5.1, not newer. So, here I am today, tasked with finishing off the upgrade of the other 2/3 of the infrastructure, tuning the servers to not be steaming piles, and performing other Linux-y duties… Oh, and trying to keep my sanity.

So, to get to the point, this blog is to document where my time and sanity have gone, and perhaps to save some of you from the same pitfalls, or perhaps, for you to save me from my own.

In short, here is a list of projects that I hope to complete in the coming months. By no means is this a complete list, but rather a launching pad.

  • Completing the upgrade of the infrastructure to 5.1
  • Develop a point-in-time backup procedure
  • Go down the path to 5.5
  • Improve tuning of the environment to account for new (to me) features/hardware in MySQL and the present workloads
  • Develop a plan for no/minimal downtime cross server database migrations

… well, that’s a good start; we will see how it goes.

Let me know what ideas you all would like to see!

Scott