Ruby Forum Rails-core (closed, excessive spam) > Unsigned integers for MySQL patch

Posted by rob-twf (Guest)
on 04.06.2008 09:53
(Received via mailing list)
Hi all,

I posted this patch the other day, now it's time to drum up some
support for it :D
http://rails.lighthouseapp.com/projects/8994/tickets/300-unsigned-integers-for-mysql

To summarise, it does the following:

* changes the primary key data type for the MySQL adapter to be
"int(11) unsigned"

* changes the references/belongs_to 'sexy' migration methods to
generate unsigned foreign key fields by default when using MySQL
(other adapters remain unchanged)

* adds an :unsigned option (defaults to false) to table creation and
modification migrations

* updates the SchemaDumper to understand unsigned integer fields


If this sounds good to you, please take a look!

--
Rob Anderton
http://www.thewebfellas.com/
Posted by Tarmo Tänav (Guest)
on 04.06.2008 12:17
(Received via mailing list)
On K, 2008-06-04 at 00:52 -0700, rob-twf wrote:
> Hi all,
> 
> I posted this patch the other day, now it's time to drum up some
> support for it :D
> http://rails.lighthouseapp.com/projects/8994/tickets/300-unsigned-integers-for-mysql
> 
> To summarise, it does the following:
> 
> * changes the primary key data type for the MySQL adapter to be
> "int(11) unsigned"

Hi,

Why do you need unsigned integers and what happens if the values go
outside the range of signed integers, will rails fall back to
BigDecimal?


--
Tarmo T�nav <tarmo@itech.ee>
Posted by rob-twf (Guest)
on 04.06.2008 15:26
(Received via mailing list)
One of the main advantages would be that it allows you to optimise
your database structure, for example, if you want to store a value up
to 4294967295 in an integer field you can either do it with a signed
bigint (which needs 8 bytes of storage) or in an unsigned integer
(which only needs 4 bytes). While a 4 byte saving may not sound huge,
it can quickly add up and can also have a knock-on effect for indices
on those fields.

Of course this applies to other integer types in MySQL too: for
example a 2 byte signed smallint vs a 1 byte unsigned tinyint to store
values up to 255, etc.

Rails will use a Bignum class for values above 2147483647, it'll
continue to use Fixnum for smaller unsigned attributes.

As a practical example: I originally started work on this because I
wanted to store IPv4 addresses in my database, and to do that I really
needed an unsigned 32 bit integer.

Hope this helps...


--
Rob Anderton
http://www.thewebfellas.com/
Posted by Tarmo Tänav (Guest)
on 04.06.2008 16:17
(Received via mailing list)
On K, 2008-06-04 at 06:25 -0700, rob-twf wrote:
> As a practical example: I originally started work on this because I
> wanted to store IPv4 addresses in my database, and to do that I really
> needed an unsigned 32 bit integer.
> 
> Hope this helps...

Yes, I agree that for IPv4 it is indeed useful.
But I do have a few more questions.

1. If unsigned was made default and a project that was built before this
started using the newer migration code that would mean that part of
their database uses unsigned and part uses signed integers.

2. Does mysql support changing the type of a column (including a primary
key column) from signed to unsigned? Would rails' change_column be able
to do it?

3. I think mysql does not support FK's between different integer
types, so wouldn't making unsigned the default cause problems
for all new FK fields (and their constraints) that refer to existing
tables (with signed int primary keys)?

Apart from the possibility of storing IPv4 addresses the usefulness
of this seems quite limited, most applications probably do not have
tables that touch the limit of 32bit signed integers and even if they
did a two-fold increase in the number of keys would not be a good
long-term solution.

So basically I'm concerned about the fact that making this the
default seems to not give great benefit yet unless I'm wrong about
the above things could cause issues if done so for an existing
project. So perhaps the default should be configurable and at
least for older applications should not be changed?


Thank you,

--
Tarmo Tänav <tarmo@itech.ee>
Posted by rob-twf (Guest)
on 04.06.2008 17:59
(Received via mailing list)
1. That would be true only for the automatically created primary key
fields. For all other integer fields the default for unsigned is false
- so nothing changes.

2. Yes and yes. For example, if you move to a version of Rails with
unsigned support, you could write a migration that does things like
this:

change_column :suppliers, :category_id, :integer, :unsigned => true

or

change_table :products do |t|
  t.change :id, :integer, :unsigned => true
end

3. That's correct, you can't have a signed foreign key referencing an
unsigned primary key (or vice-versa). In this situation you'll have
two choices:

- keep your signed primary key, and pass :unsigned => false when
creating your new foreign key
- update your existing keys to make use of unsigned integers

For example, if we already have a suppliers table with a signed
primary key, and are now creating a products table that will reference
it, we can either create a signed foreign key:

create_table :products do |t|
  # various columns go here
  t.references :supplier, :unsigned => false
end

Or we can change the existing key to be unsigned, like this:

change_table :suppliers do |t|
  t.change :id, :integer, :unsigned => true
end

create_table :products do |t|

  # No need to be signed!
  t.references :supplier

end


So to summarise:

- signed integers are still the default
- *except* for auto-generated primary keys, and foreign keys created
using the 'sexy' references/belongs_to syntax
- unsigned foreign keys can be created by passing :unsigned => false
to references/belongs_to
- existing columns (including primary keys) can be changed to/from
unsigned using change_column


While IPv4 addresses were my inspiration for the patch (as well as
seeing it come up as a requirement on various Rails forums from time
to time) there are probably other situations where having the option
to go unsigned will be helpful. Hopefully we'll hear from some people
who have some other, more interesting, examples :)


Thanks for the feedback.


--

Rob Anderton
http://www.thewebfellas.com/
Posted by Jan De Poorter (Guest)
on 04.06.2008 18:04
(Received via mailing list)
Hello,

I am personally in favor of the unsigned option. I think it is a
better default for primary keys, since those won't ever be -something.

Just my 2 cents.

Regards,
Jan De Poorter
http://workswithruby.com
Posted by Tarmo Tänav (Guest)
on 04.06.2008 18:20
(Received via mailing list)
On K, 2008-06-04 at 08:58 -0700, rob-twf wrote:
> So to summarise:
> 
> - signed integers are still the default
> - *except* for auto-generated primary keys, and foreign keys created
> using the 'sexy' references/belongs_to syntax
> - unsigned foreign keys can be created by passing :unsigned => false
> to references/belongs_to
> - existing columns (including primary keys) can be changed to/from
> unsigned using change_column

I'm still not sure why you think this should be made the default, sure
there are benefits in some specific cases but in the general case it
would just create unnecessary work for those who have an existing
project already using signed integers.

Basically everyone on an existing database will have to do a lot of
work to convert their integer types in order for their usual workflow
for creating migrations to not have to change. (In case one already has
FK constraints on signed types how easy would it be in mysql to
convert both columns to an unsigned type while also keeping the
FK constraint?)

Or they will have to keep track of which tables have signed and which
have unsigned primary keys and when to make an FK column signed,
which is not fun because until you add the FK constraint it will not be
clear that the types differ so you may even get partially ran migrations
because of this (and since mysql can not run migrations inside
a transaction this makes it even more difficult to recover).

How would you justify this?

Note that I don't use mysql so I shouldn't really care much about this,
but if I did I would not like to have to do either of the above for no
immediate benefit. I however have nothing against supporting :unsigned
option or making it possible to configure the default type of primary
key to be an unsigned int.


Thank you,
--
Tarmo T�nav <tarmo@itech.ee>
Posted by Steven Soroka (Guest)
on 04.06.2008 18:21
(Received via mailing list)
FYI, you can currently store IPv4 in the database using a signed field.

MIN_INT4_UNSIGNED = -2147483648

I have a method like this that converts an ip string to an integer,
then adds the offset above to make it signed.

   def self.ip_to_int4(an_ip)
     an_ip.split('.').inject(0){|sum, i| (sum << 8) + i.to_i } +
Integer::MIN_INT4_UNSIGNED
   end

To convert it back you subtract the offset when you load it from the db.

   def self.int4_to_ip(int4)
     int4 -= Integer::MIN_INT4_UNSIGNED
     r = []
     4.times{
       r << int4 % 256
       int4 >>= 8
     }
     r.reverse.join('.')
   end

I still support the unsigned integer idea though.  Perhaps a setting
in environment.rb could explicitly turn it on, then projects could
decide whether or not they wanted to use it.  This would be handy for
both existing apps (not to break anything) and new ones (which can
rake db:migrate:reset without penalty).

- Steve
Posted by Frederick Cheung (Guest)
on 04.06.2008 18:41
(Received via mailing list)
On 4 Jun 2008, at 16:58, rob-twf wrote:

>
> 1. That would be true only for the automatically created primary key
> fields. For all other integer fields the default for unsigned is false
> - so nothing changes.

Does that screw up foreign keys ? mysql is rather nit picky about the
foreign key and the id having the same type (eg you can't have a
foreign key constraint connecting a bigint column on one table to an
int column on another: they both have to be bigints or both ints)



Fred
Posted by Frederick Cheung (Guest)
on 04.06.2008 18:43
(Received via mailing list)
On 4 Jun 2008, at 17:40, Frederick Cheung wrote:

> the foreign key and the id having the same type (eg you can't have a  
> foreign key constraint connecting a bigint column on one table to an  
> int column on another: they both have to be bigints or both ints)
>
Sigh. it's been a long day. the rest of your mail trickled through. I
do share tarmo's concerns (eg when I create a new table with a user_id
I have to remember that my users table is old and has a signed primary
key)

Fred
Posted by rob-twf (Guest)
on 05.06.2008 00:47
(Received via mailing list)
Hehe, I'll try and keep this reply a bit shorter :)

@Steven: won't that prevent you from using the IPs in some kind of
range query? e.g. SELECT * FROM some_ips WHERE ip >= start_ip AND ip
<= end_ip  (maybe not a great example, but you get the idea)

My main reason for making primary keys unsigned by default is that
this is the convention for MySQL databases and in my opinion Rails
should follow this convention. A less important reason is that it
saves me extra work when creating new migrations as I get an unsigned
key out of the box!

I appreciate this would require developers to give some thought when
making the transition from pre-patch to post-patch versions of Rails,
however a similar scenario occurred when this change was committed:
http://github.com/rails/rails/commit/a37546517dad9f6d9a7de6e1dba4d960909d71e8
(for example what were once created as 4 byte integers could become 1,
2, 4 or 8 byte integers after this change).

I may be wrong but won't this old/new migration issue only arise in
situations where you're adding a new foreign key that references an
existing primary key, and even then only if you choose to use 'sexy'
syntax instead of add_column? And shouldn't you really do a quick
double check on a column type before adding an FK constraint, just in
case it's not the simple numeric key you were expecting?

Anyway, I think there are currently three options:

1. Change the default to unsigned for primary keys
Requires developers choose between updating their existing keys or
specifying :unsigned => false when creating new foreign keys that
reference old signed primary keys.

2. Add a :primary_key_unsigned => true|false setting to create_table
(default to false)
Requires developers manually specify when they want an unsigned key in
each new migration.

3. Add some kind of config setting (I'm not exactly sure where it'd
go) that determines the default for primary keys when using MySQL
Developers would then be able to choose the global default, this could
be combined with option 2 to allow the default to be overridden in
individual migrations. Seems like a few too many settings to me
though...


--

Rob Anderton
http://www.thewebfellas.com/
Posted by Steven Soroka (Guest)
on 05.06.2008 08:23
(Received via mailing list)
Not if you convert your ip to the compressed signed integer format
first.

I think your option 3 below is the best choice.
Posted by rob-twf (Guest)
on 13.06.2008 13:03
(Received via mailing list)
Just a quick bump to say that I posted an updated patch the other day
- this one does not touch primary keys or references/belongs_to.

http://rails.lighthouseapp.com/projects/8994/tickets/300-unsigned-integers-for-mysql#ticket-300-2

I'll revisit the primary key options at a later date (I've a couple of
other related changes I'd like to investigate) - didn't seem worth
holding up this patch while I do it though...

Thanks.