We’re doing some work right now with persistence of object graphs into a SQL Server image field using the BinaryFormatter. We’re using “simple” serialization, meaning we’re just marking our types with the SerializableAttribute and marking whatever transient fields shouldn’t be serialized with NonSerializedAttribute. I realize that this problem could be solved if we go and implement ISerializable ourselves, but… I’d rather talk about the issue at hand first.
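For readers who haven’t used this style of serialization, here is a minimal sketch of what such a type might look like. The `Order` class and its fields are hypothetical stand-ins, not the actual types from our assembly:

```csharp
using System;
using System.Collections;

// Hypothetical type, used only to illustrate attribute-based serialization.
[Serializable]
public class Order
{
    // Serialized: ordinary instance fields are written to the stream.
    private string customerName;
    private ArrayList lines = new ArrayList();

    // Not serialized: transient state that can be rebuilt after loading.
    [NonSerialized]
    private object cachedTotal;

    public Order(string customerName)
    {
        this.customerName = customerName;
    }

    public string CustomerName { get { return customerName; } }
    public ArrayList Lines { get { return lines; } }

    public object CachedTotal
    {
        get { return cachedTotal; }
        set { cachedTotal = value; }
    }
}
```

After deserialization a `[NonSerialized]` field comes back as its default value (`null` here), so any code that reads it has to be prepared to recompute it.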
For starters, everything works 100% as advertised until you rebuild the assembly of the types that have been persisted. After a rebuild, upon deserialization of those objects you’ll end up with a nice little SerializationException with the message: “Insufficient state to deserialize the object. More information is needed.”
Now, at first I realized that we were not specifying a FormatterAssemblyStyle at all. Therefore, I assumed the versioning information was being included in the serialized stream, and when the BinaryFormatter went to deserialize the stream it couldn’t find the older version and, well… kaboom. Ok, “no problem”, I thought. Let’s just throw FormatterAssemblyStyle.Simple on there and go with it. After all, we’re not making changes to the structure of the objects, just implementing more functionality and/or fixing bugs, so the serializer should have no problem. Well, I was wrong. What’s happening is either a bug or a very strange design decision that I don’t quite understand.
The first step I took while debugging this problem was figuring out how to see what the BinaryFormatter was writing into the stream. The way I did this was to serialize the root object into a MemoryStream and then take those bytes and Debug.WriteLine them out using a StreamReader with UTF8Encoding. Now, the root object of our graph is composed of other custom types from the same assembly as well as basic framework types (String, Int32, ArrayList, etc.). Without specifying FormatterAssemblyStyle.Simple, I could clearly see the assembly-qualified name (AQN) being written out for each and every type. Then we set FormatterAssemblyStyle.Simple, and what I saw next was very strange. As expected, the only thing being specified for the root object was its partial name. The same went for all the framework types, no matter where they appeared in the object graph. However, it looks like at least one instance of a custom type contained by the root object (which, if you recall, is also from the same exact assembly) was still being serialized with a full AQN!
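The dump technique described above can be sketched roughly like this. It assumes the .NET Framework, where BinaryFormatter is available and enabled; the `StreamDump` class and `Dump` method names are made up for illustration:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

// Hypothetical helper: serializes a graph and returns the raw stream as text.
public static class StreamDump
{
    public static string Dump(object root, FormatterAssemblyStyle style)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        formatter.AssemblyFormat = style;

        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, root);

            // Rewind, then decode the bytes as UTF-8. The binary framing
            // shows up as junk characters, but type and assembly names
            // are stored as readable strings.
            stream.Position = 0;
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling `Debug.WriteLine(StreamDump.Dump(root, FormatterAssemblyStyle.Simple))` and again with `FormatterAssemblyStyle.Full` gives two dumps that can be compared side by side to see which assembly names carry version and public key information.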
So I just have a few questions:
- What the…?
- How the…?
- Why the…?
For anyone interested, here’s the text output from the process that I mentioned earlier. Maybe someone out there (from MS?) can make sense out of it. You can clearly see on the fourth line (assuming you’re viewing with Notepad without wrapping turned on) that there’s all of a sudden a full assembly name written out in the form of:
[CustomAssembly, Version=0.1.1690.30673, Culture=neutral, PublicKeyToken=821ded9c8ea11544
For there to be a PublicKeyToken, the assembly must have been signed. Perhaps the notion is that once an assembly has been signed, versioning on it must be enforced. So if you don’t sign things…
Yes, it was signed, and I realize that forces the runtime into a mode where you have a 100% guarantee in terms of versioning enforcement. However, it is possible to redirect older versions of assemblies to newer ones, right (i.e. <bindingRedirect>)? Not to mention you can load assemblies with partial names, which will always give you the latest version out of your application directory. So I figured that FormatterAssemblyStyle.Simple would basically switch the logic inside the serialization process to save the types with partial names instead, and then inside the deserialization process to just call Type.GetType with the partial name.
Now, since it apparently doesn’t, I went ahead and created my own SerializationBinder with an implementation of BindToType that basically stripped all the public key and versioning info from the value passed in as assemblyName and then loaded the correct type using Type.GetType, passing only the partial name. Basically this guaranteed that I always load the version that’s in my application dir (obviously this is bad, but I was just experimenting). The problem is, this STILL causes the bug to happen. So maybe there’s a version check even after the Types in the graph are loaded?
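A binder along those lines might look roughly like the sketch below. This is a reconstruction of the idea, not the exact code I ran; the `PartialNameBinder` and `StripToSimpleName` names are made up, and it assumes type names that don’t themselves embed assembly-qualified names:

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical binder that ignores version, culture and public key token.
public sealed class PartialNameBinder : SerializationBinder
{
    // "CustomAssembly, Version=..., Culture=..., PublicKeyToken=..."
    // becomes "CustomAssembly".
    public static string StripToSimpleName(string assemblyName)
    {
        int comma = assemblyName.IndexOf(',');
        return comma < 0
            ? assemblyName.Trim()
            : assemblyName.Substring(0, comma).Trim();
    }

    public override Type BindToType(string assemblyName, string typeName)
    {
        string simpleName = StripToSimpleName(assemblyName);

        // Type.GetType accepts "TypeName, AssemblySimpleName". It returns
        // null when the type cannot be resolved from that partial name.
        return Type.GetType(typeName + ", " + simpleName);
    }
}
```

It gets hooked up with `formatter.Binder = new PartialNameBinder();` before calling `Deserialize`. As far as I can tell, when `BindToType` returns null the formatter falls back to its default type resolution, so this only overrides the lookup for types that can actually be found by partial name.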