Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unions default to string serialization #397

Closed
jzabroski opened this issue Jul 11, 2023 · 27 comments
Closed

Unions default to string serialization #397

jzabroski opened this issue Jul 11, 2023 · 27 comments

Comments

@jzabroski
Copy link

The following XSD generates a string representation for Date/DateTime:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0.48448">
  <xs:simpleType name="SnapshotDate">
    <xs:restriction base="xs:date" />
  </xs:simpleType>
  <xs:simpleType name="SnapshotDateTime">
    <xs:restriction base="xs:dateTime" />
  </xs:simpleType>

  <xs:element name="Snapshot">
    <xs:complexType>
      <xs:attribute name="Date" use="required">
        <xs:simpleType>
          <xs:union memberTypes="SnapshotDate SnapshotDateTime" />
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>

It would be ideal if this used either the OneOf nuget package to handle the union directly, or did clever inference to infer date and dateTime are both the same .NET data type (except if on .NET 6 and DateOnly is supported). For backwards compatibility reasons, I'd rather the generator not use DateOnly, as I still rely on your tool for .NET Framework apps until for at least another 6-12 months.

@jzabroski
Copy link
Author

My immediate workaround is to use a partial class:

using System;

public partial class Snapshot
{
        public DateTime DateObject
        {
            get => DateTime.Parse(Date);
            set => Date = $"{value:yyyy-MM-dd}";
        }
}

and have my client code use DateObject when setting the Date. This works as I don't actually need to send time components.

@jzabroski
Copy link
Author

Deleted previous comment; found my issue.

It looks like to support OneOf, we would need to extend CodeUtilities.GetEffectiveType. I dont know enough about the XmlSchemaDatatype API to understand how to do this.

@jzabroski
Copy link
Author

jzabroski commented Jul 11, 2023

Possible that XmlSampleGenerator.Generator.CreateUnionGenerator has some logic that may be relevant as well. Looking at this logic also makes sense why I couldn't find any public APIs in System.Xml that explicitly allow destructuring the Union to get its member types: the code here is using reflection to get at that information.

@jzabroski
Copy link
Author

I made some decent progress trying to figure this out. I was able to build a mapping (I think) for this union example, but I am getting an error when running XmlTests.cs - not sure why yet, although I think the error message is fairly self-explanatory, I dont know what to "do" about it just yet:

Cannot serialize member 'Date' of type OneOf.OneOf`2[System.DateOnly,System.DateTime]. XmlAttribute/XmlText cannot be used to encode complex types.

I then tried to muck with the generated output, and that is when I realized a more basic flaw with my approach: OneOf derived types do not have a parameterless constructor, so they trivially cannot be serialized as Xml via System.Xml.Serialization package.

It seems like your approach for Choice Attributes might be the best approach. While it might be possible to use OneOf<DateOnly, DateTime> via XmlIgnoreAttribute, I'm not sure its all that powerful.

Need to think more on this.

@jzabroski
Copy link
Author

Some additional thoughts:

For net462, OneOf is supported, but DateOnly is not supported. Therefore, there needs to be a way to handle the special scenario where the union is xs:date and xs:datetime types as a single DateTime object.

Other special cases would be various numeric data types, particularly the restrictions on xs:integer

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0.48448">
  <xs:simpleType name="GoodData">
    <xs:restriction base="xs:positiveInteger" />
  </xs:simpleType>
  <xs:simpleType name="BadData">
    <xs:restriction base="xs:nonPositiveInteger" />
  </xs:simpleType>

  <xs:element name="Snapshot">
    <xs:complexType>
      <xs:attribute name="Data" use="required">
        <xs:simpleType>
          <xs:union memberTypes="GoodData BadData" />
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>

@mganss
Copy link
Owner

mganss commented Jul 12, 2023

It seems XmlSchemaSimpleTypeUnion.BaseMemberTypes might be a way to get at the member types of a union. Perhaps XmlSampleGenerator predates even net20 when this was introduced, at least according to the docs.

@mganss
Copy link
Owner

mganss commented Jul 12, 2023

Perhaps a fairly simple approach would be to forgo OneOf for now and hack some logic into GetEffectiveType that covers a few basic use cases (Date and time related types, integers, floats):

XmlTypeCode.AnyAtomicType => typeof(string),// union

@jzabroski
Copy link
Author

I found LinqToXsd on GitHub and they have some pretty interesting support for unions, including IsOrHasUnion - the scenario of List with unions.

https://github.com/mamift/LinqToXsdCore/blob/3f4c14fd64a66fa5f6a612d56be9917b066c4bdf/XObjectsCode/Src/SOMQueryExtensions.cs#L121C39-L134C20

Looking at what you pulled off gives me such deep appreciation for this great project you put together. Also makes me realize our initiative to implement this third party vendor that uses XSDs, 4 years ago, would not have been possible without your project. How did you even get started with this project?

@jzabroski
Copy link
Author

Perhaps a fairly simple approach would be to forgo OneOf for now and hack some logic into GetEffectiveType that covers a few basic use cases (Date and time related types, integers, floats):

XmlTypeCode.AnyAtomicType => typeof(string),// union

That's what I already tried, and I got an error about the property being a complex type and cant be annotated with XmlAttributeAttribute/XmlTextAttribute.

I think the part I dont understand with how you have architected this is how to use GetEffectiveType as both a backing store type and the serializable type, and where the C# code generation magic takes place.

@jzabroski
Copy link
Author

jzabroski commented Jul 12, 2023

Here was my implementation of your GetEffectiveType method. It requires a boolean flag to be added to GeneratorConfiguration for UseOneOfForUnionDataType in order to allow people from disabling such feature. I defaulted it to true, mostly to make it easy to test.

        public static Type GetEffectiveType(this XmlSchemaDatatype type, GeneratorConfiguration configuration, IEnumerable<RestrictionModel> restrictions, bool attribute = false)
        {

            var resultType = type.TypeCode switch
            {
                XmlTypeCode.AnyAtomicType =>  typeof(string),// union
                XmlTypeCode.AnyUri or XmlTypeCode.GDay or XmlTypeCode.GMonth or XmlTypeCode.GMonthDay or XmlTypeCode.GYear or XmlTypeCode.GYearMonth => typeof(string),
                XmlTypeCode.Duration => configuration.NetCoreSpecificCode ? type.ValueType : typeof(string),
                XmlTypeCode.Time => typeof(DateTime),
                XmlTypeCode.Idref => typeof(string),
                XmlTypeCode.Integer or XmlTypeCode.NegativeInteger or XmlTypeCode.NonNegativeInteger or XmlTypeCode.NonPositiveInteger or XmlTypeCode.PositiveInteger => GetIntegerDerivedType(type, configuration, restrictions),
                _ => type.ValueType,
            };

            if (type.Variety == XmlSchemaDatatypeVariety.List)
            {
                if (resultType.IsArray)
                    resultType = resultType.GetElementType();

                // XmlSerializer doesn't support xsd:list for elements, only for attributes:
                // https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/t84dzyst(v%3dvs.100)

                // Also, de/serialization fails when the XML schema type is ambiguous
                // DateTime -> date, datetime, or time
                // byte[] -> hexBinary or base64Binary

                if (!attribute || resultType == typeof(DateTime) || resultType == typeof(byte[]))
                    resultType = typeof(string);
            }

            if (type.Variety == XmlSchemaDatatypeVariety.Union && configuration.UseOneOfForUnionDataType)
            {
                XmlSchemaSimpleType[] memberTypes = (XmlSchemaSimpleType[])type.GetType().InvokeMember("_types", BindingFlags.GetField | BindingFlags.NonPublic | BindingFlags.Instance, null, type, null);

                Type genericType = memberTypes.Length switch
                {
                    0 => throw new ArgumentOutOfRangeException(nameof(type.ValueType.Name)),
                    1 => typeof(OneOf<>),
                    2 => typeof(OneOf<,>),
                    3 => typeof(OneOf<,,>),
                    4 => typeof(OneOf<,,,>),
                    5 => typeof(OneOf<,,,,>),
                    6 => typeof(OneOf<,,,,,>),
                    7 => typeof(OneOf<,,,,,,>),
                    8 => typeof(OneOf<,,,,,,,>),
                    9 => typeof(OneOf<,,,,,,,,>),
                    _ => throw new ArgumentOutOfRangeException(nameof(type.ValueType.Name), "Union cannot support more than 9 type parameters.")
                };

                resultType =
                    genericType.MakeGenericType(memberTypes.Select(mt => mt.Datatype.ValueType).ToArray());
            }

            return resultType;
        }

Note, this version does not Distinct the list of memberTypes to remove duplicates, as could be the case with xs:DateTime and xs:Date

@jzabroski
Copy link
Author

Actually, I think I see your point. In other words, literally just handle the DateOnly/DateTime scenario. I am not sure how useful that would be. My thought was to make it generic. I guess the upside to initial approach to support union serialization with CLR types as backing fields would be that others could extend it and then it could be later generalized once enough examples were provided.

@mganss
Copy link
Owner

mganss commented Jul 13, 2023

Thanks for the kind words, @jzabroski. Appreciate your contributions ❤️

How did you even get started with this project?

I developed a REST client for an API that used XML and I wasn't satisfied with the classes generated by xsd.exe. Naively, I thought it can't be that hard to create a custom generator, after all XML is just elements and attributes, right? Wrong.

@mganss mganss closed this as completed in 45e06bc Jul 17, 2023
@jzabroski
Copy link
Author

When can you ship a new version of https://www.nuget.org/packages/dotnet-xscgen

@mganss
Copy link
Owner

mganss commented Jul 17, 2023

I've just deployed 2.0.858. This has basic support for unions with date/time, integer, and decimal member types. If you encounter cases that are not covered, please reopen or open a new issue.

One case that is only partially supported now that I discovered in the unit tests is where a member type has an enumeration restriction that would enable use of a narrower type for the union:

<xs:simpleType name="revisionLimitType">
    <xs:union>
        <xs:simpleType>
            <xs:restriction base="xs:integer">
                <xs:enumeration value="-1"/>
            </xs:restriction>
        </xs:simpleType>
        <xs:simpleType>
            <xs:restriction base="xs:integer">
                <xs:minInclusive value="2"/>
                <xs:maxInclusive value="10000"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:union>
</xs:simpleType>

This would allow use of a short but currently the enumeration is not taken into account here.

@jzabroski
Copy link
Author

Neat. Do you think my OneOf prototype is still somewhat of interest? I actually think it would be easier to implement with the new API changes you made.

@jzabroski
Copy link
Author

It works! Man, that is so much cleaner code vs. having that partial class attribute called DateObject that thunks data to/from the Date property string. One potential problem - that may or not be a problem in my case. I've asked the vendor. But, for DateTime/Date, you are automatically choosing the widest type, and there is no way to control the exact serialization as Date-only or DateTime. e.g., previously we were sending the vendor 2023-03-15, and now it is sending that same value as 2023-03-15T00:00:00. If the presence of a time component carries some semantic difference, then this would be a breaking change when working with this vendors XSD. My fingers are crossed that in my scenario it doesn't matter, but I could see it mattering for some implementations.

Again: This tool is so awesome vs. the old xsd.exe :-)

I remember running into frightening bugs with the old xsd.exe on a WCF project, too... where exceptions just got swallowed. And no way to handle Java 1.5 BigDecimal serialization stupidity.

@mganss
Copy link
Owner

mganss commented Jul 18, 2023

Perhaps OneOf can be supported by adding IXmlSerializable hackery?

@jzabroski
Copy link
Author

That might work. I still have my initial OneOf Proof of concept code from last week that I never got fully working. I completely forgot about XmlSerializable because https://learn.microsoft.com/en-us/dotnet/standard/serialization/attributes-that-control-xml-serialization does not mention it at all (because it's not an attribute). I can probably extend the proof of concept with a boilerplate implementation of Ixmlserializable. Have other things to get done this week but will try to take a look.

In mean time, it is still possible the current auto widening works. It's probably safe for numbers, but DateTime and Date Only are very different semantic domains, so we shall see what the vendor says. Of course, my concern is this results in a bug due to change in behavior and I look silly for adopting a "technology solution" VS. Business requirements.

@mganss
Copy link
Owner

mganss commented Jul 18, 2023

You're right, this is a breaking change and should be opt-in through configuration.

@jzabroski
Copy link
Author

Setting should probably be called something like MapXmlSchemaUnionToWidestCommonType - at least then it's fairly explicit.

I think, the other day, I was so excited to get rid of dealing with strings that I wasn't even thinking about potential semantic issues. But I do think this is a really useful feature. I just also vaguely recall woes with DateTime representation working on a WCF project in 2011 when I had a c# WCF service talking to an iBatis Java framework. I believe to manage xs:DateTime correctly we had to set the DateTimeKind to Unspecified prior to calling Serialize so that it would not try to format it relative to a time zone, and on the way in, after de-serialization, we had to fix-up the values in C# prior to giving them to our services.

mganss pushed a commit that referenced this issue Jul 18, 2023
@mganss
Copy link
Owner

mganss commented Jul 18, 2023

I've added the --unionCommonType option, called MapUnionToWidestCommonType in code.

@jzabroski
Copy link
Author

I experimented with ReadXml and WriteXml. I think it's probably do-able with this.Match. In terms of fitting it into here, I may need some pointers on how to add symbols to your symbol table. I'm guessing how you did Choice via sub class polymorphism a good approximation. Where is that code

@mganss
Copy link
Owner

mganss commented Jul 19, 2023

I'm not sure I understand what you're looking for. The new code that determines the common types is here:

static readonly Type[] intTypes = new[] { typeof(byte), typeof(sbyte), typeof(ushort), typeof(short), typeof(uint), typeof(int), typeof(ulong), typeof(long), typeof(decimal) };
static readonly Type[] decimalTypes = new[] { typeof(float), typeof(double), typeof(decimal) };
private static Type GetUnionType(GeneratorConfiguration configuration, XmlSchemaType schemaType, bool attribute)
{
if (schemaType is not XmlSchemaSimpleType simpleType || simpleType.Content is not XmlSchemaSimpleTypeUnion unionType) return typeof(string);
var baseMemberEffectiveTypes = unionType.BaseMemberTypes.Select(t =>
{
var restriction = t.Content as XmlSchemaSimpleTypeRestriction;
var facets = restriction?.Facets.OfType<XmlSchemaFacet>().ToList();
var restrictions = GetRestrictions(facets, t, configuration).Where(r => r != null).Sanitize().ToList();
return GetEffectiveType(t.Datatype, configuration, restrictions, t, attribute);
}).ToList();
// all member types are the same
if (baseMemberEffectiveTypes.Distinct().Count() == 1) return baseMemberEffectiveTypes[0];
// all member types are integer types
if (baseMemberEffectiveTypes.TrueForAll(t => intTypes.Contains(t)))
{
var maxTypeIndex = baseMemberEffectiveTypes.Max(t => Array.IndexOf(intTypes, t));
var maxType = intTypes[maxTypeIndex];
// if the max type is signed and the corresponding unsigned type is also in the set we have to use the next higher type
if (maxTypeIndex % 2 == 1 && baseMemberEffectiveTypes.Exists(t => Array.IndexOf(intTypes, t) == maxTypeIndex - 1))
return intTypes[maxTypeIndex + 1];
return maxType;
}
// all member types are float/double/decimal
if (baseMemberEffectiveTypes.TrueForAll(t => decimalTypes.Contains(t)))
{
var maxTypeIndex = baseMemberEffectiveTypes.Max(t => Array.IndexOf(decimalTypes, t));
var maxType = decimalTypes[maxTypeIndex];
return maxType;
}
return typeof(string);
}

@jzabroski
Copy link
Author

Here are the problems with IXmlSerializable:

  1. If a class implements IXmlSerializable, using that class as the return type of a property annotated with XmlAttribute is not allowed. XmlSerializer will not call ReadXml/WriteXml (unless there is some step I missed in my demo code, linked below - the concensus online is that IXmlSerializable is tricky to implement, so it is possible I missed some step, but I took inspiration from https://www.reflectionit.nl/blog/2022/implement-ixmlserializable-in-a-readonly-struct).
  2. That likely means either the whole hierarchy or the type containing the property needs to implement IXmlSerializable.
  3. There is https://learn.microsoft.com/en-us/dotnet/api/system.xml.serialization.xmlschemaproviderattribute?view=net-7.0 but its not clear it could solve this problem if added to the mix. It sounds like it was purpose built for one use case, WSDL, so the odds it works is pretty low. Would likely need to search the open source System.Xml code to see how this attribute is used to know if it's possible.

https://dotnetfiddle.net/27Y7I6

@mganss
Copy link
Owner

mganss commented Jul 20, 2023

Perhaps the easiest approach then is to create a OneOf adapter property hidden behind [XmlIgnore] or possibly even one adapter property for each separate schema type as you had suggested above.

Did the vendor get back about the semantic difference between date and datetime?

@jzabroski
Copy link
Author

They did get back, but while they assured me there was no problem, my "senior developer gut" says not to rely on that long-term. Effectively, DateTime is intended for pre-event intelligence based on a Snapshot of the world as we know it at that point in time, whereas Date is for post-event analysis based on a Snapshot of the world on that date. They said there is no difference, but thinking like an engineer, I just worry that behavior could change. Maybe it's not a big deal, but I'd rather not find out.

Can you sketch our what you mean by adapter property? Is it something like this?

[XmlRoot("Snapshot")]
public class Snapshot
{
  [XmlIgnore]
  public DateOnlyOrDateTime Date {get; set;}

  [XmlAttribute("Date")]
  public string _date {get; set;}
}

That would be an improvement over what I have today, in that it would allow me to seamlessly handle the pre-event intelligence feature if ever asked to do it (unlikely, but still useful).

My current solution is basically a variation of the above (not bad, but just not elegant):

    public partial class Snapshot
    {
        [XmlIgnore]
        public DateTime DateObject
        {
            get
            {
                if (DateTime.TryParseExact(Date, "yyyy-MM-dd", new DateTimeFormatInfo(), DateTimeStyles.None,
                        out var dateOnly)) return dateOnly;
                if (DateTime.TryParseExact(Date, "yyyy-MM-ddThh:mm:ss", new DateTimeFormatInfo(), DateTimeStyles.None,
                        out var dateTime)) return dateTime;
                return default;
            }
            set
            {
                if (value.TimeOfDay == TimeSpan.Zero)
                    Date = $"{value:yyyy-MM-dd}";
                else
                    Date = $"{value:yyyy-MM-ddThh:mm:ss}";
            }
        }
    }

Of course, the Union OneOf + adding a feature for #310 would be ideal.

@mganss
Copy link
Owner

mganss commented Jul 20, 2023

Yes, that's exactly what I meant by adapter properties.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants