Chidley
Convert XML to Go structs / XML to JSON
chidley
chidley converts any XML to Go structs (and therefore to JSON)
- "Any" means any XML that can be read by the Go xml package decoder.
- "Converts" means chidley generates Go code that, when compiled, converts the XML to JSON
- or just generates the Go structs that represent the input XML
- or converts XML to XML (useful for validation)
Author: Glen Newton Language: Go
New
2016.08.14
Added ability to sort structs into the same order the XML is encountered in the file. Useful for human readers comparing the Go structs to the original XML.
Use the -X flag to invoke. This overrides the default alphabetical sorting.
2015.07.24
chidley now supports the user naming of the resulting JAXB Java class package.
Previously the package name could only be ca/gnewton/chidley/jaxb.
Now, with -P name, the jaxb default can be changed.
So chidley -J -P "foobar" sample.xml will result in Java classes with package name: ca/gnewton/chidley/foobar.
Previous
chidley now has support for Java/JAXB. It generates appropriate Java/JAXB classes and associated maven pom.
See Java/JAXB section below for usage.
How does it work (with such a small memory footprint)
chidley uses the input XML to build a model of each XML element (aka tag).
It examines each instance of a tag, and builds a (single) prototypical representation; that is, the union of all the attributes and all of the child elements of all instances of the tag.
So even if there are a million instances of a specific tag, there is only one model tag representation.
Note that a tag is unique by its namespace+tagname combination (in the Go xml package parlance, space + local).
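The idea of one prototype per tag can be sketched with the standard encoding/xml decoder. This is a minimal illustration, not chidley's actual code: the names tagModel, buildModels and describe are hypothetical, and the sketch records only attribute and child names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"sort"
	"strings"
)

// tagModel is a hypothetical prototype for one element: the union of all
// attributes and child elements seen across every instance of that element.
type tagModel struct {
	Attrs    map[string]bool
	Children map[xml.Name]bool
}

// buildModels streams the XML once and keeps a single model per
// (Space, Local) name, however many instances of the element occur.
func buildModels(r io.Reader) (map[xml.Name]*tagModel, error) {
	models := make(map[xml.Name]*tagModel)
	var stack []xml.Name // currently open elements, innermost last
	dec := xml.NewDecoder(r)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return models, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			m, ok := models[t.Name]
			if !ok {
				m = &tagModel{Attrs: map[string]bool{}, Children: map[xml.Name]bool{}}
				models[t.Name] = m
			}
			for _, a := range t.Attr {
				m.Attrs[a.Name.Local] = true
			}
			if len(stack) > 0 {
				// Record this element as a child of its parent's model.
				models[stack[len(stack)-1]].Children[t.Name] = true
			}
			stack = append(stack, t.Name)
		case xml.EndElement:
			stack = stack[:len(stack)-1]
		}
	}
}

// describe returns the sorted attribute and child names recorded for the
// element with the given local name (no namespace).
func describe(doc, local string) (attrs, children []string, err error) {
	models, err := buildModels(strings.NewReader(doc))
	if err != nil {
		return nil, nil, err
	}
	m, ok := models[xml.Name{Local: local}]
	if !ok {
		return nil, nil, fmt.Errorf("no element %q", local)
	}
	for a := range m.Attrs {
		attrs = append(attrs, a)
	}
	for c := range m.Children {
		children = append(children, c.Local)
	}
	sort.Strings(attrs)
	sort.Strings(children)
	return attrs, children, nil
}

func main() {
	const doc = `<docs><doc type="book"><title>Dune</title></doc>` +
		`<doc id="2"><author>Huxley</author></doc></docs>`
	attrs, children, err := describe(doc, "doc")
	if err != nil {
		panic(err)
	}
	fmt.Println(attrs, children) // [id type] [author title]
}
```

Memory use in this sketch grows with the number of distinct tags and the nesting depth, not with the number of element instances.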
Types
chidley by default makes all values (attributes, tag content) in the generated Go structs a string (which is always valid for XML attributes and content), but it has a flag (-t) where it will detect and use the most appropriate type.
chidley tries to fit the smallest Go type.
For example, if all instances of a tag contain a number, and all of those values fall between -128 and 127, then it will use an int8 in the Go struct.
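A rough sketch of how such a "smallest fitting type" decision could be made is shown here. It is an illustrative heuristic under stated assumptions, not chidley's actual algorithm: smallestType is a hypothetical helper, it considers only bool, signed integers, float64 and string, and it recognises only the literals true and false as booleans.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// smallestType returns the name of a Go type able to hold every sample
// value. Hypothetical helper: falls back to "string" when nothing narrower fits.
func smallestType(values []string) string {
	allBool, allInt, allFloat := true, true, true
	lo, hi := int64(math.MaxInt64), int64(math.MinInt64)
	for _, v := range values {
		if v != "true" && v != "false" {
			allBool = false
		}
		if n, err := strconv.ParseInt(v, 10, 64); err == nil {
			// Track the observed integer range.
			if n < lo {
				lo = n
			}
			if n > hi {
				hi = n
			}
		} else {
			allInt = false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			allFloat = false
		}
	}
	switch {
	case len(values) == 0:
		return "string"
	case allBool:
		return "bool"
	case allInt && lo >= math.MinInt8 && hi <= math.MaxInt8:
		return "int8"
	case allInt && lo >= math.MinInt16 && hi <= math.MaxInt16:
		return "int16"
	case allInt && lo >= math.MinInt32 && hi <= math.MaxInt32:
		return "int32"
	case allInt:
		return "int64"
	case allFloat:
		return "float64"
	default:
		return "string"
	}
}

func main() {
	fmt.Println(smallestType([]string{"37", "24"}))      // int8
	fmt.Println(smallestType([]string{"127", "128"}))    // int16
	fmt.Println(smallestType([]string{"true", "false"})) // bool
	fmt.Println(smallestType([]string{"bill", "sarah"})) // string
}
```

Because the decision depends on every sample seen, a single out-of-range or non-numeric value widens the type for all instances of that tag.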
chidley binary
Compiled for 64bit Linux Fedora18, go version go1.3 linux/amd64
Usage
Usage of ./chidley:
-B Add database metadata to created Go structs
-D string
Base directory for generated Java code (root of maven project) (default "java")
-G Only write generated Go structs to stdout
-J Generated Java code for Java/JAXB
-P string
Java package name (rightmost in full package name
-W Generate Go code to convert XML to JSON or XML (latter useful for validation) and write it to stdout
-X Sort output of structs in Go code by order encounered in source XML (default is alphabetical order)
-a string
Prefix to attribute names (default "Attr_")
-c Read XML from standard input
-d Debug; prints out much information
-e string
Prefix to struct (element) names; must start with a capital (default "Chi")
-k string
App name for Java code (appended to ca.gnewton.chidley Java package name)) (default "jaxb")
-n Use the XML namespace prefix as prefix to JSON name; prefix followed by 2 underscores (__)
-p Pretty-print json in generated code (if applicable)
-r Progress: every 50000 input tags (elements)
-t Use type info obtained from XML (int, bool, etc); default is to assume everything is a string; better chance at working if XMl sample is not complete
-u Filename interpreted as an URL
-x Add XMLName (Space, Local) for each XML element, to JSON
$
Specific Usages:
chidley -W ...: writes Go code to standard out, so this output should be directed to a filename and subsequently be compiled. When compiled, the resulting binary will:
- convert the XML file to JSON
- or convert the XML file to XML (useful for validation)
- or count the # of elements (space, local) in the XML file
chidley -G ...: writes just the Go structs that represent the input XML. For incorporation into the user's code base.
Example chidley -W:
$ chidley -W xml/test1.xml > examples/test1/ChidTest1.go
Usage of generated code
$ cd examples/test1
$ go build
$ ./test1
Usage of ./test1:
-c=false: Count each instance of XML tags
-f="/home/gnewton/work/chidley/xml/test1.xml": XML file or URL to read in
-h=false: Usage
-j=false: Convert to JSON
-s=false: Stream XML by using XML elements one down from the root tag. Good for huge XML files (see http://blog.davidsingleton.org/parsing-huge-xml-files-with-go/
-x=false: Convert to XML
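The -s option above streams the file by decoding the elements one level below the root one at a time. A minimal sketch of that technique with the standard encoding/xml package follows. The doc struct, streamDocs and titles are hypothetical names chosen for illustration, and this is not the code chidley generates.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// doc is a hypothetical struct for the elements directly under the root.
type doc struct {
	Type  string `xml:"type,attr"`
	Title string `xml:"title"`
}

// streamDocs decodes one child of the root at a time and hands it to fn,
// so only a single <doc> is held in memory at once.
func streamDocs(r io.Reader, fn func(doc)) error {
	dec := xml.NewDecoder(r)
	depth := 0 // number of currently open elements
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth == 1 && t.Name.Local == "doc" {
				var d doc
				// DecodeElement consumes input through the matching end
				// tag, so depth is unchanged afterwards.
				if err := dec.DecodeElement(&d, &t); err != nil {
					return err
				}
				fn(d)
				continue
			}
			depth++
		case xml.EndElement:
			depth--
		}
	}
}

// titles collects the title of every streamed <doc>.
func titles(input string) ([]string, error) {
	var out []string
	err := streamDocs(strings.NewReader(input), func(d doc) {
		out = append(out, d.Title)
	})
	return out, err
}

func main() {
	const input = `<docs>
  <doc type="book"><title>Dune</title></doc>
  <doc type="article"><title>Brave New World</title></doc>
</docs>`
	got, err := titles(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [Dune Brave New World]
}
```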
Generated code: Convert XML to JSON -j
$ ./test1 -j -f ../../xml/test1.xml
{
"doc": [
{
"Attr_type": "book",
"author": {
"firstName": {
"Text": "Frank"
},
"last-name": {
"Text": "Herbert"
}
},
"title": {
"Text": "Dune"
}
},
{
"Attr_type": "article",
"author": {
"firstName": {
"Text": "Aldous"
},
"last-name": {
"Text": "Huxley"
}
},
"title": {
"Text": "Brave New Wold"
}
}
]
}
Generated code: Convert XML to XML -x
$ ./test1 -x -f ../../xml/test1.xml
<Chi_docs>
<doc type="book">
<author>
<firstName>Frank</firstName>
<last-name>Herbert</last-name>
</author>
<title>Dune</title>
</doc>
<doc type="article">
<author>
<firstName>Aldous</firstName>
<last-name>Huxley</last-name>
</author>
<title>Brave New Wold</title>
</doc>
</Chi_docs>
Generated code: Count elements -c
XML elements (or tags) in the source file are counted by (space, local) name and printed out, unsorted
$ ./test1 -c
1 _:docs
2 _:doc
2 _:title
2 _:author
2 _:last-name
2 _:firstName
Note: the underscore before the colon indicates there is no (or the default) namespace for the element.
Example chidley -G:
Just prints out the Go structs to standard out:
$ chidley -G xml/test1.xml
type Chi_root struct {
Chi_docs *Chi_docs `xml:" docs,omitempty" json:"docs,omitempty"`
}
type Chi_docs struct {
Chi_doc []*Chi_doc `xml:" doc,omitempty" json:"doc,omitempty"`
}
type Chi_doc struct {
Attr_type string `xml:" type,attr" json:",omitempty"`
Chi_author *Chi_author `xml:" author,omitempty" json:"author,omitempty"`
Chi_title *Chi_title `xml:" title,omitempty" json:"title,omitempty"`
}
type Chi_title struct {
Text string `xml:",chardata" json:",omitempty"`
}
type Chi_author struct {
Chi_firstName *Chi_firstName `xml:" firstName,omitempty" json:"firstName,omitempty"`
Chi_last_name *Chi_last_name `xml:" last-name,omitempty" json:"last-name,omitempty"`
}
type Chi_last_name struct {
Text string `xml:",chardata" json:",omitempty"`
}
type Chi_firstName struct {
Text string `xml:",chardata" json:",omitempty"`
}
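As a sketch of how structs like these might be incorporated into user code, the example below copies a trimmed subset of the structs above (the author structs are omitted for brevity) and uses the standard xml.Unmarshal and json.Marshal calls. The toJSON helper is hypothetical, and the example decodes directly into Chi_docs because that struct corresponds to the document's root element.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Trimmed copies of the generated structs shown above.
type Chi_docs struct {
	Chi_doc []*Chi_doc `xml:" doc,omitempty" json:"doc,omitempty"`
}

type Chi_doc struct {
	Attr_type string     `xml:" type,attr" json:",omitempty"`
	Chi_title *Chi_title `xml:" title,omitempty" json:"title,omitempty"`
}

type Chi_title struct {
	Text string `xml:",chardata" json:",omitempty"`
}

// toJSON is a hypothetical helper: it decodes the XML into the structs
// and re-encodes the result as JSON.
func toJSON(xmlData string) (string, error) {
	var docs Chi_docs
	if err := xml.Unmarshal([]byte(xmlData), &docs); err != nil {
		return "", err
	}
	out, err := json.Marshal(docs)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := toJSON(`<docs><doc type="book"><title>Dune</title></doc></docs>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"doc":[{"Attr_type":"book","title":{"Text":"Dune"}}]}
}
```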
Name changes: XML vs. Go structs
XML names can contain dots (.) and hyphens or dashes (-). These are not valid in Go identifiers for structs or variables, so they are mapped as:
"-": "_"
".": "_dot_"
Note that the original XML name strings are used in the struct XML and JSON annotations for the element.
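The mapping can be sketched with strings.NewReplacer. The goStructName helper is hypothetical, and joining the prefix to the name with an underscore is an assumption inferred from the sample output above (for example Chi_last_name), not taken from chidley's source.

```go
package main

import (
	"fmt"
	"strings"
)

// nameReplacer applies the two substitutions listed above.
var nameReplacer = strings.NewReplacer("-", "_", ".", "_dot_")

// goStructName builds a Go identifier from a struct prefix and an XML
// local name. The prefix + "_" layout is an assumption based on the
// sample output, where last-name becomes Chi_last_name.
func goStructName(prefix, local string) string {
	return prefix + "_" + nameReplacer.Replace(local)
}

func main() {
	fmt.Println(goStructName("Chi", "last-name")) // Chi_last_name
	fmt.Println(goStructName("Chi", "dc.title"))  // Chi_dc_dot_title
	fmt.Println(goStructName("Chi", "firstName")) // Chi_firstName
}
```

Note that two different XML names can collide after mapping (for example a-b and a_b); this sketch makes no attempt to detect that.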
Type example
<people>
<person>
<name>bill</name>
<age>37</age>
<married>true</married>
</person>
<person>
<name>sarah</name>
<age>24</age>
<married>false</married>
</person>
</people>
Default
$ ./chidley -G xml/testType.xml
type Chi_root struct {
Chi_people *Chi_people `xml:" people,omitempty" json:"people,omitempty"`
}
type Chi_people struct {
Chi_person []*Chi_person `xml:" person,omitempty" json:"person,omitempty"`
}
type Chi_person struct {
Chi_age *Chi_age `xml:" age,omitempty" json:"age,omitempty"`
Chi_married *Chi_married `xml:" married,omitempty" json:"married,omitempty"`
Chi_name *Chi_name `xml:" name,omitempty" json:"name,omitempty"`
}
type Chi_name struct {
Text string `xml:",chardata" json:",omitempty"`
}
type Chi_age struct {
Text string `xml:",chardata" json:",omitempty"`
}
type Chi_married struct {
Text string `xml:",chardata" json:",omitempty"`
}
Types turned on -t
$ ./chidley -G -t xml/testType.xml
type Chi_root struct {
Chi_people *Chi_people `xml:" people,omitempty" json:"people,omitempty"`
}
type Chi_people struct {
Chi_person []*Chi_person `xml:" person,omitempty" json:"person,omitempty"`
}
type Chi_person struct {
Chi_age *Chi_age `xml:" age,omitempty" json:"age,omitempty"`
Chi_married *Chi_married `xml:" married,omitempty" json:"married,omitempty"`
Chi_name *Chi_name `xml:" name,omitempty" json:"name,omitempty"`
}
type Chi_name struct {
Text string `xml:",chardata" json:",omitempty"`
}
type Chi_age struct {
Text int8 `xml:",chardata" json:",omitempty"`
}
type Chi_married struct {
Text bool `xml:",chardata" json:",omitempty"`
}
Notice:
- Text int8 in Chi_age
- Text bool in Chi_married
Go struct name prefix
chidley by default prepends a prefix to Go struct type identifiers. The default is Chi but this can be changed with the -e flag. If changed from the default, the new prefix must start with a capital letter (for the XML annotation and decoder to work: the struct fields must be public).
Warning
If you are going to use the chidley generated Go structs on XML other than the input XML, make sure the input XML contains examples of every tag, every attribute of each tag, and every child tag of each tag.
If the input lacks any of these, and the new XML contains tags, attributes, or child tags not found in the input XML, the xml decoder will not see them, because they have no counterpart in the Go structs the decoder uses.
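The effect can be seen in a small sketch. It assumes structs generated from a sample in which person only ever had a name child; the roundTrip helper is hypothetical. The standard xml.Unmarshal reports no error for the unknown attribute and element, it simply drops them.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Structs as they might be generated from a sample in which <person>
// only ever contained <name>.
type Chi_person struct {
	Chi_name *Chi_name `xml:" name,omitempty" json:"name,omitempty"`
}

type Chi_name struct {
	Text string `xml:",chardata" json:",omitempty"`
}

// roundTrip is a hypothetical helper: decode XML into the structs, then
// encode as JSON to show what survived.
func roundTrip(xmlData string) (string, error) {
	var p Chi_person
	if err := xml.Unmarshal([]byte(xmlData), &p); err != nil {
		return "", err
	}
	out, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// The nick attribute and <email> element were not in the sample XML.
	out, err := roundTrip(`<person nick="b"><name>bill</name><email>bill@example.org</email></person>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"name":{"Text":"bill"}}
}
```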
Limitations
chidley is constrained by the underlying Go [xml package](http://golang.org/pkg
