GenMusic
Symbolic Music Generation with a Compound-Word Transformer
MusicXML in, transformer in the middle, MusicXML and audio out. A notation-round-trippable system trained on open-source classical music corpora — Bach chorales, Palestrina motets, Beethoven, Mozart, Haydn — on consumer hardware.
What is GenMusic?
GenMusic is a symbolic music generation system I built. It takes real musical scores (MusicXML), encodes them as compound-word token streams over a 30-field vocabulary, trains a decoder-only transformer (~26M parameters), and decodes generated tokens back to MusicXML notation that can be opened in MuseScore, Finale, or any notation software.
The system is designed for round-trip fidelity: every musical feature in the input notation — pitch, duration, articulation, dynamics, time signature, key signature, instrument, voicing, ties, slurs, grace notes — is preserved in the token stream and recoverable on output.
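A compound-word token groups one value per field for each musical event, so the whole event is emitted in a single step rather than as a long flat sequence of one-field tokens. The sketch below is illustrative only: the field names and the subset shown are assumptions, not the actual 30-field GenMusic schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompoundToken:
    """One time step in a compound-word stream (illustrative subset;
    the real vocabulary spans 30 fields)."""
    event_type: str                          # "note", "bar", "tempo", ...
    pitch_hz: Optional[float] = None         # canonical pitch as Hz
    duration_quarters: Optional[float] = None
    scale_degree: Optional[int] = None       # key-relative position
    metric_stress: Optional[int] = None      # 0-4 stress level
    dynamic: Optional[str] = None            # "pp" ... "ff"

# An A4 quarter note on a downbeat, marked mezzo-forte:
note = CompoundToken("note", pitch_hz=440.0, duration_quarters=1.0,
                     scale_degree=5, metric_stress=4, dynamic="mf")
```

Because each field has its own small vocabulary, the model predicts one value per field per step instead of one token from a single giant vocabulary, which keeps sequences short and fields independently recoverable for the MusicXML round trip.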
I trained on 1,968 polyphonic Western art-music pieces (~2.2M tokens) drawn from the music21 corpus plus user-supplied MusicXML, on consumer hardware (Apple M4, 24 GB unified memory) for ~7 hours. 80% of low-temperature samples scored ≥27/30 on a 6-axis musical evaluation rubric.
What Makes It Different
Most music AI systems work with MIDI. GenMusic works with notation.
Hz-as-Canonical Pitch
The internal representation stores pitch as continuous Hz — not MIDI integers. Current training is 12-TET, but the architecture doesn't preclude microtonal or ultrasonic generation.
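Under 12-TET the Hz field is just a deterministic function of the note number, but the same field can carry any frequency. A minimal sketch of the mapping, assuming standard A4 = 440 Hz tuning (helper names are mine, not GenMusic's API):

```python
import math

def midi_to_hz(midi: float, a4_hz: float = 440.0) -> float:
    """Map a (possibly fractional) MIDI note number to Hz.
    Fractional input gives microtones: 69.5 is a quarter-tone above A4."""
    return a4_hz * 2.0 ** ((midi - 69.0) / 12.0)

def hz_to_midi(hz: float, a4_hz: float = 440.0) -> float:
    """Inverse mapping: recover a fractional note number from Hz."""
    return 69.0 + 12.0 * math.log2(hz / a4_hz)
```

Storing the Hz value rather than the integer is what leaves the door open: a microtonal corpus just populates the same field with frequencies that fall between the 12-TET grid points.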
Notation Round-Trip
Every musical feature in the input MusicXML round-trips through the parser and tokenizer without loss. If a feature isn't preserved, the model can't learn it.
Scale Degree Awareness
A derived scale_degree field alongside absolute pitch lets the model attend to key-relative position — "the dominant of the current key" — enabling cross-key generalization.
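The derivation is cheap: given the tonic, a diatonic pitch class maps to a degree 1-7. A sketch for major keys only (how GenMusic encodes chromatic notes and minor modes is not specified here, so the `None` fallback is my assumption):

```python
from typing import Optional

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of degrees 1-7

def scale_degree(pitch_class: int, tonic_pc: int) -> Optional[int]:
    """Diatonic degree (1-7) of a pitch class in a major key,
    or None for a chromatic note."""
    offset = (pitch_class - tonic_pc) % 12
    return MAJOR_STEPS.index(offset) + 1 if offset in MAJOR_STEPS else None
```

With this field present, "the dominant" is the same token value (5) whether the piece is in C major or D major, which is exactly what lets patterns learned in one key transfer to another.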
Metric Stress Over Bar Position
Instead of conventional bar-position grids, GenMusic uses a 5-level metric stress field that captures the musical role of a beat — raw bar position says little about a beat's perceptual weight, and the same position means different things in different meters.
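One hypothetical way to assign five stress levels in a simple meter — this scheme is my illustration, not the actual GenMusic mapping:

```python
def metric_stress(beat: float, beats_per_bar: int = 4) -> int:
    """Assign a 0-4 stress level to a beat position within a bar:
    4 = downbeat, 3 = mid-bar strong beat (in even meters),
    2 = other beats, 1 = eighth-note offbeat, 0 = finer subdivisions."""
    if beat == 0:
        return 4
    if beats_per_bar % 2 == 0 and beat == beats_per_bar / 2:
        return 3
    if beat == int(beat):
        return 2
    if beat * 2 == int(beat * 2):
        return 1
    return 0
```

The point of the field is that a downbeat in 3/4 and a downbeat in 4/4 share the same token value even though their bar positions differ, so the model learns "strong beat" rather than "position 0 of a 4-beat grid".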
Listen
Generated samples from the trained model. These are AI-composed pieces rendered through MuseScore — no human editing.
Samples coming soon — model evaluation in progress.
Training Corpus
1,968
Pieces
2.2M
Tokens
26M
Parameters
~7h
Training Time
Trained on Apple M4, 24 GB unified memory. All training data from open-source corpora (music21, public domain).
Status
GenMusic is actively in development. Models will be published to HuggingFace Hub under an open-source license alongside a research paper. Follow the blog for updates.
Follow on the Blog