AI-Generated Music

GenMusic

Symbolic Music Generation with a Compound-Word Transformer

MusicXML in, transformer in the middle, MusicXML and audio out. A notation-round-trippable system trained on open-source classical music corpora — Bach chorales, Palestrina motets, Beethoven, Mozart, Haydn — on consumer hardware.

What is GenMusic?

GenMusic is a symbolic music generation system I built. It takes real musical scores (MusicXML), encodes them as compound-word token streams over a 30-field vocabulary, trains a decoder-only transformer (~26M parameters), and decodes generated tokens back to MusicXML notation that can be opened in MuseScore, Finale, or any notation software.
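
To make "compound-word token" concrete, here is a minimal sketch of what a single token could look like. The field names and layout are illustrative assumptions, not the actual GenMusic vocabulary, which spans 30 fields.

    from dataclasses import dataclass

    @dataclass
    class CompoundToken:
        # Illustrative fields only; the real vocabulary has 30.
        event_type: str     # "note", "rest", "bar", "key_change", ...
        pitch_hz: float     # canonical pitch as continuous Hz (0.0 for rests)
        scale_degree: int   # key-relative position derived from pitch + key
        duration: str       # notated value: "quarter", "eighth", ...
        metric_stress: int  # 5-level stress of the beat the event falls on
        dynamic: str        # "mf", "p", ...

    # One "word" per musical event: the model predicts all fields of the
    # next event jointly rather than a long flat stream of atomic tokens.
    token = CompoundToken("note", 440.0, 5, "quarter", 4, "mf")

The compound-word framing keeps sequences short: a note occupies one position in the sequence rather than five or six consecutive atomic tokens.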

The system is designed for round-trip fidelity: every musical feature in the input notation — pitch, duration, articulation, dynamics, time signature, key signature, instrument, voicing, ties, slurs, grace notes — is preserved in the token stream and recoverable on output.
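
As a rough illustration of what gets preserved, the sketch below reads several of those attributes from a MusicXML file with music21 (the toolkit behind the training corpus). It is not the GenMusic parser, and "example.musicxml" is a placeholder path.

    from music21 import converter

    score = converter.parse("example.musicxml")  # placeholder input file
    for note in score.recurse().getElementsByClass("Note"):
        print(
            note.pitch.frequency,  # pitch as continuous Hz
            note.duration.type,    # notated duration, e.g. "quarter"
            note.articulations,    # staccato, accent, ...
            note.tie,              # tie state, or None
            note.beatStrength,     # music21's metric accent for this beat
        )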

I trained on 1,968 polyphonic Western art-music pieces (~2.2M tokens) drawn from the music21 corpus plus user-supplied MusicXML, on consumer hardware (Apple M4, 24 GB unified memory) for ~7 hours. 80% of low-temperature samples scored ≥27/30 on a 6-axis musical evaluation rubric.

What Makes It Different

Most music AI systems work with MIDI. GenMusic works with notation.

01

Hz-as-Canonical Pitch

The internal representation stores pitch as continuous Hz — not MIDI integers. Current training is 12-TET, but the architecture doesn't preclude microtonal or ultrasonic generation.
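
For reference, the standard 12-TET conversion between MIDI numbers and Hz (my sketch, not GenMusic code). A continuous-Hz field can represent any frequency, not just the twelve values per octave that MIDI integers land on.

    import math

    A4_HZ = 440.0  # concert-pitch reference

    def midi_to_hz(midi: float) -> float:
        # 12-TET: each semitone is a factor of 2 ** (1 / 12)
        return A4_HZ * 2.0 ** ((midi - 69) / 12)

    def hz_to_midi(hz: float) -> float:
        return 69 + 12 * math.log2(hz / A4_HZ)

    assert round(midi_to_hz(60), 2) == 261.63  # middle C
    print(midi_to_hz(69.5))  # quarter-tone above A4: ~452.89 Hz, no MIDI integer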

02

Notation Round-Trip

Every musical feature in the input MusicXML round-trips through the parser and tokenizer without loss. If a feature isn't preserved, the model can't learn it.

03

Scale Degree Awareness

A derived scale_degree field alongside absolute pitch lets the model attend to key-relative position — "the dominant of the current key" — enabling cross-key generalization.
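
A minimal sketch of how such a field could be derived, assuming major keys only; the actual GenMusic derivation (minor modes, chromatic alterations) may differ.

    MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic

    def scale_degree(pitch_class: int, tonic_pc: int) -> int | None:
        interval = (pitch_class - tonic_pc) % 12
        if interval in MAJOR_STEPS:
            return MAJOR_STEPS.index(interval) + 1  # 1 = tonic, 5 = dominant
        return None  # chromatic note outside the diatonic scale

    assert scale_degree(7, 0) == 5  # G in C major is the dominant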

04

Metric Stress Over Bar Position

Instead of conventional bar-position grids, GenMusic uses a 5-level metric stress field that captures the musical role of a beat: a raw bar-position index says little about how a beat actually sounds.
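
One plausible assignment of the five levels for 4/4 time, as a sketch; the boundaries GenMusic actually uses may differ. (music21's beatStrength exposes a comparable continuous measure.)

    from fractions import Fraction

    def metric_stress_44(offset_in_bar: Fraction) -> int:
        # offset_in_bar is measured in quarter notes from the barline
        if offset_in_bar == 0:
            return 4  # downbeat
        if offset_in_bar == 2:
            return 3  # mid-bar (beat 3)
        if offset_in_bar.denominator == 1:
            return 2  # remaining quarter-note beats (2 and 4)
        if offset_in_bar.denominator == 2:
            return 1  # eighth-note off-beats
        return 0      # finer subdivisions

    assert metric_stress_44(Fraction(0)) == 4
    assert metric_stress_44(Fraction(3, 2)) == 1  # the "and" of beat 2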

Listen

Generated samples from the trained model. These are AI-composed pieces rendered through MuseScore — no human editing.

Samples coming soon — model evaluation in progress.

Training Corpus

1,968 Pieces
2.2M Tokens
26M Parameters
~7h Training Time

Trained on Apple M4, 24 GB unified memory. All training data from open-source corpora (music21, public domain).

Status

GenMusic is under active development. Models will be published to HuggingFace Hub under an open-source license alongside a research paper. Follow the blog for updates.
