Music has a number of similarities to language, and just as language has its grammar, so does music.
The most significant thing, though, is that they are both oral expressions of oral traditions with oral origins, where, in the case of music, 'oral' includes the voice but also the sound of things being hit, scraped or blown.
Grammar is the framework we try to impose on these traditions so that we can understand how and why they work. Humans seem to have an innate ability to string vocal sounds (i.e. words) together in a way that others will understand and can respond to, even though the words differ from region to region and are strung together in different ways.
Similarly, humans can consistently favour a set of musical pitches where these pitches have a particular relationship with each other. Like language, the pitches and their relationships will vary according to region and culture. These do not need to be explicitly taught, but are often passed on through oral traditions.
Whether the notes or the listener determines the key is an interesting question. A sequence of notes and their relationships to each other will usually allow the listener to establish a key. But it's not always so clear-cut.
As an illustration, I have been driving along, listening to music at a comparatively low volume. I will recognise one song as it competes with the ambient road and engine noise. The next song will come on, and because I don't recognise it immediately and my mind is still accustomed to the previous song, I sometimes hear it as if it were in the key of the previous song. Thus I listen to it in a key quite different to that in which it was written.
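One way to see why the notes alone do not always settle the key is to count how many keys a handful of notes is compatible with. What follows is only a rough Python sketch, not a model of how listeners actually hear. The function name `candidate_major_keys` is made up for the illustration, and it assumes twelve-tone equal temperament, looks only at major scales, and ignores rhythm, emphasis and harmony.

```python
# Pitch classes numbered 0-11, with C = 0 (sharps only, for simplicity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of a major scale above its tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]


def candidate_major_keys(notes):
    """Return every major key whose scale contains all of the given notes.

    Hypothetical helper for illustration only; assumes note names
    are spelled as in NOTE_NAMES.
    """
    heard = {NOTE_NAMES.index(name) for name in notes}
    keys = []
    for tonic in range(12):
        scale = {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
        if heard <= scale:  # every heard note belongs to this scale
            keys.append(NOTE_NAMES[tonic] + " major")
    return keys


# Five notes that fit more than one key.
print(candidate_major_keys(["C", "D", "E", "G", "A"]))
# → ['C major', 'F major', 'G major']

# All seven notes of a scale leave only one major key.
print(candidate_major_keys(["C", "D", "E", "F", "G", "A", "B"]))
# → ['C major']
```

With only five notes, three major keys remain possible, so something other than the notes themselves has to tip the balance, and the listener's expectation is one candidate.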