I find aesthetics in fighting games to be a double-edged sword. Though, my preferences regarding aesthetics in general are really atypical of your average gamer.
Audio and Visual are two entirely different worlds, IMO.
With audio, I feel you can achieve a certain level of greatness through highly skilled artists, but you can only go wrong from there.
Even if you have amazing music, the wow-factor is only going to last for so long. Eventually it gets played out and people start tuning it out. But on the flip side, you can make absolutely horrible sounds and music that will straight up turn people away. High-pitched screaming bitches, annoying/bad quotes, horrible acting, terrible music that has no charisma or soul, and incoherent themes can really kill a game.
I think CFJ is a good example of really bad audio aesthetics. Even if it were a fun and enjoyable game (which it isn't), I would actually refuse to play it based entirely on the grounds that I don't ever want to hear that game again. And do I really even need to mention how bad the music in MvC2 is?
BlazBlue, in contrast to IaMP, is also a testament to the fact that less is more. While IaMP completely lacks character voice samples, BlazBlue has entirely too many. I agree with Bellreisa when he calls BlazBlue "aural diarrhea", because frankly it is. The characters say something any time anything happens, and sometimes even when nothing is happening at all. What's worse, the audio clips will cut each other off, so when a character is getting hit by or blocking a long series of attacks, you get the same bloody word from the start of a sentence over and over very rapidly. Sorry, but it sounds bad.
Though sadly, audio is really important and a great selling point. I have to admit I only got interested in MeltyBlood because I heard Akiha performing her Origami super. And likewise I only became interested in (more like madly obsessed with) Jojo's because I heard S.Dio's and Young Joseph's sound samples and S.Dio's background music. This isn't an uncommon trend with me, either.
Audio clips are also something people can really grasp and share with each other. Many gamers will repeat quotes to one another or even just to themselves, which raises a certain level of hype for the game, such as yelling out "Final Atomic Buster" or "I am Red Cyclone" when getting hype for Zangief, or any number of other highly memorable fighting game quotes (Sonic Boom and Hadoken, anyone?).
With visuals I think the opposite is true. You can really only start with something that already looks fundamentally bad in the first place (IMO) and go upwards from there to "achieve a certain level of greatness through highly skilled artists", as said before.
Vampire Savior is visually a very charismatic game, while Warzard and SF3-3rd Strike are visually very well constructed and pleasing to the eye. But where I find visuals to be the polar opposite of audio is that bad visuals aren't nearly as much of a turn-off as bad audio. While I do mind CFJ and BlazBlue sounding like liquid shit in my ear, I don't mind games like Monster or ClayFighter looking like liquid shit. It was even really difficult for me to think of games that I thought were aesthetically bad.
So personally I think audio has the ability to get me interested in a game or turn me away from it, while visuals only have a very minor chance of getting me interested. Yet I really don't think visuals can actually turn me away unless they are physically painful to look at, like flashy/blinky seizure-inducing nonsense.
In closing, I think I would probably play a game where the characters are simply poorly pixelated trees. Fortunately, trees don't talk, so that's also a bonus. Where is my tree fighter?