The onboard card was fine for me unless I[snip]
I'm still curious tho, whether anyone can plug in and try to play thru speakers and see what it does.
You're basically missing the point. Unless someone were being paid, had masochistic curiosity or very specialized goals (e.g. OP wants to continue to track via on board cards) there is little, if any, reason to pursue this. It is trivial information and, in any case, extremely specific to particular systems.
For at least the past ten years it has been not merely 'common' knowledge, but knowledge available from MS's own widely distributed white papers, that MS imposes so many layers, hooks, goiters and impediments to streaming data that its audio 'layers' are virtually unusable for multitrack overdubs. (An acquaintance got Nuendo to run under Windows 7 and was/is dealing with 23 ms delay with only a dual track load.)
ASIO has never been a perfect solution, but Steinberg made it relatively easy for developers who did not want to produce their own proprietary kernel streaming code to make use of ASIO, which is why it bubbled to the surface as a de facto standard. I still have better overall performance (in terms of both stability and speed) using Echo's kernel streaming and Audition 1.5 (pre ASIO) than using MOTU's ASIO drivers with Audition 3 (post ASIO).
MS boasts that its WaveRT (real time) is policed via the registry . . . There is simply no f*&*^%^ way that this approach (certainly out of the box) permits real time multi channel monitoring of streaming data.
How bad it will be for any particular system depends on variables that are very specific to individual systems. These start with the processor and the relationship between the processor and the code (app) you use to record and monitor streaming data. And we are a long way from standardizing multi thread code . . . Additionally, the specific chip set and how the MOBO cache and FSB are addressed (utilized) make a huge difference in performance for streaming audio (and generally speaking none of this stuff is optimized for 'production' as opposed to consumer experience).
The solutions that have, generally, worked over the past twenty years all involve (no matter the OS) sidestepping OS interference with streaming data . . . Monitoring input (available on many low rent entry level audio cards), for example, cuts the lag in half, right out of the box . . .
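To put rough numbers on that: a back-of-the-envelope sketch, assuming a deliberately simple model in which each direction (capture and playback) adds one buffer's worth of delay and nothing else. The function names, buffer sizes and the 44.1 kHz sample rate are made up for illustration and are not taken from anyone's actual setup; real drivers and converters add overhead this ignores.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time (ms) that one buffer of `frames` samples represents."""
    return frames / sample_rate_hz * 1000.0


def monitoring_delay_ms(frames: int, sample_rate_hz: int,
                        direct_monitoring: bool = False) -> float:
    """Simplified delay model (an assumption, not a measurement).

    Software monitoring: the signal crosses an input buffer and an
    output buffer, so two legs. Monitoring the input on the card
    skips one leg in this model, which is one reading of
    "cuts the lag in half".
    """
    legs = 1 if direct_monitoring else 2
    return legs * buffer_latency_ms(frames, sample_rate_hz)


if __name__ == "__main__":
    sample_rate = 44100  # assumed sample rate, for illustration only
    for frames in (128, 256, 512, 1024):
        full = monitoring_delay_ms(frames, sample_rate)
        half = monitoring_delay_ms(frames, sample_rate, direct_monitoring=True)
        print(f"{frames:5d} frames: {full:6.1f} ms via software, "
              f"{half:6.1f} ms with input monitoring")
```

Under these assumptions the delay scales directly with buffer size, which is why anything that forces bigger buffers (or stacks extra ones in the OS) shows up immediately when you overdub.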
That you found the performance of on board cards to be acceptable simply means you existed in a relatively specialized niche (with regard to producing streaming data); that it no longer works for you comes as no surprise to most.
Solutions are relatively simple . . . Go back to XP (hope that nothing changes in how MOBO cache and FSB are addressed) . . . Move forward to a card that supports some realistic variation of kernel streaming . . . Learn to live with the delay; you might discover some optimization tweaks that get it down to a functional level for you . . . But that endeavor is primarily individual, personal to you and specific to your individual system. Other than simply learning to get used to the delay, it is not going to have a lot of broad based applicability (the 'state' of your registry is going to be very specific to your system and dependent not merely on whether you use your recording machine as a general purpose computer but on the specific implementation (including 'where' the hooks reside in the registry) of network (for example) protocols . . . Things might be working more or less OK, then you automatically upgrade a video driver and the entire complexion alters).
Windows 7 is available as a beta; download it, check it out, report back . . . It would be appreciated, I'm sure.