
Perry Lorier wrote:
There is often 1 proc listed as blocked so I guess the harddrives just can't keep up.
20% system is "high".
When doing mass imports, it's much better to run with no indexes, import the entire data, then regenerate the indexes at the end. Also, make sure if you're running a recent postgres that the autovacuum daemon isn't running revacuuming your data regularly.
As Oliver mentioned, using COPY instead of INSERT makes a significant difference (I think it does the above itself, instead of having to do it manually).
I don't think that COPY could be an option. A lot of the data is repeated, so store the string once and then use an ID for each string. The strings need to be indexed, as to the IDs so I can find the ID for an existing string. Currently the import is reporting a 85% hit ratio of existing strings. For each string I have statistics for different periods and sources which I store the stats for. I have run vmstat again and looked at the sys% and its not looking good at all. Yesterday during the import it was around 40% (ouch!) currently it is sitting around 20%. I might suggest that we get some more ram for the machine. Try and cache more of the indicies.