Hi
I have written a Python program to strip down a CSV log file that contains every connection to our streaming server.
The data is pulled from the CSV log, processed line by line, and each line is stored in a database table.
Every time I read a line from the CSV file, I check whether an identical entry already exists in the database, to prevent duplicates.
The problem I have is that the table currently has over 1,900,000 records. To keep the check cheap, the SELECT returns only one column and the WHERE clause matches only the columns that are needed; all I actually need back is a row count.
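To make it concrete, the per-line check looks roughly like this (the table name, column names, and connection details are placeholders, not my real schema):

```python
import csv
import mysql.connector

conn = mysql.connector.connect(user="user", password="secret", database="streamlogs")
cur = conn.cursor()

with open("access_log.csv", newline="") as f:
    for row in csv.reader(f):
        client_ip, connect_time, stream = row[0], row[1], row[2]

        # One SELECT against the ~1.9M-row table for every CSV line --
        # this is the query that takes 3 to 4 seconds.
        cur.execute(
            "SELECT COUNT(id) FROM connections "
            "WHERE client_ip = %s AND connect_time = %s AND stream = %s",
            (client_ip, connect_time, stream),
        )
        (count,) = cur.fetchone()

        # Only insert the line if no matching row was found.
        if count == 0:
            cur.execute(
                "INSERT INTO connections (client_ip, connect_time, stream) "
                "VALUES (%s, %s, %s)",
                (client_ip, connect_time, stream),
            )

conn.commit()
cur.close()
conn.close()
```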
Each time I run the query, it takes about 3 to 4 seconds to return the count, and checking the CPU usage of MySQL shows it maxed out at 100%.
A CSV log file can grow by as much as 4,000 new lines in an hour, and that will only increase as we get more clients.
If each line takes 3 seconds to process, the current run of the Python program will take over 3 hours to finish a single hour's worth of entries.
To add to the problem, a cron job runs the Python program every hour to strip out the past hour's entries, which slows the system down even further because the new instance starts while the previous one is still running.
Is there any way to optimize the query further, change MySQL settings, or otherwise speed up these queries?