Jul-27-2017, 04:18 PM
Bit of a long one, so apologies in advance. I'm at the last stage of my dissertation, and all that's left for me is to use a neural network I co-developed with my supervisor to scan PE files to check whether they are infected or not.
My current to-do list (not including testing and the report write-up):
1. Generate a list of known system calls from a .txt file
2. Scan a drive/directory for PE files
3. Use said scan results to extract system calls from those PE files that were detected.
4. Generate a list of system calls the file makes, removing those which are not on the master list
5. Compare the two lists: each system call on the master list gets a '1' if the file also makes it, and a '0' if it appears only on the master list.
6. Run the resulting list of 1s and 0s through the neural network (currently an .rda file, which I still need to convert to .pmml)
7. The end result is the file being flagged as a virus, or not.
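For step 2, one possible starting point is sketched below. It uses only the standard library and treats a file as a PE file when it starts with the `MZ` magic bytes and the offset stored at 0x3C points to a `PE\0\0` signature. The function names `is_pe_file` and `find_pe_files` are made up for illustration and are not part of any library; checking the header rather than the file extension is an assumption about what "PE file" should mean here.

```python
import os


def is_pe_file(path):
    """Return True if the file has an MZ header pointing to a 'PE\\0\\0' signature."""
    try:
        with open(path, "rb") as f:
            header = f.read(64)
            if len(header) < 64 or header[:2] != b"MZ":
                return False
            # Bytes 0x3C-0x3F hold the little-endian offset of the PE signature.
            pe_offset = int.from_bytes(header[0x3C:0x40], "little")
            f.seek(pe_offset)
            return f.read(4) == b"PE\0\0"
    except OSError:
        # Unreadable files (permissions, locks) are simply skipped.
        return False


def find_pe_files(root):
    """Walk 'root' recursively and return the paths of all files that look like PE files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_pe_file(path):
                found.append(path)
    return found
```

Calling `find_pe_files("C:\\Samples")` (a hypothetical directory) would return a list of paths that can then be fed one by one into the extraction step.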
I'm not really a Python programmer (or a programmer as such), hence me asking. Where should I start with all this? Should I be using lists, or should I try to generate a dictionary? And which libraries should I use for these tasks? From my understanding, the code won't be long, but I genuinely have no clue how to approach this, or even how to begin.
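On the lists-versus-dictionary question, steps 1, 4 and 5 can be sketched with an ordered list for the master list and a set for the calls found in one file. This is a minimal sketch, not a definitive design: it assumes the .txt file holds one call name per line, and the names `load_master_list` and `to_feature_vector` are hypothetical. Calls that are not on the master list drop out automatically, because only master-list names are ever looked up.

```python
def load_master_list(path):
    """Read one call name per line, skipping blanks and duplicates, keeping file order."""
    names = []
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            name = line.strip()
            if name and name not in seen:
                seen.add(name)
                names.append(name)
    return names


def to_feature_vector(master, file_calls):
    """Return a list of 1s and 0s, one per master-list entry, in master-list order."""
    present = set(file_calls)  # set membership tests are fast
    return [1 if name in present else 0 for name in master]


# Usage with made-up data:
master = ["CreateFileA", "ReadFile", "WriteFile"]
calls = ["WriteFile", "ExitProcess", "CreateFileA"]
print(to_feature_vector(master, calls))  # [1, 0, 1]
```

The order of the output follows the master list, so it presumably needs to match the order of inputs the network was trained on.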
I was given a small section of code to use for extracting the system calls:
```python
import sys

import pefile

# Path to the PE file, passed as the first command-line argument
value = sys.argv[1]
pe = pefile.PE(value)

# Print the address and name of every imported function
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    for imp in entry.imports:
        print(hex(imp.address), imp.name)
```

It's supposed to extract system calls from PE files, though I have no idea how to get it to work. Any advice or help would be greatly appreciated, as the deadline (17th August) is closing in, and I would hate to lose months of work over a tiny bit of code.
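The snippet reads the file path from `sys.argv[1]`, so it expects to be run from a command line with the PE file as its argument, and it needs the third-party `pefile` package installed. What it prints are the functions listed in the file's import table, which the post treats as its system calls. Below is a hedged sketch of how the same loop could return a list of names instead of printing them. The function names `import_names` and `calls_in_file` are hypothetical. The sketch assumes Python 3, where `pefile` gives names as bytes, and it skips imports made by ordinal, which have no name.

```python
def import_names(pe):
    """Collect imported function names from an already-parsed pefile.PE object."""
    names = []
    # DIRECTORY_ENTRY_IMPORT is absent when the file has no import table.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in entry.imports:
            if imp.name is not None:  # imports by ordinal have no name
                names.append(imp.name.decode("ascii", errors="replace"))
    return names


def calls_in_file(path):
    """Parse one file and return its imported names, or [] if it is not a valid PE."""
    import pefile  # third-party; imported here so import_names works without it

    try:
        pe = pefile.PE(path)
    except pefile.PEFormatError:
        return []
    return import_names(pe)
```

The list returned by `calls_in_file` for each file is what would then be compared against the master list.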