UTILITY: ScoobyTables - tables markup utility

alesv · **Joined:** Fri Aug 26, 2016 8:21 am **Posts:** 154

Hi guys!

I've made a utility that should significantly reduce the time spent on initial ROM analysis, namely, table markup. It can detect the table type: boost, ignition timing, tip-in and others. Of course, it must be trained first: you need to prepare several definitions for it to learn from. For best results, these definitions should come from the same CPU architecture, the same car model and close model years.

You can get ScoobyTables here:

https://github.com/aalesv/ScoobyTables

You can get pre-trained model(s) here:

https://github.com/aalesv/ScoobyTables-pretrained

ScoobyTables

This software is designed for automatic Subaru Denso ROM table markup using machine learning. Currently, only the k-nearest neighbors algorithm is supported, because its distance metric can be used to effectively filter out false positives.

What ROMs are supported

For now, only Subaru Denso ROMs (and maybe other ROMs that use the same table format) are supported. This includes SH7055, SH7058, and SH72531.

How to use, briefly

ScoobyTables works with defs in ScoobyRom XML or CSV format. You can get ScoobyRom here. This is a fork of ScoobyRom that has support for CSV export and some other improvements.

Get ScoobyRom if you don't have it
Download ScoobyTables.py to a separate directory
Install Python, then pandas, numpy, pyarrow, scikit-learn and whatever else is needed
Train your model or get a pre-trained one here
Copy your ROM to the directory where ScoobyTables.py is located
Open it with ScoobyRom, select all 2D and all 3D tables, and save them by pressing 'Ctrl+S' or by clicking the menu 'File\Save XML'. Defs will be saved to file 'YOUR-ROM-NAME.xml' located in the same directory.
Run
Code:
ScoobyTables.py --predict YOUR-ROM-NAME.xml -v
Predicted XML defs should be in the output.xml file
Back up YOUR-ROM-NAME.xml if you want, and rename output.xml to YOUR-ROM-NAME.xml
Reopen your ROM with ScoobyRom
Inspect tables; you should see tables that ScoobyTables has found.
The software is not 100% accurate. There will be incorrectly detected or named tables, some tables could be missing. Please verify every table before proceeding further!
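The back-up-and-rename step can be scripted. A minimal Python sketch, assuming ScoobyTables has already written output.xml; the install_predicted_defs() helper and the '.bak' backup name are hypothetical and not part of ScoobyTables:

```python
import shutil
from pathlib import Path
from typing import Optional


def install_predicted_defs(rom_xml, predicted="output.xml", backup=True) -> Optional[Path]:
    """Replace YOUR-ROM-NAME.xml with the defs predicted by ScoobyTables.

    Returns the backup path, or None if no backup was made.
    """
    rom_xml, predicted = Path(rom_xml), Path(predicted)
    if not predicted.exists():
        raise FileNotFoundError(f"{predicted} not found; run --predict first")
    backup_path = None
    if backup and rom_xml.exists():
        # Hypothetical backup name: YOUR-ROM-NAME.xml.bak next to the original
        backup_path = rom_xml.with_name(rom_xml.name + ".bak")
        shutil.copy2(rom_xml, backup_path)
    # Rename output.xml to YOUR-ROM-NAME.xml, overwriting the old defs
    predicted.replace(rom_xml)
    return backup_path
```

Call it as install_predicted_defs("YOUR-ROM-NAME.xml") from the ScoobyTables directory, then reopen the ROM in ScoobyRom and verify every table as usual.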

How it works and why exactly this way and not otherwise

ScoobyTables uses the k-nearest neighbors algorithm. This means that every table must be represented as a point in n-dimensional space. Then the distance from each new, unknown table to every known table is calculated. Based on these distances, a decision is made about class membership.
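To illustrate the idea, here is a plain-Python sketch of k-nearest-neighbor classification with Euclidean distance. It is not ScoobyTables' code (the tool installs scikit-learn for this); the knn_predict() helper and its tie-breaking rule are my assumptions, and k=2 only mirrors the '--neighbors' default:

```python
import math
from collections import Counter


def knn_predict(known, query, k=2):
    """Classify one unknown table from its k nearest known tables.

    known: list of (feature_vector, table_name) pairs from training defs.
    query: feature vector of the unknown table.
    Returns (predicted_name, distance to the closest neighbor with that name).
    """
    if not known:
        raise ValueError("no known tables to compare against")
    # Euclidean distance from the unknown table to every known table
    by_distance = sorted(
        ((math.dist(vector, query), name) for vector, name in known),
        key=lambda pair: pair[0],
    )
    nearest = by_distance[:k]
    # Majority vote among the k nearest; ties go to the closest neighbor
    votes = Counter(name for _, name in nearest)
    top = max(votes.values())
    for distance, name in nearest:
        if votes[name] == top:
            return name, distance
```

The returned distance is what makes false-positive filtering possible: a table that is far from every known table can be rejected instead of being given the least-bad name.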

The idea is that table placement does not change significantly from ROM to ROM. Of course, I'm talking about close enough ROMs: same car models, same CPU models, close years. The best illustration of this is the Subaru Forester Gen 4 with the SH72531 CPU. All the tables in those ROMs are located very similarly relative to each other. That's why this approach works.

The table contents themselves do not matter, only the data that describes the table:

Relative placement of the table structure in the ROM
Length of table axes
Data type of X and Y axes - RPM, temperature, engine load, etc.
Data type of table data itself
Table multiplier and offset, which are needed to convert from integer to float
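One way to turn the attributes above into a point for KNN, as a sketch: the TableFeatures class, its field order and the DATA_TYPE_CODES mapping are hypothetical, and the real tool may encode them differently (for example, scaling features so that large axis values do not dominate the distance):

```python
from dataclasses import dataclass, astuple

# Hypothetical numeric codes for the storage type of the table values
DATA_TYPE_CODES = {"UInt8": 1, "UInt16": 2, "Int8": 3, "Int16": 4, "Float": 5}


@dataclass
class TableFeatures:
    relative_position: float  # placement of the table structure among all tables
    x_len: int                # number of X axis points
    y_len: int                # number of Y axis points (0 for 2D tables)
    x_mean: float             # stands in for the X axis data type (RPM, temperature...)
    y_mean: float             # stands in for the Y axis data type (0 for 2D tables)
    data_type: str            # data type of the table values
    multiplier: float         # integer-to-float conversion
    offset: float

    def vector(self):
        """Point in n-dimensional space; table contents are deliberately unused."""
        values = list(astuple(self))
        values[5] = DATA_TYPE_CODES[self.data_type]  # index 5 is data_type
        return tuple(float(v) for v in values)
```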

The relative placement of a table structure is easy to calculate if its address is known, but only ScoobyRom stores these addresses. The lengths of the table axes are defined in most defs, except ScoobyRom ones. The data type of the X or Y axis can be approximately represented by the average of the axis values, which can be calculated. Fortunately, ScoobyRom stores min and max values in the defs' comments. The data type of the table data itself is defined by every tool. The table multiplier and offset are not stored directly in any definition.
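A sketch of the two calculated features, under stated assumptions: relative placement is taken here as the table structure address normalized between the first and last table in the def, and the axis average is approximated by the midpoint of the min and max values ScoobyRom stores. Both helpers are hypothetical; a true average would need the axis values read from the .bin:

```python
def relative_positions(addresses):
    """Map each table structure address to 0.0 (first table) .. 1.0 (last table).

    Only meaningful when the def lists all tables, not just annotated ones.
    """
    lo, hi = min(addresses), max(addresses)
    span = (hi - lo) or 1  # a single table would otherwise divide by zero
    return {address: (address - lo) / span for address in addresses}


def approx_axis_mean(axis_min, axis_max):
    """Midpoint of the min/max stored in ScoobyRom comments, a rough axis average."""
    return (axis_min + axis_max) / 2
```

This also shows why a def with only a few annotated tables is a problem: the normalized positions shift as soon as tables are missing at either end.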

So, no existing defs contain all the information. ScoobyRom XML defs are more or less suitable, but some info needs to be extracted from the binary ROM file. Alternatively, I could modify ScoobyRom to export all the data I need in a format that pandas can easily import, for example, CSV. Well, I did both.

That's why, if you want to use the XML format, you need both the .bin and .xml files. But you can use the modified ScoobyRom to export to CSV, and then you don't need a .bin file. By the way, I highly recommend using the modified ScoobyRom because it saves all annotated tables and all selected tables, while the original ScoobyRom saves only annotated tables. And to calculate relative table positions, we need to have all the tables in the def.

And now a couple of words about the results. KNN predicts very well, but it makes many wrong predictions on incomplete data, and hardly anyone has defined every table in a ROM. To filter out these wrong predictions, distance thresholds are used: one for 2D tables (because they are more similar to each other) and another for 3D tables. See the CLI help.
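The threshold filtering can be sketched like this. The default values come from the CLI help (0.5 for 2D, 5 for 3D); the tuple layout of the predictions and treating a distance exactly equal to the threshold as reliable are my assumptions:

```python
# Defaults from the CLI help: --knn-min-2d-reliable-metric / --knn-min-3d-reliable-metric
MIN_2D_RELIABLE_METRIC = 0.5
MIN_3D_RELIABLE_METRIC = 5.0


def filter_predictions(predictions, max_2d=MIN_2D_RELIABLE_METRIC, max_3d=MIN_3D_RELIABLE_METRIC):
    """Keep only predictions whose KNN distance is within the threshold.

    predictions: iterable of (table_address, is_3d, predicted_name, distance).
    Returns a list of (table_address, predicted_name).
    """
    kept = []
    for address, is_3d, name, distance in predictions:
        # 2D tables are more alike, so they get the tighter threshold
        threshold = max_3d if is_3d else max_2d
        if distance <= threshold:
            kept.append((address, name))
    return kept
```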

How to train

First, you need to prepare the data. It is crucial that the same tables have the same name in all ROMs. Case does not matter, and a table name may end with '_1', '_2' etc. or '_A', '_B' etc.; these endings will be stripped. For example, the names 'Base_Timing_1' and 'Base_Timing_A' are good, while 'BaseTimingA' and 'Base_Timing1' are not. This greatly affects prediction accuracy.
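The naming rule can be checked before training. A sketch, assuming a suffix is an underscore followed by digits or a single letter at the very end of the name; normalize_table_name() is hypothetical and may not match ScoobyTables' exact rule. Note that it also strips a genuine single-letter ending such as '_X':

```python
import re

# Assumed rule: "_<digits>" or "_<one letter>" at the end is a copy suffix
_SUFFIX = re.compile(r"_(?:\d+|[a-z])$")


def normalize_table_name(name):
    """Case-insensitive class label with an '_1'/'_A'-style suffix stripped."""
    return _SUFFIX.sub("", name.strip().lower())
```

With this rule 'Base_Timing_1' and 'Base_Timing_A' both become 'base_timing', while 'BaseTimingA' and 'Base_Timing1' stay as separate, mismatched labels.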

I assume that you use the modified ScoobyRom 0.8.5 or later.

Open the ROM and select all 3D and 2D tables
Check that there are no erroneous tables at the start and at the end of the lists. Unselect erroneous tables if there are any
Save XML or export to CSV
Put your CSV files or ROM+XML files in a separate directory, for example, directory 'dataset'
Repeat for all your training ROMs
Run
Code:
ScoobyTables.py --train dataset -v --test-accuracy -i <xml|csv>
This means: load all data files from directory 'dataset', be verbose, and test accuracy.
If you use XML, you can omit the '-i' parameter. Otherwise, you should specify '-i csv'.
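Conceptually, '--test-accuracy' holds out part of the labeled tables, trains on the rest, and measures how many held-out tables get the right name. A stdlib-only sketch of that holdout split; the helper names and the rounding of the test size are assumptions, and scikit-learn's own train_test_split may round differently:

```python
import random


def train_test_split_tables(rows, test_size=0.1, random_state=None):
    """Split labeled tables into (train, test) lists.

    Mirrors the meaning of --test-sample-size (default 0.1) and --random-state
    (None gives a different random sample on every run).
    """
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_rows):
    """Share of held-out tables whose predicted name matches the true name."""
    hits = sum(predict(features) == name for features, name in test_rows)
    return hits / len(test_rows)
```

Fixing the random state makes two training runs comparable, which helps when you are renaming tables and want to see whether accuracy actually improved.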

Command line parameters

One of the arguments '--train' or '--predict' is required. Run with '--help' to see help.

If you specify '--predict', the file format is guessed from the extension. Only ScoobyRom XML and CSV formats are supported.
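Extension-based detection amounts to something like the following sketch; guess_input_format() is hypothetical, and case-insensitive matching is my assumption:

```python
from pathlib import Path


def guess_input_format(filename):
    """Guess the --predict input format ('xml' or 'csv') from the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in (".xml", ".csv"):
        return suffix[1:]
    raise ValueError(f"unsupported format {suffix!r}: only ScoobyRom XML and CSV are supported")
```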

You can set a distance threshold above which classification will be ignored: '--knn-min-2d-reliable-metric' for 2D tables, and '--knn-min-3d-reliable-metric' for 3D tables.

Full CLI help:

Code:

usage: ScoobyTables.py [-h] (--train <dirname> | --predict <filename>) [-i {xml,csv}] [--model-dump-file <filename>]
                       [-v] [--neighbors <number of neighbors>] [--test-accuracy]
                       [--test-sample-size <float number from 0.0 to 1.0>] [--random-state <int number>] [--dry-run]
                       [--knn-min-2d-reliable-metric <float number>] [--knn-min-3d-reliable-metric <float number>]
                       [--pre-xml-filename <filename>] [--dump-txt] [--pre-txt-filename <filename>] [--version]

options:
  -h, --help            show this help message and exit
  --train <dirname>     Train, test and dump model. Specify --test-accuracy to test accuracy. (default: None)
  --predict <filename>  Predict. Get data from <filename>. XML and CVS formats are supported and autodetected based on
                        the extension. (default: None)
  -i {xml,csv}, --input-format {xml,csv}
                        Input data format. (default: xml)
  --model-dump-file <filename>
                        Model dump file name. (default: scoobytables.dmp)
  -v, --verbose         Be verbose. (default: False)
  --neighbors <number of neighbors>
                        Number of neighbors for KNN model. (default: 2)
  --test-accuracy       Test model accuracy during model training. (default: False)
  --test-sample-size <float number from 0.0 to 1.0>
                        Test sample size. (default: 0.1)
  --random-state <int number>
                        Test sample random state pseudo-random number. If not set, test sample is purely random.
                        (default: None)
  --dry-run             Do not save anything to files. (default: False)
  --knn-min-2d-reliable-metric <float number>
                        Minimum reliable metric for 2D tables. (default: 0.5)
  --knn-min-3d-reliable-metric <float number>
                        Minimum reliable metric for 3D tables. (default: 5)
  --pre-xml-filename <filename>
                        Predicted XML definitions file name. (default: output.xml)
  --dump-txt            Write predicted data to text file. (default: False)
  --pre-txt-filename <filename>
                        Predicted dataframe text file name. (default: output.txt)
  --version             Print version number.

There are also two additional scripts:

beautify_xml.py makes the predicted ScoobyRom XML file more readable:

Code:

usage: beautify_xml.py [-h] [-i <filename>] [-o <filename>] [-F <symbol>] [--numeric-suffix] [--version]

Add suffixes "_A", "_B" etc or "_1", "_2" etc to same table names.

options:
  -h, --help            show this help message and exit
  -i <filename>, --input <filename>
                        Input filename (default: None)
  -o <filename>, --output <filename>
                        Output filename. stdout if not specified (default: None)
  -F <symbol>, --word-separator <symbol>
                        Word separator (default: _)
  --numeric-suffix      Suffix is numeric (default: False)
  --version             Print version number.

For better results please clean and rename tables first!
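The suffixing that beautify_xml.py describes could look roughly like this. It is a sketch, not the script's code: it assumes only repeated names get suffixes, in order of appearance, that the '-F' separator joins name and suffix, and letter suffixes run out after 'Z':

```python
import string
from collections import Counter


def add_suffixes(names, separator="_", numeric=False):
    """Give repeated table names distinct suffixes: _A, _B... or _1, _2...

    Names that occur only once are left unchanged.
    """
    totals = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if totals[name] == 1:
            result.append(name)
            continue
        index = seen[name]
        seen[name] += 1
        # Letter suffixes only cover 26 duplicates (IndexError beyond 'Z')
        suffix = str(index + 1) if numeric else string.ascii_uppercase[index]
        result.append(f"{name}{separator}{suffix}")
    return result
```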

def_to_sr.py is an IDA 6.8 (Python 2.7) script that converts RomRaider definitions to ScoobyRom definitions. Run it inside IDA.

Happy hacking!

jimihimisimi · **Joined:** Tue Apr 05, 2022 4:57 pm **Posts:** 60

Cool stuff!

I did this manually to compare a 2013 vs. 2014 Tribeca, and I would be interested in trying out the script. I probably won't be able to get to it for a few weeks due to many other automotive side-quests at the moment.

https://www.romraider.com/forum/viewtopic.php?f=8&t=15179

alesv · **Joined:** Fri Aug 26, 2016 8:21 am **Posts:** 154

I have not tested with only one ROM for training, but it could work. The following could improve accuracy: do not test accuracy when training (omit the '--test-accuracy' flag), and set the number of neighbors to 1 when training (by specifying the '--neighbors 1' flag).

RomRaider
