LogBoost
Convert a variety of log formats to CSV while enriching detected IPs with Geolocation, ASN, DNS, WhoIs, Shodan InternetDB and Threat Indicator matches.
What is it?
LogBoost is a command-line utility originally designed to enrich IP addresses in CSV files with ASN, Country and City information provided by the freely available MaxMind GeoLite2 DBs.
LogBoost can parse and convert a variety of structured and semi-structured log formats, including JSON, IIS, W3C, ELF, CLF, CEF, KV and Syslog, to CSV while simultaneously enriching detected IP addresses.
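As a rough illustration of IP detection, here is a minimal Go sketch that finds the first non-private IPv4 address in a line of text, similar in spirit to the '-regex' option shown under Example Usage. The pattern, the helper name firstPublicIP and the exclusion rules (private, loopback, link-local, unspecified) are assumptions of this sketch rather than LogBoost's actual implementation, and IPv6 is ignored.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// ipv4Pattern is a deliberately loose IPv4 matcher (an assumption of this sketch);
// every candidate is validated with net.ParseIP, so "300.1.1.1" is rejected.
var ipv4Pattern = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)

// firstPublicIP returns the first IPv4 address in line that is not private,
// loopback, link-local or unspecified, or "" if there is none.
func firstPublicIP(line string) string {
	for _, candidate := range ipv4Pattern.FindAllString(line, -1) {
		ip := net.ParseIP(candidate)
		if ip == nil {
			continue // matched the pattern but is not a valid address
		}
		if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			continue // skip addresses that cannot be usefully geolocated
		}
		return candidate
	}
	return ""
}

func main() {
	row := "2023-01-01,10.0.0.5,GET /index.html,203.0.113.7,200"
	fmt.Println(firstPublicIP(row)) // prints 203.0.113.7
}
```

Validating each regex hit with net.ParseIP keeps the pattern simple while still rejecting out-of-range octets.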
The tool can also perform reverse lookups on each IP address detected in the source files to identify currently related domains. If 'GeoLite2-Domain.mmdb' is detected in the specified MaxMind DB Dir (CWD by default), the associated TLD of the enriched IP address is provided in the output as well.
On top of this, LogBoost can download text-based threat intelligence feeds as configured in feed_config.json and parse them into a local SQLite DB, which is then used to further enrich detected IP addresses with indicator matches.
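The indicator-matching idea can be sketched with an in-memory set in place of the SQLite DB. This minimal Go version assumes a common blocklist layout (one IP or CIDR per line, with '#' or ';' comment lines); the real feed_config.json structure and LogBoost's schema are not shown here, so indicatorSet, loadFeed and match are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"net/netip"
	"strings"
)

// indicatorSet is an in-memory stand-in for the indicator DB:
// exact IPs in a map, CIDR ranges in a slice.
type indicatorSet struct {
	ips      map[netip.Addr]string // IP -> feed name
	prefixes []prefixEntry
}

type prefixEntry struct {
	prefix netip.Prefix
	feed   string
}

func newIndicatorSet() *indicatorSet {
	return &indicatorSet{ips: map[netip.Addr]string{}}
}

// loadFeed parses a plain-text feed, assumed to hold one IP or CIDR per line;
// comment lines and unparseable entries are skipped.
func (s *indicatorSet) loadFeed(name, body string) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		field := strings.Fields(line)[0] // some feeds append notes after the indicator
		if p, err := netip.ParsePrefix(field); err == nil {
			s.prefixes = append(s.prefixes, prefixEntry{p.Masked(), name})
		} else if a, err := netip.ParseAddr(field); err == nil {
			s.ips[a] = name
		}
	}
}

// match returns the feed that lists ip (exactly or via a CIDR range), or "".
func (s *indicatorSet) match(ip string) string {
	a, err := netip.ParseAddr(ip)
	if err != nil {
		return ""
	}
	if feed, ok := s.ips[a]; ok {
		return feed
	}
	for _, e := range s.prefixes {
		if e.prefix.Contains(a) {
			return e.feed
		}
	}
	return ""
}

func main() {
	s := newIndicatorSet()
	s.loadFeed("example_feed", "# comment\n198.51.100.23\n203.0.113.0/24 ; scanner range\n")
	fmt.Println(s.match("203.0.113.99")) // prints example_feed
}
```

A database adds persistence and scales to very large feeds; the lookup logic (exact match first, then range containment) is the part this sketch is meant to show.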
Additionally, LogBoost can live-query both WhoIs servers (for IP addresses and domains) and Shodan InternetDB to provide additional enrichment detail for analysts.
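Shodan InternetDB is a free, keyless endpoint (https://internetdb.shodan.io/{ip}) that returns JSON describing open ports, hostnames, CPEs, tags and known vulnerabilities for an IP. The minimal Go sketch below decodes such a response and flattens it into one CSV-friendly cell; the summarize helper and its output layout are hypothetical, since the columns LogBoost actually writes for InternetDB are not listed in this section. It decodes an illustrative sample body instead of making a live request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// internetDBResult mirrors the JSON fields returned by InternetDB.
type internetDBResult struct {
	IP        string   `json:"ip"`
	Ports     []int    `json:"ports"`
	Hostnames []string `json:"hostnames"`
	CPEs      []string `json:"cpes"`
	Tags      []string `json:"tags"`
	Vulns     []string `json:"vulns"`
}

// summarize decodes a response body and flattens ports, tags and vulns into
// a single string; the layout is illustrative only.
func summarize(body []byte) (string, error) {
	var r internetDBResult
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	ports := make([]string, len(r.Ports))
	for i, p := range r.Ports {
		ports[i] = fmt.Sprint(p)
	}
	return fmt.Sprintf("ports=%s;tags=%s;vulns=%s",
		strings.Join(ports, "|"), strings.Join(r.Tags, "|"), strings.Join(r.Vulns, "|")), nil
}

func main() {
	// Illustrative sample body; a live lookup would be an HTTP GET to the URL above.
	sample := []byte(`{"cpes":[],"hostnames":["host.example"],"ip":"192.0.2.10","ports":[53,443],"tags":[],"vulns":[]}`)
	s, err := summarize(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints ports=53|443;tags=;vulns=
}
```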
All in all, LogBoost can convert a variety of log formats to CSV while enriching IP addresses with ASN Org/Number, Country, City, Domains and Indicator Match Information.
Wiki: https://github.com/joeavanzato/LogBoost/wiki
QuickStart: https://github.com/joeavanzato/LogBoost/wiki/Quick-Start-Guide
Common Use Cases
- Enriching and combining a log directory containing thousands of similarly-structured files (web server logs, CloudTrail dumps, firewall exports, etc.)
- Converting JSON Lines/Multi-line JSON blobs into more easily filterable CSVs
- Parsing KV-pair logging, such as firewall dumps (k1=v1, k2=v2, etc.)
- Parsing CEF-style logging, from Syslog or otherwise, into CSV
- Finding suspicious IP addresses in any inspected file through threat indicator matching
- Enriching IP addresses to find associated domain names and geolocations in any inspected file
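KV-pair parsing with quoted values (for example k1=v1, k2="value 2") can be sketched as a small scanner that respects double quotes. In this minimal Go sketch, parseKV and its pairSep/kvSep parameters are hypothetical; LogBoost's own KV parser and the flags it uses for separators are not described in this section.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKV splits a line such as `k1=v1, k2="value 2"` into a map. pairSep
// separates pairs and kvSep separates a key from its value; separators inside
// double quotes are ignored. Surrounding whitespace and quotes are trimmed.
func parseKV(line string, pairSep, kvSep rune) map[string]string {
	fields := map[string]string{}
	var pairs []string
	var cur strings.Builder
	inQuotes := false
	for _, r := range line {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			cur.WriteRune(r)
		case r == pairSep && !inQuotes:
			pairs = append(pairs, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	pairs = append(pairs, cur.String())
	for _, p := range pairs {
		k, v, ok := strings.Cut(p, string(kvSep))
		if !ok {
			continue // empty or malformed pair
		}
		k = strings.TrimSpace(k)
		v = strings.Trim(strings.TrimSpace(v), `"`)
		if k != "" {
			fields[k] = v
		}
	}
	return fields
}

func main() {
	m := parseKV(`action=allow, src=203.0.113.9, msg="port scan, blocked"`, ',', '=')
	fmt.Println(m["src"]) // prints 203.0.113.9
	fmt.Println(m["msg"]) // prints port scan, blocked
}
```

Splitting only at the first kvSep lets values contain '=' (such as base64 or URLs), and the quote toggle keeps commas or spaces inside quoted values intact.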
Example Usage
To use, just download the latest release binary (along with feed_config.json if you wish to enhance results with threat intelligence). Additionally, set up a free MaxMind account at https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account to get a license key for the free GeoLite2 databases. Once that key is acquired, you can either put it in an environment variable (MM_API), put it in a file in the CWD (mm_api.txt) or provide it at the command line via the '-api' flag.
Common Use
- 'LogBoost.exe -buildti' - Build the Threat Indicator database locally; this also updates all configured feeds.
- 'LogBoost.exe -updateti' - Update the Threat Indicator database; run this periodically to ingest new indicators from configured feeds.
- 'LogBoost.exe -updateti -includedc' - Update the Threat Indicator database and also include datacenter IP addresses. This adds roughly 129 million IPs and consumes about 7 GB of disk space. It is typically not necessary, as LogBoost also contains a built-in list of ASNs derived from https://raw.githubusercontent.com/X4BNet/lists_vpn/main/input/datacenter/ASN.txt
- 'LogBoost.exe -logdir logs -regex -api XXX' - Enrich a directory containing one or more CSV files with geolocation information, using regex to find the first non-private IP address in each row.
- 'LogBoost.exe -useti -dns -whois -idb -getall -convert -regex' - Enrich any file inside 'input' with Threat Intelligence, DNS, WhoIs, InternetDB and MaxMind data.
- 'LogBoost.exe -logdir input -jsoncol data -ipcol client -fullparse' - Enrich any CSV file within 'input' while also expanding JSON blobs located in the column named 'data'; the enriched IP address is pulled from the column named 'client'.
- 'LogBoost.exe -logdir input -jsoncol data -fullparse -regex' - Same as above, but use regex to find the first non-private IP address.
- 'LogBoost.exe -logdir input -jsoncol data -fullparse -regex -useti' - Same as above, but also use the threat indicator DB to enrich with IP matches.
- 'LogBoost.exe -logdir input -jsoncol data -fullparse -regex -useti -dns' - Same as above, but also perform live DNS lookups on each IP address to find any associated domains.
- 'LogBoost.exe -logdir logs -convert -rawtxt' - Process all .csv/.log/.txt files in 'logs', using a relevant parser where one exists and parsing as raw text as a last resort.
- 'LogBoost.exe -logdir logs -convert -getall' - Process every file in 'logs', regardless of extension, with a relevant parser or as raw text as a last resort.
- 'LogBoost.exe -logdir logs -maxgoperfile 40 -batchsize 100 -writebuffer 2000 -concurrentfiles 1000' - Process up to 1,000 files concurrently with 40 'threads' per file, each thread handling 100 records, and the writer for each output buffering 2,000 records at a time.
- 'LogBoost.exe -logdir logs -convert -dns -useti -regex -combine' - Look for all .csv/.log/.txt files inside 'logs' and enrich regex-detected IPs with threat indicators and DNS, combining all output files into a single CSV if a parser for the format is detected.
- 'LogBoost.exe -convert -logdir iislogs -startdate 01/01/2023 -datecol date -dateformat 2006-01-02 -enddate 01/04/2023' - Parse and convert logs, keeping only records whose date (stored in the column/key named 'date', in the format given by '-dateformat') falls between the start and end dates, inclusive.
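The value '2006-01-02' is Go's reference-date layout notation, so this minimal sketch assumes '-dateformat' takes a Go time layout and that '-startdate'/'-enddate' are written as MM/DD/YYYY, as in the example above. The dateFilter type and its helpers are hypothetical. The end date is treated as inclusive of that whole day, so records with a time of day on the last date still pass.

```go
package main

import (
	"fmt"
	"time"
)

// boundLayout is the assumed layout for -startdate/-enddate values (MM/DD/YYYY).
const boundLayout = "01/02/2006"

// dateFilter keeps records whose date value falls within [start, end).
type dateFilter struct {
	layout     string    // Go time layout for the record column, e.g. "2006-01-02"
	start, end time.Time // end is exclusive: the day after the given end date
}

func newDateFilter(layout, startDate, endDate string) (dateFilter, error) {
	start, err := time.Parse(boundLayout, startDate)
	if err != nil {
		return dateFilter{}, err
	}
	end, err := time.Parse(boundLayout, endDate)
	if err != nil {
		return dateFilter{}, err
	}
	return dateFilter{layout: layout, start: start, end: end.AddDate(0, 0, 1)}, nil
}

// keep reports whether value parses with the column layout and lies in range;
// dropping unparseable values is a choice of this sketch.
func (f dateFilter) keep(value string) bool {
	t, err := time.Parse(f.layout, value)
	if err != nil {
		return false
	}
	return !t.Before(f.start) && t.Before(f.end)
}

func main() {
	f, err := newDateFilter("2006-01-02", "01/01/2023", "01/04/2023")
	if err != nil {
		panic(err)
	}
	for _, d := range []string{"2022-12-31", "2023-01-01", "2023-01-04", "2023-01-05"} {
		fmt.Println(d, f.keep(d)) // false, true, true, false
	}
}
```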
Example Outputs
<h4 align="center">Enriching Azure Audit Log Export</h4>
<p align="center"> <img src="images/azure_audit_enrich.png"> </p>
<h4 align="center">Enriching and Expanding Azure Audit Log Export</h4>
<p align="center"> <img src="images/azure_audit_enrich_expand.png"> </p>
<h4 align="center">Enriching IPs with DNS (Live and MaxMind TLD if available)</h4>
<p align="center"> <img src="images/azure_audit_enrich_dns.png"> </p>
<h4 align="center">Enriching logs with built-in threat indicators</h4>
<p align="center"> <img src="images/azure_audit_enrich_ti.png"> </p>
<h4 align="center">Convert Common/Combined Log Format to CSV while enriching source IP address</h4>
<p align="center"> <img src="images/convert_CLF_logs.png"> </p>
<h4 align="center">Converting JSON Lines using Shallow or Deep Key parsing</h4>
<p align="center"> <img src="images/json_line_logging.png"> </p>
<h4 align="center">Parsing CloudTrail Multi-Line Records</h4>
<p align="center"> <img src="images/cloudtrail_parse.png"> </p>
<h4 align="center">Parsing arbitrary KV-style logs using provided separators/delimiters</h4>
<p align="center"> <img src="images/kv_firewall_logs.png"> </p>
<h4 align="center">Parsing Syslog (Generic/RFC 3164/RFC 5424) to CSV</h4>
<p align="center"> <img src="images/syslog_parsing.png"> </p>
<h4 align="center">Transparently handling GZ files</h4>
<p align="center"> <img src="images/gz_parsing.png"> </p>
Primary Features
- Process Structured/Semi-Structured/Unstructured data to enriched CSV
  - CSV
  - Internet Information Services (IIS)
  - W3C Extended Format (W3C)
  - Extended Log Format (ELF)
  - Common Log Format / Combined Log Format (CLF)
  - Common Event Format (CEF)
    - Shallow or Deep Parsing
  - JSON per-line logging
    - Shallow or Deep Parsing
  - Multi-Line JSON Blobs from Fixed Inputs
    - AWS CloudTrail Exports
  - Generic Syslog
  - KV (key1=value1, key2="value 2") style logging
    - Shallow or Deep Parsing
  - Raw Text Files
- Read plain-text files or GZ archives transparently for all parser types
- Handles files 'line by line' to avoid reading entire file into memory
- Expand JSON blobs embedded within CSV to individual columns
- Filtering outputs on specific datetime ranges
- Enriching detected IP with MaxMind Geo/ASN Information
- Enriching detected IP with DNS lookups
- Enriching detected IP with configurable threat indicator feeds
- Enriching detected IP with WhoIs data
- Enriching detected IP with Shodan InternetDB data
- Enriching detected domain-names from DNS with WhoIs data
- Ingesting custom indicator files
- Combining outputs on per-directory basis
- Customizing concurrency settings to fine-tune efficiency/throughput
- Capable of handling thousands of files concurrently by default
- Auto-download / update of MaxMind and configured Threat Feeds
Requirements
To use this tool, a free API key from MaxMind is required - once an account is registered, a personal license key can be generated at https://www.maxmind.com/en/accounts/.
In order to update MaxMind MMDBs, you must provide your Account ID and API Key to the tool in one of three ways:
- via commandline argument '-api'
- via environment variable 'MM_API'
- via file in current working directory named 'mm_api.txt'
The expected format is "$ACCOUNTID:$APIKEY" - for example, -api "222111:6ij3x2_GRChRSGRAWeHuFbu4W136UDGdrLeV_sse"
The tool will automatically download and extract the latest version of each database if it is not found in the current working directory.
Updates to local databases can be triggered via the '-updategeo' flag.
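The "$ACCOUNTID:$APIKEY" pair maps naturally onto HTTP basic auth. This minimal Go sketch builds (but does not send) a download request, assuming MaxMind's documented direct-download URL scheme (https://download.maxmind.com/geoip/databases/{edition}/download?suffix=tar.gz) with the account ID as username and the license key as password. Verify the endpoint against MaxMind's current documentation; the helper newGeoLiteRequest is hypothetical and is not LogBoost's own updater.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newGeoLiteRequest builds a GET request for one GeoLite2 edition tarball,
// authenticated with the account ID and license key via basic auth.
func newGeoLiteRequest(edition, accountID, licenseKey string) (*http.Request, error) {
	u := url.URL{
		Scheme:   "https",
		Host:     "download.maxmind.com",
		Path:     "/geoip/databases/" + edition + "/download",
		RawQuery: url.Values{"suffix": {"tar.gz"}}.Encode(),
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(accountID, licenseKey)
	return req, nil
}

func main() {
	for _, edition := range []string{"GeoLite2-ASN", "GeoLite2-City"} {
		req, err := newGeoLiteRequest(edition, "222111", "example_license_key")
		if err != nil {
			panic(err)
		}
		fmt.Println(req.URL)
		// Sending it would be http.DefaultClient.Do(req), followed by extracting
		// the .mmdb file from the returned tarball.
	}
}
```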
Outputs
The ultimate output of running LogBoost against one or more input files is a CSV file that represents the original data stream plus the additional columns listed below:
- lb_IP - Represents the IP address used for enrichment tasks.
- lb_ASN - Represents the name of the ASN Organization associated with the IP address.
- lb_ASN_Number - Represents the number of the ASN associated with the IP address.
- lb_Country - Represents the name of the Country associated with the IP address.
- lb_City - Represents the name of the City associated with the IP address.
- lb_Domains (-dns) - Represents any domain name associated with the IP address, split by '|' if there are multiple.
- lb_TLD (-dns) - Represents the Top Level Domain associated with the IP address - only populated if 'GeoIP2-Domain.mmdb' is found in the specified MaxMind DB directory (CWD by default).
- lb_Th
