A bit more Claude - SIRG is created!
I was asked if I could investigate a "server" that someone had set up and work out what software had been installed on it, how it was configured, what it was being used for, and how.
I immediately thought, well, this is going to be fun. I didn't know if it was a Windows machine or a Linux machine, so the task and the amount of time involved could vary a lot.
Digital forensic archaeology at its finest.
Then I thought, "what would a Gen Z-er do?" I was being biased there, sorry Gen Z-ers, so let me rephrase that: "what would someone do in the 'modern world'?" Yep, they would ask a GenAI to help.
So that's what I did. Now, I'm not saying it is perfect; however, it did a LOT of work that looked impressive. I need to evaluate it in more detail later.
I picked the following model: claude-opus-4-5-20251101-thinking-32k
and I asked the following prompt:
"you are the most excellent python programmer in the world and can use any open source libraries to assist with this task. can you write me a python program that will allow execution on a windows operating system (such as windows 10 or 11) and a linux based operating system (such as ubuntu) that will create a report that informs me of what software has been installed and is in use as well as the specific versions of the software packages, if possible could it be known when the software was installed and was last accessed. assume that the python code can be run as the administrator or root user account and therefore should not have any access or permission issues. make the code as proficient and concise as possible, document the sections of code to allow for a junior programmer to be able to understand the tasks the code is executing"
The user wants a Python program that:
- Works on both Windows and Linux (cross-platform)
- Creates a report of installed software
- Shows software versions
- Shows installation dates (if possible)
- Shows last accessed dates (if possible)
- Runs with admin/root privileges
- Should be well-documented for junior programmers
Let me think about how to approach this:
For Windows:
- Use Windows Registry to get installed programs (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall and similar paths)
- Could also use WMI (Windows Management Instrumentation) via the wmi library
- PowerShell commands can also be used
- For last accessed, this is tricky - we'd need to look at executable files
For Linux:
- Use package managers: dpkg (Debian/Ubuntu), rpm (Red Hat/CentOS), pacman (Arch), etc.
- For Ubuntu specifically: dpkg-query, apt
- /var/log/dpkg.log for installation dates
- /var/lib/dpkg/info/.list for package files
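To make the /var/log/dpkg.log idea a bit more concrete, here is a minimal sketch of how install dates could be pulled out of that log. The function name parse_dpkg_install_dates and the sample lines are my own illustration, not part of the generated script, and it assumes the usual "DATE TIME install PACKAGE:ARCH OLD NEW" line shape. On a real system the log is typically rotated, so older installs may only appear in archived copies.

```python
from datetime import datetime
from typing import Dict, Iterable


def parse_dpkg_install_dates(log_lines: Iterable[str]) -> Dict[str, datetime]:
    """Map package name -> timestamp of its most recent 'install' log entry.

    Hypothetical helper for illustration only.
    """
    installed: Dict[str, datetime] = {}
    for line in log_lines:
        parts = line.split()
        # Assumed shape: DATE TIME install PACKAGE[:ARCH] OLD_VERSION NEW_VERSION
        if len(parts) < 4 or parts[2] != "install":
            continue
        try:
            when = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # Skip lines whose timestamp does not parse
        name = parts[3].split(":")[0]  # Drop the architecture suffix, if any
        installed[name] = when  # Later entries overwrite earlier ones
    return installed


# Made-up sample lines in the assumed format
sample = [
    "2023-12-15 10:23:45 install firefox:amd64 <none> 120.0-1",
    "2023-12-15 10:23:46 status installed firefox:amd64 120.0-1",
    "not a log line",
]
dates = parse_dpkg_install_dates(sample)
```

In practice you would feed it the lines of the log file, e.g. open("/var/log/dpkg.log"), rather than a hard-coded list.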
Let me write a comprehensive solution:
I'm setting up the core structure with optional Windows-specific imports (winreg for registry access and wmi for system queries) that gracefully handle cases where they're not available. The SoftwareInfo dataclass provides a consistent structure to hold software details across both platforms.
I need to extend the dataclass to capture additional metadata like when software was last accessed and where the information came from, then define an abstract collector interface that concrete implementations will inherit from. The WindowsRegistryCollector will handle extracting software from the Windows Registry's uninstall keys, including both the standard and 32-bit compatibility paths.
The collect method iterates through each registry location and aggregates results while filtering out duplicates. The helper method _collect_from_registry_path handles the actual extraction logic for each registry path.
get_software_info(self, hkey, path: str, subkey_name: str) -> Optional[SoftwareInfo]:
    """Retrieve software details from a registry subkey."""
    try:
        with winreg.OpenKey(hkey, f"{path}\\{subkey_name}") as subkey:
            name = self._get_registry_value(subkey, "DisplayName")
            if not name:
                return None
") or "Unknown"
last_accessed = self._get_last_accessed(install_location)
For the install date, I need to handle the registry format which stores dates as YYYYMMDD strings. If the string is valid and has the right length, I'll parse it into a standard date format. If parsing fails or the input is invalid, I'll just return "Unknown". When checking the last accessed time, I should verify the install location exists first before trying to stat it, otherwise return "Unknown".
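The date handling described above is easy to try in isolation. This is a minimal sketch under the behaviour stated in the paragraph (anything that does not parse becomes "Unknown"); the name parse_registry_date is hypothetical and not taken from the generated script.

```python
from datetime import datetime
from typing import Optional


def parse_registry_date(raw: Optional[str]) -> str:
    """Convert a YYYYMMDD string to YYYY-MM-DD, or return "Unknown".

    Hypothetical helper for illustration only.
    """
    if not raw:
        return "Unknown"
    raw = raw.strip()
    # Only attempt to parse values that look like eight digits
    if len(raw) == 8 and raw.isdigit():
        try:
            return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")
        except ValueError:
            # Eight digits, but not a real calendar date (e.g. 31 February)
            return "Unknown"
    return "Unknown"


example = parse_registry_date("20231215")
```

Note that the generated script further down takes a slightly different route and hands back the raw string when parsing fails, which keeps odd values visible in the report.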
Now for the WMI collector: it needs to iterate through Win32_Product entries and track which software has already been seen to avoid duplicates. For each product found, I'll extract the installation date and other relevant details. If there's an error during WMI collection, I catch it and print the error message, then return the accumulated software list.
For Linux systems using dpkg, I'm implementing a separate collector that queries installed packages by running dpkg-query with specific formatting to extract package name, version, and installation status, then parsing each line of output to build the software inventory. The method returns the accumulated software list.
For retrieving installation dates, I check the dpkg info directory for package metadata files, trying common architecture suffixes like amd64, i386, and arm64 if the initial path doesn't exist. When found, I extract the file's modification timestamp to determine when the package was installed.
For tracking last access times, I use dpkg to list all files associated with a package, then iterate through each file to find the most recent access timestamp among them.
FoundError):
    return "Unknown"
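The "most recent access timestamp among the files" step can be sketched on its own, without dpkg. The helper latest_access_time and the temporary files below are my own illustration. Bear in mind that access times are only a rough signal: many Linux systems mount filesystems with relatime or noatime, so st_atime may lag well behind real usage.

```python
import os
import tempfile
from datetime import datetime
from typing import Iterable, Optional


def latest_access_time(paths: Iterable[str]) -> Optional[datetime]:
    """Return the newest st_atime among the given paths, or None.

    Hypothetical helper for illustration only.
    """
    latest: Optional[datetime] = None
    for path in paths:
        try:
            accessed = datetime.fromtimestamp(os.stat(path).st_atime)
        except OSError:
            continue  # Missing or unreadable file: skip it
        if latest is None or accessed > latest:
            latest = accessed
    return latest


# Demo with two temporary files whose access times are set explicitly
with tempfile.TemporaryDirectory() as tmp:
    older = os.path.join(tmp, "older.txt")
    newer = os.path.join(tmp, "newer.txt")
    for path in (older, newer):
        with open(path, "w") as handle:
            handle.write("x")
    os.utime(older, (1_000_000_000, 1_000_000_000))  # (atime, mtime)
    os.utime(newer, (1_700_000_000, 1_700_000_000))
    result = latest_access_time([older, newer, os.path.join(tmp, "missing.txt")])
```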
The RPM collector queries the RPM database to gather installed packages, extracting name, version, vendor, and installation time information from each entry. It parses the output by splitting on tabs to isolate individual package details.
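Splitting the tab-separated rpm output can be tried without an RPM-based machine. The sketch below assumes the four-field layout described above (name, version, vendor, install time as a Unix timestamp); parse_rpm_output and the sample text are hypothetical, and it formats dates in UTC so the result does not depend on the local time zone.

```python
from datetime import datetime, timezone
from typing import Dict, List


def parse_rpm_output(output: str) -> List[Dict[str, str]]:
    """Parse 'name<TAB>version<TAB>vendor<TAB>installtime' lines.

    Hypothetical helper for illustration only.
    """
    packages: List[Dict[str, str]] = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 4:
            continue  # Not a complete record
        name, version, vendor, raw_time = parts[:4]
        try:
            installed = datetime.fromtimestamp(
                int(raw_time), tz=timezone.utc
            ).strftime("%Y-%m-%d")
        except (ValueError, OverflowError, OSError):
            installed = "Unknown"
        packages.append({
            "name": name,
            "version": version,
            # rpm prints "(none)" when a tag such as the vendor is not set
            "publisher": "Unknown" if vendor == "(none)" else vendor,
            "install_date": installed,
        })
    return packages


# Made-up sample in the assumed format
sample = (
    "bash\t5.1.8-6.el9\tRed Hat, Inc.\t1700000000\n"
    "mytool\t1.0-1\t(none)\tnot-a-number\n"
)
parsed = parse_rpm_output(sample)
```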
For each package found, it constructs a SoftwareInfo object with the parsed data, converting the Unix timestamp to a formatted date string and handling any parsing errors by defaulting to "Unknown". The publisher field uses the vendor name unless it's marked as "(none)", and the installation location is set to a system package path. The collector also attempts to retrieve the last accessed time for each package before adding it to the results list.
latest_access is None or access_time > latest_access:
        latest_access = access_time
    except OSError:
        continue
    if latest_access:
        return latest_access.strftime("%Y-%m-%d %H:%M:%S")
except (subprocess.CalledProcessError, FileNotFoundError):
    pass
return "Unknown"
The method iterates through files from the rpm query, checking each file's access time and tracking the most recent one. If a valid timestamp is found, it formats and returns it; otherwise it returns "Unknown". A separate collector class handles Snap packages, starting by running the snap list command to retrieve installed packages.
ects software information from Flatpak packages."""
    def collect(self) -> List[SoftwareInfo]:
        """Collect software from Flatpak."""
        software_list = []
class PipCollector(SoftwareCollector):
    """Collects Python packages installed via pip."""
    def collect(self) -> List[SoftwareInfo]:
        """Collect Python packages from pip."""
        software_list = []
, text=True, check=True )
            packages = json.loads(result.stdout)
            for pkg in packages:
                software_list.append(SoftwareInfo(
                    name=f"Python: {pkg['name']}",
                    version=pkg['version'],
                    publisher="PyPI",
                    install_date="Unknown",
                    install_location="Python site-packages",
                    last_accessed="Unknown",
                    source="pip"
                ))
        except (subprocess.CalledProcessError, FileNotFoundError, json.JSONDecodeError):
            pass
        return software_list
class ReportGenerator:
    """Generates reports from collected software information."""
    def __init__(self, software_list: List[SoftwareInfo]):
        """Initialize the report generator with a list of software entries."""
        self.software_list = sorted(software_list, key=lambda x: x.name.lower())
        self.system_info = self._get_system_info()
I'm finishing the text report by adding a summary line showing the total count of installed packages, then moving into CSV export functionality that writes the software data in a structured format suitable for spreadsheet analysis.
Next, I need to handle JSON output by bundling the system information with the software list and serializing it with proper formatting. Then there's an HTML report method that's starting to build a template for web-based viewing.
append(WindowsRegistryCollector())
    print(" - Windows Registry collector enabled")
    if WMI_AVAILABLE:
        collectors.append(WindowsWMICollector())
        print(" - Windows WMI collector enabled")
else:
    # Add Linux-specific collectors
    collectors.append(LinuxDpkgCollector())
    print(" - dpkg collector enabled")
    collectors.append(LinuxRpmCollector())
    print(" - rpm collector enabled")
    collectors.append(LinuxSnapCollector())
    print(" - snap collector enabled")
    collectors.append(LinuxFlatpakCollector())
    print(" - flatpak collector enabled")
# Add pip collector (cross-platform)
I'm setting up platform-specific collectors—Windows gets registry and WMI collectors if available, while Linux systems use dpkg, rpm, snap, and flatpak collectors. The pip collector runs on all platforms to capture Python packages.
Now I need to iterate through each collector to gather software data, handling any errors gracefully. I'll aggregate all results, then deduplicate by package name to ensure I'm only counting each unique software once before reporting the total count.
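The deduplication step is small enough to sketch separately. Here I am assuming entries are plain dictionaries with a "name" key and that names should be compared case-insensitively; both are my assumptions, and dedupe_by_name is a hypothetical name rather than something from the generated script.

```python
from typing import Dict, Iterable, List


def dedupe_by_name(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first entry seen for each name, preserving order.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    seen = set()
    unique: List[Dict[str, str]] = []
    for entry in entries:
        key = entry["name"].strip().lower()
        if key in seen:
            continue  # A collector already reported this name
        seen.add(key)
        unique.append(entry)
    return unique


# Made-up entries from two imaginary collectors
collected = [
    {"name": "Firefox", "source": "Windows Registry"},
    {"name": "firefox", "source": "Windows WMI"},
    {"name": "7-Zip", "source": "Windows Registry"},
]
unique_entries = dedupe_by_name(collected)
```

One thing worth keeping in mind for forensic use: deduplicating on name alone hides the case where two different versions of the same package are installed side by side, so keying on name plus version may be the safer choice.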
After collecting the software inventory, I instantiate a report generator and set up an output directory with a timestamp. Then I generate both text and CSV reports, logging the file paths as they're created.
Next, I generate JSON and HTML reports in the same manner, printing confirmation for each format. Once all reports are complete, I display a completion message with visual separators to indicate the process has finished successfully.
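As a rough picture of the reporting step, this sketch writes a CSV and a JSON file into a timestamped directory. The function write_reports, the file names, and the two-column layout are hypothetical simplifications of mine; the generated script has more fields and also produces text and HTML.

```python
import csv
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def write_reports(rows: List[Dict[str, str]], base_dir: str) -> Dict[str, Path]:
    """Write CSV and JSON reports into a timestamped directory.

    Hypothetical helper for illustration only.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(base_dir) / f"software_reports_{stamp}"
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / "software.csv"
    # newline="" lets the csv module control line endings itself
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "version"])
        writer.writeheader()
        writer.writerows(rows)

    json_path = out_dir / "software.json"
    json_path.write_text(json.dumps({"software": rows}, indent=2), encoding="utf-8")

    return {"csv": csv_path, "json": json_path}


# Demo against a throwaway directory
rows = [{"name": "bash", "version": "5.1"}]
paths = write_reports(rows, tempfile.mkdtemp())
```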
To enhance this further, I should add support for different operating systems and allow users to customize behavior through command-line arguments. This would make the tool more flexible and accessible across different environments.
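The command-line idea could look something like the sketch below. The option names (--output-dir, --format, --skip-pip) are invented for illustration; I have not checked what the generated script actually exposes.

```python
import argparse
from typing import List, Optional

ALL_FORMATS = ["txt", "csv", "json", "html"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical options for the inventory tool."""
    parser = argparse.ArgumentParser(description="Software inventory report (sketch)")
    parser.add_argument(
        "--output-dir",
        default="software_reports",
        help="Directory to write reports into",
    )
    parser.add_argument(
        "--format",
        dest="formats",
        action="append",          # Allow the flag to be repeated
        choices=ALL_FORMATS,
        help="Report format to generate (repeatable; default: all)",
    )
    parser.add_argument(
        "--skip-pip",
        action="store_true",
        help="Do not collect Python packages",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and fall back to every format when none is given."""
    args = build_parser().parse_args(argv)
    if not args.formats:
        args.formats = list(ALL_FORMATS)
    return args


args = parse_args(["--format", "csv", "--format", "json", "--skip-pip"])
```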
Which then led to this output:
Cross-Platform Software Inventory Report Generator
Here's a comprehensive, well-documented Python program that generates software inventory reports on both Windows and Linux systems:
#!/usr/bin/env python3
"""
================================================================================
CROSS-PLATFORM SOFTWARE INVENTORY REPORT GENERATOR
================================================================================

Description:
    This script generates a comprehensive report of all installed software
    on both Windows and Linux operating systems. It collects information about:
    - Software name and version
    - Publisher/Vendor information
    - Installation date
    - Last accessed time (when available)
    - Installation location

Supported Operating Systems:
    - Windows 10/11 (uses Registry and optionally WMI)
    - Linux (Ubuntu, Debian, Red Hat, CentOS, Fedora)

Requirements:
    - Python 3.7 or higher
    - Windows optional: pywin32, wmi (pip install pywin32 wmi)
    - Linux: No additional packages required (uses system tools)

Usage:
    Windows (run as Administrator):
        python software_report.py
    Linux (run as root):
        sudo python3 software_report.py

Output:
    Creates a 'software_reports' directory with reports in multiple formats:
    - .txt  - Plain text report for easy reading
    - .csv  - CSV file for spreadsheet analysis
    - .json - JSON file for programmatic processing
    - .html - HTML report with search functionality

Author: Software Inventory Tool
Version: 1.0.0
================================================================================
"""

# =============================================================================
# IMPORTS SECTION
# =============================================================================

# Standard library imports - these come with Python
import os          # Operating system interface (file/directory operations)
import sys         # System-specific parameters and functions
import platform    # Platform identification (Windows/Linux/etc.)
import subprocess  # Run external commands (like dpkg, rpm)
import json        # JSON parsing and generation
import csv         # CSV file reading and writing
from datetime import datetime                      # Date and time handling
from pathlib import Path                           # Object-oriented filesystem paths
from typing import List, Dict, Optional, Tuple     # Type hints for better code clarity
from dataclasses import dataclass, asdict, field   # Data classes for structured data
from abc import ABC, abstractmethod                # Abstract base classes for design patterns
import logging     # Logging for debugging and error tracking
import argparse    # Command-line argument parsing

# =============================================================================
# WINDOWS-SPECIFIC IMPORTS
# =============================================================================
# These imports only work on Windows - we handle ImportError gracefully

# winreg: Windows Registry access module (built into Python on Windows)
try:
    import winreg
    WINREG_AVAILABLE = True
except ImportError:
    WINREG_AVAILABLE = False

# wmi: Windows Management Instrumentation library (optional, must be installed)
# Install with: pip install wmi pywin32
try:
    import wmi
    WMI_AVAILABLE = True
except ImportError:
    WMI_AVAILABLE = False

# =============================================================================
# LOGGING CONFIGURATION
# =============================================================================
# Set up logging to track program execution and errors
# This is helpful for debugging issues in production
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)

# =============================================================================
# DATA CLASSES
# =============================================================================
# Data classes are a clean way to define classes that primarily hold data
# They automatically generate __init__, __repr__, and other methods

@dataclass
class SoftwareInfo:
    """
    Data class to store information about a single software package.

    This class uses Python's dataclass decorator which automatically:
    - Creates the __init__ method with all fields as parameters
    - Creates __repr__ for nice string representation
    - Creates __eq__ for equality comparison

    Attributes:
        name (str): The name of the software package
        version (str): Version string of the software
        publisher (str): Publisher or vendor of the software
        install_date (str): Date when the software was installed
        install_location (str): File system path where software is installed
        last_accessed (str): Last time the software was accessed/used
        source (str): Where this info came from (Registry, dpkg, rpm, etc.)

    Example:
        >>> software = SoftwareInfo(
        ...     name="Firefox",
        ...     version="120.0",
        ...     publisher="Mozilla"
        ... )
        >>> print(software.name)
        Firefox
    """
    name: str                          # Required: Software name
    version: str = "Unknown"           # Optional: Default to "Unknown"
    publisher: str = "Unknown"         # Optional: Default to "Unknown"
    install_date: str = "Unknown"      # Optional: Default to "Unknown"
    install_location: str = "Unknown"  # Optional: Default to "Unknown"
    last_accessed: str = "Unknown"     # Optional: Default to "Unknown"
    source: str = "Unknown"            # Optional: Default to "Unknown"


@dataclass
class SystemInfo:
    """
    Data class to store system information for the report header.

    Attributes:
        hostname (str): Computer/server name
        os_name (str): Operating system name (Windows/Linux)
        os_version (str): Detailed OS version
        os_release (str): OS release/build information
        architecture (str): CPU architecture (x86_64, arm64, etc.)
        python_version (str): Python version used to run this script
        report_date (str): When this report was generated
        total_packages (int): Total number of packages found
    """
    hostname: str
    os_name: str
    os_version: str
    os_release: str
    architecture: str
    python_version: str
    report_date: str
    total_packages: int = 0


# =============================================================================
# ABSTRACT BASE CLASS FOR COLLECTORS
# =============================================================================
# Using an abstract base class allows us to define a common interface
# that all software collectors must implement

class SoftwareCollector(ABC):
    """
    Abstract Base Class for software collectors.

    This defines the interface that all collectors must implement.
    Each collector is responsible for gathering software information
    from a specific source (Registry, dpkg, rpm, etc.).

    The @abstractmethod decorator means that any subclass MUST implement
    the collect() method, or Python will raise an error.

    Why use ABC?
    - Enforces a consistent interface across all collectors
    - Makes the code more maintainable and extensible
    - Allows for easy addition of new collectors in the future
    """

    @abstractmethod
    def collect(self) -> List[SoftwareInfo]:
        """
        Collect software information from the source.

        This method must be implemented by all subclasses.

        Returns:
            List[SoftwareInfo]: A list of SoftwareInfo objects representing
                all software found by this collector.
        """
        pass

    def get_name(self) -> str:
        """
        Get the human-readable name of this collector.

        Returns:
            str: The class name without 'Collector' suffix
        """
        return self.__class__.__name__.replace("Collector", "")


# =============================================================================
# WINDOWS COLLECTORS
# =============================================================================

class WindowsRegistryCollector(SoftwareCollector):
    """
    Collects installed software information from Windows Registry.

    The Windows Registry is a hierarchical database that stores configuration
    settings for the operating system and applications. Installed software
    is typically registered in the "Uninstall" keys so that the Control Panel
    can display and manage them.

    Registry Locations Checked:
    1. HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall
       - Contains 64-bit applications on 64-bit Windows
    2. HKEY_LOCAL_MACHINE\\SOFTWARE\\WOW6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall
       - Contains 32-bit applications on 64-bit Windows (WOW64 = Windows on Windows 64)
    3. HKEY_CURRENT_USER\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall
       - Contains applications installed for the current user only
    """

    def __init__(self):
        """Initialize the Registry collector with the paths to check."""
        # Only set up registry paths if winreg is available (i.e., on Windows)
        if WINREG_AVAILABLE:
            self.registry_paths: List[Tuple] = [
                # Tuple format: (Registry Hive, Registry Path)
                (winreg.HKEY_LOCAL_MACHINE,
                 r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall"),
                (winreg.HKEY_LOCAL_MACHINE,
                 r"SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall"),
                (winreg.HKEY_CURRENT_USER,
                 r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall"),
            ]
        else:
            self.registry_paths = []

    def collect(self) -> List[SoftwareInfo]:
        """
        Collect software information from all registry locations.

        Returns:
            List[SoftwareInfo]: List of all software found in the registry.
        """
        # Check if we're on Windows
        if not WINREG_AVAILABLE:
            logger.warning("Windows Registry not available (not running on Windows)")
            return []

        software_list: List[SoftwareInfo] = []
        seen_software: set = set()  # Track names to avoid duplicates

        # Iterate through each registry path
        for hive, path in self.registry_paths:
            try:
                # Collect software from this registry path
                found_software = self._collect_from_path(hive, path, seen_software)
                software_list.extend(found_software)
                logger.debug(f"Found {len(found_software)} items in {path}")
            except Exception as e:
                logger.error(f"Error reading registry path {path}: {e}")

        return software_list

    def _collect_from_path(
        self,
        hive,
        path: str,
        seen: set
    ) -> List[SoftwareInfo]:
        """
        Collect software from a specific registry path.

        Args:
            hive: The registry hive (HKEY_LOCAL_MACHINE, etc.)
            path: The registry path within the hive
            seen: Set of already-seen software names to avoid duplicates

        Returns:
            List[SoftwareInfo]: Software found at this path
        """
        software_list: List[SoftwareInfo] = []

        try:
            # Open the registry key for reading
            # Using 'with' ensures the key is properly closed
            with winreg.OpenKey(hive, path) as key:
                # Get the number of subkeys (each subkey = one software)
                subkey_count = winreg.QueryInfoKey(key)[0]

                # Iterate through each subkey
                for index in range(subkey_count):
                    # Fallback name so the log message below still works
                    # if EnumKey itself is the call that fails
                    subkey_name = "<unknown>"
                    try:
                        # Get the name of the subkey at this index
                        subkey_name = winreg.EnumKey(key, index)

                        # Extract software information from this subkey
                        software = self._extract_software_info(hive, path, subkey_name)

                        # Only add if we got valid info and haven't seen this name before
                        if software and software.name not in seen:
                            seen.add(software.name)
                            software_list.append(software)

                    except OSError as e:
                        # Some subkeys might not be readable - skip them
                        # (WindowsError is an alias of OSError on Windows)
                        logger.debug(f"Could not read subkey {subkey_name}: {e}")
                        continue

        except OSError as e:
            logger.debug(f"Could not open registry path {path}: {e}")

        return software_list

    def _extract_software_info(
        self,
        hive,
        path: str,
        subkey_name: str
    ) -> Optional[SoftwareInfo]:
        """
        Extract software information from a registry subkey.

        Args:
            hive: The registry hive
            path: The parent registry path
            subkey_name: Name of the specific subkey to read

        Returns:
            SoftwareInfo or None if the subkey doesn't contain valid software info
        """
        try:
            # Construct the full path to the subkey
            full_path = f"{path}\\{subkey_name}"

            with winreg.OpenKey(hive, full_path) as subkey:
                # DisplayName is required - if not present, skip this entry
                name = self._read_registry_value(subkey, "DisplayName")
                if not name:
                    return None

                # Read other optional values with defaults
                version = self._read_registry_value(subkey, "DisplayVersion") or "Unknown"
                publisher = self._read_registry_value(subkey, "Publisher") or "Unknown"
                install_date_raw = self._read_registry_value(subkey, "InstallDate")
                install_location = self._read_registry_value(subkey, "InstallLocation") or "Unknown"

                # Parse the install date (format: YYYYMMDD)
                install_date = self._parse_install_date(install_date_raw)

                # Try to get last access time from install location
                last_accessed = self._get_last_access_time(install_location)

                return SoftwareInfo(
                    name=name,
                    version=version,
                    publisher=publisher,
                    install_date=install_date,
                    install_location=install_location,
                    last_accessed=last_accessed,
                    source="Windows Registry"
                )

        except OSError:
            # WindowsError is an alias of OSError on Windows
            return None

    def _read_registry_value(self, key, value_name: str) -> Optional[str]:
        """
        Safely read a value from a registry key.

        Args:
            key: An open registry key handle
            value_name: The name of the value to read

        Returns:
            The value as a string, or None if not found
        """
        try:
            # QueryValueEx returns (value, type) tuple
            value, reg_type = winreg.QueryValueEx(key, value_name)
            return str(value) if value else None
        except OSError:
            # Value doesn't exist - this is normal, not an error
            return None

    def _parse_install_date(self, date_str: Optional[str]) -> str:
        """
        Parse installation date from registry format.

        The registry typically stores dates as YYYYMMDD (e.g., "20231215").

        Args:
            date_str: Date string in YYYYMMDD format

        Returns:
            Formatted date string (YYYY-MM-DD) or "Unknown"
        """
        if not date_str:
            return "Unknown"

        # Clean the string (sometimes has extra characters)
        date_str = date_str.strip()

        # Check if it's the expected format (8 digits)
        if len(date_str) == 8 and date_str.isdigit():
            try:
                # Parse and reformat the date
                date_obj = datetime.strptime(date_str, "%Y%m%d")
                return date_obj.strftime("%Y-%m-%d")
            except ValueError:
                return date_str  # Return original if parsing fails

        return date_str if date_str else "Unknown"

    def _get_last_access_time(self, install_location: str) -> str:
        """
        Get the last access time of the installation directory.

        This gives an approximation of when the software was last used,
        though it's not always accurate (antivirus scans can trigger access).

        Args:
            install_location: Path to the installation directory

        Returns:
            Formatted datetime string or "Unknown"
        """
        if install_location == "Unknown" or not install_location:
            return "Unknown"

        try:
            if os.path.exists(install_location):
                # os.stat() returns file statistics including access time
                stat_info = os.stat(install_location)
                # st_atime = last access time as Unix timestamp
                access_time = datetime.fromtimestamp(stat_info.st_atime)
                return access_time.strftime("%Y-%m-%d %H:%M:%S")
        except (OSError, ValueError) as e:
            logger.debug(f"Could not get access time for {install_location}: {e}")

        return "Unknown"


class WindowsWMICollector(SoftwareCollector):
    """
    Collects software information using Windows Management Instrumentation (WMI).

    WMI is a Windows technology that provides a standardized way to access
    management information about Windows systems. Win32_Product class
    contains information about products installed by Windows Installer
    (MSI packages).

    Note: This collector only finds software installed via MSI packages.
    It's slower than the Registry method but provides consistent data.

    Requirements:
        pip install wmi pywin32
    """

    def collect(self) -> List[SoftwareInfo]:
        """
        Collect software using WMI Win32_Product class.

        Returns:
            List[SoftwareInfo]: List of software found via WMI
        """
        if not WMI_AVAILABLE:
            logger.warning("WMI module not available. Install with: pip install wmi pywin32")
            return []

        software_list: List[SoftwareInfo] = []
        seen_software: set = set()

        try:
            logger.info("Querying WMI (this may take a while)...")

            # Create WMI connection
            wmi_connection = wmi.WMI()

            # Query Win32_Product for installed products
            # This query can be slow as it validates all MSI packages
            for product in wmi_connection.Win32_Product():
                # Skip if no name or already seen
                if not product.Name or product.Name in seen_software:
                    continue

                seen_software.add(product.Name)

                # Parse installation date
                install_date = "Unknown"
                if product.InstallDate:
                    try:
                        install_date = datetime.strptime(
                            str(product.InstallDate), "%Y%m%d"
                        ).strftime("%Y-%m-%d")
                    except ValueError:
                        install_date = str(product.InstallDate)

                software_list.append(SoftwareInfo(
                    name=product.Name,
                    version=str(product.Version) if product.Version else "Unknown",
                    publisher=str(product.Vendor) if product.Vendor else "Unknown",
                    install_date=install_date,
                    install_location=str(product.InstallLocation) if product.InstallLocation else "Unknown",
                    last_accessed="Unknown",  # WMI doesn't provide this
                    source="Windows WMI"
                ))

        except Exception as e:
            logger.error(f"WMI collection error: {e}")

        return software_list


class WindowsAppxCollector(SoftwareCollector):
    """
    Collects Windows Store (Appx/MSIX) applications.

    Windows Store apps are packaged differently from traditional software.
    We use PowerShell's Get-AppxPackage cmdlet to list them.
    """

    def collect(self) -> List[SoftwareInfo]:
        """
        Collect Windows Store applications using PowerShell.

        Returns:
            List[SoftwareInfo]: List of Windows Store apps
        """
        if platform.system() != "Windows":
            return []

        software_list: List[SoftwareInfo] = []

        try:
            # PowerShell command to get Appx packages in JSON format
            # Get-AppxPackage lists all installed Windows Store apps
            powershell_cmd = [
                "powershell",
                "-NoProfile",  # Don't load user profile (faster)
                "-Command",
                "Get-AppxPackage | Select-Object Name, Version, Publisher, InstallLocation | ConvertTo-Json"
            ]

            result = subprocess.run(
                powershell_cmd,
                capture_output=True,  # Capture stdout and stderr
                text=True,            # Return strings, not bytes
                timeout=60            # Timeout after 60 seconds
            )

            if result.returncode == 0 and result.stdout.strip():
                # Parse the JSON output
                packages = json.loads(result.stdout)

                # Handle case where only one package is returned (not a list)
                if isinstance(packages, dict):
                    packages = [packages]

                for pkg in packages:
                    if pkg.get("Name"):
                        software_list.append(SoftwareInfo(
                            name=pkg.get("Name", "Unknown"),
                            version=pkg.get("Version", "Unknown"),
                            publisher=pkg.get("Publisher", "Unknown"),
                            install_date="Unknown",
                            install_location=pkg.get("InstallLocation", "Unknown"),
                            last_accessed="Unknown",
                            source="Windows Store (Appx)"
                        ))

        except subprocess.TimeoutExpired:
            logger.warning("PowerShell command timed out")
        except json.JSONDecodeError as e:
            logger.error(f"Failed to parse PowerShell output: {e}")
        except Exception as e:
            logger.error(f"Error collecting Appx packages: {e}")

        return software_list


# =============================================================================
# LINUX COLLECTORS
# =============================================================================

class LinuxDpkgCollector(SoftwareCollector):
    """
    Collects software information from dpkg (Debian/Ubuntu package manager).

    dpkg is the low-level package manager for Debian-based systems like Ubuntu.
    It maintains a database of all installed .deb packages in /var/lib/dpkg/.

    Commands used:
    - dpkg-query -W: List installed packages with custom format
    - dpkg -L: List files belonging to a package

    Installation date is approximated from the modification time of the
    package's .list file in /var/lib/dpkg/info/.
    """

    def collect(self) -> List[SoftwareInfo]:
        """
        Collect software from dpkg database.

        Returns:
            List[SoftwareInfo]: List of installed dpkg packages
        """
        software_list: List[SoftwareInfo] = []

        try:
            # Query dpkg for installed packages
            # Format: package_name<TAB>version<TAB>status
            result = subprocess.run(
                [
                    "dpkg-query",
                    "-W",  # Show (query) mode
                    "-f", "${Package}\t${Version}\t${Status}\n"  # Custom format
                ],
                capture_output=True,
                text=True,
                timeout=120  # 2 minute timeout for large systems
            )

            if result.returncode != 0:
                logger.debug("dpkg-query not available or returned error")
                return []

            # Process each line of output
            for line in result.stdout.strip().split("\n"):
                if not line:
                    continue

                parts = line.split("\t")
                if len(parts) >= 3:
                    name = parts[0]
                    version = parts[1]
                    status = parts[2]

                    # Only include fully installed packages
                    # Status format: "desired_state error_flag actual_state"
                    # e.g., "install ok installed"
                    # Compare the last word exactly: a substring test would
                    # also match "not-installed" and "half-installed"
                    if status.split()[-1:] != ["installed"]:
                        continue

                    # Get additional information
                    install_date = self._get_install_date(name)
                    publisher = self._get_maintainer(name)
                    last_accessed = self._get_last_accessed(name)

                    software_list.append(SoftwareInfo(
                        name=name,
                        version=version,
                        publisher=publisher,
                        install_date=install_date,
                        install_location="/usr (system package)",
                        last_accessed=last_accessed,
                        source="dpkg (Debian/Ubuntu)"
                    ))

        except FileNotFoundError:
            logger.debug("dpkg-query command not found (not a Debian-based system)")
        except subprocess.TimeoutExpired:
            logger.warning("dpkg-query timed out")
        except Exception as e:
            logger.error(f"Error collecting dpkg packages: {e}")

        return software_list

    def _get_install_date(self, package_name: str) -> str:
        """
        Get the approximate installation date from package metadata.

        dpkg doesn't store installation dates directly, but we can use
        the modification time of the package's .list file.

        Args:
            package_name: Name of the package

        Returns:
            Installation date string or "Unknown"
        """
        # Try different possible file names (with and without architecture suffix)
        possible_paths = [
            Path(f"/var/lib/dpkg/info/{package_name}.list"),
            Path(f"/var/lib/dpkg/info/{package_name}:amd64.list"),
            Path(f"/var/lib/dpkg/info/{package_name}:i386.list"),
            Path(f"/var/lib/dpkg/info/{package_name}:arm64.list"),
        ]

        for path in possible_paths:
            if path.exists():
                try:
                    # st_mtime = modification time (when package was installed/updated)
                    stat_info = os.stat(path)
                    return datetime.fromtimestamp(stat_info.st_mtime).strftime("%Y-%m-%d")
                except OSError:
                    continue

        return "Unknown"

    def _get_maintainer(self, package_name: str) -> str:
        """
        Get the package maintainer (publisher equivalent).

        Args:
            package_name: Name of the package

        Returns:
            Maintainer name/email or "Unknown"
        """
        try:
            result = subprocess.run(
                ["dpkg-query", "-W", "-f", "${Maintainer}", package_name],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode == 0 and result.stdout.strip():
                # Truncate if too long
                maintainer = result.stdout.strip()
                return maintainer[:100] + "..." if len(maintainer) > 100 else maintainer
        except Exception:
            pass

        return "Unknown"

    def _get_last_accessed(self, package_name: str) -> str:
        """
        Get the last access time of any file in the package.

        This iterates through package files and finds the most recent access.

Args: package_name: Name of the package Returns: Last access datetime or "Unknown" """ try: # Get list of files in this package result = subprocess.run( ["dpkg", "-L", package_name], capture_output=True, text=True, timeout=10 ) if result.returncode != 0: return "Unknown" latest_access: Optional[datetime] = None # Check each file (limit to first 100 for performance) for file_path in result.stdout.strip().split("\n")[:100]: if os.path.isfile(file_path): try: stat_info = os.stat(file_path) access_time = datetime.fromtimestamp(stat_info.st_atime) if latest_access is None or access_time > latest_access: latest_access = access_time except OSError: continue if latest_access: return latest_access.strftime("%Y-%m-%d %H:%M:%S") except Exception: pass return "Unknown" class LinuxRpmCollector(SoftwareCollector): """ Collects software information from RPM (Red Hat Package Manager). RPM is used by Red Hat, CentOS, Fedora, openSUSE, and other distributions. It maintains a database in /var/lib/rpm/. RPM stores installation timestamps directly, making this data more accurate than dpkg's approximation method. """ def collect(self) -> List[SoftwareInfo]: """ Collect software from RPM database. 
Returns: List[SoftwareInfo]: List of installed RPM packages """ software_list: List[SoftwareInfo] = [] try: # Query RPM database with custom format # INSTALLTIME is stored as Unix timestamp result = subprocess.run( [ "rpm", "-qa", # Query all packages "--queryformat", # Custom output format "%{NAME}\t%{VERSION}-%{RELEASE}\t%{VENDOR}\t%{INSTALLTIME}\n" ], capture_output=True, text=True, timeout=120 ) if result.returncode != 0: logger.debug("rpm command not available or returned error") return [] for line in result.stdout.strip().split("\n"): if not line: continue parts = line.split("\t") if len(parts) >= 4: name = parts[0] version = parts[1] vendor = parts[2] if parts[2] != "(none)" else "Unknown" # Parse Unix timestamp for install date install_date = "Unknown" try: timestamp = int(parts[3]) install_date = datetime.fromtimestamp(timestamp).strftime("%Y-%m-%d") except (ValueError, OSError): pass software_list.append(SoftwareInfo( name=name, version=version, publisher=vendor, install_date=install_date, install_location="/usr (system package)", last_accessed=self._get_last_accessed(name), source="rpm (Red Hat/CentOS/Fedora)" )) except FileNotFoundError: logger.debug("rpm command not found (not an RPM-based system)") except subprocess.TimeoutExpired: logger.warning("rpm query timed out") except Exception as e: logger.error(f"Error collecting RPM packages: {e}") return software_list def _get_last_accessed(self, package_name: str) -> str: """ Get last access time from package files. 
Args: package_name: Name of the package Returns: Last access datetime or "Unknown" """ try: result = subprocess.run( ["rpm", "-ql", package_name], # List files in package capture_output=True, text=True, timeout=10 ) if result.returncode != 0: return "Unknown" latest_access: Optional[datetime] = None for file_path in result.stdout.strip().split("\n")[:100]: if os.path.isfile(file_path): try: stat_info = os.stat(file_path) access_time = datetime.fromtimestamp(stat_info.st_atime) if latest_access is None or access_time > latest_access: latest_access = access_time except OSError: continue if latest_access: return latest_access.strftime("%Y-%m-%d %H:%M:%S") except Exception: pass return "Unknown" class LinuxSnapCollector(SoftwareCollector): """ Collects software information from Snap packages. Snap is a universal package format developed by Canonical (Ubuntu). Snaps are containerized applications that include all dependencies. Snaps are installed in /snap/<package_name>/ directories. """ def collect(self) -> List[SoftwareInfo]: """ Collect installed Snap packages. 
Returns: List[SoftwareInfo]: List of installed Snap packages """ software_list: List[SoftwareInfo] = [] try: # List installed snaps result = subprocess.run( ["snap", "list"], capture_output=True, text=True, timeout=30 ) if result.returncode != 0: return [] # Parse output (skip header line) # Format: Name Version Rev Tracking Publisher Notes lines = result.stdout.strip().split("\n")[1:] # Skip header for line in lines: parts = line.split() if len(parts) >= 2: name = parts[0] version = parts[1] publisher = parts[4] if len(parts) > 4 else "Unknown" # Get install date from snap directory install_date = self._get_install_date(name) software_list.append(SoftwareInfo( name=name, version=version, publisher=publisher, install_date=install_date, install_location=f"/snap/{name}", last_accessed="Unknown", source="snap" )) except FileNotFoundError: logger.debug("snap command not found") except subprocess.TimeoutExpired: logger.warning("snap list timed out") except Exception as e: logger.error(f"Error collecting Snap packages: {e}") return software_list def _get_install_date(self, package_name: str) -> str: """Get install date from snap directory.""" snap_path = Path(f"/snap/{package_name}") if snap_path.exists(): try: stat_info = os.stat(snap_path) return datetime.fromtimestamp(stat_info.st_mtime).strftime("%Y-%m-%d") except OSError: pass return "Unknown" class LinuxFlatpakCollector(SoftwareCollector): """ Collects software information from Flatpak packages. Flatpak is another universal package format, similar to Snap. It's commonly used on various Linux distributions including Fedora. Flatpaks are typically installed in /var/lib/flatpak/. """ def collect(self) -> List[SoftwareInfo]: """ Collect installed Flatpak applications. 
Returns: List[SoftwareInfo]: List of installed Flatpak apps """ software_list: List[SoftwareInfo] = [] try: # List installed Flatpak applications (not runtimes) result = subprocess.run( [ "flatpak", "list", "--app", # Only applications, not runtimes "--columns=application,version,origin" ], capture_output=True, text=True, timeout=30 ) if result.returncode != 0: return [] for line in result.stdout.strip().split("\n"): if not line: continue parts = line.split("\t") if len(parts) >= 1: name = parts[0] version = parts[1] if len(parts) > 1 else "Unknown" origin = parts[2] if len(parts) > 2 else "Unknown" software_list.append(SoftwareInfo( name=name, version=version, publisher=origin, # Origin (e.g., "flathub") install_date="Unknown", install_location="/var/lib/flatpak", last_accessed="Unknown", source="flatpak" )) except FileNotFoundError: logger.debug("flatpak command not found") except subprocess.TimeoutExpired: logger.warning("flatpak list timed out") except Exception as e: logger.error(f"Error collecting Flatpak packages: {e}") return software_list # ============================================================================= # CROSS-PLATFORM COLLECTORS # ============================================================================= class PipPackageCollector(SoftwareCollector): """ Collects Python packages installed via pip. This works on both Windows and Linux. It lists packages installed in the current Python environment (which may be a virtualenv). Note: Running as admin/root may show different packages than running as a regular user, depending on installation locations. """ def collect(self) -> List[SoftwareInfo]: """ Collect Python packages from pip. 
Returns: List[SoftwareInfo]: List of installed Python packages """ software_list: List[SoftwareInfo] = [] try: # Use the current Python interpreter to ensure we get the right pip result = subprocess.run( [sys.executable, "-m", "pip", "list", "--format=json"], capture_output=True, text=True, timeout=60 ) if result.returncode == 0 and result.stdout.strip(): packages = json.loads(result.stdout) for pkg in packages: software_list.append(SoftwareInfo( name=f"Python: {pkg['name']}", version=pkg.get('version', 'Unknown'), publisher="PyPI (Python Package Index)", install_date="Unknown", install_location=self._get_package_location(pkg['name']), last_accessed="Unknown", source="pip (Python)" )) except FileNotFoundError: logger.debug("pip module not available") except json.JSONDecodeError as e: logger.error(f"Failed to parse pip output: {e}") except subprocess.TimeoutExpired: logger.warning("pip list timed out") except Exception as e: logger.error(f"Error collecting pip packages: {e}") return software_list def _get_package_location(self, package_name: str) -> str: """Get the installation location of a pip package.""" try: result = subprocess.run( [sys.executable, "-m", "pip", "show", package_name], capture_output=True, text=True, timeout=10 ) if result.returncode == 0: for line in result.stdout.split("\n"): if line.startswith("Location:"): return line.split(":", 1)[1].strip() except Exception: pass return "Python site-packages" # ============================================================================= # REPORT GENERATOR # ============================================================================= class ReportGenerator: """ Generates reports from collected software information. 
Supports multiple output formats: - Plain text (.txt) - Human-readable, good for quick review - CSV (.csv) - Spreadsheet-compatible, good for analysis - JSON (.json) - Machine-readable, good for automation - HTML (.html) - Web viewable, includes search functionality """ def __init__(self, software_list: List[SoftwareInfo], system_info: SystemInfo): """ Initialize the report generator. Args: software_list: List of SoftwareInfo objects to include system_info: SystemInfo object with system details """ # Sort software alphabetically by name (case-insensitive) self.software_list = sorted(software_list, key=lambda x: x.name.lower()) self.system_info = system_info # Update total packages count self.system_info.total_packages = len(self.software_list) def generate_all_reports(self, output_dir: Path, timestamp: str) -> Dict[str, Path]: """ Generate all report formats. Args: output_dir: Directory to save reports timestamp: Timestamp string for filenames Returns: Dictionary mapping format name to file path """ reports = {} # Generate each format text_path = output_dir / f"software_report_{timestamp}.txt" self.generate_text_report(text_path) reports["text"] = text_path csv_path = output_dir / f"software_report_{timestamp}.csv" self.generate_csv_report(csv_path) reports["csv"] = csv_path json_path = output_dir / f"software_report_{timestamp}.json" self.generate_json_report(json_path) reports["json"] = json_path html_path = output_dir / f"software_report_{timestamp}.html" self.generate_html_report(html_path) reports["html"] = html_path return reports def generate_text_report(self, output_path: Path) -> None: """ Generate a plain text report. The text report is formatted for easy reading with fixed-width columns. 
Args: output_path: Path to save the text report """ with open(output_path, 'w', encoding='utf-8') as f: # Report header with decorative borders f.write("=" * 120 + "\n") f.write(" " * 45 + "SOFTWARE INVENTORY REPORT\n") f.write("=" * 120 + "\n\n") # System information section f.write("SYSTEM INFORMATION\n") f.write("-" * 50 + "\n") f.write(f" Hostname: {self.system_info.hostname}\n") f.write(f" OS: {self.system_info.os_name}\n") f.write(f" OS Version: {self.system_info.os_version}\n") f.write(f" OS Release: {self.system_info.os_release}\n") f.write(f" Architecture: {self.system_info.architecture}\n") f.write(f" Python Version: {self.system_info.python_version}\n") f.write(f" Report Date: {self.system_info.report_date}\n") f.write(f" Total Packages: {self.system_info.total_packages}\n") f.write("\n") # Software list section f.write("INSTALLED SOFTWARE\n") f.write("-" * 120 + "\n") # Column headers header = f"{'Name':<50} {'Version':<25} {'Install Date':<15} {'Source':<25}\n" f.write(header) f.write("-" * 120 + "\n") # Write each software entry for software in self.software_list: # Truncate long names/versions to fit columns name = (software.name[:47] + "...") if len(software.name) > 50 else software.name version = (software.version[:22] + "...") if len(software.version) > 25 else software.version source = (software.source[:22] + "...") if len(software.source) > 25 else software.source f.write(f"{name:<50} {version:<25} {software.install_date:<15} {source:<25}\n") # Report footer f.write("\n" + "=" * 120 + "\n") f.write(f"Total Software Packages: {len(self.software_list)}\n") f.write(f"Report generated on: {self.system_info.report_date}\n") f.write("=" * 120 + "\n") def generate_csv_report(self, output_path: Path) -> None: """ Generate a CSV report for spreadsheet analysis. 
CSV format is ideal for: - Opening in Excel, Google Sheets, LibreOffice Calc - Importing into databases - Data analysis with pandas Args: output_path: Path to save the CSV report """ with open(output_path, 'w', newline='', encoding='utf-8') as f: # Define the CSV fields (column headers) fieldnames = [ 'name', 'version', 'publisher', 'install_date', 'install_location', 'last_accessed', 'source' ] writer = csv.DictWriter(f, fieldnames=fieldnames) # Write header row writer.writeheader() # Write each software entry for software in self.software_list: # asdict() converts dataclass to dictionary writer.writerow(asdict(software)) def generate_json_report(self, output_path: Path) -> None: """ Generate a JSON report for programmatic processing. JSON format is ideal for: - API consumption - Further processing with scripts - Integration with other tools Args: output_path: Path to save the JSON report """ # Create the complete report structure report = { "metadata": { "report_type": "Software Inventory", "generated_by": "Software Inventory Report Generator", "version": "1.0.0" }, "system_info": asdict(self.system_info), "software_count": len(self.software_list), "software": [asdict(s) for s in self.software_list] } with open(output_path, 'w', encoding='utf-8') as f: # indent=2 makes the output human-readable # ensure_ascii=False allows unicode characters json.dump(report, f, indent=2, ensure_ascii=False) def generate_html_report(self, output_path: Path) -> None: """ Generate an HTML report for web viewing. 
The HTML report includes: - Responsive design (works on desktop and mobile) - Client-side search functionality - Sortable columns (basic) - Print-friendly styling Args: output_path: Path to save the HTML report """ # HTML template with embedded CSS and JavaScript html_content = f"""<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Software Inventory Report - {self.system_info.hostname}</title> <style> /* Reset and base styles */ * {{ box-sizing: border-box; margin: 0; padding: 0; }} body {{ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif; line-height: 1.6; color: #333; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); min-height: 100vh; padding: 20px; }} .container {{ max-width: 1400px; margin: 0 auto; background: white; border-radius: 12px; box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3); overflow: hidden; }} /* Header section */ .header {{ background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); color: white; padding: 30px; text-align: center; }} .header h1 {{ font-size: 2.5rem; margin-bottom: 10px; }} .header p {{ opacity: 0.8; font-size: 1.1rem; }} /* System info section */ .system-info {{ background: #f8f9fa; padding: 25px 30px; display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 15px; border-bottom: 1px solid #e9ecef; }} .system-info-item {{ background: white; padding: 15px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05); }} .system-info-item strong {{ display: block; color: #6c757d; font-size: 0.85rem; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 5px; }} .system-info-item span {{ font-size: 1.1rem; color: #212529; }} /* Stats section */ .stats {{ background: #007bff; color: white; padding: 20px 30px; text-align: center; font-size: 1.2rem; }} .stats strong {{ font-size: 2rem; display: block; }} /* Search and controls */ .controls {{ padding: 20px 
30px; background: white; border-bottom: 1px solid #e9ecef; display: flex; gap: 15px; flex-wrap: wrap; align-items: center; }} .search-box {{ flex: 1; min-width: 250px; }} .search-box input {{ width: 100%; padding: 12px 20px; border: 2px solid #e9ecef; border-radius: 25px; font-size: 1rem; transition: border-color 0.3s, box-shadow 0.3s; }} .search-box input:focus {{ outline: none; border-color: #007bff; box-shadow: 0 0 0 3px rgba(0, 123, 255, 0.25); }} .filter-buttons {{ display: flex; gap: 8px; flex-wrap: wrap; }} .filter-btn {{ padding: 8px 16px; border: 2px solid #e9ecef; background: white; border-radius: 20px; cursor: pointer; transition: all 0.3s; font-size: 0.9rem; }} .filter-btn:hover, .filter-btn.active {{ background: #007bff; border-color: #007bff; color: white; }} /* Table section */ .table-container {{ overflow-x: auto; padding: 0 30px 30px; }} table {{ width: 100%; border-collapse: collapse; margin-top: 20px; }} th {{ background: #212529; color: white; padding: 15px 12px; text-align: left; font-weight: 600; text-transform: uppercase; font-size: 0.85rem; letter-spacing: 0.5px; position: sticky; top: 0; cursor: pointer; }} th:hover {{ background: #343a40; }} td {{ padding: 12px; border-bottom: 1px solid #e9ecef; vertical-align: top; }} tr:hover td {{ background: #f8f9fa; }} tr:nth-child(even) td {{ background: #fafafa; }} tr:nth-child(even):hover td {{ background: #f0f0f0; }} .name-cell {{ font-weight: 500; color: #212529; }} .version-cell {{ font-family: 'Courier New', monospace; color: #6c757d; }} .source-badge {{ display: inline-block; padding: 4px 10px; border-radius: 12px; font-size: 0.8rem; font-weight: 500; }} .source-windows {{ background: #e3f2fd; color: #1565c0; }} .source-dpkg {{ background: #fff3e0; color: #ef6c00; }} .source-rpm {{ background: #fce4ec; color: #c2185b; }} .source-snap {{ background: #e8f5e9; color: #2e7d32; }} .source-flatpak {{ background: #f3e5f5; color: #7b1fa2; }} .source-pip {{ background: #e8eaf6; color: #3949ab; }} 
.source-appx {{ background: #e0f7fa; color: #00838f; }} /* Footer */ .footer {{ background: #f8f9fa; padding: 20px 30px; text-align: center; color: #6c757d; border-top: 1px solid #e9ecef; }} /* Print styles */ @media print {{ body {{ background: white; padding: 0; }} .container {{ box-shadow: none; }} .controls {{ display: none; }} th {{ background: #333 !important; -webkit-print-color-adjust: exact; }} }} /* Responsive */ @media (max-width: 768px) {{ .header h1 {{ font-size: 1.8rem; }} .system-info {{ grid-template-columns: 1fr; }} .controls {{ flex-direction: column; }} .search-box {{ width: 100%; }} }} /* Counter display */ #visibleCount {{ padding: 10px 20px; background: #e9ecef; border-radius: 20px; font-weight: 500; }} </style> </head> <body> <div class="container"> <div class="header"> <h1>📦 Software Inventory Report</h1> <p>Generated on {self.system_info.report_date}</p> </div> <div class="system-info"> <div class="system-info-item"> <strong>Hostname</strong> <span>{self._escape_html(self.system_info.hostname)}</span> </div> <div class="system-info-item"> <strong>Operating System</strong> <span>{self._escape_html(self.system_info.os_name)}</span> </div> <div class="system-info-item"> <strong>OS Version</strong> <span>{self._escape_html(self.system_info.os_version[:50])}</span> </div> <div class="system-info-item"> <strong>Architecture</strong> <span>{self._escape_html(self.system_info.architecture)}</span> </div> <div class="system-info-item"> <strong>Python Version</strong> <span>{self._escape_html(self.system_info.python_version)}</span> </div> </div> <div class="stats"> <strong>{self.system_info.total_packages:,}</strong> Total Software Packages Installed </div> <div class="controls"> <div class="search-box"> <input type="text" id="searchInput" placeholder="🔍 Search software by name, version, or publisher..."> </div> <div class="filter-buttons"> <button class="filter-btn active" data-filter="all">All</button> {self._generate_filter_buttons()} </div> 
<span id="visibleCount">{len(self.software_list)} items</span> </div> <div class="table-container"> <table id="softwareTable"> <thead> <tr> <th onclick="sortTable(0)">Name ↕</th> <th onclick="sortTable(1)">Version ↕</th> <th onclick="sortTable(2)">Publisher ↕</th> <th onclick="sortTable(3)">Install Date ↕</th> <th onclick="sortTable(4)">Last Accessed ↕</th> <th onclick="sortTable(5)">Source ↕</th> </tr> </thead> <tbody> {self._generate_table_rows()} </tbody> </table> </div> <div class="footer"> <p>Software Inventory Report Generator v1.0.0</p> <p>Report generated on {self.system_info.report_date}</p> </div> </div> <script> // Search functionality document.getElementById('searchInput').addEventListener('keyup', function() {{ filterTable(); }}); // Filter buttons document.querySelectorAll('.filter-btn').forEach(btn => {{ btn.addEventListener('click', function() {{ document.querySelectorAll('.filter-btn').forEach(b => b.classList.remove('active')); this.classList.add('active'); filterTable(); }}); }}); function filterTable() {{ const searchTerm = document.getElementById('searchInput').value.toLowerCase(); const activeFilter = document.querySelector('.filter-btn.active').dataset.filter; const rows = document.querySelectorAll('#softwareTable tbody tr'); let visibleCount = 0; rows.forEach(row => {{ const text = row.textContent.toLowerCase(); const source = row.dataset.source || ''; const matchesSearch = text.includes(searchTerm); const matchesFilter = activeFilter === 'all' || source.toLowerCase().includes(activeFilter.toLowerCase()); if (matchesSearch && matchesFilter) {{ row.style.display = ''; visibleCount++; }} else {{ row.style.display = 'none'; }} }}); document.getElementById('visibleCount').textContent = visibleCount + ' items'; }} // Sort functionality let sortDirection = {{}}; function sortTable(columnIndex) {{ const table = document.getElementById('softwareTable'); const tbody = table.querySelector('tbody'); const rows = 
Array.from(tbody.querySelectorAll('tr')); sortDirection[columnIndex] = !sortDirection[columnIndex]; const direction = sortDirection[columnIndex] ? 1 : -1; rows.sort((a, b) => {{ const aValue = a.cells[columnIndex].textContent.trim().toLowerCase(); const bValue = b.cells[columnIndex].textContent.trim().toLowerCase(); return aValue.localeCompare(bValue) * direction; }}); rows.forEach(row => tbody.appendChild(row)); }} </script> </body> </html>""" with open(output_path, 'w', encoding='utf-8') as f: f.write(html_content) def _generate_filter_buttons(self) -> str: """Generate filter buttons based on unique sources in the data.""" sources = set(s.source for s in self.software_list) buttons = [] for source in sorted(sources): # Create a short label for the button short_label = source.split()[0] if source else "Other" buttons.append(f'<button class="filter-btn" data-filter="{short_label}">{short_label}</button>') return "\n ".join(buttons) def _generate_table_rows(self) -> str: """Generate HTML table rows for all software.""" rows = [] for s in self.software_list: # Determine source badge class source_class = self._get_source_class(s.source) row = f""" <tr data-source="{self._escape_html(s.source)}"> <td class="name-cell">{self._escape_html(s.name)}</td> <td class="version-cell">{self._escape_html(s.version)}</td> <td>{self._escape_html(s.publisher[:50])}</td> <td>{s.install_date}</td> <td>{s.last_accessed}</td> <td><span class="source-badge {source_class}">{self._escape_html(s.source)}</span></td> </tr>""" rows.append(row) return "\n".join(rows) def _get_source_class(self, source: str) -> str: """Get CSS class for source badge based on source type.""" source_lower = source.lower() if 'windows' in source_lower or 'registry' in source_lower: return 'source-windows' elif 'dpkg' in source_lower or 'debian' in source_lower: return 'source-dpkg' elif 'rpm' in source_lower: return 'source-rpm' elif 'snap' in source_lower: return 'source-snap' elif 'flatpak' in source_lower: 
            return 'source-flatpak'
        elif 'pip' in source_lower or 'python' in source_lower:
            return 'source-pip'
        elif 'appx' in source_lower or 'store' in source_lower:
            return 'source-appx'
        else:
            return ''

    def _escape_html(self, text: str) -> str:
        """
        Escape HTML special characters to prevent XSS.

        Args:
            text: The text to escape

        Returns:
            HTML-safe text
        """
        # "&" is replaced first so the entities added afterwards are not double-escaped
        return (str(text)
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#39;"))


# =============================================================================
# MAIN ORCHESTRATION
# =============================================================================

def collect_system_info() -> SystemInfo:
    """
    Collect information about the current system.

    Returns:
        SystemInfo object with system details
    """
    return SystemInfo(
        hostname=platform.node(),
        os_name=platform.system(),
        os_version=platform.version(),
        os_release=platform.release(),
        architecture=platform.machine(),
        python_version=platform.python_version(),
        report_date=datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    )


def get_collectors_for_platform() -> List[SoftwareCollector]:
    """
    Get the appropriate collectors for the current operating system.
Returns: List of collector instances for this platform """ collectors: List[SoftwareCollector] = [] current_os = platform.system() if current_os == "Windows": # Windows collectors if WINREG_AVAILABLE: collectors.append(WindowsRegistryCollector()) logger.info("✓ Windows Registry collector enabled") if WMI_AVAILABLE: collectors.append(WindowsWMICollector()) logger.info("✓ Windows WMI collector enabled") else: logger.warning("⚠ WMI not available (install with: pip install wmi pywin32)") collectors.append(WindowsAppxCollector()) logger.info("✓ Windows Store (Appx) collector enabled") else: # Linux collectors collectors.append(LinuxDpkgCollector()) logger.info("✓ dpkg (Debian/Ubuntu) collector enabled") collectors.append(LinuxRpmCollector()) logger.info("✓ rpm (Red Hat/CentOS/Fedora) collector enabled") collectors.append(LinuxSnapCollector()) logger.info("✓ Snap collector enabled") collectors.append(LinuxFlatpakCollector()) logger.info("✓ Flatpak collector enabled") # Cross-platform collectors collectors.append(PipPackageCollector()) logger.info("✓ pip (Python packages) collector enabled") return collectors def collect_all_software(collectors: List[SoftwareCollector]) -> List[SoftwareInfo]: """ Run all collectors and combine results. Args: collectors: List of collector instances to run Returns: Combined list of all software found """ all_software: List[SoftwareInfo] = [] for collector in collectors: try: logger.info(f"Running {collector.get_name()} collector...") software = collector.collect() all_software.extend(software) logger.info(f" → Found {len(software)} packages") except Exception as e: logger.error(f" ✗ Error in {collector.get_name()}: {e}") return all_software def deduplicate_software(software_list: List[SoftwareInfo]) -> List[SoftwareInfo]: """ Remove duplicate software entries based on name. When duplicates are found, we keep the one with more complete information. 
Args: software_list: List with potential duplicates Returns: Deduplicated list """ seen: Dict[str, SoftwareInfo] = {} for software in software_list: # Normalize the name for comparison normalized_name = software.name.lower().strip() if normalized_name not in seen: seen[normalized_name] = software else: # Keep the entry with more complete information existing = seen[normalized_name] # Count non-"Unknown" fields as a quality score def quality_score(s: SoftwareInfo) -> int: score = 0 if s.version != "Unknown": score += 1 if s.publisher != "Unknown": score += 1 if s.install_date != "Unknown": score += 1 if s.install_location != "Unknown": score += 1 if s.last_accessed != "Unknown": score += 1 return score if quality_score(software) > quality_score(existing): seen[normalized_name] = software return list(seen.values()) def parse_arguments() -> argparse.Namespace: """ Parse command-line arguments. Returns: Parsed arguments namespace """ parser = argparse.ArgumentParser( description="Generate a comprehensive software inventory report", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=""" Examples: python software_report.py python software_report.py --output-dir /path/to/reports python software_report.py --format json python software_report.py --include-pip --exclude-system """ ) parser.add_argument( '--output-dir', '-o', type=str, default='software_reports', help='Directory to save reports (default: software_reports)' ) parser.add_argument( '--format', '-f', type=str, choices=['all', 'txt', 'csv', 'json', 'html'], default='all', help='Output format (default: all)' ) parser.add_argument( '--quiet', '-q', action='store_true', help='Suppress informational output' ) parser.add_argument( '--verbose', '-v', action='store_true', help='Enable verbose/debug output' ) return parser.parse_args() def main(): """ Main entry point for the software inventory report generator. This function orchestrates the entire process: 1. Parse command-line arguments 2. 
Collect system information 3. Initialize appropriate collectors for the OS 4. Collect software from all sources 5. Deduplicate results 6. Generate reports in all formats """ # Parse command-line arguments args = parse_arguments() # Configure logging based on arguments if args.quiet: logging.getLogger().setLevel(logging.WARNING) elif args.verbose: logging.getLogger().setLevel(logging.DEBUG) # Print header print("\n" + "=" * 70) print(" SOFTWARE INVENTORY REPORT GENERATOR") print("=" * 70) # Collect system information system_info = collect_system_info() print(f"\n📍 System: {system_info.hostname}") print(f"💻 OS: {system_info.os_name} {system_info.os_release}") print(f"🏗️ Architecture: {system_info.architecture}") print(f"🐍 Python: {system_info.python_version}") print(f"📅 Date: {system_info.report_date}") # Check for admin/root privileges print("\n🔐 Checking privileges...") if platform.system() == "Windows": try: import ctypes is_admin = ctypes.windll.shell32.IsUserAnAdmin() != 0 except Exception: is_admin = False else: is_admin = os.geteuid() == 0 if is_admin: print(" ✓ Running with administrator/root privileges") else: print(" ⚠ Not running as admin/root - some software may not be detected") # Initialize collectors print("\n📦 Initializing collectors...") collectors = get_collectors_for_platform() # Collect software print("\n🔍 Scanning for installed software...") all_software = collect_all_software(collectors) # Deduplicate print(f"\n🔄 Deduplicating results...") print(f" Before: {len(all_software)} packages") unique_software = deduplicate_software(all_software) print(f" After: {len(unique_software)} unique packages") # Create output directory output_dir = Path(args.output_dir) output_dir.mkdir(parents=True, exist_ok=True) # Generate timestamp for filenames timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") # Generate reports print(f"\n📝 Generating reports in '{output_dir}'...") report_generator = ReportGenerator(unique_software, system_info) if args.format == 'all': 
reports = report_generator.generate_all_reports(output_dir, timestamp) for format_name, path in reports.items(): print(f" ✓ {format_name.upper()}: {path}") else: # Generate single format path = output_dir / f"software_report_{timestamp}.{args.format}" if args.format == 'txt': report_generator.generate_text_report(path) elif args.format == 'csv': report_generator.generate_csv_report(path) elif args.format == 'json': report_generator.generate_json_report(path) elif args.format == 'html': report_generator.generate_html_report(path) print(f" ✓ {args.format.upper()}: {path}") # Summary print("\n" + "=" * 70) print(" ✅ REPORT GENERATION COMPLETE") print("=" * 70) print(f"\n📊 Summary:") print(f" • Total packages found: {len(unique_software)}") print(f" • Reports saved to: {output_dir.absolute()}") print(f" • Report timestamp: {timestamp}") print("\n") # ============================================================================= # SCRIPT ENTRY POINT # ============================================================================= if __name__ == "__main__": """ This block only runs when the script is executed directly, not when it's imported as a module. """ try: main() except KeyboardInterrupt: print("\n\n⚠️ Operation cancelled by user") sys.exit(1) except Exception as e: logger.exception(f"Unexpected error: {e}") sys.exit(1)
Which is pretty impressive. It then produced the following:
Key Features
🖥️ Cross-Platform Support
- Windows: Registry scanning, WMI queries, Windows Store apps
- Linux: dpkg, rpm, Snap, Flatpak package managers
📊 Multiple Output Formats
- TXT: Human-readable text report
- CSV: Spreadsheet-compatible
- JSON: Machine-readable for automation
- HTML: Interactive web report with search/filter
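Since the JSON report is pitched as the one for automation, here is a minimal sketch of reading it back and counting how many packages each source contributed. It assumes the layout written by `generate_json_report()` in the code above (a top-level `"software"` list whose entries carry a `"source"` field); the function name `count_by_source` and the file name are just illustrative.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def count_by_source(report_path: Path) -> Dict[str, int]:
    """Return how many packages each source (dpkg, snap, pip, ...) contributed."""
    with open(report_path, encoding="utf-8") as f:
        report = json.load(f)
    # "software" is the list of package entries in the JSON report
    return dict(Counter(entry["source"] for entry in report["software"]))


if __name__ == "__main__":
    # Hypothetical file name: substitute the timestamped file the script produced.
    path = Path("software_reports/software_report_20241215_143022.json")
    if path.exists():
        for source, count in sorted(count_by_source(path).items()):
            print(f"{source}: {count}")
    else:
        print(f"No report found at {path}")
```

A per-source count like this is a quick first look at an unknown server: a box with hundreds of dpkg packages and a handful of snaps tells a different story from one that is mostly pip installs.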
Installation & Usage
# Install optional Windows dependencies (Windows only)
pip install wmi pywin32

# Run on Windows (as Administrator)
python software_report.py

# Run on Linux (as root)
sudo python3 software_report.py

# Command-line options
python software_report.py --help
python software_report.py --output-dir /custom/path
python software_report.py --format json
python software_report.py --verbose
Output Example
Reports are saved to ./software_reports/ with timestamped filenames:
software_report_20241215_143022.txt
software_report_20241215_143022.csv
software_report_20241215_143022.json
software_report_20241215_143022.html
The HTML report includes interactive search and filter functionality that works entirely in the browser!
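The generated script escapes package names and publishers with its own helper before writing them into the HTML table. As a rough sketch of the same idea, the standard library's `html.escape` does this job without any hand-rolled replacements. The wrapper name `escape_cell` is my own and not part of the generated code.

```python
import html


def escape_cell(value: object) -> str:
    """Make an arbitrary value safe to place inside an HTML cell or attribute."""
    # quote=True also escapes " and ', which matters inside attribute values
    return html.escape(str(value), quote=True)


if __name__ == "__main__":
    print(escape_cell('<b>Tom & "Jerry"</b>'))
```

This matters for a forensic report because the package names and publisher strings come from the machine being investigated, so they should be treated as untrusted input.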
And you know what, it worked!
As I say, the devil is in the details. When I get time, I'll step through the code; who knows, I might even learn something! I'll also see whether this actually helps me or not. If it doesn't, I'm sure I can just write another prompt to get it to change the code.
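One detail worth checking early is how far the "last accessed" column can be trusted. The script derives it from file access times (`st_atime`), and on Linux those depend on mount options: with `noatime` they are not updated at all, and with `relatime` (the usual default) they are only updated occasionally. Here is a small sketch, assuming a Linux host with `/proc/mounts`, that lists the atime-related options per mount point. The function name `atime_options` is hypothetical.

```python
from typing import Dict, List


def atime_options(mounts_text: str) -> Dict[str, List[str]]:
    """Map each mount point to the atime-related options in /proc/mounts-style text."""
    interesting = {"noatime", "relatime", "strictatime", "nodiratime", "lazytime"}
    result: Dict[str, List[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        options = fields[3].split(",")
        result[fields[1]] = [opt for opt in options if opt in interesting]
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts", encoding="utf-8") as f:
            for mount_point, opts in atime_options(f.read()).items():
                print(f"{mount_point}: {', '.join(opts) or 'no atime option listed'}")
    except FileNotFoundError:
        print("/proc/mounts not available (probably not a Linux system)")
```

If the filesystem holding `/usr` shows `noatime`, the "last accessed" values in the report say more about when the files were installed than about when the software was last used.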
Let's see. The future is changing, and it looks like it's time to adapt and evolve.
It is also time to move myself back into doing work with ROBOTICS & AI....