Python Forum
Next/Prev file without loading all filenames
#1
I'm writing an image viewer and need a way to get the next/previous image in a folder. I've cobbled together an approach based on various searches of the internet and documentation, but I don't really like it.

My existing solution depends on using os.listdir to load all the filenames in the directory into memory, then sorting that list. This is slow for large directories, especially when initially creating the list. It seems like every approach to doing this in Python requires loading and sorting all the filenames in one form or another.

Is there not a way to do this kind of query on a lower level, without having to store and sort the paths in Python itself? Maybe via some extended filesystem library? I do need this to be cross-platform.


Here is what I have now, for reference (this is part of a larger program, so some things are referenced that aren't present):

    def applyFileListSort(self, sort):
        self.sort_type = sort

        if not self.file_list:
            return

        if sort is SortType.ALPHABETICAL:
            self.file_list.sort(key=cmp_to_key(locale.strcoll))
        elif sort is SortType.MODIFIED:
            self.file_list.sort(key=os.path.getmtime)
        else:
            raise ValueError("Unhandled sorting")

    def getFileList(self):
        if not self.file_list:
            self.file_list = []
            dir = os.path.dirname(self.getFilename())
            for file in os.listdir(path=dir):
                if os.path.splitext(file)[1].lower() in SUPPORTED_EXTENSIONS:
                    self.file_list.append(os.path.join(dir, file))

            self.applyFileListSort(self.sort_type)

        return self.file_list


    def changeImage(self, offset):
        file_list = self.getFileList()
        cur_i = file_list.index(self.getFilename())
        new_filename = file_list[(cur_i + offset) % len(file_list)]
        self.openFile(new_filename)
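One possible tweak, sketched outside the original class: if the list is kept sorted by plain string comparison (an assumption; the code above sorts with locale.strcoll or by modification time), bisect can locate the current file in O(log n) instead of the linear list.index search. The helper name step is hypothetical.

```python
from bisect import bisect_left


def step(sorted_files, current, offset):
    """Return the file `offset` positions away from `current`, wrapping around.

    `sorted_files` must be sorted with the default string ordering.
    """
    if not sorted_files:
        raise ValueError("empty file list")
    i = bisect_left(sorted_files, current)
    missing = i == len(sorted_files) or sorted_files[i] != current
    if missing and offset > 0:
        # current is no longer in the list; index i already points at the
        # following file, so a forward step needs one move fewer
        offset -= 1
    return sorted_files[(i + offset) % len(sorted_files)]
```

Unlike list.index, this does not raise if the current file has disappeared from the list; whether that is desirable depends on the viewer.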
Reply
#2
You can use os.scandir() instead of os.listdir(). It returns an iterator over the directory entries, so it doesn't build the whole list of names up front.
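A minimal sketch of that idea, assuming a set of extensions like the SUPPORTED_EXTENSIONS mentioned in the first post (the values and the function name iter_images are made up for illustration). os.scandir() yields os.DirEntry objects one at a time, so the filtering happens without first holding every name in a list:

```python
import os

# Assumed values; the original program defines its own SUPPORTED_EXTENSIONS.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def iter_images(directory):
    """Lazily yield full paths of image files in `directory`, in OS order."""
    with os.scandir(directory) as entries:
        for entry in entries:
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in SUPPORTED_EXTENSIONS:
                yield entry.path
```

The entries arrive in whatever order the operating system returns them, so a sorted view still needs a pass over all of them.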
« We can solve any problem by introducing an extra level of indirection »
Reply
#3
I'm not quite sure what you want.

I think you want to avoid keeping a long list in memory, but the list holds only text, so it won't take up much space.
Only one picture is open at a time: close it before opening the next one.

from pathlib import Path

mydir = Path('/home/pedro/Pictures/')
# only get files, not directories
filelist = [filename for filename in mydir.iterdir() if filename.is_file()]

for filename in filelist:
    print(f"\nfilename: {filename.name}")
    print(f"file suffix: {filename.suffix}")

# get the ending like gif or jpg
def getSuffix(filename):
    ending = filename.suffix[1:]
    return ending

# sort the files according to the endings
# could also save files in folders according to the endings
files = sorted(filelist, key=getSuffix)
Using yield and next you can get the pics 1 at a time, but you can't go back, unless you save the name of the pic, which would entail making a list!

def showDetails():
    # yield one (path, suffix) pair per file instead of building a list first
    for filename in mydir.iterdir():
        if filename.is_file():
            yield (filename, filename.suffix)

f = showDetails()
# get information about the first file
details = next(f)
# get information about the remaining files
for i in f:
    print(i)
Reply
#4
Here it is as a class with type hints. mypy does not complain :-)
It loads the whole directory content once when the object is created, and rescans only when sort_func is changed.
The __iter__ method makes the object iterable.
__getitem__ allows index access on the object.

Without type hints it would be less code. I add them so that I can check my code.


from pathlib import Path
from typing import Any, Callable, Iterable


class Images:
    def __init__(
        self,
        root: str | Path,
        filetypes: Iterable[str],
        sort_func: Callable[[Path], Any] | None = None,
    ):
        self.root = Path(root)
        self.filetypes = filetypes
        self.images: list[Path] = []
        self._sort_func: Callable[[Path], Any] | None = sort_func
        self.pointer: int | None = None
        self.scan_dir()

    @property
    def sort_func(self) -> Callable[[Path], Any] | None:
        return self._sort_func

    @sort_func.setter
    def sort_func(self, func: Callable[[Path], Any] | None) -> None:
        self._sort_func = func
        self.scan_dir()

    def _iter_dir(self):
        return (
            file
            for file in self.root.iterdir()
            if file.is_file() and file.suffix.lower() in self.filetypes
        )

    def scan_dir(self):
        self.images = sorted(self._iter_dir(), key=self.sort_func)

    def _xt(self, direction: int) -> Path | None:
        if not len(self):
            return None

        if self.pointer is None:
            self.pointer = 0
        else:
            self.pointer += direction
            self.pointer %= len(self.images)
        return self.images[self.pointer]

    def next(self) -> Path | None:
        return self._xt(1)

    def prev(self) -> Path | None:
        return self._xt(-1)

    def __getitem__(self, index: int) -> Path:
        return self.images[index]

    def __iter__(self):
        yield from self.images

    def __len__(self) -> int:
        return len(self.images)


def sort_by_size(path: Path) -> int:
    return path.stat().st_size


img = Images(r"C:\Users\XXX\Pictures", (".png", ".jpg"), sort_by_size)

print(list(img))
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#5
I don't think the problem has anything to do with the posted code. I created a folder with 10,000 files and ran your code. It took 0.0415 seconds to load all the filenames into a list and 0.004 seconds to sort the files in alphabetical order. Sorting by last modified takes 100 times longer, 0.4 seconds.
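A rough way to reproduce this kind of measurement, sketched with empty files in a temporary directory. The helper name time_listing and the default file count are arbitrary, and real timings depend on the disk, the OS cache and the machine:

```python
import os
import tempfile
import time


def time_listing(n_files=10_000):
    """Create n_files empty .jpg files, then time listing and sorting them."""
    with tempfile.TemporaryDirectory() as folder:
        for i in range(n_files):
            open(os.path.join(folder, f"img_{i:05d}.jpg"), "w").close()

        start = time.perf_counter()
        names = [os.path.join(folder, name) for name in os.listdir(folder)]
        listed = time.perf_counter()
        names.sort()                      # plain alphabetical sort
        sorted_alpha = time.perf_counter()
        names.sort(key=os.path.getmtime)  # one stat() call per file
        sorted_mtime = time.perf_counter()

    return {
        "count": len(names),
        "list": listed - start,
        "sort_alpha": sorted_alpha - listed,
        "sort_mtime": sorted_mtime - sorted_alpha,
    }
```

Calling print(time_listing()) shows the three timings in seconds; the modification-time sort is the one that touches the filesystem again for every file.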

Do you have more than 10,000 image files in your folder?

I also tried using pathlib's Path.glob() to get all the image files. This was faster than looping through os.listdir, taking only 0.003 seconds to load the files.
Reply
#6
So, what I was hoping for was a lower-level operation that wouldn't (or might not?) require storing all the filenames, sorting them, searching them to move to the next one, and so on. However, it seems that this doesn't exist, and even where it does, in something like C/C++, it is highly operating-system dependent.

With that said, I looked further into this and I think my original problem was mostly user error in how I was testing. I was unknowingly running this on quite a slow device, and a lot of what I was attributing to the file search was actually due to a large image that kept making my test look slower than it really was. It turns out the search is pretty fast, definitely fast enough that I'm not worried anymore.

I appreciate the responses and apologize that this was somewhat of a non-issue in the end. I'm also going to take a closer look at DeaD_EyE's suggestion to help clean up this process.

Thanks a lot!
Reply
#7
(Apr-11-2024, 03:32 PM)WilliamKappler Wrote: So, what I was hoping was that there was a lower level operation
os.scandir is a lower level operation. It doesn't store all the filenames.
Reply
#8
Quote:os.scandir is a lower level operation. It doesn't store all the filenames.
It also doesn't sort the filenames alphabetically or by modification date.

You could always write your own external scanning/sorting function and call it from Python, or run a shell script, but I really don't think long delays are related to getting and sorting the files. A program that displays one image at a time using prev/next is not appropriate for sifting through thousands of images. Even if there are tens of thousands of images, it takes less than a second to get all the files and sort them. If there is a lengthy delay in the OP's program, I would look elsewhere for the source of that delay.
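For completeness, prev/next can be found without sorting or storing anything, at the cost of rescanning the directory on every step. The sketch below assumes plain string comparison of names (not locale.strcoll) and does not wrap around at the ends; the function name neighbours is made up for illustration:

```python
import os


def neighbours(directory, current_name, extensions):
    """Return (previous, next) file names around current_name in one pass.

    Names are compared as plain strings; nothing is sorted or stored
    except the two best candidates found so far.
    """
    prev_name = next_name = None
    with os.scandir(directory) as entries:
        for entry in entries:
            name = entry.name
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            if not entry.is_file():
                continue
            if name < current_name and (prev_name is None or name > prev_name):
                prev_name = name  # closest name before the current one so far
            elif name > current_name and (next_name is None or name < next_name):
                next_name = name  # closest name after the current one so far
    return prev_name, next_name
```

Each call is O(n) in the number of directory entries, so for the directory sizes discussed here a cached sorted list is likely still the simpler choice.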
Reply
#10
Just as a comparison, you can make a generator and a list, then compare the sizes.

The generator is tiny!

from pathlib import Path
import sys

mydir = Path('/home/pedro/Pictures/')
# only get files, not directories
# make a list
file_list = [filename for filename in mydir.iterdir() if filename.is_file()]

# make a generator, very small in memory
filelist = (filename for filename in mydir.iterdir() if filename.is_file())
for filename in filelist:
    print(f"\nfilename: {filename.name}")
    print(f"file suffix: {filename.suffix}")

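sys.getsizeof(filelist) # returned 104 here: the generator object itself
sys.getsizeof(file_list) # returned 1656 here: the list's own storage, not the Path objects it refers to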
type(filelist) # returns <class 'generator'>
type(file_list) # returns <class 'list'>
Reply

