Python Forum
Loop through all files in a directory?
#1
Hello,

I need to find all the files located under a given directory, including all the sub-directories it may contain.

Is this a/the right way to do it?

import os
import glob

# os.path.join puts a separator between each part of the pattern
for filename in glob.iglob(os.path.join(".", "parent", "**", "*"), recursive=True):
    # ignore sub-directories, just grab files
    if not os.path.isdir(filename):
        print(filename)
Thank you.
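A minimal runnable sketch of the glob approach from the question, assuming the folder to scan is passed in as root (list_files_glob is a hypothetical name). The key point is that the pattern is built with os.path.join, so there is always a separator between the folder and the ** part:

```python
import glob
import os


def list_files_glob(root: str) -> list[str]:
    """Return every regular file under root, including sub-directories."""
    # glob.escape protects folder names containing [ ] * or ?;
    # "**" with recursive=True matches root itself and every folder below it.
    pattern = os.path.join(glob.escape(root), "**", "*")
    # Directories match the pattern too, so keep only regular files.
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]
```

For example, list_files_glob("parent") returns the files under the parent folder. Note that glob skips names starting with a dot unless you pass include_hidden=True (Python 3.11+).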
#2
You could use
import os
root = '.'
files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
print(files)
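The _ in that one-liner is the list of sub-directory names that os.walk() yields for each folder. A sketch of the same walk written out as a loop, assuming you might want to skip hidden folders such as .git (list_files_walk and skip_hidden_dirs are hypothetical names): with the default top-down walk, removing names from that list in place stops os.walk() from descending into them.

```python
import os


def list_files_walk(root: str, skip_hidden_dirs: bool = True) -> list[str]:
    """Return the paths of all files below root."""
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_hidden_dirs:
            # Slice assignment edits the list os.walk() is using, so the
            # removed directories are never visited.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            result.append(os.path.join(dirpath, name))
    return result
```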
« We can solve any problem by introducing an extra level of indirection »
#3
Use Path from the pathlib module.
This is the modern way to handle paths.

Most of the path-handling functions in os and os.path are available as methods of the Path object. A few are missing, but Path is very useful for handling paths across different operating systems.

Path.home() returns a PosixPath pointing to the home directory on Linux.
On Windows it returns a WindowsPath pointing to the correct home directory.
macOS behaves like Linux here and also returns a PosixPath.
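A small sketch of the cross-platform point (the Downloads folder is only an assumption and may not exist on every machine):

```python
from pathlib import Path

home = Path.home()
# PosixPath on Linux and macOS, WindowsPath on Windows; both are Path
# subclasses, so the rest of the code does not need to care which one it got.
print(type(home).__name__, home)

# The "/" operator joins segments with the separator of the current OS.
downloads = home / "Downloads"
print(downloads, downloads.exists())
```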

from collections.abc import Generator
from pathlib import Path


def get_files(root: str | Path) -> Generator[Path, None, None]:
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path


all_files = list(get_files(Path.home().joinpath("Downloads/")))
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#4
Thank you.

(Apr-22-2024, 07:10 PM)Gribouillis Wrote:
files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]

I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?

def somefunc(root):
    for d, _, files in os.walk(root):
        for f in files:
            return os.path.join(d, f)
(Apr-22-2024, 07:17 PM)DeaD_EyE Wrote:
from collections.abc import Generator
from pathlib import Path

def get_files(root: str | Path) -> Generator[Path, None, None]:
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path

all_files = list(get_files(Path.home().joinpath("Downloads/")))

I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to read up on it.
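For reading up: in that signature, str | Path (Python 3.10+) means "a str or a Path", and Generator[Path, None, None] means the function yields Path objects, accepts nothing through send() and returns nothing. A sketch of the same function with equivalent, shorter spellings that also work on older versions (3.9+): Union[str, Path] and Iterator[Path].

```python
from collections.abc import Iterator
from pathlib import Path
from typing import Union


def get_files(root: Union[str, Path]) -> Iterator[Path]:
    """Recursively yield all files below root."""
    # Iterator[Path] is an accepted shorter spelling of
    # Generator[Path, None, None] for a generator that only yields.
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path
```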
#5
I agree with DeaD_EyE: Path is very useful because it works across different operating systems, and it has other tricks up its sleeve!

The if filename.is_file() check means you won't pick up directories; and since iterdir() is not recursive, the contents of sub-directories are not listed either.

from pathlib import Path
import sys

# mydir looks like: PosixPath('/home/pedro/temp')
mydir = Path('/home/pedro/temp')
# create a generator: filelist <generator object <genexpr> at 0x7ad729970900>
filelist = (filename for filename in mydir.iterdir() if filename.is_file())
# create a list of files
file_list = [filename for filename in mydir.iterdir() if filename.is_file()]

# have a look at the files
for f in filelist:
    print(f)

# use Path to look at the files
# need to recreate the generator
filelist = (filename for filename in mydir.iterdir() if filename.is_file())

for filename in filelist:
    print(f"\nfilename: {filename.name}")
    print(f"file suffix: {filename.suffix}")
    print(f"full path: {filename.resolve()}")
    print(f"filepath parts: {filename.parts}")
The advantage of the generator is size. Imagine you had millions of files. The list would be very big in memory. The generator is tiny!

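sys.getsizeof(filelist) # returns 104
sys.getsizeof(file_list) # returns 472, about 4.5 times bigger than filelist
Note that sys.getsizeof() only counts the list object itself (its array of references), not the Path objects it holds, so the real memory used by the list is larger still.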
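A quick way to see the point about size for yourself, using plain integers instead of files (make_gen is a hypothetical helper, and the exact byte counts depend on the Python version):

```python
import sys


def make_gen(n: int):
    # Same generator expression every call, so every object has the same layout.
    return (i for i in range(n))


small_gen, big_gen = make_gen(10), make_gen(100_000)
small_list, big_list = list(range(10)), list(range(100_000))

# The generator's size does not depend on how many items it will produce...
print(sys.getsizeof(small_gen), sys.getsizeof(big_gen))
# ...while the list grows with its length (and getsizeof() only counts the
# list's array of references, not the objects they point to).
print(sys.getsizeof(small_list), sys.getsizeof(big_list))
```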
#6
(Apr-23-2024, 02:02 AM)Winfried Wrote: I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?
It is not exactly equivalent because the one-liner here produces a list, so it would be
def somefunc(root):
    result = []
    for d, _, files in os.walk(root):
        for f in files:
            result.append(os.path.join(d, f))
    return result
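A sketch that makes the difference visible on a real folder (first_only and all_files are hypothetical names): the return inside the loop stops at the first file, while the list version collects them all.

```python
import os


def first_only(root):
    # "return" inside the loop exits on the very first file found
    # (and the function returns None if there are no files at all).
    for d, _, files in os.walk(root):
        for f in files:
            return os.path.join(d, f)


def all_files(root):
    # Collect every path into a list, as the one-liner does.
    result = []
    for d, _, files in os.walk(root):
        for f in files:
            result.append(os.path.join(d, f))
    return result
```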
You could also write a generator, and have it produce Path instances if you want:
import os
from pathlib import Path

def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p/f
The same generator as a one-liner
files = (p/f for d, _, files in os.walk('.') for p in (Path(d),) for f in files)
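A quick check that the one-liner and the multi-line generator above yield the same paths (one_liner is a hypothetical wrapper name). The for p in (Path(d),) clause loops over a one-element tuple, which is just a way to bind p = Path(d) once per directory inside a comprehension:

```python
import os
from pathlib import Path


def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p / f


def one_liner(root):
    # Same walk as somefunc, written as a single generator expression.
    return (p / f for d, _, files in os.walk(root) for p in (Path(d),) for f in files)
```

Like any generator, it can only be consumed once: a second pass over the same object yields nothing, which is why the generator had to be recreated in the earlier post.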
Advice: use 4 spaces to indent Python code. Better: use the black utility to format your code automatically.
#7
Thanks much. The list I'll work on is small enough that I don't need a generator, but it's nice to know that it's available. Ditto for one-liners and function annotations.
#8
(Apr-23-2024, 06:17 AM)Gribouillis Wrote: You could also write a generator and also you can have it produce Path instances if you want
import os
from pathlib import Path
 
def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p/f
Mixing os and pathlib may be more confusing than it needs to be; this does the same with pathlib alone:
from pathlib import Path

def somefunc(root):
    for path in Path(root).rglob('*'):
        if path.is_file():
            yield path

In its simplest form, Winfried, here is a stripped-down version of what DeaD_EyE's code does.
This recursively scans the folder Test for all .txt files; for all files, use rglob('*').
from pathlib import Path

dest = r'C:\Test'
for path in Path(dest).rglob('*.txt'):
    if path.is_file():
        print(path)
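Why the is_file() check is still useful with a pattern like '*.txt': rglob() matches directory names too. A self-contained sketch using a temporary folder (the names notes.txt and sub are made up for the demo):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").mkdir()  # a directory whose name ends in .txt
    (root / "sub").mkdir()
    (root / "sub" / "a.txt").write_text("hello")

    # Without the check, the directory is matched as well.
    matched = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
    # With is_file(), only the real file is kept.
    files_only = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.txt") if p.is_file()
    )
    print(matched)     # ['notes.txt', 'sub/a.txt']
    print(files_only)  # ['sub/a.txt']
```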
Quote:I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:) . I'll have to do some read up.
Look at Support for type hints (the typing module documentation). Type hints make it clearer what a function takes as input and what it returns.
They work as better documentation of the code, and editors show them in autocompletion, e.g. on mouse-over, or when you use help() to see what get_files does.
They have no impact when Python runs the code; you have to use a tool such as mypy to do a static type check.
So as an example of what I mean by not needed, this works fine:
from pathlib import Path

def get_files(root):
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*.txt"):
        if path.is_file():
            yield path

dest = r'C:\Test'
for path in get_files(dest):
    print(path)
So it is less code, but you also lose the information about what root accepts as input, e.g. that it can take either a str or a Path.
Type hints are now used as standard in many of the biggest workplaces that use Python, and in many third-party libraries,
e.g. FastAPI:
Doc Wrote:FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.8+ based on standard Python type hints.
#9
(Apr-23-2024, 08:11 AM)snippsat Wrote: To mix ios and pathlib may be more confusing than it need to be,this dos just the same with pathlib alone.
What I don't like about the pathlib solution is the call to .is_file() for every path, which makes a system call, while os.walk() produces the list of files through its internal mechanism. This needs to be checked, but I suspect that os.walk() makes fewer system calls.
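For reference, os.walk() is built on os.scandir() (since Python 3.5), whose DirEntry objects usually get the file type from the directory listing itself. A rough sketch of that idea, not the actual os.walk() implementation (scan_files is a hypothetical name):

```python
import os
from collections.abc import Iterator


def scan_files(root: str) -> Iterator[str]:
    """Recursively yield file paths using os.scandir()."""
    with os.scandir(root) as entries:
        for entry in entries:
            # DirEntry caches the type from the directory listing, so on most
            # platforms these checks need no extra stat() system call.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_files(entry.path)
            elif entry.is_file():
                yield entry.path
```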
#10
Type hints are not required, but they help library developers communicate what a method/function expects and what it returns.

Quote:What I don't like with the pathlib solution is the call to .is_file() for every path, which does a system call while os.walk() produces the list of files with internal mechanism. This needs to be checked but I suspect that os.walk() does less system calls.

It makes fewer calls and is faster.

Output:
public@ganymed:~$ for script in walk?.py; do echo -n "$script: "; strace -e newfstatat python3 $script 2>&1 | wc -l; done
walk1.py: 27556
walk2.py: 129893
walk3.py: 11484
Output:
public@ganymed:~$ for script in walk?.py; do echo -n "$script: "; time python3 $script; echo ; done
walk1.py:
real    0m0,351s
user    0m0,192s
sys     0m0,160s

walk2.py:
real    0m1,208s
user    0m0,883s
sys     0m0,325s

walk3.py:
real    0m0,285s
user    0m0,179s
sys     0m0,106s
pathlib.Path.walk was added in Python 3.12: https://docs.python.org/3/library/pathli....Path.walk
It is similar to os.walk, but handles symlinks somewhat differently.

The last example (walk3.py) uses Path.walk().
walk1.py
import os


for root, dirs, files in os.walk("/usr"):
    for file in files:
        ...
walk2.py
from pathlib import Path


for element in Path("/usr").rglob("*"):
    element.is_file()
walk3.py
from pathlib import Path


for root, dirs, files in Path("/usr").walk():
    for file in files:
        pass
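A sketch for checking on your own machine that the three approaches find the same files, and how long each takes (the via_* and compare names are hypothetical; timings will vary, and results can differ for symlinks because each approach treats them a little differently). The Path.walk() variant is skipped on Python versions older than 3.12:

```python
import os
import time
from pathlib import Path


def via_os_walk(root):
    return {os.path.join(d, f) for d, _, files in os.walk(root) for f in files}


def via_rglob(root):
    # One is_file() call (usually a stat system call) per entry.
    return {str(p) for p in Path(root).rglob("*") if p.is_file()}


def via_path_walk(root):
    # Path.walk() exists only on Python 3.12 and later.
    return {str(d / f) for d, _, files in Path(root).walk() for f in files}


def compare(root):
    approaches = [via_os_walk, via_rglob]
    if hasattr(Path, "walk"):
        approaches.append(via_path_walk)
    results = {}
    for func in approaches:
        start = time.perf_counter()
        results[func.__name__] = func(root)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}: {len(results[func.__name__])} files, {elapsed:.3f}s")
    return results
```

For example, compare("/usr") runs the same kind of scan as the scripts above.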