
Use relative path in cache key #907

Open
ntkme wants to merge 1 commit into master from relative-path-in-cache-key

Collaborator

ntkme commented Apr 21, 2026

This is an alternative solution to the issue described in #904 and #905.

The problem statement is that when using AWS CodeBuild as a self-hosted runner, $GITHUB_WORKSPACE changes on every single build, and because the cache key currently contains process.cwd(), the cache is always invalidated.

The solution in this PR is to use the cwd relative to $GITHUB_WORKSPACE instead of the absolute cwd, so that caching works consistently on GitHub-hosted runners and other runners.
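A minimal sketch of the idea, assuming the cache key only needs a cwd component: compute the working directory relative to $GITHUB_WORKSPACE with Node's `path.relative`, and fall back to the absolute cwd when the variable is unset or the cwd lies outside the workspace. The helper name `cacheKeyCwd` and the example paths are hypothetical, not taken from setup-ruby's source.

```javascript
const path = require("path");

// Hypothetical helper: the cwd component to put in the cache key.
// Falls back to the absolute cwd when no workspace is known, or when the
// cwd is outside the workspace (the relative path would climb out with "..").
function cacheKeyCwd(cwd, workspace) {
  if (!workspace) {
    return cwd;
  }
  const rel = path.relative(workspace, cwd);
  if (rel === "") {
    return "."; // cwd is the workspace root itself
  }
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    return cwd; // outside the workspace (or another drive on Windows)
  }
  return rel;
}

// Two runs with different ephemeral workspaces (made-up paths).
const runA = cacheKeyCwd("/codebuild/output/src111/src/app", "/codebuild/output/src111/src");
const runB = cacheKeyCwd("/codebuild/output/src222/src/app", "/codebuild/output/src222/src");
console.log(runA, runB); // app app
```

The fallback branches keep the old absolute-path behavior whenever a relative component would be ambiguous, so only the "cwd inside a known workspace" case changes.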

Note: This change will invalidate all caches on all repositories once, when updating to this version, because the cache key changes; after that, the cache should work again.

ZimbiX left a comment

That would work too! I like that this is automatic, but yeah, it might be surprising to invalidate the cache. Is GITHUB_WORKSPACE set for all builds or just on AWS CodeBuild?

Collaborator Author

ntkme commented Apr 23, 2026

$GITHUB_WORKSPACE is a standard variable on GitHub-hosted runners, and it should be available on all self-hosted runners using the GitHub-provided action runtime. I have never tried AWS CodeBuild runners, but from other similar issues I found on GitHub, it appears to be supported in the expected way.

ntkme force-pushed the relative-path-in-cache-key branch from 26be17b to d83d8b8 on April 24, 2026 17:39

dentarg commented Apr 25, 2026

> Note: This change will break all caches on all repositories once when updating to this version due to updating the cache key, but after that cache should work again.

Should you have to opt-in to this new cache key behavior?

I think the report in #904 is the first one about this in all the years the setup-ruby action has been around.

No strong opinion from my side, I don't know how conservative we need to be with cache key behavior changes.

Collaborator Author

ntkme commented Apr 25, 2026

> Should you have to opt-in to this new cache key behavior?

From git blame we can see the cache key has changed a few times in the past without any notice, warning, or opt-in.

While it would invalidate all existing caches once, it’s not really a breaking change as it requires no action from the users.

Member

eregon commented Apr 26, 2026

I think this has been discussed in the past, but I can't find it quickly right now.

The main risk here is that the restored cache would not work and would break the build, which is worse than having no cache, because installing gems is likely to embed absolute paths.
For example, if some gem builds a native library and then a C extension links to it, it might link using an absolute path (in fact this is easier than linking using a relative path), and this would always break if the CWD changes between runs.
It could also be a config file generated by gemspec.extensions = ... which embeds an absolute path, or even a file lazily created on the first require of the gem, etc.
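One rough way to check whether an installed bundle embeds the absolute path it was built under is to scan its files for that path as a byte string. The sketch below is a hypothetical diagnostic, not something setup-ruby does: it walks a directory with Node's `fs` module and reports files whose contents contain a given prefix. A hit does not prove breakage, and no hit does not prove safety (a path could be assembled at runtime), so treat it as a heuristic.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical diagnostic: recursively list regular files under `root` whose
// bytes contain `absPrefix` (e.g. the directory the gems were installed in).
// Symlinks are skipped; whole files are read into memory for simplicity.
function findEmbeddedPaths(root, absPrefix) {
  const needle = Buffer.from(absPrefix);
  const hits = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile() && fs.readFileSync(full).includes(needle)) {
        hits.push(full);
      }
    }
  };
  walk(root);
  return hits.sort();
}
```

For example, `findEmbeddedPaths("vendor/bundle", previousAbsoluteBundlePath)` would list compiled extensions or generated files that still mention the old location, where `previousAbsoluteBundlePath` is whatever absolute path the bundle was installed under.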

Member

eregon commented Apr 26, 2026

I asked ChatGPT to find some gems which could have that issue:

  • mysql2: doesn't have the issue as long as libmysql/libmariadb is installed under a stable path
  • red-arrow: I tried version 20.0.0. It seems fixed now (Unable to use red-arrow gem on Heroku/Ubuntu 20.04 (focal) apache/arrow#29672)
  • ffi: fine these days because it's a binary gem, and even without that it defaults to the system libffi. I also tried gem i --platform ruby ffi -- --disable-system-libffi, and that seems to statically link libffi, so it's still not a problem.

However there is something that looks problematic for all extensions built from source:

$ gem i json
$ readelf -d /home/eregon/.rubies/ruby-4.0.2/lib/ruby/gems/4.0.0/extensions/x86_64-linux/4.0.0/json-2.19.4/json/ext/parser.so

Dynamic section at offset 0xcdc8 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libruby.so.4.0]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001d (RUNPATH)            Library runpath: [/home/eregon/.rubies/ruby-4.0.2/lib]
...

That library runpath hardcodes the Ruby prefix and the $HOME, if that changes it would likely fail.
Not sure if the Ruby prefix changes with #904 or not, probably not since we force a given location for the CRuby prefix.
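To see which directories a built extension will search at load time, the RUNPATH/RPATH line of `readelf -d` output can be parsed. A minimal sketch, assuming GNU readelf's text format as shown above ("Library runpath: [...]", or "Library rpath: [...]" for the older RPATH tag); the function names are hypothetical. It flags entries that do not sit under a directory expected to stay stable, such as the Ruby prefix, which is where a bundle installed under a changing working directory would show up.

```javascript
const path = require("path");

// Hypothetical parser for GNU `readelf -d` text output: collects the
// colon-separated directories of every RUNPATH/RPATH entry.
function parseRunpaths(readelfOutput) {
  const dirs = [];
  const re = /\((?:RUNPATH|RPATH)\)\s+Library r(?:un)?path: \[([^\]]*)\]/g;
  for (const match of readelfOutput.matchAll(re)) {
    dirs.push(...match[1].split(":").filter((d) => d !== ""));
  }
  return dirs;
}

// $ORIGIN entries are relative to the .so itself and survive relocation.
// Other relative entries do not point at a fixed location, so flag them;
// absolute entries are flagged when they lie outside `stablePrefix`.
function riskyRunpaths(dirs, stablePrefix) {
  return dirs.filter((d) => {
    if (d.startsWith("$ORIGIN") || d.startsWith("${ORIGIN}")) return false;
    if (!path.isAbsolute(d)) return true;
    const rel = path.relative(stablePrefix, d);
    return rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  });
}
```

The text can be obtained with `child_process.execFileSync("readelf", ["-d", soPath], { encoding: "utf8" })` on a system that has binutils installed.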

So on one hand it seems several such issues have been solved, but on the other hand there are many such gems, and since most users would never notice the problem, plenty of such issues may still exist in practice.

If we merged this, don't we risk getting reports about some extensions failing with the cache? I think we would, and in fact users wouldn't be able to reproduce it locally, because no one changes their working directory between runs locally.

One of my mottos for setup-ruby is: be as close as possible to a local development workflow, so that issues are easy to reproduce locally and behavior changes in setup-ruby are minimized. Having a changing CWD is a big violation of that, so if there is any way to solve this in AWS CodeBuild, I think that is by far the better fix: #904 (comment)

Collaborator Author

ntkme commented Apr 26, 2026

> Not sure if the Ruby prefix changes

The Ruby prefix is determined by the setup-ruby action, unless we are talking about a system-provided Ruby. Even for a system-provided Ruby, I don’t see why it would change as long as the Ruby version isn’t changing, because it is usually provided by either an OS image or a container image. If we are worried about the location of Ruby changing, we should include the full RUBY_PREFIX path in the cache key. The location of system libraries changing due to a different environment can be a concern, but I think that is out of scope for this project, because it’s out of our control. We can only hope that the runner image is as stable as possible.
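Including the Ruby prefix in the key would make a prefix change miss the cache instead of restoring gems built against another location. A sketch, assuming a key made of a hypothetical fixed label, the platform, the Ruby prefix, the relative cwd, and a lockfile hash; this is not setup-ruby's actual key format, and the field names are made up for illustration.

```javascript
const crypto = require("crypto");

// Hypothetical key builder: a change in any input yields a different key.
function buildCacheKey({ platform, rubyPrefix, relativeCwd, lockfileContents }) {
  const lockHash = crypto.createHash("sha256").update(lockfileContents).digest("hex");
  // JSON.stringify keeps ("/a", "b/c") and ("/a/b", "c") distinct before hashing,
  // and hashing keeps the key short even for long prefixes.
  const pathsHash = crypto
    .createHash("sha256")
    .update(JSON.stringify([rubyPrefix, relativeCwd]))
    .digest("hex")
    .slice(0, 16);
  return `example-bundler-${platform}-${pathsHash}-${lockHash}`;
}
```

Putting the full prefix verbatim into the key instead of a hash would work equally well; hashing is only a choice to keep the key readable and bounded in length.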

In general, native gems may link with libruby.so and maybe a few system libraries. As long as the location of those libraries does not change, the linking will keep working even if the RPATH is hardcoded.

I think the only case where things would break is if gem number one ships a “libone.so” and gem number two compiles a “libtwo.so” that links to “libone.so”. But this can already break just by updating gem number one without recompiling gem number two, so that’s simply a bad design where gem number two is not self-contained, which can cause issues regardless of this PR. For any gem that is self-contained or only depends on libruby and system libraries, there shouldn’t be any issue even if the cache restore directory changes.

> If we merged this, don't we risk getting reports about some extensions failing with the cache?

For GitHub-hosted runners, or any runner that has a fixed location for $GITHUB_WORKSPACE, nothing would break, as the logical meaning of CWD isn’t changing at all. This PR would only affect self-hosted runners with third-party runtimes that use a dynamic $GITHUB_WORKSPACE.

If we really don’t want to change this, I think the best workaround for CodeBuild users would be to create their own hardcoded working directory at the beginning of the workflow and use the working directory option of setup-ruby with that hardcoded value. The downside is a very poor user experience, as all subsequent steps would also require a hardcoded working directory.

Member

eregon commented Apr 26, 2026

Also, this change would affect other self-hosted runners that have a varying CWD.
For example, we had reports of people using self-hosted runners on macOS with different users: now they would share the cache, whereas before they wouldn't, which would be bad if sharing the cache causes any error.

> I think the only case thing would break is that if gem number one ships a “libone.so” and then gem number two compiles a “libtwo.so” that links to “libone.so”,

The case I was thinking about is when some gem compiles a shared library, say libfoo.so (a static library like libfoo.a is fine), and then has an extension linking to it using an absolute path. This is for instance the case for nokogiri linking libxml2 when using gem i --platform ruby nokogiri -- --disable-static:

$ readelf -d /home/eregon/.rubies/ruby-4.0.2/lib/ruby/gems/4.0.0/extensions/x86_64-linux/4.0.0/nokogiri-1.19.2/nokogiri/nokogiri.so

Dynamic section at offset 0x7bb08 contains 32 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libruby.so.4.0]
 0x0000000000000001 (NEEDED)             Shared library: [libexslt.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libxml2.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libxslt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [liblzma.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001d (RUNPATH)            Library runpath: [/home/eregon/.rubies/ruby-4.0.2/lib:/home/eregon/.rubies/ruby-4.0.2/lib/ruby/gems/4.0.0/gems/nokogiri-1.19.2/ports/x86_64-linux/libxml2/2.13.9/lib:/home/eregon/.rubies/ruby-4.0.2/lib/ruby/gems/4.0.0/gems/nokogiri-1.19.2/ports/x86_64-linux/libxslt/1.1.43/lib:/home/eregon/.rubies/ruby-4.0.2/lib/ruby/gems/4.0.0/gems/nokogiri-1.19.2/ext/nokogiri/ports/x86_64-linux/libgumbo/1.0.0-nokogiri/lib]

And it gets worse: if the directories in the Library runpath don't exist, loading might not fail early when system libraries of the same name are present; instead it would load the wrong version, which could segfault, etc.
It's quite an edge case because one needs to use both --platform ruby (so not the precompiled gem) and --disable-static to hit it, but I wouldn't be surprised if there are gems which hit this case without any special configuration.
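Because a missing runpath directory can silently fall through to a same-named system library, a cheap post-restore check is to verify that every absolute runpath directory still exists. A hypothetical sketch: it takes the directory list (for example the colon-separated entries extracted from `readelf -d`) and returns the absolute ones that are absent on disk. $ORIGIN entries are resolved relative to the library itself, so they are not checked here.

```javascript
const fs = require("fs");

// Returns true if `dir` exists and is a directory.
function isDirectory(dir) {
  try {
    return fs.statSync(dir).isDirectory();
  } catch {
    return false;
  }
}

// Hypothetical post-restore check: absolute RUNPATH/RPATH directories that are
// no longer present. Non-absolute entries (including $ORIGIN ones) are skipped.
function missingRunpathDirs(dirs) {
  return dirs.filter((d) => d.startsWith("/") && !isDirectory(d));
}
```

If this returns anything for a restored bundle, failing the job or discarding the restored gems is arguably safer than letting the loader pick up a different system copy.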

Member

eregon commented Apr 26, 2026

@ZimbiX Could you test this PR (- uses: ruby/setup-ruby@relative-path-in-cache-key) on the repository with the most gems you have and report whether it works or what's the error?

I think it's too risky to have this by default, but I suppose it could be an option, and then we can document in action.yml that it's risky.

Collaborator Author

ntkme commented Apr 26, 2026

As for the nokogiri case mentioned above, using '-Wl,-rpath,$ORIGIN/relative/path' instead of an absolute runtime path during compilation can solve the issue, as it makes the complete package relocatable as long as the relative structure remains stable.
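The $ORIGIN approach can be illustrated by computing the relative runpath entry for a library directory, and by resolving it the way the dynamic loader does, i.e. by substituting the directory that contains the .so. A minimal sketch with hypothetical helper names and made-up paths (not nokogiri's real layout). When such a flag passes through a shell or a Makefile, $ORIGIN has to be quoted or escaped so it reaches the linker literally; on macOS the analogous mechanism is @loader_path.

```javascript
const path = require("path");

// Build an $ORIGIN-relative runpath entry pointing from the directory of
// `soPath` to `libDir`, suitable for `-Wl,-rpath,<entry>`.
function originRpath(soPath, libDir) {
  const rel = path.relative(path.dirname(soPath), libDir);
  return rel === "" ? "$ORIGIN" : `$ORIGIN/${rel}`;
}

// Resolve an entry like the loader does: $ORIGIN (or ${ORIGIN}) is the
// directory containing the .so. A function replacement avoids "$" surprises.
function resolveRpath(entry, soPath) {
  const origin = path.dirname(soPath);
  return path.normalize(entry.replace(/^\$(?:ORIGIN|\{ORIGIN\})/, () => origin));
}

console.log(originRpath("/run1/gems/foo/lib/foo/foo.so", "/run1/gems/foo/ports/lib"));
// $ORIGIN/../../ports/lib
```

Resolving the same entry against a copy of the tree under a different root yields the library directory inside that copy, which is why the package stays relocatable as long as its internal layout is unchanged.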

Member

eregon commented Apr 26, 2026

Yes, if that's done (and the macOS equivalent too), it's probably fine, but as we can see, it's not always done in such cases.

I could also imagine something like -DCONFIG_FILE=<some absolute path> (e.g. set by extconf.rb) during compilation and that'd fail too when moved.

I guess we could also be optimistic and make this the default behavior, and if it proves problematic, introduce the option and have it false by default. But that might be considered breaking, and in any case I want these two things before making my decision or spending more time discussing:

Member

eregon commented Apr 26, 2026

> Ruby prefix is determined by setup-ruby action

Not quite, see

rubyPrefix = path.join(os.homedir(), '.rubies', `${engine}-${version}`)
which depends on $HOME. That prefix is used when this condition is not true:
return (engine === 'ruby' && !isHeadVersion(version)) || isSelfHostedRunner()

So the $HOME-based prefix is used for ruby-head or non-CRuby.
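The quoted condition can be restated as a small function to see which combinations end up under $HOME. This is a simplified, hypothetical re-creation for illustration: the `isHeadVersion` approximation and the `<toolCacheDir>/<engine>/<version>` layout are assumptions, not copies of setup-ruby's code; only the boolean expression and the $HOME branch mirror the lines quoted above.

```javascript
const os = require("os");
const path = require("path");

// Assumption: a simplified stand-in for setup-ruby's isHeadVersion().
function isHeadVersion(version) {
  return ["head", "debug"].includes(version);
}

// Same boolean expression as the condition quoted above.
function shouldUseToolCache(engine, version, selfHosted) {
  return (engine === "ruby" && !isHeadVersion(version)) || selfHosted;
}

// Hypothetical prefix selection; the tool-cache layout is assumed.
function rubyPrefixFor(engine, version, { selfHosted, toolCacheDir, homeDir = os.homedir() }) {
  if (shouldUseToolCache(engine, version, selfHosted)) {
    return path.join(toolCacheDir, engine, version);
  }
  return path.join(homeDir, ".rubies", `${engine}-${version}`);
}

const githubHosted = { selfHosted: false, toolCacheDir: "/opt/hostedtoolcache", homeDir: "/home/runner" };
console.log(rubyPrefixFor("ruby", "head", githubHosted)); // /home/runner/.rubies/ruby-head
```

Read this way, the `|| isSelfHostedRunner()` term means the $HOME-based prefix is only reached on runners that are not self-hosted.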


ZimbiX commented Apr 28, 2026

@eregon I'm happy to report that this PR does work on our monolith (195 gems), with no errors upon cache restoration.


Development

Successfully merging this pull request may close these issues.

Bundler cache not working due to ephemeral workdir

4 participants