feat: Support overriding language ID of the text emitted #1837

Open
cqjjjzr wants to merge 1 commit into rime:master from cqjjjzr:commit-langid

@cqjjjzr cqjjjzr commented Apr 9, 2026

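Reference:

I found that the fields in weasel::Config were not actually used (style can replace them), so I removed them and replaced them with a LANGID. This relies on a TSF feature: when text is inserted into an ITfRange, GUID_PROP_LANGID can specify the language of that text. The override fixes the font and spell-check errors that occur when text in other languages entered through RIME is tagged as Chinese because of the current keyboard setting.

This adds a new configuration option, which may need to be documented.

The pre-edit text flickering issue is also fixed, as shown in the image below (note that the Japanese text automatically switches to Yu Mincho, while the Chinese text uses the default DengXian).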

image

@fxliang fxliang requested a review from Copilot April 10, 2026 14:30

Copilot AI left a comment

Pull request overview

This PR adds a mechanism to override the TSF language ID applied to committed / composing text, so apps can use the correct font/spellcheck language even when the active keyboard layout would otherwise force a different LANGID.

Changes:

  • Introduces a commit_langid config value transported over IPC and stored per session.
  • Applies GUID_PROP_LANGID on TSF ranges (composition start, inline preedit updates, and committed text insertion).
  • Removes reliance on the previously-unused Config::inline_preedit field and uses UI style instead.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| WeaselTSF/WeaselTSF.h | Adds _SetRangeLanguage API and _textLangId storage for TSF language override. |
| WeaselTSF/EditSession.cpp | Reads commit_langid from IPC response and switches inline-preedit decision to style. |
| WeaselTSF/DisplayAttribute.cpp | Implements setting GUID_PROP_LANGID on a TSF range. |
| WeaselTSF/Composition.cpp | Applies the language override to composition/preedit/commit ranges. |
| WeaselIPC/Configurator.cpp | Parses config.commit_langid from IPC messages. |
| RimeWithWeasel/RimeWithWeasel.cpp | Loads locale-based override from configs and emits config.commit_langid over IPC. |
| include/WeaselIPCData.h | Updates IPC Config struct to carry commit_langid. |
| include/RimeWithWeasel.h | Extends session status to store commit_langid and adds loader method declaration. |

Comment thread WeaselTSF/EditSession.cpp
Comment on lines 18 to 21
```cpp
if (ok) {
  bool inline_preedit = _cand->style().inline_preedit;
  _textLangId = static_cast<LANGID>(config.commit_langid);
  if (!commit.empty()) {
```

Copilot AI Apr 10, 2026

_textLangId is stored as a mutable WeaselTSF member, but the actual language assignment happens later in separately requested (potentially async) edit sessions (_StartComposition/_InsertText/_ShowInlinePreedit). If multiple edit sessions are queued, _textLangId can be overwritten before those sessions run, causing the wrong LANGID to be applied to the range. Consider capturing the langid per edit-session instance (store it in the edit session object) or applying the GUID_PROP_LANGID value within the same edit session that sets the text.
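The difference between reading a shared member later and capturing the value when the session is requested can be sketched in portable C++. This is a minimal illustration, not the PR's code: `EditQueue`, `TextService`, `OnResponse`, and the `LangId` alias are hypothetical stand-ins for the TSF edit-session machinery and the Windows `LANGID` type.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using LangId = std::uint16_t;  // stand-in for the Windows LANGID type

// Hypothetical stand-in for queued edit sessions: work is requested now but
// runs later, possibly after further requests have been made.
struct EditQueue {
  std::deque<std::function<void()>> pending;
  void Request(std::function<void()> job) { pending.push_back(std::move(job)); }
  void RunAll() {
    while (!pending.empty()) {
      auto job = std::move(pending.front());
      pending.pop_front();
      job();
    }
  }
};

struct TextService {
  LangId text_lang_id = 0;               // shared mutable member
  std::vector<LangId> applied_shared;    // what member-reading sessions applied
  std::vector<LangId> applied_captured;  // what value-capturing sessions applied
  EditQueue queue;

  void OnResponse(LangId langid) {
    text_lang_id = langid;
    // Reads the member when the session runs: sees whatever was stored last.
    queue.Request([this] { applied_shared.push_back(text_lang_id); });
    // Copies the value at request time: each session keeps its own langid.
    queue.Request([this, langid] { applied_captured.push_back(langid); });
  }
};
```

If two responses arrive before the queue drains, every member-reading session applies the second language ID, while the capturing sessions apply the ID that was current when each was requested.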

Copilot uses AI. Check for mistakes.
Comment thread WeaselTSF/EditSession.cpp

```cpp
if (ok) {
  bool inline_preedit = _cand->style().inline_preedit;
  _textLangId = static_cast<LANGID>(config.commit_langid);
```

Copilot AI Apr 10, 2026

Casting config.commit_langid (int) to LANGID will silently truncate out-of-range values. Please validate/clamp to the valid LANGID range (0..0xFFFF) and treat invalid values as “no override” (or log) to avoid setting an unintended language ID.

Suggested change
```diff
- _textLangId = static_cast<LANGID>(config.commit_langid);
+ if (config.commit_langid >= 0 && config.commit_langid <= 0xFFFF) {
+   _textLangId = static_cast<LANGID>(config.commit_langid);
+ }
```

Comment on lines +64 to +69
```cpp
if (SUCCEEDED(pContext->GetProperty(GUID_PROP_LANGID, &pLangIdProperty))) {
  VARIANT var;
  var.vt = VT_I4;
  var.lVal = _textLangId;
  hr = pLangIdProperty->SetValue(ec, pRange, &var);
  pLangIdProperty->Release();
```

Copilot AI Apr 10, 2026

VARIANT var is used without VariantInit(). Even though only VT_I4 is assigned, initializing the VARIANT avoids uninitialized padding/fields and aligns with COM best practices (same pattern also exists in _SetCompositionDisplayAttributes).

Comment on lines 15 to 21
```diff
  void Configurator::Store(Deserializer::KeyType const& key,
                           std::wstring const& value) {
-   if (!m_pTarget->p_context || key.size() < 2)
+   if (!m_pTarget->p_config || key.size() < 2)
      return;
    bool bool_value = (!value.empty() && value != L"0");
-   if (key[1] == L"inline_preedit") {
-     m_pTarget->p_config->inline_preedit = bool_value;
+   if (key[1] == L"commit_langid") {
+     m_pTarget->p_config->commit_langid = _wtoi(value.c_str());
    }
```

Copilot AI Apr 10, 2026

Parsing commit_langid with _wtoi provides no error reporting and accepts negative / out-of-range values, which then flow into TSF language property setting. Prefer a checked parse (e.g., wcstol with endptr) and clamp/reject values outside the valid LANGID range to keep IPC robust against malformed inputs.
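A checked parse along these lines could look like the sketch below. `ParseLangId` is a hypothetical helper, not part of the PR; it assumes the IPC value is a decimal string and that 0 means "no override", and it uses `std::uint16_t` as a portable stand-in for `LANGID`.

```cpp
#include <cerrno>
#include <cstdint>
#include <cwchar>
#include <optional>
#include <string>

// Hypothetical helper: parse a decimal LANGID from an IPC value. Returns
// std::nullopt for anything that is not a whole number in 1..0xFFFF.
// Treating 0 as "no override" is an assumption made for this sketch.
std::optional<std::uint16_t> ParseLangId(const std::wstring& value) {
  if (value.empty())
    return std::nullopt;
  const wchar_t* begin = value.c_str();
  wchar_t* end = nullptr;
  errno = 0;
  const long parsed = std::wcstol(begin, &end, 10);
  if (end == begin || *end != L'\0')  // no digits, or trailing garbage
    return std::nullopt;
  if (errno == ERANGE || parsed <= 0 || parsed > 0xFFFF)
    return std::nullopt;
  return static_cast<std::uint16_t>(parsed);
}
```

Returning an optional lets the caller leave the stored value untouched on malformed input instead of silently storing 0 or a truncated number.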

```cpp
if (!lcid)
  return false;

*commit_langid = LANGIDFROMLCID(lcid);
```

Copilot AI Apr 10, 2026

TryGetLangIdFromConfig writes the resulting LANGID into an int without any bounds/type enforcement. Since this value is later serialized over IPC and cast to LANGID, it’s safer to explicitly constrain it to 0..0xFFFF (and treat 0 as “not set”) before storing/sending.

Suggested change
```diff
- *commit_langid = LANGIDFROMLCID(lcid);
+ const unsigned long langid = static_cast<unsigned long>(LANGIDFROMLCID(lcid));
+ if (langid == 0 || langid > 0xFFFFul)
+   return false;
+ *commit_langid = static_cast<int>(langid);
```
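For context, the Windows headers define `LANGIDFROMLCID(lcid)` as a cast of the LCID to a 16-bit `WORD`, so the result already fits in 0..0xFFFF and the zero check is the part that changes behavior. The portable sketch below mirrors that arithmetic; `Lcid`, `LangId`, `LangIdFromLcid`, and `TryGetLangId` are hypothetical stand-ins, not the PR's functions.

```cpp
#include <cstdint>

// Portable stand-ins for the Windows LCID and LANGID types (illustration only).
using Lcid = std::uint32_t;
using LangId = std::uint16_t;

// Mirrors what LANGIDFROMLCID does: keep the low 16 bits of the LCID and
// drop the sort ID stored in the upper bits.
inline LangId LangIdFromLcid(Lcid lcid) {
  return static_cast<LangId>(lcid & 0xFFFFu);
}

// Hypothetical helper: 0 is treated as "not set". No upper-bound check is
// needed because the result type is already 16 bits wide.
inline bool TryGetLangId(Lcid lcid, int* out) {
  const LangId langid = LangIdFromLcid(lcid);
  if (langid == 0)
    return false;
  *out = static_cast<int>(langid);
  return true;
}
```

For example, an LCID carrying a sort ID in its upper bits, such as 0x00020804, maps to the same language ID (0x0804) as the plain LCID 0x0804.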
