Skip to content

Commit 997fd0c

Browse files
committed
SARIF reporter
SARIF is a unified file format used to exchange information between static analysis tools (like pylint) and various types of formatters, meta-runners, broadcasters / alert system, ... This implementation is ad-hoc, and non-validating. Spec v Github ------------- Turns out Github both doesn't implement all of SARIF (which makes sense) and requires a bunch of properties which the spec considers optional. The [official SARIF validator][] (linked to by both oasis and github) was used to validate the output of the reporter, ensuring that all the github requirements it flags are fulfilled, and fixing *some* of the validator's pet issues. As of now the following issues are left unaddressed: - azure requires `run.automationDetails`, looking at the spec I don't think it makes sense for the reporter to inject that, it's more up to the CI - the validator wants a `run.versionControlProvenance`, same as above - the validator wants rule names in PascalCase, lol - the validator wants templated result messages, but without pylint providing the args as part of the `Message` that's a bit of a chore - the validator wants `region` to include a snippet (the flagged content) - the validator wants `physicalLocation` to have a `contextRegion` (most likely with a snippet) On URIs ------- The reporter makes use of URIs for artifacts (~files). Per ["guidance on the use of artifactLocation objects"][3.4.7], `uri` *should* capture the deterministic part of the artifact location and `uriBaseId` *should* capture the non-deterministic part. However as far as I can tell pylint has no requirement (and no clean way to require) consistent resolution roots: `path` is just relative to the cwd, and there is no requirement to have project-level files to use pylint. This makes the use of relative uris dodgy, but absolute uris are pretty much always broken for the purpose of *interchange* so they're not really any better. As a side-note, Github [asserts][relative-uri-guidance] > While this [nb: `originalUriBaseIds`] is not required by GitHub for > the code scanning results to be displayed correctly, it is required > to produce a valid SARIF output when using relative URI references. However per [3.4.4][] this is incorrect, the `uriBaseId` can be resolved through end-user configuration, `originalUriBaseIds`, external information (e.g. envvars), or heuristics. It would be nice to document the "relative root" via `originalUriBaseIds` (which may be omitted for that purpose per [3.14.14][], but per the above claiming a consistent project root is dodgy. We *could* resolve known project files (e.g. pyproject.toml, tox.ini, etc...) in order to find a consistent root (project root, repo root, ...) and set / use that for relative URIs but that's a lot of additional complexity which I'm not sure is warranted at least for a first version. Fixes #5493 [3.4.4]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540869 [3.4.7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540872 [3.14.14]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540936 [relative-uri-guidance]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#relative-uri-guidance-for-sarif-producers [official SARIF validator]: https://sarifweb.azurewebsites.net/
1 parent 7588243 commit 997fd0c

File tree

6 files changed

+411
-1
lines changed

6 files changed

+411
-1
lines changed
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
Support for SARIF as an output format.
2+
3+
Closes #5493
4+
Closes #10647

pylint/lint/base_options.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,8 @@ def _make_linter_options(linter: PyLinter) -> Options:
104104
"group": "Reports",
105105
"help": "Set the output format. Available formats are: 'text', "
106106
"'parseable', 'colorized', 'json2' (improved json format), 'json' "
107-
"(old json format), msvs (visual studio) and 'github' (GitHub actions). "
107+
"(old json format), msvs (visual studio), 'github' (GitHub actions), "
108+
"and 'sarif'. "
108109
"You can also give a reporter class, e.g. mypackage.mymodule."
109110
"MyReporterClass.",
110111
"kwargs": {"linter": linter},

pylint/reporters/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
from pylint.reporters.json_reporter import JSON2Reporter, JSONReporter
1515
from pylint.reporters.multi_reporter import MultiReporter
1616
from pylint.reporters.reports_handler_mix_in import ReportsHandlerMixIn
17+
from pylint.reporters.sarif_reporter import SARIFReporter
1718

1819
if TYPE_CHECKING:
1920
from pylint.lint.pylinter import PyLinter
@@ -31,4 +32,5 @@ def initialize(linter: PyLinter) -> None:
3132
"JSONReporter",
3233
"MultiReporter",
3334
"ReportsHandlerMixIn",
35+
"SARIFReporter",
3436
]

pylint/reporters/sarif_reporter.py

Lines changed: 278 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,278 @@
1+
# Licensed under the GPL: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
2+
# For details: https://github.com/pylint-dev/pylint/blob/main/LICENSE
3+
# Copyright (c) https://github.com/pylint-dev/pylint/blob/main/CONTRIBUTORS.txt
4+
5+
# pylint: disable=wrong-spelling-in-comment
6+
7+
from __future__ import annotations
8+
9+
import json
10+
import os.path
11+
from textwrap import shorten
12+
from typing import TYPE_CHECKING, Literal, TypedDict
13+
from urllib.parse import quote
14+
15+
import pylint
16+
import pylint.message
17+
from pylint.constants import MSG_TYPES
18+
from pylint.reporters import BaseReporter
19+
20+
if TYPE_CHECKING:
21+
from pylint.lint import PyLinter
22+
from pylint.reporters.ureports.nodes import Section
23+
24+
25+
def register(linter: PyLinter) -> None:
26+
linter.register_reporter(SARIFReporter)
27+
28+
29+
class SARIFReporter(BaseReporter):
30+
name = "sarif"
31+
extension = "sarif"
32+
linter: PyLinter
33+
34+
def display_reports(self, layout: Section) -> None:
35+
"""Don't do anything in this reporter."""
36+
37+
def _display(self, layout: Section) -> None:
38+
"""Do nothing."""
39+
40+
def display_messages(self, layout: Section | None) -> None:
41+
"""Launch layouts display."""
42+
output: Log = {
43+
"version": "2.1.0",
44+
"$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
45+
"runs": [
46+
{
47+
"tool": {
48+
"driver": {
49+
"name": "pylint",
50+
"fullName": f"pylint {pylint.__version__}",
51+
"version": pylint.__version__,
52+
# should be versioned but not all versions are kept so...
53+
"informationUri": "https://pylint.readthedocs.io/",
54+
"rules": [
55+
{
56+
"id": m.msgid,
57+
"name": m.symbol,
58+
"deprecatedIds": [
59+
msgid for msgid, _ in m.old_names
60+
],
61+
"deprecatedNames": [
62+
name for _, name in m.old_names
63+
],
64+
# per 3.19.19 shortDescription should be a
65+
# single sentence which can't be guaranteed,
66+
# however github requires it...
67+
"shortDescription": {
68+
"text": m.description.split(".", 1)[0]
69+
},
70+
# github requires that this is less than 1024 characters
71+
"fullDescription": {
72+
"text": shorten(
73+
m.description, 1024, placeholder="..."
74+
)
75+
},
76+
"help": {"text": m.format_help()},
77+
"helpUri": f"https://pylint.readthedocs.io/en/stable/user_guide/messages/{MSG_TYPES[m.msgid[0]]}/{m.symbol}.html",
78+
# handle_message only gets the formatted message,
79+
# so to use `messageStrings` we'd need to
80+
# convert the templating and extract the args
81+
# out of the msg
82+
}
83+
for checker in self.linter.get_checkers()
84+
for m in checker.messages
85+
if m.symbol in self.linter.stats.by_msg
86+
],
87+
}
88+
},
89+
"results": [self.serialize(message) for message in self.messages],
90+
}
91+
],
92+
}
93+
json.dump(output, self.out)
94+
95+
@staticmethod
96+
def serialize(message: pylint.message.Message) -> Result:
97+
region: Region = {
98+
"startLine": message.line,
99+
"startColumn": message.column + 1,
100+
"endLine": message.end_line or message.line,
101+
"endColumn": (message.end_column or message.column) + 1,
102+
}
103+
104+
location: Location = {
105+
"physicalLocation": {
106+
"artifactLocation": {
107+
"uri": path_to_uri(message.path),
108+
},
109+
"region": region,
110+
},
111+
}
112+
if message.obj:
113+
logical_location: LogicalLocation = {
114+
"name": message.obj,
115+
"fullyQualifiedName": f"{message.module}.{message.obj}",
116+
}
117+
location["logicalLocations"] = [logical_location]
118+
119+
return {
120+
"ruleId": message.msg_id,
121+
"message": {"text": message.msg},
122+
"level": CATEGORY_MAP[message.category],
123+
"locations": [location],
124+
"partialFingerprints": {
125+
# encoding the node path seems like it would be useful to dedup alerts?
126+
"nodePath/v1": "",
127+
},
128+
}
129+
130+
131+
def path_to_uri(path: str) -> str:
132+
"""Converts a relative FS path to a relative URI.
133+
134+
Does not check the
135+
validity of the path.
136+
137+
An alternative would be to use `Path.as_uri` (on concrete path) on both the
138+
artifact path and a reference path, then create a relative URI from this.
139+
"""
140+
if os.path.altsep:
141+
path = path.replace(os.path.altsep, "/")
142+
if os.path.sep != "/":
143+
path = path.replace(os.path.sep, "/")
144+
return quote(path)
145+
146+
147+
CATEGORY_MAP: dict[str, ResultLevel] = {
148+
"convention": "note",
149+
"refactor": "note",
150+
"statement": "note",
151+
"info": "note",
152+
"warning": "warning",
153+
"error": "error",
154+
"fatal": "error",
155+
}
156+
157+
158+
class Run(TypedDict):
159+
tool: Tool
160+
# invocation parameters / environment for the tool
161+
# invocation: list[Invocations]
162+
results: list[Result]
163+
# originalUriBaseIds: dict[str, ArtifactLocation]
164+
165+
166+
Log = TypedDict(
167+
"Log",
168+
{
169+
"version": Literal["2.1.0"],
170+
"$schema": Literal[
171+
"https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json"
172+
],
173+
"runs": list[Run],
174+
},
175+
)
176+
177+
178+
class Tool(TypedDict):
179+
driver: Driver
180+
181+
182+
class Driver(TypedDict):
183+
name: Literal["pylint"]
184+
# optional but azure wants it
185+
fullName: str
186+
version: str
187+
informationUri: str # not required but validator wants it
188+
rules: list[ReportingDescriptor]
189+
190+
191+
class ReportingDescriptorOpt(TypedDict, total=False):
192+
deprecatedIds: list[str]
193+
deprecatedNames: list[str]
194+
messageStrings: dict[str, MessageString]
195+
196+
197+
class ReportingDescriptor(ReportingDescriptorOpt):
198+
id: str
199+
# optional but validator really wants it (then complains that it's not pascal cased)
200+
name: str
201+
# not required per spec but required by github
202+
shortDescription: MessageString
203+
fullDescription: MessageString
204+
help: MessageString
205+
helpUri: str
206+
207+
208+
class MarkdownMessageString(TypedDict, total=False):
209+
markdown: str
210+
211+
212+
class MessageString(MarkdownMessageString):
213+
text: str
214+
215+
216+
ResultLevel = Literal["none", "note", "warning", "error"]
217+
218+
219+
class ResultOpt(TypedDict, total=False):
220+
ruleId: str
221+
ruleIndex: int
222+
223+
level: ResultLevel
224+
225+
226+
class Result(ResultOpt):
227+
message: Message
228+
# not required per spec but required by github
229+
locations: list[Location]
230+
partialFingerprints: dict[str, str]
231+
232+
233+
class Message(TypedDict, total=False):
234+
# needs to have either text or id but it's a PITA to type
235+
236+
#: plain text message string (can have markdown links but no other formatting)
237+
text: str
238+
#: formatted GFM text
239+
markdown: str
240+
#: rule id
241+
id: str
242+
#: arguments for templated rule messages
243+
arguments: list[str]
244+
245+
246+
class Location(TypedDict, total=False):
247+
physicalLocation: PhysicalLocation # actually required by github
248+
logicalLocations: list[LogicalLocation]
249+
250+
251+
class PhysicalLocation(TypedDict):
252+
artifactLocation: ArtifactLocation
253+
# not required per spec, required by github
254+
region: Region
255+
256+
257+
class ArtifactLocation(TypedDict, total=False):
258+
uri: str
259+
#: id of base URI for resolving relative `uri`
260+
uriBaseId: str
261+
description: Message
262+
263+
264+
class LogicalLocation(TypedDict, total=False):
265+
name: str
266+
fullyQualifiedName: str
267+
#: schema is `str` with a bunch of *suggested* terms, of which this is a subset
268+
kind: Literal[
269+
"function", "member", "module", "parameter", "returnType", "type", "variable"
270+
]
271+
272+
273+
class Region(TypedDict):
274+
# none required per spec, all required by github
275+
startLine: int
276+
startColumn: int
277+
endLine: int
278+
endColumn: int

tests/lint/unittest_expand_modules.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -151,6 +151,14 @@ def test__is_in_ignore_list_re_match() -> None:
151151
"name": "reporters.unittest_reporting",
152152
"isignored": False,
153153
},
154+
str(REPORTERS_PATH / "unittest_sarif_reporter.py"): {
155+
"basename": "reporters",
156+
"basepath": str(REPORTERS_PATH / "__init__.py"),
157+
"isarg": False,
158+
"path": str(REPORTERS_PATH / "unittest_sarif_reporter.py"),
159+
"name": "reporters.unittest_sarif_reporter",
160+
"isignored": False,
161+
},
154162
}
155163

156164

0 commit comments

Comments
 (0)